Assessing the accuracy, bias, and relevance of GPT-3.5 summaries of articles is very useful, and the 70% reduction in length can help busy clinicians. However, the methods are often among the first things to be condensed or removed from summaries. Methods are essential for judging the relevance of the results to clinicians: what population was included (age, gender, insurance status, some metric of socioeconomic status, region of the world) and what exclusion criteria were applied (such as multi-morbidity). Previous studies have demonstrated that some articles include populations that are not representative of those seen in primary care.
In this study it was not clear whether the reviewers took this into account. Was it this factor that led to the lower relevance scores? It appears that each reviewer determined their own criteria for each aspect of their scoring.