TY - JOUR
T1 - Generative pre-trained transformer (GPT)-4 support for differential diagnosis in neuroradiology
AU - Sorin, Vera
AU - Klang, Eyal
AU - Sobeh, Tamer
AU - Konen, Eli
AU - Shrot, Shai
AU - Livne, Adva
AU - Weissbuch, Yulian
AU - Hoffmann, Chen
AU - Barash, Yiftach
N1 - Publisher Copyright: © AME Publishing Company.
PY - 2024/10
Y1 - 2024/10
AB - Background: Differential diagnosis in radiology relies on the accurate identification of imaging patterns. The use of large language models (LLMs) in radiology holds promise, with many potential applications that may enhance the efficiency of radiologists’ workflow. The study aimed to evaluate the efficacy of generative pre-trained transformer (GPT)-4, an LLM, in providing differential diagnoses in neuroradiology, comparing its performance with that of board-certified neuroradiologists. Methods: Sixty neuroradiology reports with variable diagnoses were entered into GPT-4, which was tasked with generating a top-3 differential diagnosis for each case. The results were compared to the true diagnoses and to the differential diagnoses provided by three blinded neuroradiologists. Diagnostic accuracy and agreement between readers were assessed. Results: Of the 60 patients (mean age 47.8 years, 65% female), GPT-4 correctly included the diagnoses in its differentials in 61.7% (37/60) of cases, while the neuroradiologists’ accuracy ranged from 63.3% (38/60) to 73.3% (44/60). Agreement between GPT-4 and the neuroradiologists, and among the neuroradiologists, was fair to moderate [Cohen’s kappa (kw) 0.34–0.44 and kw 0.39–0.54, respectively]. Conclusions: GPT-4 shows potential as a support tool for differential diagnosis in neuroradiology, though it was outperformed by human experts. Radiologists should remain mindful of the limitations of LLMs, while harnessing their potential to enhance educational and clinical work.
KW - differential diagnosis
KW - generative pre-trained transformer (GPT)
KW - Large language models (LLMs)
KW - neuroradiology
UR - http://www.scopus.com/inward/record.url?scp=85205545521&partnerID=8YFLogxK
U2 - 10.21037/qims-24-200
DO - 10.21037/qims-24-200
M3 - Article
AN - SCOPUS:85205545521
SN - 2223-4292
VL - 14
SP - 7551
EP - 7560
JO - Quantitative Imaging in Medicine and Surgery
JF - Quantitative Imaging in Medicine and Surgery
IS - 10
ER -