TY - JOUR
T1 - Utilizing large language models in breast cancer management
T2 - systematic review
AU - Sorin, Vera
AU - Glicksberg, Benjamin S.
AU - Artsi, Yaara
AU - Barash, Yiftach
AU - Konen, Eli
AU - Nadkarni, Girish N.
AU - Klang, Eyal
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/3
Y1 - 2024/3
N2 - Purpose: Despite advanced technologies in breast cancer management, challenges remain in efficiently interpreting vast clinical data for patient-specific insights. We reviewed the literature on how large language models (LLMs) such as ChatGPT might offer solutions in this field. Methods: We searched MEDLINE for relevant studies published before December 22, 2023. Keywords included: “large language models”, “LLM”, “GPT”, “ChatGPT”, “OpenAI”, and “breast”. The risk of bias was evaluated using the QUADAS-2 tool. Results: Six studies evaluating either ChatGPT-3.5 or GPT-4 met our inclusion criteria. They explored clinical notes analysis, guideline-based question-answering, and patient management recommendations. Accuracy varied between studies, ranging from 50 to 98%. Higher accuracy was seen in structured tasks such as information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on how questions are posed (prompt-dependency), and, in some cases, missing critical clinical information. Conclusion: LLMs hold potential in breast cancer care, especially in textual information extraction and guideline-driven clinical question-answering. Yet their inconsistent accuracy underscores the need for careful validation of these models and the importance of ongoing supervision.
AB - Purpose: Despite advanced technologies in breast cancer management, challenges remain in efficiently interpreting vast clinical data for patient-specific insights. We reviewed the literature on how large language models (LLMs) such as ChatGPT might offer solutions in this field. Methods: We searched MEDLINE for relevant studies published before December 22, 2023. Keywords included: “large language models”, “LLM”, “GPT”, “ChatGPT”, “OpenAI”, and “breast”. The risk of bias was evaluated using the QUADAS-2 tool. Results: Six studies evaluating either ChatGPT-3.5 or GPT-4 met our inclusion criteria. They explored clinical notes analysis, guideline-based question-answering, and patient management recommendations. Accuracy varied between studies, ranging from 50 to 98%. Higher accuracy was seen in structured tasks such as information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on how questions are posed (prompt-dependency), and, in some cases, missing critical clinical information. Conclusion: LLMs hold potential in breast cancer care, especially in textual information extraction and guideline-driven clinical question-answering. Yet their inconsistent accuracy underscores the need for careful validation of these models and the importance of ongoing supervision.
KW - Artificial intelligence
KW - Breast cancer
KW - GPT
KW - Large language models
UR - http://www.scopus.com/inward/record.url?scp=85188072334&partnerID=8YFLogxK
U2 - 10.1007/s00432-024-05678-6
DO - 10.1007/s00432-024-05678-6
M3 - Systematic review
C2 - 38504034
AN - SCOPUS:85188072334
SN - 0171-5216
VL - 150
JO - Journal of Cancer Research and Clinical Oncology
JF - Journal of Cancer Research and Clinical Oncology
IS - 3
M1 - 140
ER -