Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations

Adiel Cohen*, Roie Alter, Naama Lessans, Raanan Meyer, Yoav Brezinov, Gabriel Levin

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: Previous studies of ChatGPT performance on medical examinations have reached contradictory results. Moreover, the performance of ChatGPT in languages other than English has yet to be explored. We aimed to study the performance of ChatGPT on the Hebrew OBGYN ‘Shlav-Alef’ (Phase 1) examination. Methods: A performance study was conducted using a consecutive sample of text-based multiple-choice questions drawn from authentic Hebrew OBGYN ‘Shlav-Alef’ examinations in 2021–2022. We constructed 150 multiple-choice questions from consecutive, text-only original questions. We compared the performance of ChatGPT to the actual performance of OBGYN residents who took the tests in 2021–2022. We also compared ChatGPT's Hebrew performance with its previously published performance on English medical tests. Results: In 2021–2022, 27.8% of OBGYN residents failed the ‘Shlav-Alef’ examination, and the mean score of the residents was 68.4. Overall, 150 authentic questions were evaluated (one examination). ChatGPT correctly answered 58 questions (38.7%) and received a failing score. The performance of ChatGPT in Hebrew was lower than the actual performance of residents: 38.7% vs. 68.4%, p < .001. Compared with ChatGPT performance on 9,091 English-language questions in the field of medicine, the performance of ChatGPT in Hebrew was lower (38.7% in Hebrew vs. 60.7% in English, p < .001). Conclusions: ChatGPT correctly answered fewer than 40% of the Hebrew OBGYN resident examination questions. Residents cannot rely on ChatGPT to prepare for this examination. Efforts should be made to improve ChatGPT performance in languages other than English.
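The Hebrew vs. English comparison reported above can be roughly sanity-checked with a two-proportion z-test. The abstract does not say which statistical test the authors used, so the pooled z-test below is an assumption, as is the English correct-answer count, which is reconstructed here by rounding 60.7% of 9,091. The function name `two_proportion_z_test` is illustrative only. This is a minimal sketch, not the authors' analysis.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Reported in the abstract: 58 of 150 Hebrew questions answered correctly.
hebrew_correct, hebrew_total = 58, 150

# Assumption: the abstract gives only 60.7% of 9,091 English questions,
# so the count of correct answers is reconstructed by rounding.
english_total = 9091
english_correct = round(0.607 * english_total)

z, p = two_proportion_z_test(
    hebrew_correct, hebrew_total, english_correct, english_total
)
print(f"Hebrew accuracy: {hebrew_correct / hebrew_total:.1%}")
print(f"z = {z:.2f}, p = {p:.2g}")
```

Under these assumptions the p-value falls well below .001, which is consistent with the reported result. The resident comparison (38.7% vs. 68.4%) is not reproduced here because the abstract reports a mean score for residents rather than a count of correct answers.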

Original language: English
Pages (from-to): 1797-1802
Number of pages: 6
Journal: Archives of Gynecology and Obstetrics
Volume: 308
Issue number: 6
State: Published - Dec 2023

Keywords

  • ChatGPT
  • Hebrew
  • OBGYN
  • Performance
  • Test
