Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations

Cohen, Adiel; Alter, Roie; Lessans, Naama; Meyer, Raanan; Brezinov, Yoav; Levin, Gabriel

doi:10.1007/s00404-023-07185-4

General Gynecology
Published: 05 September 2023

Volume 308, pages 1797–1802, (2023)

Archives of Gynecology and Obstetrics

Adiel Cohen ORCID: orcid.org/0000-0002-6218-1156¹^na1,
Roie Alter¹^na1,
Naama Lessans¹,
Raanan Meyer^3,4,5,
Yoav Brezinov² &
…
Gabriel Levin^2,6

Abstract

Purpose

Previous studies of ChatGPT performance on medical examinations have reached contradictory results. Moreover, the performance of ChatGPT in languages other than English has yet to be explored. We aimed to study the performance of ChatGPT on the Hebrew OBGYN ‘Shlav-Alef’ (Phase 1) examination.

Methods

A performance study was conducted using a consecutive sample of text-based multiple-choice questions drawn from authentic Hebrew OBGYN ‘Shlav-Alef’ examinations in 2021–2022. We constructed 150 multiple-choice questions from consecutive text-only original questions. We compared the performance of ChatGPT with the actual performance of OBGYN residents who completed the tests in 2021–2022. We also compared ChatGPT's performance in Hebrew with its previously published performance on English-language medical tests.

Results

In 2021–2022, 27.8% of OBGYN residents failed the ‘Shlav-Alef’ examination, and the mean score of the residents was 68.4. Overall, 150 authentic questions were evaluated (one examination). ChatGPT correctly answered 58 questions (38.7%) and did not reach a passing score. The performance of ChatGPT in Hebrew was lower than the actual performance of residents: 38.7% vs. 68.4%, p < .001. Compared with ChatGPT's performance on 9,091 English-language questions in the field of medicine, its performance in Hebrew was also lower (38.7% in Hebrew vs. 60.7% in English, p < .001).
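The arithmetic behind the Hebrew vs. English comparison can be checked with a short sketch. The abstract does not state which statistical test the authors used, so the pooled two-proportion z-test below is an assumption chosen for illustration, not the authors' method. The number of correct English answers is also not reported; it is reconstructed here from the reported 60.7% of 9,091 questions and is therefore approximate.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided two-proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Reported in the abstract: 58 of 150 Hebrew questions answered correctly.
hebrew_correct, hebrew_total = 58, 150

# Reported in the abstract: 60.7% of 9,091 English questions.
# ASSUMPTION: the count of correct answers is reconstructed from the
# rounded percentage, so it may differ slightly from the true count.
english_total = 9091
english_correct = round(0.607 * english_total)

hebrew_pct = round(100 * hebrew_correct / hebrew_total, 1)
z, p = two_proportion_z_test(
    hebrew_correct, hebrew_total, english_correct, english_total
)

print(f"Hebrew accuracy: {hebrew_pct}%")
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

Under these assumptions the test gives a p-value well below .001, which is consistent with the reported result. The comparison with residents (38.7% vs. a mean score of 68.4) cannot be reproduced this way from the abstract alone, because the number of residents and the spread of their scores are not given.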

Conclusions

ChatGPT correctly answered fewer than 40% of Hebrew OBGYN resident examination questions. Residents cannot rely on ChatGPT to prepare for this examination. Efforts should be made to improve ChatGPT's performance in languages other than English.

Data availability

The data that support the findings of this study are available from the authors on reasonable request.

Funding

No funding or support was obtained for this study.

Author information

Adiel Cohen and Roie Alter contributed equally to this work.

Authors and Affiliations

Department of Obstetrics and Gynecology, Hadassah Medical Organization and Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, P.O.B. 12000, 91120, Jerusalem, Israel
Adiel Cohen, Roie Alter & Naama Lessans
Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Canada
Yoav Brezinov & Gabriel Levin
Department of Obstetrics and Gynecology, Chaim Sheba Medical Center, Ramat-Gan, Israel
Raanan Meyer
Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
Raanan Meyer
Cedars-Sinai Medical Center, Los Angeles, USA
Raanan Meyer
The Department of Gynecologic Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Gabriel Levin

Contributions

GL, AC and RA: project development, data collection and management, data analysis, manuscript writing/editing. NL, RM and YB: data collection and management, manuscript writing/editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Adiel Cohen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

We used American Association for Public Opinion Research (AAPOR) reporting guidelines. This study did not require ethics approval, as we used only publicly accessible data and no human participants were involved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cohen, A., Alter, R., Lessans, N. et al. Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations. Arch Gynecol Obstet 308, 1797–1802 (2023). https://doi.org/10.1007/s00404-023-07185-4

Received: 11 June 2023
Accepted: 02 August 2023
Published: 05 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00404-023-07185-4
