Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations

  • General Gynecology
  • Published in Archives of Gynecology and Obstetrics

Abstract

Purpose

Previous studies of ChatGPT's performance on medical examinations have reached contradictory results. Moreover, ChatGPT's performance in languages other than English remains largely unexplored. We aimed to study the performance of ChatGPT on the Hebrew OBGYN 'Shlav-Alef' (Phase 1) examination.

Methods

A performance study was conducted using a consecutive sample of text-based multiple-choice questions originating from authentic Hebrew OBGYN 'Shlav-Alef' examinations administered in 2021–2022. We constructed 150 multiple-choice questions from consecutive text-only original questions. We compared ChatGPT's performance with the actual performance of OBGYN residents who completed these examinations in 2021–2022. We also compared ChatGPT's Hebrew performance with previously published results on English-language medical tests, as sketched below.
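The abstract does not describe the exact querying workflow. As an illustrative sketch only, assuming programmatic access through the OpenAI Python client (the model name, prompt wording, and helper functions below are hypothetical and not taken from the study, which used the ChatGPT interface), scoring a set of multiple-choice questions could look like this:

    # Illustrative sketch only: the model name, prompt format, and data layout
    # are assumptions; the study itself used the ChatGPT interface.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_question(stem: str, options: dict[str, str]) -> str:
        """Send one Hebrew multiple-choice question and return the chosen letter."""
        prompt = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
        prompt += "\nAnswer with the letter of the single best option."
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # hypothetical; the study's model version is not given here
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content.strip()[0]

    def score(questions) -> float:
        """questions: list of (stem, options, correct_letter) tuples built from the exam."""
        correct = sum(ask_question(stem, opts) == answer for stem, opts, answer in questions)
        return correct / len(questions)  # the study reports 58/150 = 38.7%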

Results

In 2021–2022, 27.8% of OBGYN residents failed the 'Shlav-Alef' examination, and the residents' mean score was 68.4. Overall, 150 authentic questions (one examination) were evaluated. ChatGPT answered 58 questions correctly (38.7%), a failing score. ChatGPT's performance in Hebrew was lower than the residents' actual performance: 38.7% vs. 68.4%, p < 0.001. Compared with previously reported ChatGPT performance on 9,091 English-language medical questions, performance in Hebrew was also lower (38.7% in Hebrew vs. 60.7% in English, p < 0.001).
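The abstract reports p values without naming the statistical test. As a minimal sketch, assuming a chi-square test of two proportions (with the English "correct" count reconstructed from the reported 60.7% of 9,091 questions), the Hebrew-versus-English comparison can be checked as follows:

    # Minimal sketch: assumes a chi-square test of proportions; the exact test
    # used by the authors is not stated in the abstract.
    from scipy.stats import chi2_contingency

    hebrew_correct, hebrew_total = 58, 150          # 38.7% correct in Hebrew
    english_total = 9091
    english_correct = round(0.607 * english_total)  # approx. 5,518 correct in English

    table = [
        [hebrew_correct, hebrew_total - hebrew_correct],
        [english_correct, english_total - english_correct],
    ]
    chi2, p, _, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, p = {p:.2e}")        # p is far below 0.001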

Conclusions

ChatGPT answered fewer than 40% of Hebrew OBGYN residency examination questions correctly. Residents cannot rely on ChatGPT to prepare for this examination. Efforts should be made to improve ChatGPT's performance in languages other than English.

Data availability

The data that support the findings of this study are available from the authors upon reasonable request.

Funding

No funding or support was obtained for this study.

Author information


Contributions

GL, AC and RA: project development, data collection and management, data analysis, manuscript writing/editing. NL, RM and YB: data collection and management, manuscript writing/editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Adiel Cohen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

We followed the American Association for Public Opinion Research (AAPOR) reporting guidelines. This study did not require ethics approval, as only publicly accessible data were used and no human participants were involved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cohen, A., Alter, R., Lessans, N. et al. Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations. Arch Gynecol Obstet 308, 1797–1802 (2023). https://doi.org/10.1007/s00404-023-07185-4

