Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal

  • Rapid communication
  • Published in Clinical and Experimental Nephrology

Abstract

Background

Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized fields such as nephrology is unclear. This study aimed to evaluate the potential of ChatGPT and Bard for applications in nephrology.

Methods

Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates overall (all five years), for each year, and for each question category, and checked whether they exceeded the passing criterion. The correct answer rates were also compared with those of nephrology residents.
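
The passing criterion itself is not given in this abstract. As a minimal sketch of the scoring step described above, the following Python snippet computes correct answer rates overall, per year, and per question category from a hypothetical table of graded responses; the file name, column names, and the 60% threshold are illustrative assumptions, not values reported in the paper.

  # Illustrative sketch only: file name, column names, and the passing
  # threshold are hypothetical, not taken from the paper.
  import pandas as pd

  PASS_THRESHOLD = 0.60  # assumed passing criterion, for illustration

  # Expected columns: model, year, category, correct (1 if the model's answer
  # matched the official key, 0 otherwise), one row per question and model.
  df = pd.read_csv("graded_responses.csv")

  for model, grp in df.groupby("model"):
      overall = grp["correct"].mean()
      print(f"{model}: overall {overall:.1%}, pass={overall >= PASS_THRESHOLD}")
      print(grp.groupby("year")["correct"].mean().round(3))      # per-year rates
      print(grp.groupby("category")["correct"].mean().round(3))  # per-category rates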

Results

The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 thus significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 reached the passing criterion in three of the five years, barely meeting the minimum threshold in two of them. GPT-4 also performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents.
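
The abstract does not state which statistical test produced the p values above. As one plausible approach for comparing two models graded on the same 99 questions, the sketch below applies McNemar's test to paired correct/incorrect outcomes; both the choice of test and the dummy data are assumptions for illustration only, not the authors' reported method or results.

  # Hypothetical illustration: McNemar's test on paired per-question outcomes.
  # The correctness vectors below are dummy data, not the study's results.
  import numpy as np
  from statsmodels.stats.contingency_tables import mcnemar

  def compare_models(a, b):
      """p value of McNemar's test for two 0/1 correctness vectors on the same questions."""
      a, b = np.asarray(a), np.asarray(b)
      table = [[np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
               [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))]]
      return mcnemar(table, exact=True).pvalue

  rng = np.random.default_rng(0)                  # dummy grades for 99 questions
  gpt4_correct = rng.integers(0, 2, size=99)
  gpt35_correct = rng.integers(0, 2, size=99)
  print(compare_models(gpt4_correct, gpt35_correct))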

Conclusions

GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in some years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance in preparation for future applications.


Data availability

Because of the proprietary nature of the data used in this study (the Self-Assessment Questions for Nephrology Board Renewal), the authors cannot post the raw data used for the analysis. However, the authors can share part of the collected data (e.g., the large language model responses) on request with other researchers who have access to this examination.


Acknowledgements

We would like to thank Editage for English-language editing and review of this manuscript.

Author information

Contributions

RN, DI, and YS participated in writing the paper. YI, FK, and JK answered the examination questions. RN, YI, FK, JK, DI, and YS approved the final manuscript.

Corresponding author

Correspondence to Ryunosuke Noda.

Ethics declarations

Conflict of interest

All authors declare no conflict of interest. No funding was received for this study.

Ethics approval

We consulted the representative of the Ethics Committee at St. Marianna University School of Medicine, our affiliated institution. After careful review, the committee determined that the study did not involve patients, was based on the voluntary participation of our medical colleagues, and therefore did not require Institutional Review Board approval.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 566 KB)

About this article


Cite this article

Noda, R., Izaki, Y., Kitano, F. et al. Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal. Clin Exp Nephrol 28, 465–469 (2024). https://doi.org/10.1007/s10157-023-02451-w
