Abstract
Background
Large language models (LLMs) have driven recent advances in artificial intelligence. Although LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology remains unclear. This study aimed to evaluate the potential of ChatGPT and Bard for applications in nephrology.
Methods
Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated correct answer rates overall across the five years, for each year, and by question category, and checked whether they exceeded the passing criterion. These rates were then compared with those of nephrology residents.
Results
The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; thus, GPT-4 significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing criterion in three of the five years, only barely exceeding the minimum threshold in two of them. GPT-4 performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents.
Conclusions
GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance characteristics to guide future applications.
Data availability
Because of the proprietary nature of the data used in this study (the Self-Assessment Questions for Nephrology Board Renewal), the authors cannot post the raw data used for the analysis. However, the authors can share part of the collected data (e.g., the large language model responses) upon request with other researchers who have access to this examination.
References
Zhao WX, Zhou K, Li J et al. A survey of large language models. ArXiv e-prints, 2023 (arXiv:2303.18223).
Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312.
Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198.
Sallam M. The utility of ChatGPT as an example of large language models in healthcare education, research and practice: systematic review on the future perspectives and potential limitations. MedRxiv e-prints, 2023 (medRxiv: 2023.02.19.23286155v1).
Introducing ChatGPT: OpenAI. https://openai.com/blog/chatgpt/. Published November 30, 2022. Accessed 25 May 2023.
Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems. ArXiv e-prints, 2023 (arXiv: 2303.13375).
Lum ZC. Can artificial intelligence pass the American Board of Orthopaedic Surgery examination? Orthopaedic residents versus ChatGPT. Clin Orthop Relat Res. 2023. Ahead of Print. https://doi.org/10.1097/CORR.0000000000002704.
Suchman K, Garg S, Trindade AJ. ChatGPT fails the multiple-choice American College of Gastroenterology self-assessment test. Am J Gastroenterol. 2023. Ahead of Print. https://doi.org/10.14309/ajg.0000000000002320.
Humar P, Asaad M, Bengur FB, Nguyen V. ChatGPT is equivalent to first-year plastic surgery residents: evaluation of ChatGPT on the plastic surgery in-service examination. Aesthet Surg J. 2023. Ahead of Print. https://doi.org/10.1093/asj/sjad130.
Skalidis I, Cagnina A, Luangphiphat W, et al. ChatGPT takes on the European Exam in Core Cardiology: an artificial intelligence success story? Eur Heart J Digit Health. 2023;4:279–81.
Passby L, Jenko N, Wernham A. Performance of ChatGPT on dermatology Specialty Certificate Examination multiple choice questions. Clin Exp Dermatol. 2023. Ahead of Print. https://doi.org/10.1093/ced/llad197.
Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology. 2023. Ahead of Print. https://doi.org/10.1148/radiol.230582.
Bard: Google. https://bard.google.com. Accessed 28 May 2023.
Overview of the JSN: Japanese Society of Nephrology. https://jsn.or.jp/en/about-jsn/overview-of-the-jsn/. Accessed 26 May 2023.
Self-assessment questions for nephrology board renewal: Japanese Society of Nephrology (in Japanese). https://jsn.or.jp/medic/specialistsystem/question-unitupdate.php. Accessed 26 May 2023.
Uemura K. Exam preparation and taxonomy. Med Edu (in Japanese) 1987;13:315–20.
List of nephrologist experienced cases: Japanese Society of Nephrology (in Japanese). https://jsn.or.jp/education-specialist-committee/file-02_20210829.pdf. Accessed 26 May 2023.
Kasai J, Kasai Y, Sakaguchi K, Yamada Y, Radev D. Evaluating GPT-4 and ChatGPT on Japanese medical licensing examinations. ArXiv e-prints, 2023 (arXiv: 2303.18027).
Teebagy S, Colwell L, Wood E, Yaghy A, Faustina M. Improved performance of ChatGPT-4 on the OKAP exam: a comparative study with ChatGPT-3.5. MedRxiv e-prints, 2023 (medRxiv: 2023.04.03.23287957v1).
Ali R, Tang OY, Connolly ID et al. Performance of ChatGPT, GPT-4, and Google Bard on a neurosurgery oral boards preparation question bank. MedRxiv e-prints, 2023 (medRxiv: 2023.04.06.23288265v1).
Ali R, Tang OY, Connolly ID et al. Performance of ChatGPT and GPT-4 on neurosurgery written board examinations. MedRxiv e-prints, 2023 (medRxiv: 2023.03.25.23287743v1).
Kaplan J, McCandlish S, Henighan T, Brown TB, Chess B, Child R, et al. Scaling laws for neural language models. ArXiv e-prints, 2020 (arXiv: 2001.08361).
Acknowledgements
We would like to thank Editage for English language editing of this manuscript.
Author information
Contributions
RN, DI, and YS wrote the paper. YI, FK, and JK answered the examination questions. RN, YI, FK, JK, DI, and YS approved the final manuscript.
Ethics declarations
Conflict of interest
All authors declare no conflict of interest. No funding was received for this study.
Ethics approval
We consulted with the Representative of the Ethics Committee Members at St. Marianna University School of Medicine, our affiliated institution. After careful review, the committee determined that, because the study did not involve patients and was based on the voluntary participation of our medical colleagues, Institutional Review Board approval was not required.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
About this article
Cite this article
Noda, R., Izaki, Y., Kitano, F. et al. Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal. Clin Exp Nephrol 28, 465–469 (2024). https://doi.org/10.1007/s10157-023-02451-w