Abstract
Purpose
According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study aimed to investigate whether Google's chatbot Bard® would outperform ChatGPT® on the EBHS diploma examination.
Methods
Both chatbots were asked to answer 18 EBHS multiple-choice questions (MCQs) published in the Journal of Hand Surgery (European Volume) in five trials (A1 to A5). After A3, the chatbots were given the correct answers, and after A4, incorrect answers. Their ability to modify their responses was then measured and compared.
Results
Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate from A1 to A5 (kappa = 0.62; 95% CI [0.51, 0.73]) as well as from A1 to A3 (kappa = 0.60; 95% CI [0.47, 0.74]). The formulation of Bard® responses was homogeneous, but its learning capacity is still developing.
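The scores out of 18 and the percentages above appear to measure different things. A minimal sketch of one way to reconcile them follows, assuming (this is not stated in the abstract) that each MCQ contains five true/false items, so that a score counts fully correct questions while a percentage counts correct items out of 90. The second function is a plain implementation of Fleiss' kappa on made-up toy data; the abstract does not say which kappa statistic was used, so this only illustrates how agreement across repeated trials can be quantified.

```python
from collections import Counter

# Assumption (not stated in the abstract): 18 MCQs x 5 true/false items each.
N_QUESTIONS = 18
ITEMS_PER_QUESTION = 5  # hypothetical value
N_ITEMS = N_QUESTIONS * ITEMS_PER_QUESTION

# Percentages of correct answers as reported in the Results.
reported_pct = {"A1": 61.1, "A2": 62.2, "A3": 64.4, "A4": 65.6, "A5": 63.3}


def implied_correct_items(pct, n_items=N_ITEMS):
    """Nearest whole number of correct items matching a reported percentage."""
    return round(pct / 100 * n_items)


implied = {trial: implied_correct_items(p) for trial, p in reported_pct.items()}
# Pooled percentage over all five trials, under the same assumption.
overall_pct = round(100 * sum(implied.values()) / (N_ITEMS * len(implied)), 1)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item (one label per trial).

    Undefined (division by zero) when every label in the data is identical.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Observed agreement for this item.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar /= n_subjects
    # Agreement expected by chance from the overall label proportions.
    p_e = sum((c / (n_subjects * n_raters)) ** 2 for c in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Toy data only (not from the study): 4 items answered in 3 trials.
toy = [["T", "T", "T"], ["F", "F", "F"], ["T", "T", "F"], ["F", "T", "F"]]
toy_kappa = fleiss_kappa(toy)

print(implied, overall_pct, round(toy_kappa, 3))
```

Under the five-items-per-question assumption, the implied item counts round cleanly and the pooled percentage matches the reported 63.3%, which suggests (but does not prove) that the percentages were computed per item rather than per question.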
Conclusions
The main hypothesis of our study was not confirmed, since Bard® did not score significantly higher than ChatGPT® when answering the MCQs of the EBHS diploma exam. In conclusion, neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.
Funding
NA.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Philippe Liverneaux. Statistics were performed by Thibaut Goetsch. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethical approval
NA.
Consent to participate
NA.
Consent to publish
NA.
Competing interests
None with this manuscript.
Additional information
Level of evidence: III
About this article
Cite this article
Thibaut, G., Dabbagh, A. & Liverneaux, P. Does Google’s Bard Chatbot perform better than ChatGPT on the European hand surgery exam? International Orthopaedics (SICOT) 48, 151–158 (2024). https://doi.org/10.1007/s00264-023-06034-y
DOI: https://doi.org/10.1007/s00264-023-06034-y