Does Google’s Bard Chatbot perform better than ChatGPT on the European hand surgery exam?

Goetsch, Thibaut; Dabbagh, Armaghan; Liverneaux, Philippe

doi:10.1007/s00264-023-06034-y

Original Paper
Published: 15 November 2023

International Orthopaedics, Volume 48, pages 151–158 (2024)

Thibaut Goetsch¹,
Armaghan Dabbagh² &
Philippe Liverneaux³,⁴ (ORCID: orcid.org/0000-0002-5509-8995)

Abstract

Purpose

According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study aimed to investigate whether Google's chatbot Bard® would perform better than ChatGPT on the EBHS diploma examination.

Methods

Chatbots were asked to answer 18 EBHS multiple choice questions (MCQs) published in the Journal of Hand Surgery (European Volume) in five trials (A1 to A5). After A3, chatbots received correct answers, and after A4, incorrect answers. Consequently, their ability to modify their response was measured and compared.

Results

Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate from A1 to A5 (kappa = 0.62, 95% CI [0.51; 0.73]) as well as from A1 to A3 (kappa = 0.60, 95% CI [0.47; 0.74]). The formulation of Bard® responses was homogeneous, but its learning capacity is still developing.
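As an illustration of how agreement across repeated trials can be quantified, the following minimal Python sketch computes Fleiss' kappa for several answers given to the same items. The abstract does not state which kappa variant the authors used, so the choice of Fleiss' kappa is an assumption; the function name `fleiss_kappa` and the sample answers are hypothetical and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each holding one label per trial.

    Assumption: every item was answered in every trial, so all rows
    have the same length (at least 2).
    """
    if not ratings:
        raise ValueError("ratings must contain at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Count how often each label was given to each item.
    counts = [Counter(row) for row in ratings]
    categories = set().union(*counts)

    # Observed agreement: mean proportion of agreeing trial pairs per item.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / len(ratings)

    # Chance agreement from the overall share of each label.
    total = len(ratings) * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_e == 1.0:
        # Only one label was ever used, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical answers to four items across five trials (not study data).
example = [
    ["T", "T", "T", "T", "T"],
    ["F", "F", "F", "F", "F"],
    ["T", "T", "F", "T", "T"],
    ["F", "T", "F", "F", "T"],
]
print(round(fleiss_kappa(example), 2))
```

A value of 1 indicates identical answers in every trial, while a value near 0 indicates agreement no better than chance. The confidence intervals reported in the abstract would require an additional step, such as a bootstrap over items, which this sketch does not cover.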

Conclusions

The main hypothesis of our study was not confirmed, since Bard did not score significantly higher than ChatGPT when answering the MCQs of the EBHS diploma exam. In conclusion, neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.

Funding

NA.

Author information

Authors and Affiliations

Department of Public Health, Strasbourg University Hospital, FMTS, GMRC, 1 avenue de l’hôpital, 67000, Strasbourg cedex, France
Thibaut Goetsch
Faculty of Medicine, University of Toronto, Toronto, ON, Canada
Armaghan Dabbagh
ICube CNRS UMR7357, Strasbourg University, 2-4 rue Boussingault, 67000, Strasbourg, France
Philippe Liverneaux
Department of Hand Surgery, Strasbourg University Hospitals, FMTS, 1 avenue Molière, 67200, Strasbourg, France
Philippe Liverneaux

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Philippe Liverneaux. Statistics were performed by Thibaut Goetsch. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Philippe Liverneaux.

Ethics declarations

Ethical approval

NA.

Consent to participate

NA.

Consent to publish

NA

Competing interests

None with this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Level of evidence: III

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 22 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Thibaut, G., Dabbagh, A. & Liverneaux, P. Does Google’s Bard Chatbot perform better than ChatGPT on the European hand surgery exam?. International Orthopaedics (SICOT) 48, 151–158 (2024). https://doi.org/10.1007/s00264-023-06034-y

Received: 07 October 2023
Accepted: 01 November 2023
Published: 15 November 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00264-023-06034-y
