Does Google’s Bard Chatbot perform better than ChatGPT on the European hand surgery exam?

  • Original Paper
  • Published:
International Orthopaedics

Abstract

Purpose

According to previous research, the chatbot ChatGPT® V3.5 was unable to pass the first part of the European Board of Hand Surgery (EBHS) diploma examination. This study investigated whether Google's chatbot Bard® performs better than ChatGPT® on the EBHS diploma examination.

Methods

The chatbots were asked to answer 18 EBHS multiple-choice questions (MCQs), published in the Journal of Hand Surgery (European Volume), in five trials (A1 to A5). After A3 the chatbots were given the correct answers, and after A4 deliberately incorrect ones; their ability to revise their responses was then measured and compared.
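The scoring scheme described above (18 MCQs, five repeated trials, one point per answer matching the key) can be sketched as follows. This is a hypothetical illustration only: the `ask_chatbot` stub, the dummy questions, and the answer key are placeholders, not the study's actual items (the EBHS questions are exam material) or code.

```python
import random

random.seed(0)

# Hypothetical stand-in for querying a chatbot: returns one MCQ option.
# In the study, answers were collected from the chatbots' web interfaces.
def ask_chatbot(question):
    return random.choice("ABCD")

# 18 placeholder questions with a placeholder answer key.
questions = [f"Q{i}" for i in range(1, 19)]
key = {q: random.choice("ABCD") for q in questions}

# Five trials A1..A5. In the study, correct answers were shown to the
# chatbot after A3 and incorrect ones after A4; those feedback steps are
# omitted in this sketch.
scores = {}
for trial in ["A1", "A2", "A3", "A4", "A5"]:
    answers = {q: ask_chatbot(q) for q in questions}
    scores[trial] = sum(answers[q] == key[q] for q in questions)

print(scores)  # per-trial score out of 18
```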

Results

Bard® scored 3/18 (A1), 1/18 (A2), 4/18 (A3) and 2/18 (A4 and A5). The average percentage of correct answers was 61.1% for A1, 62.2% for A2, 64.4% for A3, 65.6% for A4, 63.3% for A5 and 63.3% for all trials combined. Agreement was moderate both from A1 to A5 (kappa = 0.62, 95% CI [0.51; 0.73]) and from A1 to A3 (kappa = 0.60, 95% CI [0.47; 0.74]). The formulation of Bard®'s responses was homogeneous, but its capacity to learn from feedback was still developing.
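Cohen's kappa, used above to quantify trial-to-trial agreement, is κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement between two sets of answers and p_e the agreement expected by chance from each set's marginal answer frequencies. A minimal sketch with made-up answer strings (not the study's data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two paired sequences of categorical answers."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items where both answers match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each sequence's marginal answer frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Illustrative data: answers A-D to 18 MCQs in two trials, differing on
# the last two items.
trial1 = list("ABCDABCDABCDABCDAB")
trial2 = list("ABCDABCDABCDABCDBA")
print(round(cohens_kappa(trial1, trial2), 3))  # → 0.851
```

Values between 0.41 and 0.60 are conventionally read as moderate agreement and 0.61 to 0.80 as substantial, which is how the kappas reported above are interpreted.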

Conclusions

The main hypothesis of our study was not supported, since Bard® did not score significantly higher than ChatGPT® when answering the MCQs of the EBHS diploma exam. In conclusion, neither ChatGPT® nor Bard®, in their current versions, can pass the first part of the EBHS diploma exam.

Fig. 1
Fig. 2



Funding

NA.

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Philippe Liverneaux. Statistics were performed by Thibaut Goetsch. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Philippe Liverneaux.

Ethics declarations

Ethical approval

NA.

Consent to participate

NA.

Consent to publish

NA.

Competing interests

None related to this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Level of evidence: III

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 22 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Thibaut, G., Dabbagh, A. & Liverneaux, P. Does Google’s Bard Chatbot perform better than ChatGPT on the European hand surgery exam? International Orthopaedics (SICOT) 48, 151–158 (2024). https://doi.org/10.1007/s00264-023-06034-y

