Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions—an observational study

Vaishya, Raju; Iyengar, Karthikeyan P.; Patralekh, Mohit Kumar; Botchu, Rajesh; Shirodkar, Kapil; Jain, Vijay Kumar; Vaish, Abhishek; Scarlat, Marius M.

doi:10.1007/s00264-024-06182-9

Original Paper
Published: 15 April 2024

International Orthopaedics

Raju Vaishya ORCID: orcid.org/0000-0002-9577-9533¹,
Karthikeyan P. Iyengar²,
Mohit Kumar Patralekh³,
Rajesh Botchu⁴,
Kapil Shirodkar⁴,
Vijay Kumar Jain⁵,
Abhishek Vaish¹ &
Marius M. Scarlat⁶

Abstract

Purpose

This study analyses the performance and proficiency of three Artificial Intelligence (AI) generative chatbots (ChatGPT-3.5, ChatGPT-4.0 and Bard Google AI®) in answering the Multiple Choice Questions (MCQs) of postgraduate (PG) level orthopaedic qualifying examinations.

Methods

A series of 120 mock 'Single Best Answer' (SBA) MCQs, each with four possible options (A, B, C and D), on various musculoskeletal (MSK) conditions covering the Trauma and Orthopaedic curricula was compiled. A standardised text prompt was used to present the questions to ChatGPT (both 3.5 and 4.0 versions) and Google Bard, and the responses were then statistically analysed.

Results

Significant differences were found between the responses of ChatGPT 3.5 and ChatGPT 4.0 (Chi square = 27.2, P < 0.001), and on comparing both ChatGPT 3.5 (Chi square = 63.852, P < 0.001) and ChatGPT 4.0 (Chi square = 44.246, P < 0.001) with Bard Google AI®. Bard Google AI® had 100% efficiency and was significantly more efficient than both ChatGPT 3.5 and ChatGPT 4.0 (P < 0.0001).
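
The pairwise comparisons reported here are chi-square tests. The abstract does not say how the contingency tables were built or which software was used, so the following is only a minimal sketch of one plausible setup: a Pearson chi-square test (no continuity correction) on a 2x2 table of correct versus incorrect answers for two chatbots. The function name and the counts for "Chatbot A" are hypothetical placeholders, not the study's data. The only figures taken from the abstract are the 120 questions and the 100% score for Bard.

```python
import math


def chi_square_2x2(a_correct, a_wrong, b_correct, b_wrong):
    """Pearson chi-square test of independence for a 2x2 table.

    Rows are two chatbots, columns are correct / incorrect answers.
    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    """
    observed = [[a_correct, a_wrong], [b_correct, b_wrong]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if chatbot and correctness were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


N_QUESTIONS = 120  # from the abstract

# HYPOTHETICAL score for "Chatbot A"; the abstract does not report it.
chatbot_a_correct = 80
# Bard answered all questions correctly according to the abstract.
bard_correct = N_QUESTIONS

chi2, p = chi_square_2x2(
    chatbot_a_correct, N_QUESTIONS - chatbot_a_correct,
    bard_correct, N_QUESTIONS - bard_correct,
)
print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```

With placeholder counts the statistic will not match the values in the abstract. Reproducing those would need the per-chatbot scores, which are only in the full text, and the authors may have used a different table layout or test variant.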

Conclusion

The results demonstrate the variable potential of the different AI generative chatbots (ChatGPT 3.5, ChatGPT 4.0 and Bard Google) to answer the MCQs of PG-level orthopaedic qualifying examinations. Bard Google AI® performed better than both ChatGPT versions, underlining the potential of such large language models to process and apply orthopaedic subspecialty knowledge at a PG level.

Data Availability

The data are available from the corresponding author on request.

Acknowledgements

We are grateful to Ms. Anupama Rawat of Indraprastha Apollo Hospitals, New Delhi, for her help in compiling the result data of this study.

Author information

Authors and Affiliations

Department of Orthopaedics, Indraprastha Apollo Hospitals, Sarita Vihar, New Delhi, 110076, India
Raju Vaishya & Abhishek Vaish
Department of Orthopaedics, Southport and Ormskirk Hospital, Mersey West Lancashire Teaching NHS Trust, Southport, UK
Karthikeyan P. Iyengar
Department of Orthopaedics, Safdarjung Hospital, New Delhi, India
Mohit Kumar Patralekh
Department of Musculoskeletal Radiology, Royal Orthopaedic Hospital, Birmingham, UK
Rajesh Botchu & Kapil Shirodkar
Department of Orthopaedics, RML Hospital, New Delhi, India
Vijay Kumar Jain
Clinique Chirurgicale St Michel, Groupe ELSAN Toulon, France
Marius M. Scarlat

Contributions

RV, KPI: Conceptualization, data collection and analysis, literature search, manuscript writing, editing and final approval.

KPI, MP, RB, KS, VJ, AV: Literature search, data collection and analysis, manuscript writing, references, editing, supervision, and final approval.

MMS: Conceptualization, manuscript editing and final approval.

Corresponding author

Correspondence to Raju Vaishya.

Ethics declarations

Ethics approval

None, since it is not a clinical article.

Informed consent

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 43 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vaishya, R., Iyengar, K.P., Patralekh, M.K. et al. Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions—an observational study. International Orthopaedics (SICOT) (2024). https://doi.org/10.1007/s00264-024-06182-9

Received: 25 March 2024
Accepted: 03 April 2024
Published: 15 April 2024
DOI: https://doi.org/10.1007/s00264-024-06182-9