A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases

Makhoul, Mikhael; Melkane, Antoine E.; Khoury, Patrick El; Hadi, Christopher El; Matar, Nayla

doi:10.1007/s00405-024-08509-z

A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases

Miscellaneous
Published: 16 February 2024

Volume 281, pages 2717–2721, (2024)
Cite this article

European Archives of Oto-Rhino-Laryngology Aims and scope Submit manuscript

Mikhael Makhoul ORCID: orcid.org/0000-0001-5518-5549¹,
Antoine E. Melkane¹,
Patrick El Khoury¹,
Christopher El Hadi¹ &
…
Nayla Matar¹

134 Accesses
Explore all metrics

Abstract

Purpose

With recent advances in artificial intelligence (AI), it has become crucial to thoroughly evaluate its applicability in healthcare. This study aimed to assess the accuracy of ChatGPT in diagnosing ear, nose, and throat (ENT) pathology, and comparing its performance to that of medical experts.

Methods

We conducted a cross-sectional comparative study where 32 ENT cases were presented to ChatGPT 3.5, ENT physicians, ENT residents, family medicine (FM) specialists, second-year medical students (Med2), and third-year medical students (Med3). Each participant provided three differential diagnoses. The study analyzed diagnostic accuracy rates and inter-rater agreement within and between participant groups and ChatGPT.

Results

The accuracy rate of ChatGPT was 70.8%, being not significantly different from ENT physicians or ENT residents. However, a significant difference in correctness rate existed between ChatGPT and FM specialists (49.8%, p < 0.001), and between ChatGPT and medical students (Med2 47.5%, p < 0.001; Med3 47%, p < 0.001). Inter-rater agreement for the differential diagnosis between ChatGPT and each participant group was either poor or fair. In 68.75% of cases, ChatGPT failed to mention the most critical diagnosis.

Conclusions

ChatGPT demonstrated accuracy comparable to that of ENT physicians and ENT residents in diagnosing ENT pathology, outperforming FM specialists, Med2 and Med3. However, it showed limitations in identifying the most critical diagnosis.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Is artificial intelligence ready to replace specialist doctors entirely? ENT specialists vs ChatGPT: 1-0, ball at the center

Article 14 November 2023

Application of ChatGPT as a support tool in the diagnosis and management of acute bacterial tonsillitis

Article 11 April 2024

Assessing ChatGPT 4.0’s test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports

Article Open access 23 April 2024

Availability of data and materials

The data that support the findings of this study are available on request from the corresponding author.

Code availability

Not applicable.

Abbreviations

AI:: Artificial intelligence
ChatGPT:: Chat-based generative pre-trained transformer
ENT:: Ear, nose, and throat
FM:: Family medicine
Med2:: Second-year medical students
Med3:: Third-year medical students

References

Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK et al (2023) Assessing the utility of ChatGPT throughout the entire clinical workflow. Health Inform. https://doi.org/10.1101/2023.02.21.23285886
Article Google Scholar
Liu J, Wang C, Liu S (2023) Utility of ChatGPT in clinical practice. J Med Internet Res 28(25):e48568
Article Google Scholar
Duarte F (2023) Number of ChatGPT Users
Barat M, Soyer P, Dohan A (2023) Appropriateness of recommendations provided by ChatGPT to interventional radiologists. Can Assoc Radiol J 13:084653712311701
Google Scholar
Strong E, DiGiammarino A, Weng Y, Basaviah P, Hosamani P, Kumar A et al (2023) Performance of ChatGPT on free-response, clinical reasoning exams. Med Educ. https://doi.org/10.1101/2023.03.24.23287731
Article Google Scholar
AlGhamdi KM, Moussa NA (2012) Internet use by the public to search for health-related information. Int J Med Inf 81(6):363–373
Article Google Scholar
Tonsaker T, Bartlett G, Trpkov C (2014) Health information on the Internet: gold mine or minefield? Can Fam Physician Med Fam Can 60(5):407–408
Google Scholar
Emerick KS, Deschler DG (2006) Common ENT disorders. South Med J 99(10):1090–1099
Article PubMed Google Scholar
Lechien JR, Georgescu BM, Hans S, Chiesa-Estomba CM (2023) ChatGPT performance in laryngology and head and neck surgery: a clinical case-series. Eur Arch Otorhinolaryngol. https://doi.org/10.1007/s00405-023-08282-5
Article PubMed PubMed Central Google Scholar
Hoch CC, Wollenberg B, Lüers JC, Knoedler S, Knoedler L, Frank K et al (2023) ChatGPT’s quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions. Eur Arch Otorhinolaryngol 280(9):4271–4278
Article PubMed PubMed Central Google Scholar
Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T (2023) Diagnostic Accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study. Int J Environ Res Public Health 20(4):3378
Article PubMed PubMed Central Google Scholar
Qu RW, Qureshi U, Petersen G, Lee SC (2023) Diagnostic and management applications of ChatGPT in structured otolaryngology clinical scenarios. OTO Open 7(3):e67
Article PubMed PubMed Central Google Scholar
Ravipati A, Pradeep T, Elman SA (2023) The role of artificial intelligence in dermatology: the promising but limited accuracy of ChatGPT in diagnosing clinical scenarios. Int J Dermatol. https://doi.org/10.1111/ijd.16746
Article PubMed Google Scholar
Cao JJ, Kwon DH, Ghaziani TT, Kwo P, Tse G, Kesselman A et al (2023) Accuracy of information provided by ChatGPT regarding liver cancer surveillance and diagnosis. Am J Roentgenol 221(4):556–559
Article Google Scholar
Massey PA, Montgomery C, Zhang AS (2023) Comparison of ChatGPT–35, ChatGPT-4, and orthopaedic resident performance on orthopaedic assessment examinations. J Am Acad Orthop Surg 31(23):1173–1179
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

No acknowledgements

Funding

This manuscript is not supported by any grant.

Author information

Authors and Affiliations

Department of Otolaryngology–Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
Mikhael Makhoul, Antoine E. Melkane, Patrick El Khoury, Christopher El Hadi & Nayla Matar

Authors

Mikhael Makhoul
View author publications
You can also search for this author in PubMed Google Scholar
Antoine E. Melkane
View author publications
You can also search for this author in PubMed Google Scholar
Patrick El Khoury
View author publications
You can also search for this author in PubMed Google Scholar
Christopher El Hadi
View author publications
You can also search for this author in PubMed Google Scholar
Nayla Matar
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MM, AM, PEK, CEH, and NM. The first draft of the manuscript was written by MM and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. MM and AM revised the manuscript after reviewer comments.

Corresponding author

Correspondence to Mikhael Makhoul.

Ethics declarations

Conflict of interest

All authors have no conflicts of interest to disclose.

Ethics approval

This study received approval from the Hotel Dieu de France Ethical Committee (CEHDF 2219).

Informed consent

Written informed consent was obtained from each participant.

Consent to participate

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 59 KB)

Supplementary file2 (DOCX 58 KB)

Supplementary file3 (DOCX 18 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Makhoul, M., Melkane, A.E., Khoury, P.E. et al. A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases. Eur Arch Otorhinolaryngol 281, 2717–2721 (2024). https://doi.org/10.1007/s00405-024-08509-z

Download citation

Received: 01 January 2024
Accepted: 24 January 2024
Published: 16 February 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s00405-024-08509-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases