Accuracy of ChatGPT-3.5 and -4 in providing scientific references in otolaryngology–head and neck surgery

Lechien, Jerome R.; Briganti, Giovanni; Vaira, Luigi A.

doi:10.1007/s00405-023-08441-8

Miscellaneous
Published: 11 January 2024

Volume 281, pages 2159–2165, (2024)

European Archives of Oto-Rhino-Laryngology

Jerome R. Lechien^1,2,3,4,8 (ORCID: 0000-0002-0845-0845),
Giovanni Briganti^5 &
Luigi A. Vaira^6,7

Abstract

Introduction

ChatGPT (Chat Generative Pre-trained Transformer) is a new artificial intelligence-powered chatbot, built on a large language model, that can help otolaryngologists in practice and research. We investigated the accuracy of ChatGPT-3.5 and ChatGPT-4 in providing references to manuscripts published in otolaryngology.

Methods

ChatGPT-3.5 and ChatGPT-4 were asked to provide the references of the 30 most cited papers in otolaryngology of the past 40 years, including clinical guidelines and key studies that changed practice. The responses were regenerated three times to assess the accuracy and stability of ChatGPT. The two versions were then compared for reference accuracy and potential mistakes.

Results

The accuracy of ChatGPT-3.5 ranged from 47% to 60%, and that of ChatGPT-4 from 73% to 87% (p < 0.005). Across the regenerated questions, ChatGPT-3.5 provided 19 inaccurate references and invented 2 references, whereas ChatGPT-4 provided 13 inaccurate references and invented only one. The stability of responses across regenerated answers was mild for ChatGPT-3.5 (k = 0.238) and moderate for ChatGPT-4 (k = 0.408).
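
To make the two metrics in the Results concrete, the following is a minimal sketch of how per-run accuracy and a kappa-type stability score could be computed from three regenerated answer sets. It rests on several assumptions: the abstract does not say which kappa statistic was used, so Fleiss' kappa is assumed here; the labels ("accurate", "inaccurate", "invented"), the function names, and the toy data are hypothetical and are not the study's data.

```python
from collections import Counter


def run_accuracy(labels):
    """Share of references labelled 'accurate' in one regeneration run."""
    return sum(1 for label in labels if label == "accurate") / len(labels)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated the same number of times.

    ratings[i] holds the labels given to item i across the regenerations
    (assumption: each regeneration is treated as one 'rater').
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [[Counter(item)[cat] for cat in categories] for item in ratings]

    # Observed agreement: mean of the per-item pairwise agreement.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category occurs")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data: 4 requested papers, 3 regenerations each.
toy = [
    ["accurate", "accurate", "accurate"],
    ["accurate", "accurate", "inaccurate"],
    ["inaccurate", "inaccurate", "inaccurate"],
    ["accurate", "inaccurate", "inaccurate"],
]
first_run = [item[0] for item in toy]
print("accuracy of run 1:", run_accuracy(first_run))
print("Fleiss' kappa:", fleiss_kappa(toy))
```

Treating each regeneration as a separate rater is one way to read "stability throughout regenerated answers"; a pairwise Cohen's kappa between runs would be another, and the two do not generally give the same value.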

Conclusions

ChatGPT-4 achieved higher accuracy than the free-access version (3.5). False references were detected in both versions. Practitioners should be careful when using ChatGPT to retrieve key references while writing a report.

Data availability

Data are available on request.

Funding

None.

Author information

Authors and Affiliations

Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
Jerome R. Lechien
Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, Phonetics and Phonology Laboratory (UMR 7018, Foch Hospital, CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France
Jerome R. Lechien
Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, CHU de Bruxelles, CHU Saint-Pierre, Université Libre de Bruxelles, Brussels, Belgium
Jerome R. Lechien
Polyclinique Elsan de Poitiers, Poitiers, France
Jerome R. Lechien
Chair of AI and Digital Medicine, Faculty of Medicine, University of Mons, Mons, Belgium
Giovanni Briganti
Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
Luigi A. Vaira
Biomedical Sciences Department, PhD School of Biomedical Science, University of Sassari, Sassari, Italy
Luigi A. Vaira
Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, Avenue du Champ de Mars, 6, 7000, Mons, Belgium
Jerome R. Lechien

Contributions

JRL: design, acquisition of data, drafting, final approval of the version to be published, and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
GB: data analysis and interpretation, proofreading of the paper, final approval of the version to be published, and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
LAV: design, acquisition of data, drafting, final approval of the version to be published, and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Jerome R. Lechien.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Ethics committee

Approval from the institutional review board of CHU Saint-Pierre was not required for this study (ref. CHUST23).

Ethical declarations

The author Jerome R. Lechien is also guest editor of the special issue on ‘ChatGPT and Artificial Intelligence in Otolaryngology–Head and Neck Surgery’. He was not involved with the peer review process of this article.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 199 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lechien, J.R., Briganti, G. & Vaira, L.A. Accuracy of ChatGPT-3.5 and -4 in providing scientific references in otolaryngology–head and neck surgery. Eur Arch Otorhinolaryngol 281, 2159–2165 (2024). https://doi.org/10.1007/s00405-023-08441-8

Received: 07 October 2023
Accepted: 26 December 2023
Published: 11 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00405-023-08441-8
