Abstract
Introduction
Chat Generative Pre-trained Transformer (ChatGPT) is a new artificial intelligence-powered chatbot language model that may help otolaryngologists in practice and research. We investigated the accuracy of ChatGPT-3.5 and -4 in referencing manuscripts published in otolaryngology.
Methods
ChatGPT-3.5 and ChatGPT-4 were asked to provide references for the top-30 most-cited papers in otolaryngology over the past 40 years, including clinical guidelines and key studies that changed practice. Each response was regenerated three times to assess the accuracy and stability of ChatGPT. ChatGPT-3.5 and ChatGPT-4 were compared for reference accuracy and potential errors.
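The comparison step can be illustrated with a small sketch. The abstract does not state the scoring rules, so the criteria below are assumptions: a reference is labelled "accurate" when author, year, title, journal, volume, and pages all match the requested paper; "inaccurate" when its title matches a real paper but one or more fields differ; and "invented" when its title matches no known paper. The `Reference` class, the `classify` function, and the `known_titles` set (a stand-in for a bibliographic database lookup) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import Set


@dataclass(frozen=True)
class Reference:
    # Hypothetical record holding the fields checked for each citation.
    first_author: str
    year: int
    title: str
    journal: str
    volume: str
    pages: str


def normalize(value: object) -> str:
    # Lowercase and collapse whitespace so purely cosmetic differences are ignored.
    return " ".join(str(value).lower().split())


def classify(generated: Reference, requested: Reference, known_titles: Set[str]) -> str:
    """Label a chatbot-generated reference under the assumed criteria."""
    # A title absent from every known paper is treated as a fabrication.
    if normalize(generated.title) not in {normalize(t) for t in known_titles}:
        return "invented"
    # Otherwise compare every field against the paper that was actually requested.
    for f in fields(Reference):
        if normalize(getattr(generated, f.name)) != normalize(getattr(requested, f.name)):
            return "inaccurate"
    return "accurate"


if __name__ == "__main__":
    requested = Reference("House", 1985, "Facial nerve grading system",
                          "Otolaryngol Head Neck Surg", "93", "146–147")
    known = {requested.title}
    wrong_year = replace(requested, year=1986)  # hypothetical generated error
    fabricated = replace(requested, title="A made-up grading study")  # hypothetical
    for ref in (requested, wrong_year, fabricated):
        print(classify(ref, requested, known))
```

A stricter or looser rule (for example, tolerating page-range typos) would only change the field comparison inside `classify`.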
Results
The accuracy of ChatGPT-3.5 and ChatGPT-4.0 ranged from 47% to 60% and from 73% to 87%, respectively (p < 0.005). Across the regenerated questions, ChatGPT-3.5 provided 19 inaccurate references and invented 2, whereas ChatGPT-4.0 provided 13 inaccurate references and invented only 1. The stability of responses across regenerations was mild (κ = 0.238) for ChatGPT-3.5 and moderate (κ = 0.408) for ChatGPT-4.0.
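The reported figures can be checked arithmetically under two assumptions that the abstract does not state: each run scored all 30 requested references, and stability was measured with Fleiss' kappa, treating each of the three regenerations as a rater and each reference's label as the rated category. Under the first assumption, the accuracy bounds correspond to 14/30 ≈ 47% and 18/30 = 60% for ChatGPT-3.5, and to 22/30 ≈ 73% and 26/30 ≈ 87% for ChatGPT-4.0. The sketch below (function names hypothetical) computes both quantities; its demo labels are invented for illustration and do not reproduce the study's data.

```python
from collections import Counter
from typing import Sequence


def accuracy_percent(n_correct: int, n_total: int = 30) -> float:
    # Share of requested references returned without errors, as a percentage.
    return 100.0 * n_correct / n_total


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa; ratings[i] lists the labels given to item i by each rater."""
    if not ratings:
        raise ValueError("no items to rate")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]
    # Observed agreement for each item, then averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / n_items
    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    for n in (14, 18, 22, 26):
        print(n, round(accuracy_percent(n)))
    # Hypothetical labels for three references across three regenerations.
    demo = [
        ["accurate", "accurate", "accurate"],
        ["accurate", "inaccurate", "inaccurate"],
        ["invented", "inaccurate", "inaccurate"],
    ]
    print(round(fleiss_kappa(demo), 3))
```

With real data, each inner list would hold the three labels recorded for one of the 30 requested papers; the abstract does not say whether Fleiss' kappa or another agreement statistic was used.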
Conclusions
ChatGPT-4.0 achieved higher accuracy than the free-access version (3.5). False references were detected in both versions. Practitioners should be cautious when using ChatGPT to retrieve key references while writing a report.
Data availability
Data are available on request.
Funding
None.
Author information
Contributions
JRL: design, data acquisition, and drafting; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. GB: data analysis and interpretation, and proofreading of the paper; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. LAV: design, data acquisition, and drafting; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Ethics committee
Approval from the institutional review board of CHU Saint-Pierre was not required for this study (ref. CHUST23).
Ethical declarations
The author Jerome R. Lechien is also guest editor of the special issue on ‘ChatGPT and Artificial Intelligence in Otolaryngology–Head and Neck Surgery’. He was not involved in the peer review process of this article.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lechien, J.R., Briganti, G. & Vaira, L.A. Accuracy of ChatGPT-3.5 and -4 in providing scientific references in otolaryngology–head and neck surgery. Eur Arch Otorhinolaryngol 281, 2159–2165 (2024). https://doi.org/10.1007/s00405-023-08441-8
DOI: https://doi.org/10.1007/s00405-023-08441-8