Accuracy of ChatGPT-3.5 and -4 in providing scientific references in otolaryngology–head and neck surgery

  • Miscellaneous
European Archives of Oto-Rhino-Laryngology

Abstract

Introduction

Chatbot Generative Pre-trained Transformer (ChatGPT) is a new artificial intelligence-powered chatbot, built on a language model, that may help otolaryngologists in practice and research. We investigated the accuracy of ChatGPT-3.5 and ChatGPT-4 in providing references for manuscripts published in otolaryngology.

Methods

ChatGPT-3.5 and ChatGPT-4 were asked to provide the references of the 30 most cited papers in otolaryngology from the past 40 years, including clinical guidelines and key studies that changed practice. The responses were regenerated three times to assess the accuracy and stability of ChatGPT. The two versions were then compared for reference accuracy and potential mistakes.
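
As a rough illustration of how per-run accuracy could be tallied under this protocol, here is a minimal Python sketch. It assumes each returned reference has been hand-labelled as accurate, inaccurate, or invented. The label names, the function names, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical label set; the study's actual coding scheme is not given here.
LABELS = ("accurate", "inaccurate", "invented")


def accuracy(labels):
    """Return the share of references in one run labelled 'accurate'."""
    if not labels:
        raise ValueError("no labels supplied")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return Counter(labels)["accurate"] / len(labels)


def accuracy_range(runs):
    """Return (lowest, highest) accuracy across regenerated runs."""
    scores = [accuracy(run) for run in runs]
    return min(scores), max(scores)


# Toy data only: three regenerated runs over five papers, NOT study results.
toy_runs = [
    ["accurate", "accurate", "inaccurate", "accurate", "invented"],
    ["accurate", "inaccurate", "inaccurate", "accurate", "accurate"],
    ["accurate", "accurate", "accurate", "accurate", "inaccurate"],
]

low, high = accuracy_range(toy_runs)
print(f"accuracy across runs: {low:.0%} to {high:.0%}")
```

Reporting the minimum and maximum across runs mirrors the way the abstract states accuracy as a range rather than a single value.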

Results

The accuracy of ChatGPT-3.5 and ChatGPT-4 ranged from 47% to 60% and from 73% to 87%, respectively (p < 0.005). Across the regenerated questions, ChatGPT-3.5 provided 19 inaccurate references and invented two references, whereas ChatGPT-4 provided 13 inaccurate references and only one invented reference. The stability of responses across regenerated answers was mild for ChatGPT-3.5 (k = 0.238) and moderate for ChatGPT-4 (k = 0.408).
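
The abstract does not state which kappa statistic was used for stability. As one possible reading, the sketch below computes Fleiss' kappa in Python, treating each regenerated run as a "rater" and each paper as an item. The choice of Fleiss' kappa, the function name, and the toy labels are assumptions for illustration only.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of one label per run.

    Assumption: every item has the same number of labels (runs).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("no items supplied")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2
              for cat in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Toy data only: four papers, three regenerated runs each, NOT study results.
toy = [
    ["accurate", "accurate", "accurate"],
    ["accurate", "accurate", "inaccurate"],
    ["inaccurate", "inaccurate", "inaccurate"],
    ["accurate", "inaccurate", "inaccurate"],
]
print(f"kappa = {fleiss_kappa(toy):.3f}")
```

A value of 1 indicates identical labels in every run, while values near 0 indicate agreement no better than chance.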

Conclusions

ChatGPT-4 achieved higher accuracy than the free-access version (3.5). False references were detected in both versions. Practitioners should be cautious when using ChatGPT to retrieve key references while writing a report.

Data availability

Data are available on request.

Funding

None.

Author information

Contributions

JRL: design, acquisition of data, drafting, final approval, and accountability for the work; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. GB: data analysis and interpretation, and proofreading of the paper; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. LAV: design, acquisition of data, drafting, final approval, and accountability for the work; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Jerome R. Lechien.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Ethics committee

Approval from the institutional review board of CHU Saint-Pierre was not required for this study (ref. CHUST23).

Ethical declarations

The author Jerome R. Lechien is also guest editor of the special issue on ‘ChatGPT and Artificial Intelligence in Otolaryngology—Head and Neck Surgery’. He was not involved with the peer review process of this article.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 199 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lechien, J.R., Briganti, G. & Vaira, L.A. Accuracy of ChatGPT-3.5 and -4 in providing scientific references in otolaryngology–head and neck surgery. Eur Arch Otorhinolaryngol 281, 2159–2165 (2024). https://doi.org/10.1007/s00405-023-08441-8

  • DOI: https://doi.org/10.1007/s00405-023-08441-8
