Abstract
Purpose
Chat Generative Pre-trained Transformer (ChatGPT) has the potential to significantly impact how patients acquire medical information online. Here, we characterize the readability and appropriateness of ChatGPT responses to a range of patient questions compared to results from traditional web searches.
Methods
Patient questions related to the published Clinical Practice Guidelines of the American Academy of Otolaryngology-Head and Neck Surgery were sourced from existing online posts. Questions were categorized using a modified Rothwell classification system into (1) fact, (2) policy, and (3) diagnosis and recommendations, and were queried using both ChatGPT and a traditional web search. All results were evaluated for readability (Flesch Reading Ease and Flesch-Kincaid Grade Level) and understandability (Patient Education Materials Assessment Tool). Accuracy was assessed by two blinded clinical evaluators using a three-point ordinal scale.
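The two readability metrics named above are standard linear formulas over average sentence length and average syllables per word. As an illustrative sketch (not the authors' scoring pipeline), they can be computed as follows; the vowel-group syllable counter here is a crude heuristic, whereas dedicated readability tools use more careful syllabification:

```python
import re

def flesch_scores(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    # Sentence count: runs of terminal punctuation (at least 1 to avoid /0)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def syllables(word):
        # Heuristic: each run of vowels counts as one syllable, minimum 1
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syll = sum(syllables(w) for w in words)
    asl = n_words / sentences          # average sentence length
    asw = n_syll / n_words             # average syllables per word
    fre = 206.835 - 1.015 * asl - 84.6 * asw    # higher = easier to read
    fkgl = 0.39 * asl + 11.8 * asw - 15.59      # approximate US grade level
    return round(fre, 1), round(fkgl, 1)
```

Note the inverse relationship: simple, short-sentence text yields a high FRE and a low grade level, which is why the lower FRE of ChatGPT responses in the results indicates harder-to-read text.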
Results
A total of 54 questions were organized into fact (37.0%), policy (37.0%), and diagnosis (25.8%). The average readability of ChatGPT responses was lower than that of traditional web search results (FRE: 42.3 ± 13.1 vs. 55.6 ± 10.5, p < 0.001), while PEMAT understandability was equivalent (93.8% vs. 93.5%, p = 0.17). ChatGPT scored higher than web search on questions in the 'Diagnosis' category (p < 0.01); there was no difference for questions categorized as 'Fact' (p = 0.15) or 'Policy' (p = 0.22). Additional prompting improved ChatGPT response readability (FRE 55.6 ± 13.6, p < 0.01).
Conclusions
ChatGPT outperforms web search in answering patient questions related to symptom-based diagnoses and is equivalent in providing medical facts and established policy. Appropriate prompting can further improve readability while maintaining accuracy. Further patient education is needed to convey the benefits and limitations of this technology as a source of medical information.
Data availability
Questions used within this project are included in the supplementary data.
Funding
This work was supported in part by the National Institute of Deafness and Other Communication Disorders (NIDCD) Grant No. 5T32DC000027-33.
Author information
Authors and Affiliations
Contributions
Dr. Sarek Shen led the study design, the analysis and interpretation of the data, and the composition of the manuscript. Dr. Xie assisted with the design and evaluation of ChatGPT and web search responses. Mr. Perez-Heydrich performed the literature review and the quantification of response readability and understandability. Dr. Nellis helped conceive the project and reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
None.
Ethics approval
This study does not include the use of human or animal subjects and was deemed exempt by the Johns Hopkins Institutional Review Board.
Consent
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, S.A., Perez-Heydrich, C.A., Xie, D.X. et al. ChatGPT vs. web search for patient questions: what does ChatGPT do better?. Eur Arch Otorhinolaryngol 281, 3219–3225 (2024). https://doi.org/10.1007/s00405-024-08524-0