Abstract
Purpose
We aimed to assess the appropriateness of ChatGPT in providing answers related to prostate cancer (PCa) screening, comparing GPT-3.5 and GPT-4.
Methods
A committee of five reviewers designed 30 questions related to PCa screening, categorized into three difficulty levels. The questions were submitted identically to both GPTs three times, with a different prompt each time. Each reviewer assigned a score for accuracy, clarity, and conciseness. Readability was assessed with the Flesch-Kincaid Grade (FKG) and Flesch Reading Ease (FRE). Mean scores were compared using the Wilcoxon test, and readability across the three prompts was compared by ANOVA.
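For reference, FKG and FRE are computed from standard published formulas based on sentence length and syllables per word. The sketch below uses a naive vowel-group heuristic for syllable counting (the study's exact readability tooling is not specified, so this is an illustrative assumption, not the authors' implementation):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups; every word gets at least 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade) for a text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    # Standard Flesch formulas:
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkg = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkg
```

Higher FRE means easier text (the GPT-4 mean of 47.6 vs. GPT-3.5's 37.3 indicates easier reading), while FKG approximates the US school grade needed to understand the text.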
Results
In GPT-3.5, the mean score (SD) for accuracy, clarity, and conciseness was 1.5 (0.59), 1.7 (0.45), and 1.7 (0.49), respectively, for easy questions; 1.3 (0.67), 1.6 (0.69), and 1.3 (0.65) for medium; and 1.3 (0.62), 1.6 (0.56), and 1.4 (0.56) for hard. In GPT-4, it was 2.0 (0), 2.0 (0), and 2.0 (0.14) for easy questions; 1.7 (0.66), 1.8 (0.61), and 1.7 (0.64) for medium; and 2.0 (0.24), 1.8 (0.37), and 1.9 (0.27) for hard. GPT-4 outperformed GPT-3.5 on all three qualities at every difficulty level. The mean FKG was 12.8 (1.75) for GPT-3.5 answers and 10.8 (1.72) for GPT-4; the mean FRE was 37.3 (9.65) for GPT-3.5 and 47.6 (9.88) for GPT-4. The second prompt achieved better results in terms of clarity (all p < 0.05).
Conclusions
GPT-4 displayed superior accuracy, clarity, conciseness, and readability compared with GPT-3.5. Although prompts influenced response quality in both GPTs, their impact was statistically significant only for clarity.
Data availability
The data supporting the findings of this study are available upon specific request to the author.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
Conceptualization: Chiarelli Giuseppe, Abdollah Firas. Data curation: Chiarelli Giuseppe, Arora Sohrab. Formal analysis: Stephens Alex. Funding acquisition: Rogers Craig, Abdollah Firas. Investigation: Chiarelli Giuseppe, Cirulli Giuseppe Ottone, Finati Marco, Beatrici Edoardo, Dejan Filipas, Tinsley Shane, Arora Sohrab. Methodology: Chiarelli Giuseppe, Stephens Alex, Abdollah Firas. Project administration: Abdollah Firas. Supervision: Bhandari Mahendra, Trinh Quoc-Dien, Carrieri Giuseppe, Briganti Alberto, Montorsi Francesco, Lughezzani Giovanni, Buffi Nicolò. Validation: Abdollah Firas. Visualization: Chiarelli Giuseppe. Writing–original draft: Chiarelli Giuseppe. Writing–review and editing: Chiarelli Giuseppe, Abdollah Firas.
Ethics declarations
Conflict of interest
The authors have no financial or proprietary interests in any material discussed in this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chiarelli, G., Stephens, A., Finati, M. et al. Adequacy of prostate cancer prevention and screening recommendations provided by an artificial intelligence-powered large language model. Int Urol Nephrol (2024). https://doi.org/10.1007/s11255-024-04009-5