Assessing ChatGPT's Performance in Health Fact-Checking: Performance, Biases, and Risks

Ni, Zhenni; Qian, Yuxing; Vaillant, Pascal; Jaulent, Marie-Christine; Bousquet, Cédric

doi:10.1007/978-3-031-49212-9_50

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1957)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

The increasing use of ChatGPT by the general public prompted us to assess its performance in health fact-checking and to uncover potential biases and risks arising from its use. We evaluated ChatGPT on two publicly accessible datasets. We used BERTopic to cluster health claims into topics and then used the gpt-3.5-turbo API to fact-check these claims. Performance was assessed at the multi-class (False, Mixture, Mostly-False, Mostly-True, True) and binary (True, False) levels, with a detailed analysis across topics. On the two datasets, ChatGPT achieved F1-scores of 0.54 and 0.64 in the multi-class task and 0.88 and 0.85 in the binary task, respectively. For most health topics (e.g., vaccines, COVID-19), ChatGPT's F1-score exceeded 0.8; the exceptions were specific topics, such as novel or contentious cancer treatments, where the F1-score fell below 0.6. We examined the erroneous fact-checking labels and explanations provided by ChatGPT and found that it may produce inaccurate results for claims with misleading intent, inaccurate information, emerging research findings, or contentious health knowledge.
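The evaluation described in the abstract (multi-class and binary F1, broken down by topic) can be illustrated with a minimal sketch. The abstract does not state which F1 averaging was used or how the five labels were collapsed into two, so the macro averaging, the `BINARY_MAP` mapping (Mixture treated as False), and all function names below are assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict

# Assumed collapse of the five veracity labels into two classes.
# The abstract does not specify this mapping; treating "Mixture" as
# "False" is a hypothetical choice for this sketch only.
BINARY_MAP = {
    "False": "False",
    "Mostly-False": "False",
    "Mixture": "False",
    "Mostly-True": "True",
    "True": "True",
}


def f1_per_class(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


def to_binary(labels):
    """Collapse five-class labels to True/False using BINARY_MAP."""
    return [BINARY_MAP[label] for label in labels]


def macro_f1_by_topic(records):
    """records: iterable of (topic, gold_label, predicted_label)."""
    by_topic = defaultdict(lambda: ([], []))
    for topic, gold, pred in records:
        by_topic[topic][0].append(gold)
        by_topic[topic][1].append(pred)
    return {t: macro_f1(g, p) for t, (g, p) in by_topic.items()}


# Toy data, invented for this example; not taken from either dataset.
gold = ["True", "False", "Mostly-True", "Mixture"]
pred = ["True", "False", "True", "False"]
multi_class_score = macro_f1(gold, pred)
binary_score = macro_f1(to_binary(gold), to_binary(pred))
topic_scores = macro_f1_by_topic([
    ("vaccines", "True", "True"),
    ("vaccines", "False", "False"),
    ("cancer treatments", "True", "False"),
    ("cancer treatments", "False", "False"),
])
```

On the toy data the binary score is higher than the multi-class score because near-misses such as Mostly-True versus True count as errors only in the five-class setting. This is the same direction as the gap reported in the abstract, although the actual cause there may differ.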

References

Nori, H., King, N., McKinney, S.M., et al.: Capabilities of GPT-4 on medical challenge problems. arXiv preprint arXiv:2303.13375 (2023)
Singhal, K., Azizi, S., Tu, T., et al.: Large Language Models Encode Clinical Knowledge. arXiv preprint arXiv:2212.13138 (2022)
Singhal, K., Tu, T., Gottweis, J., et al.: Towards Expert-Level Medical Question Answering with Large Language Models. arXiv preprint arXiv:2305.09617 (2023)
Zuccon, G., Koopman, B.: Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness. arXiv preprint arXiv:2302.13793 (2023)
Srba, I., Pecher, B., Tomlein, M., et al.: Monant medical misinformation dataset: mapping articles to fact-checked claims. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2949–2959. ACM, Madrid, Spain (2022)
Kotonya, N., Toni, F.: Explainable automated fact-checking for public health claims. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7740–7754. Association for Computational Linguistics, Online (2020)
Grootendorst, M.: BERTopic: Neural topic modeling with a class-based TF-IDF procedure (2022)

Author information

Authors and Affiliations

School of Information Management, Wuhan University, Wuhan, China
Zhenni Ni & Yuxing Qian
Laboratoire d’Informatique Médicale et d’Ingénierie des Connaissances en eSanté (LIMICS), Sorbonne Université, Inserm, Paris, France
Pascal Vaillant, Marie-Christine Jaulent & Cédric Bousquet

Corresponding author

Correspondence to Zhenni Ni.

Editor information

Editors and Affiliations

University of Crete and Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Constantine Stephanidis
Foundation for Research and Technology Hellas (FORTH), Heraklion, Crete, Greece
Margherita Antona
Foundation for Research and Technology Hellas (FORTH), Heraklion, Crete, Greece
Stavroula Ntoa
University of Central Florida, Orlando, FL, USA
Gavriel Salvendy

About this paper

Cite this paper

Ni, Z., Qian, Y., Vaillant, P., Jaulent, MC., Bousquet, C. (2024). Assessing ChatGPT's Performance in Health Fact-Checking: Performance, Biases, and Risks. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds) HCI International 2023 – Late Breaking Posters. HCII 2023. Communications in Computer and Information Science, vol 1957. Springer, Cham. https://doi.org/10.1007/978-3-031-49212-9_50

DOI: https://doi.org/10.1007/978-3-031-49212-9_50
Published: 12 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49211-2
Online ISBN: 978-3-031-49212-9
eBook Packages: Computer Science, Computer Science (R0)
