
Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection

  • Conference paper
Information for a Better World: Shaping the Global Future (iConference 2022)

Abstract

A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods predominantly target specific content types (e.g., news) or platforms (e.g., Twitter). The methods’ capabilities to generalize were largely unclear so far. We evaluate fifteen Transformer-based models on five COVID-19 misinformation datasets that include social media posts, news articles, and scientific papers to fill this gap. We show tokenizers and models tailored to COVID-19 data do not provide a significant advantage over general-purpose ones. Our study provides a realistic assessment of models for detecting COVID-19 misinformation. We expect that evaluating a broad spectrum of datasets and models will benefit future research in developing misinformation detection systems.

J. P. Wahle and N. Ashok contributed equally.


Notes

  1. https://coronavirus.jhu.edu/map.html
  2. We collectively refer to fake news, disinformation, and misinformation as false information.
  3. https://github.com/ag-gipp/iConference22_COVID_misinformation
  4. https://tinyurl.com/86cpx6u2
  5. https://tinyurl.com/9w24pc93
  6. https://tinyurl.com/86cpx6u2
  7. https://tinyurl.com/9w24pc93
  8. https://tinyurl.com/kebysw
  9. https://tinyurl.com/4xx9vdkm
  10. https://tinyurl.com/4ne9vtzu
  11. General-purpose refers to the tokenizers released with the pre-trained models.
  12. Pre-trained tokenizer provided by Hugging Face.
  13. https://pubmed.ncbi.nlm.nih.gov/
  14. https://www.biorxiv.org/
  15. https://www.medrxiv.org/
  16. https://www.who.int/
  17. https://tinyurl.com/4mryzj5k
  18. https://www.poynter.org/ifcn/
  19. https://www.semanticscholar.org/

References

  1. Alsentzer, E., et al.: Publicly Available Clinical BERT Embeddings. arXiv:1904.03323 [cs], June 2019. http://arxiv.org/abs/1904.03323

  2. Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3613–3618. Association for Computational Linguistics, Hong Kong, China (2019). 10/ggcgtm


  3. Benkler, Y., Farris, R., Roberts, H.: Network Propaganda, vol. 1. Oxford University Press, October 2018. https://doi.org/10.1093/oso/9780190923624.001.0001

  4. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). 10/gfw9cs


  5. Cinelli, M., et al.: The COVID-19 social media infodemic. Sci. Rep. 10(1), 16598 (2020). https://doi.org/10.1038/s41598-020-73510-5


  6. Clark, K., Luong, M.T., Le, Q.V., Manning, C.D.: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. arXiv:2003.10555 [cs], March 2020. http://arxiv.org/abs/2003.10555

  7. Cui, L., Lee, D.: CoAID: COVID-19 Healthcare Misinformation Dataset. arXiv:2006.00885 [cs], August 2020. http://arxiv.org/abs/2006.00885

  8. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805, May 2019. http://arxiv.org/abs/1810.04805

  9. Dror, R., Baumer, G., Shlomov, S., Reichart, R.: The hitchhiker’s guide to testing statistical significance in natural language processing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1383–1392. Association for Computational Linguistics, Melbourne, Australia, July 2018. https://doi.org/10.18653/v1/P18-1128

  10. Hale, T., et al.: A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat. Hum. Behav. 5(4), 529–538 (2021). https://doi.org/10.1038/s41562-021-01079-8


  11. He, P., Liu, X., Gao, J., Chen, W.: DeBERTa: Decoding-enhanced BERT with Disentangled Attention. arXiv:2006.03654 [cs], January 2021. http://arxiv.org/abs/2006.03654

  12. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339. Association for Computational Linguistics, Melbourne, Australia, July 2018. https://doi.org/10.18653/v1/P18-1031

  13. Johnson, A.E., et al.: MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016). https://doi.org/10.1038/sdata.2016.35

  14. Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, pp. 1–7 (2019). https://doi.org/10.1093/bioinformatics/btz682

  15. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880. Association for Computational Linguistics, Online, July 2020. https://doi.org/10.18653/v1/2020.acl-main.703

  16. Liu, Y., et al.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 [cs], July 2019. http://arxiv.org/abs/1907.11692

  17. Memon, S.A., Carley, K.M.: Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. arXiv:2008.00791 [cs], September 2020. http://arxiv.org/abs/2008.00791

  18. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546 [cs, stat], October 2013. http://arxiv.org/abs/1310.4546

  19. Mutlu, E.C., et al.: A stance data set on polarized conversations on Twitter about the efficacy of hydroxychloroquine as a treatment for COVID-19. Data in Brief 33, 106401 (2020). https://doi.org/10.1016/j.dib.2020.106401

  20. Müller, M., Salathé, M., Kummervold, P.E.: COVID-twitter-bert: a natural language processing model to analyse COVID-19 content on twitter. arXiv:2005.07503 [cs], May 2020. http://arxiv.org/abs/2005.07503

  21. Nguyen, D.Q., Vu, T., Tuan Nguyen, A.: BERTweet: a pre-trained language model for English tweets. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 9–14. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-demos.2

  22. Ostendorff, M., Ruas, T., Blume, T., Gipp, B., Rehm, G.: Aspect-based document similarity for research papers. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 6194–6206. International Committee on Computational Linguistics, Barcelona, Spain (Online) (2020). https://doi.org/10.18653/v1/2020.coling-main.545

  23. Pennycook, G., McPhetres, J., Zhang, Y., Lu, J.G., Rand, D.G.: Fighting COVID-19 misinformation on social media: experimental evidence for a scalable accuracy-nudge intervention. Psychol. Sci. 31(7), 770–780 (2020). https://doi.org/10.1177/0956797620939054


  24. Press, O., Smith, N.A., Lewis, M.: Shortformer: better language modeling using shorter inputs. arXiv:2012.15832 [cs], December 2020. http://arxiv.org/abs/2012.15832

  25. Ruas, T., Ferreira, C.H.P., Grosky, W., de França, F.O., de Medeiros, D.M.R.: Enhanced word embeddings using multi-semantic representation through lexical chains. Inf. Sci. 532, 16–32 (2020). https://doi.org/10.1016/j.ins.2020.04.048


  26. Ruas, T., Grosky, W., Aizawa, A.: Multi-sense embeddings through a word sense disambiguation process. Expert Syst. Appl. 136, 288–303 (2019). https://doi.org/10.1016/j.eswa.2019.06.026


  27. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newslett. 19(1), 22–36 (2017). https://doi.org/10.1145/3137597.3137600


  28. Vaswani, A., et al.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010. NIPS 2017, Curran Associates Inc., Red Hook, NY, USA (2017). https://arxiv.org/abs/1706.03762

  29. Wahle, J.P., Ruas, T., Foltynek, T., Meuschke, N., Gipp, B.: Identifying machine-paraphrased plagiarism. In: Proceedings of the iConference, February 2022


  30. Wahle, J.P., Ruas, T., Meuschke, N., Gipp, B.: Are neural language models good plagiarists? a benchmark for neural paraphrase detection. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, Washington, USA, September 2021


  31. Wang, A., et al.: SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 3266–3280. Curran Associates, Inc. (2019). https://arxiv.org/abs/1905.00537

  32. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv:1804.07461 [cs], February 2019. https://arxiv.org/abs/1804.07461

  33. Wang, L.L., et al.: CORD-19: The COVID-19 Open Research Dataset. arXiv:2004.10706 [cs], July 2020. http://arxiv.org/abs/2004.10706

  34. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. arXiv:1906.08237 [cs], June 2019. https://arxiv.org/abs/1906.08237

  35. Zarocostas, J.: How to fight an infodemic. Lancet 395(10225), 676 (2020). https://doi.org/10.1016/S0140-6736(20)30461-X


  36. Zhou, X., Mulay, A., Ferrara, E., Zafarani, R.: ReCOVery: A multimodal repository for COVID-19 news credibility research, pp. 3205–3212. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3340531.3412880


Author information

Correspondence to Jan Philip Wahle.


A Appendix


1.1 A.1 Dataset Details

COVID-19 Open Research Dataset (CORD-19) [33] is the largest open-source dataset on COVID-19 and coronavirus-related research (e.g., SARS, MERS). CORD-19 comprises more than 280K scholarly articles from PubMed (Footnote 13), bioRxiv (Footnote 14), medRxiv (Footnote 15), and other resources maintained by the WHO (Footnote 16). We use this dataset to extend the general pre-training of selected neural language models (cf. Sect. 3) to COVID-specific vocabulary and features.

Covid-19 heAlthcare mIsinformation Dataset (CoAID) [7] focuses on healthcare misinformation, including fake news on websites, user engagement, and social media. CoAID is composed of 5 216 news articles, 296 752 related user engagements, and 958 posts about COVID-19, which are broadly categorized under the labels true and false.

Twitter Stance Dataset (COVID-CQ) [19] is a dataset of user-generated Twitter content in the context of COVID-19. More than 14K tweets were processed and annotated regarding the use of Chloroquine and Hydroxychloroquine as a valid treatment or prevention against the coronavirus. COVID-CQ is composed of 14 374 tweets from 11 552 unique users labeled as neutral, against, or favor.

ReCOVery [36] explores the low credibility of information on COVID-19 (e.g., bleach can prevent COVID-19) by allowing a multimodal investigation of news and their spread on social media. The dataset is composed of 2 029 news articles on the coronavirus and 140 820 related tweets labeled as reliable or unreliable.

CMU-MisCov19 [17] is a Twitter dataset created by collecting posts from unknowingly misinformed users, users who actively spread misinformation, and users who disseminate facts or call out misinformation. CMU-MisCov19 is composed of 4 573 annotated tweets divided into 17 classes (e.g., conspiracy, fake cure, news, sarcasm). The high number of classes and their imbalanced distribution make CMU-MisCov19 a challenging dataset.

COVID19FN (Footnote 17) comprises approximately 2 800 news articles, extracted mainly from Poynter (Footnote 18), categorized as either real or fake.
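Since the datasets above use different label vocabularies, comparing models across them requires a shared target scheme. The following is a minimal, hypothetical sketch (the dataset keys and mapping are illustrative, not the authors' code) of projecting the binary-labeled datasets onto one scheme:

```python
# Hypothetical label mapping for cross-dataset evaluation: each binary
# dataset names its classes differently (true/false, reliable/unreliable,
# real/fake), so we project them onto a shared integer scheme.
# COVID-CQ and CMU-MisCov19 are multi-class and would need their own maps.
LABEL_MAP = {
    "coaid":     {"true": 0, "false": 1},
    "recovery":  {"reliable": 0, "unreliable": 1},
    "covid19fn": {"real": 0, "fake": 1},
}

def normalize_label(dataset: str, label: str) -> int:
    """Return 0 for trustworthy content and 1 for misinformation."""
    return LABEL_MAP[dataset.lower()][label.lower()]
```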

1.2 A.2 Model Details

General-Purpose Baselines. BERT [8] mainly captures general language characteristics using a bidirectional Masked Language Model (MLM) and Next Sentence Prediction (NSP) tasks. RoBERTa [16] improves BERT with additional data, compute budgets, and hyperparameter optimizations. RoBERTa also drops the NSP as it contributes little to the model representation. BART [15] optimizes an auto-regressive forward-product and auto-encoding MLM objective simultaneously. DeBERTa [11] improves the attention mechanism using a disentanglement of content and position.

Intermediate Pre-Trained. SciBERT [2] optimizes the MLM objective on 1.14M randomly selected papers from Semantic Scholar (Footnote 19). BioClinicalBERT [1] specializes in the 2M notes of the MIMIC-III database [13], a collection of de-identified clinical data. BERTweet [21] optimizes BERT on 850M tweets, each containing between 10 and 64 tokens.

COVID-19 Intermediate Pre-Trained. COVID-Twitter-BERT (CT-BERT) [20] uses a corpus of 160M tweets for domain-specific pre-training and evaluates the resulting model's capabilities in sentiment analysis, e.g., for tweets about vaccines. BioClinicalBERT [1] fine-tunes BioBERT [14] on clinical narratives, aiming to incorporate linguistic characteristics from both the clinical and biomedical domains.

Cui et al. [7] propose CoAID and investigate the misinformation detection task by comparing traditional machine learning (e.g., logistic regression, random forest) with deep learning techniques (e.g., GRU). In a similar setup, Zhou et al. [36] compare traditional statistical learners, such as SVMs, with neural networks (e.g., CNN) to classify news as credible or not. In both studies, deep learning architectures achieve the best results.

1.3 A.3 Evaluation Details

Pre-Training. We use the data from the abstracts of the CORD-19 dataset for pre-training. When pre-processing the CORD-19 abstracts, we consider only alphanumerical characters. We use a sequence length of 128 tokens, which reduces training time while remaining competitive with longer sequence lengths during fine-tuning [24]. We mask tokens randomly with a probability of .15, a common configuration for Transformers [8, 11], and perform MLM training with the following remaining parameters: a batch size of 16 for base models and 8 for large models, the Adam optimizer (\(\alpha = 2e-5\), \(\beta _1 = .9\), \(\beta _2 = .999\), \(\epsilon = 1e-8\)), and a maximum of five epochs. All experiments were performed on a single NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of memory.
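As a rough illustration of the masking step (not the authors' implementation, which operates on subword tokens and follows the full BERT recipe of mask/random/keep replacements), random masking at p = .15 can be sketched as:

```python
import random

MASK_PROB = 0.15      # masking probability used in the paper
MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, prob=MASK_PROB, seed=None):
    """Randomly hide tokens for the MLM objective.

    Returns the masked sequence and a parallel label list that holds the
    original token at masked positions and None elsewhere, so the loss
    is computed only where a token was hidden.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)    # model must reconstruct this token
        else:
            masked.append(tok)
            labels.append(None)   # position ignored by the loss
    return masked, labels
```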

Fine-Tuning. The classification model applies a randomly initialized fully-connected layer with dropout (\(p=.1\)) to the aggregate representation of the underlying Transformer (e.g., [CLS] for BERT). It learns the annotated target classes with a cross-entropy loss over five epochs, using a sequence length of 200 tokens. We use the same optimizer configuration as in pre-training.
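The head described above can be sketched in plain Python as follows (a simplified stand-in for the actual framework layer; the function names and weight shapes are illustrative):

```python
import math
import random

def classification_head(pooled, weights, bias, dropout_p=0.1, rng=None):
    """Dropout followed by a fully-connected layer over the pooled
    representation (e.g., the [CLS] vector)."""
    rng = rng or random.Random()
    # Inverted dropout: zero units with probability p and scale the
    # survivors by 1/(1-p) so the expected activation is unchanged.
    kept = [0.0 if rng.random() < dropout_p else x / (1.0 - dropout_p)
            for x in pooled]
    # One output logit per target class.
    return [sum(w * x for w, x in zip(row, kept)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under a softmax,
    computed with the usual max-shift for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]
```

At inference time, dropout is disabled (dropout_p=0) and the argmax over the logits gives the predicted class.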


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wahle, J.P., Ashok, N., Ruas, T., Meuschke, N., Ghosal, T., Gipp, B. (2022). Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection. In: Smits, M. (ed.) Information for a Better World: Shaping the Global Future. iConference 2022. Lecture Notes in Computer Science, vol. 13192. Springer, Cham. https://doi.org/10.1007/978-3-030-96957-8_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-96957-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96956-1

  • Online ISBN: 978-3-030-96957-8

  • eBook Packages: Computer Science; Computer Science (R0)
