DeMis: Data-Efficient Misinformation Detection Using Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13714)

Abstract

Deep learning approaches are state of the art for many natural language processing tasks, including misinformation detection. Training deep learning algorithms effectively requires a large amount of training data. Unfortunately, while unlabeled data are abundant, manually labeled data are lacking for misinformation detection. In this paper, we propose DeMis, a novel reinforcement learning (RL) framework to detect misinformation on Twitter in a resource-constrained environment, i.e., one with limited labeled data. The main novelties result from (1) using reinforcement learning to identify high-quality weak labels to use with manually labeled data to jointly train a classifier, and (2) using fact-checked claims to construct weak labels from unlabeled tweets. We empirically show the strength of this approach over the current state of the art and demonstrate its effectiveness in a low-resource environment, outperforming other models by up to 8% in F1 score. We also find that our method is more robust to heavily imbalanced data. Finally, we publish a package containing code, trained models, and labeled data sets.
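The two ideas named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and every name in it (`jaccard`, `weak_label`, `train_selector`, the threshold, the toy reward) is a hypothetical choice made for illustration. It assumes token overlap as a stand-in for a learned sentence similarity when matching tweets to fact-checked claims, and a one-feature Bernoulli policy trained with REINFORCE as a stand-in for the RL selector; in a real pipeline the reward would plausibly come from a classifier's validation score rather than from the toy function used here.

```python
import math
import random


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (assumed stand-in for an embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def weak_label(tweets, false_claims, threshold=0.5):
    """Give each tweet a weak label: 1 if it resembles a fact-checked false claim."""
    labeled = []
    for tweet in tweets:
        score = max((jaccard(tweet, c) for c in false_claims), default=0.0)
        labeled.append((tweet, 1 if score >= threshold else 0, score))
    return labeled


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_selector(features, reward_fn, epochs=200, lr=0.5, seed=0):
    """REINFORCE for a Bernoulli keep/drop policy p(keep | x) = sigmoid(w*x + b)."""
    rng = random.Random(seed)
    w, b, baseline = 0.0, 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in features]
        actions = [1 if rng.random() < p else 0 for p in probs]
        reward = reward_fn(actions)  # hypothetical callback, e.g. validation F1
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # For this policy, d/dw log pi(a | x) = (a - p) * x and d/db = (a - p).
        w += lr * advantage * sum(
            (a - p) * x for a, p, x in zip(actions, probs, features)
        ) / n
        b += lr * advantage * sum(a - p for a, p in zip(actions, probs)) / n
    return w, b


if __name__ == "__main__":
    claims = ["5g towers spread the virus"]
    tweets = ["breaking 5g towers spread the virus", "i love sunny days"]
    print(weak_label(tweets, claims))

    # Toy reward: share of kept samples whose weak label is (secretly) correct.
    correct = [1, 1, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.2]

    def toy_reward(actions):
        kept = [c for a, c in zip(actions, correct) if a]
        return sum(kept) / len(kept) if kept else 0.0

    print(train_selector(scores, toy_reward))
```

The selector sees only the weak-label score as its feature, which keeps the sketch short; a fuller state representation would be a natural extension.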

Notes

  1. https://github.com/GU-DataLab/misinformation-detection-DeMis

  2. We use the term unsupervised because we do not use any labeled data at this stage.

  3. Our unlabeled tweets do not overlap with any of our labeled data.

  4. Amazon Mechanical Turk, HIT Review Policies: https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_HITReviewPolicies.html

  5. News on the Washington Post.

Acknowledgement

This research was funded by National Science Foundation awards #1934925 and #1934494, and the Massive Data Institute (MDI) and McCourt Institute at Georgetown University. We would like to thank our funders, the MDI staff, and the Georgetown DataLab for their support.

Author information

Corresponding author

Correspondence to Kornraphop Kawintiranon.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kawintiranon, K., Singh, L. (2023). DeMis: Data-Efficient Misinformation Detection Using Reinforcement Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_14

  • DOI: https://doi.org/10.1007/978-3-031-26390-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26389-7

  • Online ISBN: 978-3-031-26390-3

  • eBook Packages: Computer Science, Computer Science (R0)
