DeMis: Data-Efficient Misinformation Detection Using Reinforcement Learning

Kawintiranon, Kornraphop; Singh, Lisa

doi:10.1007/978-3-031-26390-3_14

Kornraphop Kawintiranon & Lisa Singh

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13714)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Deep learning approaches are the state of the art for many natural language processing tasks, including misinformation detection. To train deep learning algorithms effectively, a large amount of training data is essential. Unfortunately, while unlabeled data are abundant, manually labeled data for misinformation detection are scarce. In this paper, we propose DeMis, a novel reinforcement learning (RL) framework for detecting misinformation on Twitter in a resource-constrained environment, i.e., one with limited labeled data. The main novelties are (1) using reinforcement learning to identify high-quality weak labels that are combined with manually labeled data to jointly train a classifier, and (2) using fact-checked claims to construct weak labels from unlabeled tweets. We empirically show the strength of this approach over the current state of the art and demonstrate its effectiveness in a low-resource environment, where it outperforms other models by up to 8% in F1 score. We also find that our method is more robust to heavily imbalanced data. Finally, we publish a package containing code, trained models, and labeled data sets.
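The abstract names two mechanisms: weak labels built from fact-checked claims, and an RL agent that decides which weak labels are worth training on. The sketch below illustrates both ideas under loudly stated assumptions and is not the authors' implementation (their code is in the repository listed in note 1). Bag-of-words cosine similarity stands in for a real sentence representation; the three-way labeling rule and the thresholds `hi` and `lo` are hypothetical; and the selector is a generic REINFORCE (policy-gradient) loop in which `features` and the `reward_fn` callback are hypothetical placeholders for, e.g., a per-example confidence score and the validation F1 of a classifier trained on manually labeled data plus the kept weak labels.

```python
import math
import random
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for real sentence embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[tok] for tok, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def weak_label(tweet, false_claims, hi=0.5, lo=0.1):
    """Hypothetical rule: 1 = misinformation, 0 = not misinformation, None = leave unlabeled.

    Returns (label, best similarity to any fact-checked false claim).
    """
    score = max(cosine(tweet, claim) for claim in false_claims)
    if score >= hi:
        return 1, score
    if score <= lo:
        return 0, score
    return None, score


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reinforce_select(features, reward_fn, epochs=300, lr=0.5, seed=0):
    """Generic REINFORCE loop for a keep/drop policy over weakly labeled examples.

    features:  one float per weak example (hypothetical, e.g. a confidence score).
    reward_fn: hypothetical callback mapping a keep-mask to a scalar reward.
    Returns the final keep probability for each example.
    """
    rng = random.Random(seed)
    w, b, baseline = 0.0, 0.0, 0.0
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in features]
        mask = [rng.random() < p for p in probs]  # sample keep/drop actions
        reward = reward_fn(mask)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline reduces variance
        for x, p, keep in zip(features, probs, mask):
            grad_logit = (1.0 if keep else 0.0) - p  # d log Bernoulli(keep) / d logit
            w += lr * advantage * grad_logit * x
            b += lr * advantage * grad_logit
    return [sigmoid(w * x + b) for x in features]
```

For example, with the claims "vaccines contain microchips" and "the election was stolen", the tweet "vaccines contain microchips says my friend" receives weak label 1, an unrelated tweet receives 0, and a borderline tweet is left unlabeled rather than forced into a noisy label. The policy-gradient loop only sees rewards, so any classifier and metric can be plugged into `reward_fn`; the cost is that each reward evaluation would require retraining, which a real system would need to amortize.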


Notes

1. https://github.com/GU-DataLab/misinformation-detection-DeMis
2. We use the term unsupervised because we do not use any labeled data at this stage.
3. Our unlabeled tweets do not overlap with any of our labeled data.
4. Amazon Mechanical Turk, HIT Review Policies: https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_HITReviewPolicies.html
5. News from The Washington Post.



Acknowledgement

This research was funded by National Science Foundation awards #1934925 and #1934494, and the Massive Data Institute (MDI) and McCourt Institute at Georgetown University. We would like to thank our funders, the MDI staff, and the Georgetown DataLab for their support.

Author information

Authors and Affiliations

Georgetown University, Washington, DC, USA
Kornraphop Kawintiranon & Lisa Singh

Authors

Kornraphop Kawintiranon
Lisa Singh

Corresponding author

Correspondence to Kornraphop Kawintiranon.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d'Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas


About this paper

Cite this paper

Kawintiranon, K., Singh, L. (2023). DeMis: Data-Efficient Misinformation Detection Using Reinforcement Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_14


DOI: https://doi.org/10.1007/978-3-031-26390-3_14
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26389-7
Online ISBN: 978-3-031-26390-3
eBook Packages: Computer Science, Computer Science (R0)
