In Search of Credible News
We study the problem of finding fake online news. This is an important problem, as news of questionable credibility has recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBpedia data) features. Our experiments on three different test sets show that our model can distinguish credible from fake news with very high accuracy.
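The n-gram and credibility-related cues listed above can be made concrete with a minimal Python sketch. It is illustrative only and not the authors' implementation. The following choices are all assumptions: tokenization by `\w+`, word bigrams as the n-gram unit, the English pronoun list, the toy sentiment lexicon, and the exact feature definitions. The semantic features (embeddings and DBpedia data) are not covered.

```python
import re
from collections import Counter

# Assumption: a hypothetical English pronoun list. A multilingual system
# would need one such list per language.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "him", "his", "she", "her", "it", "its",
            "they", "them", "their"}

# Assumption: a toy sentiment lexicon (word -> polarity), for illustration only.
SENTIMENT = {"good": 1, "great": 1, "true": 1,
             "bad": -1, "shocking": -1, "fake": -1}

TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns


def word_ngrams(text, n=2):
    """Count lowercased word n-grams (the 'linguistic' feature family)."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def credibility_features(text):
    """Compute simple surface cues: capitalization, punctuation, pronouns, sentiment."""
    tokens = TOKEN_RE.findall(text)
    n_tokens = max(len(tokens), 1)    # avoid division by zero on empty input
    letters = [c for c in text if c.isalpha()]
    n_letters = max(len(letters), 1)
    lower = [t.lower() for t in tokens]
    return {
        # share of uppercase letters among all letters
        "caps_ratio": sum(c.isupper() for c in letters) / n_letters,
        # share of multi-letter tokens written fully in capitals, e.g. "SHOCKING"
        "all_caps_tokens": sum(t.isupper() and len(t) > 1 for t in tokens) / n_tokens,
        # raw counts of emphatic punctuation
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        # share of tokens that are pronouns from the assumed list
        "pronoun_ratio": sum(t in PRONOUNS for t in lower) / n_tokens,
        # summed lexicon polarity, normalized by token count
        "sentiment": sum(SENTIMENT.get(t, 0) for t in lower) / n_tokens,
    }


if __name__ == "__main__":
    headline = "SHOCKING news! You won't believe it!"
    print(credibility_features(headline))
    print(word_ngrams("the cat saw the cat").most_common(1))
```

For the sample headline, the sketch reports two exclamation marks, one all-caps token out of seven, and two pronouns ("You", "it"). In a full system, such per-article values would be combined with the n-gram counts and the semantic features into a single vector for a classifier.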
Keywords: Credibility, Veracity, Fact checking, Humor detection
This research was performed by Momchil Hardalov, a Computer Science student at Sofia University “St Kliment Ohridski”, as part of his M.Sc. thesis. It is also part of the Interactive sYstems for Answer Search (Iyas) project, which is developed by the Arabic Language Technologies (ALT) group at the Qatar Computing Research Institute (QCRI), HBKU, part of Qatar Foundation, in collaboration with MIT-CSAIL.