In Search of Credible News

  • Momchil Hardalov
  • Ivan Koychev
  • Preslav Nakov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9883)


We study the problem of finding fake online news. This is an important problem, as news of questionable credibility has recently been proliferating in social media at an alarming scale. Because the problem is understudied, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBpedia data) features. Our experiments on three different test sets show that our model can distinguish credible from fake news with very high accuracy.
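To make the feature groups above more concrete, here is a minimal sketch of how word n-gram counts and a few of the credibility-related surface features (capitalization, punctuation, pronoun use) might be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the exact feature definitions, and the small English pronoun lists are all hypothetical, and the sentiment polarity, embedding, and DBpedia features are left out because they require external resources.

```python
import re
from collections import Counter

# Hypothetical English pronoun lists, for illustration only; another
# language would need its own lists.
FIRST_PERSON = {"i", "we", "me", "us", "my", "our"}
SECOND_PERSON = {"you", "your", "yours"}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into word tokens (Unicode-aware, punctuation dropped)."""
    return TOKEN_RE.findall(text)


def word_ngrams(tokens, n):
    """Count lowercased word n-grams of length n."""
    lowered = [t.lower() for t in tokens]
    return Counter(
        " ".join(lowered[i:i + n]) for i in range(len(lowered) - n + 1)
    )


def credibility_features(text):
    """Compute a few surface features loosely tied to credibility."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty text
    return {
        # Share of tokens written entirely in capitals (single letters skipped).
        "all_caps_ratio": sum(
            1 for t in tokens if len(t) > 1 and t.isupper()
        ) / n_tokens,
        # Raw counts of attention-grabbing punctuation.
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        # Share of tokens that are first- or second-person pronouns.
        "first_person_ratio": sum(
            1 for t in lowered if t in FIRST_PERSON
        ) / n_tokens,
        "second_person_ratio": sum(
            1 for t in lowered if t in SECOND_PERSON
        ) / n_tokens,
    }


if __name__ == "__main__":
    sample = "SHOCKING news! You will not believe what we found!"
    print(credibility_features(sample))
    print(word_ngrams(tokenize(sample), 2).most_common(3))
```

In a full pipeline, these values would be joined with the n-gram counts into one feature vector per article and passed to a classifier.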


Keywords: Credibility, Veracity, Fact checking, Humor detection



This research was performed by Momchil Hardalov, a student in Computer Science at Sofia University “St. Kliment Ohridski”, as part of his M.Sc. thesis. It is also part of the Interactive sYstems for Answer Search (Iyas) project, which is developed by the Arabic Language Technologies (ALT) group at the Qatar Computing Research Institute (QCRI), HBKU, part of Qatar Foundation, in collaboration with MIT-CSAIL.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. FMI, Sofia University “St. Kliment Ohridski”, Sofia, Bulgaria
  2. Qatar Computing Research Institute, HBKU, Doha, Qatar
