Automating fake news detection system using multi-level voting model

  • Sawinder Kaur
  • Parteek Kumar
  • Ponnurangam Kumaraguru
Methodologies and Application


Online fake news has become increasingly prominent in the way news stories are shaped and spread. Misleading or unreliable information in the form of videos, posts, articles and URLs is extensively disseminated through popular social media platforms such as Facebook and Twitter. As a result, editors and journalists need new tools that can help them speed up the verification of content that originates from social media. Motivated by the need for automated detection of fake news, this paper aims to find out which classification model identifies fake news features accurately using three feature extraction techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Count-Vectorizer (CV) and Hashing-Vectorizer (HV). A novel multi-level voting ensemble model is also proposed. The proposed system has been tested on three datasets using twelve machine learning (ML) classifiers, which are combined based on their false prediction ratio. Individually, the Passive Aggressive, Logistic Regression and Linear Support Vector Classifier (LinearSVC) models perform best with the TF-IDF, CV and HV feature extraction approaches, respectively, based on their performance metrics. The proposed model outperforms the Passive Aggressive model by 0.8%, the Logistic Regression model by 1.3% and the LinearSVC model by 0.4% using TF-IDF, CV and HV, respectively. The proposed system can also be used to detect fake textual content from online social media websites.
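
The abstract does not say how the classifiers are grouped or how many voting levels are used, so the following is only a minimal sketch of the general idea of a multi-level vote. It assumes that models are ranked by their false prediction ratio on held-out data, grouped in threes, voted within each group, and then voted across groups. The function names, the group size and the toy values are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def false_prediction_ratio(y_true, y_pred):
    """Share of samples that a classifier labels incorrectly."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def majority_vote(prediction_lists):
    """Per-sample majority label across equally long prediction lists.

    Ties go to the label seen first, so an odd number of voters is safer.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def multi_level_vote(fpr, preds, group_size=3):
    """Two-level vote: inside groups of similarly ranked models, then across groups."""
    # Assumed grouping rule: rank models from lowest to highest false prediction ratio.
    ranked = sorted(preds, key=lambda name: fpr[name])
    groups = [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]
    # Level 1: one voted prediction list per group.
    level_one = [majority_vote([preds[name] for name in group]) for group in groups]
    # Level 2: vote across the group-level outputs.
    return majority_vote(level_one)


if __name__ == "__main__":
    # Hypothetical false prediction ratios for nine models (made-up numbers).
    fpr = {f"m{i}": i / 10 for i in range(1, 10)}
    # Hypothetical predictions of each model on two unseen articles.
    preds = {
        "m1": ["FAKE", "REAL"], "m2": ["FAKE", "REAL"], "m3": ["REAL", "REAL"],
        "m4": ["REAL", "FAKE"], "m5": ["REAL", "FAKE"], "m6": ["FAKE", "REAL"],
        "m7": ["FAKE", "REAL"], "m8": ["REAL", "REAL"], "m9": ["FAKE", "FAKE"],
    }
    print(multi_level_vote(fpr, preds))
```

In practice the ratios would be computed with `false_prediction_ratio` on a validation split, and the per-model predictions would come from classifiers trained on TF-IDF, CV or HV features.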


Fake news articles, Count-Vectorizer, TF-IDF, Hashing-Vectorizer, Classifiers, Textual content, Machine learning models



This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).


Funding was provided by Digital India Corporation (formerly Media Lab Asia) (Grant No. U72900MH2001NPL133410).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Doctoral Research Lab-II, Computer Science and Engineering Department, TIET, Patiala, India
  2. Computer Science and Engineering Department, TIET, Patiala, India
  3. Computer Science and Engineering Department, IIIT, Delhi, India
