Study and Detection of Fake News: P2C2-Based Machine Learning Approach

Conference paper in: Data Management, Analytics and Innovation

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1175)

Abstract

News is among the most important and sensitive kinds of information affecting society today. News currently propagates around the world in two main ways: through traditional media, i.e., newspapers, and through electronic media such as social media websites. Electronic media is the most popular medium today because it can deliver news to a huge audience within seconds. Alongside these benefits, however, electronic media has one major disadvantage: it spreads fake news. Fake news is one of the most common problems today, and even large companies such as Twitter and Facebook face it; researchers at these companies are working to solve the problem. Fake news can be defined as a news story that is not true. More precisely, a piece of news released by a news agency is fake if it was deliberately written to be false and is also verifiably false. This paper examines key characteristics of fake news and how it affects society today. It also presents several viewpoints that are useful for categorizing news as fake or not. Finally, the paper discusses key challenges and future directions for increasing the accuracy of fake news detection on the basis of the P2C2 (Propagation, Pattern, Comprehension & Credibility) approach, which has two phases: Detection and Verification. The paper helps readers in two ways: (i) newcomers can easily gain basic knowledge of fake news and its impact; and (ii) readers can learn the different perspectives on fake news that are helpful in the detection process.
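The abstract names the four P2C2 dimensions and the two phases but does not say how each dimension is scored or how the phases interact. The Python sketch below is a hypothetical illustration of that two-phase structure, not the authors' method. The `P2C2Scores` fields, the equal `WEIGHTS`, the 0.5 `threshold`, and the `known_false`/`known_true` fact-check sets are all assumptions introduced here. In this sketch, Detection combines four per-dimension suspicion scores into one weighted score, and only flagged items proceed to Verification, which is reduced to a simple lookup against trusted fact-check data.

```python
from __future__ import annotations

from dataclasses import dataclass


# HYPOTHETICAL: per-dimension suspicion scores in [0, 1]; higher = more suspicious.
# How each score would be computed (propagation graphs, writing patterns,
# content comprehension, source credibility) is deliberately left abstract.
@dataclass
class P2C2Scores:
    propagation: float
    pattern: float
    comprehension: float
    credibility: float  # suspicion derived from low source credibility


# HYPOTHETICAL equal weights; a real system would learn or tune these.
WEIGHTS = {"propagation": 0.25, "pattern": 0.25, "comprehension": 0.25, "credibility": 0.25}


def detect(scores: P2C2Scores, threshold: float = 0.5) -> tuple[bool, float]:
    """Detection phase (toy): weighted sum of the four dimension scores.

    Returns (flagged, combined_score); an item is flagged when the combined
    score reaches the threshold.
    """
    combined = sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())
    return combined >= threshold, combined


def verify(claim: str, known_false: set[str], known_true: set[str]) -> str:
    """Verification phase (toy): look the normalized claim up in fact-check sets.

    The sets stand in for a real fact-checking resource (an assumption).
    """
    key = claim.strip().lower()
    if key in known_false:
        return "fake"
    if key in known_true:
        return "real"
    return "unverified"


def classify(
    claim: str,
    scores: P2C2Scores,
    known_false: set[str],
    known_true: set[str],
    threshold: float = 0.5,
) -> str:
    """Run Detection first; only flagged items are sent to Verification."""
    flagged, _ = detect(scores, threshold)
    if not flagged:
        return "not flagged"
    return verify(claim, known_false, known_true)


if __name__ == "__main__":
    fact_checked_false = {"the moon is made of cheese"}
    suspicious = P2C2Scores(propagation=0.8, pattern=0.7, comprehension=0.6, credibility=0.9)
    print(classify("The moon is made of cheese", suspicious, fact_checked_false, set()))  # fake
```

With these assumed weights, the example scores combine to 0.75, so the claim is flagged and then labeled from the fact-check set; a claim whose four scores are all 0.1 combines to 0.1 and never reaches Verification. Placing Verification behind the Detection threshold is one plausible design choice: it reserves the costlier checking step for items that already look suspicious.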

Author information

Correspondence to Prateek Agrawal.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Verma, P.K., Agrawal, P. (2021). Study and Detection of Fake News: P2C2-Based Machine Learning Approach. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_18
