Study and Detection of Fake News: P²C²-Based Machine Learning Approach

Verma, Pawan Kumar; Agrawal, Prateek

doi:10.1007/978-981-15-5619-7_18

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1175)

Abstract

News is among the most important and sensitive kinds of information affecting society today. At present, news propagates around the world in two ways: through traditional media such as newspapers, and through electronic media such as social media websites. Electronic media is now the more popular medium because it delivers news to a huge audience within seconds. Alongside these benefits, electronic media has one disadvantage: it spreads fake news. Fake news is now a widespread problem; even large companies such as Twitter and Facebook face it, and several researchers at these companies are working to solve it. Fake news can be defined as a news story that is not true. More specifically, a piece of news is fake if a news agency declares that it was deliberately written as false and it is also verifiably false. This paper focuses on key characteristics of fake news and on how it affects society today. It also presents key viewpoints that are useful for deciding whether a piece of news is fake. Finally, the paper discusses key challenges and future directions that can help increase the accuracy of fake news detection on the basis of the P²C² (Propagation, Pattern, Comprehension & Credibility) approach, which has two phases: Detection and Verification. The paper helps readers in two ways: (i) newcomers can easily gain basic knowledge of fake news and its impact; (ii) they can learn about the different perspectives on fake news that are helpful in the detection process.

Author information

Authors and Affiliations

GLA University, Mathura, India
Pawan Kumar Verma
Lovely Professional University, Punjab, India
Pawan Kumar Verma & Prateek Agrawal
University of Klagenfurt, Klagenfurt, Austria
Prateek Agrawal

Corresponding author

Correspondence to Prateek Agrawal.

Editor information

Editors and Affiliations

Society for Data Science, Pune, Maharashtra, India
Neha Sharma
A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Amlan Chakrabarti
Department of Automatics and Applied Software, Faculty of Engineering, University of Arad, Arad, Romania
Valentina Emilia Balas
IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic
Jan Martinovic

Cite this paper

Verma, P.K., Agrawal, P. (2021). Study and Detection of Fake News: P²C²-Based Machine Learning Approach. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_18

DOI: https://doi.org/10.1007/978-981-15-5619-7_18
Published: 19 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5618-0
Online ISBN: 978-981-15-5619-7
eBook Packages: Intelligent Technologies and Robotics (R0)
