Abstract
Fake news is news-like content produced without following journalistic principles. It mimics the look and feel of real news to intentionally disinform the reader. Because this phenomenon can strongly influence society, it is potentially a severe problem. Systems to detect fake news have been developed, but most of them build upon fact-checking approaches, which are unfit to detect misinformation when a news piece is not completely false but rather distorted, exaggerated, or decontextualized. We aim to detect Portuguese fake news by following a forensic linguistics approach. Contrary to previous approaches, we build upon methods of linguistic and stylistic analysis that have been tried and tested in forensic linguistics. After collecting corpora from multiple fake news outlets and from a genuine news source, we formulate the task as a text classification problem and demonstrate the effectiveness of the proposed features when training different classifiers to tell fake from genuine news. Furthermore, we perform an ablation study with subsets of features and find that the proposed feature sets are complementary. The best results reported are very promising: an accuracy of 97% and a macro F1-score of 91%.
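The abstract reports both accuracy and macro F1-score, and the two can differ noticeably when one class is much rarer than the other. The following is a minimal sketch of how both metrics are computed for a binary fake/genuine task. The labels and predictions are hypothetical toy data chosen for illustration; they are not taken from the corpora or results of this work, and the function names are our own.

```python
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true: List[str], y_pred: List[str], label: str) -> float:
    """F1-score of a single class, computed as 2TP / (2TP + FP + FN)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical, imbalanced toy data (assumption): 90 genuine and 10 fake items.
y_true = ["genuine"] * 90 + ["fake"] * 10
# The hypothetical classifier labels every genuine item correctly
# but misses 4 of the 10 fake items.
y_pred = ["genuine"] * 90 + ["fake"] * 6 + ["genuine"] * 4

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {mf1:.2f}")
```

On this toy data the accuracy is 0.96 while the macro F1-score is about 0.86, because macro averaging gives the minority class the same weight as the majority class. This is one reason to report both figures when evaluating a detector.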
Acknowledgments
This research is supported by project DARGMINTS (POCI/01/0145/FEDER/031460), CLUP (UIDB/00022/2020), and LIACC (FCT/UID/CEC/0027/2020), funded by Fundação para a Ciência e a Tecnologia (FCT).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Moura, R., Sousa-Silva, R., Lopes Cardoso, H. (2021). Automated Fake News Detection Using Computational Forensic Linguistics. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_62
DOI: https://doi.org/10.1007/978-3-030-86230-5_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86229-9
Online ISBN: 978-3-030-86230-5
eBook Packages: Computer Science, Computer Science (R0)