Automated Fake News Detection Using Computational Forensic Linguistics

  • Conference paper
Progress in Artificial Intelligence (EPIA 2021)

Abstract

Fake news is news-like content produced without following journalistic principles. It mimics the look and feel of real news to intentionally disinform the reader. This phenomenon can strongly influence society and is therefore a potentially severe problem. Systems to detect fake news have been developed, but most build upon fact-checking approaches, which are unfit to detect misinformation when a news piece, rather than being completely false, is distorted, exaggerated, or decontextualized. We aim to detect Portuguese fake news by following a forensic linguistics approach. In contrast to previous approaches, we build upon methods of linguistic and stylistic analysis that have been tried and tested in forensic linguistics. After collecting corpora from multiple fake news outlets and from a genuine news source, we formulate the task as a text classification problem and demonstrate the effectiveness of the proposed features when training different classifiers to tell fake from genuine news. Furthermore, we perform an ablation study with subsets of features and find that the proposed feature sets are complementary. The best results reported are very promising: 97% accuracy and a macro F1-score of 91%.
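The abstract reports both accuracy and macro F1, and the two can differ noticeably. The sketch below is only an illustration of how the metrics are computed and of one way such a gap can arise (class imbalance). The label counts are invented for the example and are not the paper's data; the function names are hypothetical. With these made-up counts, accuracy is 0.97 while macro F1 lands near 0.91.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical, imbalanced test set: 900 genuine and 100 fake articles.
y_true: List[str] = ["genuine"] * 900 + ["fake"] * 100
# Hypothetical predictions: 10 genuine articles flagged as fake,
# 20 fake articles missed.
y_pred: List[str] = (
    ["genuine"] * 890 + ["fake"] * 10      # predictions for the genuine articles
    + ["fake"] * 80 + ["genuine"] * 20     # predictions for the fake articles
)

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {macro:.3f}")
```

Because macro F1 averages the per-class scores without weighting by class size, errors on the smaller class pull it down more than they pull down accuracy.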

Notes

  1. drive.google.com/file/d/1jqiMxbcH6H4ozA3zbTnxphriQx1fKi4G/view.

  2. https://github.com/sumeetkr/AwesomeFakeNews.

  3. a) sabado.pt/portugal/detalhe/be-pede-audicao-da-erc-para-esclarecer-registo-de-sites-de-fake-news b) dn.pt/edicao-do-dia/11-nov-2018/fake-news-sites-portugueses-com-mais-de-dois-milhoes-de-seguidores–10160885.html.

  4. www.nltk.org/howto/portuguese_en.

  5. www.github.com/barrust/pyspellchecker.

  6. www.spacy.io/models/pt.

  7. www.scikit-learn.org.

  8. scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.

  9. scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.

Acknowledgments

This research is supported by project DARGMINTS (POCI/01/0145/FEDER/031460), CLUP (UIDB/00022/2020), and LIACC (FCT/UID/CEC/0027/2020), funded by Fundação para a Ciência e a Tecnologia (FCT).

Author information

Corresponding author

Correspondence to Henrique Lopes Cardoso.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Moura, R., Sousa-Silva, R., Lopes Cardoso, H. (2021). Automated Fake News Detection Using Computational Forensic Linguistics. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_62

  • DOI: https://doi.org/10.1007/978-3-030-86230-5_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86229-9

  • Online ISBN: 978-3-030-86230-5

  • eBook Packages: Computer Science, Computer Science (R0)
