Contributions to the Study of Fake News in Portuguese: New Corpus and Automatic Detection Results

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11122)


Fake news is a problem of our time. It may influence a large number of people on a wide range of subjects, from politics to health. Although it has always existed, its volume has recently increased due to the soaring number of users of social networks and instant messengers. Such news may cause direct losses to people and corporations, since it may include defamation of people, products and companies. Moreover, the scarcity of labeled datasets, especially in Portuguese, prevents training classifiers to automatically filter such documents. In this paper, we investigate the issue for the Portuguese language. Inspired by previous initiatives for other languages, we introduce the first reference corpus in this area for Portuguese, composed of aligned true and fake news, which we analyze to uncover some of their linguistic characteristics. Then, using machine learning techniques, we run some automatic detection methods on this corpus, showing that good results may be achieved.


Fake news · Reference corpus · Linguistic features · Machine learning



The authors are grateful to FAPESP, CAPES and CNPq for supporting this work.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Interinstitutional Center for Computational Linguistics (NILC), University of São Paulo, São Carlos, Brazil
  2. Federal University of São Carlos, Sorocaba, Brazil
  3. University of São Paulo, Ribeirão Preto, Brazil
  4. Interinstitutional Center for Computational Linguistics (NILC), Federal University of São Carlos, São Carlos, Brazil
