Contributions to the Study of Fake News in Portuguese: New Corpus and Automatic Detection Results

Monteiro, Rafael A.; Santos, Roney L. S.; Pardo, Thiago A. S.; de Almeida, Tiago A.; Ruiz, Evandro E. S.; Vale, Oto A.

doi:10.1007/978-3-319-99722-3_33

Conference paper
First Online: 26 August 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Abstract

Fake news is a problem of our time. It may influence a large number of people on a wide range of subjects, from politics to health. Although fake news has always existed, its volume has recently increased due to the soaring number of users of social networks and instant messengers. Such news may cause direct losses to people and corporations, since it may include defamation of people, products and companies. Moreover, the scarcity of labeled datasets, especially for Portuguese, prevents the training of classifiers that automatically filter such documents. In this paper, we investigate the issue for the Portuguese language. Inspired by previous initiatives for other languages, we introduce the first reference corpus in this area for Portuguese, composed of aligned true and fake news, which we analyze to uncover some of their linguistic characteristics. Then, using machine learning techniques, we run some automatic detection methods on this corpus, showing that good results can be achieved.
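The abstract does not specify which features or learners were used. As a minimal sketch only, assuming bag-of-words features (in the spirit of the vector-space model of Salton & McGill in the References) and a linear support vector machine (Cortes & Vapnik, also listed), the code below trains a linear SVM by full-batch Pegasos subgradient descent on the L2-regularised hinge loss, without Pegasos's optional projection step. The toy Portuguese sentences, the `<bias>` feature and every function name are hypothetical; this is an illustration, not the authors' pipeline.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word-character tokens (accents kept)."""
    return re.findall(r"\w+", text.lower())


def vectorize(text: str) -> dict[str, float]:
    """Bag-of-words vector: L2-normalised term frequencies plus an intercept feature."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    vec = {tok: c / norm for tok, c in counts.items()}
    # Hypothetical intercept feature; the angle brackets keep it from
    # colliding with any real token.
    vec["<bias>"] = 1.0
    return vec


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    """Sparse dot product between a weight vector and a document vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(texts: list[str], labels: list[int],
                     lam: float = 0.01, epochs: int = 1000) -> dict[str, float]:
    """Full-batch Pegasos: subgradient descent on the L2-regularised hinge loss.

    Labels use +1 for fake and -1 for true news.
    """
    xs = [vectorize(t) for t in texts]
    n = len(xs)
    w: dict[str, float] = {}
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # Pegasos step size
        # Documents inside the margin contribute to the hinge-loss subgradient.
        violators = [(x, y) for x, y in zip(xs, labels) if y * dot(w, x) < 1.0]
        # Regulariser step: shrink every weight.
        for f in w:
            w[f] *= 1.0 - eta * lam
        # Hinge-loss step: move toward the margin violators.
        for x, y in violators:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + (eta / n) * y * v
    return w


def predict(w: dict[str, float], text: str) -> str:
    """Positive score means the fake class (+1)."""
    return "fake" if dot(w, vectorize(text)) > 0 else "true"


# Hypothetical toy examples, not taken from the corpus.
texts = [
    "URGENTE: compartilhe antes que apaguem!",
    "Bomba! Compartilhe com todos, urgente",
    "Segundo dados do ministério, a inflação caiu",
    "O ministério divulgou os dados oficiais",
]
labels = [1, 1, -1, -1]

w = train_linear_svm(texts, labels)
print(predict(w, "Compartilhe urgente!"))
print(predict(w, "Segundo dados do ministério"))
```

On a real corpus a library linear SVM such as scikit-learn's `LinearSVC` would be the usual choice; the hand-written trainer only keeps the sketch dependency-free.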

Notes

1. http://fortune.com/2018/04/10/facebook-cambridge-analytica-what-happened/.
2. https://www.theguardian.com/technology/2017/may/07/the-great-british-brexit-robbery-hijacked-democracy/.
3. http://piaui.folha.uol.com.br/lupa/.
4. http://www.boatos.org/.
5. https://newsroom.fb.com/news/2017/04/working-to-stop-misinformation-and-false-news/.
6. We noticed that most of the checked sites shared a large amount of fake news.
7. A half-truth may be defined as a case in which some actual facts are told in order to support false ones [4].
8. We also remove numeric values to help avoid sparsity.
9. https://sites.google.com/icmc.usp.br/opinando/.
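Note 8 states that numeric values are removed to help avoid sparsity. A minimal sketch of that preprocessing step, assuming a simple regular-expression tokenizer (the paper's actual tokenizer is not described here), shows how dropping purely numeric tokens shrinks a bag-of-words vocabulary. The function names and example headlines are hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str, drop_numbers: bool = True) -> list[str]:
    """Lowercase word tokens, optionally dropping purely numeric ones."""
    # The \w+ pattern splits "0,5%" into "0" and "5", so decimals vanish too.
    tokens = re.findall(r"\w+", text.lower())
    if drop_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    return tokens


def vocabulary(docs: list[str], drop_numbers: bool) -> set[str]:
    """Distinct tokens across all documents (the bag-of-words feature space)."""
    return {tok for doc in docs for tok in tokenize(doc, drop_numbers)}


# Hypothetical headlines that share most words but carry different figures.
docs = [
    "Inflação sobe 0,5% em março",
    "Inflação sobe 0,7% em abril",
    "Inflação cai 1,2% em maio",
]
print(len(vocabulary(docs, drop_numbers=False)))  # 12
print(len(vocabulary(docs, drop_numbers=True)))   # 7
```

Each distinct figure would otherwise become its own rarely seen feature; in this toy example, removing them cuts the vocabulary from 12 to 7 entries.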

References

Appling, D.S., Briscoe, E.J., Hutto, C.J.: Discriminative models for predicting deception strategies. In: Proceedings of the 24th International Conference on World Wide Web, pp. 947–952 (2015)
Balage Filho, P.P., Pardo, T.A., Aluísio, S.M.: An evaluation of the Brazilian Portuguese LIWC dictionary for sentiment analysis. In: Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology, pp. 215–219 (2013)
Bond Jr., C.F., DePaulo, B.M.: Accuracy of deception judgments. Personal. Soc. Psychol. Rev. 10(3), 214–234 (2006)
Clem, S.: Post-truth and vices opposed to truth. J. Soc. Christ. Eth. 37(2), 97–116 (2017)
Conroy, N.J., Rubin, V.L., Chen, Y.: Automatic deception detection: methods for finding fake news. In: Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community, pp. 82:1–82:4 (2015)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Duran, N.D., Hall, C., McCarthy, P.M., McNamara, D.S.: The linguistic correlates of conversational deception: comparing natural language processing technologies. Appl. Psycholinguist. 31(3), 439–462 (2010)
Ferreira, W., Vlachos, A.: Emergent: a novel data-set for stance classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1163–1168. Association for Computational Linguistics (2016)
Fonseca, E.R., Aluísio, S.M.: A deep architecture for non-projective dependency parsing. In: Proceedings of the NAACL-HLT Workshop on Vector Space Modeling for NLP (2015)
George, J.F., Keane, B.T.: Deception detection by third party observers. In: Paper Presented at the Deception Detection Symposium, 39th Annual Hawaii International Conference on System Sciences (2006)
Gimenes, G., Cordeiro, R.L., Rodrigues-Jr, J.F.: Orfel: efficient detection of defamation or illegitimate promotion in online recommendation. Inf. Sci. 379, 274–287 (2017)
Hauch, V., Blandón-Gitlin, I., Masip, J., Sporer, S.L.: Are computers effective lie detectors? A meta-analysis of linguistic cues to deception. Personal. Soc. Psychol. Rev. 19(4), 307–342 (2015)
Musskopf, I.: A ciência da detecção de fake news, September 2017. https://medium.com/data-science-brigade/a-ci%C3%AAncia-da-detec%C3%A7%C3%A3o-de-fake-news-d4faef2281aa
Pérez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake news. CoRR abs/1708.07104 (2017)
Pérez-Rosas, V., Mihalcea, R.: Cross-cultural deception detection. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 440–445 (2014)
Pérez-Rosas, V., Mihalcea, R.: Experiments in open domain deception detection. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1120–1125 (2015)
Rubin, V.L., Chen, Y., Conroy, N.J.: Deception detection for news: three types of fakes. Proc. Assoc. Inf. Sci. Technol. 52(1), 1–4 (2015)
Rubin, V.L., Conroy, N.J., Chen, Y., Cornwell, S.: Fake news or truth? Using satirical cues to detect potentially misleading news. In: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 7–17 (2016)
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill Inc., New York (1986)
Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online. Science 359(6380), 1146–1151 (2018)
Wang, W.Y.: “Liar, Liar Pants on Fire”: a new benchmark dataset for fake news detection. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada (2017)
Zhou, L., Burgoon, J., Twitchell, D., Qin, T., Nunamaker Jr., J.: A comparison of classification methods for predicting deception in computer-mediated communication. J. Manag. Inf. Syst. 20(4), 139–165 (2004)
Zhou, L., Twitchell, D.P., Qin, T., Burgoon, J.K., Nunamaker, J.F.: An exploratory study into deception detection in text-based computer-mediated communication. In: Proceedings of the 36th Annual Hawaii International Conference on System Sciences (2003)
Zhou, L., Zhang, D.: Following linguistic footprints: automatic deception detection in online communication. Commun. ACM 51(9), 119–122 (2008)

Acknowledgments

The authors are grateful to FAPESP, CAPES and CNPq for supporting this work.

Author information

Authors and Affiliations

Interinstitutional Center for Computational Linguistics (NILC), University of São Paulo, São Carlos, Brazil
Rafael A. Monteiro, Roney L. S. Santos & Thiago A. S. Pardo
Federal University of São Carlos, Sorocaba, Brazil
Tiago A. de Almeida
University of São Paulo, Ribeirão Preto, Brazil
Evandro E. S. Ruiz
Interinstitutional Center for Computational Linguistics (NILC), Federal University of São Carlos, São Carlos, Brazil
Oto A. Vale

Corresponding author

Correspondence to Thiago A. S. Pardo.

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCar, São Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

About this paper

Cite this paper

Monteiro, R.A., Santos, R.L.S., Pardo, T.A.S., de Almeida, T.A., Ruiz, E.E.S., Vale, O.A. (2018). Contributions to the Study of Fake News in Portuguese: New Corpus and Automatic Detection Results. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_33

DOI: https://doi.org/10.1007/978-3-319-99722-3_33
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)
