Joint evaluation of preprocessing tasks with classifiers for sentiment analysis in Brazilian Portuguese language

Multimedia Tools and Applications

Abstract

Sentiment analysis deals with the automatic extraction of the sentiment expressed in written text. Several approaches to sentiment analysis are based on machine learning techniques, more specifically on classifiers trained on labeled datasets. In this context, Natural Language Processing (NLP) tasks are usually employed as a preprocessing step to improve the quality of the data and to convert it into a form appropriate for the subsequent classification process. Several studies on sentiment analysis have already evaluated NLP tasks, classifiers, or both. However, the vast majority of them did not work with texts in Brazilian Portuguese, and their analyses did not consider combinations of sets of preprocessing tasks with classifiers. Therefore, in this work, we evaluate combinations of five NLP tasks and three classifiers for sentiment analysis of texts written in Portuguese. The experimental results show that different combinations of preprocessing tasks can significantly affect the predictive performance of a classifier on a given dataset. Joint evaluation of preprocessing tasks with classifiers is therefore important when choosing which preprocessing tasks and classifiers to use for a dataset.
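
As a rough illustration of what a joint evaluation involves, the following Python sketch scores every subset of a set of preprocessing tasks against every classifier. It is not the authors' experimental code. The three preprocessing functions, the stopword list, the two toy classifiers, and the tiny dataset are all hypothetical stand-ins, since the abstract does not name the five tasks or the three classifiers that were used. If every subset of five tasks were tried with three classifiers, the same loop would cover 2^5 × 3 = 96 combinations.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical preprocessing tasks (str -> str); not the tasks from the paper.
STOPWORDS = {"o", "a", "de", "que", "e"}

def lowercase(text):
    return text.lower()

def strip_punctuation(text):
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

def remove_stopwords(text):
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

TASKS = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
    "remove_stopwords": remove_stopwords,
}

class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""
    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label for _ in texts]

class WordVoteClassifier:
    """Toy classifier: each known word votes for the labels it was seen with."""
    def fit(self, texts, labels):
        self.votes = defaultdict(Counter)
        self.fallback = Counter(labels).most_common(1)[0][0]
        for text, label in zip(texts, labels):
            for word in text.split():
                self.votes[word][label] += 1
        return self

    def predict(self, texts):
        predictions = []
        for text in texts:
            tally = Counter()
            for word in text.split():
                if word in self.votes:
                    tally.update(self.votes[word])
            predictions.append(tally.most_common(1)[0][0] if tally else self.fallback)
        return predictions

CLASSIFIERS = {"majority": MajorityClassifier, "word_vote": WordVoteClassifier}

def apply_tasks(text, task_names):
    """Apply the selected preprocessing tasks in a fixed order."""
    for name in task_names:
        text = TASKS[name](text)
    return text

def joint_evaluation(train, test):
    """Return (task_subset, classifier_name, accuracy) for every combination."""
    results = []
    names = list(TASKS)
    y_train = [label for _, label in train]
    y_test = [label for _, label in test]
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            x_train = [apply_tasks(text, subset) for text, _ in train]
            x_test = [apply_tasks(text, subset) for text, _ in test]
            for clf_name, factory in CLASSIFIERS.items():
                predicted = factory().fit(x_train, y_train).predict(x_test)
                hits = sum(p == y for p, y in zip(predicted, y_test))
                results.append((subset, clf_name, hits / len(y_test)))
    return results

# Made-up examples, for illustration only.
train = [("Adorei o filme!", "pos"), ("Que filme ótimo.", "pos"),
         ("Odiei o final.", "neg"), ("Filme péssimo!", "neg")]
test = [("adorei, ótimo", "pos"), ("péssimo final", "neg")]

results = joint_evaluation(train, test)
best = max(results, key=lambda r: r[2])
print("best combination:", best)
```

Because the loop scores each classifier under each preprocessing subset, it can expose cases where a task helps one classifier and is irrelevant to another, which separate evaluations of tasks and of classifiers would miss. A realistic study would replace the toy classifiers and the single train/test split with real learning algorithms and cross-validation.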

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001, by Fundação de Amparo à Pesquisa do Estado de Minas Gerais - Brazil (FAPEMIG) - Finance Code APQ-02266-16 and by Stilingue Inteligência Artificial Ltda - Brazil - Partnership Agreement 006/2020 - UFLA.

Author information

Corresponding author

Correspondence to Douglas Nunes de Oliveira.

About this article

Cite this article

de Oliveira, D.N., Merschmann, L.H.d.C. Joint evaluation of preprocessing tasks with classifiers for sentiment analysis in Brazilian Portuguese language. Multimed Tools Appl 80, 15391–15412 (2021). https://doi.org/10.1007/s11042-020-10323-8

  • DOI: https://doi.org/10.1007/s11042-020-10323-8
