Abstract
Sentiment analysis is the task of automatically extracting the sentiment expressed in written text. Several approaches to sentiment analysis are based on machine learning techniques, more specifically on classifiers trained on labeled datasets. In this context, Natural Language Processing (NLP) tasks are usually employed as a preprocessing step to improve data quality and to convert the data into a form appropriate for the subsequent classification process. Several studies in the literature have already evaluated NLP tasks and/or classifiers for sentiment analysis. However, the vast majority of them did not work with texts in Brazilian Portuguese, and their analyses did not consider combining sets of preprocessing tasks with classifiers. Therefore, in this work, we evaluate the combination of five NLP tasks and three classifiers for sentiment analysis of texts written in Portuguese. The experimental results show that different combinations of preprocessing tasks can significantly affect the predictive performance of a classifier on a given dataset. These results highlight the importance of jointly evaluating preprocessing tasks and classifiers when choosing which of them to use for a dataset.
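To make the size of such a joint evaluation concrete, the following minimal Python sketch enumerates every subset of five preprocessing tasks and pairs each subset with each of three classifiers. It assumes that all 2^5 = 32 subsets (including applying no preprocessing at all) are tried with every classifier, which the abstract does not state. The five tasks shown (lowercasing, accent stripping, punctuation removal, number removal, stopword removal), the tiny stopword list, and the classifier names are hypothetical illustrations, not the tasks or classifiers used in this work. Within a subset, tasks are applied in a fixed order.

```python
import itertools
import re
import string
import unicodedata

# Hypothetical stand-ins for five preprocessing tasks; the tasks used in the
# paper may differ. Each task maps a list of tokens to a new list of tokens.
STOPWORDS = {"o", "a", "os", "as", "de", "do", "da", "e", "que", "um", "uma"}  # tiny illustrative list


def lowercase(tokens):
    return [t.lower() for t in tokens]


def strip_accents(tokens):
    # Decompose each character (e.g. "ó" -> "o" + combining acute) and drop the marks.
    return ["".join(c for c in unicodedata.normalize("NFD", t) if not unicodedata.combining(c))
            for t in tokens]


def remove_punctuation(tokens):
    # Strip ASCII punctuation from token edges and drop tokens that become empty.
    stripped = (t.strip(string.punctuation) for t in tokens)
    return [t for t in stripped if t]


def remove_numbers(tokens):
    # Drop tokens that are plain integers or decimals such as "10" or "3,5".
    return [t for t in tokens if not re.fullmatch(r"\d+(?:[.,]\d+)?", t)]


def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]


# Fixed canonical order: every subset applies its tasks in this order.
TASKS = [lowercase, strip_accents, remove_punctuation, remove_numbers, remove_stopwords]
CLASSIFIERS = ["naive_bayes", "svm", "logistic_regression"]  # hypothetical names


def all_task_subsets(tasks):
    # Every subset, from the empty pipeline up to all tasks (2 ** len(tasks) in total).
    for k in range(len(tasks) + 1):
        yield from itertools.combinations(tasks, k)


def build_grid(tasks, classifiers):
    # One configuration per (preprocessing subset, classifier) pair.
    return [(subset, clf) for subset in all_task_subsets(tasks) for clf in classifiers]


def apply_pipeline(text, pipeline):
    tokens = text.split()  # naive whitespace tokenization
    for task in pipeline:
        tokens = task(tokens)
    return tokens


if __name__ == "__main__":
    grid = build_grid(TASKS, CLASSIFIERS)
    print(len(grid))  # 96 = 2**5 subsets x 3 classifiers
    review = "O filme é ÓTIMO, nota 10!"
    print(apply_pipeline(review, TASKS))  # ['filme', 'otimo', 'nota']
    print(apply_pipeline(review, (lowercase, remove_punctuation, remove_numbers, remove_stopwords)))
    # ['filme', 'é', 'ótimo', 'nota']
```

The two pipelines in the usage example illustrate why tasks are worth evaluating together rather than one at a time: with accent stripping, the verb "é" ("is") becomes "e", which matches the stopword "e" ("and") and is removed; without accent stripping, "é" survives. In a full experiment, each of the 96 configurations would then be used to train and score its classifier on the same labeled data.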
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001, by Fundação de Amparo à Pesquisa do Estado de Minas Gerais - Brazil (FAPEMIG) - Finance Code APQ-02266-16 and by Stilingue Inteligência Artificial Ltda - Brazil - Partnership Agreement 006/2020 - UFLA.
About this article
Cite this article
de Oliveira, D.N., Merschmann, L.H.d.C. Joint evaluation of preprocessing tasks with classifiers for sentiment analysis in Brazilian Portuguese language. Multimed Tools Appl 80, 15391–15412 (2021). https://doi.org/10.1007/s11042-020-10323-8
DOI: https://doi.org/10.1007/s11042-020-10323-8