Abstract
The analysis of social media posts can provide people and organizations with useful feedback on user experience. Because of the massive volume of content and the speed at which it is generated, this task requires computational tools. In this article, we study the effects of text preprocessing heuristics and ensembles of machine learning algorithms on the accuracy and polarity bias of classifiers performing sentiment analysis on short text messages. An experimental evaluation on a dataset of Brazilian Portuguese tweets shows that these strategies significantly increase classification accuracy, particularly when the ensembles include a deep neural net, but do not always reduce polarity bias.
Notes
- 1.
- 2.
- 3.
Fivefold cross validation is a method in which the dataset is split into five equal parts (folds). In each of five rounds, 80% of the data (four folds) is used to train the algorithm, while the remaining 20% (one fold) is used to test its accuracy. The folds are rotated so that every sample is used for testing exactly once and for training in the other four rounds.
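The procedure described in this note can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `k_fold_splits` and `cross_validate`, the shuffling seed, and the `train_fn`/`predict_fn` callbacks are assumptions introduced here for the example.

```python
import random


def k_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold.

    Hypothetical helper: shuffles the sample indices once, then lets
    each fold act as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps fold sizes within one sample of each other.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, test


def cross_validate(samples, labels, train_fn, predict_fn, n_folds=5, seed=0):
    """Return the mean test accuracy over all folds.

    train_fn(samples, labels) -> model and predict_fn(model, sample) -> label
    are placeholders for whatever classifier is being evaluated.
    """
    accuracies = []
    for train, test in k_fold_splits(len(samples), n_folds, seed):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

With 100 samples and five folds, each round trains on 80 samples and tests on the remaining 20, which matches the 80–20% split described above. The reported figure is the average accuracy over the five rounds.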
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Gomes, F.B., Adán-Coello, J.M., Kintschner, F.E. (2018). Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science, vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_15
DOI: https://doi.org/10.1007/978-3-030-00810-9_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00809-3
Online ISBN: 978-3-030-00810-9
eBook Packages: Computer Science, Computer Science (R0)