Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets

Gomes, Fernando Barbosa; Adán-Coello, Juan Manuel; Kintschner, Fernando Ernesto

doi:10.1007/978-3-030-00810-9_15

Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets

Fernando Barbosa Gomes¹⁶,
Juan Manuel Adán-Coello¹⁶ &
Fernando Ernesto Kintschner¹⁶

Conference paper
First Online: 19 September 2018

625 Accesses
3 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11171))

Abstract

The analysis of social media posts can provide useful feedback regarding user experience for people and organizations. This task requires the use of computational tools due to the massive amount of content and the speed at which it is generated. In this article we study the effects of text preprocessing heuristics and ensembles of machine learning algorithms on the accuracy and polarity bias of classifiers when performing sentiment analysis on short text messages. The results of an experimental evaluation performed on a Brazilian Portuguese tweets dataset have shown that these strategies have significant impact on increasing classification accuracy, particularly when the ensembles include a deep neural net, but not always on reducing polarity bias.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/.
2.
https://www.kaggle.com/.
3.
Fivefold cross validation is a method in which 80% of the dataset is used to train the algorithm, while the rest 20% are used to test its accuracy. The 80–20% chunks of data are swapped five times until all data have been used for both testing and training.

References

Astya, P.: Sentiment analysis: approaches and open issues. In: 2017 International Conference on Computing, Communication and Automation (ICCCA), pp. 154–158. IEEE (2017)
Google Scholar
Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In: Aggarwal, C., Zhai, C. (eds.) Mining Text Data, pp. 415–463. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_13
Chapter Google Scholar
Ng, A., Jordan, M.: On discriminative vs generative classifiers: a comparison of logistic regression and Naive Bayes. In: Advances in Neural Information Processing Systems, vol. 14 (2002)
Google Scholar
Xia, R., Zong, C., Li, S.: Ensemble of feature sets and classification algorithms for sentiment classification. Inf. Sci. 181(6), 1138–1152 (2011)
Article Google Scholar
Nasrabadi, N.M.: Pattern recognition and machine learning. J. Electron. Imaging 16, 049901 (2007)
Article Google Scholar
Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Conference on Empirical Methods in Natural Language Processing (2013)
Google Scholar
Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
MATH Google Scholar
Stanford CoreNLP – Natural language software. https://stanfordnlp.github.io/CoreNLP/
Zhou, Z.-H.: Ensemble Methods: Foundations and Algorithms. Chapman and Hall, Boca Raton (2012)
Book Google Scholar
Teh, P.L., Rayson, P., Pak, I., Piao, S., Yeng, S.M.: Reversing the polarity with emoticons. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds.) NLDB 2016. LNCS, vol. 9612, pp. 453–458. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41754-7_48
Chapter Google Scholar
Powers, D.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. (2011)
Google Scholar
Rosenthal, S., et al.: SemEval-2015 task 10: sentiment analysis in Twitter. In: Proceedings of the 9th International Workshop on Semantic Evaluation, SemEval 2015, Denver, Colorado (2015)
Google Scholar
Brum, H.B., das Nunes, M.G.V.: Building a sentiment corpus of Tweets in Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan (2018)
Google Scholar
Freitas, C., Motta, E., Milidiú, R., César, J.: Vampiro que brilha… rá! Desafios na anotação de opinião em um corpus de resenhas de livros. Encontro de Linguística de Corpus 11, 22 (2012)
Google Scholar
dos Santos, F.L., Ladeira, M.: The role of text pre-processing in opinion mining on a social media language dataset. In: 2014 Brazilian Conference on Intelligent Systems (BRACIS), pp. 50–54. IEEE (2014)
Google Scholar
Antonio, J.D., Santin, A.C.L.: “Haters gonna hate”: challenges for sentiment analysis of Facebook comments in Brazilian Portuguese. In: Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms, pp. 64–72 (2017)
Google Scholar
Balage Filho, P.P., Pardo, T.A.S., Aluísio, S.M.: An evaluation of the Brazilian Portuguese LIWC dictionary for sentiment analysis. In: Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology (2013)
Google Scholar
Cirqueira, D., Jacob, A., Lobato, F., de Santana, A.L., Pinheiro, M.: Performance evaluation of sentiment analysis methods for Brazilian Portuguese. In: Abramowicz, W., Alt, R., Franczyk, B. (eds.) BIS 2016. LNBIP, vol. 263, pp. 245–251. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52464-1_22
Chapter Google Scholar
de Araujo, G.D., Teixeira, F.O., Mancini, F., de Paiva Guimarães, M., Pisa, I.T.: Sentiment analysis of Twitter’s health messages in Brazilian Portuguese. J. Health Inform. 10 (2018)
Google Scholar

Download references

Author information

Authors and Affiliations

Pontifícia Universidade Católica de Campinas, Campinas, Brazil
Fernando Barbosa Gomes, Juan Manuel Adán-Coello & Fernando Ernesto Kintschner

Authors

Fernando Barbosa Gomes
View author publications
You can also search for this author in PubMed Google Scholar
Juan Manuel Adán-Coello
View author publications
You can also search for this author in PubMed Google Scholar
Fernando Ernesto Kintschner
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fernando Barbosa Gomes .

Editor information

Editors and Affiliations

University of Mons, Mons, Belgium
Thierry Dutoit
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
University of Mons, Mons, Belgium
Gueorgui Pironkov

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gomes, F.B., Adán-Coello, J.M., Kintschner, F.E. (2018). Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science(), vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_15

Download citation

DOI: https://doi.org/10.1007/978-3-030-00810-9_15
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00809-3
Online ISBN: 978-3-030-00810-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics