Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11171)

Abstract

The analysis of social media posts can provide people and organizations with useful feedback about user experience. This task requires computational tools because of the massive amount of content and the speed at which it is generated. In this article, we study the effects of text preprocessing heuristics and ensembles of machine learning algorithms on the accuracy and polarity bias of classifiers that perform sentiment analysis on short text messages. An experimental evaluation on a dataset of Brazilian Portuguese tweets shows that these strategies significantly increase classification accuracy, particularly when the ensembles include a deep neural network, but do not always reduce polarity bias.

Notes

  1. http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/.

  2. https://www.kaggle.com/.

  3. Fivefold cross-validation splits the dataset into five equal parts. In each of five rounds, four parts (80% of the data) are used to train the algorithm and the remaining part (20%) is used to test its accuracy. The held-out part rotates from round to round, so every example is used for testing exactly once and for training four times.
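The fivefold procedure described in note 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' evaluation code: the function names `k_fold_splits` and `cross_validate`, and the `train_fn`/`predict_fn` callbacks, are hypothetical, and the data are assumed to be already shuffled.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical helper: indices are taken in order, so the caller is
    assumed to have shuffled the data beforehand.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]            # held-out part (about 20% for k=5)
        train = indices[:start] + indices[start + size:]  # remaining parts (about 80%)
        yield train, test
        start += size


def cross_validate(samples: Sequence, labels: Sequence,
                   train_fn: Callable, predict_fn: Callable, k: int = 5) -> float:
    """Return the mean test accuracy over the k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

With ten examples and `k=5`, each round trains on eight examples and tests on two, and every example appears in exactly one test fold.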

Author information

Corresponding author

Correspondence to Fernando Barbosa Gomes.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Gomes, F.B., Adán-Coello, J.M., Kintschner, F.E. (2018). Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science, vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_15

  • DOI: https://doi.org/10.1007/978-3-030-00810-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00809-3

  • Online ISBN: 978-3-030-00810-9

  • eBook Packages: Computer Science, Computer Science (R0)
