BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2022)

Abstract

BERT has revolutionized the NLP field by enabling transfer learning with large language models that can capture complex textual patterns, reaching the state of the art for a significant number of NLP applications. For text classification tasks, BERT has already been extensively explored. However, aspects such as how best to handle the different embeddings provided by the BERT output layer and the use of language-specific rather than multilingual models are not well studied in the literature, especially for Brazilian Portuguese. The purpose of this article is to conduct an extensive experimental study of different strategies for aggregating the features produced in the BERT output layer, with a focus on the sentiment analysis task. The experiments include BERT models trained with Brazilian Portuguese corpora as well as the multilingual version, covering multiple aggregation strategies and open-source datasets with predefined training, validation, and test partitions to facilitate the reproducibility of the results. BERT achieved higher ROC-AUC values than TF-IDF in the majority of cases. Nonetheless, TF-IDF offers a good trade-off between predictive performance and computational cost.
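
To make the aggregation strategies mentioned in the abstract concrete, the sketch below shows two common ways of collapsing the BERT output layer into a fixed-size sentence representation: taking the [CLS] token embedding and mean-pooling the token embeddings. This is a minimal illustration rather than the authors' implementation; it assumes the Hugging Face transformers library and the BERTimbau checkpoint (model id neuralmind/bert-base-portuguese-cased), and the example text and pooling options are illustrative only.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed checkpoint: BERTimbau base (Brazilian Portuguese BERT); a multilingual
    # BERT model id could be substituted here for the comparison described above.
    MODEL_NAME = "neuralmind/bert-base-portuguese-cased"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    def encode(texts, strategy="cls"):
        """Aggregate the last hidden layer into one fixed-size vector per text."""
        batch = tokenizer(texts, padding=True, truncation=True,
                          max_length=128, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state   # (batch, seq_len, 768)
        if strategy == "cls":
            return hidden[:, 0, :]                      # embedding of the [CLS] token
        # Mean pooling over real tokens only, ignoring padding positions.
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    # Example review in Brazilian Portuguese; the resulting 768-dimensional vectors
    # can feed a downstream classifier (e.g. logistic regression) evaluated by ROC-AUC.
    features = encode(["O produto chegou rápido e funciona muito bem."], strategy="mean")
    print(features.shape)   # torch.Size([1, 768])

In this frozen-feature setting only the downstream classifier is trained; the fine-tuned alternative named in the title would instead attach a classification head and update the BERT weights end-to-end.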

Author information

Correspondence to Frederico Dias Souza or João Baptista de Oliveira e Souza Filho.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Souza, F.D., Filho, J.B.d.O.e.S. (2022). BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives. In: Pinheiro, V., et al. (eds.) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol. 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_20

  • DOI: https://doi.org/10.1007/978-3-030-98305-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98304-8

  • Online ISBN: 978-3-030-98305-5

  • eBook Packages: Computer Science, Computer Science (R0)
