BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2022)

Abstract

BERT has revolutionized the NLP field by enabling transfer learning with large language models that can capture complex textual patterns, reaching the state of the art for a significant number of NLP applications. For text classification tasks, BERT has already been extensively explored. However, aspects such as how best to handle the different embeddings provided by the BERT output layer and the use of language-specific rather than multilingual models are not well studied in the literature, especially for Brazilian Portuguese. The purpose of this article is to conduct an extensive experimental study of different strategies for aggregating the features produced in the BERT output layer, with a focus on the sentiment analysis task. The experiments include BERT models trained with Brazilian Portuguese corpora as well as the multilingual version, covering multiple aggregation strategies and open-source datasets with predefined training, validation, and test partitions to facilitate the reproducibility of the results. BERT achieved higher ROC-AUC values than TF-IDF in the majority of cases. Nonetheless, TF-IDF offers a good trade-off between predictive performance and computational cost.
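
To make the aggregation strategies mentioned in the abstract concrete, the sketch below shows two common ways of collapsing the BERT output layer into a fixed-size sentence representation: taking the [CLS] token embedding and mean-pooling the token embeddings. This is a minimal illustration rather than the authors' implementation; it assumes the Hugging Face transformers library and the BERTimbau checkpoint (model id neuralmind/bert-base-portuguese-cased), and the example text and pooling options are illustrative only.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed checkpoint: BERTimbau base (Brazilian Portuguese BERT); a multilingual
    # BERT model id could be substituted here for the comparison described above.
    MODEL_NAME = "neuralmind/bert-base-portuguese-cased"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    def encode(texts, strategy="cls"):
        """Aggregate the last hidden layer into one fixed-size vector per text."""
        batch = tokenizer(texts, padding=True, truncation=True,
                          max_length=128, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state   # (batch, seq_len, 768)
        if strategy == "cls":
            return hidden[:, 0, :]                      # embedding of the [CLS] token
        # Mean pooling over real tokens only, ignoring padding positions.
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    # Example review in Brazilian Portuguese; the resulting 768-dimensional vectors
    # can feed a downstream classifier (e.g. logistic regression) evaluated by ROC-AUC.
    features = encode(["O produto chegou rápido e funciona muito bem."], strategy="mean")
    print(features.shape)   # torch.Size([1, 768])

In this frozen-feature setting only the downstream classifier is trained; the fine-tuned alternative named in the title would instead attach a classification head and update the BERT weights end-to-end.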

Author information

Correspondence to Frederico Dias Souza or João Baptista de Oliveira e Souza Filho.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Souza, F.D., Filho, J.B.d.O.e.S. (2022). BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives. In: Pinheiro, V., et al. (eds.) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol. 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_20

  • DOI: https://doi.org/10.1007/978-3-030-98305-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98304-8

  • Online ISBN: 978-3-030-98305-5

  • eBook Packages: Computer Science, Computer Science (R0)
