BERT Transformers Performance Comparison for Sentiment Analysis: A Case Study in Spanish

  • Conference paper
Good Practices and New Perspectives in Information Systems and Technologies (WorldCIST 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 989)


Abstract

Although Spanish is the second most spoken language in the world, research on AI and sentiment analysis that targets Spanish is scarce compared with research that targets English. This paper reviews several works on sentiment analysis for Spanish that use BERT transformers and other technologies; however, model quality, measured by indicators such as accuracy, can differ depending on the tool or BERT version used. Moreover, within the BERT family it is difficult to determine which versions or subversions perform better for sentiment analysis and also support Spanish when run on common platforms such as Colab. The present study addresses this issue by setting three objectives: identifying relevant datasets, based on the quality of the Spanish used and the availability of balanced subsets; locating different models trained on Spanish; and proposing a comparison method that involves the relevant variables. We propose a weighted index that combines the F1-score and the retraining time in different scenarios to support better decisions. The results indicate that the DistilBERT, RoBERTa, and ALBERT models achieve the highest performance, while BERT remains among the top positions as a consistent model.
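
The abstract does not give the formula for the weighted index, so the following is only a minimal sketch of one way such an index could be built, not the authors' definition. It assumes a linear combination of the F1-score and a min-max scaled retraining time, inverted so that faster retraining scores higher. The function name `weighted_index`, the weight `w_f1`, the model names, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Tuple


def weighted_index(
    results: Dict[str, Tuple[float, float]], w_f1: float = 0.7
) -> Dict[str, float]:
    """Score each model as w_f1 * F1 + (1 - w_f1) * time_score.

    `results` maps a model name to (f1, retraining_seconds).
    `time_score` is 1.0 for the fastest model and 0.0 for the slowest
    (min-max scaling, inverted so that less time is better).
    This formula is an illustrative assumption, not the paper's index.
    """
    if not 0.0 <= w_f1 <= 1.0:
        raise ValueError("w_f1 must be in [0, 1]")
    if not results:
        raise ValueError("results must not be empty")

    times = [seconds for _, seconds in results.values()]
    t_min, t_max = min(times), max(times)
    span = t_max - t_min

    scores: Dict[str, float] = {}
    for name, (f1, seconds) in results.items():
        # If every model takes the same time, time cannot discriminate.
        time_score = 1.0 if span == 0 else (t_max - seconds) / span
        scores[name] = w_f1 * f1 + (1.0 - w_f1) * time_score
    return scores


if __name__ == "__main__":
    # Placeholder measurements: (F1-score, retraining time in seconds).
    demo = {
        "model_a": (0.90, 600.0),
        "model_b": (0.88, 300.0),
        "model_c": (0.85, 900.0),
    }
    # Different weights stand in for different decision scenarios.
    for weight in (1.0, 0.7, 0.5):
        scores = weighted_index(demo, w_f1=weight)
        ranking = sorted(scores, key=scores.get, reverse=True)
        print(f"w_f1={weight}: {ranking}")
```

Varying `w_f1` shows how the ranking can shift between a scenario that values only classification quality and one that also penalizes long retraining, which is the kind of trade-off a combined index is meant to expose.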

Notes

  1. Spanish pre-trained BERT model.

  2. Spanish LLM model based on RoBERTa.

Author information

Corresponding author

Correspondence to Gerardo Bárcena Ruiz.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bárcena Ruiz, G., de Jesús Gil, R. (2024). BERT Transformers Performance Comparison for Sentiment Analysis: A Case Study in Spanish. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A. (eds) Good Practices and New Perspectives in Information Systems and Technologies. WorldCIST 2024. Lecture Notes in Networks and Systems, vol 989. Springer, Cham. https://doi.org/10.1007/978-3-031-60227-6_13
