Abstract
Although Spanish is the second most spoken language in the world, research on AI and sentiment analysis targeting Spanish is scarce compared with research targeting English. This paper reviews several works on sentiment analysis for Spanish that use BERT transformers and other technologies; however, model quality, measured by indicators such as accuracy, can differ depending on the tool or BERT version used. Moreover, within the BERT family it is difficult to determine which versions or variants perform best for sentiment analysis while also supporting Spanish when run on common platforms such as Colab. The present study addresses this issue through three objectives: identifying relevant datasets based on the quality of their Spanish and the availability of balanced subsets; locating different models trained on Spanish; and proposing a comparison method that involves the relevant variables. We propose a weighted index that combines the F1-score and the retraining time in different scenarios to support better decisions. The results indicate that the DistilBERT, RoBERTa, and ALBERT models achieve the highest performance, while BERT remains among the top positions as a consistent model.
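The abstract does not give the formula of the weighted index, so the following is only a minimal sketch of one plausible form, not the authors' definition. It assumes that retraining time is min-max normalized across the compared models (faster is better) and combined linearly with the F1-score through a single weight. The names `ModelResult`, `weighted_index`, and `f1_weight`, as well as the numbers in the example, are hypothetical placeholders and not results from the study.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    f1: float             # F1-score in [0, 1]
    train_seconds: float  # retraining time in seconds


def weighted_index(results, f1_weight=0.7):
    """Return {model name: f1_weight * F1 + (1 - f1_weight) * time_score}.

    time_score is an assumed min-max normalization of retraining time:
    1.0 for the fastest model and 0.0 for the slowest.
    """
    if not 0.0 <= f1_weight <= 1.0:
        raise ValueError("f1_weight must be between 0 and 1")
    times = [r.train_seconds for r in results]
    t_min, t_max = min(times), max(times)
    span = t_max - t_min
    scores = {}
    for r in results:
        # If every model takes the same time, time cannot discriminate.
        time_score = 1.0 if span == 0 else (t_max - r.train_seconds) / span
        scores[r.name] = f1_weight * r.f1 + (1.0 - f1_weight) * time_score
    return scores


# Placeholder values for illustration only (not taken from the paper).
results = [
    ModelResult("model_a", f1=0.90, train_seconds=1200.0),
    ModelResult("model_b", f1=0.88, train_seconds=400.0),
]
scores = weighted_index(results, f1_weight=0.7)
best = max(scores, key=scores.get)
```

Varying `f1_weight` corresponds to the "different scenarios" idea: a weight of 1.0 ranks models by F1-score alone, while lower weights favor models that retrain faster.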
Notes
1. Spanish pre-trained BERT model.
2. Spanish LLM model based on RoBERTa.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bárcena Ruiz, G., de Jesús Gil, R. (2024). BERT Transformers Performance Comparison for Sentiment Analysis: A Case Study in Spanish. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A. (eds) Good Practices and New Perspectives in Information Systems and Technologies. WorldCIST 2024. Lecture Notes in Networks and Systems, vol 989. Springer, Cham. https://doi.org/10.1007/978-3-031-60227-6_13
DOI: https://doi.org/10.1007/978-3-031-60227-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60226-9
Online ISBN: 978-3-031-60227-6
eBook Packages: Intelligent Technologies and Robotics (R0)