BERT Transformers Performance Comparison for Sentiment Analysis: A Case Study in Spanish

Bárcena Ruiz, Gerardo; de Jesús Gil, Richard

doi:10.1007/978-3-031-60227-6_13

Gerardo Bárcena Ruiz & Richard de Jesús Gil

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 989)

Included in the following conference series:

World Conference on Information Systems and Technologies

Abstract

Although Spanish is the second most spoken language in the world, research on AI and sentiment analysis that targets Spanish is scarce compared with English. This paper reviews works on Spanish sentiment analysis that use BERT transformers and other technologies; however, model quality, measured by indicators such as accuracy, can differ according to the tool or BERT version used. Within the BERT family, it is also difficult to determine which versions or subversions perform better for sentiment analysis and support Spanish when they are run on common platforms such as Colab. The present study addresses this issue by setting objectives such as identifying relevant datasets based on the quality of the Spanish used and the availability of balanced subsets; locating different models trained on Spanish; and proposing a comparison method that involves the relevant variables. We propose a weighted index that combines the F1-Score and the retraining time in different scenarios to support better decisions. The results indicate that the DistilBERT, RoBERTa, and ALBERT models achieve the highest performance, while BERT remains among the top positions as a consistent model.
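The abstract does not give the formula for the weighted index, so the following is only a minimal sketch of one way such an index could be built, not the authors' method. It assumes a linear combination of the F1-Score and a min-max normalized "speed" term derived from the retraining time. The names `ModelRun`, `weighted_index`, `rank`, the weight `f1_weight`, and all numeric values are hypothetical placeholders, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    """One retraining run of a model (hypothetical record layout)."""
    name: str
    f1: float               # F1-Score on the evaluation split, in [0, 1]
    retrain_seconds: float  # wall-clock retraining time in seconds


def weighted_index(runs, f1_weight=0.7):
    """Score each run as f1_weight * F1 + (1 - f1_weight) * speed.

    ASSUMPTION: speed is a min-max normalization of retraining time,
    1.0 for the fastest run and 0.0 for the slowest. The source does
    not specify how the two quantities are combined.
    """
    if not 0.0 <= f1_weight <= 1.0:
        raise ValueError("f1_weight must be in [0, 1]")
    times = [r.retrain_seconds for r in runs]
    slowest = max(times)
    span = slowest - min(times)
    scores = {}
    for r in runs:
        # If every run took the same time, speed cannot discriminate.
        speed = 1.0 if span == 0 else (slowest - r.retrain_seconds) / span
        scores[r.name] = f1_weight * r.f1 + (1.0 - f1_weight) * speed
    return scores


def rank(runs, f1_weight=0.7):
    """Return model names ordered from best to worst index value."""
    scores = weighted_index(runs, f1_weight)
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder values for illustration only; NOT figures from the paper.
runs = [
    ModelRun("model_a", f1=0.90, retrain_seconds=1200.0),
    ModelRun("model_b", f1=0.88, retrain_seconds=400.0),
    ModelRun("model_c", f1=0.80, retrain_seconds=300.0),
]

print(rank(runs))                 # a scenario that also values speed
print(rank(runs, f1_weight=1.0))  # a scenario that values F1 only
```

Changing `f1_weight` models the "different scenarios" mentioned in the abstract: with these placeholder values, a weight of 1.0 ranks purely by F1-Score, while a lower weight lets a slightly less accurate but much faster model move ahead.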

Notes

1. Spanish pre-trained BERT model.
2. Spanish LLM model based on RoBERTa.

Author information

Authors and Affiliations

Universidad Americana de Europa, Av. Bonampak Sm. 6, Mz. 1, Lt. 1, 77500, Cancún, QR, México
Gerardo Bárcena Ruiz
Universidad Panamericana, Augusto Rodin 498, 03920, Ciudad de México, México
Gerardo Bárcena Ruiz
Universidad Internacional De La Rioja (UNIR), Av. de La Paz 137, 26006, Logroño, La Rioja, Spain
Richard de Jesús Gil

Corresponding author

Correspondence to Gerardo Bárcena Ruiz.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisbon, Portugal
Álvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, Porto, Portugal
Fernando Moreira
Institute of Information Technology, Lodz University of Technology, Łódź, Poland
Aneta Poniszewska-Marańda

About this paper

Cite this paper

Bárcena Ruiz, G., de Jesús Gil, R. (2024). BERT Transformers Performance Comparison for Sentiment Analysis: A Case Study in Spanish. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Poniszewska-Marańda, A. (eds) Good Practices and New Perspectives in Information Systems and Technologies. WorldCIST 2024. Lecture Notes in Networks and Systems, vol 989. Springer, Cham. https://doi.org/10.1007/978-3-031-60227-6_13

DOI: https://doi.org/10.1007/978-3-031-60227-6_13
Published: 16 May 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60226-9
Online ISBN: 978-3-031-60227-6
eBook Packages: Intelligent Technologies and Robotics (R0)
