Spanish corpora for sentiment analysis: a survey

  • María Navas-Loro
  • Víctor Rodríguez-Doncel


Corpora play an important role when training machine learning systems for sentiment analysis. However, Spanish is underrepresented in these corpora, as most primarily include English texts. This paper describes 20 Spanish-language text corpora—collected to support different tasks related to sentiment analysis, ranging from polarity to emotion categorization. We present a brand-new framework for the characterization of corpora. This includes a number of features to help analyze resources at both corpus level and document level. This survey—besides depicting the overall landscape of corpora in Spanish—supports sentiment analysis practitioners with the task of selecting the most suitable resources.


Sentiment analysis, Corpora, Opinion mining, Polarity, Emotion




Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
