Spanish corpora for sentiment analysis: a survey


Corpora play an important role in training machine learning systems for sentiment analysis. However, Spanish is underrepresented in these resources, as most consist primarily of English texts. This paper describes 20 Spanish-language text corpora collected to support different sentiment analysis tasks, ranging from polarity to emotion categorization. We present a new framework for characterizing corpora, comprising a number of features that help analyze resources at both the corpus level and the document level. Besides depicting the overall landscape of corpora in Spanish, this survey supports sentiment analysis practitioners in selecting the most suitable resources.








  6. These are the figures given in the corpus description; the data itself differs slightly: 34,615 tweets (17,311 negative and 17,304 positive).



  9. The source website asks for most and least positive aspects of users’ experiences.



























  36. A corpus is considered freely available unless explicitly stated otherwise.



  • Sentiment analysis
  • Corpora
  • Opinion mining
  • Polarity
  • Emotion