In the study of emotion, two popular models are most often used to describe and classify affective experiences: the circumplex model and the discrete emotion model (but see also the appraisal theory for a third theoretical approach: Arnold, 1960; Lazarus, 1991; Moors, Ellsworth, Scherer & Frijda, 2013). The circumplex model, first proposed by Wundt (1912/1924), characterizes emotions along three subjective axes: (1) arousal: the intensity of emotion elicited by a stimulus, (2) valence: the pleasantness of a stimulus, and (3) dominance: the degree of control exerted by a stimulus. However, most modern accounts of this model omit dominance from this characterization (Bradley & Lang, 1999; Montefinese, Ambrosini, Fairfield, & Mammarella, 2014; Russell, 2003) because is not a strong predictor of variance of affective judgments (Bradley & Lang, 1994; Lang, Bradley, & Cuthbert, 2008), and is strongly correlated with valence (Warriner, Kuperman, & Brysbaert, 2013). Valence and arousal have been found to affect a variety of cognitive areas, including lexical access (Citron, Weekes, & Ferstl, 2013; Hofmann, Kuchinke, Tamm, Võ, and Jacobs, 2009; Kousta, Vinson, & Vigliocco, 2009), short-term memory (Majerus & D’Argembeau, 2011; Mammarella, Borella, Carretti, Leonardi, & Fairfield, 2013; Monnier & Syssau, 2008), long-term memory (Dewhurst & Parry, 2000; Kensinger & Corkin, 2003; Talmi & Moscovitch, 2004), attention (Mathewson, Arnell, & Mansfield, 2008; Stormark, Nordby, & Hugdahl, 1995), and morpho-syntactic processing (Hinojosa et al., 2014; Martín-Loeches et al., 2012). More relevant to the present study, the discrete emotion theory proposes that affective experiences can be reduced to a limited set of “basic” emotions (Ekman, 1992, 1999; Panksepp, 1998) that have been universally hardwired through evolution. Although there is some debate regarding which emotions belong to this elementary set, the most commonly cited are happiness, sadness, anger, fear, and disgust (Briesemeister, Kuchinke, & Jacobs, 2011a). Rather than seeing them as competing against each other, the circumplex model and the discrete emotion theory offer complementary characterizations of affective language (Stevenson, Mikels, & James, 2007). For example, Briesemeister, Kuchinke, and Jacobs (2011b) found an advantage in processing for happiness-related words (which had a positive valence) with respect to neutral and negatively valenced words. Since these three types of words differed in valence and arousal, this result was in line with the predictions of the circumplex model. However, in the same study these authors also found that, among negatively valenced words (i.e., sadness-related words, fear-related words, anger-related words, and disgust-related words), only those related to disgust showed a disadvantage in processing compared to neutral words. This last result supported the discrete emotion theory, in revealing differences in processing between distinct types of negatively valenced words which were matched in valence and arousal.

The variables involved in both models are subjective in nature, and their values are usually obtained by exposing participants to different words and asking them to rate the feelings they elicit (e.g., “On a scale of 1 to 5, how happy does this word make you feel?”). Subjective norms for valence and arousal have been available for some time for a variety of languages, such as the seminal Affective Norms for English Words (ANEW; Bradley & Lang, 1999), which were followed by, among others, norms in Portuguese (Kristensen, de Azevedo Gomes, Justo, & Vieira, 2011; Soares, Comesaña, Pinheiro, Simões, & Frade, 2012), French (Monnier & Syssau, 2014; Bonin, Méot, Aubert, Malardier, Niedenthal, & Capelle-Toczek, 2003; Gilet, Grühn, Studer, & Labouvie-Vief, 2012, among others), German (Kanske & Kotz, 2010; Lahl, Göritz, Pietrowsky, & Rosenberg, 2009; Võ, Conrad, Kuchinke, Urton, Hofmann, & Jacobs, 2009; Võ, Jacobs, & Conrad, 2006), Polish (Imbir, 2015), Finnish (Söderholm, Häyry, Laine, & Karrasch, 2013; Eilola & Havelka, 2010), Italian (Montefinese et al., 2014), Dutch (Moors, De Houwer, et al., 2013), and Spanish, (Redondo, Fraga, Padrón, & Comesaña, 2007; Stadthagen-González, Imbault, Pérez Sánchez, & Brysbaert, 2017). However, similar norms for discrete emotions are scarcer. To the best of our knowledge, only seven corpora are currently available in which words have been rated according to discrete emotions. Two of these include English words (Stevenson et al., 2007; Strauss & Allen, 2008), and the others focus on German (Briesemeister et al., 2011a), French (Ric, Alexopoulos, Muller, & Aubé, 2013), Polish (Wierzba et al., 2015), and Spanish (Hinojosa, Martínez-García, et al., 2016; Ferré et al., 2017). Norms that include a large set of words are particularly useful because they allow for large-scale data mining and facilitate the selection of experimental stimuli, especially when combinations of multiple word characteristics are necessary. For instance, ERP studies require large experimental sets where multiple variables are controlled for (e.g., Herbert, Junghofer, & Kissler, 2008; Hinojosa, Méndez-Bértolo, Carretié, & Pozo, 2010; Recio, Conrad, Hansen, & Jacobs, 2014; Schacht & Sommer, 2009; Citron, 2012, for an overview). In that regard, the largest sets of norms in any language for valence and arousal are those presented by Warriner, Kuperman, and Brysbaert (2013) in English with 13,915 words and Stadthagen-González et al. (2017) in Spanish with 14,037 words, while for discrete emotions the largest database is the one published by Ferré et al. (2017) with 2,266 Spanish words. Other large datasets are those of Briesemeister et al. (2011b) in German with 1,958 words, and Stevenson et al. (2007) in English with 1,037 words. In the present study, we aimed to provide a new dataset for discrete emotions. Large databases are useful both to determine whether a word has a strong emotional content (when those characteristics are being manipulated) as well as whether it is neutral (when it is important to control for them). For that reason, while some previous norms have chosen to pre-select a proportion of their items from words that are known to be emotionally loaded, we chose to cast a wide net and include a large unrestricted set of words, even if we confirm the status of most of them as “neutral.”

It is worth noting that the discrete emotions approach has mainly focused on the study of the facial expression of emotions (e.g., Elfenbein, Beaupré, Lévesque, & Hess, 2007). By contrast, there is not much research in the verbal domain. However, a few studies using words have recently appeared. Their results suggest that discrete emotions may have a role on word processing that goes beyond valence and arousal (e.g., Barrett, 2004; Briesemeister et al., 2011b; Briesemeister, Kuchinke, & Jacobs, 2014; Briesemeister, Kuchinke, Jacobs, & Braun, 2015 Ferré, Haro, & Hinojosa, in press; Westbury, Keith, Briesemeister, Hofmann, & Jacobs, 2015). In order to further develop and extend this field of research, and to study the contribution of discrete emotions to language processing, memory, and other cognitive domains, it is important to develop more normative studies including discrete emotion ratings for larger sets of words.

The first aim of the present study is to provide discrete-emotion ratings for a very large set of words that, in conjunction with previously published work (Ferré et al., 2017; Hinojosa, Martínez-García et al., 2016), present a definitive database of Spanish ratings for five emotion categories: happiness (alegría), anger (ira), fear (miedo), disgust (asco), and sadness (tristeza). The resulting database will be, by far, the largest set of ratings of discrete emotional characteristics in any language. The words selected for this study overlap with those included in Stadthagen-González et al. (2017), so that ratings for valence and arousal are also available for them. We also set out to validate these norms by comparing them with other published norms both in Spanish (Ferré et al., 2017) and English (Stevenson et al., 2007). A further objective was to investigate the relationship between the ratings for each discrete emotion with each other as well as with emotional dimensions (i.e., valence and arousal) and lexical variables (i.e., word frequency, familiarity, concreteness, imageability, and age of acquisition).

Method

Participants

We collected ratings from a total of 2,010 native speakers of Spanish. Their mean age was 21.97 years (range: 18–64; SD: 5.52) and 78.86 % of them were women. Most participants were students at universities in different regions of Spain, namely Universitat Rovira i Virgili (664); Universidad Complutense de Madrid (637), Universidad de Murcia (476), Universidad Rey Juan Carlos (104), Universidad Nacional de Educación a Distancia (72), and others (57). Participants completed the ratings for academic credit or as volunteers. They had the choice to rate up to five “blocks” of words for any of the five variables. Participants that rated more than one block did so with different blocks of words and different emotions each time, and there was a minimum of 24 hours between the times they rated the blocks. Forty-four participants rated five blocks, 261 rated four, 369 rated three, 493 rated two, and 843 rated one single block of words.

Materials

The 10,491 words in our set were taken from those included in Stadthagen-González et al. (2017). From their original list, we removed those words marked as “not known” by more than 45 % of participants in that study, as well as the conjugated verb forms in Stadthagen-González et al.’s list because, as shown in that study, the differences in emotional ratings with their infinitive forms were minimal. We did not include words for which discrete emotional ratings are already available from studies that used a similar methodology to ours, namely, the 875 words already rated in Hinojosa, Martínez-García et al. (2016), or the 2,266 words rated in Ferré et al. (2017), except for a set of 126 control words shared with Ferré et al.

In terms of frequency, the average Zipf value (Van Heuven, Mandera, Keuleers, & Brysbaert, 2014) of the set according to EsPalFootnote 1 (Duchon, Perea, Sebastián-Gallés, Martí, & Carreiras, 2013) was 3.61 (SD = 0.66, range = 0.51–7.46, median = 3.52). This means that our stimuli are nicely centered on the frequency range (Zipf values of 0–3 indicate low-frequency words, values of 4–7 indicate high-frequency words).

Procedure

The word set was randomly divided into 42 blocks with 249 or 250 words each. In order to validate our ratings, we included 126 control words for each emotion from Ferré et al.’s (2017) lists. These control words were selected for each discrete emotion so that they would span the whole scale for that particular variable and three of them were randomly assigned to each block. We also chose, for each variable, seven “calibrators” to be presented at the beginning of each block. These calibrators were rated across the entire scale in one of the previous studies and are meant to expose participants to the full span of the scale before rating the novel items. In summary, each block to be rated consisted of 259 or 260 words: 249 or 250 novel items, seven calibrators, and three control words. All words in a block (except the calibrators) were randomized individually for each participant.

Each block was rated by 20 participants. Sets of ratings that correlated with the mean ratings per items at less than.10 were removed and replaced with ratings from a different participant (3.8 % of the collected blocks). We also removed and replaced sets of ratings where the same rating was given for more than 95 % of the words (0.6 % of the collected blocks). These changes are already reflected in the participants’ information provided above.

Participants accessed the blocks online through Qualtrics (http://www.qualtrics.com). They first provided informed consent and completed a brief demographic questionnaire and were then given written instructions for the relevant variable. The instructions given were similar to those of the Hinojosa, Martínez-García, et al. (2016) and Ferré et al. (2017) norms. The exact wording in Spanish as well as an English translation are provided in the Appendix. Participants took about 20 min to complete their ratings. Participants were asked to rate each word by clicking on a 5-point scale going from 1 = “nada en absoluto” (not at all) to 5 = “extremadamente” (extremely). There was a further option to indicate that they did not know the word (“No conozco la palabra”). Words were displayed five at a time on each screen, with a rating scale under each word; after rating each set of five words, participants clicked on the “Next” button to display another set of five words. The name of the variable under measure (e.g., Alegría) was displayed at the top and bottom of the screen.

Results and discussion

Availability of the norms

The full set of norms is available in a comma-separated (.csv) as supplementary materials to this article. The words are organized alphabetically and the headings for the table are as follows (Substitute [Emotion] for Happiness, Disgust, Anger, Fear, and Sadness): Word = Word in Spanish in alphabetical order, Few_Raters – an “X” on this column marks words with less than ten ratings for at least one emotion; [Emotion]_Mean = [Emotion] mean value for all valid responses; [Emotion]_SD = Standard Deviation of [Emotion] ratings; [Emotion]_%Raters = percentage of participants (out of 20) that knew and rated the word for [Emotion]. [Emotion]_%Raters are included as measures of word prevalence, which has been shown to be a strong predictor of reaction times in lexical decision (Keuleers, Stevens, Mandera, & Brysbaert, 2015). We also include ratings, standard deviations, and percentage of raters for Valence and Arousal taken from Stadthagen-González et al. (2017). Finally, we present Dominant_POS – each word’s dominant part of speech and %Dominant_POS – the percentage of tokens that correspond to that dominant POS category, both according to EsPal (Duchon et al., 2013). The dominant POS for the 56 words that did not have an entry in EsPal was determined manually. Although each word was presented to 20 participants, they had the option to indicate that they did not know a word. Most ratings in the dataset are based on at least 18 responses (89.3 % overall), while only 0.26 % of ratings are based on less than tenresponses.

Reliability and validity of our norms

All five emotions have a negative skew (see Fig. 1), with an average rating of less than 3, falling below the midpoint of the measurement scale (see Table 1), but all discrete emotions have words with an average rating ranging the entire scale (see Table 2 for example words). This finding is consistent with other discrete emotion studies (Ferré et al., 2017).

Fig. 1
figure 1

Distribution of ratings in each of the five discrete emotions

Table 1. Descriptive statistics of the five discrete emotions, valence, arousal, frequency, familiarity, imageability, concreteness, and age of acquisition; as well as inter-rater split-half reliability of the five discrete emotions
Table 2. Words with the highest rating in each discrete emotion

We explored the inter-rater reliability of the discrete emotion ratings with a split-half procedure. We randomly split the participants that rated each word into two equal groups; resulting in two mean ratings for each word. We repeated this for each word in the dataset, resulting in two lists of average ratings for each word. We ran a Pearson correlation on these lists, resulting a single correlation. We repeated these steps 100 times to get 100 Pearson correlation coefficients. We then averaged these correlation coefficients, and then performed a Spearman-Brown correction on the correlation coefficient. This provides us with the measure of interrater split-half reliability (see Table 1). These steps were repeated for each of the five discrete emotions. The mean correlation for each discrete emotion is high, suggesting that our methods produce high interrater reliability.

Ratings for the control words were highly correlated with their counterparts from Ferré et al. (2017) for each discrete emotion: Happiness (r=0.94, p<0.001), Disgust (r=0.94, p<0.001), Anger (r=0.90, p<0.001), Fear (r=0.93, p<0.001), and Sadness (r=0.93, p<0.001). We also translated the words of our list into English and found 331 one-word translation equivalents in common with Stevenson et al.’s (2007) list. Pearson correlations were moderate to high in all five categories (happiness: r = 0.68, disgust: r = 0.54, anger: r = 0.76, fear: r = 0.76, sadness: r = 0.77). All correlations are positive and significant (p<0.001). This is a relevant finding if we consider that we were comparing ratings of words in different languages.

Relationship with lexical and affective variables

Our five discrete emotion category ratings correlate strongly with the lexical characteristics of the words, including frequency, familiarity, concreteness (Zipf values and ratings from Duchon et al., 2013) and age of acquisition (ratings from Alonso, Fernández, & Diez, 2015), as well as with valence and arousal (ratings from Stadthagen-González et al., 2017). In agreement with previous studies (Briesemeister et al. 2011b; Ferré et al., 2017; Hinojosa, Martínez-García et al., 2016), Valence shows a significant positive correlation with Happiness and significant negative correlations with each of the four negative emotions. Conversely, arousal shows a negative correlation with Happiness and strong positive correlations with each of the other four emotions. Also replicating those previous studies, the correlation of arousal was highest with fear, followed in descending order by anger, sadness, and disgust. These results suggest that, among words related to negative emotions, those associated with fear are generally the most arousing, while those associated with disgust are the least arousing. Most interestingly, age of acquisition is negatively correlated with happiness, suggesting that words with high scores on happiness are learned earlier, and negatively correlated with the other four (emotionally negative) emotions, suggesting that words with high scores on disgust, anger, fear, and sadness are learned later. This result showing that “happy” words are learned earlier is in line with previous findings regarding ratings of both valence and happiness (e.g. Hinojosa, Rincón-Pérez, et al., 2016; Moors, De Houwer, et al., 2013; Stadthagen-González et al., 2017). This could indicate a learning advantage for happy words (Hinojosa, Martínez-García, et al., 2016), or, alternatively, that children tend to be less exposed to unpleasant words, such as insults and taboo words, or at least that those providing subjective estimates of age of acquisition believe so when providing their ratings (Stadthagen-González et al., 2017). Also, the mean happiness rating is positively correlated with frequency, whereas the other four emotions are negatively correlated with frequency (see Table 3 for correlations and Fig. 2). This last finding is in agreement with studies showing a positivity bias in language (Augustine, Mehl, & Larsen, 2011; Dodds et al., 2015; Warriner & Kuperman, 2015).

Table 3. Pearson correlation coefficients among lexical and affective characteristics of words and their average discrete emotion rating
Fig. 2
figure 2

Linear models of each lexical variable for the five discrete emotions

Sex differences

To explore rater’s sex differences in our discrete emotion ratings, we calculated the mean rating for each word by males and females. At most, any emotion only has 22 % of male raters, resulting in a large sex imbalance, and many words were only rated by few males. The Pearson correlation between males and females for each emotion is r = 0.59 for happiness (14.23 % of words had a sex difference rating of more than 1 point), r = 0.59 for disgust (7.53 %), r = 0.67 for anger (7.22 %), r = 0.67 for fear (6.35 %) and r = 0.70 for sadness (6.46 %). All of these correlations are positive and significant (p < 0.001) after accounting for multiple comparisons (see Table 4 for example words with large sex differences). These correlations between ratings provided by male and female raters are somewhat lower than those obtained in similar ratings of discrete emotions (e.g., Hinojosa, Martínez-García, et al., 2016), but the low proportion of male participants makes it difficult to extract definitive conclusions about those differences.

Table 4. Words with the largest gender difference in each emotion (with >4 male raters)

Relationship between the five discrete emotion categories and distribution of words among the five discrete emotions

All emotions are significantly correlated with each other; happiness is negatively correlated with all other emotions, and the other (negative) emotions are positively correlated with each other (see Table 5).

Table 5. Pearson correlations amongst the five discrete emotions (all ps<0.001)

Following Ferré et al. (2017), we classified the ratings for each emotion into two groups: high level (with an average rating of 3 or more), and low level (an average rating of less than 3).Footnote 2 A word with a low emotion level in all five emotions is considered a neutral word; this comprises of 8,400 words, or 80.07 % of the dataset. The remaining high emotion words were classified into their respective discrete emotions. Any word that has an equally high average rating in more than one emotion is considered a part of each of those emotions. A majority of high-emotion words in our list are emotion-laden words that express or elicit emotions in an indirect way (Kazanas & Altarriba, 2015; Pavlenko, 2008; e.g., niñez –“childhood”, morir – “to die”), but emotion words, which explicitly describe or express a particular affective state (e.g., asco – “disgust”, infelicidad – “unhappiness”), are also included. Happy words make up 39.93 % of the emotional words. Disgust words make up 9.82 % of the emotional words. Anger words make up 17.49 % of emotional words. Fear words make up 13.46 % of the emotional words. Sadness words make up 19.29 % of emotional words. These findings are consistent with other discrete emotion studies (Ferré et al., 2017; Hinojosa, Martínez-García, et al., 2016), showing that the number of words associated with happiness is larger than the number of words associated with the other discrete emotions. Moreover, and also in agreement with those previous studies, the number of words associated with disgust was smaller than the number of words associated with any other discrete emotion. Of note, the above classification considers the discrete emotion in which each word scores higher, without considering the scores in the other emotions. However, it is also relevant to know if those words are “pure” with respect to that emotion. “Pure words” can be defined as those with a rating of 3 or more in a single emotion but neutral (i.e., lower ratings) for all other emotions. As mentioned before, all “happy” words can be considered pure with regards to the other four emotions as none of them scored high on any of them. For negative emotions, there are 78 pure disgust words (28.1 % of all disgust words), 165 pure anger words (33.3 % of all anger words), 85 pure fear words (22.3 % of all fear words), and 175 pure sadness words (32.1 % of all sadness words).

Apart from relating each word to a particular discrete emotion, it is also relevant to know if many words have high scores in distinct discrete emotions. The analysis of our database shows that, as expected, none of the words received a high rating for both happiness and any of the other emotions, but many words are associated with more than one discrete negative emotion. Table 6 shows the number of “mixed emotion words” (as defined by Ferré et al., 2017) for each pair of negative emotions; mixed emotion words in this study are those that received ratings of 3 or more in at least two negative emotions. As we can see, the pairs with the highest co-occurrence are Anger/Sadness followed by Fear/Sadness. Taking that to the extreme, there are 62 words in our database that have a rating of 3 or higher in all four negative emotions (e.g., pedofilia = pedophilia, racista = racist, currupción = corruption). These mixed words illustrate how the same stimulus can elicit ambivalent emotional states, where multiple emotions can be elicited either simultaneously or in rapid succession (Oatley & Johnson-Laird, 1996; Briesemeister, Kuchinke, & Jacobs, 2012). It is important to consider this information when selecting the stimuli for the experiments. On the one hand, it has been demonstrated that words related to more than one emotional state are processed differently from words related to a single emotional state (Briesemeister et al., 2012). On the other hand, pure words seem to be the most adequate in research addressed to test the prediction of the discrete emotion model (i.e., that there would be differences in processing between words related to distinct discrete emotions; see Ferré et al., in press). For instance, if slower reaction times to disgusting words with respect to the other words were found in a particular study, in order to conclude that this could be due to an avoidance reaction to the disgusting nature of the stimuli, these words should not have high scores in other emotions (e.g., anger). Otherwise, one could not be sure if it was the disgust component or the anger component what caused avoidance.

Table 6. Number of words with a score of 3 or higher for each pair of negative emotions

In summary, the present norms expand on the work of Ferré et al. (2017) and Hinojosa, Martínez-García, et al. (2016) by providing ratings for 10,491 Spanish words on five discrete emotions: happiness (alegría), anger (ira), fear (miedo), disgust (asco), and sadness (tristeza). To date, this is the largest set of discrete emotion ratings in any language. When combined with the two aforementioned datasets, researchers now have access to ratings of discrete emotions for 13,632 words. Additionally, ratings of valence and arousal for all the words in our list are available from Stadthagen-González et al. (2017). The ratings presented here show a high degree of inter-rater agreement, as well as high correlations with similar norms both in Spanish (Ferré et al., 2017) and English (Stevenson et al., 2007). We also explored the relationship between the five discrete emotions and relevant lexical and emotional variables, and were able to confirm the findings of previous studies conducted with smaller datasets. We believe that the availability of such large set of norms will greatly facilitate the study of emotion, language and related fields, by allowing researchers to contrast the dimensional models and the discrete emotion theories.

Author Note

This research was supported in part by grant PSI2015-63525-P (MINECO/FEDER) to Ferré and PSI2015-68368-P (MINECO/FEDER) and H2015/HUM-3327 from the Comunidad de Madrid to Hinojosa.