The affective content of words influences their processing, as has been repeatedly demonstrated in studies that relied on different types of tasks and experimental paradigms. For example, in the lexical decision task, in which participants are asked to decide whether a sequence of letters constitutes a word or not, the usual finding is that emotional words (e.g., kiss) have a processing advantage over neutral words (e.g., table; Kousta, Vinson, & Vigliocco, 2009; Schacht & Sommer, 2009; Scott, O’Donnell, Leuthold, & Sereno, 2009). Affectively valenced words also seem to capture more attention than their neutral counterparts, as suggested by studies that have used either the emotional Stroop task (Eilola, Havelka, & Sharma, 2007; MacKay & Ahmetzanov, 2005; Sutton & Altarriba, 2008; Sutton, Altarriba, Gianico, & Basnight-Brown, 2007), the attentional blink paradigm (Huang, Baddeley, & Young, 2008; Mathewson, Arnell, & Mansfield, 2008), or the Affective Simon task (Altarriba & Basnight-Brown, 2011; De Houwer, 2003; De Houwer, Crombez, Baeyens, & Hermans, 2001). Furthermore, it has consistently been shown that emotional words are better remembered than neutral ones (Altarriba & Bauer, 2004; Brierley, Medford, Shaw, & David, 2007; Buchanan, Etzel, Adolphs, & Tranel, 2006; Ferré, 2003; Ferré, García, Fraga, Sánchez-Casas, & Molero, 2010; Herbert, Junghofer, & Kissler, 2008; Herbert & Kissler, 2010; Kensinger, 2008; Kensinger & Corkin, 2003; Kissler, Herbert, Peyk, & Junghofer, 2007). Finally, the difference between neutral and emotionally charged words has been observed not only with behavioural measures, but also in several studies that have tested whether emotional words are distinguishable from neutral ones in terms of the pattern of neural activation that they produce. For example, affectively valenced words have been reported to provoke larger neural responses in the amygdala than their neutral counterparts (Herbert et al., 2009). In addition, the results of several studies that have used event-related potentials (ERPs) suggest that the emotional content of words modulates brain activity at several temporal stages (Herbert et al., 2008; Hinojosa, Méndez-Bertolo, & Pozo, 2010; Kissler et al., 2007; Schacht & Sommer, 2009; Scott et al., 2009).

Although the aforementioned studies do not constitute an extensive review, they show that emotional words are widely used in psychological research. This interest is not limited to words, since there is also a substantial amount of research conducted with other types of affective stimuli, such as images (Schimmack, 2005) or sounds (Gómez & Danuser, 2004). As a consequence, during the last decade there has been an increasing need to develop standardized norms for the affective properties of stimuli in order to obtain experimental items that are well characterized in terms of their emotionality. There are published norms for pictures (Dan-Glauser & Scherer, in press; Lang, Bradley, & Cuthbert, 1999; Moltó et al., 1999) and sounds (Bradley & Lang, 1999a; Fernández-Abascal et al., 2008; Redondo, Fraga, Padrón, & Piñeiro, 2008). Concerning words, the Affective Norms for English Words (ANEW), developed by Bradley and Lang (1999b), is the most extensive affective database and the most widely used in studies about the emotional properties of words conducted in English, although other databases also exist (e.g., Altarriba, Bauer, & Benvenuto, 1999; Bauer & Altarriba, 2008). The ANEW has been adapted to other languages, such as Spanish (Redondo, Fraga, Padrón, & Comesaña, 2007) and Portuguese (Soares, Comesaña, Pinheiro, Simões, & Frade, in press). Furthermore, other norms have been published in other languages such as Finnish (Eilola & Havelka, 2010), Spanish (Pérez Dueñas, Acosta, Megías, & Lupiáñez, 2010; Redondo, Fraga, Comesaña, & Perea, 2005), and German: the Berlin Affective Word List (BAWL; Võ et al., 2009; Võ, Jacobs, & Conrad, 2006) and the Leipzig Affective Norms for German (LANG; Kanske & Kotz, 2010).

Despite the existence of normalized affective databases, researchers may not find in them enough stimuli of a given type to build their experimental materials. We will take as an example a line of research conducted with emotional words that has used the affective priming paradigm. In this paradigm, participants see two words presented consecutively (the first one is the prime and the second one is the target) that can be either congruent or incongruent in valence. There is affective priming if participants are faster to categorize a target as positive or negative when it is preceded by a prime that has the same affective valence (e.g., beso–niño, “kiss-child”) than when it is preceded by a prime that has the opposite valence (e.g., beso–enfermedad, “kiss-sickness”; see Klauer & Musch, 2003, for a review).

Although the most common interpretation of affective priming is that it is an effect provoked by the congruence between the emotional content of prime and target words, some authors have pointed out that affectively congruent words also tend to be semantically related (e.g., beso–amor, “kiss-love”; Castner et al., 2007; Storbeck & Robinson, 2004). As a consequence, affective priming might reflect a semantic priming effect rather than a pure affective effect. To elucidate whether affective priming is independent of semantic priming, the effects of semantic relatedness and affective congruency have to be teased apart. In recent years, some studies have adopted this approach (Castner et al., 2007; Moritz & Graf, 2006; Padovan, Versace, Thomas-Antérion, & Laurent, 2002; Sánchez-Casas et al., 2011; Storbeck & Clore, 2008; Storbeck & Robinson, 2004). The results of these studies are not consistent: several authors, by controlling for semantic relatedness, have been able to obtain pure affective priming (Castner et al., 2007; Moritz & Graf, 2006; Storbeck & Clore, 2008), whereas in other studies the affective priming effect has not been obtained (Padovan et al., 2002) or seems to be restricted to limited experimental conditions or tasks (Sánchez-Casas et al., 2011; Storbeck & Robinson, 2004).

In a related line of research, several authors have questioned whether the affective content of words is stored in the semantic system and whether affective priming can be accounted for by spreading activation between affectively congruent concepts in semantic networks (De Houwer, Hermans, Rothermund, & Wentura, 2002; De Houwer & Randell, 2004; Pecchinenda, Ganteaume, & Banse, 2006). In the studies conducted in this field, participants’ attention is directed toward either the semantic characteristics of the words (e.g., their membership in a given semantic category) or their affective valence (e.g., whether they are positive or negative). The results of these studies are contradictory. In some cases, affective priming is obtained only when participants are asked to pay attention to the affective dimension of the words, but not when they focus on nonaffective semantic features (De Houwer et al., 2002; Spruyt, De Houwer, & Hermans, 2009; Spruyt, De Houwer, Hermans, & Eelen, 2007). Conversely, De Houwer and Randell (2004) and Pecchinenda et al. (2006) have obtained affective priming in tasks in which participants attend to the nonaffective, semantic content of the words.

As is apparent from the previous discussion, affective priming studies designed to investigate whether the experimental effects obtained with emotional words are exclusively due to their affective properties do not yield clear conclusions, and there is still controversy about whether emotional information is part of the words’ semantic representation. Given the importance of this topic for a better understanding of affective word processing, as well as of the influence of emotion on cognitive processes, more research is needed. Such studies would have to manipulate both the semantic category and the affective content of words, by using affectively valenced words that belong to different semantic categories. However, in searching for materials in published databases, researchers might not be able to obtain enough stimuli. For example, the ANEW contains only 26 animal nouns, a clearly insufficient number to conduct such experiments. In fact, in some of the previously mentioned studies, experimental stimuli had to be repeated throughout the experiment in order to collect enough data. In addition, the selection of positive and negative stimuli was based on the researchers’ own intuition or on judgments made by a small number of participants (e.g., Storbeck & Clore, 2008; Storbeck & Robinson, 2004).

Given the aforementioned considerations, it would be of great interest to have a database that allows researchers to obtain a sufficient number of words and to avoid repetitions if they are interested in taking into account both the affective properties of words and their membership in a given semantic category. With this aim, we collected affective ratings for a set of 380 Spanish words that referred to animals, people, and objects, and that were not included in the Spanish adaptation of the ANEW (Redondo et al., 2007). We also obtained ratings of concreteness and familiarity, since these variables have been found to influence word processing (e.g., Kousta et al., 2009; Larsen, Mercer, & Balota, 2006). In addition, we included in the database the values of some relevant psycholinguistic variables such as frequency, obtained from LEXESP (Sebastián, Martí, Carreiras, & Cuetos, 2000), and length (number of letters).

Method

Participants

A total of 504 participants, 413 females and 91 males, with a mean age of 21.3 years (SD = 4.7), took part in the study. The participants were volunteer undergraduate students of Psychology, Sciences of Communication and Education, and were recruited from several Spanish universities (University Rovira i Virgili in Tarragona, Autonomous University of Madrid, and University of Santiago de Compostela).

Materials and procedure

The words included in the database were 380 Spanish nouns that were not contained in the Spanish adaptation of the ANEW (Redondo et al., 2007), although there were seven nouns in common with the database of Pérez Dueñas et al. (2010) and 15 words that also appeared in Redondo et al.’s (2005) normative data. The 380 nouns belonged to three different semantic categories. There were 119 nouns that referred to animals, 124 nouns that referred to people, and 137 nouns that referred to objects. We constructed two versions of a questionnaire with the same words. A total of 304 participants (252 females and 52 males, with a mean age of 20.4 years, SD = 3.3) were given one version of the questionnaire in which they were asked to rate the words in terms of their valence and arousal. The other version of the questionnaire was completed by 200 participants (161 females and 39 males, with a mean age of 22.2 years, SD = 5.3), who gave ratings of concreteness and familiarity for the same words. Each version of the questionnaire was divided into eight answer sheets, which contained 47 words on average. Participants were randomly assigned to one of the eight answer sheets.

Participants were told that they would be presented with a list of words and that their task was to rate each word on the two dimensions assigned to them (either valence and arousal, or concreteness and familiarity). The affective dimensions were rated on a 9-point scale, as in many studies published in this field (Bradley & Lang, 1999b; Kanske & Kotz, 2010; Redondo et al., 2005, 2007; Soares et al., in press). Similarly, concreteness and familiarity were assessed using a 1 to 9 scale. Students performed the ratings in the classroom, in a collective session. They took about 15 min to complete the task. The affective ratings were made with a paper-and-pencil version of the Self-Assessment Manikin (SAM; see Fig. 1). The SAM, developed by Bradley and Lang (1994), is a commonly used instrument in the assessment of the affective properties of words (Bradley & Lang, 1999b; Kanske & Kotz, 2010; Redondo et al., 2005, 2007; Soares et al., in press), and we employed it in order to ease the comparison of our affective ratings with those previously published.

Fig. 1 Self-Assessment Manikin (SAM). Self-evaluation scales for the dimensions of valence and arousal

In the present study, only the self-evaluation scales for the dimensions of valence and arousal were included. In the original work of Bradley and Lang (1999b), as well as in the subsequent adaptations of the ANEW to different languages (Redondo et al., 2007; Soares et al., in press), participants also rated the words in terms of dominance. However, this dimension has been shown to explain a much lower percentage of the variance than valence and arousal (Bradley & Lang, 2000; Osgood, Suci, & Tannenbaum, 1957). Furthermore, dominance is highly correlated with valence (Lang et al., 1999). For these reasons, dominance is not usually included in studies that are based on a dimensional perspective of emotions. Therefore, we decided not to include this dimension in our database.

Each participant received an instructions sheet and an answer sheet containing the words to be assessed. For affective ratings, the SAM was included in the instructions sheet to make the participants familiar with the procedure. Participants were encouraged to use the entire scale and not to spend a lot of time with each word. Rather, they were asked to rate the words according to their first impression. They were also asked to rate only the words whose meaning they knew. Furthermore, we allowed participants to practice with six words before running the actual rating to become familiar with the task, as Redondo et al. (2007) did. The instructions given to the participants to make the affective ratings were the same as those used in the Spanish adaptation of the ANEW (Redondo et al., 2007). Instructions for concreteness were taken from LEXESP (Sebastián et al., 2000), and instructions for familiarity were an adaptation of those used by Eilola and Havelka (2010). These instructions are presented in the Appendix.

Results and discussion

To obtain the final database, participants’ responses were examined using graphical procedures for assessing person-fit, in order to detect aberrant response patterns (Ferrando & Morales, 2010). Each participant’s data were also compared with the mean data to compute a personal correlation coefficient, which was used as a further indicator of aberrant responses, and we excluded the questionnaires of those participants with negative values. With these criteria, we eliminated 16 participants who rated valence and arousal (5.3% of the total), and three participants who rated concreteness and familiarity (1.5% of the total). We replaced the eliminated participants.
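The correlation-based screening described above can be illustrated with a minimal sketch. It assumes that every participant on a given answer sheet rated every word and that the mean profile includes the participant's own ratings; neither detail is specified in the text. The function names and the toy ratings are hypothetical and are not taken from the database.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A flat response profile has no defined correlation; this sketch
        # returns 0.0 so that such a participant is not flagged here.
        return 0.0
    return cov / sqrt(vx * vy)

def flag_aberrant(ratings):
    """Return the ids of participants whose ratings correlate negatively
    with the mean rating profile of their answer sheet.

    ratings: dict mapping participant id -> list of ratings, one per word,
    with the words in the same order for every participant (assumption).
    """
    n_words = len(next(iter(ratings.values())))
    word_means = [
        sum(r[i] for r in ratings.values()) / len(ratings)
        for i in range(n_words)
    ]
    return sorted(pid for pid, r in ratings.items()
                  if pearson(r, word_means) < 0)

# Made-up valence ratings (1-9 scale) for four words on one answer sheet.
toy_ratings = {
    "p1": [1, 5, 9, 3],
    "p2": [2, 5, 8, 3],
    "p3": [9, 5, 1, 7],  # reversed use of the scale
    "p4": [1, 6, 9, 2],
}
flagged = flag_aberrant(toy_ratings)
```

In this toy example only the participant who used the scale in reverse correlates negatively with the mean profile. The graphical person-fit procedures are not covered by the sketch.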

The ratings for valence, arousal, concreteness, and familiarity for 380 words belonging to three different semantic categories can be accessed as a supplement for this article. The file also includes some psycholinguistic indexes: frequency, obtained from LEXESP (Sebastián et al., 2000), and length (number of letters). The file is organized according to the three semantic categories of words included. Animal nouns are listed first, followed by words referring to people, and the last category is objects.

We performed several analyses to explore the characteristics of words included in the present database. As a first step, we studied the relationship between the dimensions of valence and arousal. We conducted a regression analysis with emotional valence as the independent factor and arousal as a dependent factor. We obtained a high quadratic correlation between valence and arousal, R = .61, p < .001, which explains 37% of the variance (the linear correlation between both dimensions accounted for only 7.4% of the variance). Figure 2 shows the distribution of the 380 word ratings in the bidimensional affective space. This is a U-shaped distribution, as suggested by the significant quadratic correlation obtained. This result indicates that highly positive and negative words are considered more arousing than neutral words. The same relationship has been previously reported by studies that have provided affective ratings for words in different languages (Bradley & Lang, 1999b; Eilola & Havelka, 2010; Kanske & Kotz, 2010; Pérez Dueñas et al., 2010; Redondo et al., 2005, 2007; Soares et al., in press; Võ et al., 2009) as well as for pictures (Lang et al., 1999; Moltó et al., 1999) and sounds (Bradley & Lang, 1999a; Fernández-Abascal et al., 2008; Redondo et al., 2008).
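As an illustration of how a quadratic fit can capture a U-shaped relationship that a linear fit misses, the following sketch fits polynomials of degree 1 and 2 by ordinary least squares and compares the proportion of variance explained. The data are made up (a perfectly symmetric U), the function names are hypothetical, and the code is not the analysis script used for the database.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns the coefficients [b0, b1, ..., b_degree], lowest power first.
    """
    k = degree + 1
    # Augmented matrix (X'X | X'y) of the normal equations.
    a = [
        [sum(xi ** (i + j) for xi in x) for j in range(k)]
        + [sum(yi * xi ** i for xi, yi in zip(x, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def r_squared(x, y, coeffs):
    """Proportion of variance in y explained by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    pred = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up mean ratings: arousal rises toward both ends of the valence scale.
valence = [1, 2, 3, 4, 5, 6, 7, 8, 9]
arousal = [3 + 0.25 * (v - 5) ** 2 for v in valence]

r2_linear = r_squared(valence, arousal, fit_polynomial(valence, arousal, 1))
r2_quadratic = r_squared(valence, arousal, fit_polynomial(valence, arousal, 2))
multiple_r = r2_quadratic ** 0.5  # R is the square root of R squared
```

With real ratings the contrast is less extreme than in this toy case, but the logic is the same: the quadratic term absorbs the variance that a straight line cannot.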

Fig. 2 Distribution of the mean ratings for the 380 words in the valence and arousal affective dimensions

In order to obtain a more detailed picture of our data, we divided the whole set of words into positive, negative, and neutral words. Most studies do not explicitly describe the criteria used to define words as positive, negative, or neutral. Other studies give only the average of valence values of their selected stimuli. In these cases, the means for positive, negative, and neutral words usually take values around 7, 2 and 5, respectively (e.g., Kissler et al., 2007). On the basis of these values, we decided to consider words with values of valence ranging from 1 to 4 as negative (M = 2.9), words with values ranging from 4 to 6 as neutral (M = 5.08), and words with values ranging from 6 to 9 as positive (M = 6.6). According to these criteria, our database consists of 113 negative words (29.7% of the whole database), 194 neutral words (51%), and 73 positive words (19.3%). A similar distribution of positive, negative, and neutral words is observed in each of the three semantic categories included in the database.
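The cut-offs above can be applied as in the following sketch. Because the stated ranges share their end points, the sketch assumes that ratings of exactly 4 or 6 are treated as neutral; this boundary rule is an assumption, not something stated in the text. The words and ratings are placeholders, not entries from the database.

```python
from collections import Counter

def valence_group(valence, low=4.0, high=6.0):
    """Label a mean valence rating (1-9 scale) as negative, neutral, or positive.

    Assumption: ratings of exactly `low` or `high` count as neutral.
    """
    if valence < low:
        return "negative"
    if valence > high:
        return "positive"
    return "neutral"

# Made-up mean valence ratings for placeholder words.
toy_valence = {
    "word_a": 2.9,
    "word_b": 5.1,
    "word_c": 6.6,
    "word_d": 4.0,  # boundary case, neutral under the assumption above
    "word_e": 3.2,
}

groups = {word: valence_group(v) for word, v in toy_valence.items()}
counts = Counter(groups.values())
# Percentage of the toy set that falls in each group.
shares = {label: 100 * n / len(groups) for label, n in counts.items()}
```

Researchers who prefer stricter definitions of emotional words can pass different values for `low` and `high` when selecting stimuli.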

We first studied the contribution of positive and negative words to the U-shaped distribution of words in the affective space. We observed that there was a significant negative correlation between valence and arousal for negative words, r = −.65, p < .001. This result indicates that the most negative words are also the most arousing. Concerning positive words, we failed to obtain a significant correlation between valence and arousal, r = .20, p = .08. As can be seen in Fig. 2, there is a high concentration of negative words in the upper range of the arousal scale, whereas positive words are more evenly distributed along the whole arousal scale. This difference between positive and negative words concerning arousal seems to be the general pattern in standardizations of the affective properties of words according to a bidimensional perspective (e.g., Bradley & Lang, 1999b; Redondo et al., 2005, 2007; Soares et al., in press; Võ et al., 2009).

We further analyzed the properties of positive, neutral, and negative words. We performed one-way ANOVAs comparing positive, negative, and neutral words on each of the indexes included in the database. We observed that the three sets of words were different in valence, F(2, 377) = 904.6, p < .001, ηp² = .83; arousal, F(2, 377) = 52.8, p < .001, ηp² = .22; concreteness, F(2, 377) = 3.1, p < .05, ηp² = .02; and familiarity, F(2, 377) = 8.53, p < .001, ηp² = .04. However, they did not differ significantly in either frequency, F(2, 377) = 2.92, p = .06, or length, F(2, 377) = 0.74, p = .48. Post hoc Tukey tests revealed that there were differences between positive and neutral words, between negative and neutral words, and between positive and negative words in both valence (all ps < .001) and arousal (all ps < .05). Concerning concreteness, post hoc contrasts failed to reach statistical significance. With respect to familiarity, both positive and neutral words were considered more familiar than negative words (all ps < .005). The fact that positive words tend to be more familiar than their negative counterparts is in line with previous results obtained with Spanish words (Pérez Dueñas et al., 2010), and it suggests that this is a variable that has to be taken into account when selecting experimental stimuli, since familiarity has been found to affect word processing (Kousta et al., 2009; Larsen et al., 2006). In fact, it has been suggested that the amount of interference obtained in an emotional Stroop task might be related to the familiarity of the words, since the most familiar words are those that produced the highest interference (Pérez Dueñas et al., 2010). Therefore, positive and negative words should be matched in familiarity in order to avoid a confounding factor.
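For readers who wish to check the effect sizes of the one-way ANOVAs, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df1) / (F × df1 + df2). The short sketch below applies it to the four significant effects reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios reported above for the one-way ANOVAs, all with df = (2, 377).
reported_f = {
    "valence": 904.6,
    "arousal": 52.8,
    "concreteness": 3.1,
    "familiarity": 8.53,
}
effect_sizes = {
    name: round(partial_eta_squared(f, 2, 377), 2)
    for name, f in reported_f.items()
}
```

Rounded to two decimals, the recovered values agree with the reported effect sizes (.83, .22, .02, and .04).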

We also divided the whole set of words according to their arousal. We used the score of 5 as a division point. In our database, there were more low arousing words (220 words, 57.9% of the whole set) than high arousing words (160—that is, 42.1% of the words). A similar distribution is observed if we consider the categories of animals and objects separately, although in the category of nouns referring to people, there were more high arousing words (66) than low arousing words (58). We conducted t tests between high and low arousing words and observed that they were different in valence, with the low arousing words being rated as more positive than the high arousing words, t(378) = 3.8, p < .001. There were also significant differences between the two sets of words in concreteness, t(378) = 2.1, p < .05, and frequency, t(378) = 2.2, p < .05. High arousing words were more frequent and less concrete than low arousing words. What these results suggest is that, in studies in which the effects of arousal are investigated, words have to be matched in frequency and concreteness.
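The arousal split and the comparison between the two resulting sets can be sketched as follows. The sketch assumes a pooled-variance (Student) t test, which is consistent with the reported 378 degrees of freedom for sets of 220 and 160 words, and it assigns a rating of exactly 5 to the high-arousal set, which is an assumption. The data are made up and the function name is hypothetical.

```python
from math import sqrt

def pooled_t(a, b):
    """Student's t statistic with pooled variance, plus its degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Made-up (arousal, valence) mean ratings for six placeholder words.
toy_words = [(3.1, 6.0), (4.2, 5.5), (4.8, 6.5),
             (5.6, 4.0), (6.3, 3.5), (7.0, 4.5)]

# Split at the midpoint of the scale; a rating of exactly 5 goes to the
# high-arousal set in this sketch (assumption).
low_arousal_valence = [v for a, v in toy_words if a < 5]
high_arousal_valence = [v for a, v in toy_words if a >= 5]

t_value, df = pooled_t(low_arousal_valence, high_arousal_valence)
```

A positive t value here means that the low-arousal words were rated as more positive, which is the direction of the difference reported above.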

Apart from the analysis of the whole database, we were also interested in studying in more detail the affective properties and the ratings given to words in the three categories. The means of the indexes for each category are presented in Table 1. We performed a MANOVA considering valence (positive, negative, and neutral words) and semantic category (animals, objects, and people) as independent factors, and the affective ratings as well as the values in the other dimensions as dependent variables. There were significant effects of both valence, F(2, 377) = 66.3, p < .001, ηp² = .52, and semantic category, F(2, 377) = 23.7, p < .001, ηp² = .28, as well as a significant interaction between the two factors, F(8, 371) = 3.25, p < .001, ηp² = .05. We conducted post hoc Tukey tests and applied the Bonferroni correction in order to correct for multiple tests (we accepted as significant any difference between means with a p < .004). Concerning affective ratings, nouns referring to objects were significantly less arousing than nouns referring to either animals or people, although there were no significant differences in valence between the three semantic categories. With respect to the other variables, nouns referring to animals were considered significantly less familiar than words in the other two categories, although they were the most concrete, differing significantly from both objects and people. In addition, post hoc tests revealed that nouns referring to people were more frequent and longer than nouns referring to either objects or animals.

Table 1 Mean values of the indexes included in the database for the three semantic categories (standard deviations are in parentheses)

Finally, we also analyzed each category separately in order to determine whether the relationship between valence and arousal follows the same pattern across categories. Concerning animal words, we obtained a significant quadratic relationship between both variables (R = .51, p < .001) that explains 25% of the variance. With respect to object words, the quadratic correlation was also significant (R = .66, p < .001), explaining 42% of the variance. The highest correlation was that observed with words referring to people, with a value of R = .71, p < .001, that accounted for 42% of the variance. As we can see, all three categories show the expected relationship.

To conclude, the analyses conducted with our whole database show that the present results converge with those reported in most studies conducted in this field of research. This convergence underlines the comparability of the present database with previously established norms. However, it has to be taken into account that in the present database, there is a high concentration of words in the space defined by the range of scores that may be considered as neutral. This is a logical result if we take into account that the selection of our items was not based on their affective properties, but on their membership in a given category. If we consider that in experiments conducted with affectively valenced words it is necessary to include not only emotional words, but also neutral ones, we believe that the present database can be very useful for experimenters, as a complement to the previous databases. This is because the procedure used in the present study and the instructions given to the participants to obtain the ratings, as well as the scales employed for the valence and arousal dimensions, were the same as those used to obtain previously published norms in Spanish (Redondo et al., 2005, 2007). Researchers may use the present norms to obtain the number of stimuli necessary to avoid repetitions and to conduct experiments that take into account both the affective properties of words and their membership in a given semantic category, which was not always possible for some categories in the existing Spanish databases. These experiments may contribute to a better understanding of affective word processing.