We developed affective norms for 1,121 Italian words in order to provide researchers with a highly controlled tool for the study of verbal processing. This database was developed from translations of the 1,034 English words present in the Affective Norms for English Words (ANEW; Bradley & Lang, 1999) and from words taken from Italian semantic norms (Montefinese, Ambrosini, Fairfield, & Mammarella, Behavior Research Methods, 45, 440–461, 2013). Participants evaluated valence, arousal, and dominance using the Self-Assessment Manikin (SAM) in a Web survey procedure. Participants also provided evaluations of three subjective psycholinguistic indexes (familiarity, imageability, and concreteness), and five objective psycholinguistic indexes (e.g., word frequency) were also included in the resulting database in order to further characterize the Italian words. We obtained a typical quadratic relation between valence and arousal, in line with previous findings. We also tested the reliability of the present ANEW adaptation for Italian by comparing it to previous affective databases and performing split-half correlations for each variable. We found high split-half correlations within our sample and high correlations between our ratings and those of previous studies, confirming the validity of the adaptation of ANEW for Italian. This database of affective norms provides a tool for future research about the effects of emotion on human cognition.
The affective connotation of words has a pervasive influence on cognitive processing, and many studies have shown that neutral and emotionally charged words are processed differently and differentially affect a number of cognitive processes, including attentional blink (Mathewson, Arnell, & Mansfield, 2008), attentional orienting (Stormark, Nordby, & Hugdahl, 1995), lexical decision (e.g., Scott, O’Donnell, Leuthold, & Sereno, 2009), speed of processing (e.g., Hildebrandt, Schacht, Sommer, & Wilhelm, 2012), accuracy of word detection (Ortigue et al., 2004), and memory (Kuchinke et al., 2006; Mammarella, Borella, Carretti, Leonardi, & Fairfield, 2013). Most importantly, since psychological research often involves emotionally connoted words, there is an increasing need to establish norms for the emotional characteristics of stimuli in order to control for and manipulate them.
One of the principal problems in emotion research, then, is the choice of the affect or emotion terms to be used. This problem makes measures of the affective meanings of words fundamental for researchers who focus on emotions and their influence on other cognitive abilities. Emotional ratings of words, in fact, are the basis for many areas of research ranging from studies on emotions themselves, including the way they are perceived and the consequences they have on human behavior, to the impact that emotions have on the processing and memory of emotional words.
Most past research on emotions and cognition has been based on the Affective Norms for English Words (ANEW) by Bradley and Lang (1999), a database created from ratings of 1,034 English words along three dimensions, in line with Osgood, Suci, and Tannenbaum’s (1957) dimensional theory of emotions. In general, this theory assumes that individuals approach pleasant or positive stimuli and avoid unpleasant or negative ones with variable degrees of intensity (Osgood et al., 1957; Schneirla, 1959). This emotive space is generally defined as valence and arousal (Osgood et al., 1957). Valence represents the way an individual judges a situation and determines the polarity of emotional activation (Lang, Bradley, & Cuthbert, 1997). Valence accounts for the largest proportion of variance when used as a bipolar scale (pleasant vs. unpleasant; Miller, 1959), even though it can also be represented as two unipolar orthogonal scales (e.g., Watson, Wiese, Vaidya, & Tellegen, 1999). Arousal, also termed intensity or energy level, expresses the degree of excitement or activation an individual feels toward a given stimulus and varies from calm to exciting (Lang et al., 1997). Arousal accounts for a considerable amount of variability in evaluative ratings (Osgood et al., 1957). A third dimension, variously called dominance or control, reflects the degree of control an individual feels over a specific stimulus and extends from out of control to in control. Dominance has also been included in the development of emotional stimulus sets, even if much less frequently than valence and arousal (e.g., Bradley & Lang, 1999; Lang, Bradley, & Cuthbert, 2008). In 1980, Lang developed a nonverbal pictographic measure, the Self-Assessment Manikin (SAM), to evaluate these three dimensions. For the valence dimension, SAM ranges from “pleasant” to “unpleasant” and is depicted as a happy figure or as a frowning figure.
Along the arousal dimension, SAM ranges from “excited” to “calm” and is depicted as a figure with wide-open eyes or as a sleepy figure. Lastly, the dominance dimension ranges from “out of control” to “in control” and is represented as a tiny figure or as a huge figure.
The emotional connotations of words, however, have been shown to vary from one culture to another (Leu, Wang, & Koo, 2011), as well as between languages (Chen, Kennedy, & Zhou, 2012). As a consequence, it is possible that ratings for words representing the same concepts in different languages or, indeed, in variations of the same language (e.g., British English) may differ from those provided in the ANEW. In fact, some of the original English words in the ANEW can have somewhat different connotations from their translated equivalents in Italian, and while there can be much overlap in general meanings, many essentially “correct” translations may also have very strong additional connotations that are normally disambiguated by the context. Moreover, finding equivalent meanings across languages is often difficult. For example, the adjective anxious can describe a person who is in a state of negative anxiety or worry, or it can be positively connoted to refer to a person who is excitedly waiting for someone or something to happen.
For this reason, databases providing affective ratings have also been developed in other languages, such as German (Kanske & Kotz, 2011; Lahl, Goritz, Pietrowsky, & Rosenberg, 2009; Võ et al., 2009), Finnish (Eilola & Havelka, 2010), Portuguese (Soares, Comesaña, Pinheiro, Simões, & Frade, 2012), Dutch (Moors et al., 2013), French (Gilet, Grühn, Studer, & Labouvie-Vief, 2012), Brazilian Portuguese (Kristensen, Gomes, Justo & Vieira, 2011), and Spanish (Redondo, Fraga, Padrón, & Comesaña, 2007). Most importantly, an Italian adaptation of this database is still missing, which makes it difficult to control for and manipulate the emotional aspects of verbal stimuli in studies conducted in Italian.
The aim of our study was thus to create an Italian adaptation of the affective word database, starting from the translation of the 1,034 English words in the ANEW, in order to provide standardized ratings of the affective dimensions of Italian words. We also included 87 Italian words taken from a database of semantic norms in Italian (Montefinese, Ambrosini, Fairfield, & Mammarella, 2013). In this database, we report ratings along the original three dimensions identified by Bradley and Lang (1999), as well as subjective (imageability, familiarity, and concreteness) and objective (e.g., frequency, length) psycholinguistic indexes, since it is often necessary to control for other word features when carrying out emotion research using affective terms. In fact, an important effort has been made to create new databases with information on the imageability, concreteness, and word association variables of words (e.g., Altarriba, Bauer, & Benvenuto, 1999; Chiarello, Shears, & Lund, 1999; Eilola & Havelka, 2010), allowing researchers to control and manipulate their experimental material not only with regard to affective variables, but also according to other psycholinguistic measures.
To the best of our knowledge, however, only a few studies have collected these psycholinguistic measures along with the affective norms (e.g., Redondo et al., 2007), and even fewer studies have explored the relationships between these measures and emotional ratings (e.g., Warriner, Kuperman, & Brysbaert, in press). For example, Altarriba and Bauer (2004) showed that emotion words are more imaginable but less concrete than abstract words, and both less imaginable and less concrete than concrete words. More recent studies analyzed the correlations between affective ratings and psycholinguistic measures collected by the authors themselves (Citron, Weekes, & Ferstl, 2012) or taken from previous nonaffective norms (Warriner et al., in press). In the present study, we collected and subsequently correlated our affective and psycholinguistic ratings in order to explore the relation between these two types of measures. Moreover, we included other psycholinguistic measures, such as the length in letters and the number of orthographic neighbors, which are well known to influence participants’ word recognition performance (e.g., Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004), and thus should be controlled for in studies using affective words.
Finally, we investigated gender differences in the Italian ratings of valence, arousal and dominance. According to a common (at least in Western culture) stereotype, women experience and express emotions with a greater intensity than men (e.g., Fischer, 2000; Timmers, Fischer, & Manstead, 2003). Although this view may seem justified by remarkable gender differences in the prevalence of affective disorders (e.g., Darlington, 2009), empirical evidence about gender differences in emotional reactivity is mixed (e.g., Bradley, Codispoti, Sabatinelli, & Lang, 2001; Kelly, Forsyth, & Karekla, 2006; Labouvie-Vief, Lumley, Jain, & Heinze, 2003). Therefore, it is necessary to consider potential gender differences in affective ratings, which may differentially affect word evaluation and consequently participants’ performance in many cognitive tasks.
A total of 1,084 undergraduate psychology students from the University of Chieti participated in our study: 684 participants (477 females and 207 males; mean age: 22.27 years, SD = 4.67; mean education: 15.07 years, SD = 2.33) performed valence, arousal and dominance ratings, and 400 participants (354 females and 46 males; mean age: 21.98 years, SD = 4.6; mean education: 14.99 years, SD = 2.43) performed imageability, concreteness and familiarity ratings. All participants were native Italian speakers and were naïve as to the purpose of the experiment. The study was conducted in accordance with Declaration of Helsinki guidelines.
The data set included a total of 1,121 Italian words taken from the ANEW (Bradley & Lang, 1999) and from the semantic norms collected in our laboratory (Montefinese et al., 2013). Of these 1,121 words, 33 occur in both databases, 1,001 in the ANEW only, and 87 in Montefinese et al.’s norms only. The data set contained 20 % adjectives, 69 % nouns, 5 % verbs, and 6 % words that could be considered both as an adjective and as a noun. All 1,121 words were translated from English to Italian by a native English speaker (author B.F.) and back-translated by another bilingual collaborator. Agreement between the translators on the corresponding sets of Italian and English words was high (94 %). A third judge, a native Italian speaker (author E.A.), resolved disagreements.
All six measures (valence, arousal, dominance, familiarity, imageability, and concreteness) were rated on 9-point scales. For the three emotional ratings, the Self-Assessment Manikin was used (Bradley & Lang, 1999). Response scales ranged from very unpleasant (1) to very pleasant (9) for valence, from very calm (1) to very aroused (9) for arousal, and from very submissive (1) to very dominant (9) for dominance. The other three scales (familiarity, imageability and concreteness, henceforth FIC) ranged from 1 (unfamiliar, unimaginable, abstract) to 9 (highly familiar, highly imaginable, highly concrete). We chose to use the SAM scale in order to preserve methodological consistency with other affective norms. In fact, since its use for the original ANEW norms (Bradley & Lang, 1999), the SAM scale has been adopted to generate ratings in subsequent ANEW norms (Redondo et al., 2007; Soares et al., 2012) and in other influential affective norms as well, such as the IAPS, the International Affective Picture System (Lang et al., 2008); IADS, the International Affective Digitized Sounds (Bradley & Lang, 2007b); and ANET, the Affective Norms for English Text (Bradley & Lang, 2007a).
Word stimuli were distributed over 40 lists (20 lists for the SAM rating and 20 lists for the FIC rating) containing 56–57 words each. For each list, a form was created in which the words were randomized across participants in order to avoid primacy or recency effects. All lists were matched for the valence, arousal and dominance indexes derived from the ANEW (Bradley & Lang, 1999).
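The list construction described above can be illustrated with a minimal sketch. It only shows how 1,121 items divide into 20 lists of 56–57 words and how each participant can receive an independent random order; it does not attempt the matching of lists on the ANEW valence, arousal, and dominance values. The function names, the placeholder words, and the use of a participant number as a random seed are assumptions made for illustration, not the procedure actually used.

```python
import random

def build_lists(words, n_lists=20, seed=0):
    """Split words into n_lists lists whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = words[:]
    rng.shuffle(shuffled)
    # Deal the shuffled words round-robin so list sizes stay balanced.
    return [shuffled[i::n_lists] for i in range(n_lists)]

def presentation_order(word_list, participant_id):
    """Return a participant-specific random order of one list."""
    rng = random.Random(participant_id)  # hypothetical: seed by participant
    order = word_list[:]
    rng.shuffle(order)
    return order

words = [f"word{i:04d}" for i in range(1121)]  # placeholder items
lists = build_lists(words)
sizes = sorted({len(lst) for lst in lists})
print(sizes)  # [56, 57]
print(presentation_order(lists[0], participant_id=7)[:3])
```

Randomizing the order separately for each participant spreads any primacy or recency effect evenly across the words of a list.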
In an online survey procedure, participants received via email a URL to access the form with one word list and enter their details. They were instructed to rate how they felt when reading the words along the three dimensions assigned to them (SAM or FIC). The familiarity rating was presented first for each word since past research proposed that seeing the words previously could affect participants’ familiarity ratings (e.g., Ratcliff, Hockley, & McKoon, 1985). On the contrary, we know of no studies that showed an influence of the other ratings on the affective evaluations.
At the top of the form, participants read the instructions and were informed that there were no right or wrong answers. The instructions emphasized the importance of using the entire range of ratings. Moreover, the participants were instructed not to worry about how many times they had used any particular rating and not to spend a lot of time thinking about their ratings, because their first considerations were of greatest interest. The instructions were similar to those provided in previous studies (e.g., Bradley & Lang, 1999; Chiarello et al., 1999; Eilola & Havelka, 2010). In line with most previous research, we did not give participants explicit instructions about ambiguous words. Thus, this ambiguity may be reflected in the rating variability (Moors et al., 2013). We provided the instructions for the two ratings as supplemental material to this article (file Instructions.doc).
Description of the database
The database, available as supplementary material, contains 1,121 Italian words with normative data on affective ratings (valence, arousal and dominance) provided by at least 31 participants (with a minimum of ten males) and on psycholinguistic ratings (familiarity, imageability, and concreteness) provided by 20 participants for each word. Furthermore, objective psycholinguistic measures (e.g., number of letters, word frequency, and grammatical class) are reported. In particular, the database is organized as follows:
Num_ANEW: A number that corresponds to the original number of the ANEW (Bradley & Lang, 1999).
ANEW: This column indicates the presence (corresponding to the value 1) or the absence (corresponding to the value 0) of each word in the ANEW database.
Eng_Word: The English translation of the Italian word.
Ita_Word: The Italian word. The words in this column are in alphabetical order.
WordClass: The grammatical role derived from the “la Repubblica” corpus (a very large corpus of Italian newspaper text) (Baroni et al., 2004) assigned to each Italian word: adjective (A), noun (N), and verb (V).
Let_ITA: the number of letters of each Italian word.
FreqColfis and Ln_Colfis columns: The written frequency of use for each Italian word according to the CoLFIS corpus (Bertinetto et al., 2005) and the natural logarithm of this frequency plus one.
FreqRepub and Ln_Repub columns: The written frequency of use according to the “la Repubblica” corpus (Baroni et al., 2004), and the natural logarithm of this frequency plus one.
N_OrtNeig: The number of orthographic neighbors for each Italian word. Two words are considered orthographic neighbors when they share all letters (in the same position) except one (Coltheart, Davelaar, Jonasson, & Besner, 1977; Davis & Perea, 2005).
MeanFreq_Neig: The mean written frequency of use of the orthographic neighbors for each Italian word according to the CoLFIS corpus (Bertinetto et al., 2005). We obtained this and the previous indexes by using an algorithm available at the following site from Padua University: http://dpss.psy.unipd.it/claudio/vicini2.php.
N_Part_p: The number of participants that provided psycholinguistic ratings (familiarity, imageability, and concreteness) for each Italian word.
M_Fam: Mean familiarity ratings, with 1 being unfamiliar and 9 highly familiar.
SD_Fam: The standard deviation of familiarity ratings.
M_Ima: Mean imageability ratings, with 1 being unimaginable and 9 highly imaginable.
SD_Ima: The standard deviation of imageability ratings.
M_Con: Mean concreteness ratings, with 1 being abstract and 9 highly concrete.
SD_Con: The standard deviation of concreteness ratings.
N_Part_a: The number of participants that provided affective ratings (valence, arousal, and dominance) for each Italian word.
M_Val: Mean valence ratings, with 1 being very unpleasant and 9 very pleasant.
SD_Val: The standard deviation of valence ratings.
M_Aro: Mean arousal ratings, with 1 being very calm and 9 very aroused.
SD_Aro: The standard deviation of arousal ratings.
M_Dom: Mean dominance ratings, with 1 being very submissive and 9 very dominant.
SD_Dom: The standard deviation of dominance ratings.
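Two of the objective indexes listed above follow simple definitions that can be sketched in a few lines: the log-transformed frequency (natural logarithm of the frequency plus one) and the number of orthographic neighbors (words of the same length that differ in exactly one letter position). The sketch below is only an illustration of these definitions; it is not the Padua University algorithm used to build the database, and the small lexicon is a hypothetical example.

```python
import math

def ln_freq_plus_one(freq):
    """Natural logarithm of a raw frequency count plus one."""
    return math.log(freq + 1)

def are_neighbors(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def count_neighbors(word, lexicon):
    """Number of orthographic neighbors of word within lexicon."""
    return sum(are_neighbors(word, other) for other in lexicon)

# Hypothetical toy lexicon, for illustration only.
lexicon = ["cane", "pane", "rane", "case", "casa", "cani"]
print(count_neighbors("cane", lexicon))  # 4
print(ln_freq_plus_one(0))               # 0.0
```

Adding one before taking the logarithm keeps words with a corpus frequency of zero defined, with a transformed value of 0.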
In addition to the mean and standard deviation of the affective ratings for the total number of participants, we also provided these measures calculated separately for both females and males. Moreover, we added the statistics of the comparisons between females and males for the valence, arousal and dominance ratings of each word (two-tailed t tests for independent samples; degrees of freedom and both the t statistic and the p values for each comparison are provided).
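As an illustration of the per-word gender comparison, the following sketch computes the t statistic and degrees of freedom of an independent-samples t test. It assumes the classical pooled-variance (Student) form, which the text does not specify, and the ratings are invented. The two-tailed p value would then be read from a t distribution with the returned degrees of freedom, a step omitted here because the Python standard library does not provide that distribution.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t for two independent samples (pooled variance assumed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df

# Hypothetical valence ratings of one word by female and male raters.
female = [2, 3, 2, 1, 3, 2]
male = [4, 5, 3, 4, 6, 4]
t, df = pooled_t(female, male)
print(round(t, 2), df)
```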
Results and discussion
Figure 1 shows the distributions of the three affective measures. The distributions for the arousal and dominance indexes seem to approximate normal distributions centered over the middle of the scale, with a slight positive bias (respectively, 74 % and 62 % of the words were rated above the midpoint of the rating scale; i.e., 5) and a relative lack of words presenting low values (i.e., lower than 3). In contrast, the distribution for valence covers the entire rating scale and is clearly bimodal, with a higher concentration of words with either mid-high (i.e., 6 to 8) or mid-low (i.e., 2 to 3) mean ratings. However, this index also shows a slight positive bias, with 59 % of the words being rated above 5, confirming the findings of recent normative studies (Warriner et al., in press; Kloumann, Danforth, Harris, Bliss, & Dodds, 2012).
The distributions for the three psycholinguistic subjective indexes are presented in Fig. 2. As can be seen, the distributions of both imageability and familiarity ratings show a strong bias toward the high range of the scale, as most of the words were rated above 5 (95 % and 91 %, respectively), suggesting that our database mainly comprises words that can be easily processed, a characteristic that is often desirable in many of the research fields in which linguistic stimuli are used. On the contrary, concreteness ratings are distributed bimodally, reflecting the presence in our norms of both concrete and abstract words.
With regard to the homogeneity of the participants’ affective ratings, Fig. 3a shows the means of the ratings for each word plotted against their standard deviations for the three emotional dimensions. For each scatterplot, words are split into two groups comprising those above and below the median of the rating scale—that is, the “neutral” value for each dimension—together with the English translations of selected examples of representative words. We also supply the quadratic regression lines and fits that demonstrate the relation between the means and standard deviations (black lines). As can be seen, the same pattern emerged for all of the affective measures. First, the words with extreme characteristics were rated with high agreement (i.e., low variability) among participants. This is not surprising, since a word can obtain an extreme mean value only if most of its ratings are as extreme as the mean value and, hence, it has a low SD. In fact, assuming a rating scale that ranges from 1 to 9, an item with a mean value of 9 necessarily has an SD of 0, because only if it were given a value of 9 by all participants (i.e., without variability in the rating) could it assume a mean value of 9. For example, the word mutilare/“mutilate,” with a mean valence of 1.24 and an SD of 0.66, was given a valence rating of 1 by 85 % of the participants.
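The constraint linking extreme means to low variability can be made explicit. For ratings bounded between a minimum and a maximum, the variance cannot exceed (mean − minimum) × (maximum − mean), a general result known as the Bhatia–Davis inequality. The sketch below evaluates this ceiling for a 1-to-9 scale. It uses the population form of the SD (divisor n); a sample SD (divisor n − 1) can be slightly larger, and which form the reported SDs use is not stated here, so the comparison is approximate. The function name is ours.

```python
import math

def max_sd(mean, low=1.0, high=9.0):
    """Largest population SD compatible with a given mean on a bounded scale."""
    if not low <= mean <= high:
        raise ValueError("mean must lie within the scale range")
    return math.sqrt((mean - low) * (high - mean))

for m in (1.24, 5.0, 9.0):
    print(m, round(max_sd(m), 2))
```

Under this bound, a word with a mean valence of 1.24 cannot have a population SD above about 1.36, whereas a word with a mean of 5 can reach an SD of 4, which is why only midrange words can show high disagreement.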
Second, an entirely different picture emerged for the words in the midrange (i.e., those rated on average as being “neutral,” with values around 5), which assume very different SD values. Indeed, two kinds of “neutral” words exist: (1) those for which participants agreed that they were neutral, and accordingly assumed low SD values (as we just noted, high agreement means a low SD), and (2) those that assumed a “neutral” mean value because they elicited both high and low values from different participants and, due to this variability, also had higher SD values. For the valence dimension, examples of these two types of neutral items are, respectively, the words spillatrice/“stapler” and masturbarsi/“masturbate,” with the former being truly neutral, since it assumed a value of 5 in 89 % of the cases, whereas the latter should be deemed a more ambiguous word, since it elicited only 9 % of actually neutral (i.e., 5) ratings, but 16 % and 22 % of extremely low (i.e., 1) and extremely high (i.e., 9) valence ratings, respectively. It follows that researchers should take particular care when choosing neutral stimuli as materials for their experiments, since neutral words could actually be ambiguous or, worse, ambivalent. Consequently, here we note the necessity for researchers who use emotional stimuli to take the variability of the ratings into account as well, and not only the means, as potential sources of systematic error.
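The distinction between the two kinds of “neutral” words can be illustrated with two invented sets of ratings that share the same mean but differ sharply in their spread. The numbers below are hypothetical and are not taken from the database; they only show why the SD and the share of midpoint ratings should be inspected together with the mean.

```python
from statistics import mean, stdev

def summarize(ratings, midpoint=5):
    """Return the mean, sample SD, and proportion of midpoint ratings."""
    share_mid = sum(r == midpoint for r in ratings) / len(ratings)
    return mean(ratings), stdev(ratings), share_mid

# Hypothetical ratings from 18 raters each.
consensual = [5] * 16 + [4, 6]                             # raters agree
ambivalent = [1] * 5 + [9] * 5 + [3, 7, 5, 5, 4, 6, 2, 8]  # raters split

for label, ratings in (("consensual", consensual), ("ambivalent", ambivalent)):
    m, sd, share = summarize(ratings)
    print(label, m, round(sd, 2), round(share, 2))
```

Both sets have a mean of exactly 5, so a selection rule based on the mean alone would treat them as equally neutral.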
Our results indicate that all three emotional measures, but especially valence, are best represented by a bipolar dimension, with a negative and a positive pole located at each side of the scale, as well as a neutral pole centered on the scale’s median value, in line with the Lang (1980) theoretical and methodological approach to emotion. For the valence index, our results replicate those of previous ANEW studies, which found the same tripartite distribution of consensual ratings for low-, neutral-, and high-valence words (Bradley & Lang, 1999; Redondo et al., 2007; Soares et al., 2012) and those of other, broader norms (Lahl et al., 2009; Moors et al., 2013; Warriner et al., in press). Although it seems clear that the ANEW adaptation for Italian includes truly neutral words for the valence dimension, especially those added from our semantic norms, the same is not as clear for the arousal and dominance indexes. Indeed, for these dimensions, the cited works did not always show obvious indications of truly neutral words. Moreover, it has also been proposed that the arousal dimension would be better represented as a unipolar dimension with linearly increasing positive values of arousal (Võ et al., 2009). The peculiar distribution of arousal ratings in our norms—in particular, the relative lack of words with extremely low mean ratings—complicated the reliable assessment of the quadratic fit for words with arousal ratings below 5 (see the middle panel in Fig. 3a), and thus does not allow us to discriminate between bipolar and unipolar accounts of the arousal dimension. Finally, it should be noted that in the present study, as in previous ANEW studies, the instructions explicitly informed participants to indicate the median value of the scale when feeling intermediate levels of each emotional index.
With regard to the psycholinguistic subjective indexes, our results suggest that they are represented by a unipolar dimension. Figure 3b shows the means of the ratings for each word, plotted as a function of their SDs, for the familiarity, imageability, and concreteness measures. As can be seen, in this case no words were consistently judged to be neutral in any of these indexes, and a global quadratic relation fits well with the data. Therefore, words that assume midrange values are only those receiving variable ratings across participants.
But why are some words rated with such high variability? In some cases, the discrepancy in ratings may arise from the fact that a given word elicited different ratings in different subsamples of the population. For example, it is likely that the word tabacco/“tobacco” provoked negative emotions in nonsmokers, whereas the same word produced more positive (but also lower dominance) ratings among smokers; similarly, religion-related words may evoke different emotions as a function of the participants’ beliefs. However, it is not possible to evaluate these hypotheses, as we do not know how our sample was composed in relation to these and other potentially relevant factors. Gender is another important factor that likely influenced participants’ ratings, particularly for certain categories of stimuli. The hypothesis that males and females may give different ratings for certain stimuli is not surprising and is supported by a recent study. Warriner et al. (in press), in fact, showed that gender differences do exist for ratings of words denoting weapons, and, particularly, taboo words and sexual terms (see also Janschewitz, 2008). Moreover, the authors also suggested that the gender differences in their norms could have caused the high variability of ratings, especially for sexual terms. To evaluate this possibility, we performed a correlation analysis between the SDs of each word and the corresponding absolute difference between the mean ratings of male and female participants, for the three affective indexes. The analysis revealed that words that were rated more dissimilarly by males and females have higher SDs (N = 1,121; rs = .280, .199, and .297; R²s = .078, .039, and .088 for valence, arousal, and dominance, respectively; all ps < .0001), confirming that gender differences contribute to the high variability in ratings (we discuss gender differences in detail below).
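The analysis just described can be sketched as follows: for each word, take the overall SD of its ratings and the absolute difference between the female and male mean ratings, then correlate the two series across words. The data below are invented placeholders (the tuple layout and values are assumptions for illustration), and Pearson’s r is written out by hand so that the sketch depends only on the standard library.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-word values: (overall SD, female mean, male mean).
words = [
    (1.2, 6.1, 6.0), (2.6, 3.0, 5.1), (1.5, 7.2, 6.8),
    (2.1, 4.0, 5.2), (0.9, 5.0, 5.1), (2.4, 6.5, 4.9),
]
sds = [sd for sd, _, _ in words]
abs_diff = [abs(f - m) for _, f, m in words]

r = pearson_r(sds, abs_diff)
print(round(r, 3), round(r ** 2, 3))  # r and the effect size R²
```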
Reliability of the measures
The consistency of the collected data within each of the six ratings was first evaluated by applying the split-half correlations corrected with the Spearman–Brown formula after randomly dividing the participants into two subgroups of equal size. All reliability indexes were calculated on 1,000 different randomizations of the participants. The mean correlations between the two groups were very high for all dimensions, ranging from a minimum of r = .84 for arousal to a maximum of r = .98 for valence, revealing that the resulting ratings were highly reliable and can be used across the entire Italian-speaking population.
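A minimal sketch of this procedure is given below. Participants are randomly divided into two halves, the mean rating of each word is computed within each half, the two sets of means are correlated, and the correlation is stepped up with the Spearman–Brown formula, r_SB = 2r / (1 + r); the result is averaged over repeated random splits. The simulated ratings, the sample sizes, and the simplifying assumption that every participant rated every word are ours (in the actual design, each participant rated a single list).

```python
import math
import random
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r):
    """Step a half-sample correlation up to full-sample reliability."""
    return 2 * r / (1 + r)

def split_half_reliability(ratings, n_splits=1000, seed=0):
    """ratings[p][w] is the rating given by participant p to word w."""
    rng = random.Random(seed)
    n_words = len(ratings[0])
    ids = list(range(len(ratings)))
    half = len(ids) // 2
    corrected = []
    for _ in range(n_splits):
        rng.shuffle(ids)  # new random division of participants
        first, second = ids[:half], ids[half:2 * half]
        m1 = [mean(ratings[p][w] for p in first) for w in range(n_words)]
        m2 = [mean(ratings[p][w] for p in second) for w in range(n_words)]
        corrected.append(spearman_brown(pearson_r(m1, m2)))
    return mean(corrected)

# Simulated data: 30 raters, 40 words, ratings clipped to the 1-9 scale.
sim = random.Random(1)
true_values = [sim.uniform(2, 8) for _ in range(40)]
ratings = [
    [min(9, max(1, round(t + sim.gauss(0, 1.5)))) for t in true_values]
    for _ in range(30)
]
reliability = split_half_reliability(ratings, n_splits=200)
print(round(reliability, 3))
```

The correction is needed because each half contains only half of the raters, so the raw correlation between halves underestimates the reliability of means based on the full sample.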
To further test the generalizability of the ANEW adaptation for Italian, we compared our ratings with those collected by other researchers in previous studies (Bradley & Lang, 1999; Eilola & Havelka, 2010; Gilet et al., 2012; Janschewitz, 2008; Kanske & Kotz, 2011; Lahl et al., 2009; Moors et al., 2013; Redondo et al., 2007; Soares et al., 2012; Võ et al., 2009; Warriner et al., in press). In particular, we were interested in the correlation of our data with both other ANEW norms (Bradley & Lang, 1999; Redondo et al., 2007; Soares et al., 2012) and a recent work that included almost all of the ANEW words (Warriner et al., in press) (see Fig. 4).
All correlations are shown in Table 1; they were all significant, indicating that our evaluations were highly consistent with previous ones. For valence, a strong linear relationship was present between the Italian adaptation of the ANEW and other ANEW studies (range of Pearson’s correlation coefficients: .90 to .93), with effect size estimates (R²) ranging from .80 to .91. Moreover, our valence measure was also highly correlated with that from other affective norms (Eilola & Havelka, 2010; Gilet et al., 2012; Janschewitz, 2008; Kanske & Kotz, 2011; Lahl et al., 2009; Moors et al., 2013; Võ et al., 2009; Warriner et al., in press), showing good generalizability across different languages. Both the arousal and dominance measures presented more variability, revealed by lower correlations (arousal range .60 to .71, and dominance range .63 to .75) with other ANEW norms (Bradley & Lang, 1999; Redondo et al., 2007; Soares et al., 2012) and other emotive norms (Eilola & Havelka, 2010; Gilet et al., 2012; Janschewitz, 2008; Kanske & Kotz, 2011; Lahl et al., 2009; Moors et al., 2013; Võ et al., 2009; Warriner et al., in press), and lower effect size estimates (R² ranges: .29–.51 and .40–.58 for arousal and dominance, respectively). Note that for dominance, the correlation between our data and those of Moors et al. (2013) revealed an unexpectedly weak, but significant, linear relationship (N = 434, r = .13, R² = .02, p < .006). It should be noted, however, that the Moors et al. dominance index also showed a low correlation (r = .30, R² = .09) with that of the original ANEW norms (Bradley & Lang, 1999) and with those of all other affective norms (all rs < .11, R²s < .01). This may be due to the fact that the participants in Moors et al.’s study rated the active/dominant meaning of stimuli, whereas participants in the other affective norms rated their own feelings of dominance/control in response to the stimuli.
This difference in instructions may have affected valence and arousal less than dominance ratings because the latter dimension emphasizes a sense of duality and opposition between the perceiver/judger and the stimulus to be judged, thus causing Moors et al.’s dominance ratings to be less consistent with the other ANEW norms.
These results are in line with those of previous studies, in which the perceived valence of the words seems to generalize well, whereas the ratings of arousal and dominance show greater variability across languages (e.g., Eilola & Havelka, 2010; Redondo et al., 2007; Warriner et al., in press). A possible explanation for this finding is that different cultures may attach different meanings to dominance and, especially, arousal concepts, and/or may tend to express different levels of emotional response and control. In line with this, it has been suggested that people with Latin backgrounds tend to react to affective stimuli more intensely and with less control than do North Americans (Moltò et al., 1999; Vila et al., 2001). However, although cross-cultural differences can contribute to the greater variability of arousal and dominance ratings among languages, these dimensions are also defined in an unclear manner, and thus may be misinterpreted in some cases. These dimensions are also more prone to contingent and individual factors, such as mood, anxiety state, or recent life events. In contrast, the concept of “valence” is more straightforward, since it is founded on ancestral motivational brain circuits that developed to ensure individual survival by reacting to appetitive and aversive environmental cues (e.g., Lang & Bradley, 2010). Accordingly, it has been shown that the valence dimension exists in all cultures (Russell, 1991). Supporting this view, it is important to note here that valence ratings have a very high reliability, whereas arousal and dominance have lower reliability, even when samples from the same population are compared.
In fact, the correlations between Warriner et al.’s (in press) and Bradley and Lang’s (1999) norms (both deriving from US participants) were .95, .76, and .80 for valence, arousal, and dominance, respectively; moreover, the correlations between the three available affective norms in German were, on average, .96 for valence and .78 for arousal. In addition, the results of our analysis on split-half reliability suggest that arousal and dominance ratings are less stable (mean r = .84 and .89, respectively) than valence ratings (mean r = .98), even among individuals from the same sample.
We then compared our subjective psycholinguistic variables with those collected in previous works. As can be seen in Table 1, the correlations with other affective norms were strong for the concreteness (range = .72–.85 and .52–.72 for the r and R² coefficients, respectively) and imageability indexes (range = .22–.79 and .05–.62 for the r and R² coefficients, respectively). Note that the unexpectedly low correlation (.22) between our and Gilet et al.’s (2012) imageability values is probably due to the small number of overlapping words (N = 79) and to the fact that their words were all adjectives. Indeed, we found lower and more variable imageability ratings for adjectives (mean of ratings = 6.28, SD = 0.90; mean SD of ratings = 2.06, SD = 0.41) than for nouns [mean of ratings = 7.25, SD = 1.14; F(1, 996) = 137.09, p < .0001, ηp² = .12; mean SD of ratings = 1.74, SD = 0.61; F(1, 996) = 52.32, p < .0001, ηp² = .05]. For the familiarity ratings, we instead found less agreement among the different languages. The correlations varied from a minimum of r = .30 (N = 316) to a maximum of r = .55 (N = 159), for English (Janschewitz, 2008) and Finnish (Eilola & Havelka, 2010) words, respectively. However, it should be noted that the smallest agreement, found with Janschewitz’s norms, can be explained by methodological differences in this case as well. In fact, whereas in our and other collections of affective norms the familiarity index has been based on subjective measures of how often participants both use and are exposed to a given word, in Janschewitz’s norms, the familiarity index was only based on subjective frequency of exposure. However, the overall pattern of results, with a lower reliability of the familiarity ratings, confirmed those of another study (Eilola & Havelka, 2010), suggesting that familiarity is more language specific, and consequently varies considerably from one language to another.
Differences in the familiarity ratings can be related to the objective frequencies of use in a language (Desrochers, Liceras, Fernández-Fuertes, & Thompson, 2010), which are obviously not perfectly stable across different languages. This language-dependent effect, in contrast, does not influence imageability and concreteness ratings as much, since these are concept-related, rather than word-related, measures.
Relations between measures
Valence versus arousal dimensions
Affective norms for words (Bradley & Lang, 1999; Moors et al., 2013; Warriner et al., in press), pictures (Lang et al., 2008) and sounds (Bradley & Lang, 2007b) have shown that the relationship between emotional valence and arousal is best described as a U-shaped curve, in which items with low negative or positive ratings are perceived to be the least arousing, whereas high negative and positive items are perceived to be the most arousing. Visual inspection of Fig. 5, showing the distributions of word ratings in the affective space defined by the dimensions of valence and arousal, seems to confirm this typical U-shaped relation. To verify this impression, we first investigated the quadratic effect of emotional valence on arousal by performing a regression analysis with the mean valence and its square as independent variables and the mean arousal as a dependent variable. The resulting quadratic function, y = 0.124x² – 1.361x + 8.831, was highly significant, explaining 32.06 % of the variance [r = .566; F(2, 1118) = 263.82, p < .0001]; moreover, it outperformed the simpler linear model, which, albeit significant, only accounted for 8.67 % of the variance [r = .294; F(1, 1119) = 106.18, p < .0001], since the R² change due to the inclusion of the quadratic term was highly significant [F(1, 1118) = 384.95, p < .0001].
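The comparison between the linear and quadratic models can be retraced from the published R² values alone. The following is a minimal sketch, not the analysis script used for the norms: the function name `r2_change_f` is hypothetical, and because the inputs are the rounded percentages quoted above, the output should agree with the reported F values only up to rounding.

```python
def r2_change_f(r2_full, r2_reduced, n_added, df_residual):
    """F statistic for the R-squared gain of a nested regression model.

    n_added     -- number of predictors added to the reduced model
    df_residual -- residual degrees of freedom of the full model
    """
    gain_per_term = (r2_full - r2_reduced) / n_added
    unexplained_per_df = (1.0 - r2_full) / df_residual
    return gain_per_term / unexplained_per_df


# Quadratic model (32.06 %) against linear model (8.67 %): one added term
# (valence squared), residual df = 1118 as reported in the text.
f_change = r2_change_f(0.3206, 0.0867, n_added=1, df_residual=1118)

# Setting the reduced model's R-squared to 0 gives the overall F of the
# quadratic model with its two predictors.
f_overall = r2_change_f(0.3206, 0.0, n_added=2, df_residual=1118)

print(round(f_change, 1), round(f_overall, 1))
```

Both values should fall close to the reported F(1, 1118) = 384.95 and F(2, 1118) = 263.82.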
However, the relationship between valence and arousal seems to be asymmetrical, in that “negative” items (i.e., those with a mean valence rating below 5) tend to show higher mean arousal ratings than do “positive” items (i.e., those with a mean valence rating above 5), as was confirmed by a two-tailed independent t test [Ms = 6.02 and 5.38, SDs = 0.76 and 0.91, respectively; t(1119) = 12.20, p < .0001, d = 0.75]. Moreover, the increase in emotional arousal in relation to an increasing degree of negative valence [y = –0.479x + 7.470, R² = .390; F(1, 458) = 292.46, p < .0001] seems to be stronger than that related to an increasing degree of positive valence [y = 0.394x + 2.732, R² = .168; F(1, 659) = 133.23, p < .0001] (see Fig. 5). This was corroborated by a homogeneity-of-slopes analysis, which showed that item polarity (i.e., the absolute difference between the valence rating and the median of the rating scale) had a significant effect on the slope of the relationship between emotional valence and arousal [F(1, 1117) = 14.53, p = .0001]. It is important to note here that these results did not change when choosing the median of the valence ratings distribution as the cutoff point in the classification of “negative” and “positive” words.
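The effect size of the negative-versus-positive comparison can be approximated from the reported means and standard deviations. The sketch below uses the pooled-SD form of Cohen's d. The group sizes are an assumption, not stated in the text: 460 “negative” and 661 “positive” words, inferred from the residual degrees of freedom of the two separate regressions (458 and 659). The function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Assumed group sizes: 460 negative and 661 positive words (inferred from
# the regression degrees of freedom, not reported directly).
d = cohens_d(6.02, 0.76, 460, 5.38, 0.91, 661)
print(round(d, 2))
```

Under these assumptions the result rounds to the reported d = 0.75.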
These results, consistently found across both the ANEW (Bradley & Lang, 1999; Soares et al., 2012) and other affective databases (e.g., Võ et al., 2009; Warriner et al., in press), may hamper research using pleasant or unpleasant words when manipulations of arousal levels are intended. Nonetheless, it is worth noting that other than this asymmetry, the items of the ANEW adaptation for Italian cover the affective space defined by the dimensions of valence and arousal quite well (see Fig. 5) and the dispersion of mean ratings observed for both valence and arousal dimensions was quite large (range = 1.18 to 8.88 and 2.00 to 8.11, respectively). In fact, it is possible to find many words that do not align themselves with the general trend of the data. For example, one may choose pleasant words with low arousal (e.g., dormire/“sleep,” valence = 8.20, arousal = 2.82; rilassato/“relaxed,” valence = 8.38, arousal = 3.5), unpleasant words with low arousal (e.g., bancarotta/“bankrupt,” valence = 1.88, arousal = 4.06; flaccido/“flabby,” valence = 3.68, arousal = 4.12), and even neutral words with high arousal (e.g., montagne russe/“rollercoaster,” valence = 5.83, arousal = 7.04; tigre/“tiger,” valence = 5.22, arousal = 6.78). Therefore, the characteristics of the Italian adaptation of the ANEW will allow Italian researchers to control and/or manipulate the affective properties of the words they choose as stimuli.
Dominance versus arousal dimensions
Figure 6 shows the distribution of the word ratings in the affective space defined by the dominance and arousal dimensions. Again, a typical U-shaped relation seems to emerge. We thus investigated the quadratic effect of emotional dominance on arousal by performing a polynomial regression analysis, as above. The resulting quadratic function, y = 0.331x² – 3.553x + 14.850, was highly significant, explaining 24.49 % of the variance [r = .495; F(2, 1118) = 181.34, p < .0001]. Again, as observed for the relation between valence and arousal, the quadratic model outperformed the simpler linear model [r = .194, R² = .037; F(1, 1119) = 43.83, p < .0001], since the R² change due to the inclusion of the quadratic term was highly significant [F(1, 1118) = 306.87, p < .0001].
However, in this case as well, this relationship seems to be asymmetrical, since “prevailing” items (i.e., those with a mean dominance rating below 5) have higher mean arousal ratings than do “weak” items (i.e., those with a mean dominance rating above 5), as confirmed by a two-tailed independent t test [Ms = 5.98 and 5.44, SDs = 0.80 and 0.91, respectively; t(1119) = 10.07, p < .0001, d = 0.631]. Moreover, the increase in emotional arousal was stronger for “prevailing” [y = –0.735x + 9.070, R² = .291; F(1, 418) = 171.34, p < .0001] than for “weak” items [y = 0.634x + 1.736, R² = .159; F(1, 699) = 131.87, p < .0001], and item polarity had a significant effect on the slope of the relationship between dominance and arousal [F(1, 1117) = 9.34, p = .002]. Again, these results did not change when choosing the median of the dominance ratings as the cutoff point in the classification of “prevailing” and “weak” items. Finally, these results should be taken with caution, because they may be mediated by the relation between valence and arousal since, as we discuss below, the valence and dominance dimensions are strongly related.
Valence versus dominance dimensions
Figure 7 shows the distribution of word ratings in the affective space defined by the valence and dominance dimensions. In this case, a strong linear relation seems to exist, with words that make people feel happier also making them feel more in control, and words that make people feel sadder also making them feel less in control. This was confirmed by a simple regression analysis [r = .849; F(1, 1119) = 2,878.48, p < .0001], with the resulting linear function explaining 72 % of the variance. Moreover, even though the R² change attributable to the inclusion of the quadratic term was statistically significant [F(1, 1118) = 6.56, p = .011], due to the large sample size, the improvement of the explanatory power was negligible (R² change < .002).
Visual inspection of Fig. 7, however, reveals an anomalous distribution of dominance values across the valence scale, with a greater dispersion of dominance values for very negative words. We found a possible explanation for this particular pattern when we inspected the items with lower valence values, in search of what characterized words with opposite dominance ratings. We found that negative words higher in dominance, as evidenced by their standard residuals, were related to different emotions, with a strong prevalence of words referring to anger (e.g., infuriato/“enraged,” arrabbiato/“angry”), hate (e.g., avversione/“hatred,” odioso/“obnoxious”), disgust (e.g., vomito/“vomit,” sporcizia/“filth”), irritation (e.g., arrogante/“arrogant,” insolente/“insolent”), and contempt (e.g., traditore/“traitor,” codardo/“coward”). For negative words lower in dominance, a strong prevalence was apparent of words referring to fear (e.g., morto/“dead,” malattia/“illness”), anxiety/stress (e.g., stressato/“distressed,” ansioso/“anxious”), sadness/despair (e.g., funerale/“funeral,” lutto/“bereavement”), and shame (e.g., imbarazzato/“embarrassed,” vergognato/“shamed”). These results are in line with previous findings showing that these negative emotions are opposed along the dominance scale (Fontaine, Scherer, Roesch, & Ellsworth, 2007), which would thus be vital in discriminating different emotions within the affective space.
Nonetheless, the very strong relation that we found between emotional valence and dominance raises the question of whether a truly orthogonal third affective dimension (i.e., dominance) exists (e.g., Warriner et al., in press), as was assumed by the original ANEW study (Bradley & Lang, 1999) on the basis of the original factor analysis performed by Osgood et al. (1957). Although this is still under debate, the dominance dimension has been identified as an independent factor in successive studies (see, e.g., Fontaine et al., 2007), and our findings seem to support the important role of this dimension and its relative independence from valence.
Affective words in 3-D
To further investigate both the anomalous pattern of dominance variability for words with more extreme valence ratings and the unique contribution given by the dominance scale in accounting for affective information, we explored how our words were distributed over the three-dimensional affective space given by the three orthogonal axes of valence, arousal and dominance. To this aim, we performed a multiple polynomial regression analysis with mean arousal rating as dependent variable (z) and mean valence rating (x), mean dominance rating (y), their squares, and their interaction as predictors. The resulting model, z = 11.823 – 1.257x – 1.370y + 0.063x² + 0.110y² + 0.084xy, was highly significant, explaining 37.61 % of the variance [r = .613; F(5, 1115) = 134.44, p < .0001]. Moreover, it outperformed both the simpler quadratic models in predicting arousal on the basis of either valence or dominance variables alone, as the R² change due to the inclusion of the terms for the other dimension and for the valence by dominance interaction was significant [respectively, R² change = 4.70 % and 11.84 %; Fs(3, 1115) = 45.21 and 113.76, both ps < .0001]. Moreover, it seems that the relatively higher dispersion of dominance values for negative words can be accounted for by the differences in their arousal values. Indeed, the negative words higher in dominance tended to have lower arousal values, whereas negative words lower in dominance tended to have higher arousal values. Therefore, these results indicate that the exploration of emotional meaning of affective words by means of simple two-dimensional models, such as the valence-versus-arousal model, may fail to account for important sources of variation in the emotion domain. In other words, we suggest that all three affective dimensions are needed to adequately represent the subtle similarities and differences in the affective information of emotion words.
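To make the interplay between dominance and arousal for negative words concrete, the reported second-order model can be evaluated at chosen points of the affective space. This is only an illustrative sketch: the function name `predicted_arousal` and the two example points (two hypothetical words with valence = 2 that differ only in dominance) are assumptions, not items from the database.

```python
def predicted_arousal(valence, dominance):
    """Arousal predicted by the reported second-order model
    (all ratings on the 9-point SAM scale)."""
    x, y = valence, dominance
    return (11.823 - 1.257 * x - 1.370 * y
            + 0.063 * x**2 + 0.110 * y**2 + 0.084 * x * y)


# Two hypothetical negative words (valence = 2) that differ only in dominance.
low_dominance = predicted_arousal(2.0, 3.0)
high_dominance = predicted_arousal(2.0, 6.0)
print(low_dominance > high_dominance)
```

For these points, the model predicts higher arousal for the low-dominance word than for the high-dominance word, in keeping with the pattern described above.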
Correlations between affective and psycholinguistic measures
We were also interested in evaluating whether other aspects of word meaning affect the assessments of the emotional impact of a word. To this end, we performed a series of partial correlation and hierarchical regression analyses between the affective ratings of each word and the corresponding subjective and objective psycholinguistic indexes we collected by controlling for the effect of the remaining variables. Moreover, given the high number of comparisons we tested (i.e., 24, since we contrasted the three affective measures with each of the eight psycholinguistic indexes), in order to protect against type I error, we reported only results that were significant at the Bonferroni-corrected α value of .05/24 ≈ .002. However, because of the large size of the word pool, some weak correlations still deviated significantly from zero, but they should be taken with caution, given their small effect size. Note that seven items were excluded from these analyses because they were compound words in the Italian translation (i.e., animale domestico/“pet,” capro espiatorio/“scapegoat,” fuochi d’artificio/“fireworks,” mal di denti/“toothache,” mal di mare/“seasick,” montagne russe/“rollercoaster,” and rapporto sessuale/“intercourse”) and, thus, they have no associated objective psycholinguistic values.
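A partial correlation of this kind can be computed by regressing both variables on the control variables and correlating the residuals. The sketch below illustrates the idea on synthetic data, not on the norms. The function name `partial_corr`, the simulated variables, and the random seed are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Correlation between x and y after regressing both on the covariates."""
    design = np.column_stack([np.ones(len(x)), covariates])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(resid_x, resid_y)[0, 1]


# Bonferroni-corrected threshold: 3 affective measures x 8 indexes = 24 tests.
alpha_corrected = 0.05 / (3 * 8)

# Synthetic illustration: x and y are related only through the covariate z.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
x = z + rng.normal(size=1000)
y = z + rng.normal(size=1000)

raw_r = np.corrcoef(x, y)[0, 1]
partial_r = partial_corr(x, y, z)
print(round(alpha_corrected, 4), round(raw_r, 2), round(partial_r, 2))
```

In this toy case the raw correlation is sizeable, whereas the partial correlation is close to zero once z is controlled for, which is the kind of distinction the partial analyses are meant to capture.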
Regarding the valence dimension, we calculated the partial correlations with each of the psycholinguistic variables (length in letters, CoLFIS log frequency, Repubblica log frequency, number and mean frequency of orthographic neighbors, and familiarity, imageability, and concreteness ratings) while controlling for the effect of the remaining variables as well as dominance, dominance squared and arousal. The only partial correlations that survived the Bonferroni correction were those with both the word frequency indexes (for CoLFIS and Repubblica corpora, respectively, rs = .15 and .112, R²s = .023 and .012, both ps < .0002), showing that positive words are used more frequently than negative words (see Warriner et al., in press, for similar results). This result confirms former findings showing the so-called linguistic positivity bias, the prevalence of positive words in different languages (Augustine, Mehl, & Larsen, 2011; Garcia, Garas, & Schweitzer, 2011; Kloumann et al., 2012; Rozin, Berman, & Royzman, 2010), according to the original “Pollyanna hypothesis” by Boucher and Osgood (1969), claiming that “people tend to look on (and talk about) the bright side of life” (p. 1).
Regarding the dominance dimension, we calculated the partial correlations with each of the psycholinguistic variables by controlling for the effect of other psycholinguistic variables as well as valence, valence squared and arousal. Only one partial correlation was significant after the Bonferroni correction, that is, that with the familiarity ratings (r = .105, R² = .011, p < .0005): Participants felt more dominant when facing words that they either use or are exposed to more often, corroborating the finding of Warriner et al. (in press). In other words, participants showed a sort of “fear of the unknown,” that is, a tendency to feel threatened or vulnerable when facing something unknown, a universal, adaptive disposition developed to minimize the likelihood of being harmed by unknown/unfamiliar individuals, animals or circumstances (according to the so-called smoke-detector principle; Nesse, 2005). Furthermore, this result may also represent a dominance bias in language similar to the abovementioned positivity bias: We may more often use words toward which we stand in an active/dominant relation rather than a passive/controlled relation, because life provides us with more events to talk about in which we play a controlling/dominant role. Finally, this result may also be due to a response bias, with participants being reluctant to admit that they often encounter words that lead them to feel vulnerable. However, to the best of our knowledge, no prior reports have described a specific (i.e., controlled for the effect of other variables) relation between familiarity and dominance, and further research is needed to confirm this finding and clarify this relation.
A different pattern emerged for the arousal dimension. Indeed, in this case, a preliminary visual inspection of the scatterplots suggested that the relation between arousal and both imageability and concreteness may be represented better by a quadratic model (see Fig. 8). To evaluate this impression before assessing the relation between arousal and other psycholinguistic indexes, we first performed polynomial regression analyses, with the mean arousal of each word as the dependent variable and, for both imageability and concreteness ratings, the corresponding mean values and their squares as independent variables. In both cases, the resulting quadratic functions (respectively, y = –0.173x² + 2.111x – 0.456 and y = –0.097x² + 1.005x + 3.404) were highly significant (both ps < .0001), explaining 15.65 % and 17.70 % of the variance, respectively. Moreover, for both imageability and concreteness, the quadratic models outperformed the corresponding simpler linear models, since the R² changes due to the inclusion of the quadratic term were nontrivial (i.e., .073 and .061, respectively) and highly significant [respectively, Fs(1, 1118) = 96.37 and 81.98, ps < .0001].
We then performed the two corresponding hierarchical polynomial regressions in order to partial out the effects of other variables. For the effect of imageability on arousal, all the remaining psycholinguistic variables were entered in a first step, along with valence, valence squared, dominance, and dominance squared, whereas imageability and its square (or concreteness and its square, when assessing the effect of concreteness) were entered in a second step. For imageability, the model at the first step predicted 42.47 % of the variance [F(11, 1102) = 73.95, p < .0001] whereas the final model predicted 43.87 % of the variance [F(13, 1100) = 66.15, p < .0001], and the inclusion of imageability and its square as predictors significantly accounted for an additional 1.4 % of the variance [F(2, 1100) = 13.77, p < .0001]. The same was true for concreteness: The final model outperformed the first-step model [respectively, R²s = .438 and .418; F(13, 1100) = 65.85 and F(11, 1102) = 71.84, both ps < .0001], since the inclusion of concreteness and its square as predictors significantly accounted for an additional 2 % of the variance [F(2, 1100) = 19.58, p < .0001]. Therefore, results indicate that words that are hard or very easy to imagine make people feel calm, whereas those in the middle of the imageability range increase excitement, confirming the findings of a recent work (Warriner et al., in press). In the same vein, words that are very abstract or very concrete make people feel calm, whereas those in the middle of the concreteness range increase excitement.
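As a rough check on the “middle of the range” reading, the rating at which predicted arousal peaks can be derived from the vertex of each simple quadratic fit reported above. This sketch uses the coefficients of the simple (not the hierarchical) models, so it ignores the control variables; the function name `quadratic_peak` is hypothetical.

```python
def quadratic_peak(a, b, c):
    """Return (x, y) at the vertex of y = a*x**2 + b*x + c."""
    x_peak = -b / (2.0 * a)
    return x_peak, a * x_peak**2 + b * x_peak + c


# Coefficients of the simple quadratic fits of arousal on each rating.
imageability_peak = quadratic_peak(-0.173, 2.111, -0.456)
concreteness_peak = quadratic_peak(-0.097, 1.005, 3.404)
print(imageability_peak, concreteness_peak)
```

Because both leading coefficients are negative, each vertex is a maximum: predicted arousal is highest at an imageability rating of about 6.1 and a concreteness rating of about 5.2, and falls off toward both ends of the scale.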
Having assessed the nonlinear effects of both imageability and concreteness on arousal ratings, we calculated the partial correlations between arousal and each of the remaining psycholinguistic variables by controlling for the effect of the other psycholinguistic variables as well as for valence, valence squared, dominance, dominance squared, imageability squared, and concreteness squared. As with valence, the partial correlations that survived the Bonferroni correction were those with both word frequency indexes (for CoLFIS and Repubblica corpora, respectively, rs = .151 and .149, R²s = .028 and .022, both ps < .0001), showing that, after controlling for the effects of other variables, words used more frequently were rated on average as being more arousing than less frequent words, probably because the former were more salient for the participants. To the best of our knowledge, this is the first evidence of such a relation between frequency of use and the arousal ratings of words. Indeed, previous studies did not find significant correlations between arousal and word frequency (Warriner et al., in press), even after partialing out the effects of other psycholinguistic and affective indexes (Citron, Weekes, & Ferstl, 2012). However, it should be noted that the partial correlation that we found was opposite in sign to the simple correlations between arousal and both word frequency indexes, which were positive, albeit nonsignificant (rs = .037 and .098 for the CoLFIS and Repubblica corpora, respectively). Therefore, these findings should be taken with caution and further research is needed to confirm them.
As already noted, the variability of the affective ratings was affected by the absolute difference between the mean ratings of male and female participants. In the following, we extend the investigation of gender differences in the word assessments obtained in our normative study. First, the ratings of men and women were highly correlated for the arousal (r = .720), dominance (r = .761), and, especially, valence (r = .939) dimensions (all ps < .0001), suggesting that arousal and dominance ratings are more unstable not only across languages, as shown above (see also Eilola & Havelka, 2010; Redondo et al., 2007; Warriner et al., in press), but also between men and women. In this case, the lower cross-gender correlation we found for the arousal and dominance dimensions may possibly be explained by cross-individual variability in perceiving the arousal and dominance scales, but also by gender differences in emotional reactivity and control, in line with the stereotype that women are more emotional than men (e.g., Fischer, 2000). However, this view leads to an additional prediction, that is, women should rate affective stimuli with higher arousal and lower dominance values than do men.
To test this prediction, we compared the mean affective ratings by male and female participants by means of two-tailed paired t tests. We found that the average valence ratings were significantly higher for men than for women [Ms = 5.28 and 5.18, SDs = 1.93 and 2.15, respectively; t(1120) = 4.77, p < .0001], although the effect size was very small (d = 0.134). The same result, but with a much larger effect size, emerged for the dominance ratings, with men reporting that they controlled affective stimuli better than women did [Ms = 5.39 and 5.16, SDs = 1.01 and 1.05, respectively; t(1120) = 10.86, p < .0001, d = 0.323]. In contrast, our pool of words made female participants feel reliably more excited, on average, than male participants [Ms = 5.79 and 5.29, SDs = 0.93 and 1.08, respectively; t(1120) = 21.90, p < .0001, d = 0.653] (see also Soares et al., 2012, for similar results).
Moreover, we investigated differences between the affective ratings provided by our male and female participants for each word. To this end, we performed a series of two-tailed independent t tests (the degrees of freedom, t statistic, and p value for each comparison are provided). The analysis revealed that the men versus women comparison was significant (p < .05) for 130, 107, and 77 words (corresponding to roughly 12 %, 10 %, and 7 % of the total number of words in the database) for valence, arousal and dominance, respectively (see the SAM_Females_vs._Males columns in the database provided in the supplementary materials). A further investigation of these significant comparisons confirmed the findings of the paired t tests contrasting men’s and women’s mean affective ratings. In fact, a higher number of comparisons showed significantly higher values for men than for women, relative to the inverse pattern, for valence, although this result was only marginally significant (76 vs. 54), χ²(1) = 3.72, p = .054, Φ = .169. The effect size estimate Φ was calculated as √(χ²/N) (see Fritz, Morris, and Richler, 2012). Upon inspection of these items, we identified some categories of words for which men and women provided valence ratings in a consistently different way. Therefore, researchers should take particular care when choosing them as material for their experiments and should take potential gender differences into account. For instance, approximately 13 % of the words rated with significantly higher valence by male than by female responders were either taboo words or sexual terms (e.g., masturbarsi/“masturbate” or vagina/“vagina,” with a male–female difference on the order of three points on the 9-point scale), whereas female responders never provided higher valence ratings than males for any of these words. For this particular category of words, a general gender-dependent difference emerged for all three affective dimensions.
In fact, men generally reported feeling not only happier, but also more aroused and dominant when facing sexual terms and, obviously, especially words denoting female erotic body parts (i.e., seno/“breast” and vagina/“vagina”), in line with previous findings (Warriner et al., in press). Regarding the attitude shown by female responders in rating the valence of words, we found that women generally tended to give more extreme valence ratings, rating “positive words” as more pleasant and “negative words” as less pleasant than men. Indeed, 50 out of the 76 words that were rated with lower valence by women received a mean valence value lower than four, whereas 48 out of the 54 words that were rated with higher valence by women received a mean valence value higher than six (among these latter items, we observed a relative prevalence (≈ 50 %) of words related to the family, nature, and domestic domains).
Regarding the dominance dimension, a significantly higher number of comparisons showed higher values for men than women (62 vs. 15), χ²(1) = 28.69, p < .0001, Φ = .610, confirming the result of the t tests contrasting men’s and women’s mean dominance ratings. As for valence, women tended to give more extreme dominance ratings, reporting that they felt less in control toward “powerful” words and more in control toward “weak” words than did men. Indeed, results showed that all of the few words to which female responders were more dominant than men were “weak,” pleasant words (i.e., with a mean dominance rating higher than 5), with some of them related to the family (sposo/“spouse,” bambino/“child”) or nature (primavera/“spring,” brezza/“breeze,” or aurora/“dawn”) domains, whereas approximately two thirds of the 62 words to which female responders were less dominant than men were “powerful,” unpleasant words, with higher differences for “scary” stimuli related to death, threat or pain (e.g., morte/“death,” violento/“violent,” paura/“fear,” or panico/“panic”), excluding the abovementioned sexual terms.
Finally, the pattern of gender differences confirmed the overall t test for arousal as well, as a significantly higher number of comparisons showed significantly higher values for women than for men (100 vs. 7), χ²(1) = 80.83, p < .0001, Φ = .869. In this case, female responders reported being more aroused by both pleasant (domestic- and family-related words such as giardino/“garden,” abitazione/“house,” ragazzo/“boy,” or moglie/“wife”) and unpleasant (“scary” words such as rapinatore/“robber,” pugnale/“dagger,” cimitero/“cemetery,” or morte/“death”) categories of words. This is in line with the quadratic relation between valence and arousal. Indeed, since our data showed that women tended to rate both pleasant and unpleasant words with more extreme hedonic values than men, and given that these words generally took higher arousal values, it follows that they also tended to show higher arousal ratings to both pleasant and unpleasant words.
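The three χ² tests on the direction of the significant word-level differences (valence, dominance, and arousal), together with their Φ effect sizes, can be recomputed from the counts given in the text. The sketch below assumes that each test compares the two counts against an equal (50/50) split, which is not stated explicitly but reproduces the reported statistics; the function name `chi_square_equal_split` is hypothetical.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square (df = 1) for two counts against an assumed 50/50 split,
    together with the effect size phi = sqrt(chi2 / N)."""
    n = count_a + count_b
    expected = n / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    return chi2, math.sqrt(chi2 / n)


# Counts of significant word-level comparisons, as reported in the text.
comparisons = [("valence", 76, 54), ("dominance", 62, 15), ("arousal", 100, 7)]
for label, a, b in comparisons:
    chi2, phi = chi_square_equal_split(a, b)
    print(f"{label}: chi2 = {chi2:.2f}, phi = {phi:.3f}")
```

Under this assumption, the computed values agree with the reported χ²(1) = 3.72, 28.69, and 80.83 and Φ = .169, .610, and .869.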
Taken together, the results of the analyses on gender differences seem to confirm the idea that women are more emotional than men, as they experience and express emotions with greater intensity and have less control over them than men. Our results are in line with the Portuguese adaptation of ANEW (Soares et al., 2012), in which female participants rated words as significantly more arousing and with more extreme valence values than did men, but no difference emerged for the dominance scale. In contrast, the original ANEW study (Bradley & Lang, 1999) showed gender differences only for the dominance dimension. In addition, other studies showed that female responders tend to give more extreme hedonic ratings to affective stimuli (e.g., Bellezza, Greenwald, & Banaji, 1986; Vasa, Carlino, London, & Min, 2006), especially for negative words (e.g., Bradley et al., 2001), and to experience and express emotion more strongly than men (e.g., Bradley et al., 2001; Fujita, Diener, & Sandvik, 1991) and with less control, at least for both very arousing unpleasant stimuli and stimuli with erotic content (Bradley et al., 2001).
However, the gender differences we reported should not take away from the high consistency between women’s and men’s affective ratings that we found, as shown by the still high cross-gender correlations. Therefore, the data we presented suggest that despite specific gender differences, men and women are quite comparable in their affective experience of a large number of affective words. Moreover, it should be noted that the number of words for which we found a gender difference in at least one of the three affective dimensions is reasonably low, especially considering that in this case we did not correct the α level for multiple comparisons. Therefore, the gender differences we reported should be treated flexibly, as it is possible to choose more or less strict criteria depending on the needs of a specific experimental paradigm. Indeed, it is important here to note that our purpose was not to evaluate the validity of the idea that women are more emotional than men, but rather, without arguing either for or against this position, to highlight any potential gender difference to provide researchers with a tool that would allow them to use highly controlled affective stimuli.
Conclusions, limitations, and future directions
In sum, the present study was carried out in line with the dimensional perspective of emotion, which proposes orthogonal dimensions of emotional valence, arousal and dominance. This perspective underlines the importance of manipulating or controlling the values of words along these dimensions when investigating affective word processing and implies the need for standardized stimulus databases such as the ANEW. The study we present here is an adaptation of the ANEW to Italian in an Italian population. In fact, as research on word processing attempts to further clarify emotional influence, the demand for highly controlled tools is growing. We followed the methods of other ANEW norms by using SAM, a standard assessment system whose effectiveness for rating affective states has been proved (Bradley & Lang, 1994), but have taken a further step by including a range of information regarding psycholinguistic indexes already known to influence word processing. We did not include a measure of the age of acquisition (AoA) of the words, which may be a limitation. In fact, studies have shown that AoA can make an important contribution to lexical processing (Barca, Burani, & Arduino, 2002; Brysbaert & Ghyselinck, 2006; Dell’Acqua, Lotto, & Job, 2000; Juhasz, 2005) and is related to the affective properties of words (Citron et al., 2012; Warriner et al., in press). Therefore, a future study could expand the present database to include AoA values.
To conclude, we propose that, as for the original ANEW, the availability of its Italian adaptation will contribute to improve research in many different domains. This database enables researchers to use highly controlled Italian verbal stimuli for the study of emotion and allows them to investigate the relation between cognition and emotion more reliably.
Altarriba, J., & Bauer, L. M. (2004). The distinctiveness of emotion concepts: A comparison between emotion, abstract, and concrete words. American Journal of Psychology, 117, 389–410.
Altarriba, J., Bauer, L. M., & Benvenuto, C. (1999). Concreteness, context availability, and imageability ratings and word associations for abstract, concrete, and emotion words. Behavior Research Methods, Instruments, & Computers, 31, 578–602. doi:10.3758/BF03200738
Augustine, A. A., Mehl, M. R., & Larsen, R. J. (2011). A positivity bias in written and spoken English and its moderation by personality and gender. Social Psychological and Personality Science, 2, 508–515.
Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283–316. doi:10.1037/0096-3445.133.2.283
Barca, L., Burani, C., & Arduino, L. S. (2002). Word naming times and psycholinguistic norms for Italian nouns. Behavior Research Methods, Instruments, & Computers, 34, 424–434. doi:10.3758/BF03195471
Baroni, M., Bernardini, S., Comastri, F., Piccioni, L., Volpi, A., Aston, G., & Mazzoleni, M. (2004). Introducing the La Repubblica corpus: A large, annotated, TEI (XML)-compliant corpus of newspaper Italian. Proceedings of LREC, 2, 5,163.
Bellezza, F. S., Greenwald, A. G., & Banaji, M. R. (1986). Words high and low in pleasantness as rated by male and female college students. Behavior Research Methods, Instruments, & Computers, 18, 299–303.
Bertinetto, P. M., Burani, C., Laudanna, A., Marconi, L., Ratti, D., Rolando, C., & Thornton, A. M. (2005). Corpus e lessico di frequenza dell’italiano scritto (CoLFIS) [Corpus and frequency lexicon for written Italian]. Retrieved from www.istc.cnr.it/grouppage/colfisEng
Boucher, J., & Osgood, C. E. (1969). The Pollyanna hypothesis. Journal of Verbal Learning and Verbal Behavior, 8, 1–8.
Bradley, M. M., Codispoti, M., Sabatinelli, D., & Lang, P. J. (2001). Emotion and motivation II: Sex differences in picture processing. Emotion, 1, 300–319. doi:10.1037/1528-3542.1.3.300
Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The Self-Assessment Manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25, 49–59.
Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings (Technical Report C-1). Gainesville, FL: The Center for Research in Psychophysiology, University of Florida.
Bradley, M. M., & Lang, P. J. (2007a). Affective Norms for English Text (ANET): Affective ratings of text and instruction manual (Technical Report No. D-1). Gainesville, FL: University of Florida, NIMH Center for Research in Psychophysiology.
Bradley, M. M., & Lang, P. J. (2007b). International Affective Digitized Sounds (2nd Edition; IADS-2): Affective ratings of sounds and instruction manual (Technical Report No. B-3). Gainesville, FL: University of Florida, NIMH Center for Research in Psychophysiology.
Brysbaert, M., & Ghyselinck, M. (2006). The effect of age of acquisition: Partly frequency related, partly frequency independent. Visual Cognition, 13, 992–1011. doi:10.1080/13506280544000165
Chen, S. H., Kennedy, M., & Zhou, Q. (2012). Parents’ expression and discussion of emotion in the multilingual family: Does language matter? Perspectives on Psychological Science, 7, 365–383.
Chiarello, C., Shears, C., & Lund, K. (1999). Imageability and distributional typicality measures of nouns and verbs in contemporary English. Behavior Research Methods, Instruments, & Computers, 31, 603–637. doi:10.3758/BF03200739
Citron, F. M., Weekes, B. S., & Ferstl, E. C. (2012). How are affective word ratings related to lexicosemantic properties? Evidence from the Sussex Affective Word List. Applied Psycholinguistics, 1, 1–19.
Coltheart, M., Davelaar, E., Jonasson, J. T., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and performance VI (pp. 535–555). Hillsdale, NJ: Erlbaum.
Darlington, C. L. (2009). The female brain (2nd ed.). Boca Raton, FL: CRC Press.
Davis, C. J., & Perea, M. (2005). BuscaPalabras: A program for deriving orthographic and phonological neighborhood statistics and other psycholinguistic indices in Spanish. Behavior Research Methods, 37, 665–671. doi:10.3758/BF03192738
Dell’Acqua, R., Lotto, L., & Job, R. (2000). Naming times and standardized norms for the Italian PD/DPSS set of 266 pictures: Direct comparisons with American, English, French, and Spanish published databases. Behavior Research Methods, Instruments, & Computers, 32, 588–615. doi:10.3758/BF03200832
Desrochers, A., Liceras, J. M., Fernández-Fuertes, R., & Thompson, G. L. (2010). Subjective frequency norms for 330 Spanish simple and compound words. Behavior Research Methods, 42, 109–117. doi:10.3758/BRM.42.1.109
Eilola, T. M., & Havelka, J. (2010). Affective norms for 210 British English and Finnish nouns. Behavior Research Methods, 42, 134–140. doi:10.3758/BRM.42.1.134
Fischer, A. (Ed.). (2000). Gender and emotion: Social psychological perspectives. New York, NY: Cambridge University Press.
Fontaine, J. R., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotions is not two-dimensional. Psychological Science, 18, 1050–1057.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18. doi:10.1037/a0024338
Fujita, F., Diener, E., & Sandvik, E. (1991). Gender differences in negative affect and well-being: The case for emotional intensity. Journal of Personality and Social Psychology, 61, 427–434. doi:10.1037/0022-3514.61.3.427
Garcia, D., Garas, A., & Schweitzer, F. (2011). Positive words carry less information than negative words. arXiv, 1110.4123.
Gilet, A. L., Grühn, D., Studer, J., & Labouvie-Vief, G. (2012). Valence, arousal, and imagery ratings for 835 French attributes by young, middle-aged, and older adults: The French Emotional Evaluation List (FEEL). European Review of Applied Psychology, 62, 173–181. doi:10.1016/j.erap.2012.03.003
Hildebrandt, A., Schacht, A., Sommer, W., & Wilhelm, O. (2012). Measuring the speed of recognising facially expressed emotions. Cognition and Emotion, 26, 650–666. doi:10.1080/02699931.2011.602046
Janschewitz, K. (2008). Taboo, emotionally valenced, and emotionally neutral word norms. Behavior Research Methods, 40, 1065–1074. doi:10.3758/BRM.40.4.1065
Juhasz, B. J. (2005). Age-of-acquisition effects in word and picture identification. Psychological Bulletin, 131, 684–712. doi:10.1037/0033-2909.131.5.684
Kanske, P., & Kotz, S. A. (2011). Cross-modal validation of the Leipzig Affective Norms for German (LANG). Behavior Research Methods, 43, 409–413. doi:10.3758/s13428-010-0048-6
Kelly, M. M., Forsyth, J. P., & Karekla, M. (2006). Sex differences in response to a panicogenic challenge procedure: An experimental evaluation of panic vulnerability in a non-clinical sample. Behaviour Research and Therapy, 44, 1421–1430.
Kloumann, I. M., Danforth, C. M., Harris, K. D., Bliss, C. A., & Dodds, P. S. (2012). Positivity of the English Language. PLoS ONE, 7, e29484. doi:10.1371/journal.pone.0029484
Kristensen, C. H., Gomes, C. F. de A., Justo, A. R., & Vieira, K. (2011). Brazilian norms for the Affective Norms for English Words. Trends in Psychiatry and Psychotherapy, 33, 135–146. doi:10.1590/S2237-60892011000300003
Kuchinke, L., Jacobs, A. M., Võ, M. L. H., Conrad, M., Grubich, C., & Herrmann, M. (2006). Modulation of prefrontal cortex activation by emotional words in recognition memory. NeuroReport, 17, 1037–1041. doi:10.1097/01.wnr.0000221838.27879.fe
Labouvie-Vief, G., Lumley, M. A., Jain, E., & Heinze, H. (2003). Age and gender differences in cardiac reactivity and subjective emotion responses to emotional autobiographical memories. Emotion, 3, 115–126. doi:10.1037/1528-3542.3.2.115
Lahl, O., Goritz, A. S., Pietrowsky, R., & Rosenberg, J. (2009). Using the World-Wide Web to obtain large-scale word norms: 190,212 ratings on a set of 2,654 German nouns. Behavior Research Methods, 41, 13–19. doi:10.3758/BRM.41.1.13
Lang, P. J. (1980). Behavioral treatment and bio-behavioral assessment: Computer applications. In J. B. Sidowski, J. H. Johnson, & T. A. Williams (Eds.), Technology in mental health and delivery systems (pp. 119–137). Norwood, NJ: Ablex.
Lang, P. J., & Bradley, M. M. (2010). Emotion and the motivational brain. Biological Psychology, 84, 437–450.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1997). Motivated attention: Affect, activation, and action. Attention and orienting: Sensory and motivational processes (pp. 97–135). Mahwah, NJ: Erlbaum.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (2008). International Affective Picture System (IAPS): Affective ratings of pictures and instruction manual (Technical Report A-8). Gainesville, FL: University of Florida, Center for Research in Psychophysiology.
Leu, J., Wang, J., & Koo, K. (2011). Are positive emotions just as “positive” across cultures? Emotion, 11, 994–999. doi:10.1037/a0021332
Mammarella, N., Borella, E., Carretti, B., Leonardi, G., & Fairfield, B. (2013). Examining an emotion enhancement effect in working memory: Evidence from age-related differences. Neuropsychological Rehabilitation, 23, 416–428. doi:10.1080/09602011.2013.775065
Mathewson, K. J., Arnell, K. M., & Mansfield, C. A. (2008). Capturing and holding attention: The impact of emotional words in rapid serial visual presentation. Memory & Cognition, 36, 182–200. doi:10.3758/MC.36.1.182
Miller, N. E. (1959). Liberalization of basic SR concepts: Extensions to conflict behavior, motivation, and social learning. New York, NY: McGraw-Hill.
Moltò, J., Montañés, S., Poy, R., Segarra, P., Pastor, M. C., Tormo Irún, M. P., . . . Vila, J. (1999). Un método para el estudio experimental de las emociones: El International Affective Picture System (IAPS). Adaptación española. Revista de psicología general y aplicada: Revista de la Federación Española de Asociaciones de Psicología, 52, 55–87.
Montefinese, M., Ambrosini, E., Fairfield, B., & Mammarella, N. (2013). Semantic memory: A feature-based analysis and new norms for Italian. Behavior Research Methods, 45, 440–461. doi:10.3758/s13428-012-0263-4
Moors, A., De Houwer, J., Hermans, D., Wanmaker, S., van Schie, K., Van Harmelen, A. L., . . . Brysbaert, M. (2013). Norms of valence, arousal, dominance, and age of acquisition for 4,300 Dutch words. Behavior Research Methods, 45, 169–177. doi:10.3758/s13428-012-0243-8
Nesse, R. M. (2005). Natural selection and the regulation of defenses: A signal detection analysis of the smoke detector principle. Evolution and Human Behavior, 26, 88–105.
Ortigue, S., Michel, C. M., Murray, M. M., Mohr, C., Carbonnel, S., & Landis, T. (2004). Electrical neuroimaging reveals early generator modulation to emotional words. NeuroImage, 21, 1242–1251. doi:10.1016/j.neuroimage.2003.11.007
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press.
Ratcliff, R., Hockley, W., & McKoon, G. (1985). Components of activation: Repetition and priming effects in lexical decision and recognition. Journal of Experimental Psychology: General, 114, 435–450. doi:10.1037/0096-3445.114.4.435
Redondo, J., Fraga, I., Padrón, I., & Comesaña, M. (2007). The Spanish adaptation of ANEW (Affective Norms for English words). Behavior Research Methods, 39, 600–605. doi:10.3758/BF03193031
Rozin, P., Berman, L., & Royzman, E. (2010). Biases in use of positive and negative words across twenty natural languages. Cognition and Emotion, 24, 536–548.
Russell, J. A. (1991). Culture and the categorization of emotions. Psychological Bulletin, 110, 426–450. doi:10.1037/0033-2909.110.3.426
Schneirla, T. C. (1959). An evolutionary and developmental theory of biphasic processes underlying approach and withdrawal. In M. R. Jones (Ed.), Nebraska Symposium on Motivation (pp. 1–42). Lincoln, NE: University of Nebraska Press.
Scott, G. G., O’Donnell, P. J., Leuthold, H., & Sereno, S. C. (2009). Early emotion word processing: Evidence from event-related potentials. Biological Psychology, 80, 95–104. doi:10.1016/j.biopsycho.2008.03.010
Soares, A. P., Comesaña, M., Pinheiro, A. P., Simões, A., & Frade, C. S. (2012). The adaptation of the Affective Norms for English Words (ANEW) for European Portuguese. Behavior Research Methods, 44, 256–269. doi:10.3758/s13428-011-0131-7
Stormark, K. M., Nordby, H., & Hugdahl, K. (1995). Attentional shifts to emotionally charged cues—Behavioral and ERP data. Cognition & Emotion, 9, 507–523. doi:10.1080/02699939508408978
Timmers, M., Fischer, A., & Manstead, A. (2003). Ability versus vulnerability: Beliefs about men’s and women’s emotional behaviour. Cognition and Emotion, 17, 41–63. doi:10.1080/02699930302277
Vasa, R. A., Carlino, A. R., London, K., & Min, C. (2006). Valence ratings of emotional and non-emotional words in children. Personality and Individual Differences, 41, 1169–1180.
Vila, J., Sánchez, M., Ramírez, I., Fernández, M., Cobos, P., Rodríguez, S., . . . Segarra, P. (2001). El Sistema Internacional de Imágenes Afectivas (IAPS): Adaptación española, Segunda parte. Revista de Psicología General y Aplicada, 54, 635–657.
Võ, M. L., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin Affective Word List Reloaded (BAWL-R). Behavior Research Methods, 41, 534–538. doi:10.3758/BRM.41.2.534
Warriner, A. B., Kuperman, V., & Brysbaert, M. (in press). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods. doi:10.3758/s13428-012-0314-x
Watson, D., Wiese, D., Vaidya, J., & Tellegen, A. (1999). The two general activation systems of affect: Structural findings, evolutionary considerations, and psychobiological evidence. Journal of Personality and Social Psychology, 76, 820–838. doi:10.1037/0022-3514.76.5.820
Cite this article
Montefinese, M., Ambrosini, E., Fairfield, B. et al. The adaptation of the Affective Norms for English Words (ANEW) for Italian. Behav Res 46, 887–903 (2014). https://doi.org/10.3758/s13428-013-0405-3
- Affective norms
- Italian language
- Psycholinguistic indexes