Altogether, 1,085,998 ratings were collected across all three dimensions. Around 3 % of the data were removed due to missing responses, lack of variability in responses (i.e., providing the same rating for all words in the list), or the completion of fewer than 100 ratings per assignment. The valence and arousal ratings were reversed post-hoc to maintain a more intuitive low-to-high scale (e.g., sad to happy rather than happy to sad) across all three dimensions. Means and standard deviations were calculated for each word. Ratings in assignments with negative correlations between a given participant’s rating and the mean for that word were reversed (9 %). This was done on the basis of both empirical evidence that higher numbers intuitively go with positive anchors (Rammstedt & Krebs, 2007) and an examination of these participants’ responses, which revealed unintuitive answers (e.g., indicating that negative words such as “jail” made them very happy). Any remaining assignments with ratings that correlated with the mean ratings per item at less than .10 were removed, and the means and standard deviations were recalculated. The final data set consisted of 303,539 observations for valence (95 % of the original data pool), 339,323 observations for arousal (89 % of the original data pool), and 281,735 observations for dominance (74 % of the original data pool). A total of 1,827 responders contributed to this final data set, with 362 of them completing assignments for two or more dimensions. A total of 144 participants completed two or more assignments within a single dimension.
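The reversal and exclusion steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the norms: the function names, the nested-dictionary layout, and the reversal formula (10 minus the rating, assuming a 1-to-9 scale) are hypothetical, and the text does not say whether correlations were recomputed after the reversal, which this sketch assumes.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance
    (a convention of this sketch, not of the source)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def word_means(assignments):
    """Mean rating per word over all assignments."""
    totals = {}
    for ratings in assignments.values():
        for word, rating in ratings.items():
            total, count = totals.get(word, (0.0, 0))
            totals[word] = (total + rating, count + 1)
    return {word: total / count for word, (total, count) in totals.items()}

def agreement(ratings, means):
    """Correlation between one assignment and the per-word means."""
    words = sorted(ratings)
    return pearson([ratings[w] for w in words], [means[w] for w in words])

def clean_assignments(assignments, scale_max=9, drop_below=0.10):
    """assignments: {assignment_id: {word: rating}} (hypothetical layout)."""
    means = word_means(assignments)
    cleaned = {}
    for aid, ratings in assignments.items():
        # Step 1: reverse assignments that correlate negatively with the means.
        if agreement(ratings, means) < 0:
            ratings = {w: scale_max + 1 - r for w, r in ratings.items()}
        cleaned[aid] = ratings
    # Step 2 (assumed order): recompute means, then drop weak assignments.
    means = word_means(cleaned)
    kept = {aid: r for aid, r in cleaned.items()
            if agreement(r, means) >= drop_below}
    # Step 3: recompute the per-word means on the retained assignments.
    return kept, word_means(kept)
```

Standard deviations would be recomputed in the same final pass; they are omitted here to keep the sketch short.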
For valence, 51 words received fewer than 18 (but more than 15) valid ratings. For arousal, 128 words had a total number of ratings in that range. For dominance, 564 words had a total of either 16 or 17 ratings, and 17 words had 14 or 15 ratings each. For all three dimensions, more than 87 % of the words had between 18 and 30 ratings per word. A total of 50 words in each dimension received more than 70 ratings each, due to the doubling up of ANEW words and the rerunning of lists. To illustrate how our data enrich the set of words available in ANEW, Table 1 provides examples of words that are not included in the ANEW list and that show very high or very low ratings in one of the three dimensions.
Of the 1,827 valid responders, approximately 60 % were female in all three cases (419 valence, 448 arousal, and 505 dominance). Their ages ranged from 16 to 87 years, with 11 % being 20 years old or younger; 45 % from 21 to 30; 21 % from 31 to 40; 11 % from 41 to 49; and 12 % age 50 or older. Of the participants, 24 (3.3 %), 32 (4.3 %), and 23 (2.7 %) for the valence, arousal, and dominance dimensions, respectively, reported a native language other than English, while 10 (1.4 %), 12 (1.6 %), and 12 (1.4 %) participants, respectively, reported more than one native language, including English. Table 2 shows the numbers of participants at each of the seven possible education levels. Most had some college or a bachelor’s degree.
Table 3 reports descriptive statistics for the three distributions of ratings. The distributions of both valence and dominance ratings are negatively skewed (G1 = −.28 and −.23, respectively), with 55 % of the words rated above the median of the rating scale for both dimensions (see Fig. 1). The Mann–Whitney one-sample median test indicated that the medians of both the valence and dominance distributions were not significantly different from rating 5, which is the median of the scales (both ps > .1). The tendency for more words to make people feel happy and in control goes along with numerous former findings of positivity biases in English and other languages (see Augustine, Mehl, & Larsen, 2011, and Kloumann et al., 2012). The positivity bias, or the prevalence of positive word types in English books, Twitter messages, music lyrics, and other genres of texts, is argued to reflect the preference of humankind for pro-social and benevolent communication. Arousal, on the other hand, is positively skewed (G1 = .47), meaning that only a relatively small proportion of words (20 % above a rating of 5) made people feel excited.
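For readers who wish to reproduce the skewness values, the sketch below computes a sample skewness coefficient from a list of mean ratings. It assumes that G1 denotes the adjusted Fisher–Pearson coefficient, which is the usual meaning of that symbol but is not stated in the text; the function name is hypothetical.

```python
from math import sqrt

def skewness_g1(values):
    """Adjusted Fisher-Pearson sample skewness (assumed meaning of G1).

    Negative values indicate a longer tail toward low ratings,
    positive values a longer tail toward high ratings.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # uncorrected skewness
    return sqrt(n * (n - 1)) / (n - 2) * g1        # small-sample correction
```

With roughly 14,000 words the correction factor is very close to 1, so the corrected and uncorrected coefficients would be nearly identical.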
Ratings of valence were relatively consistent across participants, while arousal and dominance were much more variable. This is indicated by the difference between the average standard deviations of the dimensions: 1.68 for valence, but 2.30 and 2.16 for arousal and dominance, respectively. In addition, the split-half reliabilities were .914 for valence, .689 for arousal, and .770 for dominance; see below for other examples of a higher variability of dominance and arousal ratings. Figure 2a–c show, for the three emotional dimensions, the means of the ratings for each word plotted against their standard deviations, with each scatterplot’s smoother lowess line demonstrating the overall trend in the data (red solid lines). For illustrative purposes, each plot is supplied with selected examples of words that are substantially more or less variable than other words with the given mean rating. Swear words, taboo words, and sexual terms account for a disproportionally large number of words that elicit more variable ratings of valence and arousal than would be expected given the words’ mean ratings (shown as words in blue above the red lowess line in Fig. 2a–c), in line with Kloumann et al. (2012). Below we will demonstrate that the greater variability for such words may be due to gender differences in the norms.
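One common way to obtain a split-half reliability of the kind reported above is to split the ratings of each word into two random halves, average each half, and correlate the two sets of means across words. The sketch below illustrates that approach; the text does not specify how the halves were formed or whether a Spearman–Brown correction was applied, so both choices, along with all function names, are assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(ratings_by_word, seed=0):
    """ratings_by_word: {word: [ratings]} with at least two ratings per word.

    Returns the uncorrected correlation between the two half-sample means.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    first, second = [], []
    for word in sorted(ratings_by_word):
        ratings = list(ratings_by_word[word])
        rng.shuffle(ratings)
        half = len(ratings) // 2
        first.append(sum(ratings[:half]) / half)
        second.append(sum(ratings[half:]) / (len(ratings) - half))
    return pearson(first, second)

def spearman_brown(r):
    """Optional step-up correction to full-sample reliability (assumption)."""
    return 2 * r / (1 + r)
```

Averaging the result over many seeds would give a more stable estimate than a single random split.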
For valence, the scatterplot in Fig. 2a (top left) is symmetrical about the median, with relatively positive or negative words associated with smaller variability in the ratings across participants, as compared to valence-neutral words (see Moors et al., in press, for a similar finding in Dutch). The same holds for the pattern observed in the dominance ratings, Fig. 2c (bottom left). The plot of valence strength (absolute difference between the valence rating and the median of valence ratings; Fig. 2d) corroborates the tendency of more extreme (positive or negative) words to be less variable in their ratings than neutral ones. In contrast, for arousal in Fig. 2b (top right), words that make people feel calm generally elicit more consistent ratings than do those that make people feel excited. To sum up, in terms of the variability of ratings, valence and dominance pattern together and are best considered in terms of their magnitude (how strong is the feeling) rather than their polarity (sad vs. happy, or controlled by vs. in control); polarity, however, determines variability in the arousal ratings.
Correlations between dimensions
We found the typical U-shaped relationship between arousal and valence (see Fig. 3a; Bradley & Lang, 1999; Redondo, Fraga, Padrón, & Comesaña, 2007; Soares, Comesaña, Pinheiro, Simões, & Frade, 2012): Words that are very positive or very negative are more arousing than those that are neutral. This is corroborated by the positive correlation between valence and arousal for positive words (mean valence rating > 6; r = .273, p < .001) and the negative correlation between valence and arousal for negative words (mean valence rating < 4; r = −.293, p < .001). The relationship between arousal and dominance is also U-shaped (see Fig. 3b), as corroborated by the positive correlation between dominance and arousal for high-rated dominance words (mean rating > 6; r = .139, p < .001) and the negative correlation between dominance and arousal for low-rated dominance words (mean rating < 4; r = −.193, p < .001). The relationship between valence and dominance is linear, with words that make people feel happier also making them feel more in control (see Fig. 3c). Table 4 shows that a quadratic relationship between arousal and valence and between arousal and dominance explains more of the variance than does a linear relationship. However, this does not rule out the possibility that the high and low levels of these associations might be explained better by a regression with a break point at the median of the scale (see Fig. 3). The relationship between dominance and valence, however, is fitted better by a linear model.
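The comparison of linear and quadratic fits can be illustrated with a small least-squares sketch that returns the proportion of variance explained by a polynomial of a given degree. It is a generic illustration with hypothetical function names, not the analysis code behind Table 4, and it solves the normal equations directly, which is adequate for low degrees and ratings on a 9-point scale.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(xs, ys, degree):
    """R^2 of a least-squares polynomial fit of ys on xs."""
    cols = [[x ** p for x in xs] for p in range(degree + 1)]
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
    coef = solve(xtx, xty)
    fitted = [sum(c * x ** p for p, c in enumerate(coef)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

For a symmetric U-shaped relationship, `r_squared(valence, arousal, 1)` is close to zero while `r_squared(valence, arousal, 2)` is substantially higher, which is the pattern described for arousal against valence and dominance.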
The strength of the correlation between dominance and valence casts doubt on the claim that the three dimensions under consideration here are genuinely orthogonal affective states. This assumption was the basis of the original ANEW study (Bradley & Lang, 1999), stemming from original factor analyses done by Osgood, Suci, and Tannenbaum (1957). Future research will have to demonstrate that dominance explains unique variance over and above valence in language-processing behavior. The fact that extreme values of valence and dominance are more arousing points again to the utility of considering valence/dominance strength (i.e., how different a word is from neutral) rather than polarity as the explanatory variable. We return to this point below.
We compared our ratings with several smaller sets of ratings that had been collected previously by other researchers, including the ANEW set from which we drew our control words. The correlations are listed in Table 5.
Valence appears to generalize very well across studies and languages, as evidenced by high correlations. Both arousal and dominance show more variability across languages and studies, as reflected in the lower correlations. Note that these studies themselves (those that have reported the information—i.e., c, d, and e) also found a lower correlation between their arousal and dominance ratings and the arousal and dominance ratings reported in other studies (arousal range = .65 to .75; dominance range = .72 to .73). Importantly, however, cross-linguistic correlations were stronger (the range of Pearson’s r for arousal was .575–.759) than those between gender, age, and education groups within our study (the range of Pearson’s r was .467–.516; see Table 8 below). This observation clearly indicates the validity of using emotional ratings to English glosses of words in a language that does not have an extensive set of ratings at the researcher’s disposal. This seems to be more the case for valence and dominance than for arousal.
Correlations with lexical properties
As is known for other subjective ratings of lexical properties (cf. Baayen, Feldman, & Schreuder, 2006), judgments of the emotional impact of a word are likely to be affected by other aspects of the word’s meaning. Table 6 reports correlations of valence, arousal, and dominance with a range of available semantic variables. In the remainder of the article, words, rather than the trial-level data, were chosen as units of the correlational analyses.
Most of the correlations that the emotional ratings show with other semantic properties are weak to moderate (Cohen, 1992), with the exception of correlations with variables that directly tap into emotional states (h and i in Table 6). Specifically, words that make people happy are easier to picture [r(5123) = .161, p < .001] and more concrete [r(1565) = .105, p < .001], familiar [r(2904) = .206, p < .001], context rich [r(316) = .196, p < .001], and easy to interact with [r(1396) = .203, p < .001], are of high frequency [r(13763) = .182, p < .001], and are learned at an early age [r(13707) = −.233, p < .001]. They are also associated with low pain [r(501) = −.456, p < .001], intense smell [r(501) = .139, p < .01], vivid color [r(1281) = .322, p < .001], pleasant taste [r(501) = .309, p < .001], quiet sounds [r(501) = −.176, p < .001], and stillness [r(501) = −.113, p < .05]. Virtually all of these properties are also associated with words that make people feel in control; that is, they correlate in the same way with dominance ratings.
Words that make people feel excited are more ambiguous [r(1565) = −.258, p < .001], unfamiliar [r(501) = −.193, p < .001], context impoverished [r(316) = −.147, p < .01], and difficult to interact with [r(1396) = −.143, p < .001]. They are also associated with strong general sensory experience [r(5005) = .228, p < .001], specifically with high pain [r(501) = .579, p < .001], unpleasant taste [r(501) = −.102, p < .05], intense sounds [r(501) = .407, p < .001], motion [r(1281) = .335, p < .001], and an inability to be grasped [r(501) = −.121, p < .01].
As correlations do not reveal the form of the functional relationships, Fig. 4 below zooms in on functional relationships between the three emotional dimensions and selected semantic properties of interest.
The top left panel of Fig. 4 reveals that early words are maximally positive, strong, and calm. Words become more negative and weak (controlled by) on average as the age of acquisition increases. The peak of arousal is reached in the words learned around the age of 10, while later-acquired words are less exciting. It is tempting to interpret these results as an average developmental timeline of vocabulary acquisition in North American children, with (a) earliest happy and calm words learned in a risk-averse environment protecting a child from negativity and excitement, and (b) excitable words like sexual terms, taboo words, and swear words learned in early school age. Yet it is more likely that the age-of-acquisition patterns of emotional words are at least partly due to how often they occur in English, and thus how likely children are to encounter and learn them early. The top right of Fig. 4 demonstrates that the more frequent a word is, the happier, stronger, and calmer it tends to be. The observed linear relationship between log frequency of occurrence and valence is reasonably strong: The Pearson’s correlation coefficient is .18, and the increase in valence between the least and most frequent words is on the order of two points on the 9-point scale. This corroborates the finding of Garcia, Garas, and Schweitzer (2012) and runs counter to the claim of Kloumann et al. (2012) that the positivity bias in English words is only observed in word types (there are more positive than negative words) and that the correlations between frequency and valence, if any, are corpus-specific and small. The discrepancy may be due to the much broader range of frequency that we consider here, with 14,000 words from the top of the frequency list rather than 5,000 words in each of the corpora considered by Kloumann et al. We leave the verification of the positivity bias over a broader frequency range to further research.
Only highly imageable words are emotionally colored (Fig. 4, bottom left): As imageability increases from rating 5 on the 7-point scale, words become more positive and strong (in control). Again, arousal is distinct from this pattern: Words that are hardly imageable at all or very imageable are calm, while those in the middle of the imageability range increase excitement.
The increasing strength of a sensory experience (Fig. 4, bottom right) varies strongly with arousal: The more tangible the word is, the more exciting it is. This suggests that abstract notions are less powerful in agitating human readers than are material objects. The functional relationship with valence is only observed in the top half of the sensory experience range: More tangible words induce increasingly positive emotions. No reliable relationship is observed between sensory experience ratings and dominance.
Interactions between demographics and ratings
Participants were naturally divided into two genders. In addition, we divided them into two age ranges using the median split—younger (less than 30) and older (30 or greater). We also dichotomized education level into higher (those who had an associate’s degree or greater) and lower (some college or less). All three dimensions showed slightly but significantly higher average ratings for younger versus older and for lower education versus higher education. Also, males gave slightly but reliably higher ratings in all dimensions than did females. Separate independent t tests showed that this difference was significant for valence and arousal, but not for dominance. The means, standard deviations, and independent t test significance levels of each group division are listed in Table 7.
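The group comparisons described above can be sketched as a dichotomization followed by an independent-samples t statistic. The sketch below uses the age cutoff given in the text (younger than 30 versus 30 or older) and assumes a pooled-variance Student's t test; the text says only "independent t tests," so the pooled form, the record layout, and the function names are assumptions.

```python
from math import sqrt

def split_by_age(records, cutoff=30):
    """records: iterable of (age, rating) pairs (hypothetical layout).

    Returns ratings from participants younger than the cutoff and
    ratings from participants at or above it.
    """
    younger = [rating for age, rating in records if age < cutoff]
    older = [rating for age, rating in records if age >= cutoff]
    return younger, older

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

Converting t to a p value requires the t distribution, which the Python standard library does not provide; with samples of the size reported here the normal approximation would be very close.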
Table 8 reports correlations between groups of participants and demonstrates substantial variability in the ratings that they provided: As with the overall data in Table 5, arousal and dominance elicited less agreement in judgments than did valence.
We ran a series of multiple regressions looking at age, gender, and education (all dichotomized as described above) as predictors. All main effects were significant at p < .001, and each variable made a unique contribution to the variance in the collected ratings. In addition, most of the two- and three-way interactions for all three dimensions were significant, likely due to the large number of data points available. However, the actual ranges of the effects tended to be small. One exception was the interaction between age and education level for all three dimensions (see Fig. 5). For valence and arousal, highly educated people rated words similarly, regardless of age. For those with less education, age strongly affected ratings, with the younger group providing higher ratings, on average, than did the older. For dominance, the opposite pattern held: Age affected those in the higher education group, with older participants providing higher ratings than younger ones, but age did not have an effect in the lower education group.
In what follows, we concentrate on gender differences. Effects of well-established lexical properties on emotion norms varied by gender. Figure 6 presents interactions of gender with frequency of occurrence and age of acquisition as predictors of emotional ratings. All interactions reached significance in multiple regression models, with each set of ratings treated separately as a dependent variable, all ps < .01.
The interactions revealed that female raters provided more extreme negative/weak ratings for the lowest-frequency words, and more extreme positive/strong ratings for higher-frequency words, yielding a broader range of values for both valence and dominance. The same holds for the more extreme ratings given by females to earliest- and latest-learned words, as compared to males.
Quite the opposite pattern was observed in the ratings of arousal (Fig. 6, middle row). Female raters showed a weak relationship between either frequency or age of acquisition and arousal, with slightly higher arousal words in the higher-frequency band and in the mid-range of age of acquisition. Conversely, male raters revealed a strong tendency to rate higher-frequency and earlier-learned words as less exciting than relatively late-learned and infrequent words.
Variability in ratings also differed by gender (see Fig. 7). Male raters disagreed increasingly more on all ratings to higher-frequency words, while variance in ratings by female participants was increasingly attenuated with an increase in word frequency.
While pinning down the origin of these differences will be an issue for further investigation, here we note the necessity for research into emotion words to take into account these interactions as potential sources of systematic error.
An interesting aspect of emotional ratings is their use to quantify attitudes and opinions toward physical, psychological, and social phenomena either in the population at large or in specific target groups. We showcase here emotional ratings to the semantic categories of disease (Fig. 8) and occupation (Fig. 9), based on Van Overschelde et al.’s (2004) category norms, with occasional additions of semantically similar words. As Fig. 8 suggests, all diseases are rated as words evoking negative feelings, high arousal, and feelings of being controlled; that is, all ratings were below the median of valence/dominance and above the median of arousal in the entire data set (shown as a dotted line). Sexually transmitted diseases were judged as being among the most negative and the most anxiety-provoking entries in the subset. This is generally in line with surveys of attitudes that list sexually transmitted diseases as being among the most stigmatized medical conditions (e.g., Brems, Johnson, Warner, & Roberts, 2010). The most feared medical conditions—cancer, Alzheimer’s, heart disease, and stroke (listed by decreasing percentages of respondents who feared them; MetLife Foundation, 2011; YouGov, 2011)—are also among the most negative, the least controllable, and the most anxiety-provoking diseases.
Ratings of valence to occupations revealed that the best-paying professions in the list were judged as being the most negative, below the median in the overall data set: compare “lawyer,” “dentist,” and “manager.” The correlation between average income, as reported by the Bureau of Labor Statistics (2011), and mean valence is indeed negative, but it does not reach significance (r = −.167, p = .434), possibly due to reduced statistical power (df = 22). Some contrasts emerge that might prove interesting to social scientists. For example, both the words “police officer” and “firefighter” are rated as highly arousing, but “police officer” is viewed negatively while “firefighter” is viewed positively. In contrast, “librarian” is a positive but completely unarousing occupation term.
Emotional ratings are also a useful tool for studying gender differences in attitudes and beliefs. Figure 10 reports gender differences in ratings to terms denoting weaponry, with the difference between the ratings of female and male responders on the y-axis. The upper parts of the plots in Fig. 10 show words that were given higher valence, arousal, or dominance ratings by female responders; dotted lines represent the no-difference line. Words in blue color stand for items for which the difference in ratings between gender groups reached significance at the p < .01 level in two-tailed independent t tests.
All three emotional dimensions showed a significantly greater number of ratings in the lower parts of the plots (all p values in chi-square tests < .01). This indicates that male responders generally have a happier, more aroused, and more in-control attitude toward weapons, especially fire weapons and the bow, for which the gender difference in ratings reached significance.
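A test of whether more words fall below the no-difference line than above it can be sketched as a chi-square goodness-of-fit test against an even split. The text does not describe how its chi-square tests were set up, so the equal expected frequencies and the function name below are assumptions; the p value uses the exact relation between the chi-square distribution with one degree of freedom and the normal distribution.

```python
from math import erfc, sqrt

def chi_square_even_split(n_below, n_above):
    """Chi-square test (1 df) that words split evenly around the line.

    n_below: words rated higher by male responders (below the line)
    n_above: words rated higher by female responders (above the line)
    Returns the chi-square statistic and its p value.
    """
    expected = (n_below + n_above) / 2  # assumed null: a 50/50 split
    chi2 = ((n_below - expected) ** 2 + (n_above - expected) ** 2) / expected
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p
```

For instance, 30 words below the line against 10 above gives a statistic of 10 with p below .01, whereas an even 20/20 split gives a statistic of 0.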
A similar bias toward higher valence, arousal, and dominance can be observed in ratings of male responders to taboo words and sexual terms. As Figs. 11 and 12 demonstrate, most lexical items in this subset are located below the dotted lines, revealing overall higher ratings for taboo words in male responders (marked in blue if reaching significance) and, in rare cases, in female responders (marked in red if reaching significance). The observed discrepancies in attitudes are corroborated by Janschewitz (2008), Newman, Groom, Handelman, and Pennebaker (2008), and Petersen and Hyde (2010). The discrepancies also explain the disproportionate presence of sexual terms and taboo words among lexical items with exceedingly variable ratings (see the highlighted words in Fig. 2 whose standard deviations are larger than the value predicted from their means).