Study I: ANEWing the BAWL
1. + 2. Expanding the BAWL ratings
The translation of the 1,034 words provided by the ANEW (Bradley & Lang, 1999a) resulted in a total of 1,003 German words, because some of the English words have the same German translation. Ratings for valence, arousal, and imageability for 400 of the words could be retrieved from the BAWL (Võ et al., 2009). Additional ratings for the remaining 603 German words were collected separately for each dimension.
Ratings for valence, arousal, and imageability were collected according to the procedure reported in the BAWL (Võ et al., 2009). Note that this procedure involves the use of the Self-Assessment Manikin (SAM; see Fig. 1) only for the arousal dimension, but not for valence.
The SAM, a nonverbal pictorial measure derived from the semantic differential scale developed by Mehrabian and Russell (1974) and adapted by Lang (1980), has often been used to assess the three dimensions of valence, arousal, and dominance. Each of these scales is represented by five figures. To facilitate scale comprehension, SAM pictures are normally accompanied by verbal anchors at the extreme ends of the scale. But note that the capacity of the SAM to adequately represent these constructs or dimensions of emotional content is a matter of debate (see Võ et al., 2009, and Võ et al., 2006, for discussions). This is why, when designing the data collection for the BAWL, we had opted for a slightly different procedure (presented below), which we continued to use here to complete the ratings on valence and arousal for the 1,003 German words.
Following the procedure for obtaining BAWL ratings, valence was rated on a bipolar 7-point scale ranging from −3 to +3, with “neutral” (0) in the center; the ends of the scale were defined by the verbal anchors positiv (“positive”) and negativ (“negative”).
Unlike valence, the concept of arousal is difficult to represent in a purely verbal way. For this reason, we opted to use SAMs for data collection in both the original BAWL study and the present study (see Fig. 1). But, unlike Bradley and Lang (1999a), we had participants rate arousal on a 5-point scale (rather than a 9-point scale) and only used the SAMs, but not the intervals between them, for the rating.
Probably more important than the difference between a 5-point and a 9-point scale is the following change concerning the verbal anchors: The anchors for the extreme ends of the arousal dimension, as implemented in the ANEW (Bradley & Lang, 1999a), were calm versus excited. Although the authors’ theoretical account of this scale stressed its bipolar character (ranging from a relaxing extreme to an exciting extreme, with a neutral midpoint), when designing the BAWL ratings (Võ et al., 2006), we felt that these anchors might instead lead participants to interpret arousal as a unipolar and uniformly increasing dimension.
Therefore, when collecting German arousal ratings, we applied the anchors aufregend (“exciting”) and beruhigend, which might best be translated as “calming,” to emphasize an actively relaxing aspect of low arousal, as opposed to the “exciting” opposite end of the scale—understanding arousal as a bipolar concept (see the Appendix).
Imageability was rated on a unipolar, uniformly increasing 7-point scale, according to the procedure in the BAWL (Võ et al., 2006).
Ratings were collected separately for each dimension: Each participant was presented with a randomized list of words for which ratings had to be given on only one dimension, to avoid transfer effects. To perform the rating, participants were seated in a quiet room. Each list contained 603 items and was rated in a self-paced procedure. Each word was rated on each dimension by at least 20 participants. The words were presented in white letters, together with the rating scale, on a black background on a computer screen. Participants were given written instructions and five practice trials. The same general procedure was applied for all of the following data collections.
A total of 65 participants took part in the study (36 women, 29 men; the female-to-male ratio for any particular rating was no more than 2:1), all of whom were psychology students at the Freie Universität Berlin, aged 18 to 37 years (mean = 24.9, SD = 4.6). All participants (here and for all of the following data assessments) were native speakers of German and had normal or corrected-to-normal vision. Their participation was rewarded with course credit or a small amount of money.
3. Collecting ratings on dominance
Because dominance ratings were not part of the BAWL, we replicated the procedure used by Bradley and Lang (1999a) in order to achieve optimal cross-language comparability.
The SAM for dominance ranges from a small-sized (dominated) figure to a large one (in control). Due to repeatedly expressed difficulties of interpretation, more explicit anchors for the dimensions of dominance and potency were used (see the Appendix). Ratings could also be placed in the spaces between the figures, replicating the 9-point scale originally used by Bradley and Lang (1999a).
For all of the data collections reported from now on, a total of 1,003 words had to be rated. The corresponding list of stimuli was randomized and divided into two subsets, each containing about half of the stimuli. Each participant thus rated a total of 501 or 502 words. Each half was subsequently split, and the resulting parts were alternately assigned to one of two new experimental lists for the next two participants. This procedure was intended to ensure that every word had a similar probability of co-occurring, in a given rating list, with any other word from the total list.
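The list rotation described above can be read in more than one way. The following Python sketch shows one possible reading, under stated assumptions: each list is reshuffled before it is split, the first parts of both lists form one new list, and the second parts form the other. The function names, the placeholder word labels, and the fixed random seed are hypothetical and are not taken from the original procedure.

```python
import random


def initial_lists(words, rng):
    """Shuffle the full stimulus list and cut it into two halves."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def next_lists(list_a, list_b, rng):
    """Split each current list in two and recombine the parts.

    Assumption: first parts go to one new list, second parts to the other,
    so that words from both previous lists are mixed in each new list.
    """
    a, b = list(list_a), list(list_b)
    rng.shuffle(a)  # reshuffle so that the split differs from round to round
    rng.shuffle(b)
    a1, a2 = a[:len(a) // 2], a[len(a) // 2:]
    b1, b2 = b[:len(b) // 2], b[len(b) // 2:]
    new_a, new_b = a1 + b1, a2 + b2
    rng.shuffle(new_a)  # randomize presentation order within each list
    rng.shuffle(new_b)
    return new_a, new_b


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
words = [f"word{i:04d}" for i in range(1003)]  # placeholder stimuli

list_a, list_b = initial_lists(words, rng)
rounds = [(list_a, list_b)]
for _ in range(3):  # lists for three further pairs of participants
    list_a, list_b = next_lists(list_a, list_b, rng)
    rounds.append((list_a, list_b))
```

Under this reading, every pair of lists covers all 1,003 words exactly once, with 501 words in one list and 502 in the other.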
A group of 40 students (24 women, 16 men) from the Freie Universität Berlin participated in the study; they were 21 to 33 years of age (mean = 25.1, SD = 4.2).
Results and discussion
To obtain an estimate of the general comparability of our data with the original ANEW values and corresponding databases in Spanish and Portuguese, we computed bivariate correlations between the values obtained in the four different languages on the three dimensions of valence, arousal, and dominance. The substantial correlation coefficients between, for instance, German and English evaluations for valence (r = .90, p < .001), arousal (r = .62, p < .001), and dominance (r = .60, p < .001) support a reasonable general quantitative comparability across the languages (see Table 1).
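As an illustration of the cross-language comparison, the sketch below computes a Pearson correlation over the items shared by two rating sets. The rating values and the dictionary layout are invented for the example and do not come from any of the databases; only the correlation formula itself is standard.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean valence ratings, keyed by a shared item identifier.
# German values use the -3..+3 scale, English values the 1..9 scale.
german = {"item1": 2.4, "item2": -2.1, "item3": 0.3, "item4": 1.7, "item5": -1.2}
english = {"item1": 8.0, "item2": 2.2, "item3": 5.4, "item4": 7.1, "item5": 3.0}

shared = sorted(german.keys() & english.keys())
r_valence = pearson_r([german[k] for k in shared], [english[k] for k in shared])
```

Because the Pearson coefficient is invariant under positive linear rescaling, the different scale ranges (−3 to +3 vs. 1 to 9) do not need to be aligned before the correlation is computed.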
In particular, the very close relationship between the valence ratings in the four languages (all rs ≥ .90) demonstrates that emotion ratings can, in principle, be replicated almost perfectly across different languages/cultures. However, the substantially lower correlations for the other two dimensions suggest that this cross-cultural consistency has limits, which apparently affect different dimensions in specific ways.
In the following discussion, we focus on these apparent discrepancies, considering especially the arousal variable and its relation to the—otherwise cross-culturally remarkably stable—concept of valence.
Although, following the classic work by Wundt (1896), the dimensions of valence and arousal were initially introduced as two independent factors constituting a two-dimensional affective space (Bradley & Lang, 1999a; Russell, 1980), empirical reports on these variables consistently show a U- or boomerang-shaped distribution in which the two variables are positively correlated within the domain of positive valence, but negatively correlated within the domain of negative valence (Bradley, Codispoti, Cuthbert, & Lang, 2001).
For the present 1,003 German words, the distribution in the bidimensional affective space determined by the dimensions of valence and arousal (see Fig. 2) approximately fits the typical boomerang shape reported by Bradley and Lang (1999a). Accordingly, the typical patterns of both a positivity offset and a negativity bias could be replicated (Cacioppo, Gardner, & Berntson, 1997), as is revealed, for the German sample, by a positive correlation between valence and arousal for words of positive valence (i.e., above 0, the midpoint of the valence scale; r = .2, p < .001) and a negative correlation with a considerably steeper slope for words of negative valence (r = −.63, p < .001). A comparable pattern involving a positivity offset and a negativity bias is therefore observable in all four language versions of the ANEW corpus. Note, however, that the data from the four languages display interesting discrepancies with respect to the relative strengths of the two effects. As we noted above, the correlation between valence and arousal has a much steeper slope in the negative than in the positive valence domain in our German data, which is comparable to what has been found in the Spanish and Portuguese data but opposite to the pattern for English, where arousal increases especially strongly in the positive valence range (r = .64 in the positive range, r = −.46 in the negative range; see Table 2). Moreover, both the German and Portuguese data show a much attenuated positive correlation (r < .27) between the two variables in the range of positive words, relative to the English and Spanish data (r > .45).
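The split analysis described above can be sketched as follows: the words are divided at the midpoint of the valence scale, and the valence–arousal correlation is computed separately for each range. This is a minimal Python sketch with invented toy values; the function names are hypothetical, and excluding words that fall exactly on the midpoint is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_correlations(valence, arousal, midpoint=0.0):
    """Return (r_positive, r_negative) for words above / below the midpoint.

    Words rated exactly at the midpoint are left out (an assumption).
    """
    pairs = list(zip(valence, arousal))
    pos = [(v, a) for v, a in pairs if v > midpoint]
    neg = [(v, a) for v, a in pairs if v < midpoint]
    r_pos = pearson_r([v for v, _ in pos], [a for _, a in pos])
    r_neg = pearson_r([v for v, _ in neg], [a for _, a in neg])
    return r_pos, r_neg


# Toy ratings with a boomerang-like shape: arousal rises toward both extremes
# of valence, more consistently on the negative side.
valence = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
arousal = [4.6, 4.0, 3.9, 3.1, 2.8, 2.2, 2.4, 2.1, 2.9, 2.3, 3.0, 2.6]

r_pos, r_neg = split_correlations(valence, arousal)
```

For a scale without a zero midpoint, such as the 1 to 9 ANEW scale, the `midpoint` argument would have to be set accordingly (e.g., 5).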
As a consequence, the overall correlation between valence and arousal became progressively more negative, from the English data (r = −.05), through the Spanish (r = −.15, with z = −2.69; p < .01 for the comparison to English) and Portuguese data (r = −.49, with z = −6.2; p < .001 for the comparison to Spanish), to the German data (r = −.56, with z = −1.03; p = .3 for the comparison to Portuguese). This suggests that, at least for the Portuguese and German data, the two dimensions are not orthogonal to one another, which challenges the assumption that the dimensions are independent.
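The text does not state which test produced the z values reported above. One common way to compare two correlation coefficients is Fisher's r-to-z transformation for independent samples, sketched below in Python. Treating the language samples as independent is an assumption (the rated items are translations of the same words), so this sketch is not expected to reproduce the reported values. The function name and the example inputs are hypothetical.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p value.
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the normal distribution
    return z, p


# Hypothetical example: two correlations from samples of 100 items each.
z, p = compare_independent_correlations(-0.50, 100, -0.10, 100)
```

A negative z here indicates that the first correlation is more negative than the second.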
However, the interpretation of these apparent differences concerning the internal relations of the two-dimensional space between the original ANEW and our database for German words faces a serious problem: For all emotion ratings of German words that are available so far, the operationalization of the arousal concept has been slightly modified relative to the original instructions of Bradley and Lang (1999a); see the Procedure section above for details.
Probably more important than the switch from a 9-point to a 5-point scale is the more bipolar interpretation of the arousal variable that may have been suggested to the participants in our German sample by the translation of the verbal anchor “calm” as beruhigend (which is more likely to be understood as “calming”).
Certainly, these differences in the use of scales make an interpretation of the present results as evidence for cultural differences in the use of the arousal concept difficult.
To overcome this problem, and to test whether the apparent differences in arousal ratings across languages could have arisen from changes in the scales and instructions, we decided to re-collect ratings for our 1,003 words on the arousal dimension, this time exactly following the operationalization and instructions previously used for the ANEW (see Bradley & Lang, 1999a); hereafter, the new set of ratings will be termed ARO (ANEW), as opposed to ARO (BAWL).
Study II: Contrasting alternative but closely related dimensions
4. Arousal (ANEW)
To ensure comparability across languages, ratings on the dimension of arousal were collected by exactly matching the procedures applied in the ANEW corpus.
We used the original 9-point scale with five Self-Assessment Manikins, including spaces between them as optional rating positions, ranging from a relaxed (calm) to an open-eyed, literally exploding (excited) figure, with additional verbal markers being given only during the written instructions. In contrast to the version of the BAWL (Võ et al., 2009), the left verbal anchor of the scale was changed to ruhig (“calm”) instead of beruhigend (“calming”).
A group of 40 students (22 women, 18 men) from the Freie Universität Berlin participated in the study. Their ages ranged from 18 to 34 (mean = 23.8, SD = 3.6).
The resulting data did not differ considerably from those obtained using the BAWL scale and instructions (Võ et al., 2009; Võ et al., 2006), with a correlation coefficient of r = .88 between the two. Again, for the new data, the correlation between valence and arousal was negative across the whole sample (r = −.58), moderately positive for the positive range (r = .21), and characterized by an especially steep slope for the negative range (r = −.66). We thus conclude that the previously mentioned differences concerning valence–arousal correlations in the ANEW data across languages cannot be attributed to the different scale used in the BAWL, and we return to these apparent cross-cultural differences in arousal ratings in the General Discussion.
5. Potency
We would like to point out that the relation between past operationalizations of dominance versus potency might be more complex than has been previously considered. While both dimensions are rated using an identical pictorial SAM scale (Bradley & Lang, 1994), the important difference between them resides in the perspective that the participant has to adopt toward the rated concept. In the case of dominance, the participant is asked to establish a relation toward the rated object and then to decide whether or not he or she can dominate the object—the central question being, “How dominant do you feel in relation to the word?” (Bradley & Lang, 1994). In the case of potency, the concepts are rated independently of their relation to the participant, who has to evaluate what potency the object might have, as such (e.g., Heise, 2010; Schröder, 2011).
Participants were asked in the instructions to rate each word according to its perceived potency, independently of their own person (cf. Heise, 2010; Schröder, 2011). The Self-Assessment Manikin scale ranged from a small-sized figure to a large one using a 9-point scale, which was formally comparable to the one used for the dominance ratings.
A group of 40 students (23 women, 17 men) from the Freie Universität Berlin participated in the study; their ages ranged from 19 to 37 (mean = 24.5 years, SD = 4.3).
The correlation coefficient between the dimensions of dominance and potency in our ratings was r = −.35. This relatively weak correlation suggests that the scales are not simply reversed versions of each other, but rather that different specific aspects determine the ratings on each dimension. Additionally, the correlations with the dimensions of valence (r = .65 for dominance, r = .25 for potency, z = −47.96; p < .001) and arousal (r = −.47 for dominance, r = .64 for potency, z = 66.02; p < .001) differ considerably between the two variables (see Table 3a), suggesting that distinct information is captured by the two measures.
6. Language statistical measures
Two different measures of word frequency were added: one from the print-based corpus of the Leipzig Wortschatz Projekt (Wortschatz Universität Leipzig, 2013), which includes over 50 million words, and one from the SUBTLEX corpus (Brysbaert et al., 2011), which contains more than 25 million words taken from movie subtitles. Finally, grammatical class, number of letters, number of syllables, and number of orthographic neighbors were determined for each word.
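The text does not specify how orthographic neighbors were counted. A common definition (Coltheart's N) counts the words of the same length that differ from the target in exactly one letter position. The Python sketch below illustrates that definition with a small invented lexicon; the function name and the lexicon are hypothetical, and the actual counts depend on the reference lexicon that is used.

```python
def count_orthographic_neighbors(word, lexicon):
    """Count lexicon entries that differ from `word` in exactly one letter.

    Only same-length words are considered (substitution neighbors);
    the comparison ignores letter case.
    """
    target = word.lower()
    count = 0
    for entry in {w.lower() for w in lexicon}:  # set removes duplicate entries
        if len(entry) != len(target) or entry == target:
            continue
        mismatches = sum(1 for a, b in zip(target, entry) if a != b)
        if mismatches == 1:
            count += 1
    return count


# Small invented lexicon for illustration only.
toy_lexicon = ["Haus", "Maus", "Laus", "raus", "Hand", "Hau"]

n_letters = len("Haus")
n_neighbors = count_orthographic_neighbors("Haus", toy_lexicon)
```

In this toy lexicon, "Haus" has the neighbors "Maus", "Laus", and "raus", whereas "Hand" differs from it in two positions and "Hau" has a different length.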