Introduction

When people describe emotions in songs and literature, they often use figurative language. For example, Jane Porter (1807) wrote, “Happiness is a sunbeam …when it strikes on a kindred heart, like the converged light on a mirror, it reflects itself with redoubled brightness” (p. 81). Figurative language is also common in everyday speech (Fainsilber & Ortony, 1987). People might say, “I’m feeling up,” “she had cold feet,” or “he was hot-headed.” Indeed, figurative language appears to be embedded in how people think (Lakoff, 2014). Not surprisingly, figurative language sometimes creeps into psychological tests related to emotions. For example, figurative language occurs in the Beck Depression Inventory (wound up; Beck, 1972), the Center for Epidemiologic Studies Depression Scale (CES-D; blue; Eaton, Muntaner, Smith, Tien, & Ybarra, 2004; Radloff, 1977), the Positive and Negative Affect Schedule – Expanded Form (PANAS-X; blue, downhearted, sluggish, sheepish; Watson & Clark, 1994; Watson, Clark, & Tellegen, 1988), and Hogan’s Empathy Scale (on top of the world, short tempered; Hogan, 1969). Even the word depression is figurative language: If this word appeared on an emotion test, it would not be interpreted literally as a shallow hole in the ground – or at least, we hope it would not.

There are several types of figurative language. In the quote above, happiness is a sunbeam is a metaphor; it says that one thing (happiness) is the same as another (a sunbeam). Happiness is like the converged light on a mirror is a simile; it says that one thing (happiness) is like another (light). Kindred heart is a metonymy; it uses one thing (a heart) to stand for something it is closely associated with (emotions).

Metonymies are often used to describe emotions. For example, when people are sad, they tend to have slouched body postures and the corners of their mouths tend to turn downwards (Lakoff & Johnson, 1980). From this, people come to associate a downward orientation with sadness, thus leading to phrases such as “I’m feeling down” (Lakoff & Johnson, 1980, p. 15), in which the descriptor down can stand for the target emotion sadness (Kövecses, 2013). When people generalize from the metonymy to other features of the descriptor (things that are not inherently associated with that emotion), conceptual metaphors are created (Kövecses, 2013). For example, people might say they are “in the pits” (p. 79) or “down in the dumps” (p. 79), even though people are not usually located in holes in the ground when they are sad. These instances of figurative language are based upon the conceptual metaphor sadness is down. Thus, conceptual metaphors underlie multiple instances of figurative language (such as ordinary metaphors and similes).

Figurative language can be helpful because it is often the easiest way to explain abstract concepts such as emotions. However, it is also more likely to be misunderstood than literal language. When figurative language appears in psychological tests, it might be misinterpreted for three reasons. First, some test takers may be completing a test that is not written in their native language. When people are learning a new language, they gain grammatical and communicative proficiency first and may not learn figurative language or metaphorical concepts until much later (Danesi, 1992). Second, figurative language varies from one country to another. Because of this, translators must take great care to ensure that the original meaning is preserved (Corral & Landrine, 2009; Matsumoto & Yoo, 2007). For example, calling someone a bear in Chinese is a way of implying they are incompetent (Wang, Wang, & Xing, 2011), but this implication does not hold in the USA. Even when two countries use the same language, figurative language may vary. For example, consider the different meanings of braces, lift, and rubber in the USA and the UK (Lovett, 2013). Finally, even if a test taker is familiar with the figurative language, they may interpret it in a different way than the test designer intended, because figurative language often has many interpretations (Partridge, 2006).

When talking in person, it is fairly easy to resolve miscommunications caused by the use of figurative language. However, when figurative language is embedded in a written psychological test, the test administrator is unlikely to realize a test taker is misinterpreting some phrases. At a minimum, such misinterpretations could add random variation to the test scores, making the test less reliable and valid for those test takers. At worst, these misinterpretations could systematically raise or lower test scores, causing test bias.

Test designers do not appear to be aware of this issue. This is evident from the fact that some of our most respected tests of emotions include figurative language. Books that explain test design do extol the importance of writing test content that is clear and unambiguous (Hogan, 2015; Kline, 2005; Kubiszyn & Borich, 2007; McIntire & Miller, 2007; Salkind, 2006), but they do not generally warn test designers against using figurative language. We examined dozens of psychometrics books and only two of them addressed the issue of figurative language, and then only obliquely: They said that test designers should avoid “specific linguistic constructions” (Aiken & Groth-Marnat, 2006, p. 138) and “slang or colloquial language” (McIntire & Miller, 2007, p. 376).

We three came to appreciate the problem of test bias when we were designing a new test of emotion perception. Due to increasing globalization, people in the USA come from many different countries and cultures. Contrary to the image of the USA being a melting pot, fully 20 % of US residents do not speak English at home (United States Census Bureau, 2012a). Moreover, with the increasing use of English around the world, English-language psychological tests are likely to be used in many countries. We therefore tried to write test items that would be culture fair. While writing items, we identified a possible source of test bias: cross-cultural differences in the emotional connotations of non-emotion words. We therefore conducted an extensive literature search on conceptual metaphors for emotions and tried to avoid culturally specific language. However, when we stared at our final items, a niggling worry persisted: Were the emotional connotations of these non-emotion words really the same? Would people from different cultures really give the same interpretations to our items – or items on other psychological tests related to emotions? This paper documents our attempt to answer these questions.

Rather than examining the items on one specific test, we took the more general approach of studying conceptual metaphors and metonymies for emotions. In this way, we hoped to determine which figurative language for emotions can safely be used in psychological tests. We asked participants which descriptors (e.g., up vs. down) are associated with target emotions (e.g., happiness). If almost all people associate the descriptor with one emotion, it can be used unambiguously on psychological tests.

Selecting conceptual metaphors and metonymies

To identify metonymies and conceptual metaphors for emotions, we conducted a review of previous research. We located support for 14 metonymies and conceptual metaphors for emotions in the USA: Happiness is associated with up, bright, and warm; sadness is associated with down, dark, blue, and empty; anger is associated with hot, red, and dark; and fear is associated with white, cold, dark, and being paralyzed. Ten of these associations have also been identified in three or more other countries (see Table 1).

Table 1 Support for conceptual metaphors and metonymies

These associations have usually been found using lexical analyses. In a lexical analysis, researchers identify commonly used figurative language in a specific culture and then deduce the conceptual metaphors that underlie that language (Kövecses, 1986). For example, in the USA, people sometimes say, “The thought chilled him,” “He had cold feet,” or “Cold shivers ran down her spine” (Kövecses, 2009, p. 81). From these, researchers deduce that fear is cold in the USA.

While lexical analyses can find commonly used figurative language in different countries, these analyses do not indicate how many people understand or use this language in each country. To determine what proportion of a population associates a certain descriptor (e.g., cold) with a target concept (e.g., fear), researchers can ask them explicitly. For example, Waggoner (2010) asked participants in the USA if certain emotions were associated with specific temperatures. He found that adults commonly associated anger with hot, happiness with warm, and fear and sorrow with cold. In the current studies, we extended this research by examining the evidence for a wider range of associations and by including a second country.

Selecting countries

We collected data in the USA and India. We selected India as the second country for three reasons. First, both the USA and India have large numbers of users on Amazon Mechanical Turk (a website that can be used for participant recruitment), making it feasible to obtain large samples (Ross, Irani, Silberman, Zaldivar, & Tomlinson, 2010). Second, India has a large English-speaking population (in fact, English is the subsidiary official language; Department of Official Language, 2011), and this population might be given tests that are written in English because validated tests in the person’s first language may not exist. Third, India is culturally and socially different from the USA. It has different ethnic groups (India is 72 % Indo-Aryan and 25 % Dravidian; the USA is 63.7 % White, 16.3 % Hispanic or Latino, and 12.6 % Black), religions (79.6 % Hindu and 14.2 % Muslim vs. 86.2 % Christian with 24.5 % Catholic and 15.8 % Baptist), population densities (382 people/km² vs. 87.4), and rural/urban mix (68.8 % rural vs. 21 %) (Census of India, 2011; United States Census Bureau, 2012b). India is also a more closely integrated society (moderate collectivism vs. high individualism) with greater differences in power among individuals (relatively high power distance vs. moderate power distance; Hofstede, 1983). Choosing a country that is so different from the USA provides a better test of whether the associations with emotions are universal and can be used on psychological tests without causing test bias.

Hypotheses

We hypothesized that the above-mentioned 14 associations would be common (endorsed by more than 50 % of participants) in the USA: Happiness would be commonly associated with up, bright, and warm; sadness with down, dark, empty, and blue; anger with dark, hot, and red; and fear with dark, white, paralysis, and cold. Because some of these associations have been found in at least one other country and because some of them might plausibly be due to common physiological reactions to emotional experiences, we hypothesized that most of these same associations would be common in India. We were uncertain, however, if any of these descriptors would be associated with any of the emotions by the vast majority of participants (e.g., more than 90 %) and would be safe to use on psychological tests.

Study 1

Method

Procedures

Participants were recruited through a short advertisement on Amazon Mechanical Turk (MTurk). MTurk is an English-language crowd-sourcing website that allows researchers to advertise studies to potential participants. An increasing number of behavioral studies are being advertised on MTurk (Paolacci & Chandler, 2014), because it is a quick and easy way to obtain large samples (Buhrmester, Kwang, & Gosling, 2011). MTurk users provide high quality data: Participants provide consistent and verifiable answers (Rand, 2012) that result in high reliability (Buhrmester et al., 2011) and replication of cognitive and behavioral findings from laboratory testing and college samples (Paolacci & Chandler, 2014; Sprouse, 2011).

MTurk may be a better source of participants than traditional university subject pools for three reasons. First, college students are often required to do research studies for course credit, which they receive regardless of the quality of their data. In contrast, MTurk workers are primarily motivated by being paid for their work (Antin & Shaw, 2012) and are therefore concerned about protecting their online reputation so that they can get access to good, high-paying work (Gupta, Martin, Hanrahan, & O’Neill, 2014). As a result, we think MTurk workers may take studies more seriously. Second, MTurk samples are more diverse than college student samples (Mason & Suri, 2011; Paolacci & Chandler, 2014) and other Internet samples (Buhrmester et al., 2011), making them more representative of the US population. Third, MTurk allows researchers to restrict participants to those with certain qualifications, such as people who live in certain countries, thus facilitating cross-cultural research (Amazon Web Services, 2014).

For most tasks on MTurk (including research studies), users are paid a nominal fee. Therefore, participants were given a validation code at the end of our 5-min survey. When they entered the code into a textbox on the MTurk website, they were paid 20 cents. Litman, Robinson, and Rosenzweig (2015) showed that compensation in this range is sufficient to ensure high quality data in both the USA and India.

Participants

Participants were recruited from the USA and India. Participants were only allowed to participate in the study if they claimed to live in the USA or India and if their computer’s IP address was from one of these two countries.

Initially, there were 219 participants (115 male, 104 female) from the USA and 362 (212 male, 150 female) from India. To ensure that participants understood the study and the items, we asked participants how comfortable they were reading, writing, speaking, and listening to English, using ratings that ranged from 1 (Very uncomfortable, it's a real struggle) to 10 (Perfectly comfortable). We only analyzed the data from participants who indicated that they were very comfortable with English (at least 9 out of 10) on all four items. The first author has used this exact filtering method in every one of her studies on emotional intelligence and thus knew from the beginning of data collection that this filtering method would be used.
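The inclusion rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual script: the field names (reading, writing, speaking, listening) and the sample records are hypothetical, and only the cutoff of 9 on every one of the four 10-point items comes from the text.

```python
# Minimal sketch of the English-comfort filter (hypothetical field names and data).
SKILLS = ("reading", "writing", "speaking", "listening")
CUTOFF = 9  # "at least 9 out of 10" on all four items, as stated in the text


def is_very_comfortable(ratings, cutoff=CUTOFF):
    """Return True only if every one of the four comfort ratings meets the cutoff."""
    return all(ratings[skill] >= cutoff for skill in SKILLS)


# Hypothetical participants, for illustration only.
participants = [
    {"id": "p1", "reading": 10, "writing": 10, "speaking": 9, "listening": 10},
    {"id": "p2", "reading": 10, "writing": 8, "speaking": 10, "listening": 10},
    {"id": "p3", "reading": 9, "writing": 9, "speaking": 9, "listening": 9},
]

retained = [p["id"] for p in participants if is_very_comfortable(p)]
print(retained)
```

Because the rule requires all four ratings to pass, a single rating of 8 (as for the hypothetical participant p2) is enough to exclude a participant.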

The final sample included 205 people from the USA and 161 people from India. Originally, we had aimed for samples of 200 from each country. However, given that only 107 people are needed from each country to have power of .80 for a medium effect using a chi-square test of independence (Cohen, 1992), these sample sizes were considered adequate.
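For readers who want to see where a sample-size figure of this kind comes from, the following sketch computes the power of a chi-square test from the noncentral chi-square distribution, using only the Python standard library. It is an illustration under stated assumptions, not a reconstruction of the authors' calculation: the medium effect size w = .30 follows Cohen's convention, but the degrees of freedom are not stated in the text, so the choice of 2 df below is an assumption (it yields a value close to the 107 cited). Here N is the number of observations entering the noncentrality parameter λ = N·w².

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_critical(df, alpha=0.05):
    """Critical value found by bisection on the upper-tail probability."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power(n, w, df, alpha=0.05, terms=100):
    """Power = P(noncentral chi-square > critical value), as a Poisson mixture."""
    lam = n * w * w
    crit = chi2_critical(df, alpha)
    total = 0.0
    for j in range(terms):
        log_weight = -lam / 2.0 + j * math.log(lam / 2.0) - math.lgamma(j + 1.0)
        total += math.exp(log_weight) * chi2_sf(crit, df + 2 * j)
    return total


def required_n(w, df, alpha=0.05, target=0.80):
    """Smallest N whose power reaches the target."""
    n = 2
    while power(n, w, df, alpha) < target:
        n += 1
    return n


# Assumption: medium effect (w = 0.30) and 2 degrees of freedom.
n_needed = required_n(0.30, df=2)
print(n_needed)
```

Changing df in the last call shows how sensitive the required sample size is to the shape of the contingency table.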

The participants from the USA were equally divided between male and female (103 male, 102 female) and they ranged in age from 18 to 64 years (mean = 32.8, SD = 11.0). Most participants identified themselves as White (82.4 %), Black or African American (11.7 %), or Asian (5.9 %). Most spoke English as their first language (95.6 %).

The participants from India were predominantly male (100 male, 61 female). They ranged in age from 19 to 61 years (mean = 31.4, SD = 8.8). Most identified themselves as Asian (94.4 %). About a quarter spoke English as their first language (24.2 %). Their next most common first languages were Malayalam (19.9 %), Tamil (19.9 %), and Hindi (17.4 %).

Measures

Language of emotions questionnaire

We modeled the Language of Emotions Questionnaire (LEQ) after the measure used by Waggoner (2010), which asked if certain emotions “corresponded” (p. 237) to different temperatures. The LEQ expands upon Waggoner’s measure by examining a wider range of descriptors: those identified in previous research on conceptual metaphors for happiness, sadness, anger, and fear (e.g., up, down, bright, dark, etc.; see Table 1). Like Waggoner, we asked participants if the descriptors “corresponded” (p. 237) to the emotions. For example, does the descriptor up correspond to happiness? For each emotion, there were six items. For most items, participants were instructed to choose one of the adjectives (up or down; dark or bright; empty or full; paralyzed or able to move; and hot, warm, cool, or cold). For the last item, participants were allowed to select multiple colors.

Results and discussion

In the USA, the majority of participants indicated that the target emotion was associated with the intended descriptor for 13 of the 14 associations (see Tables 2, 3, and 4). These results indicate that people in the USA commonly understand the majority of these associations, providing additional support for these metonymies and conceptual metaphors. In India, fewer participants indicated that the target emotions were associated with the intended descriptors. This led us to divide the metonymies and conceptual metaphors into those that might be universal (and so might be safe to use on psychological tests) and those that appear to be culture-specific, providing interesting insights for the field of cognitive linguistics (Tendahl & Gibbs, 2008).

Table 2 Percentages of people indicating that the emotion corresponds to specific temperatures, Study 1
Table 3 Percentages of people indicating that the emotion corresponds to dichotomous descriptors, Study 1
Table 4 Percentages of people indicating that emotions correspond to each color, Study 1

Ten of the fourteen associations were endorsed in both the USA and India. These metonymies and conceptual metaphors seem like good candidates for universality, given how strong these associations were in this study and given that these associations have also been found in previous research. First, we found strong evidence that happiness is bright and that negative emotions (anger, sadness, and fear) are dark. Across both the USA and India, at least 87 % of participants endorsed these associations, echoing previous research in the USA, China, Germany, Hungary, Persia, Russia, and the UK (see Table 1). We consider these to be sizable percentages, especially given that facial expressions for emotion are said to be universally recognized even though only 81.5 % of people on average select the target emotion (Russell, 1991). Darkness may be associated with negative emotions and brightness with positive emotions because darkness tends to lead to physical unease and melancholy, whereas brightness tends to lead to feeling confident, safe, and healthy (Barcelona, 2003).

Second, we found strong evidence that anger is associated with red and hot. Across both countries, at least 66 % of participants endorsed these associations. Given that color and temperature choices were not dichotomous (nine colors and four temperatures were listed), these percentages are impressively high. These results are congruent with extensive research in the USA showing that anger is associated with red and hot, as well as similar research in China, Hungary, Israel, Japan, Nepal, Russia, Spain, and the UK, and among Wolof and Zulu people (see Table 1). These results are also congruent with three studies from India. First, Chand (2008) conducted a lexical analysis comparing conceptual metaphors in Hindi to English and found that the conceptual metaphor anger is heat is quite common in Hindi, as when referring to an angry person as “boiling or swelling with emotion” (p. 11). Second, Ferro-Luzzi (1995) conducted another lexical analysis, this time of a single Tamil writer (L. S. Ramamirtham), and found anger was associated with hot: “his ears spurted out steam” (p. 189) and fury burnt in their eyes “like live embers covered by ashes” (p. 189). Third, Kennedy (2011) conducted an informal study of figurative language through the Internet and found that the Hindi phrase [which translates as red coal] means to be flushed with anger. The associations of anger with red and hot are consistent with the conceptual metaphor, Anger is a hot fluid in a container, which has been found in English, Chinese, Hungarian, and Japanese (Kövecses, 2000). For example, when anger becomes more intense, fluid rises (“His pent-up anger welled up inside him”; Kövecses, 2000, p. 162) and steam is produced (“Smoke was coming out of his ears”; p. 162). When anger becomes too intense, there is pressure on the container and the person explodes (“She blew up at me”; p. 162). This conceptual metaphor might exist because angry people have increased body temperatures, feel internal pressure, and become red in the neck and face (Kövecses, 2000). In addition, both anecdotal evidence (Schwitzgebel, 2006) and experimental research (Fetterman, Robinson, Gordon, & Elliot, 2011) indicate that some people literally see red when they are angry. If this conceptual metaphor is indeed based upon physiological changes, then the associations of anger with red and hot may be commonly endorsed in many countries.

Third, we found strong evidence that happiness is associated with up and sadness with down. Across both countries, these associations were endorsed by at least 96 % of participants. This is consistent with previous research in the USA, India, China, France, Germany, Hungary, Israel, Japan, Nepal, and Russia, and among Mayan Indians (see Table 1). This is also consistent with conceptual metaphor theory, which claims that happiness is associated with up and sadness with down because of our physical experiences, such as upright posture for happiness and slouching posture for sadness (Lakoff & Johnson, 1980).

Fourth, we found strong evidence that sadness is associated with empty. At least 91 % of participants in the two countries endorsed this association, consistent with previous research in the USA and Persia (see Table 1). Shweder (1991) says that people in some countries experience depression as emptiness because the physical consequences of depression (loss of sleep, appetite, energy, etc.) are the same as the perceived consequences of emptiness. We believe that sadness may be associated with emptiness because emptiness is associated with the lack of something, which is likely to cause sadness.

Finally, the association of fear with paralysis was endorsed by at least 86 % of participants in the two countries, consistent with previous research in the USA, Persia, Poland, Russia, and the UK (see Table 1). Fear may be associated with paralysis due to people’s physiological reaction to fear, which can include an inability to move (Ding, 2012).

Additional research is needed to determine the causes of cross-cultural similarities in the ways emotions are described. Some similarities may be due to direct borrowing: Just as languages borrow individual words (e.g., English borrowed joy, cry, and glorify from French), they also borrow whole phrases (Baker & Jones, 1998). However, whether the metonymies and conceptual metaphors were originally invented in many languages or just one, this still leaves the question of how these phrases came about in the first place. Lakoff (2009, 2014) has postulated that metonymies and conceptual metaphors are possible because physiological sensations often accompany emotional experiences. As William James (1890) said, “What kind of an emotion of fear would be left if the feeling neither of quickened heart-beats nor of shallow breathing, neither of trembling lips nor of weakened limbs, neither of goose-flesh nor of visceral stirrings, were present, it is quite impossible for me to think” (p. 452). Physical experiences like warmth, proximity, height, and movement co-occur with emotional experiences, which causes these sensations to be bound into neural circuits with the emotions themselves (Lakoff, 2009, 2014). These internal sensations can then be used to help understand emotions (Lakoff, 2014; Meier & Robinson, 2005). Moreover, invoking cognitive structures related to sensations influences emotional reactions and social responses (Fay & Maner, 2014). For example, social rejection causes lower skin temperatures, but holding a warm cup counteracts the emotional effect of social rejection (IJzerman et al., 2012), and recalling a nostalgic event makes a cold room feel warmer (Zhou, Wildschut, Sedikides, Chen, & Vingerhoets, 2012). Thus, the cross-cultural associations of emotions with sensory descriptors may go beyond words to underlying cognitive structures.

In contrast to the ten associations just discussed, some associations may be specific to a few countries or cultures. First, sadness and blue seem to have a country-specific association: Most participants from the USA (65.9 %) associated sadness with blue, but only a minority of participants from India (15.5 %) did so. This was surprising to us, because lexical analyses have shown that sadness is associated with blue in the USA (Apresjan, 1997; Gunther, 2004), China (Tao, Tan, & Picard, 2005), Finland (Salokangas, Vaahtera, Pacriev, Sohlman, & Lehtinen, 2002), and among Mayan Indians (Shweder, 1991). Similarly, in Italy, Korea, and Mexico, blue is associated with mourning (Maroto & De Bortoli, 2001). However, blue is associated with different emotions in other countries. For example, the French phrase “fureur bleue” means to be in a blue fury or to have extreme anger (Kennedy, 2011, French section, line 64) and the phrase “avoir une peur bleue” means to have a blue fear or to be frightened to death (line 67). Blue may be associated with a variety of emotions because the associations are not based upon common physiological reactions. People turn red when they are angry (hence, anger is red), but people do not turn blue when they are sad (Fetterman, Robinson, & Meier, 2012).
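Whether an endorsement rate exceeds the 50 % threshold for a "common" association can be checked with a one-sided exact binomial test. The sketch below is one way to do this and is not necessarily the test the authors ran. The counts are reconstructed from the reported percentages and the Study 1 sample sizes (65.9 % of 205 is about 135; 15.5 % of 161 is about 25) and are therefore approximations.

```python
from math import comb


def binomial_upper_tail(successes, n, p0=0.5):
    """Exact P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Reconstructed (approximate) counts of participants linking sadness with blue.
p_usa = binomial_upper_tail(135, 205)   # tests "more than 50 %" in the USA sample
p_india = binomial_upper_tail(25, 161)  # same test in the India sample

print(f"USA: p = {p_usa:.2g}; India: p = {p_india:.2g}")
```

A small p-value supports the claim that a majority endorses the association; a p-value near 1 indicates that the observed rate is well below one half.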

Two additional associations may be culture-specific: Participants in the USA predominantly associated happiness with warm and fear with cold, but these associations were not common in India. This finding contradicts previous research demonstrating that people from multiple countries, including India, associate joy with an increase in body temperature and fear with a decrease in body temperature (Scherer & Wallbott, 1994). However, three meta-analyses (Cacioppo, Berntson, Larsen, Poehlmann, & Ito, 2000; Kreibig, 2010; Stemmler, 2004) have examined the relationship between emotions and physiological reactions and only one of them (Kreibig, 2010) found that happiness increased facial temperature and fear reduced it. Thus, inconsistencies in the conceptual link between emotions and temperatures may arise because these associations are not based upon strong, consistent physiological associations.

One association was not endorsed in either country: Fear is white. This finding contradicts previous research conducted in several countries, which has found that people are said to turn white, blanch, or become pale when they are frightened (Apresjan, 1997; Maalej, 2007; Oster, 2010; Strugielska & Alonso-Alonso, 2007). For example, the phrase “to whiten” is used in the USA and Russia to indicate fear (Apresjan, 1997, p. 187) and the phrase “his face was white from fright” is used in Persia (Maalej, 2007, p. 98). Examining these phrases in light of our findings, we developed a new hypothesis about the association of white and fear. These sayings appear to be based upon a physiological reaction: When people become scared, the blood drains from their faces, making them appear pale or white (Maalej, 2007). Perhaps fear is only associated with looking white, not white in general. We designed Study 2 to investigate this possibility.

Study 2

Introduction and method

In Study 1, we based our methodology upon Waggoner’s (2010) study, in which participants were simply asked if descriptors corresponded to emotions (e.g., which descriptors does fear correspond to?). However, we now believe there are three ways descriptors might be associated with an emotion: Descriptors might be associated with how an emotion feels, how a person looks when they are feeling an emotion, or the cause of an emotion. Therefore, we revised the LEQ. The LEQ2 has the same items as the LEQ. However, there are now three sections: Which adjective describes how people feel when they experience each emotion, which adjective describes how people look when they experience each emotion, and which adjective describes the cause of each emotion.

Using MTurk, we recruited 214 new participants from the USA and 215 new participants from India. Participants were automatically paid 35 cents for completing the 10-min survey. As in Study 1, we only analyzed the data from participants who were very comfortable with English.

Results and discussion

The results were essentially the same as in Study 1. In the USA, the majority of participants associated the target emotion with the descriptor for 13 of the 14 metonymies and conceptual metaphors, and in India, the majority did not associate sadness with blue, happiness with warm, or fear with cold (see Tables A, B, and C in the supplementary material).

As predicted, the relationship between fear and white differed somewhat depending upon which prompt was used (see Table 5). Fear was associated with white more often when the prompt concerned how someone looked than when it concerned how they felt or the cause of the emotion. However, this relationship was not strong: In the USA only 29.9 % and in India only 15.8 % of participants associated fear with looking white. Furthermore, this was not the strongest association for the emotion or the color in either country: As was the case in Study 1, in both countries, fear was more associated with black and gray than white, and looking white was more associated with happiness than fear. Thus, this study found that fear was not commonly associated with looking white in either the USA or India.

Table 5 Percentages of participants indicating that emotions correspond to each color, Study 2, selected results

Although lexical analyses in several countries associated fear with a white face (see Table 1), other analyses have shown that loss of blood flow is represented by a variety of colors depending upon the culture, including “paleness, whiteness, yellowness, blueness, etc.” (Maalej, 2007, p. 93). The lack of specificity for the appearance of fear echoes previous work on the recognition of fear. In some cultures, facial expressions for fear are not as well recognized as facial expressions for other emotions (Ekman & Friesen, 1971; Matsumoto & Ekman, 1989). We therefore conclude there is no single association between fear and how someone looks: Saying that someone looks white or has a white face is not a universal and unambiguous way of saying that they are scared.

General discussion

The purpose of this research was to determine if certain descriptors have consistent associations with happiness, sadness, anger, and fear, so that they could be used in psychological tests without causing confusion to test takers. For most of our hypotheses, significantly more than 50 % of participants in both countries selected the hypothesized descriptors. This provides support for the universality of many of these associations. The strongest support was for the associations of happiness with up and bright. These associations were universal (found in both the USA and India), ubiquitous (endorsed by more than 90 % of participants in those countries), and unique (those descriptors were not commonly associated with any of the other emotions we tested). Given the strength of these associations, we conclude that up and bright are unambiguously associated with happiness in the USA and India, and can be used on psychological tests without being likely to cause confusion.
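The three criteria named above (universal, ubiquitous, and unique) can be written as a simple decision rule. The sketch below is an illustration under stated assumptions: the 50 % and 90 % cutoffs follow the text, but the function, its argument names, and the keys it returns are hypothetical. Uniqueness is passed in as a flag because it depends on the full response tables, which are not reproduced here.

```python
def evaluate_association(pct_by_country, is_unique, common=50.0, ubiquitous=90.0):
    """Check an emotion-descriptor association against the three criteria.

    pct_by_country: mapping of country name to endorsement percentage.
    is_unique: True if the descriptor was not commonly linked to other emotions
               (None if unknown).
    """
    percentages = list(pct_by_country.values())
    universal = all(pct > common for pct in percentages)
    ubiq = all(pct > ubiquitous for pct in percentages)
    return {
        "universal": universal,
        "ubiquitous": ubiq,
        "unique": is_unique,
        # Treated as unambiguous only when all three criteria are met.
        "unambiguous": universal and ubiq and bool(is_unique),
    }


# Sadness with blue, Study 1 percentages reported in the text.
# Uniqueness is left unknown; the association already fails universality.
sad_blue = evaluate_association({"USA": 65.9, "India": 15.5}, is_unique=None)

# Happiness with up: the text reports "at least 96 %" in both countries,
# so 96.0 is used here as a lower bound rather than an exact value.
happy_up = evaluate_association({"USA": 96.0, "India": 96.0}, is_unique=True)

print(sad_blue["unambiguous"], happy_up["unambiguous"])
```

Keeping the three criteria as separate fields makes it easy to see which one an association fails, for example a descriptor that is universal but not unique.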

However, even when most participants agreed with a particular association, the percentages were frequently far below 100 %. For example, in the USA, only 71.7 % associated hot with anger and only 65.9 % associated blue with sadness. Moreover, descriptors were often associated with multiple emotions, making them ambiguous. For example, almost all participants in the USA associated dark with fear, sadness, and anger (reinforcing previous research showing that negative emotions in general are associated with dark; Barcelona, 2003). Even when only a single emotion was hypothesized to be associated with a descriptor, multiple associations were sometimes found. For example, participants from the USA associated paralyzed with sadness and anger as well as fear, and associated down with fear and anger as well as sadness. Thus, the associations between emotions and descriptors were usually not one-to-one in the USA.

In addition, the percentage of people who endorsed the hypothesized associations was uniformly smaller in India than in the USA. For nine of these associations, the difference in the percentages was statistically significant. For three (happiness with warm, fear with cold, and sadness with blue), the percentage was less than 50 % in India. The lower percentages for India were expected, because we originally selected these 14 emotion-descriptor pairs based upon evidence that came primarily from the USA. However, these lower percentages provide a strong warning to test designers: Figurative language will probably be understood by fewer people when it is used in another country or culture.
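A comparison of endorsement percentages between two countries can be sketched as a pooled two-proportion z-test. This is one common choice and an assumption here, because this discussion does not name the test that was used. The function name `two_proportion_z_test` and all counts are hypothetical and are not data from the study.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test.

    x1 of n1 participants in group 1 and x2 of n2 participants in group 2
    endorse the association. Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided


# Hypothetical counts for illustration only (NOT taken from the study):
# 72 of 100 endorse the association in one sample, 55 of 100 in the other.
z, p = two_proportion_z_test(72, 100, 55, 100)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With small samples or proportions near 0 or 1, an exact test would be a safer choice than this normal approximation.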

The differences between India and the USA are likely even larger than they appear to be in this study. Recall that we filtered our participants. To ensure that participants understood the instructions and the items, we only analyzed data from people who indicated that they were very comfortable with reading, writing, speaking, and listening to English. Because of these strict criteria, we filtered out almost 75 % of the participants from India. Previous research has shown that bilingual people who are very fluent with a language interpret metaphors in a way that is more similar to monolinguals in that language and that the language used during a study influences how bilinguals think about metaphors (Lai & Boroditsky, 2013). Thus, using English-language materials to test participants who are very comfortable with English created a conservative test for differences between the groups. The differences between the populations of English-speaking people from India and the USA are likely larger than shown here.

Additionally, the use of closed-ended questions may have made the USA and India appear more similar than they truly are and may have made the associations between emotions and descriptors appear stronger than they are. It might be that some (or even most) participants associate some of the descriptors with emotions that were not included in our response options. For example, perhaps many people associate up or bright with pride or excitement. If so, then the associations of up and bright with happiness are not unique. Moreover, if people in one country associate up with pride and people in another country associate up with excitement, then people in the two countries would not interpret the word up in the same way. These limitations make the similarities we found less conclusive but make the differences even more impressive. Future research on the emotional associations of descriptors should, at a minimum, include a longer list of possible emotions. Ideally, this research should use open-ended questions.

Future research should examine additional samples. Our two studies examined MTurk users who are very comfortable with English. However, sound conclusions about each culture and sound comparisons across cultures require data collection from multiple groups within each culture (Matsumoto & Yoo, 2007). Therefore, before concluding that the English words up and bright are universally and uniquely associated with happiness (and that other descriptors are not unambiguously associated with particular emotions), other types of participants should be examined in the USA and India (e.g., managers and students, teenagers and retirees, and specific ethnic groups) and in other countries and cultures. Furthermore, future research should include additional languages, especially ones where the lexical space for emotions is quite different. This is important both for the integrity of tests that were initially developed in those languages and for the validity of tests that were translated from English.

Given that only two of the 14 associations were universal, ubiquitous, and unique, we conclude that most figurative language is likely to be ambiguous and cause confusion among test takers, both in the culture in which the test was developed and even more so if the test is used in another culture (either in its original language or when translated). In particular, the use of blue (CES-D; Eaton et al., 2004; Radloff, 1977; and PANAS-X; Watson & Clark, 1994; Watson et al., 1988) and downhearted (PANAS-X) as synonyms for sadness appears to be problematic, especially because these words are presented in isolation on the PANAS-X, with no context that might allow readers to infer their meaning. The phrases wound up (Beck Depression Inventory; Beck, 1972), short-tempered (Hogan’s Empathy Scale; Hogan, 1969), sluggish, and sheepish (PANAS-X; Watson & Clark, 1994; Watson et al., 1988) might also be interpreted inconsistently by test takers. Circumstantial evidence for problems with these items is given by factor analytic results. None of the four figurative words from the PANAS-X were included in the 20-item version of the PANAS (Watson et al., 1988), which was based upon factor analyses of the PANAS-X. Also, none survived translation and factor analyses to be included on the 42-item Estonian General Affect Scale, which was modeled after the PANAS-X (Allik & Realo, 1997).

Future research should explicitly examine how people interpret figurative language that appears on tests related to emotions (e.g., depression, emotion perception, emotional expressiveness, or emotional control). Researchers can ask test takers how they interpret specific items. Perhaps readers will be better able to interpret figurative language given the context in which it appears (i.e., as part of a sentence-length item, and as part of a test related to emotions). Ideally, researchers should use semi-structured interviews with open-ended questions to explore the conceptual equivalence of these items across individuals and across cultures (see, e.g., Corral & Landrine, 2009; Matsumoto & Yoo, 2007).

Haboush-Deloye (2013) completed one such qualitative analysis for tests of emotion suppression and found many differences in the interpretation of items containing figurative language. For example, participants were asked how they interpreted the item, “I try to control unpleasant emotions and strengthen positive ones” (an item that occurs on the Emotional Skills and Competence Questionnaire; Takšić, 2005). Participants with an individualistic worldview defined control as not letting others see the feeling, whereas those with a collectivist worldview defined control as managing the emotions so they are not controlling the person. Later, participants were asked how they would interpret the item, “When I am anxious, I smother my feelings” (an item that occurs on the Courtauld Emotional Control Scale; Watson & Greer, 1983). Several participants said they were unsure how to interpret the word smother in the context of the question. Of those who could define smother, the majority said that smother meant to hide emotions from others, but some said it meant to hide the emotions from oneself. Thus, figurative language was interpreted differently both between and within cultural groups. Future researchers should look for such differences in interpretation.

In addition, researchers should explicitly test whether people from different countries and cultures score differently on items that include figurative language. For example, the Mayer-Salovey-Caruso Emotional Intelligence Test (Mayer, Caruso, & Salovey, 2000) contains a section that tests knowledge of sensations that are associated with emotions. Many of those associations appear to be figurative rather than literal (for example, no emotion is literally “sharp,” “dark,” or “closed”). If those figurative associations vary from one country or culture to the next, that section could provide systematically lower scores for people who are not from the USA (where the test was developed and normed). Researchers should therefore test whether differences between countries (and between cultural minorities and cultural majorities in the USA) are larger on these items than on other parts of the test.

Our findings also have practical implications: Test users should be cautious in using existing tests that contain figurative language. If a test contains many items, then the presence of a single confusing item is not likely to have a large effect on test scores. The problems are likely to be greatest if several items contain figurative language (e.g., the PANAS-X contains four such items) or if a short form is being used. Clinicians should be particularly careful if they are interpreting the scores for individual test takers and the initial scores are close to a cut-off point. Under these circumstances, clinicians should assess their clients’ interpretation of items containing figurative language.

Finally, the problems with figurative language likely generalize to content areas besides emotions: Even if a phrase is common, it may not be universally understood. For example, in the USA, sharp is figurative language for intelligence and driven is figurative language for motivation. However, if sharp and driven do not have those same meanings in other countries or cultures, using these words to denote intelligence and motivation on psychological tests could cause confusion among test takers and result in test bias. This is true even if the test will be used exclusively in the USA, given the number of overlapping subcultures that exist here. If test designers want to use figurative language on a test of any psychological construct, they must ensure that the concepts that underlie the words have the same meaning in different countries and cultures (Matsumoto & Yoo, 2007). In the absence of such evidence, test users should be aware of possible test bias on items containing figurative language, and test designers should be cautious about using figurative language when designing new tests.