A number of emotion stimulus sets have been developed for research in human emotion perception and affective computing. The majority of these sets focus on static facial expressions of basic emotions (anger, disgust, fear, happiness, sadness, and surprise)—for example, the Radboud Faces Database (Langner et al., 2010), the NimStim set of facial expressions (Tottenham et al., 2009), the 3D Facial Emotional Stimuli (Gur et al., 2002), the Karolinska Directed Emotional Faces (Lundqvist, Flykt, & Öhman, 1998), and the Pictures of Facial Affect (Ekman & Friesen, 1976). However, an increasing number of dynamic emotion stimulus sets have recently been developed, particularly for use in affective computing—for example, the Cohn–Kanade AU-Coded Facial Expression Database (Kanade, Cohn, & Tian, 2000; Lucey et al., 2010) and the Video Database of Moving Faces & People (O’Toole et al., 2005). See Bänziger, Mortillaro, & Scherer (2012) for an overview of emotion stimulus sets.

The stimulus sets mentioned above predominantly focus on facial expressions of emotion, a fact that reflects the greater attention of emotion research to facial expressions than to other expressive modalities, such as body gestures. Faces are a salient signal of another’s affective state, and the ability to perceive and understand facial expressions is an important skill for social interaction. Furthermore, certain clinical populations demonstrate impairments in the recognition of facial expressions (Bölte & Poustka, 2003; Comparelli et al., 2013; Meletti et al., 2003; Sucksmith, Allison, Baron-Cohen, Chakrabarti, & Hoekstra, 2013). However, previous research has highlighted that emotion recognition is influenced by an interaction between the expressive modalities of face, voice and body (Meeren, van Heijnsbergen, & de Gelder, 2005; Regenbogen et al., 2012; Van den Stock, Righart, & de Gelder, 2007). Hence, an emotion stimulus set that permits the investigation of emotion perception across the expressive modalities of face, voice and body, both individually and combined, would be a valuable resource for the scientific community. In addition, emotion expression is dynamic in nature, and previous research has suggested that the use of dynamic emotion stimuli is more ecologically valid, results in better emotion recognition (Ambadar, Schooler, & Cohn, 2005; Trautmann, Fehr, & Herrmann, 2009; Weyers, Mühlberger, Hefele, & Pauli, 2006), and also activates a wider neural network compared to static emotion stimuli (Kilts, Egan, Gideon, Ely, & Hoffman, 2003; Sato, Kochiyama, Yoshikawa, Naito, & Matsumura, 2004; Trautmann et al., 2009).

The EU-Emotion Stimulus Set was created as part of the ASC-Inclusion project within the European Community’s Seventh Framework Programme (FP7/2007-2013; www.asc-inclusion.eu). The aim of the ASC-Inclusion project was to create an online socio-emotional training tool for children with a diagnosis of autism spectrum condition (ASC). In the course of developing this online training tool, an emotion stimulus set was required that portrayed a range of emotions and mental states through the three expression modalities of face, voice, and body gesture, as well as through contextual social scenes. However, no existing stimulus set covered both the range of emotions/mental states and the modalities required. The only other somewhat similar dynamic stimulus sets known to the authors that are freely available for scientific use are the Amsterdam Dynamic Facial Expression Set (Van der Schalk, Hawk, Fischer, & Doosje, 2011), the Cambridge Mindreading (CAM) Face–Voice Battery (Golan, Baron-Cohen, & Hill, 2006), and the Geneva Multimodal Emotion Portrayals Core Set (GEMEP-CS; Bänziger et al., 2012).

The EU-Emotion Stimulus Set expands on the previous dynamic emotion sets in (1) the numbers of emotions and mental states represented, (2) the age range of the actors expressing the emotions, and (3) the expression modalities through which these emotions/mental states are portrayed. In this report, we outline the development of the visual stimuli from the EU-Emotion Stimulus Set, along with the validation results and emotional ratings from typically developed adults. The EU-Emotion Stimulus Set is freely available to investigators for use in scientific research, and can be downloaded from www.autismresearchcentre.com/arc_tests.

Method

Stimulus creation

Stimulus set

The EU-Emotion Stimulus Set contains N = 418 visual stimuli (video clips, durations 2–52 s) of 20 different emotions and mental states, plus neutral. This stimulus set is dynamically portrayed by 19 actors through facial expressions (n = 249), body gesture scenes (n = 82), and contextual social scenes (n = 87). See Appendixes A, B and C for example visual stimuli. A total of N = 2,364 vocal stimuli (English n = 698, Swedish n = 1,012, Hebrew n = 654) have also been produced and validated (the methodology and results for these vocal stimuli are currently in preparation for publication separately).

Emotions and mental states

The set consists of the following 20 emotions/mental states, plus a neutral state: afraid, angry, ashamed, bored, disappointed, disgusted, excited, frustrated, happy, hurt, interested, jealous, joking, kind, proud, sad, sneaky, surprised, unfriendly, and worried. These states were selected from an initially evaluated set of 27 emotions/mental states (see Lundqvist et al., 2014). ASC clinical experts (n = 47) and the parents of children with ASC (n = 88) rated these 20 emotions/mental states as being the most important for social interactions out of the potential 27. Limited research has investigated the visual or auditory distinctiveness/uniqueness of these more complex emotions/mental states. Two prior studies that examined more complex emotions/mental states (i.e., other than the six basic emotions: anger, disgust, fear, happiness, sadness, and surprise) showed the complex emotions/mental states investigated to be discretely identifiable through facial and vocal expression (Bänziger et al., 2012; Golan et al., 2006). In the present study, no prior assumptions were made as to whether each of these 20 emotions/mental states holds a discretely identifiable facial, vocal, and body expression; rather, the study provided the opportunity to examine the uniqueness of these emotions/mental states across modalities while concurrently developing a valid set of emotion stimuli.

Actors

The stimuli were depicted by 19 actors of different ethnicities, aged 10–70 years (ten female and nine male). See Table 1 for the actor demographics. The actors were recruited from professional acting agencies or drama schools within the United Kingdom. A director with experience in theatre and film production was hired to manage the filming. Approximately 80 actors auditioned for the roles. One member of the research team and the director selected the 19 actors on the basis of the quality of their audition performances. The actors were advised and guided through their performances by the director.

Table 1 Actor demographics and modalities completed

Instructions

The actors were first filmed performing the facial expressions, followed by the body gesture scenes, and finally the contextual social scenes. The actors performed each scene three times for every modality. No vocalizations were produced while the actors performed the facial expressions, body gesture scenes, or contextual social scenes. All scenes were shot against an infinite white background.

For the facial expressions, each of the 17 participating actors was asked to portray a subset of ten emotions/mental states, plus neutral. Each actor was assigned ten emotions/mental states, rather than the full 20, due to filming time constraints. The 20 emotions/mental states were divided into two sets of ten each (three basic emotions and seven complex emotions/mental states per set). The actors received facial expression scripts for either Set A or Set B, and the two sets were distributed among the actors so as to ensure even representation across genders and ages. For the facial expression scenes, a straight camera angle and standard zoom were used, giving a frontal view of each actor with only the shoulders and head visible within the shot. Scripts that described a possible scenario in which an emotion/mental state would occur were provided to help the actors portray the different emotions/mental states. An example script for the facial expression ashamed follows: “Face: Your mum caught you eating the cookie you stole from the kitchen and she is angry.” The use of scripts helped balance the intensity at which the emotions/mental states were displayed across actors, and specific instructions on intensity were also provided. The six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) were each portrayed twice through facial expressions, once at a high intensity and once at a low intensity, whereas the other 14 emotions/mental states were all portrayed through facial expression at a high intensity. The following example instruction was given to help guide the intensity of expression across all modalities: “High Intensity—In this situation, you are *quite* ashamed; not *a little* ashamed, not *very* ashamed, but *quite and unmistakably* ashamed.”

The body gesture scenes were completed by eight of the actors. Each emotion was portrayed through a body gesture scene by one actor alone, with the exception of the following six emotions/mental states—ashamed, hurt, jealous, joking, kind, and unfriendly—for which two actors acted out the body gesture scene. These six emotions/mental states were presented by two actors because it was felt that they would be difficult to act out or interpret without a second character to whom the emotion/mental state could be directed; their expression depends on a social interaction. Again, the emotions/mental states were divided into four scripts, two for the individual body gesture scenes and two for the dual-actor body gesture scenes. Each actor received two scripts (a total of ten emotions/mental states plus neutral to portray), one for the individual body gestures and the other for the dual-actor body gesture scenes. Scripts were provided to guide the actors’ performances and intensities of expression. All emotions/mental states were performed at a high intensity, and the entire body was visible within the shot.

For the contextual social scenes, one to three actors were grouped together and requested to perform a social scenario. All 19 actors participated in the filming of the 87 contextual social scenes. Each contextual social scene depicted one to four emotions. Again, scripts were provided detailing the social interactions required. The social scenes were filmed using long to medium shots of the whole body and facial close-ups. The shots were edited together to depict the social scene. The actors were requested to act out the emotion in the scenario as naturally as possible. An example social scene script follows: “Sneaky and Afraid—James sneaks into room, hides a fake spider under Laura’s book sneakily. Leaves room. Laura walks into the room, picks up book from desk and discovers a spider hiding underneath—she gets a fright and drops book.” The actors were requested to perform all contextual social scenes at a high intensity.

Stimulus validation

Due to the vast quantity of footage produced, it was not possible to validate every emotion/mental state performance recorded. Three members of the research team independently went through the filmed footage and selected the best clip of each emotion/mental state portrayal (each emotion/mental state was performed three times by each actor) to go forward for validation. For the majority of the footage, the three raters agreed on the best emotional portrayal. In cases of disagreement, either the take with majority agreement was chosen or two takes of the same item were put forward for validation.

Stimulus set and surveys

The face, body gesture and contextual social scene stimuli were divided into 14 separate online surveys (six face surveys, two body gesture surveys and six social scene surveys). The stimuli for each modality were divided into the corresponding surveys, ensuring that the emotions/mental states were evenly distributed. Each of the facial expression surveys included approximately 41 stimulus items, and each of the contextual social scene/body gesture surveys included 15–30 stimulus items. Each survey took 20–30 min to complete. The 14 surveys were first developed in English and then translated into Swedish and Hebrew (using back translation) by two native speakers for each language who were also fluent in English. The 14 surveys were distributed in the UK, Sweden, and Israel separately (14 surveys × 3 languages).

Participants

A total of 1,231 complete responses (803 from female and 428 from male participants) were recorded across the 14 validation surveys from the three data collection sites. A minimum of 54 participants completed each survey (approximately 18 participants per data collection site per survey). The average age of the participants was 44 years (SD = 16.7, range: 16–84). Participants were recruited through existing research participant databases and university mailing lists, as well as through online resources—for example, social media sites such as the project Facebook page.

Procedure

To estimate the validity of the EU-Emotion stimuli, we investigated whether each stimulus could be recognized as the intended emotional/mental state expression and the degree to which a stimulus conveyed an emotional impression.

Recognition task

Recognition rates were investigated using a forced-choice task. For each stimulus, a video of the emotional expression was shown together with six counterbalanced response options. The participants were instructed to “Please select the label which best describes what this person is expressing.” The six response options consisted of the target emotion/mental state (i.e., the emotion/mental state that the actor intended to express), four control emotions/mental states, and a None of the above option. The None of the above option was included to prevent artifactual agreement (Frank & Stennett, 2001).

In the related literature, when researchers have validated stimuli across a smaller range of emotions, such as the six or seven basic emotions (see, e.g., Goeleven, De Raedt, Leyman, & Verschuere, 2008; Langner et al., 2010), they have adopted an “all against all” approach when selecting response options. For example, when presenting a happy stimulus, afraid, angry, disgusted, happy, neutral, sad, and surprised are used as response options, and the same set can be used when presenting a neutral or a sad stimulus, and so forth. Because of the large number of emotions in the EU-Emotion Stimulus Set, this approach was not feasible, since a 20-option response format would have been too arduous and overwhelming for participants. To limit the number of response options per stimulus while maintaining a fair comparison across emotions, we adopted an innovative approach.

Because the task consisted of selecting the label that matched the stimulus from among a handful of response options, we wanted the response options presented for each emotion/mental state to be equal in difficulty across all target emotions/mental states. To accomplish this, we used data from another study by our group (Lundqvist et al., 2014), in which over 700 participants rated the similarity/dissimilarity of each of the 20 emotions/mental states involved here against all of the others. Since that study yielded a 20×20 emotion similarity/dissimilarity matrix, we first specified four ranges of similarity (corresponding to very similar, quite similar, quite dissimilar, and very dissimilar) and then selected appropriate response alternatives from the similarity/dissimilarity matrix within these four ranges for each target emotion. This selection logic was applied to ensure that the sets of response options presented for each target emotion/mental state were comparably easy or difficult across all emotions/mental states and stimuli (see the “Emotions Matrix” in the supplementary data for a list of the target and control emotions/mental states, along with the similarity ranges used for selection).
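To illustrate this selection logic, a minimal sketch is given below. It assumes a hypothetical pandas DataFrame `similarity` holding the 20×20 matrix of mean similarity ratings rescaled to 0–1, and the band boundaries shown are illustrative only; the actual ranges and the resulting control options are those listed in the supplementary “Emotions Matrix.”

```python
import pandas as pd

# Illustrative similarity bands (highest to lowest similarity with the target).
# The cut-offs used for the EU-Emotion surveys are documented in the
# supplementary "Emotions Matrix" and will differ from these example values.
BANDS = [
    (0.60, 0.80),  # Control 1: quite similar to the target
    (0.40, 0.60),  # Control 2
    (0.20, 0.40),  # Control 3
    (0.00, 0.20),  # Control 4: very dissimilar to the target
]

def select_control_options(target: str, similarity: pd.DataFrame) -> list:
    """Pick one control emotion per similarity band for a given target.

    `similarity` is assumed to be a symmetric 20x20 DataFrame of mean
    similarity ratings rescaled to 0-1 (a hypothetical format).
    """
    row = similarity.loc[target].drop(target)  # exclude the target itself
    controls = []
    for low, high in BANDS:
        candidates = row[(row >= low) & (row < high)].index.difference(controls)
        if candidates.empty:
            raise ValueError(f"No candidate in band [{low}, {high}) for {target}")
        # Take the candidate closest to the band midpoint so that the
        # difficulty of the response set stays comparable across targets.
        midpoint = (low + high) / 2
        controls.append((row[candidates] - midpoint).abs().idxmin())
    return controls

# A trial's response options: the target, four controls, and "None of the above".
# options = [target] + select_control_options(target, similarity) + ["None of the above"]
```

Picking one candidate per similarity band is one way to keep the difficulty of the response set comparable across target emotions/mental states; the tie-breaking rule shown here is an assumption, not the procedure reported above.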

Emotional impression

The emotional impression of each stimulus was assessed through subjective ratings of valence, arousal, and intensity. For valence, the question “How Positive or Negative is this emotional expression?” was used. The participants were instructed to answer on a scale from 1 to 5, where 1 = Very negative and 5 = Very positive. For arousal, the question “How strongly does this emotional expression make you feel?” was used. The participants were instructed to answer on a scale from 1 to 5, where 1 = Not at all and 5 = Very strongly. For intensity, the question “How intense is this emotional expression?” was used, and participants were instructed to answer on a scale from 1 to 5, where 1 = Calm, and 5 = High intensity.

Data treatment and analysis

For the recognition data, raw recognition rates were first calculated for all stimuli and all response options. Since six response options were used, the raw scores were then adjusted for chance by using Cohen’s kappa [True Correct = (Proportion of Raw Correct – 1/6) / (5/6)] (Tottenham et al., 2009). The average recognition scores and emotional rating scores were calculated separately per stimulus and then aggregated over emotions, separately per emotion intensity and per modality (facial expression, body expression, or social scenario). In the tables (here and in the supplementary data), true scores below 0 have been adjusted to 0. Also, to give an overview of the degree to which the main dependent measures were correlated, we calculated intercorrelations among the (chance-corrected) recognition scores and the valence, arousal, and intensity ratings using Pearson’s correlation coefficient. A cross-cultural comparison of the emotion recognition scores was not possible, due to variation in the age ranges and genders of participants between the data collection sites.
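As a rough illustration of this pipeline (not the authors’ analysis script), the sketch below applies the chance correction and computes the Pearson intercorrelations, assuming a hypothetical per-stimulus table with columns `emotion`, `modality`, `intensity`, `raw_correct`, `valence`, `arousal`, and `intensity_rating`.

```python
import pandas as pd

CHANCE = 1 / 6  # guessing rate with six response options

def chance_corrected(raw: pd.Series) -> pd.Series:
    """True Correct = (Proportion of Raw Correct - 1/6) / (5/6), floored at 0."""
    return ((raw - CHANCE) / (1 - CHANCE)).clip(lower=0)

# Hypothetical per-stimulus validation table (file name and columns are assumptions).
stimuli = pd.read_csv("validation_data.csv")
stimuli["ccr"] = chance_corrected(stimuli["raw_correct"])

# Mean scores per modality, emotion, and expression intensity.
summary = (stimuli
           .groupby(["modality", "emotion", "intensity"])
           [["ccr", "valence", "arousal", "intensity_rating"]]
           .mean())

# Intercorrelations between the main dependent measures (Pearson's r).
correlations = stimuli[["ccr", "valence", "arousal", "intensity_rating"]].corr(method="pearson")
print(summary.head())
print(correlations.round(2))
```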

Results

Table 2 shows an overview of the recognition scores and emotional rating scores for each expression modality, each emotion/mental state, and (when applicable) the two intensity levels. Corresponding data are also made available on an individual stimulus item level as supplementary data (see the “Validation Data” in the supplementary data).

Table 2 Summary of the validation data for each of the 20 emotions plus neutral, separately per expression modality and (where applicable) expression intensity (denoted *High and **Low)

On the modality level, the mean chance-corrected recognition (CCR) scores were 63 % (SD = 16 %) for facial expressions, 77 % (SD = 11 %) for body gestures, and 72 % (SD = 17 %) for social scenarios. For all modalities, most emotions/mental states resulted in CCR scores between 60 % and 90 % (see Table 2 and the supplementary data for details). However, some emotions/mental states yielded very low average scores: for facial expressions, kind (M = 9 %), jealous (M = 14 %), and unfriendly (M = 9 %), and for body gestures, jealous (M = 3 %). These lows were the exception; a number of chance-corrected scores of 90 % or above were also found, such as for the facial expression joking (M = 90 %), the body gestures disappointed (M = 97 %), disgusted (M = 90 %), frustrated (M = 98 %), and kind (M = 91 %), and the social scenarios afraid (M = 94 %) and frustrated (M = 96 %). The intensity of the basic emotions also clearly influenced the results, with an average of 78 % for high-intensity facial expressions and 63 % for low-intensity expressions (see Table 2 for details). The modality through which emotions/mental states were expressed also appeared to play an important role in recognition. Unfriendly and kind were both recognized poorly (9 % recognition) when expressed through facial expression. Recognition of unfriendly increased to 68 % when it was portrayed through body gestures, and to 62 % for social scenarios. Similarly for kind, recognition increased to 91 % for body gestures and to 61 % for social scenarios. Jealous also achieved low recognition scores through facial expressions (13 %) and body gestures (3 %), but increased to 44 % when represented within a social context. The emotion surprised was recognized best when expressed through the face (79 %) and body (75 %), but recognition decreased to 49 % when it was represented through social scenarios. Overall, emotions/mental states expressed through body gesture scenes showed the highest mean recognition scores (see “Emotion Recognition by Modality” in the supplementary data).

The results also showed that errors were distributed across the response options in line with how conceptually similar the control emotions/mental states were to the target emotion/mental state. Overall, the chance-corrected selection rates across modalities were 4 % for Control Emotion 1 (the emotion most similar to the target), 1.9 % for Control Emotion 2, 1.1 % for Control Emotion 3, and 0.2 % for Control Emotion 4 (the least similar to the target; see the “Emotions Matrix” in the supplementary data for further details). Thus, selection of the control emotions/mental states declined with decreasing similarity to the target.
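A minimal sketch of how such an error gradient could be checked is given below, assuming a hypothetical trial-level table with a `chosen_option` column coded as "target", "control_1" through "control_4", or "none"; it uses raw selection proportions, whereas the figures above are chance-corrected.

```python
import pandas as pd

# Hypothetical trial-level responses (one row per participant x stimulus).
responses = pd.read_csv("trial_responses.csv")

# Proportion of trials on which each option was chosen.
rates = responses["chosen_option"].value_counts(normalize=True)
control_rates = rates.reindex(["control_1", "control_2", "control_3", "control_4"])

# Selections should decline as the control becomes less similar to the target.
assert control_rates.is_monotonic_decreasing
print(control_rates.round(3))
```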

The correlation analyses showed a low level of intercorrelation between the recognition, valence, arousal, and intensity measures (R² values on average around .05), with the exception of the correlation between the arousal and intensity ratings. For these measures, the r values averaged .90 and ranged from .78 to .96 across the different item categories (see Table 3).

Table 3 Summary of intercorrelations between (chance-corrected) recognition scores and valence, arousal, and intensity ratings, using Pearson’s correlation coefficients

Discussion

The EU-Emotion Stimulus Set is a validated collection of 418 dynamic multimodal emotion and mental state representations displayed through facial expressions, body gestures, and contextual social scenes. Overall, moderate to high recognition scores were achieved across each of the modalities: facial expression (M = 63 %, SD = 16 %), body gesture scenes (M = 77 %, SD = 11 %), and contextual social scenes (M = 72 %, SD = 17 %). The validation results varied for each emotion/mental state, depending on the expression modality through which that emotion/mental state was portrayed. For the facial stimuli, joking had the highest recognition rate, at 90 %. Kind and unfriendly facial expressions had the lowest recognition rates, both at 9 %. Disappointed and frustrated had the highest body gesture recognition scores, at 97 % and 98 %, respectively, whereas jealous portrayed through body gesture had a recognition rate of only 3 %. For the contextual social scenes, frustrated was the most identifiable (96 %), and jealous again had the lowest recognition score (44 %). The validation results provide valuable data on the uniqueness of the expressions and the modalities through which emotions/mental states are most clearly displayed. For example, unfriendly was not easily identified through facial expression (9 % recognition rate); however, recognition increased to 68 % when it was portrayed through body gestures. The results showed that jealous was the most difficult to identify, possibly due to its complex nature and need for contextual cues.

Turning to the correlation results, although a number of significant findings are reported, the effect sizes of the majority of these are small. The strongest correlation was found between the intensity and arousal scores (with emotions/mental states that scored high in arousal also scoring high in intensity across facial expressions, body gestures, and social scenes), demonstrating a strong relationship between these two emotion dimensions. Bänziger et al. (2012) also reported that the intensity ratings of their emotion stimuli were influenced by arousal level, with low arousal being associated with low intensity.

The recognition results of the EU-Emotion Stimulus Set are slightly higher than those of Bänziger et al. (2012), whose stimulus set (GEMEP-CS) is the closest to the EU-Emotion Stimulus Set in terms of the range of emotions/mental states and modalities portrayed. Bänziger et al. validated 154 dynamic facial expression stimuli (face and upper torso visible) representing 17 different emotions, producing a mean overall uncorrected recognition score of 47 %. The difference in overall facial recognition scores between the EU-Emotion Stimulus Set (63 %) and GEMEP-CS (47 %; Bänziger et al., 2012) is likely explained by variation in the validation methodologies used between the studies, which we discuss further below.

Tottenham et al. (2009) reported an overall 79 % CCR rate for the NimStim facial stimulus set, which consists of 672 static facial expression stimuli representing eight different emotions (the six basic emotions—anger, disgust, fear, happiness, sadness, and surprise—plus neutral and calm). Since the EU-Emotion set has a wider range of emotional/mental state expressions, with two levels of intensity for the basic emotions, the comparatively lower overall recognition rate in our data is likely due to the greater complexity and variety of our stimuli. Indeed, the overall mean recognition rate for our basic facial emotion stimuli (high intensity only) plus neutral was 78 %, which is comparable to the results of Tottenham et al. Langner et al. (2010) similarly reported an uncorrected mean recognition rate of 82 % for a set consisting of the six basic emotions plus neutral and contempt. Finally, the validation results for the KDEF set reported by Goeleven, De Raedt, Leyman, and Verschuere (2008) revealed an overall uncorrected emotion recognition rate (for the six basic emotions plus neutral) of 72 %.

In the present study, a subset of four emotions/mental states was selected from the 20 emotion/mental state labels to act as the control response options, in addition to None of the above, in the forced-choice validation task. The four control response options varied depending on the target emotion/mental state portrayed. The forced-choice task is a common method that has been used in previous emotion validation studies (Bänziger et al., 2012; Tottenham et al., 2009). The NimStim face stimulus set portrays the six basic emotions plus neutral and calm, and in its forced-choice validation study participants had to choose among these eight labels plus a None of the above option (Tottenham et al., 2009). The mean corrected recognition rates were high for the individual emotions (.54–.95). These high recognition rates could be due to the limited number of response options to choose from and the discreteness of the basic emotions. Bänziger et al. had participants choose among 17 emotion labels and an Other emotion option. This resulted in lower mean uncorrected recognition rates (.28–.79) than those reported by Tottenham et al., likely because the emotions investigated by Bänziger et al. were greater in number, more complex (emotions beyond the six basic emotions were investigated, such as tenderness, pleasure, and despair), and potentially overlapping with one another (e.g., sadness and despair, elated joy and amusement, contempt and irritation), making recognition in their validation task more challenging.

When designing the validation task for the EU-Emotion set, we decided not to follow the validation design of Bänziger et al. (2012) by presenting participants with the full list of 20 emotions/mental states, plus neutral and a None of the above option. Presenting all 20 options would have been visually overwhelming and time consuming, and could have resulted in participants not considering each of the emotion/mental state labels in turn with regard to the stimulus presented. It could also have increased the risk of participants relying on the basic emotion labels (anger, disgust, fear, happiness, sadness, and surprise) as broad categories when viewing the stimuli, rather than considering the particular characteristics of the more complex emotions/mental states. For the same reason, when selecting the control options from the similarity matrix, we did not include control emotions/mental states whose similarity rating with the target emotion/mental state was too high. For example, for the target emotion disappointed, the emotion sad was not used as a control response option, due to its strong similarity and overlap with disappointed in both meaning and expression. Without sufficient context it would be difficult to distinguish these two emotions from one another, which could lead participants to fall back on the basic emotion labels. This difference in methodology, and in the numbers of emotions/mental states to choose between (6 vs. 17), between the present study and that of Bänziger et al. likely explains the variation in the mean recognition rates reported by the two studies. In the EU-Emotion Stimulus Set validation task, the four control emotions/mental states varied for each target emotion/mental state, which raises the question of whether the recognition scores for each emotion/mental state can be compared. Although the control options did vary, their similarity to the target emotion/mental state was held consistent across all target emotions/mental states, as was the ease/difficulty of the forced-choice task. This method provided a standardized measure against which each stimulus was judged and prevented the potential issues outlined above from occurring.

The effect of the age and ethnicity of the different actors on participants’ recognition rates was not explored, because each actor was given only a subset of emotions/mental states to portray for each modality, preventing a comparison across actors; furthermore, it would be difficult to disentangle acting experience from age effects. Cross-cultural, age, and gender effects on participants’ emotion recognition would also be worth investigating, but this was not possible with the present data set, due to differences in participant characteristics between the three data collection sites. The gender and age distributions were homogeneous across the surveys within each data collection site, but heterogeneous between sites.

The overall high recognition scores, together with findings comparable to those from other emotion validation studies from the US, Belgium, Switzerland, and the Netherlands with predominantly younger populations (Bänziger et al., 2012; Goeleven et al., 2008; Langner et al., 2010; Tottenham et al., 2009), suggest that this innovative validation design is effective when investigating a large number of emotion/mental state categories. Despite the limitations discussed above, the EU-Emotion Stimulus Set is a valuable resource for scientific research into emotion perception across both individual and integrated expression modalities.