Emotional facial stimuli are an important tool for studying a variety of topics in psychology, including face perception, emotion interpretation, and the influence of emotion on cognitive and social processes (Adolphs, Baron-Cohen, & Tranel, 2002; Breiter et al., 1996; Morris et al., 1998; Nelson et al., 2003; Todd, Lewis, Meusel, & Zelazo, 2008). Pictures of emotional faces have been shown to be useful in evoking emotions in participants, while still maintaining a high level of experimenter control (e.g., Calvo & Lundqvist, 2008), and have been shown to reliably activate the brain areas associated with face processing and emotion (e.g., Sabatinelli et al., 2011). A number of facial stimulus sets have been created, each of which has various strengths and limitations. The limitations of most concern for research in developmental neuroscience are small stimulus sets, limited age ranges of models in these sets, and photographs that are not well standardized with respect to important parameters such as eye placement or head angle (see Table 1).

Table 1 Comparison of emotional facial stimulus sets

Developmental researchers are often interested in comparing how different age groups react to emotional stimuli (e.g., Nelson et al., 2003; Thomas et al., 2001) or examining how clinical populations of children react to emotional faces (e.g., Shechner et al., 2013; Wagner, Hirsch, Vogel-Farley, Redcay, & Nelson, 2013), but most studies have relied on stimulus sets that only include adult faces. For developmental work, having a broad range of ages is crucial. Sets that only include adult faces do not allow for use of peer-aged faces in studies done with children, and those that only include child faces are also not sufficient for developmental research, because an adult comparison group is necessary for many studies.

A related issue concerns the validation process. For research with child participants, it is desirable to include both children and adults in the validation process of the photographs to ensure that they are suitable for use by all ages. A facial stimulus set that includes a continuum of child and adult faces can answer unique questions about differences in reaction to adult versus peer emotions, and could be important for fields such as peer relations, authority, adolescent development, and psychopathology, as well as for research in developmental cognitive neuroscience and related fields.

Well-standardized stimuli are necessary for neuroimaging/physiological studies, yet many available stimulus sets do not standardize eye placement or head angle. For example, in research using electroencephalography (EEG) and recording event-related potentials on a millisecond time scale (e.g., Todd et al., 2008), variations in eye placement may cause unwanted variation in how people scan the faces, changing the timing of ERPs in uncontrolled ways. Also, sets need to include enough well-validated pictures in each age range for each emotion in order to provide the high number of unique trials often necessary for research using ERP, fMRI, and other measures (Egger et al., 2011; Thomas et al., 2001).

A number of facial stimulus sets have been created for use in scientific research. Table 1 shows the major stimulus sets available that provide 2-D images and do not focus on a specific non-Caucasian ethnicity (see Gross, 2005, for a more extensive list of facial stimulus sets). The table presents an overview comparison of these sets with the Developmental Emotional Faces Stimulus Set (DEFSS), which was designed to complement the existing sets and has the following characteristics. First, the DEFSS includes a wider range of model ages than existing sets, in that a total of 116 participants between 8 and 30 years old were used as models. Starting at 8 years rather than 9 or 10, as in some other sets that include children (Egger et al., 2011; Mazurski & Bond, 1993), provides a better sample of prepubertal photographs.

Second, because the DEFSS focuses on the primary emotions (happiness, anger, fear, and sadness), it provides more photographs of these key emotional expressions than do other sets that include a greater number of different poses. The DEFSS also includes neutral expressions, which provide an important comparison for the positive and negative emotions (e.g., Breiter et al., 1996).

Third, all DEFSS stimuli are carefully standardized, with all eyes centered horizontally and at the same height, or vertical position. The pictures were validated by both children and adults. Also, information about the age, race, gender, emotion, and validation ratings is included for each photograph, making it easy for researchers to choose the photographs that are most appropriate for their studies.

Method

The development and validation of the DEFSS involved the following phases: (1) creation of the stimulus set, (2) processing the photographs, and (3) validation of the photographs.

Creation of the stimulus set

Photographs were taken at two locations: the University of Minnesota (35 participants), and the Minnesota State Fair (81 participants). For the university sessions, families who had previously agreed to participate in university-based research were contacted. At the state fair, participants were recruited as they walked through a building designated for research. The procedure and equipment used were the same at both sites. Each participant gave consent for the photographs to be used for research purposes and distributed to other researchers, as approved by the University of Minnesota’s Institutional Review Board. Participants over 18 provided consent for themselves; participants under 18 provided assent, and their legal guardians provided consent. Table 2 shows the demographic characteristics of the models who were photographed.

Table 2 Demographics of the models and raters

Participants were seated in an adjustable chair in front of a gray backdrop. A digital camera with a 50 mm lens (typically used for portraits) on a tripod was used, with an LED light mounted above it for standardized lighting. Participants were coached and shown examples to help them express each of five emotions (happiness, anger, fear, sadness, and neutrality). For child participants (under 12 years of age), scenarios were also given to elicit the different emotions, with participants being asked to express how they would feel in that situation. For example, for fear, children were told “I want you to think about how you would feel if you were walking on the street, and there was a big, scary, mean dog barking and running at you. Here is a picture of a person making a scared face. Can you show me on your face what you would look like if you were scared of the mean dog?” Participants were asked to remove all jewelry when possible. Hair and makeup were left as they were to create the most natural-looking photographs.

Processing the photographs

The color photographs were edited and cropped using Adobe Photoshop CS4. Photographs were cropped so that only the head is included, and all photographs are the same size and shape. Eye placement was standardized so that the eyes are centered and appear at the same height. Eye placement standardization was done using a mask with lines indicating the center of the photograph and the height for the eyes. The photograph was adjusted horizontally so that the center of the nose was in the center of the photograph. The photograph was rotated if necessary, so that the eyes were aligned at the same height, and the photograph was adjusted vertically so the center of the pupils aligned with the mask line placed 45 % from the top of the photograph (see Fig. 1). In a randomly selected subsample of 20 photographs, an average of 44.8 % of the picture was above the midline of the eye (SD = 0.60 %), and the two eyes were measured to be at the same level (M% difference = 0.45, SD = 0.51). The center of the nose was an average of 49.2 % from the left edge of the photograph (SD = 0.77 %). Any clothing or accessories that were visible in the photograph were edited to be grayscale, to reduce the color variation across pictures. Color balance was standardized across all photographs. The final color photographs measured 859 × 947 pixels.

Fig. 1 Illustration of the editing process
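The alignment described above was carried out by hand in Photoshop against a mask. Purely as an illustration, the same geometry can be sketched in code: given pupil and nose landmarks, compute the rotation that levels the eyes and the shift that puts the nose on the vertical midline and the pupils on the 45 % line. The landmark coordinates, function names, and the choice to rotate about the midpoint between the pupils are assumptions made for this sketch, not part of the original procedure.

```python
import math

WIDTH, HEIGHT = 859, 947  # final photograph size reported in the text, in pixels
EYE_LINE = 0.45           # pupils sit on a line 45 % from the top edge


def rotate_about(point, center, degrees):
    """Rotate `point` around `center` by `degrees` (pixel coordinates)."""
    t = math.radians(degrees)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(t) - y * math.sin(t),
            center[1] + x * math.sin(t) + y * math.cos(t))


def alignment(left_pupil, right_pupil, nose_center):
    """Return (rotation_deg, pivot, dx, dy) that puts the face on the mask lines."""
    (lx, ly), (rx, ry) = left_pupil, right_pupil
    pivot = ((lx + rx) / 2, (ly + ry) / 2)  # midpoint between the pupils (assumed pivot)
    rotation = -math.degrees(math.atan2(ry - ly, rx - lx))  # undo the tilt of the eye line
    nose_x, _ = rotate_about(nose_center, pivot, rotation)
    dx = WIDTH / 2 - nose_x            # move the nose centre to the vertical midline
    dy = EYE_LINE * HEIGHT - pivot[1]  # move the pupils to the 45 % line
    return rotation, pivot, dx, dy


def transform(point, rotation, pivot, dx, dy):
    """Apply the rotation and then the shift to one landmark."""
    x, y = rotate_about(point, pivot, rotation)
    return x + dx, y + dy


# Hypothetical landmarks for a slightly tilted face.
params = alignment((300, 420), (560, 440), (432, 520))
left_eye = transform((300, 420), *params)
right_eye = transform((560, 440), *params)
```

After the transform, both pupils share the same vertical position, which is the property the subsample measurements in the text check.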

Validation of the photographs

To assess whether the photographs provided valid depictions of the intended emotional expressions, 228 participants were asked to view and rate the photographs. Validation of the photographs was done in three settings by participants who had their pictures taken at the University of Minnesota (N = 35) or the Minnesota State Fair (N = 81), and also additional participants who only served as raters and validated pictures via an Internet-based survey (N = 172). Children for the internet survey were recruited from families who had previously indicated that they would be interested in participating in research through the University of Minnesota. Parents who consented to their child’s participation were e-mailed a link to the survey. Adults for the internet survey were recruited by mass e-mails to undergraduate and graduate students in social science departments at the University of Minnesota. Table 2 also shows the demographic characteristics of the participants who rated the pictures.

The procedure for rating the photographs was the same for participants who came to a site in person and those who rated pictures online. Participants who came in person were presented the rating procedure on a desktop computer. Participants who rated pictures online used whatever device was readily available to them. After a brief explanation of the ratings, participants were shown a photograph on the screen with a question presented below it: “What emotion do you think this face is showing?” Participants could then click on one of the following choices: Happy, Sad, Fearful, Angry, Neutral, or None of the Above. If they identified the photograph as an emotion (i.e., did not choose None of the Above), they were then presented with the same photograph on the screen with the question “How strong is that emotion?” below it, and they clicked to rate the intensity of the emotion on a scale from 1 (Just a little) to 7 (A lot). Responses were recorded by the survey program. The photographs presented to each rater were chosen at random by the computer program. Participants at the university and the state fair each rated 10–15 pictures, depending on time constraints. Participants who took the survey online each rated 50 pictures.

Photographs taken at the University of Minnesota were rated by later participants at that site, and those that did not have at least five ratings were then rated at the state fair. Photographs taken at the state fair were then rated by later participants at the fair, and those that did not have at least five ratings were then included in the internet survey. Through this process, every photograph was rated at least five times (M = 16.6, Range = 5–30).

Results

Creating the final set of valid photographs

Photographs were considered valid (and therefore included in the final set) if the correct emotion was identified by at least 55 % of the raters (raters guessing at chance would have a 1-in-6 probability [16.7 %] of correct identification). Table 3 shows the numbers of valid photographs out of the total numbers of photographs taken, categorized by emotion, age, and gender. The percentage of valid photographs is also listed for each cell. In all, 70 % of the photographs taken were rated as valid. This percentage was very similar across model age groups and genders. However, the percentages of photographs rated as valid varied across emotions. Happy pictures were most often valid (97 %), followed by neutral ones (88 %), and then fearful (61 %), angry (56 %), and finally sad (47 %). Although this means that the numbers of pictures of the different emotions included in the final set vary, excluding photographs that were not valid gives us confidence that each of the photographs included in the set can be recognized as the correct emotion and therefore used as a stimulus for that emotion.

Table 3 Numbers of valid photographs/numbers of photographs taken (and percentages valid), by model age, gender, and emotion
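The inclusion rule can be illustrated with a minimal sketch. It assumes each photograph's ratings are stored as a list of the labels the raters chose, and it treats the 55 % cutoff as inclusive (the final set's reported minimum is 55 %). The function names and the example responses are hypothetical.

```python
CHOICES = ("Happy", "Sad", "Fearful", "Angry", "Neutral", "None of the Above")
CHANCE = 1 / len(CHOICES)  # six response options, so about 16.7 % by guessing
CUTOFF = 0.55              # validity cutoff, assumed inclusive


def proportion_correct(intended, responses):
    """Share of raters whose chosen label matches the intended emotion."""
    return sum(r == intended for r in responses) / len(responses)


def is_valid(intended, responses, cutoff=CUTOFF):
    """True when the photograph meets the inclusion cutoff."""
    return proportion_correct(intended, responses) >= cutoff


# Made-up ratings for one sad photograph with the minimum of five raters.
ratings = ["Sad", "Sad", "Neutral", "Sad", "None of the Above"]
print(proportion_correct("Sad", ratings), is_valid("Sad", ratings))
```

With only five ratings, a photograph needs at least three matching responses to reach the cutoff, because two of five is 40 %.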

Description of the photographs in the final set

Figure 2 shows example photographs of three models.

Fig. 2 Example photographs from the validated DEFSS

The 404 photographs in the final set have the following characteristics. There are 65 angry photographs, 71 fear photographs, 112 happy photographs, 102 neutral photographs, and 54 sad photographs. In terms of the models, 144 photographs are of children (36 %), 154 of teens (38 %), and 106 (26 %) of adults. In all, 152 of the pictures (37 %) depict males, and 256 (63 %) are of females. Three hundred fifty-nine (89 %) of the photographs are of a Caucasian model, 41 (10 %) of a non-Caucasian model, and four photographs are of a model who did not identify race. Table 4 shows a summary of the average percentages of correct identification ratings for the valid photographs included in the final set, categorized by emotion, age, and gender. After removing pictures that were identified correctly by less than 55 % of the raters, the pictures included in the final set were identified as the correct emotion by an average of 86 % of raters (SD = 13.72, Min = 55, Max = 100). One-way analyses of variance (ANOVAs) showed that the percentage identified correctly did not vary by model age group [F(2, 401) = .378, p = .686], but did differ by emotion [F(4, 399) = 26.6, p < .001]: Happy faces had the highest percentage of correct ratings (96.2 %), and the rest of the emotions were identified correctly an average of about 80 % of the time (sad, 84.0 %; angry, 82.2 %; fear, 81.7 %; neutral, 81.8 %).

Table 4 Summary of percentages identified correctly and intensity ratings of the pictures included in the final set
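For readers who want to see where the reported degrees of freedom come from, the following is a minimal pure-Python sketch of a one-way ANOVA with the photograph as the unit of analysis. The text does not state which software was used, so this is an illustration rather than the authors' analysis; the function name and the toy data are assumptions.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy data: three groups of three values each.
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f, df_b, df_w)
```

With 404 photographs, three age groups give degrees of freedom of (2, 401) and five emotions give (4, 399), matching the F tests reported in the text.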

Similarly, one-way ANOVAs showed that the rated intensity of the photographs did not differ by model age group [F(2, 401) = 2.64, p = .072], but did differ by emotion [F(4, 399) = 22.14, p < .001]. Happy photographs were generally rated as most intense (M = 5.12 on a 7-point scale, SD = 0.803), followed by angry (M = 4.57, SD = 0.970), both fearful (M = 4.41, SD = 0.872) and neutral (M = 4.41, SD = 0.593), and sad (M = 3.98, SD = 0.801). The mean intensity rating of each photograph was correlated with the percentage of correct emotion identification ratings [r(404) = .48, p < .001], indicating that emotions that were perceived as strong were more often identified correctly.
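The relation between intensity and recognition is a Pearson correlation computed across photographs. A minimal sketch follows, assuming two parallel lists with one entry per photograph; the function name and the sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-photograph values: mean intensity (1-7) and percent correct.
intensity = [3.2, 4.1, 4.8, 5.5, 3.9]
percent_correct = [60.0, 85.0, 80.0, 100.0, 70.0]
print(pearson_r(intensity, percent_correct))
```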

Table 5 shows the distribution of participants by how many of their photographs were rated as valid and included in the final set. Twenty of the participants have a full set of the five emotions included.

Table 5 Numbers of pictures per model included in the final set

Documents available

The DEFSS is available to download from http://reflectionsciences.com/resources/researchers/. Two documents are made available along with the photographs, to aid researchers in using the stimulus set. The first document includes the characteristics of each individual photograph: model ID, sex, age, and race/ethnicity of the model, as well as the number of times the photograph was rated, the percentage of correct emotion identifications, and average intensity ratings. A second document includes characteristics of each model who had photographs taken: model ID, sex, age, race, and a checklist of which emotions for that participant are included in the final set. Each photograph file is named with the model ID, sex, age, and emotion (e.g., 1_F8_Angry; 40_M12_Happy). Therefore, complete information about each photograph is available to all researchers who choose to use this stimulus set, and researchers are able to use the photographs that best fit their research aims.
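Because the file names encode model ID, sex, age, and emotion, they can be parsed directly when assembling a stimulus list. The sketch below assumes the pattern shown in the two examples above (ID, underscore, sex letter followed by age, underscore, emotion) and tolerates an optional file extension, which the text does not specify. The `PhotoInfo` type and `parse_photo_name` function are hypothetical names.

```python
import re
from pathlib import Path
from typing import NamedTuple


class PhotoInfo(NamedTuple):
    model_id: int
    sex: str      # "F" or "M"
    age: int      # model age in years
    emotion: str  # e.g., "Angry", "Happy"


# Pattern inferred from the examples 1_F8_Angry and 40_M12_Happy.
_NAME = re.compile(r"^(\d+)_([MF])(\d+)_([A-Za-z]+)$")


def parse_photo_name(filename):
    """Split a DEFSS-style file name into its four components."""
    stem = Path(filename).stem  # drops an extension if one is present
    match = _NAME.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    model_id, sex, age, emotion = match.groups()
    return PhotoInfo(int(model_id), sex, int(age), emotion)


print(parse_photo_name("1_F8_Angry"))
print(parse_photo_name("40_M12_Happy.jpg"))  # the .jpg extension is an assumption
```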

We set the validity cutoff for inclusion in the set (correct identification by at least 55 % of the raters) to ensure that a large number of pictures would be available. However, most of the photographs have high validity, with the average being identified correctly by 86 % of the raters. Table 6 shows the numbers of pictures available—by emotion, age group, and gender—that were identified as the correct emotion at least 80 % of the time.

Table 6 Pictures available that were rated correctly at least 80 % of the time
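Researchers who need a stricter criterion than the 55 % cutoff can apply their own threshold to the per-photograph document. The following sketch counts the photographs per emotion and sex that meet a chosen threshold. It assumes the per-photograph information has been loaded into dictionaries; the keys `emotion`, `sex`, and `percent_correct` and the sample records are hypothetical and are not the actual column names of the distributed file.

```python
from collections import Counter


def count_available(photos, min_percent_correct=80.0):
    """Count photographs per (emotion, sex) that meet the accuracy threshold."""
    return Counter(
        (p["emotion"], p["sex"])
        for p in photos
        if p["percent_correct"] >= min_percent_correct
    )


# Made-up records standing in for rows of the per-photograph document.
photos = [
    {"emotion": "Happy", "sex": "F", "percent_correct": 100.0},
    {"emotion": "Happy", "sex": "M", "percent_correct": 92.0},
    {"emotion": "Sad", "sex": "F", "percent_correct": 60.0},
    {"emotion": "Sad", "sex": "F", "percent_correct": 83.0},
]
print(count_available(photos))
```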

Discussion

This article describes the creation of a standardized, validated set of emotional facial stimuli that includes 404 photographs of people 8 to 30 years old across five emotions: happiness, anger, fear, sadness, and neutrality.

This stimulus set has a number of strengths. First, eye placement was standardized across photographs. All photographs are the same size and shape, and the eyes are centered left–right and at the same height in all photographs. This is important for time-sensitive face processing studies, such as those using EEG methodology. Second, the models range in age from 8 to 30 years old, a larger age range than has been included in previous stimulus sets, allowing for a wider range of developmental research. The age group of the model was not related to the percentage of raters who identified the emotion correctly, nor to the rated intensity of the emotion, indicating that the photographs of children, teens, and adults are equally valid. A third strength is that the photographs were rated by both child and adult participants. Neither the percentage correctly identified nor the rated intensity varied by age group, indicating that the pictures are equally appropriate for use across the entire age range.

This stimulus set also has a number of weaknesses. Because sad, fearful, and angry pictures were not identified as often with the correct emotion as the other pictures, unequal numbers of pictures of the different emotions are included in the final set, and not every model has a picture of every emotion included. The finding that happy faces are easier to display and/or identify than fear, anger, and sadness has been reported in multiple other studies (Calvo & Lundqvist, 2008; Egger et al., 2011; Tottenham et al., 2009). Because of the validation process, however, we can be confident that the included pictures are likely to be identified by most observers as displaying the intended emotional expression. There are also more pictures of females than of males, and only 10 % of the photographs include non-Caucasian models. The participants who agreed to have their pictures taken at the lab and at the state fair are therefore not entirely representative of the United States population. Finally, it has been shown that people can detect the difference between posed and actual displays of emotion in photographs (McLellan, Johnston, Dalrymple-Alford, & Porter, 2010), and it has been hypothesized that posed expressions may communicate different information than genuine expressions (Davis & Gibson, 2000). Thus, although posed photographs have significant strengths, such as the ability to tightly control the stimuli, they are potentially limited in terms of real-life validity. It will be important for researchers to determine whether findings with posed photographs can also be replicated with genuine expressions of emotion.

In conclusion, the DEFSS provides new photographs of child and adult emotional faces that should be useful to a variety of researchers doing studies that require emotional facial stimuli. These stimuli are made freely available to researchers.