Introduction

The importance of faces as visual stimuli has been demonstrated across multiple disciplines, including the social, behavioral, and neural sciences. Facial expressions carry valuable social information about a person's emotional state and intentions (Emery, 2000; Penton-Voak et al., 2006; Tomasello et al., 2007). Developmental studies have reported preferential attention to face-like stimuli in newborns (Goren et al., 1975; Johnson et al., 1991; Grossmann & Johnson, 2007), and neurophysiological studies have provided evidence for specialized brain areas and pathways dedicated to face processing (Kanwisher et al., 1997; Vuilleumier & Pourtois, 2007; Schindler & Bublatzky, 2020; Calder & Young, 2005; Morris et al., 1998).

Given the significance of faces to humans, they are regularly used as stimuli in a variety of scientific disciplines related to human face perception (Coin & Tiberghien, 1997), including emotion recognition (Durand et al., 2007; Marsh et al., 2012), facial recognition by computers (Huang et al., 2005; Balla & Jadhao, 2018; Chellappa et al., 2010), neuropsychological disorders (Harms et al., 2010; Turetsky et al., 2007; Bornstein & Kidron, 1959), and the effect of emotion on cognitive processes (Adolphs et al., 2002; Nelson et al., 2003; Breiter et al., 1996).

While there is agreement that facial movements are informative for inferring emotions (Ekman & Friesen, 1976), the existence of a precise mapping between configurations of facial movements and emotion categories that generalizes to all cultures remains strongly debated (Nelson & Russell, 2013; Durán & Fernández-Dols, 2021; Barrett et al., 2019). Most existing emotional face stimulus sets have been developed in Western societies and a few countries in East and South Asia (The Database of Faces, Cambridge, 2001; the EU-Emotion Stimulus Set, O'Reilly et al., 2016; the Tsinghua facial expression database, Yang et al., 2020; the Chicago Face Database, Ma, Correll, & Wittenbrink, 2015; the MPI facial expression database, Kaulard et al., 2012; the Radboud Faces Database, Langner et al., 2010; the Developmental Emotional Faces Stimulus Set, Meuwissen et al., 2017; the Japanese Female Facial Expression (JAFFE) Database, Lyons et al., 2017; see Diconne et al., 2006, and Calistra, 2015, for a more comprehensive list), reflecting their ethnic composition and the potential effects of their culture on their emotional expressions (Jack et al., 2012). Hence, a gap remains in the existing scientific data for representing other countries and cultures.

To the best of our knowledge, two databases of Iranian faces have been previously developed. The Iranian Face Database (IFDB) (Bastanfard et al., 2007) was the first database of Middle Eastern faces, and its strength lies in covering a wide range of ages and poses. It includes 3600 color images of 616 human faces at ages ranging from 2 to 85 years. However, it features only two emotional expressions (smile and frown) and is available only for a fee.

The Iranian Kinect Face Database (IKFDB) (Mousavi & Mirinezhad, 2021) was published recently as the first dynamic RGB-D database of Middle Eastern faces. It consists of more than 100,000 color and depth frames recorded with the Kinect V2 sensor from 40 subjects in different head positions, portraying the six basic facial expressions plus four micro-expressions, all captured in a close-up view with external features visible.

Both datasets were developed with a focus on computer vision applications and do not provide a validation study, making them less suitable as stimuli for psychophysical experiments. Furthermore, neither database is currently accessible free of charge. To our knowledge, our database and the Bogazici face database from Turkey (Saribay et al., 2018) are the only validated databases of Middle Eastern faces.

The Iranian Emotional Face Database (IEFDB) was created to address the need for a standard, validated database of Iranian faces for related studies. It consists of 248 photos of 40 individuals' faces, covering six emotional states (anger, fear, happiness, surprise, sadness, and disgust) as well as the neutral state. All photos were taken at high resolution under consistent lighting, camera settings, and head and eye positions. The collected images were validated through an online survey completed by Iranian raters. The database is freely available for academic use (see Section 6 on data availability).

Methods

Development of database

Face models

Forty native Iranians (15 female) aged 18–35 years (mean = 26.50, SD = 4.82) volunteered to participate as face models for the database. The ethnic composition of the models is as follows: Persian, 18 (45%); Azerbaijani, 11 (27.5%); Kurd, 6 (15%); Gilak and Mazanderani, 4 (10%); Lur and Bakhtiari, 1 (2.5%). Metadata on each model's age, sex, and ethnicity is available upon request. The volunteers were all students, researchers, or faculty members of Tehran University of Medical Sciences who were notified about the face database by an online announcement. The models were fully informed about the experimental procedure and provided written informed consent to have their photographs taken and published for scientific research purposes (e.g., scientific experiments, publications, and presentations). The study was approved by the local ethics committee at the School of Advanced Technologies of Medicine at Tehran University of Medical Sciences.

Image acquisition

The photos were shot in a room specifically equipped as a studio in the core research facilities of Tehran University of Medical Sciences. The shooting setup, including the camera settings, lighting, and room temperature, was kept consistent across all sessions (see Fig. 1). The camera (Canon EOS 650D) used an 18–35 mm lens to take high-resolution images (5184 × 3456 pixels) of the face models in portrait mode. A 3 × 2 m green-screen photo studio backdrop was used for the background. For lighting, we used two professional spotlights (1000 W), placed behind the camera within 1.5 m of the subject (see Fig. 1a). The spotlights were softened with blue polarizer film sheets and a transparent sheet. The model sat on a comfortable chair with a fixed head and neck position and was asked to look directly at the camera at the time of shooting. A 15.6-inch Lenovo laptop screen (Lenovo Inc., Beijing, China), located right below the camera, was used to show videos and images for eliciting emotions. The models were required to wear no makeup, jewelry, or accessories such as glasses or piercings, though some still used sunscreen or moisturizing cream. All female models wore a head covering (hijab) as mandated by the law of the country.

Fig. 1

Shooting setup (a) and camera settings (b) used consistently across all sessions. The distances were adjusted as in (a) to achieve the best lighting conditions

Fig. 2

Sample database images for each emotional state: happiness (a), sadness (b), anger (c), disgust (d), surprise (e), fear (f), neutral (g)

Each session started with explaining the purpose and content of the study to the model and obtaining participation consent. Next, the emotional expressions were shot one by one in the following order: neutral, happiness, disgust, sadness, anger, fear, and surprise, taking 30–40 minutes in total. Blurry photos and photos with a poor head angle were discarded at shooting time. To elicit emotions in the models, we used personal event induction and scenario induction, as in previous studies (Ebner et al., 2010; Dalrymple et al., 2013). For personal event induction, the models were asked to recall an event from their own life that strongly elicited the target emotion. For scenario induction, they were shown visual stimuli intended to elicit the target emotion and/or asked to imagine themselves in specific circumstances that would elicit it. The models were encouraged to show the emotions on their face intensely but naturally. See the Supplementary Materials for more details about the content of the instructions and scenarios.

Database validation

Approach

To validate the database, we designed an online evaluation survey in which human raters rated their perceived intensity of each candidate expression in each photo (see Fig. 3). The survey was implemented using jsPsych (de Leeuw & Motz, 2016) and remains available on the database website in both Farsi and English (Heydari & Yoonessi, 2019). The model photos were presented in their original form (without cropping external features such as the hijab).

Fig. 3

An example of a rating trial. Retrieved from http://e-face.ir/

Procedure and raters

In each survey attempt, the database photos were presented to the rater one by one in a random order. For each photo, the rater was asked to score the intensity level of each candidate emotion in the model's face (see Fig. 3). Intensity was reported on a Likert-type scale (Likert, 1932).

Ratings were collected in two phases. In the first phase, the raters took a shorter version of the survey (to facilitate rater recruitment) with only five images to rate, randomly selected from the database. At the end of this phase, we had ~2500 rating records, which we used to identify ambiguous stimuli: images whose target expression received a lower average intensity rating than another expression (32 out of 280) were excluded from further ratings and analyses. Similar criteria have been used by other dataset creators (e.g., Yang et al., 2020; Ebner, Riediger, & Lindenberger, 2010). In the second phase, the additional dimensions of attractiveness, valence, and genuineness were added to the survey, which was then taken by 11 raters who each rated all 248 remaining images. Overall, close to 5300 rating records were collected, with most images rated at least 20 times. All raters received a short written description of the survey content before starting the rating.
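For illustration, the first-phase screening rule can be expressed as a short script. The sketch below is not the authors' code; it assumes a long-format table of phase-one records with hypothetical column names (image_id, target_expression, rated_expression, intensity) and flags images whose target expression is not the highest-rated expression on average, mirroring the exclusion criterion described above.

import pandas as pd

def find_ambiguous_images(ratings: pd.DataFrame) -> list:
    """Flag images whose intended expression is not the top-rated expression on average."""
    # Mean reported intensity of every candidate expression for every image.
    means = (ratings
             .groupby(["image_id", "target_expression", "rated_expression"])["intensity"]
             .mean()
             .reset_index())
    ambiguous = []
    for (image_id, target), group in means.groupby(["image_id", "target_expression"]):
        target_mean = group.loc[group["rated_expression"] == target, "intensity"].iloc[0]
        other_max = group.loc[group["rated_expression"] != target, "intensity"].max()
        if target_mean < other_max:  # another expression outscored the intended one
            ambiguous.append(image_id)
    return ambiguous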

Validation results

Expression identification

We observed a significant effect of expression category on the intensity ratings for all categories: in each category, the target expression received higher intensity ratings than every other expression. Figure 4 shows the distribution of the ratings, grouped by the model expression. A Welch's unequal variances t-test was performed for each pairing of the target expression with the other expressions (five tests), and Bonferroni correction was applied to adjust the p-values for multiple tests. All tests showed significant differences, with p < 1e−5 for all comparisons.
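As a sketch of this analysis (not the authors' exact code), the comparison for one target category can be run with SciPy: Welch's t-test between the intensities assigned to the target expression and those assigned to each other expression, followed by Bonferroni adjustment of the resulting p-values. The function and argument names below are illustrative.

from scipy import stats

def compare_target_to_others(target_intensities, other_intensities_by_expression, alpha=0.05):
    """Welch's t-tests of the target expression's ratings against each other expression's ratings."""
    n_tests = len(other_intensities_by_expression)
    results = {}
    for expression, intensities in other_intensities_by_expression.items():
        t, p = stats.ttest_ind(target_intensities, intensities, equal_var=False)  # Welch's test
        p_adj = min(p * n_tests, 1.0)  # Bonferroni correction
        results[expression] = {"t": t, "p_adjusted": p_adj, "significant": p_adj < alpha}
    return results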

Fig. 4

Violin plot of the distribution of the ratings. The width of each violin reflects the density of the data at different values. Each subplot contains the ratings of the images corresponding to one model expression, marked on its y-axis. The average intensity for the target expression is significantly higher than for all other categories in every subplot. The p-value for the comparison with the second-largest mean is annotated above some subplots

To report measures comparable with other studies, we also computed a confusion matrix showing the rate of correct expression identification per category, as well as the biases in misidentifications (see Fig. 5). Because our survey allows assigning intensities to more than one expression, for each rating record we first took the expression with the largest reported intensity as the chosen expression. For records in which all expressions were rated below Fair, the chosen expression was set to Neutral. As shown in Fig. 5, Happiness had the highest hit rate (97%), while Fear and Disgust had the lowest (67% and 79%, respectively). The most common misidentification was images in the Fear category being categorized as Surprise (22%). The overall Cohen's kappa coefficient (Cohen, 1960) for the agreement between the model expressions and the rater-perceived expressions was 0.79.
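A minimal sketch of this labelling rule and the agreement measures is given below. It assumes each rating record is a mapping from expression name to numeric intensity and that the Fair level is coded as 2 (an assumed coding used purely for illustration).

from sklearn.metrics import confusion_matrix, cohen_kappa_score

FAIR = 2  # assumed numeric code for the "Fair" level

def chosen_expression(record: dict) -> str:
    """Expression with the largest reported intensity; Neutral if all are below Fair."""
    if max(record.values()) < FAIR:
        return "neutral"
    return max(record, key=record.get)

def agreement(intended, records, labels):
    """Row-normalized confusion matrix and Cohen's kappa between intended and perceived labels."""
    perceived = [chosen_expression(r) for r in records]
    cm = confusion_matrix(intended, perceived, labels=labels, normalize="true")
    kappa = cohen_kappa_score(intended, perceived, labels=labels)
    return cm, kappa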

Fig. 5

Confusion matrix C. The rows represent the expression intended by the model, and the columns represent the expressions perceived by the raters. C_ij is the percentage of images with intended expression i that were perceived by the raters as expression j. The diagonal cells correspond to agreement between the intended and the perceived expression

Other variables

We also collected ratings of attractiveness, valence, and genuineness for each image from a smaller number of raters (N = 11) and computed Spearman's rank correlation (ρ) between each of these variables and the perceived intensity of each expression. As expected, valence was strongly correlated with happiness (ρ = 0.56, 95% CI [0.53, 0.59]) and weakly anti-correlated with anger (ρ = −0.10, 95% CI [−0.14, −0.06]). Attractiveness was correlated with happiness (ρ = 0.27, 95% CI [0.23, 0.30]). Valence and attractiveness were also correlated with each other (ρ = 0.65, 95% CI [0.62, 0.67]). Genuineness was correlated with the intensity of every emotion, but most strongly with happiness (ρ = 0.26, 95% CI [0.22, 0.29]). See the Supplementary Materials for the correlations between all pairings of variables.
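As an illustrative sketch, Spearman's ρ between two per-record variables can be computed with SciPy, with a 95% confidence interval obtained by bootstrapping the rating records (the bootstrap is an assumption here, not a statement of how the reported intervals were produced; variable names are hypothetical).

import numpy as np
from scipy import stats

def spearman_with_ci(x, y, n_boot=5000, seed=0):
    """Spearman's rho with a bootstrap 95% confidence interval."""
    x, y = np.asarray(x), np.asarray(y)
    rho = stats.spearmanr(x, y).correlation
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(x), len(x))  # resample rating records with replacement
        boot[i] = stats.spearmanr(x[idx], y[idx]).correlation
    low, high = np.percentile(boot, [2.5, 97.5])
    return rho, (low, high)

# e.g., spearman_with_ci(valence_ratings, happiness_intensities)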

Discussion

In this study, we collected and validated a database of basic emotional expressions from 40 Iranian male and female faces. The goal was to provide future researchers with high-quality images that capture all six basic emotional expressions in a controlled studio setting. The database screening and validation were conducted through an online survey with select-all-that-apply choices, in which the intensity of each emotion was reported independently for each image. Most images have been rated at least 20 times in terms of perceived emotional intensity.

Our database is the only existing Iranian face database that comes with a validation study. The images were validated by Iranian raters to reduce cross-cultural confusion effects (Elfenbein & Ambady, 2002). It is also the only Iranian face database that combines a well-controlled, consistent shooting setup with availability free of charge. The shooting setup is highly standardized, controlling for the camera settings, lighting conditions, background, and the subject's head position. The images are high resolution (5184 × 3456 pixels).

Compared to the common forced-choice survey paradigm, the select-all-that-apply format of our survey allows raters to report the intensity of more than one emotion for each image, providing a more nuanced description (as used to produce Fig. 4). This may be especially important when examining whether findings from datasets of other ethnicities generalize to Iranian faces. It also lends itself more naturally to many computer vision algorithms that output a degree of membership in each emotion category per image.

The expression identification results (Figs. 4 and 5) show high agreement between the model's intended expression and the rater-perceived expression for nearly all emotion categories. Notably, many images intended to show fear were reported to have a fair or high level of surprise. The confusion between fear and surprise is a well-known effect and has been explored previously. According to the perceptual-attentional limitation hypothesis, this confusion may be caused by the visual similarity of the two expressions and the shared facial muscles involved in the movements (Roy-Charland et al., 2014; Chamberland et al., 2017; Zhao et al., 2017). Ekman (1993) specified that the action units of fear include those of surprise, plus additional ones. This may explain why, in our data, more fear images are rated as surprise than vice versa.

Ratings were collected in two phases, with the first phase using a short-form survey that included only five images. We found the interrater reliability to be lower for the short survey (average intraclass correlation ICC(1,1) = 0.42, compared to 0.58 for the full format) (Liljequist et al., 2019), but the effect of the model's expression remained highly significant when computed using only the short survey's subset of the rating data. We also did not collect metadata such as name, age, or ethnicity from the online raters. New surveys based on the database may include this metadata as well as additional measures such as perceived age.
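For reference, ICC(1,1) (one-way random effects, single rater; Liljequist et al., 2019) can be computed from a long-format rating table with, for example, the pingouin package. The sketch below is an assumed implementation, not the one used in the study: it uses hypothetical column names and presumes a complete raters-by-images block of ratings for a single rated dimension.

import pingouin as pg

def icc_single_rater(long_df):
    """ICC(1,1) for intensity ratings; long_df has columns image_id, rater_id, intensity."""
    table = pg.intraclass_corr(data=long_df, targets="image_id",
                               raters="rater_id", ratings="intensity")
    return table.loc[table["Type"] == "ICC1", "ICC"].item()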

In our database of Iranian models rated by Iranian raters, the six basic emotions of happiness, sadness, anger, disgust, fear, and surprise remain largely identifiable, with lower identifiability for fear and disgust, which may have culture-specific underlying reasons (Jack et al., 2009; Jack et al., 2012). Rigorous testing of such effects and other hypotheses on emotion perception was not a focus of this study. To connect to the broader research in the cross-cultural debate on facial emotion expression, a natural next step is a more in-depth characterization of the differences in expression and judgment between Iranians and other ethnicities.