Face recognition depends on a restricted range of low-level image features, including specific spatial-frequency (SF) bands (Gold, Bennett, & Sekuler, 1999; Yue, Tjan, & Biederman, 2006) and orientation bands. As is the case for all object categories, different spatial frequencies carry different kinds of visual information about face stimuli (Vuilleumier, Armony, Driver, & Dolan, 2003; Goffaux, Hault, Michel, Vuong, & Rossion, 2005; Goffaux & Rossion, 2006), with lower spatial frequencies carrying more information about coarser features (e.g., the face outline) and higher spatial frequencies carrying information about finer details (e.g., texture features or the appearance of the eyes). Mid-range spatial frequencies (~8–16 cycles per face), however, appear to contribute disproportionately to face recognition, whereas for other object classes no subband appears to make a disproportionate contribution (Biederman & Kalocsai, 1997; Collin, 2006).

With regard to how orientation subbands contribute to face recognition, Dakin and Watt (2009) demonstrated that horizontal orientations appear to contribute disproportionately to famous face identification. The authors asked observers to identify famous faces (celebrities) and found that accuracy was about 35% for orientations near the vertical axis, significantly lower than the 56% accuracy for orientations near the horizontal axis. Additionally, in a separate computational analysis, the authors discussed the possibility that horizontal structures in face images may be a robust cue for face detection. That is, the typical pattern of horizontally oriented features in the face, which Dakin and Watt characterized as a stripe-like "bar code," may be a cue that is not disrupted by changes in view or illumination, and may also reliably distinguish faces from nonfaces. The robustness of these structures under typical environmental manipulations may support invariant recognition in many settings. For example, observers are moderately robust to variation in face illumination and viewpoint (Sinha, Balas, Ostrovsky, & Russell, 2006), possibly because neither of these manipulations typically disrupts this horizontal structure. In contrast, both contrast negation and face inversion produce stripe patterns that are highly dissimilar to those of the original image. The disruption of this structure in both circumstances may be the reason that contrast negation and face inversion disrupt face recognition so profoundly (Galper, 1970; Yin, 1969).

A number of face-processing phenomena depend critically on horizontal orientations within face images. For example, Goffaux and Dakin (2010) showed that the face inversion effect (i.e., the failure to recognize a familiar face when it is presented upside down) was preserved for faces containing only horizontal information, but did not obtain for vertically filtered faces. They presented upright and inverted pairs of faces, cars, and scenes that contained horizontal information, vertical information, or both. When presented upright, faces containing horizontal information were processed better than faces containing only vertical information. When presented upside down, however, this horizontal advantage was greatly disrupted, whereas performance with faces containing vertical information remained largely unaffected. Identity aftereffects were also driven by horizontal information: Adapting to one of two faces containing horizontal information (i.e., viewing a face for an extended period of time) affected responses to morphed versions of those same two faces (a shift of the psychometric curve toward the adapting face). Finally, the authors showed that masking horizontal information with visual noise disrupted the ability to match faces across different viewpoints. All three manipulations led the authors to conclude that horizontal structure provides the most useful information about face identity (Dakin & Watt, 2009; Goffaux & Dakin, 2010).

The amount of information for identification has also been shown to be greatest within the horizontal orientation band. Pachai, Sekuler, and Bennett (2013a, b) found that masking of face images was strongest for noise with orientations at or near 0 deg (i.e., horizontal) and weakest for orientations at or near 90 deg (i.e., vertical). The authors also calculated absolute efficiency scores and reasoned that if observers used masked face information from all orientation bands equally, then no differences should be observed for faces embedded in noise fields with different orientation energy distributions. This was not the case; rather, Pachai et al. (2013a, b) found that observers were more sensitive to horizontal information when faces were upright than when they were inverted, suggesting that observers were more efficient at utilizing orientation information in the horizontal band and less efficient at using information in the vertical band. Finally, the authors showed that sensitivity to horizontal information in upright faces correlated significantly with the size of the face inversion effect, suggesting that horizontal face information was used efficiently, but only for upright rather than inverted faces.

However, preliminary evidence suggests that horizontal information is not completely dominant over all other orientations. Goffaux and Okamoto-Barth (2013) demonstrated that vertical orientations assist in the processing of gaze information. The authors compared direct with averted gazes by presenting arrays of faces filtered to include horizontal information, vertical information, or a combination of both. By having participants search for a target face with either gaze, they found that detection was better for direct than for averted gazes, but only when arrays were composed of vertically filtered faces. This suggests that specific facial regions can be useful for communicating relevant social information, and that these orientation bands carry social cues that are distinct from those that carry useful information for individuation. Indeed, the eyes in particular (relative to the nose and mouth) carry important horizontally oriented information for individuation (Pachai, Sekuler, & Bennett, 2013a), but clearly also carry important vertically oriented information for gaze perception. These reports also suggest that different subsets of orientations may be more critical than others, depending on the region of focus within the face and the specific cues that observers require to complete different perceptual tasks. For recognition of whole faces, however, horizontal information appears to be most important.

Since identification critically depends on horizontal information, but some social cues may depend on a broader or different range of orientations, we chose to investigate how facial emotion recognition depends on orientation information. Bruce and Young's (1986) classic model of face perception proposes dissociable processes for identity and facial expressions (Winston, Henson, Fine-Goulden, & Dolan, 2004; Young, McWeeny, Hay, & Ellis, 1986). Bruce and Young identified several distinct types of information that can be derived from viewing faces. This perceptual process is broken up into stages, the first being the encoding of structural information, from which abstract descriptions of features are obtained. Following this initial stage, they proposed that expression and identity are analyzed independently from one another by separate systems (the expression analysis and face recognition units). This model has received support from clinical studies of prosopagnosic patients (Duchaine, Parker, & Nakayama, 2003; Palermo et al., 2011) and from behavioral studies of neurotypical individuals. For example, Young, McWeeny, Hay, and Ellis (1986) provided evidence for separate processing of identity and emotional expressions by measuring response times in a matching task. They presented pairs of familiar or unfamiliar faces simultaneously and had participants decide whether the two faces showed the same person (identity matching) or the same emotion (expression matching). According to the Bruce and Young model, recognizing expression does not depend on face recognition units, so performance should be similar across familiar and unfamiliar faces. In contrast, identity matching should result in faster responses to familiar than to unfamiliar faces, due to the rapid and automatic operation of face recognition units. The results of Young et al.'s task revealed that for identity matching, response times were indeed faster for familiar than for unfamiliar faces, whereas no differences were observed for matching emotional expressions.

Evidence from neuroimaging studies and visual-adaptation paradigms also supports the possibility that identity and emotion are processed independently. For example, Winston et al. (2004) were able to distinguish between neural representations for emotion and identity processing using an fMRI adaptation paradigm. Behavioral face adaptation paradigms similarly reveal that identity adaptation depends on both an expression-dependent mechanism and an expression-independent mechanism (Fox, Oruç, & Barton, 2008), the latter providing evidence of independent neural processing of facial emotion. Different emotions (happy vs. sad) also appear to be dissociated neurally (Calder, Lawrence, & Young, 2001; Morris, DeGelder, Weiskrantz, & Dolan, 2001), suggesting that not only is emotion processing neurally distinct from identity processing, but that distinct emotions may be processed by distinct mechanisms.

Altogether, different emotional expressions appear to be processed by distinct neuroanatomical structures (Calder et al., 2001; Johnson, 2005), and expression processing is also largely dissociable from identity processing. We therefore hypothesized that the observed bias for horizontal information in identity recognition might not obtain in a facial emotion recognition task, and that orientation biases might also depend on variability in how those emotions are expressed. Indeed, prior reports suggest that not all emotion categories depend equally on the same spatial frequencies or orientations. Happy and sad emotion recognition appear to be supported by low (<8 cycles per face) and high (>32 cycles per face) spatial frequencies, respectively (Kumar & Srinivasan, 2011). Yu, Chai, and Chung (2011) measured performance on the categorization of four facial expressions (anger, fear, happiness, and sadness) using multiple orientation filters (i.e., –60, –30, 0, 30, 60, and 90 deg) and concluded that horizontal information is critical for the recognition of most emotions, with the exception of fear expressions. As the filter orientation approached vertical, a bias emerged toward labeling faces as "fearful," suggesting that diagnostic cues for recognizing fear may be embedded within the vertical rather than the horizontal component, or at least may be more equally distributed between the two. This result was also borne out by computational simulations, based on a model of visual processing using multiscale oriented Gabor filters, that help explain the differing orientation biases observed by Yu et al. (Li & Cottrell, 2012).

In the present study, we asked participants to categorize happy and sad faces that were filtered to include information that was predominantly vertical, predominantly horizontal, or both. Furthermore, we used picture-plane rotation (0 or 90 deg) to dissociate image-based from object-based orientation. For instance, when a horizontally filtered face image (a stimulus containing predominantly horizontal information) is rotated by 90 deg, its raw image-based orientation content becomes vertical, although the information defined along the horizontal structure of the face remains present. This manipulation thus allowed us to determine the relative contributions of a putative bottom-up bias for horizontal orientations and higher-level biases for particular facial features. We conducted two experiments, using faces expressing genuine emotions (Exp. 1) and faces expressing posed emotions (Exp. 2). This allowed us to examine an ecologically valid set of emotional faces in one task and to complement this analysis with a controlled set of images in which confounds between emotional expression and specific features (e.g., open mouths) present in naturally evoked expressions could be controlled. We hypothesized that the reliance on horizontal orientations for emotion recognition may depend on the appearance of distinct emotional faces, since particular diagnostic features vary substantially by emotion category. Overall, our results were consistent with this hypothesis: Emotion recognition does largely depend on horizontal orientation information, but this bias is modulated by factors influencing the appearance of specific emotions, including mouth openness (i.e., open vs. closed). Furthermore, we found that structural orientation relative to the face image, rather than raw orientation, drove performance in our tasks. Finally, we analyzed the energy content of our face images within the horizontal and vertical orientation bands, to determine whether our behavioral effects were driven by the relative amounts of orientation energy in the target bands. This analysis revealed that horizontal orientation energy was consistently greater than vertical orientation energy overall, but that this difference was not significantly affected by emotional expression or mouth openness. We conclude that the extent to which emotional expressions are recognized with a horizontal orientation bias depends on a number of stimulus factors, suggesting that observers adopt a flexible strategy for recognition that is not constrained by a front-end horizontal bias.

Experiment 1

In Experiment 1, we investigated whether the recognition of genuine emotions depended on the horizontal structure of the human face. We also wanted to determine whether image-based or object-based orientation was more relevant to any performance differences across orientation bands.

Method

Participants

A group of 17 undergraduate students (11 female, 6 male) from North Dakota State University participated in this experiment. All participants reported normal or corrected-to-normal vision, provided written informed consent, and received course credit for their participation.

Stimuli

Face images of 29 individuals (12 male, 17 female) expressing genuine happy and sad emotions were taken from the Tarrlab Face Place database (www.face-place.org) and were 250 × 250 pixels in size. Faces containing certain artifacts (e.g., extensive facial hair) were not chosen, hence the unequal numbers of male and female stimuli. We normalized the images by subtracting the mean luminance value from each image. We then filtered these faces in MATLAB 2010A by applying a Fourier transform to each image and multiplying the resulting spectrum by either a horizontal or a vertical orientation Gaussian filter with a standard deviation of 20 deg. Our stimuli were created by applying the inverse Fourier transform to bring each filtered image back into the spatial domain (Fig. 1). Following this inverse transformation, all images were readjusted so that their mean luminance and contrast were matched (Dakin & Watt, 2009). In addition to the vertically and horizontally filtered images, we also included a third (broadband) condition composed of faces that contained broadband orientation information (Fig. 1). We had intended this set of images to include only the combination of horizontal and vertical orientation energy from our first two filtered-image conditions, but due to an error in our image-filtering code, these images instead contained information from all orientations at low spatial frequencies and were restricted to horizontal and vertical orientations only at high spatial frequencies. As a result, these control images are effectively composed of energy at all orientations, and therefore primarily allowed us to compare performance with largely unfiltered images to performance in the horizontal and vertical conditions.
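
For illustration, the MATLAB sketch below implements orientation filtering of the kind described above. It is our reconstruction rather than the original code: the file and variable names are placeholders, and we adopt the standard spectral convention that horizontally oriented image structure carries its energy along the vertical axis of the amplitude spectrum, so the "horizontal" filter is an orientation Gaussian centred on that axis.

% Minimal sketch of the orientation filtering (assumed names, not the original code).
im = double(imread('face.png'));                  % 250 x 250 grayscale face
im = im - mean(im(:));                            % subtract mean luminance
[n1, n2] = size(im);
[fx, fy] = meshgrid(-n2/2:n2/2-1, -n1/2:n1/2-1);  % centred frequency coordinates
theta = atan2(fy, fx) * 180/pi;                   % orientation of each frequency component (deg)
sigma = 20;                                       % filter SD in degrees (see text)
centre = 90;                                      % 90 keeps horizontal structure; 0 keeps vertical
d = mod(theta - centre + 90, 180) - 90;           % angular distance, wrapped to (-90, 90]
filt = exp(-d.^2 ./ (2*sigma^2));                 % orientation Gaussian
F = fftshift(fft2(im));                           % Fourier transform, DC centred
out = real(ifft2(ifftshift(F .* filt)));          % inverse transform to the spatial domain
out = (out - mean(out(:))) ./ std(out(:));        % re-match mean luminance and contrast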

Fig. 1
figure 1

Examples of Experiment 1 faces, depicting the same individual filtered vertically, horizontally, and with broadband orientation information

Design

We used a 2 × 2 × 3 within-subjects design with the factors Emotion (happy, sad), Image Orientation (upright, sideways), and Filter Orientation (vertical, horizontal, broadband). Image orientation was varied in separate blocks, whereas emotion and filter orientation were pseudorandomized within each block. Participants completed a total of 348 trials: 29 identities × 2 emotions × 3 filter orientations = 174 trials in the upright block and 174 in the rotated block. Block order was counterbalanced, so that half of the participants began with the upright-image block and the remaining half with the rotated-image block.

Procedure

Participants viewed the stimuli on a 13-in. MacBook with a 2.4-GHz Intel Core 2 Duo processor. Stimuli were presented using Psychophysics Toolbox 3.0.10 on a Mac OS 10.7.4 system, and we recorded participants' responses using an eight-bit USB controller. The participants' task was to label each face according to the expressed emotion (happy/sad); they responded by pressing the "B" button on the controller for "sad" and the "A" button for "happy." We asked participants to respond as quickly and as accurately as possible. Each trial began with a fixation cross at the center of a gray screen for 500 ms, followed by a face stimulus that replaced the fixation cross. The face stimulus remained on the screen until participants made a response or 2,000 ms had elapsed. Short breaks were offered between blocks, and the experiment resumed only when participants indicated that they were ready.
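
For concreteness, a single trial of this kind could be implemented in Psychophysics Toolbox roughly as sketched below. This is a sketch of the timing described above, not our exact experimental script: the stimulus variable is a placeholder, and keyboard keys stand in for the controller's "A"/"B" buttons.

% Minimal Psychtoolbox sketch of one trial (assumed variable names and key mapping).
win = Screen('OpenWindow', max(Screen('Screens')), 128);   % gray background
keyHappy = KbName('a');  keySad = KbName('b');
tex = Screen('MakeTexture', win, faceImage);               % one pre-filtered face image
DrawFormattedText(win, '+', 'center', 'center', 0);        % fixation cross
Screen('Flip', win);
WaitSecs(0.5);                                             % 500-ms fixation
Screen('DrawTexture', win, tex);
onset = Screen('Flip', win);                               % face replaces the fixation cross
resp = NaN;  rt = NaN;
while GetSecs - onset < 2.0                                % respond within 2,000 ms
    [down, secs, keyCode] = KbCheck;
    if down && (keyCode(keyHappy) || keyCode(keySad))
        resp = keyCode(keyHappy);                          % 1 = "happy", 0 = "sad"
        rt = secs - onset;
        break
    end
end
Screen('Flip', win);                                       % clear the face from the screen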

Results

Sensitivity

We computed estimates of sensitivity (d') using hits (happy faces correctly classified as happy) and false alarms (sad faces incorrectly classified as happy) in each condition. We submitted these sensitivity measures to a 2 (image rotation) × 3 (filter orientation) repeated measures analysis of variance (ANOVA) and observed a main effect of image rotation, F(1, 16) = 24.62, p < .0001, η² = .11, such that discrimination was better for upright faces (M = 3.26) than for faces rotated sideways by 90 deg (M = 2.87). We also observed a main effect of filter orientation, F(2, 32) = 73.40, p < .0001, η² = .68, such that discrimination was poorest for vertically filtered faces (M = 2.43), followed by horizontally filtered faces (M = 3.17) and faces containing broadband information (M = 3.59). Bonferroni post-hoc pairwise comparisons revealed that all three of these values differed from one another (p < .001). These two factors also interacted, F(2, 32) = 7.57, p = .002, η² = .04, such that image rotation significantly impacted discrimination for vertically filtered faces, but had no effect on horizontally filtered faces or on faces containing broadband orientation information (Fig. 2).
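
As an illustration, the sketch below shows how these signal-detection measures can be computed from hit and false-alarm counts; the variable names are placeholders, and rates would need the usual correction away from 0 and 1 before the z-transform. The same quantities also yield the response criterion C reported further below.

% Sketch of the signal-detection measures, treating "happy" as the signal.
hitRate = nHappyCalledHappy / nHappyTrials;   % happy faces classified as happy
faRate  = nSadCalledHappy / nSadTrials;       % sad faces misclassified as happy
zH = norminv(hitRate);                        % Statistics Toolbox; equivalently
zF = norminv(faRate);                         %   sqrt(2) * erfinv(2*p - 1)
dprime    = zH - zF;                          % sensitivity (d')
criterion = -0.5 * (zH + zF);                 % response bias (C), analyzed below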

Fig. 2
figure 2

Sensitivity measures for responses in Experiment 1. We found that image rotation significantly affected discrimination, especially for vertically filtered faces. Error bars represent ±1 SEM

Response time

A 2 (emotion) × 2 (image rotation) × 3 (filter orientation) repeated measures ANOVA of median correct response latencies revealed significant main effects of emotion, F(1, 16) = 27.9, p < .001, and filter orientation, F(2, 32) = 42.74, p < .001. These effects were driven by longer response latencies for sad (774 ms) than for happy (665 ms) faces, and by longer response latencies for vertically filtered faces (793 ms) than for horizontally filtered faces (700 ms) and faces with broadband orientation information (665 ms). We also observed a significant interaction between these two factors, F(2, 32) = 11.60, p < .001, such that vertical filtering greatly impaired the recognition of sad faces (Fig. 3), whereas happy-face latencies did not differ between the horizontal and vertical filtering conditions. Unlike in our analysis of sensitivity, we observed no interaction between image rotation and filter orientation, F(2, 32) = 1.29, p = .29.

Fig. 3
figure 3

Average response latencies for correct responses in Experiment 1. For both upright and sideways faces, we found that vertically filtered faces were recognized more slowly than the other filtered images, but only for sad faces. Error bars represent ±1 SEM

Criterion

We also ran a 2 (image rotation) × 3 (filter orientation) repeated measures ANOVA of response bias, C, and found no significant biases in the way participants were responding in any of our conditions, F < 1.

Discussion

Our results demonstrate that the facial cues important for emotion recognition were preferentially carried by horizontal orientations. Discrimination ability (as indexed by d' values) was poorer for vertically filtered images than for horizontally filtered images and images with broadband orientation information. We also found that discrimination of emotional faces was worse when images were rotated 90 deg in the picture plane, but that sideways rotation did not lead to a "flipped" orientation bias. That is, horizontal orientations relative to the object (not the image) yielded better performance than did vertical orientations. This suggests that the horizontal bias we observed is not driven solely by a front-end bias for horizontally tuned neurons in early vision. Were this the case, we would have expected horizontal information relative to the image to be the better predictor of superior performance. Instead, our effect of planar rotation is largely consistent with previous reports on the effect of inversion on face processing (Freire, Lee, & Symons, 2000; Maurer, Le Grand, & Mondloch, 2002). A change in orientation disrupts the efficiency of face processing; in our case, it impacted vertically filtered faces significantly more than faces with horizontal energy or with both horizontal and vertical energy. This disruption in performance for vertically rather than horizontally filtered faces rotated sideways is inconsistent with previous results showing that rotation impacts both types of information (Jacques, d'Arripe, & Rossion, 2007; Yin, 1969). However, one major difference between our study and previous studies is that our picture-plane rotation did not involve a complete 180-deg rotation; our face images were not fully inverted, but were instead presented sideways. Furthermore, according to Goffaux and Rossion (2007), face inversion does not equally disrupt vertical and horizontal facial information. Although Goffaux and Rossion (2007) did not investigate orientation bands per se, they did find differences in the extraction of vertical versus horizontal relational information between facial features. Specifically, they observed the poorest performance for recognizing vertical rather than horizontal facial relations when faces were rotated sideways by 90 deg. Again, orientation information was not the target of their investigation; however, the report that performance differences can arise for different kinds of facial information rotated within the picture plane may help to explain the discrepancies between our results and previous findings.

In terms of response latency, our main effect of emotion category is consistent with previous reports that happy faces are categorized faster than sad faces (Elfenbein & Ambady, 2003; Kirita & Endo, 1995). Consistent with our initial hypothesis that different emotions might exhibit differential horizontal biases, we also found that the preference for horizontal information depended on emotion category: Response latencies revealed a reduced horizontal bias for happy faces relative to sad faces.

Together, our sensitivity and response time data suggest that (a) orientation biases for emotion recognition may manifest in an emotion-dependent manner, and (b) structural orientation relative to the face image (not raw orientation on the retina) drives differential performance in our task.

One important limitation of our first experiment, however, was that our use of genuine emotional expressions may have introduced confounding factors that could underlie the interaction we observed between emotion category and filter orientation. Specifically, mouth openness varies substantially in genuine happy and sad faces, and the prevalence of open mouths in happy faces may be the basis of the interaction that we observed here. To examine the emotion dependence of the horizontal orientation bias in more depth, we continued in Experiment 2 by using stimuli from a database of posed emotional expressions that permitted systematic control of mouth openness and emotional expression.

Experiment 2

In Experiment 2, we wished to replicate and extend the results of Experiment 1 using a controlled set of face stimuli within which mouth openness could be manipulated. Specifically, we chose to use a set of posed emotions from the NimStim Face Set (Tottenham et al., 2009), in which the position of the mouth (open vs. closed) was systematically varied in happy and sad emotional expressions.

Method

Participants

A group of 21 undergraduate students (11 female, 10 male) from North Dakota State University participated in this experiment. All reported normal or corrected-to-normal vision, provided written informed consent, and received course credit for their participation.

Stimuli

Face images of 26 individuals (13 male, 13 female) expressing posed emotions (26 happy, 26 sad images) were taken from the NimStim Face Set (Tottenham et al., 2009) and were presented at 650 × 506 pixels. Each emotion was presented in two versions (Fig. 4): one with a closed mouth and one with an open mouth (156 closed-mouth/156 open-mouth images). The process of filtering our stimuli to obtain horizontally and vertically filtered images was identical to that described in Experiment 1. We note, however, that the third condition in this task (horizontal + vertical) was composed of images containing the combination of the horizontal and vertical orientation subbands from the horizontal and vertical conditions (see the sketch below). As compared to Experiment 1 (in which energy at all orientations was included at low spatial frequencies), this third set of images contained restricted orientation information at all spatial frequencies, allowing us to examine more closely the additive effects of horizontal and vertical orientation information. To distinguish between the control images used in the two experiments, we refer to the third orientation condition in this task as "horizontal + vertical," in contrast to the "broadband" stimuli employed in Experiment 1.
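
A sketch of how such a combined filter might be constructed is given below, with theta and sigma defined over the frequency plane as in the Experiment 1 sketch (recomputed for the Experiment 2 image size) and im denoting a mean-subtracted Experiment 2 face. Summing the two orientation Gaussians and clipping at 1 is our assumption about the implementation, not a description of the original code.

% Sketch of the "horizontal + vertical" filter (assumed implementation).
dH = mod(theta - 90 + 90, 180) - 90;          % angular distance to horizontal structure
dV = mod(theta - 0 + 90, 180) - 90;           % angular distance to vertical structure
filtHV = min(exp(-dH.^2 ./ (2*sigma^2)) + exp(-dV.^2 ./ (2*sigma^2)), 1);
out = real(ifft2(ifftshift(fftshift(fft2(im)) .* filtHV)));
out = (out - mean(out(:))) ./ std(out(:));    % re-match mean luminance and contrast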

Fig. 4
figure 4

Examples of happy and sad faces with closed and open mouths, used in Experiment 2. Filtering operations were applied to these images in the same manner described for Experiment 1

Design

We used a within-subjects design with the factors Emotion (happy, sad), Mouth Openness (open, closed), Image Orientation (upright, sideways), and Filter Orientation (vertical, horizontal, horizontal + vertical). Participants completed a total of 624 trials, broken up into four blocks of 156 trials each (26 identities × 2 emotions × 3 filter orientations): upright with closed mouths, upright with open mouths, rotated with closed mouths, and rotated with open mouths. Emotion and filter orientation were randomized within each block, and block order was counterbalanced across participants.

Procedure

All stimulus display parameters and response collection routines were identical to those described in Experiment 1.

Results

Sensitivity

We computed estimates of sensitivity (d') using hits (happy faces correctly classified as happy) and false alarms (sad faces incorrectly classified as happy) in each condition. We submitted these sensitivity measures to a 2 (image rotation) × 2 (mouth openness) × 3 (filter orientation) repeated measures ANOVA and observed significant main effects of image rotation, F(1, 20) = 22.99, p < .001, η² = .09; mouth openness, F(1, 20) = 12.62, p = .002, η² = .016; and filter orientation, F(2, 40) = 86.75, p < .001, η² = .40. Discrimination was better for upright faces (M = 2.98) than for sideways faces (M = 2.46), and for open-mouth (M = 2.81) than for closed-mouth (M = 2.62) faces. Discrimination was also worse for vertically filtered faces (M = 1.94) than for horizontally filtered faces (M = 3.04) and faces containing both types of orientation information (M = 3.17). Bonferroni-corrected pairwise comparisons revealed that horizontally filtered faces did not differ from faces containing both types of orientation information (p = .58), an effect that we had not obtained in Experiment 1, possibly due to the difference in appearance between the "horizontal + vertical" stimuli used in these two tasks. We also observed significant two-way interactions between image rotation and filter orientation, F(2, 40) = 3.88, p = .029, η² = .01, and between mouth openness and filter orientation, F(2, 40) = 9.12, p = .001, η² = .04. As in Experiment 1, vertically filtered faces were greatly affected by image rotation, such that discrimination was worst for vertically filtered faces rotated sideways by 90 deg. Furthermore, discrimination of vertically filtered faces was significantly worse when mouths were closed than when they were open (Fig. 5).

Fig. 5
figure 5

Sensitivity measures for responses in Experiment 2. We found that vertically filtered faces were greatly affected by sideways rotation, especially for closed-mouth faces. Furthermore, we observed worse discrimination for vertically filtered faces with closed than with open mouths. Error bars represent ±1 SEM

Response time

We analyzed the median response latencies for correct responses in all experimental conditions using a 2 (emotion) × 2 (image rotation) × 3 (filter orientation) × 2 (mouth openness) repeated measures ANOVA. This analysis revealed significant main effects of all factors: emotion, F(1, 20) = 29.03, p < .001; image rotation, F(1, 20) = 46.08, p < .001; mouth openness, F(1, 20) = 6.67, p = .018; and filter orientation, F(2, 40) = 140.36, p < .001. Participants were faster to respond to happy than to sad faces (688 vs. 735 ms), to upright than to sideways faces (679 vs. 743 ms), and to faces displaying open rather than closed mouths (695 vs. 728 ms). Participants were also slowest to respond to faces containing only vertical information (794 ms), which differed significantly from their response latencies to both horizontally filtered faces (675 ms) and faces containing both orientations (665 ms). We also observed significant two-way interactions between emotion and mouth openness, F(1, 20) = 18.71, p < .001, and between emotion and filter orientation, F(2, 40) = 13.21, p < .001. These two-way interactions were qualified by a three-way interaction between emotion, mouth openness, and filter orientation, F(2, 40) = 10.10, p < .001: We observed a difference between happy and sad response latencies for open-mouth, vertically filtered faces that did not hold for the other combinations of mouth openness and filter orientation (Fig. 6).

Fig. 6
figure 6

Average response latencies for correct responses in Experiment 2, for upright and sideways faces with open or closed mouths. Error bars represent ±1 SEM

Criterion

We ran a 2 (image rotation) × 2 (mouth openness) × 3 (filter orientation) repeated measures ANOVA of response bias, C, and found significant main effects of mouth openness, F(1, 20) = 16.83, p = .001, and filter orientation, F(2, 40) = 55.56, p < .001. These main effects were qualified by a two-way interaction between mouth openness and filter orientation, F(2, 40) = 8.58, p < .001, indicating that for vertically filtered faces, responses were significantly more biased toward "happy" when mouths were open than when they were closed (Table 1). We suggest that this criterion shift may have resulted from participants' tendency to infer that a face is expressing happiness when teeth (which contain vertical edges) are visible.

Table 1 Response bias, C, for Experiment 2

Discussion

Overall, our results from Experiment 2 largely replicated the main results of Experiment 1 with regard to the roles of image orientation, filter orientation, and emotion category in emotion recognition. We found that a 90-deg rotation negatively impacted performance, but did not induce the "flip" between horizontal and vertical orientations that one would expect if image-based information were the relevant factor constraining performance. We also found that horizontally filtered images were recognized more accurately than vertically filtered images, but additionally demonstrated that this difference depends on the openness of the mouth. Specifically, expressions with open mouths were more robustly recognized after vertical filtering than were faces with closed mouths. This suggests that the effect of emotion that we observed in the response latency data from Experiment 1 may have been driven largely by the confound between mouth openness and emotion category in genuine emotional expressions. With this confound removed in Experiment 2, we could see that the openness of the mouth modulates the magnitude of the horizontal bias. The larger point, however, is that the horizontal bias is impacted by stimulus appearance: Observers do not exhibit an unwavering bias for horizontal relative to vertical orientations, but exhibit varying levels of bias as a function of stimulus appearance. Finally, participants were significantly biased toward responding "happy" when expressions displayed open rather than closed mouths. This bias may reflect the visibility of the teeth, however slight, together with the association between visible teeth and happy expressions (e.g., smiles); sad expressions exhibiting even a small amount of visible teeth may have led observers to assume that the expression was happy and to respond accordingly. We conclude below by discussing the results of both experiments in more detail, examining the role of low-level image statistics in determining performance in both of our tasks, and suggesting avenues for further research.

Experiment 3

Given that we found that the horizontal bias for emotion recognition is affected by stimulus manipulations, a natural question to ask was whether our results were driven largely by the amounts of horizontal versus vertical information in our different categories of images. That is, were observers simply using more horizontal orientation energy because some images include more of it than others? In principle, more orientation energy in a particular band does not necessarily mean more diagnostic information in that band, but it is possible that our behavioral effects might simply reflect the low-level statistics of our emotional faces. To examine this, we computed the power spectrum of our face stimuli in all conditions in order to make comparisons of the summed energies within horizontal and vertical orientation bands.

Method

Stimuli

The face stimuli were the same as those used in Experiments 1 and 2.

Design

For the Experiment 1 images, we used a 2 (emotion: happy, sad) × 2 (filter orientation: horizontal, vertical) design. For the Experiment 2 images, we used a 2 (emotion: happy, sad) × 2 (mouth openness: open, closed) × 2 (filter orientation: horizontal, vertical) design.

Procedure

For each stimulus set, we computed the power spectrum of every face image and summed the energy within the horizontal and vertical orientation bands. We then submitted these summed energies to repeated measures ANOVAs, with Filter Orientation and Emotion as factors for our Experiment 1 images, and with Filter Orientation, Emotion, and Mouth Openness as factors for our Experiment 2 images.
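
The band-energy computation can be sketched as below. This is our reading of the analysis rather than the original code: the ±20-deg band half-width and the variable names are assumptions, im denotes one unfiltered face image, and theta is the orientation of each frequency component, constructed as in the Experiment 1 filtering sketch.

% Sketch of the summed orientation-band energy for one face image.
F = fftshift(fft2(im - mean(im(:))));         % Fourier transform, DC centred
P = abs(F).^2;                                % power spectrum
dH = mod(theta - 90 + 90, 180) - 90;          % distance to horizontal image structure
dV = mod(theta - 0 + 90, 180) - 90;           % distance to vertical image structure
halfWidth = 20;                               % assumed band half-width (deg)
energyH = sum(P(abs(dH) <= halfWidth));       % summed energy, horizontal band
energyV = sum(P(abs(dV) <= halfWidth));       % summed energy, vertical band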

Results

The horizontal band contained significantly more energy than the vertical band for both the Experiment 1 images, F(1, 28) = 19.87, p < .001, and the Experiment 2 images, F(1, 26) = 64, p < .001. We did not observe any other main effects or interactions.

Discussion

By itself, our main effect of filter orientation is not surprising, given that faces inherently possess more features that are dominated by horizontal content, such as the eyes, eyebrows, and mouth (Dakin & Watt, 2009). Critically, however, this main effect of filter orientation did not interact with any of our other factors, nor did those factors significantly affect the summed energies. Our results thus suggest that the variation in the horizontal bias that we observed behaviorally was not driven solely by the low-level statistics of our stimulus categories (emotion category and mouth openness). We suggest, therefore, that our observers were not simply making use of horizontal information to the extent that it dominated our images, but were potentially using a flexible perceptual strategy adapted to the diagnosticity of information in different orientation bands.

General discussion

Overall, our results demonstrate that emotion processing generally depends on horizontal orientations. We observed better discrimination and faster responses for horizontally filtered faces (as compared to vertically filtered faces) in both of our experiments. However, other factors, such as the emotion category displayed by our stimuli (Exps. 1 and 2) and the position of the mouth (Exp. 2), also play critical roles in determining the magnitude of this bias. We therefore suggest that the orientation biases affecting emotion recognition are relatively flexible: The extent to which horizontal orientation is favored is not "locked down" by preferential connections to early vision, and it is malleable in response to variations in stimulus appearance.

To speak to the impact of early orientation processing on emotion recognition, we demonstrated that rotating the image 90 deg in the picture plane did not substantially impact the horizontal bias. The pattern that we observed for vertical and horizontal orientations, defined relative to the object, was not reversed, even though the corresponding image-based orientations were reversed in the sideways condition (i.e., vertical orientations became horizontal, and horizontal became vertical). Orientation biases for emotion recognition are thus defined relative to the face, suggesting that critical features in the face image (rather than raw orientation) dictate performance, but that the specific features that are diagnostic for emotion recognition vary subject to multiple sources of appearance variation. The encoding of face viewpoint, by contrast, has recently been shown to have a substantial image-based orientation component (Balas & Valente, 2012). Balas and Valente observed adaptation defined relative to the image axis, rather than the object, when participants adapted to upright or sideways (90 deg in the picture plane) faces rotated in depth. Our results are inconsistent with this possibility for the mechanisms of emotion recognition. Instead, we suggest that the orientation biases observed for face recognition likely do not result from a purely feedforward bias for horizontal orientations that propagates downstream to face-sensitive neural loci.

To the best of our knowledge, ours is also the first study to take into account the variation of expressions within each emotion category and to examine the impact of this variation on information biases for recognition. Here, we made the distinction between open and closed mouths (Exp. 2) to see whether mouth openness affected recognition across orientation bands, on the basis of our observation in Experiment 1 that mouth openness and emotion category were typically confounded in genuine emotional expressions. Our results indicated that mouth openness did play a key role, especially for happy faces. This suggests that the presence of specific diagnostic features in particular variants of some emotional expressions modulates the overall dependence on basic features such as spatial frequency and/or orientation. In our case, the visibility of the teeth may have provided additional insight into the type of emotion being presented, particularly in the absence of other facial cues: Observers may have taken the presence of vertical lines within the mouth region as an indication of a happy expression. Different orientation information therefore becomes more or less important, depending on the kinds of features that are available for viewing. In general, this suggests that emotion recognition (and potentially other tasks) may in fact be supported by flexible processes that are not completely constrained by global biases affecting low-level feature extraction, but may instead be constrained by specific diagnostic micro-patterns.

We note, however, that discrimination between emotional faces was consistently best for faces containing both horizontal and vertical orientations (Exp. 2) or broadband orientation information (Exp. 1). This sensitivity advantage for faces with broader orientation content was also coupled with the fastest responses during categorization. Thus, having access to a wide range of orientation information is certainly useful for face recognition, indicating that vertical and oblique orientations do contribute to recognition, although to a lesser degree (Pachai et al., 2013a, b). For faces that are limited to a narrow band of orientations, however, there is clearly an advantage for horizontal over vertical information in the categorization of facial expressions.

Our results also raise a number of intriguing issues for further study. For example, although a 90-deg planar rotation did not affect the orientation biases that we observed here, face inversion (a 180-deg rotation) is known to disrupt the processing of facial information in general (Yin, 1969). In addition, observers have been shown to encode information in the horizontal band differently for upright than for inverted faces, indicating differences in observer sensitivity to faces rotated in the picture plane as a function of the orientations that make up the image (Pachai et al., 2013b). Extending these results to emotion recognition (and possibly other face recognition tasks) would further clarify how information biases are affected by task demands and by image transformations that are known to disrupt face processing. Determining how the neural response to emotional faces is modulated by orientation filtering would also complement recent results demonstrating that the behavioral effects observed by Dakin and Watt (2009) are manifested in the responses measured from putative face-sensitive cortical loci (Jacques, Schiltz, Collet, Oever, & Goffaux, 2011). Our demonstration that the position of the mouth affects the relative bias for horizontal over vertical orientations is potentially interesting to examine in this context, especially given the face-sensitive N170 component's robust responses to isolated eyes (Itier, Alain, Sedore, & McIntosh, 2007). Finally, to our knowledge, no results have yet revealed how biases for orientation bands in face patterns emerge during typical development. Understanding how emotion recognition proceeds as a function of specific emotion categories (and confusions between them) and of biases for spatial frequency and orientation bands could yield important insights into how the statistical regularities that define diagnostic image features are learned and applied as the visual system gains experience with complex and socially relevant patterns.