Introduction

Several researchers argue that the recognition of face identity and the perception of facial expressions are two independent processes (Bruce & Young, 1986; Burton, Young, Bruce, Johnston, & Ellis, 1991; Hancock, Bruce, & Burton, 2000; Winston, Henson, Fine-Goulden, & Dolan, 2004). Bruce and Young (1986), for example, distinguish between parallel routes for separable aspects of face perception, such that the mechanism that implements face recognition differs from the one that judges facial expressions. However, evidence from neuroimaging studies indicates interactive connections between the brain regions associated with face identity recognition and facial expression perception. The processing of facial expressions initiates activation in the amygdala, which can modulate the strength of the concomitant activation in the fusiform face area, the region implicated in face identification (Ganel, Valyear, Goshen-Gottstein, & Goodale, 2005; Hasselmo, Rolls, & Baylis, 1989; Vuilleumier, Armony, Driver, & Dolan, 2001; Vuilleumier & Pourtois, 2007). Activation in the amygdala also strengthens subsequent memory traces in the hippocampus, resulting in enhanced memory for emotional relative to non-emotional stimuli (Hamann, Ely, Grafton, & Kilts, 1999; Kilpatrick & Cahill, 2003; Phelps, 2004). These findings on emotion-modulated brain activation are consistent with behavioral findings that facial expressions facilitate the recognition of face identities (Gallegos & Tranel, 2005; Kaufmann & Schweinberger, 2004).

However, while several studies have demonstrated improved recognition memory for faces with positive over negative and neutral expressions (D’Argembeau & Van der Linden, 2004, 2007; Gallegos & Tranel, 2005; Kaufmann & Schweinberger, 2004), others have reported enhanced memory for angry or fearful faces compared with happy and neutral faces (Jackson, Linden, & Raymond, 2014; Jackson, Wolf, Johnston, Raymond, & Linden, 2008; Jackson, Wu, Linden, & Raymond, 2009; Sessa, Luria, Gotler, Jolicoeur, & Dell’Acqua, 2011). One noticeable difference between these two conflicting sets of findings on how facial expression facilitates face recognition concerns the memory mechanism (e.g., long-term memory, working memory) being tested. The studies showing memory enhancement for faces with positive expressions used an identity recognition task that was separated from encoding by a 5-min interval (D’Argembeau & Van der Linden, 2004, 2007) or tested familiar faces, such as those of famous people, already encoded in one’s long-term memory (Gallegos & Tranel, 2005; Kaufmann & Schweinberger, 2004). When tested with familiar faces, recognizing face identities was faster for happy than for angry or neutral faces. The memory facilitation for happy faces has been explained by a holistic bias, whereby positive expressions induce holistic or configural processing that benefits face perception (Bridge, Chiao, & Paller, 2010). In contrast, studies that have demonstrated enhanced memory for faces with angry or fearful expressions typically measured visual working memory (VWM). In these VWM tasks, a brief temporal interval (e.g., 1,000 ms) separated encoding and memory retrieval. When faces were stored briefly in VWM, participants showed enhanced memory only for faces with negative expressions, while memory for faces with positive expressions did not differ significantly from that for faces with neutral expressions (Jackson et al., 2014; Jackson et al., 2008; Jackson et al., 2009).

Jackson et al. (2009) claimed that the superior VWM performance for negative over positive and neutral faces implies discrete storage capacities for different expressions. Previous studies on VWM capacity have shown that perceptual complexity, or the amount of visual detail, is an important determinant of capacity limits (Alvarez & Cavanagh, 2004; Eng, Chen, & Jiang, 2005). Estimated capacities differ across stimulus categories: the more complex a stimulus is, the smaller its storage capacity. If face recognition and perception of facial expressions are indeed independent processes, the presence of an emotional expression in a face is additional information in encoding its identity. This was especially the case where facial expressions were completely irrelevant to the task of storing the identities of multiple faces into working memory because the same expression was presented on each face (Jackson et al., 2014; Jackson et al., 2009). Considering the prior argument that objects that possess more features and detail have smaller storage capacities (Alvarez & Cavanagh, 2004), the capacity to store emotional faces should be smaller than the capacity to store faces with neutral expressions. In contrast to this prediction, however, Jackson and her colleagues found that the VWM capacity for faces with negative expressions is superior to that for neutral and even positive faces.
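
For reference, the capacity estimates discussed here are typically derived from change detection performance with a Cowan-style formula (e.g., Cowan, 2010), K = N × (H − F), where N is the set size, H the hit rate, and F the false-alarm rate. For instance, with four faces, a hit rate of .80, and a false-alarm rate of .20, the estimated capacity is K = 4 × (.80 − .20) = 2.4 faces. The exact estimator used in the studies cited above may differ; the formula is given here only as background.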

While Jackson et al. (2009) suggested that VWM capacities differ across facial expressions and that the storage capacity for negative expressions is larger than the capacities for other expressions, the exact mechanism that facilitates only certain facial expressions remains unknown. One of the long-standing debates in VWM research is whether capacity depends on a finite number of slots or on a single memory resource. The slot model, which assumes a fixed number of independent stores (Cowan, 2010; Luck & Vogel, 1997), has been challenged by an alternative account in which a single memory resource is shared out among visual items (Bays & Husain, 2008; Van den Berg, Shin, Chou, George, & Ma, 2012). Bays and Husain (2008) suggested that a shared resource is dynamically shifted and that the precision of the neural representation of a visual item is determined by the amount of resource allocated to it. Considering both models, two explanations for the selective enhancement of negative expressions can be generated. The first is that independent stores exist for different facial expressions. The model of independent feature memories suggests that different feature dimensions are stored in parallel (Olson & Jiang, 2002; Wheeler & Treisman, 2002). If facial expression is also subject to separate feature stores, capacities should vary across different types of facial expressions, and reaching the capacity limit for one facial expression should not impair the capacity for another. Previous neuroimaging findings identified disparate neural pathways that are engaged by negative expressions (Vuilleumier & Pourtois, 2007). Moreover, meta-analyses of neuroimaging data identified discriminable neural correlates for each of the basic emotions (anger, disgust, fear, happiness, and sadness; Vytal & Hamann, 2010). For instance, happiness is associated with activations in the rostral anterior cingulate cortex and right superior temporal gyrus, while fear activates the amygdala and insula. When the processing of multiple stimuli relies on discriminable neural correlates, those stimuli interfere less with one another in the brain. This dispersion of neural resources is reflected in behavioral capacities: VWM performance is better when items come from different categories with distinct neural representations than when they all come from a single category served by one specialized cortical region (Cohen, Konkle, Rhee, Nakayama, & Alvarez, 2014). Likewise, as different facial expressions activate distinguishable regions in the brain, it is possible that representations of faces are stored in VWM in different regions depending on their facial expressions and neural correlates.

The second explanation is that a shared pool of resource exists for faces and that the proportion of resource allocated to each face is influenced by the displayed facial expressions, with attention biasing the allocation. Threat-related stimuli, including angry and fearful faces, are known to be attentionally prioritized (Eimer & Kiss, 2007; Mogg & Bradley, 1999). This attentional priority during early perceptual processing can facilitate encoding, as stimuli that receive a greater portion of attentional resource are encoded with higher precision (Bays & Husain, 2008). Faces with negative expressions would thus be prioritized in the distribution of attentional resource and encoded into VWM with higher accuracy. If so, VWM performance should be impaired for faces with non-negative expressions when they coincide with an angry or a fearful face, as only low priority would be given to them.

Previous research is limited in its ability to test whether the enhanced VWM performance for negative faces is due to a separable memory store or to facilitation occurring during perceptual encoding. Studies, including Jackson et al.’s (2009), in which set size was manipulated to estimate capacities for emotional and neutral faces, often presented a number of faces with an identical emotional expression during encoding but did not present emotional and neutral faces together within a single display (Jackson et al., 2014; Sessa et al., 2011). The logic behind presenting an emotionally homogeneous display was to prevent memory impairment for neutral faces resulting from attentional prioritization of emotional faces. However, to examine whether the allocation of attentional resource is affected by the displayed facial expressions and whether this differential allocation leads to different patterns of VWM performance across facial expressions, an additional encoding display in which the expressions are heterogeneous is necessary. By comparing VWM performance for faces encoded in emotionally homogeneous and heterogeneous displays, it is possible to observe how memory for faces with a particular facial expression changes depending on the coinciding expressions and their allotted attentional resource. If the distribution of attentional resource plays a critical role in VWM enhancement, the asymmetric allocation of limited resource would result in a memory trade-off between items that are prioritized and those that are not when the displayed items are heterogeneous rather than homogeneous (Jiang, Remington, Asaad, Lee, & Mikkalson, 2016). On this basis, if negative expressions receive a greater portion of attentional resource during perceptual encoding, memory for faces with negative expressions would increase when they are presented with neutral faces compared to when they are presented with other negative faces. Memory for neutral faces, in contrast, would decrease when they are presented with negative faces compared with when they are presented with other neutral faces, as limited resource is biased towards negative expressions.

The other explanation for the enhanced VWM for negative faces is that separate memory stores exist for different facial expressions. Cohen et al. (2014) claimed that when faces and scenes are encoded together, memory performance should increase in the heterogeneous relative to the homogeneous encoding display not just for faces but also for scenes, because faces and scenes have distinguishable neural representations in the ventral visual stream (i.e., the fusiform face area and the parahippocampal place area, respectively). When fewer items from each of two different categories are encoded into VWM, they induce less interference, and VWM performance for both stimulus categories should therefore increase. Likewise, if different emotional faces are stored in parallel stores, with each facial expression engaging discrete neural pathways in the brain, less interference should take place in the emotionally heterogeneous encoding display, and memory for both negative and neutral faces should increase. The larger storage capacity for negative expressions can then be explained by the pathway from the amygdala that increases activation in the fusiform gyrus (Vuilleumier & Pourtois, 2007).

In contrast to negative faces, previous studies that measured the capacity for positive faces reported that it did not differ significantly from the capacity for neutral faces (Jackson et al., 2014; Jackson et al., 2009). Moreover, expression-driven modulation of activation in the fusiform gyrus has been reported primarily for fearful expressions (Morris et al., 1998; Vuilleumier et al., 2001; Vuilleumier, Armony, Driver, & Dolan, 2003) and less so for happy expressions (Breiter et al., 1996). Thus, positive expressions are less likely to elicit a separate pathway from the amygdala that induces greater activation in the fusiform gyrus. In terms of attentional priority, negative faces, but not positive faces, are known to capture attention even when they are task-irrelevant (Eimer & Kiss, 2007), although inconsistency exists (Becker, Anderson, Mortensen, Neufeld, & Neel, 2011; Hunt, Cooper, Hungr, & Kingstone, 2007). Consequently, because facial expressions were irrelevant to the task goal, VWM for happy faces was expected not to differ significantly between the homogeneous and heterogeneous encoding displays.

VWM for face identity was tested with a standard change detection task in which the encoding display was separated from the test display by a brief temporal gap (Phillips, 1974; Rensink, 2002). The test display in this study contained four images and was either the same as the encoding display or changed in one image selected as a probe. Participants were asked to report whether a change was made to the probe; because any of the images could be probed with equal probability, they needed to encode the entire display. As in Cohen et al. (2014), the number of images was fixed at four, but the emotional heterogeneity of the encoding display varied systematically.

Experiment 1

To examine the mental mechanisms that implement VWM enhancement for negative expressions, the emotional heterogeneity of the encoding display was manipulated using fearful faces as representative of negative expressions. By comparing VWM performance for faces encoded in emotionally homogeneous versus heterogeneous displays, Experiment 1 aimed to observe how memory for faces with fearful expressions changes depending on the coinciding expressions. If facial expressions critically affect the amount of attentional resource allocated to each encoding item, then memory for fearful faces should increase in the heterogeneous compared to the homogeneous display, as more attentional resource is allotted to fearful faces when they are presented with neutral faces than when the limited attentional resource is shared evenly among four fearful faces. Memory for neutral faces should decrease in the heterogeneous compared to the homogeneous display, as the limited resource is depleted by fearful faces. Thus, if attentional resource is asymmetrically allocated across different facial expressions, a memory trade-off would be evident. Alternatively, if discrete memory stores exist for fearful and neutral facial expressions, then in light of Cohen et al.’s (2014) findings, memory enhancement was expected not just for fearful faces but also for neutral faces in the heterogeneous compared to the homogeneous display, as less interference occurs when fewer items are encoded into each store.

To verify whether the biased distribution of attentional resource or the presence of a separable memory resource results in enhanced VWM exclusively for threatening stimuli, in addition to measuring VWM performance for fearful faces, VWM for happy faces was also tested. Considering the previous finding that the VWM capacity for happy faces does not differ significantly from that for neutral faces (Jackson et al., 2009), happy faces would not be facilitated by a bias in the allocation of attentional resource or by having a separate memory resource. Thus, for happy faces, VWM measured in the heterogeneous display would not differ significantly from VWM measured in the homogeneous display.

Method

Participants

Power calculations to determine the sample size were performed using G*Power (Faul, Erdfelder, Lang, & Buchner, 2007; Faul, Erdfelder, Buchner, & Lang, 2009). For a statistical power of .9 and a Type I error level of .05, the target sample size was 30. Thus, 32 college students (22 females, 10 males) with a mean age of 22.5 years participated in Experiment 1 in return for a monetary reward of KRW 6,000 (approximately US$5.40). All participants were naïve to the purpose of the study and had normal or corrected-to-normal visual acuity by self-report. Half of the participants performed the task with fearful and neutral faces, while the other half performed the task with happy and neutral faces, to prevent any carry-over effects between negative and positive expressions.
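
As a rough illustration of this kind of sample-size calculation (the reported G*Power analysis does not state the assumed effect size or test family, so both are assumptions here), a paired-comparison power analysis with a medium-to-large effect size lands near the reported target:

```python
# Hedged reconstruction of a sample-size calculation like the one reported.
# Only power = .90 and alpha = .05 are given in the text; the effect size
# (Cohen's d = 0.6) and the test (a paired/one-sample t test) are
# illustrative assumptions, not the authors' actual G*Power settings.
from statsmodels.stats.power import TTestPower

n = TTestPower().solve_power(effect_size=0.6, alpha=0.05,
                             power=0.90, alternative="two-sided")
print(round(n))  # about 31, in the vicinity of the reported target of 30
```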

Apparatus

Participants were tested individually in a dimly lit, sound-attenuated experimental booth. They were seated approximately 60 cm from a 15.9-in. CRT monitor with a resolution of 1,024 × 768 pixels and a refresh rate of 60 Hz. The experiment was programmed in MATLAB (www.mathworks.com) using Psychtoolbox (Brainard, 1997; Pelli, 1997). Responses were recorded on a standard computer keyboard.

Stimuli

Face images were selected from the Korea University Facial Expression Collection (KUFEC) 2.0, a set of faces that includes neutral faces and the six basic facial expressions. Validation of the set was undertaken using the Self-Assessment Manikin to rate the affective dimensions of valence, arousal, and intensity on a scale of 1–7 (Kim et al., 2017). Faces were selected based on these rating scales. The mean arousal level was 4.20 for the fearful face set and 4.46 for the happy face set. The final set comprised 44 fearful, 44 happy, and 44 neutral faces, half of them female and half male. Each image was cropped closely above the eyes and below the lips into a square to remove the outer contours of the face and hair, as such low-level features can be used as cues for noticing changes. All images were converted to gray-scale, and brightness was adjusted by setting the mean luminance and contrast of the final images to common target values.

Procedure

Before starting the main experiment, each participant performed 24 practice trials. The main experiment comprised eight blocks of 48 trials. An example of a trial sequence is shown in Fig. 1. Each trial started with a red fixation square (.65° × .65°) presented at the center of the display for 500 ms. Participants were instructed to fixate on the central square before the encoding display appeared. The encoding display consisted of four faces, each of a different identity of the same race and approximately the same age. Each face image subtended 4° × 4° at a viewing distance of 60 cm and appeared in one of the four visual quadrants. The centers of the four images formed an imaginary square of 5.2° × 5.2°. Participants were instructed to encode all four faces, as any of them could be tested. The encoding display lasted for 1,200 ms and was followed by a blank retention display that lasted for 1,000 ms. The test display, which also consisted of four faces, immediately followed, with one face cued by a red outline square (6° × 6°). The probe, about which participants were instructed to report any change, was the single cued face. The remaining three faces were always identical to the faces in the encoding display, whereas the probe was the same as the face shown in the encoding display on only half of the trials. If the probe changed, the identity of the face changed but the facial expression remained the same; thus, the numbers of emotional and neutral faces were constant within a trial. The red cue appeared in random locations so that each quadrant was cued equally often (25% each). The test display remained on the screen until participants responded. Participants pressed the “s” key when the probe was the same as the encoded face and the “d” key when the probe was different. A correct response was followed by three rising tones (800, 1,300, and 2,000 Hz) lasting 300 ms, while a low buzz (400 Hz) was presented for incorrect responses. There was an inter-trial interval of 1,000 ms. At the end of each block, participants were shown their total number of correct trials.
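
A compact sketch of this trial structure follows; the durations mirror those in the text, while the names and helper function are illustrative placeholders rather than the authors' actual MATLAB/Psychtoolbox code.

```python
# Trial timeline for Experiment 1 (durations in ms, from the text).
TRIAL_PHASES = [
    ("fixation", 500),    # central red square (.65 deg x .65 deg)
    ("encoding", 1200),   # four faces, one per quadrant (4 deg x 4 deg each)
    ("retention", 1000),  # blank interval
    ("test", None),       # four faces, one cued by a red frame; until response
]

def score_response(key: str, probe_changed: bool) -> bool:
    """'s' = same, 'd' = different; returns True for a correct response."""
    assert key in ("s", "d")
    return (key == "d") == probe_changed

# Example: the probe changed and the participant pressed "d" -> correct,
# so the three rising feedback tones would follow.
print(score_response("d", probe_changed=True))  # True
```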

Fig. 1

An illustration of a trial sequence in the change detection task. Participants viewed four faces for 1,200 ms. After a 1,000-ms retention interval, one of the faces was cued as a probe by a red frame. Participants were required to judge whether it was the same as or different from the face shown at that location. Here the probe changed, so participants should report that it is different. Due to portrait rights, images from the NimStim set are partially used here as an example, but only images from KUFEC 2.0 were used in the actual experiments

Design

Depending on the facial expression of the probe, trials were divided into emotional and neutral trials; that is, when an emotional face was tested as the probe, the trial was considered an emotional trial. Moreover, encoding displays differed in the numbers of emotional and neutral faces they contained. On 25% of the trials, the encoding display contained four emotional faces. On another 25% of the trials, four neutral faces were presented. The remaining trials contained a display of two emotional and two neutral faces. An encoding display was considered homogeneous when all four faces were either emotional or neutral, and heterogeneous when the display consisted of two emotional and two neutral faces. In sum, there were two within-subject variables: (1) emotion of the probe (emotional or neutral), and (2) emotional category heterogeneity of the encoding display (homogeneous or heterogeneous). The number of trials for each combination of these variables was equal.
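
To make the trial composition concrete, the following is a minimal sketch of how a balanced trial list with these proportions could be generated; the labels and helper function are illustrative, not the authors' code.

```python
import itertools
import random

# 2 (probe emotion) x 2 (display heterogeneity) x 2 (probe changed or not),
# fully crossed and balanced over 8 blocks x 48 trials = 384 trials.
PROBE_EMOTION = ["emotional", "neutral"]
HETEROGENEITY = ["homogeneous", "heterogeneous"]
PROBE_CHANGED = [True, False]

cells = list(itertools.product(PROBE_EMOTION, HETEROGENEITY, PROBE_CHANGED))
trials = cells * (384 // len(cells))  # 48 trials per cell
random.shuffle(trials)

def display_composition(probe_emotion, heterogeneity):
    """Expressions shown in the encoding display for a given trial type."""
    if heterogeneity == "homogeneous":
        return [probe_emotion] * 4          # all emotional or all neutral
    faces = ["emotional", "emotional", "neutral", "neutral"]
    random.shuffle(faces)                   # two emotional, two neutral
    return faces
```

With this scheme, the homogeneous-emotional, homogeneous-neutral, and heterogeneous displays occupy 25%, 25%, and 50% of trials, respectively, matching the proportions above.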

Results

The memory sensitivity measure, d-prime, was calculated from each participant’s hit and false-alarm rates as a function of emotion and category heterogeneity. Repeated-measures analysis of variance (ANOVA) was performed on the d-prime data with the above factors as within-subjects variables and face type (i.e., fearful or happy face group) as a between-subjects variable (see Table 1).
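
As a concrete sketch of how this measure is computed (assuming the standard signal detection definition d' = z(hit rate) − z(false-alarm rate); the correction for extreme rates below is a common convention, not one specified in the text):

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(H) - z(F), with a log-linear correction for rates of 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: 40 change trials with 32 hits; 40 no-change trials with 8 false alarms.
print(round(d_prime(32, 8, 8, 32), 2))  # ~1.63
```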

Table 1 The mean d-prime values (standard deviations in parentheses) in Experiment 1 as a function of face type, emotion, and category heterogeneity

The main effect of face type was significant, F(1, 30) = 4.63, p = .04, MSe = .46, ηp2 = .13. D-prime was higher for the group that encoded fearful and neutral faces (M = 1.34) than for the group that encoded happy and neutral faces (M = 1.08). The main effect of emotion was significant, as memory sensitivity for emotional faces (M = 1.51) was significantly higher than that for neutral faces (M = .91), F(1, 30) = 55.23, p < .0001, MSe = .20, ηp2 = .65. The interaction of face type and emotion was significant, F(1, 30) = 5.02, p = .03, MSe = .20, ηp2 = .14. To understand the interaction, separate analyses were conducted for the fearful and happy face groups. For the fearful face group, the main effect of emotion was significant with higher sensitivity for fearful faces (M = 1.72) than for neutral faces (M = .95), F(1, 15) = 51.37, p < .0001, MSe = .093, ηp2 = .77. The main effect of emotion was significant for the happy face group with higher sensitivity for happy faces (M = 1.29) than for neutral faces (M = .87), F(1, 15) = 12.37, p = .003, MSe = .11, ηp2 = .45.

The interaction of emotion and category heterogeneity was significant, F(1, 30) = 10.39, p = .003, MSe = .18, ηp2 = .26. Further analyses were conducted to examine how d-prime values for emotional and neutral faces differed between the homogeneous and heterogeneous displays. For emotional faces, memory sensitivity was significantly higher when they were encoded in the heterogeneous display (M = 1.63) than in the homogeneous display (M = 1.38), F(1, 31) = 6.02, p = .02, MSe = .17, ηp2 = .16. Sensitivity for neutral faces, in contrast, was significantly poorer when they were encoded in the heterogeneous (M = .80) than in the homogeneous display (M = 1.03), F(1, 31) = 5.48, p = .03, MSe = .15, ηp2 = .15. The three-way interaction of face type, emotion, and category heterogeneity was not significant, F(1, 30) = 2.90, p = .10 (see Fig. 2).

Fig. 2

The mean d-prime data as a function of face type, emotion, and category heterogeneity in Experiment 1

No other effects or interactions were significant or marginally significant.

Discussion

Overall performance was better for the group that encoded fearful and neutral faces than for the group that encoded happy and neutral faces. Change detection performance was better when the probe was an emotional face than when it was a neutral face. Moreover, even though the task was a VWM task, superior memory sensitivity for emotional over neutral faces was obtained not only for fearful but also for happy expressions, which is inconsistent with previous studies (e.g., Jackson et al., 2009). Nonetheless, the memory advantage for emotional over neutral expressions was significantly greater for fearful faces than for happy faces.

Even though better change detection performance was obtained for emotional than for neutral faces, this advantage could have been driven by low-level visual characteristics (e.g., Hansen & Hansen, 1988). Arguing against this possibility, the VWM advantage for emotional faces was not obtained with inverted faces (Jackson et al., 2009), indicating that the configural processing of facial expressions, rather than low-level visual features, facilitated VWM. Moreover, distinctive low-level features, such as the eyebrows, were removed in the present study by cropping each face closely above the eyes and below the lips.

Most importantly, compared with memory performance for faces presented in the homogeneous displays, memory was enhanced for emotional faces and impaired for neutral faces in the heterogeneous displays, indicating a VWM trade-off. That a memory trade-off occurred between emotional and neutral faces, rather than memory sensitivity increasing for both, suggests that faces with different facial expressions likely share a single memory resource, with the distribution of attentional resource biased by facial expression. Unexpectedly, however, this memory trade-off was not restricted to fearful and neutral faces but was also observed between happy and neutral faces.

A memory trade-off between two different stimuli that are simultaneously presented within the same encoding display can take place when one stimulus category, to which attention is preferentially allocated, is prioritized at perceptual encoding. Fearful faces are known to be attentionally prioritized (Eimer & Kiss, 2007). Although inconsistency exists in the literature, happy faces also capture attention (Becker et al., 2011; Hunt et al., 2007). Thus, it is likely that the memory trade-off occurred both between fearful and neutral faces and between happy and neutral faces, as fearful and happy faces were attentionally prioritized over neutral faces and thus were facilitated during perceptual encoding.

Experiment 2

When two emotional and two neutral faces were presented in the same encoding display, VWM for emotional faces increased while memory for neutral faces decreased. To verify whether this memory trade-off is attributable to attentional components that exert their influence during perceptual encoding, the spatial attentional component was controlled in Experiment 2. If the trade-off occurred due to an asymmetric allocation of spatial attention between emotional and neutral faces, it should disappear when the allocation is controlled by presenting each face sequentially. Thus, in Experiment 2, rather than presenting all four faces simultaneously for 1,200 ms, each face was presented sequentially in its quadrant for an equal duration (300 ms).

Method

Participants

Thirty-two new participants (18 females, 14 males) from the same pool as in Experiment 1 completed Experiment 2. Their mean age was 22.9 years.

Apparatus and stimuli

The same apparatus and stimuli used in Experiment 1 were adopted in Experiment 2.

Procedure and design

The procedure and the design of Experiment 2 were identical to those of Experiment 1 except that the encoding faces were presented sequentially rather than simultaneously. The order of face presentation was consistent and predictable: the first face always appeared in the upper-left quadrant and the last in the lower-right. While the encoding display of four faces was presented for 1,200 ms in Experiment 1, each face was presented for 300 ms in Experiment 2. After a 1,000-ms blank retention interval, a test display consisting of four faces, one cued with a red square, appeared. The test display lasted until participants responded.

Results

ANOVA was conducted on the d-prime data with face type as a between-subjects variable and emotion, category heterogeneity, and serial position (temporal order of the probe) as the three within-subjects variables (see Table 2). The main effect of face type was significant, F(1, 30) = 4.41, p = .04, MSe = 1.96, ηp2 = .13. The group that encoded fearful and neutral faces (M = 1.39) performed significantly better than the group that encoded happy and neutral faces (M = 1.13). The main effect of emotion was significant, F(1, 30) = 19.73, p < .0001, MSe = .85, ηp2 = .40. Memory sensitivity for emotional faces (M = 1.44) was significantly higher than that for neutral faces (M = 1.08). The interaction of face type and emotion was significant, F(1, 30) = 6.41, p = .02, MSe = .85, ηp2 = .18. Separate analyses showed that for the fearful face group, memory sensitivity for fearful faces (M = 1.67) was significantly higher than that for neutral faces (M = 1.10), F(1, 15) = 37.30, p < .0001, MSe = .07, ηp2 = .71. However, memory sensitivity for happy faces (M = 1.21) did not differ significantly from that for neutral faces (M = 1.05), F(1, 15) = 1.36, p = .26.

Table 2 The mean d-prime values (standard deviations in parentheses) in Experiment 2 as a function of face type, emotion, category heterogeneity, and serial position

As in Experiment 1, the interaction of emotion and category heterogeneity was significant, F(1, 30) = 10.63, p = .003, MSe = .66, ηp2 = .26. Additional analyses showed that memory sensitivity for emotional faces was significantly higher when they were encoded with neutral faces (M = 1.60) than when encoded with other emotional faces (M = 1.28), F(1, 31) = 7.71, p = .009, MSe = .20, ηp2 = .20. In contrast, memory sensitivity for neutral faces when encoded with emotional faces (M = 1.00) did not differ significantly from when encoded with other neutral faces (M = 1.15), F(1, 31) = 1.91, p = .18.

Most importantly, the three-way interaction of face type, emotion, and category heterogeneity was significant, F(1, 30) = 10.82, p = .003, MSe = .66, ηp2 = .27 (see Fig. 3). Separate analyses showed that for the fearful face group, the interaction of emotion and category heterogeneity was significant, F(1, 15) = 19.17, p = .001, MSe = .18, ηp2 = .56. Additional analyses, performed separately for fearful and neutral faces, showed that when the probe was a fearful face, memory sensitivity was significantly higher in the heterogeneous display (M = 1.95) than in the homogeneous display (M = 1.39), F(1, 15) = 13.55, p = .002, MSe = .18, ηp2 = .48. When the probe was a neutral face, the main effect of category heterogeneity was also significant, F(1, 15) = 6.41, p = .023, MSe = .18, ηp2 = .30, but memory sensitivity for neutral faces was significantly lower in the heterogeneous display (M = .91) than in the homogeneous display (M = 1.30). Unlike in the fearful face group, the interaction of emotion and category heterogeneity was not significant in the happy face group, F(1, 15) < 1.

Fig. 3

The mean d-prime data as a function of face type, emotion, and category heterogeneity in Experiment 2

The main effect of serial position was significant, F(3, 90) = 75.62, p < .0001, MSe = .77, ηp2 = .72. Memory sensitivity for faces that appeared at the fourth serial position (the lower-right quadrant; M = 2.27) was significantly higher than the d-prime values for faces that appeared at the first (M = .96), second (M = .85), and third (M = .95) serial positions, demonstrating a recency effect (Kumar & Jiang, 2005).

The four-way interaction of face type, emotion, category heterogeneity, and serial position was significant, F(3, 90) = 4.54, p = .005, MSe = .49, ηp2 = .13. Separate analyses were conducted on each serial position for the fearful and happy face groups. For the fearful face group, the interaction of emotion and category heterogeneity was not significant at serial position 1, F(1, 15) = 1.16, p = .30. However, it was marginally significant at serial position 3, F(1, 15) = 3.61, p = .08, and significant at serial position 2, F(1, 15) = 11.31, p = .004, MSe = .39, ηp2 = .43, and at serial position 4, F(1, 15) = 17.27, p = .001, MSe = .60, ηp2 = .54. Additional analyses were performed separately for trials in which the probe was a fearful face and those in which it was a neutral face. When the probe was a fearful face at serial position 2, the main effect of category heterogeneity was significant, with higher sensitivity in the heterogeneous (M = 1.51) than in the homogeneous display (M = .90), F(1, 15) = 21.94, p < .0001, MSe = .14, ηp2 = .59. When the probe was a neutral face, the main effect of category heterogeneity was marginally significant, with higher sensitivity in the homogeneous (M = .99) than in the heterogeneous display (M = .55), F(1, 15) = 3.32, p = .09. The pattern was the same at serial position 4: when the probe was a fearful face, sensitivity was higher in the heterogeneous display (M = 3.21) than in the homogeneous display (M = 2.42), F(1, 15) = 4.54, p = .05, MSe = 1.10, ηp2 = .23, while when the probe was a neutral face, sensitivity was higher in the homogeneous (M = 2.41) than in the heterogeneous display (M = 1.58), F(1, 15) = 9.14, p = .009, MSe = .60, ηp2 = .38. The same additional analyses were conducted for the happy face group. The interaction of emotion and category heterogeneity was not significant at serial position 1, F(1, 15) = 1.24, p = .28, or at serial position 2, F(1, 15) = 2.88, p = .11. The interaction was only marginally significant at serial position 3, F(1, 15) = 4.38, p = .054, and at serial position 4, F(1, 15) = 4.13, p = .06. Further analyses showed that at serial position 3, the main effect of category heterogeneity was not significant when the probe was a happy face, F(1, 15) = 1.73, p = .21, nor when it was a neutral face, F(1, 15) = 3.11, p = .10. At serial position 4, the main effect of category heterogeneity was significant when the probe was a happy face, F(1, 15) = 5.11, p = .04, MSe = .77, ηp2 = .25, but not when it was a neutral face, F(1, 15) < 1.

No other effects or interactions were significant.

Discussion

Overall performance was better for the fearful than for the happy face group, as in Experiment 1. Change detection performance was also better for emotional than for neutral faces. However, unlike in Experiment 1, superior memory sensitivity for emotional over neutral faces was observed only between fearful and neutral faces, not between happy and neutral faces. Moreover, the memory trade-off was no longer evident in the happy face group. That is, enhanced memory for happy faces and impaired memory for neutral faces were not observed when the spatial allocation of attentional resource was controlled, indicating that this trade-off had occurred because of a bias towards happy expressions in the allocation of spatial attention.

The memory trade-off between fearful and neutral faces, however, was still evident even when fearful faces no longer had priority for spatial attention. It is important to note that this memory trade-off was obtained at serial positions 2 through 4, but not at serial position 1. The finding that it took place at trailing serial positions implies that fearful expressions may have influenced the temporal allocation of attentional resource, such that when multiple faces were processed in rapid succession, interference could have been introduced into the processing of stimuli that followed fearful faces. One possible explanation for this is emotion-induced blindness, the attentional blink that occurs when probes are preceded by task-irrelevant emotional stimuli (Arnell, Killman, & Fijavz, 2004; McHugo, Olatunji, & Zald, 2013). This performance decrement results from competition for perceptual representations between probes and emotional distractors: when emotional distractors capture attentional resource, processing of the probes that follow is temporarily suppressed (Wang, Kennedy, & Most, 2012). Thus, to examine whether the memory trade-off resulted from fearful faces consuming the limited attentional resource over time, it seemed necessary to control not just the spatial but also the temporal attentional component.

Experiment 3

Controlling the spatial allocation of attentional resource removed the memory trade-off between happy and neutral faces but not between fearful and neutral faces. Rather, rapid serial presentation introduced interference that resembled emotion-induced blindness (Arnell et al., 2004). This opens up the possibility that the temporal allocation of attentional resource plays a critical role in how fearful faces facilitate VWM. If fearful faces held attention longer and consumed more of the attentional resource, producing a memory trade-off between fearful and neutral faces in the form of emotion-induced blindness, the trade-off should no longer persist when the temporally asymmetric allocation of attentional resource is controlled. The temporary impairment in detecting a probe that immediately follows a target is attenuated when brief intervals are inserted between images (Chun & Potter, 1995; Raymond, Shapiro, & Arnell, 1992; Seiffert & Di Lollo, 1997). Thus, in Experiment 3, blank intervals were added between images, while the rest of the experimental design was the same as in Experiment 2.

Method

Participants

Thirty-two new participants (20 females, 12 males) from the same participant pool used in the previous experiments completed Experiment 3. Their mean age was 21.7 years.

Apparatus and stimuli

The same apparatus and stimuli used in prior experiments were adopted in Experiment 3.

Procedure and design

The procedure and the design were identical to those of Experiment 2 except that a blank interval of 300 ms appeared between images in the encoding display.

Results

ANOVA was performed on the d-prime data with the same factors used in Experiment 2 (see Table 3). Unlike in Experiments 1 and 2, the main effect of face type was not significant, F(1, 30) < 1. Memory sensitivity for the fearful face group (M = 1.49) did not differ significantly from that for the happy face group (M = 1.51). The main effect of emotion of the probe was significant, as memory sensitivity for emotional faces (M = 1.67) was significantly higher than that for neutral faces (M = 1.33), F(1, 30) = 27.53, p < .0001, MSe = .53, ηp2 = .48. The interaction of face type and emotion was also significant, F(1, 30) = 14.69, p = .001, MSe = .53, ηp2 = .33. Separate analyses indicated that in the fearful face group, memory sensitivity for fearful faces (M = 1.78) was significantly higher than that for neutral faces (M = 1.20), F(1, 15) = 40.10, p < .0001, MSe = .07, ηp2 = .73. However, in the happy face group, memory sensitivity for happy faces (M = 1.55) did not differ significantly from that for neutral faces (M = 1.46), F(1, 15) = 1.03, p = .33.

Table 3 The mean d-prime values (standard deviations in parentheses) in Experiment 3 as a function of face type, emotion, category heterogeneity, and serial position

Unlike in Experiments 1 and 2, the interaction of emotion and category heterogeneity was only marginally significant, F(1, 30) = 3.38, p = .08, MSe = .47, ηp2 = .10. When additional analyses were conducted separately for emotional and neutral faces, memory sensitivity for emotional faces was not significantly higher in the heterogeneous (M = 1.74) than in the homogeneous display (M = 1.60), F(1, 31) = 1.94, p = .17. Memory sensitivity for neutral faces encoded in the heterogeneous display (M = 1.29) also did not differ significantly from that for neutral faces encoded in the homogeneous display (M = 1.37), F(1, 31) < 1. Most importantly, the three-way interaction of face type, emotion, and category heterogeneity was not significant, F(1, 30) < 1 (see Fig. 4).

Fig. 4

The mean d-prime data as a function of face type, emotion, and category heterogeneity in Experiment 3

The main effect of serial position was significant, F(3, 90) = 40.55, p < .0001, MSe = 1.03, ηp2 = .58. Memory sensitivity at serial position 4 (M = 2.34) was significantly higher than the d-prime values measured at serial positions 1 (M = 1.12), 2 (M = 1.15), and 3 (M = 1.39). The interaction of category heterogeneity and serial position was also significant, F(3, 90) = 9.80, p < .0001, MSe = .62, ηp2 = .25. Additional analyses conducted separately at each serial position showed that at serial position 1, the main effect of category heterogeneity was significant, with higher sensitivity in the heterogeneous (M = 1.41) than in the homogeneous display (M = .83), F(1, 31) = 17.20, p < .0001, MSe = .31, ηp2 = .36. At serial position 2, category heterogeneity did not have a significant effect on memory sensitivity, F(1, 31) < 1. At serial position 3, the main effect was significant, with higher sensitivity in the homogeneous (M = 1.61) than in the heterogeneous display (M = 1.16), F(1, 31) = 9.96, p = .004, MSe = .32, ηp2 = .24. At serial position 4, category heterogeneity did not have a significant effect, F(1, 31) < 1.

Discussion

The results of Experiment 3 differed from those of Experiments 1 and 2 in several important ways. First, while change detection performance in the fearful face group was significantly better than that in the happy face group in both Experiments 1 and 2, it was not in Experiment 3. The most important finding in Experiment 3 was that the memory trade-off seen in Experiments 1 and 2 did not take place: inserting blank temporal intervals between faces removed it for the fearful face group. The memory trade-off at trailing serial positions, which had implied the occurrence of emotion-induced blindness, was also not observed in Experiment 3.

Although the memory trade-off was removed, participants still showed higher VWM performance for fearful faces than for neutral faces. The remaining difference in VWM performance between fearful and neutral faces implies that other sources can contribute to the selective memory enhancement, and these are discussed in more detail in the General discussion.

General discussion

The present study investigated the underlying mechanisms that facilitate VWM for emotional faces. VWM can be enhanced either by a bias in the allocation of attentional resource or by a separate memory store. To examine how VWM for emotional faces changes depending on the coinciding memory items, both in the allocation of limited resource and in storage in one or more stores, the emotional heterogeneity of the encoding display was manipulated. In general, the results supported the former account, with facial expression playing a key role in biasing attentional resource. When emotional faces were encoded with neutral faces, a memory trade-off occurred, resulting from the preferential allocation of attentional resource towards emotional faces. However, this memory trade-off was not restricted to fearful and neutral expressions; it was also observed between happy and neutral expressions. To verify whether the memory trade-off occurred due to a bias in allocating attentional resource, attentional components were controlled with the intention of removing it.

The occurrence and disappearance of the memory trade-off

When the spatial attentional component was controlled by presenting one face at a time in different quadrants of the display instead of showing all four faces simultaneously, the memory trade-off between happy and neutral faces disappeared. Memory sensitivity for fearful faces, however, was still higher when presented with neutral faces than when presented with other fearful faces, while the opposite pattern was obtained for neutral faces. This memory trade-off between fearful and neutral faces was observed only at trailing serial positions, which indicated that the temporal allocation of attentional resource could have been affected by fearful expressions. Thus, to examine if fearful faces held attention long enough to suppress the processing of the following neutral faces and so the memory trade-off occurred in a way that resembled emotion-induced blindness, blank intervals were inserted between faces to prevent prolonged processing of fearful expressions. The results showed that the memory trade-off disappeared between fearful and neutral faces as well.

Stimuli can be attentionally preferred in two ways. One is to be prioritized such that a bias is created in allocating limited attentional resource when multiple stimuli are processed. Threatening stimuli, such as angry and fearful faces, have been claimed to draw attentional resource preferentially: studies using visual search paradigms showed that they are prioritized among other stimuli (Hansen & Hansen, 1988; Öhman, Lundqvist, & Esteves, 2001). However, other studies failed to replicate this superiority effect and argued that those visual search findings are strongly confounded by low-level features that make threatening stimuli highly distinctive (Hunt et al., 2007; Juth, Lundqvist, Karlsson, & Öhman, 2005; Purcell, Stewart, & Skov, 1996). Instead, several studies suggested that happy faces consistently draw attention more quickly than do angry or fearful faces (Becker et al., 2011; Juth et al., 2005). The present study is consistent with these reports of a bias in allocating attentional resource towards happy faces: the asymmetric allocation of spatial attention between happy and neutral faces appears to have produced a memory trade-off, and this trade-off disappeared when each face was presented sequentially for an equal duration.

Another way in which stimuli can be attentionally preferred is by holding attention longer and thereby consuming more of the limited attentional resource. Emotion-induced blindness takes place when task-irrelevant emotional stimuli consume attentional resource and suppress the processing of non-emotional probes that follow in a rapid serial presentation (Arnell et al., 2004; McHugo et al., 2013). In Experiment 2, interference, which appears to have arisen from competition for attentional resource between fearful and neutral faces, was present only at serial positions 2–4 and not at serial position 1. It is therefore likely that the temporal allocation of attentional resource was affected by fearful expressions and that they disrupted the processing of the neutral faces that followed. This interference, which resulted in a memory trade-off between fearful and neutral faces, was removed once the temporal attentional component was controlled by inserting blank intervals.

It is noteworthy that this VWM facilitation for fearful or happy faces does not necessarily occur in all situations. For example, Kensinger and Corkin (2003) found that accuracy did not differ between negative and neutral faces in n-back tasks. This may be because emotional stimuli that capture processing resources can facilitate working memory in a change detection task but not in an n-back task: change detection tasks are sensitive to the attentional demands of the stimuli, as multiple items are encoded under perceptual limitations, and stimuli that are attentionally prioritized are therefore also prioritized at encoding.

Memory facilitation for positive facial expressions

In previous research, VWM for happy faces did not differ significantly from that for neutral faces. These studies typically presented only faces with the same facial expression in an encoding display and did not present different facial expressions together (Jackson et al., 2014; Sessa et al., 2011). In contrast, the present study included both emotionally homogeneous and heterogeneous encoding displays, and the results of Experiment 1 showed that memory sensitivity for happy faces was significantly higher than that for neutral faces. This is possibly because VWM for happy faces can be facilitated when happy faces draw the limited attentional resource away from coinciding stimuli that are not attentionally prioritized, such as neutral faces. Memory facilitation for happy faces was absent in Experiment 2, in which the asymmetric allocation of attentional resource across facial expressions was prevented by presenting each face serially. It is therefore likely that previous studies did not observe happy expressions facilitating VWM because only happy faces were presented in an encoding display, so they could not benefit from being prioritized over other stimuli in the distribution of spatial attention.

Still a residual memory advantage for fearful facial expressions

The occurrence and disappearance of the memory trade-off reveal one factor that crucially affects VWM enhancement, but the present study does not propose that it is the only factor that facilitates memory. Controlling the spatial and temporal attentional components removed the memory trade-off between fearful and neutral faces, but VWM for fearful faces was still significantly better than that for neutral faces. One possible explanation for this residual memory advantage is amygdala activation during working memory processing. The amygdala is known to modulate encoding and memory consolidation by influencing activity in the hippocampus (Phelps, 2004). Studies have shown that the amygdala can also contribute to performance in working memory tasks: individual differences in amygdala activity predict working memory performance, with faster response times for people showing higher event-related amygdala activation (Schaefer et al., 2006). Threatening stimuli, including fearful faces, induce activation in the amygdala, and its role in facilitating working memory is consistent with findings that only negative expressions significantly increase the capacity to identify faces in VWM tasks (Jackson et al., 2009).

Another possibility is facilitation due to increased cortical resources and thus less interference among fearful faces. Processing of complex stimuli such as faces can benefit from increased cortical resources, and studies have shown that fearful expressions engage distinguishable neural pathways (Vuilleumier & Pourtois, 2007; Vytal & Hamann, 2010). The behavioral findings of the present study do not support the notion that fearful expressions have a separate memory store, because memory for both fearful and neutral faces should then have increased when fewer items were encoded into each store. However, this does not rule out the possibility that the perceptual processing of fearful faces benefits from diverse neural populations. Cohen et al. (2014) showed that VWM is enhanced by increased cortical resources and that this facilitation can take place either during early perceptual processing or during later stages, when memory is stored and retrieved. If increased cortical resources facilitate the perceptual processing of multiple stimuli under brief encoding durations, this memory facilitation should decrease when ample time is given to encode (Eng et al., 2005; Jiang et al., 2016). Thus, to test whether the residual memory advantage for fearful faces is attributable to increased cortical resources that reduce interference among multiple faces, future studies should present encoding displays in a self-paced condition rather than for a limited duration, giving participants sufficient time to encode, and examine whether the memory advantage disappears.

Conclusion

The present study shows that a bias in how fearful facial expressions draw on limited attentional resource is a critical factor underlying enhanced VWM for faces displaying negative expressions. We do not argue that this attentional preference is the only factor; however, the appearance and disappearance of the memory trade-off suggest that it plays a key role in VWM facilitation. Inconsistent with previous studies, the present findings also showed VWM enhancement for happy faces, and this facilitation disappeared when spatial attention was controlled. While controlling both the spatial and the temporal attentional components removed the memory trade-off between fearful and neutral faces, a residual memory advantage for fearful faces remained. This points to other factors that may facilitate VWM for fearful faces, which future studies should clarify.