Introduction

A critical function of attention is to direct us to salient stimuli in the environment, facilitating fast and accurate responses to these stimuli. In some cases, the “saliency” of a stimulus is defined by basic physical attributes (e.g., high luminance contrast), but attentional biases may also be driven by more complex sets of features. Salient stimuli that are task-irrelevant can involuntarily capture and potentially hold attention, leading to distraction. However, the mechanisms of attentional capture versus hold, and how these differentially contribute to distraction, are not well understood.

One set of stimuli known to strongly bias attention is faces. Faces are biologically and socially relevant and serve as an excellent tool for studying the allocation of attention. Infants prefer to look at upright rather than inverted faces (Mondloch et al., 1999), and adults can detect and categorize faces within a scene more quickly than other objects (Ro, Russell, & Lavie, 2001). Attentional biases to faces are not restricted to task-relevant faces, since task-irrelevant faces have been found to impair target performance on categorical discrimination tasks (Langton, Law, Burton, & Schweinberger, 2008). Additionally, patients with spatial neglect are less likely to extinguish schematic faces than other shapes, and this is especially true for faces with emotional expressions (Vuilleumier & Schwartz, 2001).

Indeed, the majority of research on attentional biases to faces has used highly emotional stimuli such as angry or fearful faces. This work suggests that humans have an attentional bias toward stimuli that may warn of an approaching threat. In visual search tasks, participants detect angry faces more quickly than happy faces (Eastwood, Smilek, & Merikle, 2001; Hansen & Hansen, 1988). Similar evidence comes from research on the attentional blink, the impaired detection of a second target presented shortly after a first target: detection of the second target is less impaired when it is an angry face, and no such effect is found for happy or neutral faces (Maratos, Mogg, & Bradley, 2008). These results have been taken as evidence that negatively valenced faces enhance the capture, or initial orienting, of attention (Ohman, Flykt, & Esteves, 2001; Ohman, Lundqvist, & Esteves, 2001).

Fox, Russo, Bowles, and Dutton (2001), however, challenged the interpretation of these findings as an effect on the initial orienting of attention. Using a cuing paradigm with predictive peripheral face cues, they found that on validly cued trials, emotional (happy or angry) faces had no advantage over neutral faces in attracting attention; however, on invalidly cued trials, responses were significantly slowed following angry faces, but not happy faces, relative to neutral faces. Thus, the valence of the face did not affect the initial orienting of attention but, rather, its holding. Critically, this pattern was present only in individuals with high state anxiety, suggesting that the effect may be limited to particular populations. Of note, the predictive cue provided task-relevant information. In the present study, we asked whether healthy individuals also show an extended holding of attention on faces, even when the faces are completely task irrelevant.

Here, we used a novel continuous performance task (Kim & Hopfinger, 2010) to separate the initial capture of attention from the subsequent holding of attention. Participants performed an orientation discrimination task in the periphery while task-irrelevant distractors appeared at fixation and remained onscreen for 4 s (throughout the possible period of distraction). Initial capture by a distractor would impair task performance at the time of its onset, whereas an extended attentional hold on the distractor would continue to impair performance after its onset.

In Experiment 1a, each distractor was either a fearful face or a place. Pictures of places were chosen as the control stimuli because these images, like faces, represent a unique class of highly complex objects whose perception has been associated with specialized neural circuitry (Epstein & Kanwisher, 1998).

Experiment 1a

Method

Participants

Twenty-five right-handed students from the University of North Carolina at Chapel Hill (UNC-CH), 18–23 years of age (14 females), with normal or corrected-to-normal vision, participated for course credit. Informed consent was obtained prior to participation. Data from 4 participants (2 females) could not be used because of technical problems during data collection, leaving 21 participants for analysis.

Procedure

Participants maintained fixation upon a centrally located point throughout each run (Fig. 1). In the upper right visual field (8.37° from the fixation point), a target was presented overlapping a black cross (3.87° × 3.87°) on a gray background. The target was a continuously present red letter “T” that randomly changed its orientation every second. Participants’ task was to judge the target’s orientation after each rotation, pressing one button if the “T” was oriented in the horizontal or vertical direction (0°, 90°, 180°, or 270°) and a different button if it was oriented in a diagonal direction (45°, 135°, 225°, or 315°). Participants were instructed to respond as quickly and accurately as possible. Throughout the experimental runs, the target never disappeared; it simply changed orientation. Therefore, there was no top-down target set for abruptly appearing objects. This is important because previous work has suggested that top-down target settings may extend attentional dwell time (Hopfinger & Ries, 2005; Theeuwes, Atchley, & Kramer, 2000; Vachon, Tremblay, & Jones, 2007) and that overlap between distractor features and the target set can affect attentional weighting (Sy & Giesbrecht, 2011). While participants performed the peripheral discrimination task, distractor items abruptly appeared at the center of the screen for 4 s each. This duration, long relative to those used in previous studies of distraction, ensured that the distractor was fully processed, that any potential attentional holding was tied to a physically present stimulus, and that such holding would be distinct from any transient attentional capture by the distractor offset. Distractors consisted of grayscale photographic images (5.88° × 5.88°) of places or fearful faces. Face images were selected from the NimStim database (Tottenham et al., 2009), and place images were selected from those used by Yi, Woodman, Widders, Marois, and Chun (2004). Twenty-eight unique images were used from each source. Distractors were presented in random order, with an equal number of trials in each distractor condition. Participants were instructed that the distractors were task irrelevant and should be ignored. The interstimulus interval (ISI) between distractors was randomly selected from 3, 4, 5, or 6 s, with each ISI used equally often. Participants performed six 4-min task runs, each containing 204 target events. Throughout the task runs, eye gaze was monitored online with a desk-mounted video camera.
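The distractor schedule described above can be sketched in Python. This is a minimal illustration, not the authors' stimulus code: the function name `make_schedule`, the choice to start the first distractor at 0 s, and the reading of the ISI as the interval from one distractor's offset to the next one's onset are all assumptions. Categories and ISIs are each balanced, but they are shuffled independently, so the sketch does not cross them factorially.

```python
import random

DISTRACTOR_DURATION_S = 4        # each distractor stays onscreen for 4 s
ISI_CHOICES_S = (3, 4, 5, 6)     # assumed offset-to-onset intervals, used equally often


def make_schedule(n_trials, categories=("face", "place"), seed=None):
    """Build one run as a list of (onset_s, category, isi_s) tuples.

    Categories and ISIs are each balanced (equal counts) but shuffled
    independently, so they are not fully crossed.
    """
    if n_trials % len(categories) or n_trials % len(ISI_CHOICES_S):
        raise ValueError("n_trials must allow equal counts of every category and ISI")
    rng = random.Random(seed)
    cats = list(categories) * (n_trials // len(categories))
    isis = list(ISI_CHOICES_S) * (n_trials // len(ISI_CHOICES_S))
    rng.shuffle(cats)
    rng.shuffle(isis)

    schedule, onset = [], 0      # assumption: the first distractor appears at 0 s
    for cat, isi in zip(cats, isis):
        schedule.append((onset, cat, isi))
        onset += DISTRACTOR_DURATION_S + isi   # next onset = this offset + ISI
    return schedule


if __name__ == "__main__":
    for onset, cat, isi in make_schedule(8, seed=0):
        print(f"{onset:>4} s  {cat:<5}  then {isi} s ISI")
```

Because every onset is offset by the 4-s distractor duration plus the ISI, successive distractors can never overlap under this reading of the ISI.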

Fig. 1

Trial sequence for Experiment 1a. Each frame was presented for 1 s, and the target (the peripheral red letter T) changed orientation between frames. The interval between the offset of one distractor and the onset of the next was drawn equally often from 3, 4, 5, or 6 s. While maintaining central fixation, participants discriminated the orientation of each target (pressing button 1 when the T was oriented vertically or horizontally, and button 2 when it was oriented diagonally). A response was required on every frame.

Before the experimental runs, participants performed a passive viewing run in which each stimulus that would later be a distractor was presented at fixation, twice in random order (2-s presentation, 1-s ISI). Thus, the distractors used in the experimental runs were not novel to the participants. This design feature was important, since the relative familiarity of stimuli can affect attentional allocation (Parks & Hopfinger, 2008). Before the task runs, participants completed a practice block (containing 75 target events) to ensure that they could accurately perform the task.

Results and discussion

Responses faster than 150 ms or slower than 1,150 ms were excluded from the analyses. For the reaction time (RT) data, we conducted repeated measures ANOVAs with the factors distractor type (fearful face/place) and time (Position T1/T2/T3/T4/T5/TBaseline; see Footnote 1). “T1” refers to the target occurring at the onset of the distractor; “T2” refers to the next target (appearing 1 s after distractor onset); “T3” refers to the target after that, and so forth. “T5” refers to the target that occurs simultaneously with the offset (disappearance) of the distractor. “TBaseline” comprises all target positions following T5 until the next central distractor appears. TBaseline was defined separately for each distractor condition in order to isolate the transient attentional effects specific to each stimulus type from the more sustained effects that might carry over across trial types.
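One way to implement the RT exclusion window and the T1 to TBaseline coding is sketched below. It assumes that targets and distractor onsets fall on whole seconds and that each baseline target is assigned to the category of the most recent distractor, which is our reading of "defined separately for each distractor condition". The helper names `label_target` and `bin_rts` are hypothetical.

```python
from bisect import bisect_right

RT_MIN_MS, RT_MAX_MS = 150, 1150   # responses outside this window are excluded
N_DISTRACTOR_POSITIONS = 5         # T1..T5 span the 4-s distractor, onset to offset


def label_target(t_s, onsets, categories):
    """Return (category, position) for a target at whole-second time t_s.

    onsets must be sorted; categories[i] is the category of the distractor
    that appeared at onsets[i]. Targets before the first onset get None.
    """
    i = bisect_right(onsets, t_s) - 1          # index of most recent onset <= t_s
    if i < 0:
        return None
    lag = t_s - onsets[i]                      # 0 at onset, 4 at offset
    if lag < N_DISTRACTOR_POSITIONS:
        position = f"T{lag + 1}"
    else:
        position = "TBaseline"                 # after T5, before the next onset
    return categories[i], position             # assumption: baseline follows the preceding distractor's category


def bin_rts(targets, onsets, categories):
    """Group valid RTs by (category, position); targets are (t_s, rt_ms) pairs."""
    bins = {}
    for t_s, rt_ms in targets:
        if rt_ms is None or not RT_MIN_MS <= rt_ms <= RT_MAX_MS:
            continue                           # missed or out-of-window response
        label = label_target(t_s, onsets, categories)
        if label is not None:
            bins.setdefault(label, []).append(rt_ms)
    return bins
```

Per-participant means of each bin would then feed the distractor type × time ANOVA.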

For RTs, the ANOVA revealed no main effect of distractor type, F(1, 20) = 2.75, p = .11 (fearful face = 586.33 ms; place = 583.69 ms), but a significant main effect of time, F(5, 100) = 37.56, p < .001, ηp² = .653 (T1 = 626.11 ms; T2 = 577.96 ms; T3 = 574.26 ms; T4 = 571.62 ms; T5 = 584.05 ms; TBaseline = 576.08 ms). Critically, there was a significant interaction between distractor type and time, F(5, 100) = 2.97, p = .015, ηp² = .129, suggesting that the two distractor types affected the time course of attentional allocation differently. To further explore this interaction, we conducted paired t-tests, using the Benjamini–Hochberg (B–H) procedure to correct the alpha level for multiple comparisons (Benjamini & Hochberg, 1995) (Fig. 2). For both place and fearful face distractors, RTs to T1 (i.e., to targets occurring at the onset of the distractor) were significantly longer than those to the respective TBaseline condition [for fearful faces: t(20) = 7.77, p < .001, r² = .751; FaceT1, M = 625.12 ms, SD = 86.04; FaceTBaseline, M = 571.90 ms, SD = 64.51; for places: t(20) = 6.89, p < .001, r² = .703; PlaceT1, M = 627.11 ms, SD = 81.96; PlaceTBaseline, M = 580.25 ms, SD = 63.26]. Thus, the sudden onset of either type of distractor produced immediate distraction, significantly slowing responses to the simultaneously presented targets.
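For reference, the Benjamini–Hochberg step-up procedure sorts the m p values in a family of tests, finds the largest rank k whose p value satisfies p(k) ≤ (k/m)q, and rejects the null hypotheses for ranks 1 through k. The sketch below assumes a false discovery rate of q = .05; the composition of each family of tests is not stated in the text, so the example values are hypothetical and this is not a reanalysis of the reported data.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure (q = .05 is an assumed default).

    Returns a list of booleans aligned with pvals; True means the null
    hypothesis for that test is rejected at false discovery rate q.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:                  # p_(k) <= (k/m) q
            k_max = rank                                # keep the largest such k
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max                     # reject ranks 1..k_max
    return reject
```

For example, with hypothetical p values of .02, .02, and .04 (m = 3), the smallest p value exceeds its own threshold (.0167), yet all three hypotheses are rejected because the largest qualifying rank is 3 (.04 ≤ .05). This step-up behavior is what distinguishes the procedure from a simple per-rank cutoff.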

Fig. 2

Experiment 1a, fearful face versus neutral place: behavioral distraction effect (reaction times relative to respective TBaseline condition). While both fearful face and neutral place distractors initially captured attention (at T1), only fearful face distractors continued to hold attention and reduce task performance (beyond T1). *Significant reaction time differences versus TBaseline (significant after Benjamini–Hochberg correction for multiple comparisons)

Critically, after T1, participants continued to be distracted by fearful faces, but not by places. When the distractor was a fearful face, responses continued to be slowed at T2, t(20) = 3.45, p = .001, r² = .372 (FaceT2, M = 583.52 ms, SD = 64.03), and at T3, t(20) = 2.46, p = .012, r² = .232 (FaceT3, M = 580.84 ms, SD = 61.94), relative to baseline (FaceTBaseline, M = 571.90 ms, SD = 64.51). Performance returned to baseline by T4, t(20) = −0.50, p = .310 (FaceT4, M = 569.93 ms, SD = 63.19). No such effects were found for place distractors, since performance was impaired only at T1 (i.e., at distractor onset), and not at T2, t(20) = −1.89, p = .037 (not significant after B–H correction); of note, the nonsignificant trend here was in the direction opposite to that predicted for attentional holding (i.e., faster responses in the presence of the distractor: PlaceT2, M = 572.40 ms, SD = 58.31; PlaceTBaseline, M = 580.25 ms, SD = 63.27). There were also no significant effects at T3, t(20) = −2.07, p = .026 (not significant after B–H correction; PlaceT3, M = 567.68 ms), or T4, t(20) = −1.34, p = .100 (PlaceT4, M = 573.31 ms, SD = 73.68), relative to baseline. Together, these results provide new evidence that attention was strongly held when the distractor was a fearful face, but not when it was a place. While both distractors initially captured attention (at T1), only fearful face distractors continued to hold attention, impairing task performance beyond T1.
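The effect sizes reported for the t tests and ANOVAs can be recovered from the test statistics with the standard conversions r² = t²/(t² + df) and ηp² = (F × df_effect)/(F × df_effect + df_error). The sketch below also computes a paired-samples t statistic from per-participant values; the function names are our own, not part of any statistics package. Plugging in reported values reproduces the published effect sizes: for instance, t(20) = 7.77 gives r² = .751, and F(5, 100) = 37.56 gives ηp² = .653.

```python
import math


def paired_t(x, y):
    """Paired-samples t statistic and df for the differences x - y.

    Raises ZeroDivisionError if all differences are identical (SD of 0).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))  # sample SD
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def r_squared_from_t(t, df):
    """Effect size r^2 = t^2 / (t^2 + df) for a t test."""
    return t * t / (t * t + df)


def partial_eta_squared_from_f(f, df_effect, df_error):
    """Partial eta squared = F*df_effect / (F*df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)
```

Because both conversions depend only on the test statistic and its degrees of freedom, they offer a quick consistency check on reported results.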

Overall, these results show that fearful faces evoke an extended period of distraction above that produced by other complex stimuli. It is unclear, however, whether the extended holding of attention by fearful faces was due to the fear or the face. Thus, we conducted a follow-up experiment, substituting neutral faces for fearful faces to investigate whether emotion was the critical factor in this holding of attention.

Experiment 1b

In Experiment 1b, the face distractors were of neutral valence. If the attentional hold observed in Experiment 1a was dependent on the emotion of the fearful faces, then the neutral faces in the present experiment should not hold attention at all, or the holding should be shorter-lived (e.g., lasting only until T2).

Method

Participants

Thirteen students (9 females; 18–21 years of age) from UNC-CH participated for course credit after giving informed consent. Participants were right-handed and had normal or corrected-to-normal vision. Two participants were excluded from the analyses: one disclosed that they did not meet the selection criterion of no prior concussions, and another failed to perform the task adequately (accuracy more than three standard deviations below the mean).

Procedure

Procedures were identical to those in Experiment 1a, except that the face distractors were neutral here.

Results and discussion

A two-way ANOVA was performed on RTs to the targets, as described in Experiment 1a. This analysis revealed no main effect of distractor type, F(1, 10) = 0.06, p = .82 (face = 555.08 ms; place = 556.00 ms). However, there was a significant main effect of time, F(5, 50) = 51.24, p < .001, ηp² = .837 (T1 = 596.03 ms, T2 = 558.20 ms, T3 = 547.14 ms, T4 = 540.20 ms, T5 = 550.07 ms, TBaseline = 541.59 ms), and, critically, a significant interaction between distractor type and time, F(5, 50) = 4.42, p = .002, ηp² = .307. To further explore this interaction, we conducted paired t-tests (Fig. 3). For both neutral face and place distractors, RTs to T1 were significantly longer than those to the respective TBaseline condition [for faces: t(10) = 12.67, p < .001, r² = .941; FaceT1, M = 597.80 ms, SD = 66.90; FaceTBaseline, M = 536.84 ms, SD = 56.02; for places: t(10) = 10.61, p < .001, r² = .918; PlaceT1, M = 594.26 ms, SD = 70.67; PlaceTBaseline, M = 546.33 ms, SD = 61.29]. Thus, the onset of a task-irrelevant stimulus, whether a neutral face or a place, produced immediate distraction, slowing responses to the simultaneously presented target. When the distractor was a place, recovery from distraction was rapid: RTs returned to baseline levels by T2, t(10) = 1.29, p = .11 (PlaceT2, M = 553.57 ms, SD = 59.90), and remained there at T3, t(10) = −0.58, p = .286 (PlaceT3, M = 543.19 ms, SD = 55.77), and T4, t(10) = 0.30, p = .387 (PlaceT4, M = 547.61 ms). However, when the distractors were neutral faces, participants continued to be distracted, responding significantly more slowly to T2 than to TBaseline, t(10) = 5.35, p < .001, r² = .741 (FaceT2, M = 562.83 ms, SD = 56.19; FaceTBaseline, M = 536.84 ms, SD = 56.02), and this effect extended to T3 as well, t(10) = 3.27, p = .004, r² = .517 (FaceT3, M = 551.09 ms, SD = 54.23). Only at T4 did face distractors no longer robustly hold attention, t(10) = −2.00, p = .036 (not significant after B–H correction; FaceT4, M = 532.79 ms, SD = 56.89).
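The distraction effects plotted in the figures are RTs expressed relative to the respective TBaseline. As a worked illustration using the Experiment 1b group means reported above, the sketch below subtracts each condition's baseline from its T1 to T4 means. The function name is hypothetical, and the computation describes only the size of the slowing; it does not test significance, which requires the per-participant paired tests.

```python
# Group mean RTs (ms) for Experiment 1b, taken from the values reported in the text.
EXP1B_MEANS = {
    "face": {"T1": 597.80, "T2": 562.83, "T3": 551.09, "T4": 532.79, "TBaseline": 536.84},
    "place": {"T1": 594.26, "T2": 553.57, "T3": 543.19, "T4": 547.61, "TBaseline": 546.33},
}


def distraction_profile(position_means, baseline_key="TBaseline"):
    """Return RT minus baseline (ms) for every non-baseline position."""
    baseline = position_means[baseline_key]
    return {
        pos: round(rt - baseline, 2)   # positive values = slower than baseline
        for pos, rt in position_means.items()
        if pos != baseline_key
    }


if __name__ == "__main__":
    for category, means in EXP1B_MEANS.items():
        print(category, distraction_profile(means))
```

For neutral faces this gives slowing of about 61, 26, and 14 ms at T1 to T3, whereas places show about 48 ms at T1 and less than 8 ms afterward, mirroring the longer hold reported for faces.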

Fig. 3

Experiment 1b, neutral face versus neutral place: behavioral distraction effect (reaction times relative to respective TBaseline condition). Attention was held longer on the distractor when it was a face, as compared with a place, even though the faces were of neutral, unemotional valence. *Significant reaction time differences versus TBaseline (significant after Benjamini–Hochberg correction for multiple comparisons)

In conjunction with Experiment 1a, these results provide evidence that task-irrelevant faces, regardless of emotion, can act as uniquely potent distractors that delay the reorienting of attention back to task-relevant goals. An open question, however, is whether this hold by face distractors is contingent on the ongoing distractor context. In Experiments 1a and 1b, the face distractors were randomly intermixed with nonface, place distractors. Previous work using the attentional blink paradigm has revealed that context plays a critical role in attentional allocation, especially concerning the holding of attention (Parks & Hopfinger, 2008). Therefore, Experiment 2 was conducted to test whether the extended holding of attention by faces is automatic or, rather, is contingent on context.

Experiment 2

In this experiment, all distractors belonged to one general category: faces (half fearful, half neutral). If the effects in Experiments 1a and 1b are independent of the distractor context, both fearful and neutral faces should hold attention in this experiment as well. Alternatively, fearful faces may produce more distraction than neutral faces when the two are intermixed, in line with research suggesting attentional priority for negatively valenced faces (Eastwood et al., 2001; Ohman, Lundqvist, & Esteves, 2001). Importantly, it is also possible that context differentially affects the initial capture versus the later hold of attention. Indeed, Theeuwes and colleagues (2000) argued that the initial orienting to a salient distractor is automatic but that the duration of attentional holding may be influenced by top-down settings.

Method

Participants

Twenty right-handed students from UNC-CH, 17–22 years of age (10 females), with normal or corrected-to-normal vision, gave informed consent and received course credit for participation.

Procedure

Procedures were similar to those in Experiments 1a and 1b, except that all distractors were now faces. There were 18 fearful faces and 18 neutral faces, each depicting a different individual.

Results and discussion

A two-way ANOVA was performed on RTs to the targets, as in Experiments 1a and 1b. There was no main effect of distractor valence, F(1, 19) = 0.001, p = .971 (fearful = 590.38 ms; neutral = 590.49 ms), but, as expected, there was a significant main effect of time, F(5, 95) = 47.28, p < .001, ηp² = .713 (T1 = 624.54 ms, T2 = 590.35 ms, T3 = 580.51 ms, T4 = 576.64 ms, T5 = 586.22 ms, TBaseline = 584.35 ms). Critically, and unlike in Experiments 1a and 1b, the interaction between distractor type and time was not significant, F(5, 95) = 0.91, p = .481. Given the lack of a significant interaction, we followed up on the significant main effect of time by conducting paired t-tests on the data collapsed over distractor type.

RTs to T1 were significantly longer than those to the TBaseline condition, t(19) = 10.58, p < .001, r² = .855 (T1, M = 624.54 ms, SD = 45.53; TBaseline, M = 584.22 ms, SD = 46.13), revealing that the onset of a face distractor immediately slowed responses to the simultaneously presented target (Fig. 4). Performance quickly recovered following distractor onset: RTs returned to baseline levels by T2, t(19) = 2.26, p = .018 (not significant after B–H correction; T2, M = 590.35 ms, SD = 48.71), and remained there at T3, t(19) = −1.12, p = .138 (T3, M = 580.51 ms, SD = 45.98), and T4, t(19) = −1.92, p = .035 (not significant after B–H correction; T4, M = 576.64 ms, SD = 46.47). Thus, although faces initially disrupted target performance, they did not robustly hold attention. These results suggest that the robust holding effects found in Experiments 1a and 1b depended on the distractor context. When the distractors differed only in facial emotion, both types of faces produced an equal degree of distraction. This finding differs from that of a recent eye-tracking study in which individuals were slower to move their eyes away from an angry face than from either a happy or a neutral face (Belopolsky, Devue, & Theeuwes, 2010). In that study, however, the face stimuli were task relevant, since participants were instructed to gaze in the direction in which the face was tilted. Thus, the effects of facial expression on attentional holding may depend on the task relevance of the face stimuli or may differ for covert versus overt attention. Additionally, differences in attentional capture have been found for angry versus fearful faces (Williams, Moss, Bradshaw, & Mattingley, 2005), suggesting another possible source of differences between studies.

Fig. 4

Experiment 2, fearful face versus neutral face: behavioral distraction effect (reaction times relative to respective TBaseline condition). The onset of a face distractor, regardless of its valence, immediately slowed responses to the simultaneously presented target; however, neither condition continued to hold attention. *Significant reaction time differences versus TBaseline (significant after Benjamini–Hochberg correction for multiple comparisons)

General discussion

The goal of the present study was to examine the mechanisms by which highly salient, but irrelevant, stimuli lead to distraction and to dissociate the initial capture of attention from the subsequent holding of attention. In Experiment 1a, the initial capture of attention was found both for fearful faces and for places, but critically, fearful faces continued to hold attention beyond the initial target response. In Experiment 1b, we found that a neutral valence face produced a similar capture and hold. These findings provide new evidence that the holding of attention by faces is a powerful mechanism that can occur regardless of emotional expression. In the consistent-distractor context of Experiment 2, wherein every distractor was a face, there was again a capture of attention regardless of facial emotion, but neither distractor type robustly held attention. Thus, across experiments, context affected one component of attentional distraction (the hold of attention) but not another (the initial capture). When a task-irrelevant face was expected on each trial (as in Experiment 2), participants were able to quickly disengage from such a distractor, preventing a detectable hold of attention. However, when participants could not anticipate the presence of a face (as in Experiments 1a/b), an extended attentional dwell on the faces could not be avoided. Of note, and unlike previous studies of distraction (e.g., Forster & Lavie, 2011), our study isolated the holding of attention by the presence of a distractor stimulus from attentional effects triggered by the offset of that stimulus. By using a relatively long stimulus duration (4 s), we ensured that the abrupt offset of the stimulus, which can automatically capture attention in a manner similar to an abrupt onset (Miller, 1989), would be unlikely to interfere with our measure of attentional dwell time.

Whereas previous studies have demonstrated an attentional hold by negatively valenced faces specific to anxious individuals (Fox et al., 2001), our measures of distraction reveal an enhanced hold in healthy individuals. Furthermore, the holding of attention in the present study (Experiments 1a/b) occurred even though the face stimuli were task irrelevant, unlike in previous studies in which there may have been an incentive to dwell on the faces because they predicted the target location (e.g., Fox et al., 2001). Our study also reports a novel, context-dependent holding of attention: faces, whether of negative or neutral valence, held attention only when they were intermixed with nonface distractors. Results from Experiment 2 (all face distractors) are in line with those of Fox et al., who found no hold in low-anxious individuals when the stimuli were always faces. Our findings extend that research by showing that, even in healthy individuals, faces can involuntarily hold attention, although this may occur only when faces cannot be anticipated on every trial. Whether the mechanisms underlying this form of distraction are similar in anxious and nonanxious populations remains unclear.

Future research is required to determine the mechanisms underlying the differences in attentional hold between Experiments 1a/b and Experiment 2. Such differences might reflect the demand of actively suppressing two distractor categories (i.e., faces and places), or they may result from passive processing of distractor frequency (faces on 50% of trials vs. 100%). Previous research suggests that increasing the load on executive control processes reduces the ability to maintain task-relevant processing and, thus, increases distraction (Lavie, 2005). In the present study, perhaps the requirement to inhibit multiple distractor categories in Experiments 1a/b increased the load on cognitive control processes, which in turn increased processing (i.e., hold) of the more salient distractor category, faces. Of additional interest is whether the context-dependent attentional hold found here extends to other types of stimuli. Because faces are biologically and socially relevant to humans, these attentional hold effects may be specific to this unique stimulus category. On the other hand, other semantically meaningful distractors that promote higher level processing may also extend the hold of attention, as is the case with memory extending the attentional blink (Parks & Hopfinger, 2008). Furthermore, the meaningfulness assigned to stimuli may vary across individuals or populations, as was demonstrated for threatening stimuli in those with subclinical anxiety (Fox et al., 2001).

In conclusion, the present study provides new evidence regarding the mechanisms underlying distraction. Across three experiments, we found that an increased level of distraction may not reflect an enhanced orienting but, instead, an extended holding of attention on the irrelevant item. Whereas previous work has highlighted the bias to preferentially attend to faces, the present results refine the mechanism by which this bias occurs: an extended dwelling of attention on faces. Finally, the present findings suggest that distraction is made up of two distinct attentional mechanisms, orienting and holding, of which only the latter was affected by ongoing context. Appreciating the separable effects of these two elements of distraction may ultimately be critical for understanding attention and attention-related impairments.