Determining whether a salient stimulus that bears little resemblance to a search target reliably captures attention has been difficult (e.g., Folk & Remington, 2010; Theeuwes, 2010). Some studies have suggested that the abrupt appearance of a distractor stimulus in an otherwise static array reliably captures attention (e.g., Gaspelin, Ruthruff, & Lien, 2016; Schreij, Theeuwes, & Olivers, 2010) whereas others have suggested that capture only reliably occurs when there is a physical resemblance between the distracting and target stimuli (e.g., Folk, Remington, & Johnston, 1992; Wu, Remington, & Folk, 2014). Similarly, it has been suggested that a salient color distractor, embedded in an array of stimuli, reliably captures attention (e.g., Barras & Kerzel, 2017; Theeuwes, de Vries, & Godijn, 2003), whereas others have suggested that this capture effect is contingent on the resemblance between it and the target color (e.g., Gaspelin, Leonard, & Luck, 2015; Irons, Folk, & Remington, 2012).

Strong evidence that neither abrupt-onset nor salient color distractors reliably capture attention can be found in variations of the contingent-capture cueing paradigm (Folk et al., 1992; see Büsel, Voracek, & Ansorge, 2018, for a review). Participants are instructed to use their peripheral vision to find a uniquely colored target (e.g., red) in an array of homogeneous distractor colors. Once found, the target shape is typically identified by way of one of two keypress responses. Critically, the target is preceded by a cue that either matches or mismatches the target color. When that cue is a single visual transient mismatching the target color—such as an abrupt-onset cue—the finding is that responses are no faster when the target appears at the cued location, relative to when it appears elsewhere, at short cue–target onset asynchronies (≤ 300 ms). The same pattern holds when that cue is a unique color in an array of homogeneous distractors that mismatch the target color. However, when that cue matches the target, responding is faster for targets appearing at the cued location than for targets appearing elsewhere. The pattern gives rise to the notion that only cues that resemble the target’s search-defining feature will reliably capture attention (e.g., Chen & Mordkoff, 2007; Lien, Ruthruff, Goodin, & Remington, 2008).

The finding that a salient color cue mismatching the target color fails to capture attention is not unique in the broader literature (e.g., Jonides & Yantis, 1988; Theeuwes, 1995; Yantis & Egeth, 1999). For example, when there is a high incentive to search for a particular target feature or dimension, there is good evidence against capture (e.g., Eimer & Kiss, 2010; Gaspelin, Leonard, & Luck, 2017; Müller, Geyer, Zehetleitner, & Krummenacher, 2009). That an abrupt-onset cue does not capture attention is what makes the contingent-capture paradigm special. Indeed, this finding is at variance with the idea that abrupt onsets form a special class of stimuli, unique in its ability to reliably capture attention (e.g., Hollingworth, Simons, & Franconeri, 2010; Jonides & Yantis, 1988). However, recent findings suggest that an onset cue does generate capture in contingent-capture paradigms if (a) it occurs infrequently (Folk & Remington, 2015) or (b) the target is difficult to distinguish from the distractors (Gaspelin et al., 2016). The latter study proposed that the capture effect from an onset cue is obscured when the target is easily distinguished from the distractors, thus highlighting a problem of latent attentional capture (i.e., capture that occurs but is masked).

To deal with this problem of latent capture, we have taken a different approach. Specifically, regardless of exactly why capture might be obscured in a contingent-capture paradigm, if a cue reliably attracts attention, people should be able to pick up, either implicitly or explicitly, on any statistical regularities that exist between it and the target location (e.g., learned predictiveness; Le Pelley, Mitchell, Beesley, George, & Wills, 2016). That is, if a mismatch cue reliably captures attention, it should be possible to expose any latent effect of that cue by correlating it with the target location. If the mismatch cue is truly unattended, the relationship between the mismatch cue and the target location should not be learned, and thus no capture will be exposed.

To expose latent capture through statistical learning, we adapted the contingent-capture cueing paradigm of Irons et al. (2012), which involves four possible cue and target locations. Importantly, targets occurred frequently (81.25%) at the location of mismatch cues, but only at chance (25%) at the location of match cues. The first experiment tested for latent capture with color mismatch cues, and the second for latent capture with abrupt-onset mismatch cues.

Experiment 1

Identification responses are faster when the target appears at the location of a cue that matches the search-defining target feature than when it appears elsewhere, whereas no such capture effect is found when the cue mismatches the target’s search-defining color. This experiment tested whether this pattern holds when a mismatch cue, unbeknownst to participants, predicts the target location.

Method

Participants

Twenty-one undergraduate students (mean age = 18.43 years; one left-handed; 14 females, seven males) from the University of Toronto consented to participate for course credit.

Stimuli and apparatus

The experiment took place in a dimly lit room. The experiment was built in Python, and all stimuli were displayed on a 24-in. LED screen with a resolution of 2,560 × 1,440 and a refresh rate of 144 Hz. The viewing distance was held constant at 57 cm with a chinrest.

The screen’s background was black (luminance = 0.33 cd/m²; x = 0.2624, y = 0.2624). The fixation stimulus was a small, gray (luminance = 20.83 cd/m²; x = 0.3500, y = 0.3493), unfilled square (0.34° × 0.34° of visual angle; line width = 1 pixel) at the center of the screen. The placeholder array consisted of four gray, unfilled squares (placeholders; 1.16° × 1.16°, line width = 0.12°), positioned 4.1° to the left of, to the right of, above, and below fixation.
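Stimulus sizes here are given in degrees of visual angle. A minimal Python sketch of the conversion to screen pixels follows. It assumes a nominal 24-in. diagonal with square pixels at 2,560 × 1,440 and the 57-cm viewing distance; the function names are hypothetical, and the monitor's actual visible area may differ from its nominal diagonal.

```python
import math

# Assumed display geometry (partly inferred from the text): nominal 24-in.
# diagonal, square pixels, 2,560 x 1,440 resolution, 57-cm viewing distance.
DIAGONAL_IN = 24.0
RES_X, RES_Y = 2560, 1440
VIEW_DIST_CM = 57.0

# Pixel density: diagonal length in pixels divided by diagonal length in cm.
PX_PER_CM = math.hypot(RES_X, RES_Y) / (DIAGONAL_IN * 2.54)  # about 48.2 px/cm


def deg_to_px(size_deg: float, dist_cm: float = VIEW_DIST_CM) -> float:
    """On-screen extent (px) of a stimulus subtending size_deg,
    approximated as if centred on the line of sight."""
    size_cm = 2.0 * dist_cm * math.tan(math.radians(size_deg) / 2.0)
    return size_cm * PX_PER_CM


def ecc_to_px(ecc_deg: float, dist_cm: float = VIEW_DIST_CM) -> float:
    """Distance (px) from fixation of a point at eccentricity ecc_deg."""
    return dist_cm * math.tan(math.radians(ecc_deg)) * PX_PER_CM


# Example: fixation square, placeholder size, and placeholder eccentricity.
for label, px in [("fixation 0.34 deg", deg_to_px(0.34)),
                  ("placeholder 1.16 deg", deg_to_px(1.16)),
                  ("eccentricity 4.1 deg", ecc_to_px(4.1))]:
    print(f"{label}: {px:.1f} px")
```

At 57 cm, 1 cm on the screen subtends almost exactly 1°, which is why this viewing distance is common; at roughly 48 px/cm, 1° corresponds to about 48 pixels on this display.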

The cueing array consisted of four filled circles (0.12° radius) positioned 1° to the left of, to the right of, above, and below the center of each placeholder. The circles surrounding three of the placeholders were white (luminance = 167.26 cd/m²; x = 0.3598, y = 0.3608), whereas the circles surrounding the remaining placeholder were either red (luminance = 47.34 cd/m²; x = 0.64, y = 0.34) or green (luminance = 118.66 cd/m²; x = 0.32, y = 0.64). These red and green stimuli were the cues.
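The cueing-array geometry can be written as coordinates in degrees relative to fixation. In the Python sketch below, the location names, the y-up convention, and the function names are hypothetical; the 4.1° placeholder eccentricity, the 1° dot offset, and the arrangement of three white dot sets plus one colored set come from the text.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in degrees; x rightward, y upward (assumed)

PLACEHOLDER_ECC_DEG = 4.1  # placeholder centres relative to fixation
CUE_OFFSET_DEG = 1.0       # cue dots relative to each placeholder centre

DIRECTIONS: Dict[str, Point] = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
}


def placeholder_centres() -> Dict[str, Point]:
    """Centres of the four placeholders around fixation."""
    return {name: (dx * PLACEHOLDER_ECC_DEG, dy * PLACEHOLDER_ECC_DEG)
            for name, (dx, dy) in DIRECTIONS.items()}


def cue_dots(centre: Point) -> List[Point]:
    """The four dot centres 1 deg left, right, above, and below a placeholder."""
    cx, cy = centre
    return [(cx + dx * CUE_OFFSET_DEG, cy + dy * CUE_OFFSET_DEG)
            for dx, dy in DIRECTIONS.values()]


def colour_cue_array(cue_loc: str, cue_colour: str) -> Dict[str, Tuple[str, List[Point]]]:
    """Colour-cue display: the cued placeholder's dots take cue_colour
    ("red" or "green"); the dots around the other three are white."""
    return {name: (cue_colour if name == cue_loc else "white", cue_dots(centre))
            for name, centre in placeholder_centres().items()}
```

For example, `colour_cue_array("left", "green")` marks the left placeholder with green dots, one of which sits at (−5.1, 0) degrees.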

The target array consisted of one apiece of the “x,” “+,” “=,” and “‖” symbols, with each subtending 0.6° and appearing at the center of a unique placeholder location. The target stimulus was red and always either the “x” or “=” symbol. The distractor stimuli were chosen from the remaining symbols without replacement, with two randomly appearing in white and the remainder appearing in blue (luminance = 6.40 cd/m²; x = 0.16, y = 0.05). Responses were made to the red “x” or “=” target by pressing the “.” or the “/” key, respectively. Errors of commission and omission (> 2 s) resulted in text messages at the center of the screen that included the correct stimulus–response mappings. These messages were acknowledged by pressing the spacebar.
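The target-array rules and response scoring can be sketched in Python as follows. The symbol set, colors, key mapping, and 2-s omission limit come from the text; the random assignment of distractors to locations and all function names are assumptions for illustration.

```python
import random
from typing import Dict, Optional, Tuple

SYMBOLS = ["x", "+", "=", "‖"]
TARGET_SYMBOLS = ["x", "="]             # the red target is always "x" or "="
RESPONSE_KEY = {"x": ".", "=": "/"}     # stimulus-response mapping from the text
LOCATIONS = ["left", "right", "up", "down"]
TIMEOUT_MS = 2000                       # omission: no response within 2 s

Array = Dict[str, Tuple[str, str]]      # location -> (symbol, colour)


def make_target_array(target_loc: str, rng: random.Random) -> Array:
    target = rng.choice(TARGET_SYMBOLS)
    distractors = [s for s in SYMBOLS if s != target]  # the three remaining symbols
    rng.shuffle(distractors)                            # assumed: random placement
    colours = ["white", "white", "blue"]
    rng.shuffle(colours)                                # two white, one blue
    others = [loc for loc in LOCATIONS if loc != target_loc]
    array: Array = {target_loc: (target, "red")}
    for loc, symbol, colour in zip(others, distractors, colours):
        array[loc] = (symbol, colour)
    return array


def correct_key(array: Array) -> str:
    """Key assigned to the red target's symbol."""
    target = next(sym for sym, colour in array.values() if colour == "red")
    return RESPONSE_KEY[target]


def score_response(array: Array, key: Optional[str], rt_ms: Optional[float]) -> str:
    """Classify a response as correct, an error of commission, or an omission."""
    if key is None or rt_ms is None or rt_ms > TIMEOUT_MS:
        return "omission"
    return "correct" if key == correct_key(array) else "commission"
```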

Procedure

Each trial began with the placeholder array alone for 500 ms, after which the fixation stimulus appeared; participants were to keep their gaze on it. Following a random interval of 1, 1.1, 1.2, or 1.4 s, the cueing array appeared for 50 ms. Half of the cueing arrays contained the red (match) cue, and the other half contained the green (mismatch) cue. This color cue appeared at a random placeholder location. The red cue did not predict the target location (25% cue validity), whereas the green cue did (81.25% cue validity; see Fig. 1).

Fig. 1

The basic contingent-capture approach for exposing latent capture from mismatch cues. Match and mismatch cues occur equally often, but only the mismatch cue predicts the target location.

The target array appeared 100 ms after the offset of the cueing array (cue–target onset asynchrony = 150 ms) and remained onscreen for 120 ms. Participants had 2 s to respond to the target array; otherwise, the trial timed out. With a correct response, the fixation stimulus disappeared after 1 s (and reappeared 500 ms after that) to signal the next trial. If an error was made, all stimuli were removed from the screen and an error message appeared, which the participant acknowledged with the spacebar, and the next trial was signaled as above.
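On a 144-Hz display, each refresh lasts about 6.94 ms, so nominal durations are realized as whole numbers of frames. The Python sketch below shows one way the trial timeline could be quantized, assuming frame-locked drawing and rounding to the nearest frame; the text does not say how durations were quantized, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

REFRESH_HZ = 144.0
FRAME_MS = 1000.0 / REFRESH_HZ                    # about 6.94 ms per refresh
FIXATION_INTERVALS_MS = [1000, 1100, 1200, 1400]  # from the Procedure text


def to_frames(duration_ms: float) -> int:
    """Nearest whole number of refreshes for a nominal duration."""
    return round(duration_ms / FRAME_MS)


def trial_schedule(fixation_interval_ms: float) -> List[Tuple[str, float, int]]:
    """(event, nominal ms, frames) for one trial, in presentation order."""
    events = [
        ("placeholders alone", 500),
        ("fixation before cue", fixation_interval_ms),
        ("cueing array", 50),
        ("cue offset to target onset", 100),
        ("target array", 120),
    ]
    return [(name, ms, to_frames(ms)) for name, ms in events]


schedule = trial_schedule(random.Random(0).choice(FIXATION_INTERVALS_MS))
soa_frames = schedule[2][2] + schedule[3][2]
print(f"nominal SOA 150 ms -> {soa_frames} frames = {soa_frames * FRAME_MS:.1f} ms")
```

Under this rounding rule, the 50-ms cue and 100-ms gap become 7 and 14 frames, so the realized cue–target onset asynchrony is 21 frames (about 145.8 ms) rather than exactly 150 ms.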

Participants were instructed to identify the red “x” or “=” sign as quickly and accurately as possible while staying fixated. Participants were not given any information about the relationship between the cueing and target arrays and were told simply that extraneous information would precede and occur with the target.

Design

After completing a practice block of 64 trials (not analyzed), participants completed three blocks of 256 experimental trials each.
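The text gives the cue validities (25% for the red match cue, 81.25% for the green mismatch cue) and the block length (256 trials) but not the exact counterbalancing. The Python sketch below shows one hypothetical scheme consistent with those numbers. Each cue type × cue location cell has 32 trials: 8 (match) or 26 (mismatch) of them place the target at the cued placeholder, and the remainder are spread evenly over the other three placeholders. This also keeps every target location equally frequent (64 per block).

```python
import random
from typing import Dict, List

LOCATIONS = ["left", "right", "up", "down"]

# Hypothetical counterbalancing for one 256-trial block (32 trials per
# cue type x cue location cell).
DESIGN: Dict[str, Dict[str, int]] = {
    "match": {"valid": 8, "per_other_loc": 8},      # 8/32 = 25% validity (red cue)
    "mismatch": {"valid": 26, "per_other_loc": 2},  # 26/32 = 81.25% validity (green cue)
}


def make_block(rng: random.Random) -> List[Dict[str, str]]:
    trials: List[Dict[str, str]] = []
    for cue_type, counts in DESIGN.items():
        for cue_loc in LOCATIONS:
            for target_loc in LOCATIONS:
                valid = target_loc == cue_loc
                n = counts["valid"] if valid else counts["per_other_loc"]
                # Build a separate dict per trial so records can be edited safely.
                trials.extend(
                    {"cue_type": cue_type, "cue_loc": cue_loc,
                     "target_loc": target_loc,
                     "cueing": "cued" if valid else "uncued"}
                    for _ in range(n)
                )
    rng.shuffle(trials)  # randomize trial order within the block
    return trials


def validity(trials: List[Dict[str, str]], cue_type: str) -> float:
    """Proportion of trials of a cue type on which the target was at the cue."""
    subset = [t for t in trials if t["cue_type"] == cue_type]
    return sum(t["cueing"] == "cued" for t in subset) / len(subset)
```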

Results and discussion

Mean response times (RTs) were analyzed with a 3 (block: 1, 2, or 3) × 2 (cueing: the target appeared at the location of a previous cue [“cued”] or did not [“uncued”]) × 2 (cue type: spatially nonpredictive match cue or spatially predictive mismatch cue) repeated measures analysis of variance (ANOVA). Prior to the analysis, 3.29% and 0.34% of trials were excluded for errors of commission and omission, respectively. Trials with RTs more than 2.5 standard deviations above (2.37%) or below (0.04%) each participant’s mean were excluded as outliers.
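The exclusion steps can be sketched in Python as follows. Because the text does not specify the details, this sketch assumes that each participant's mean and sample SD are computed once over all of that participant's correct trials (not per condition) and that trimming is a single pass; the field names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def trim_trials(trials: List[Dict], criterion: float = 2.5) -> List[Dict]:
    """Drop error trials, then RT outliers beyond criterion SDs of each
    participant's mean. Each participant needs at least two correct trials."""
    # Step 1: drop errors of commission and omission.
    correct = [t for t in trials if t["outcome"] == "correct"]
    # Step 2: per-participant mean and sample SD of correct RTs.
    rts_by_pp: Dict[object, List[float]] = {}
    for t in correct:
        rts_by_pp.setdefault(t["participant"], []).append(t["rt"])
    stats = {pp: (mean(rts), stdev(rts)) for pp, rts in rts_by_pp.items()}
    # Step 3: keep trials within +/- criterion SDs of that participant's mean.
    kept = []
    for t in correct:
        m, sd = stats[t["participant"]]
        if abs(t["rt"] - m) <= criterion * sd:
            kept.append(t)
    return kept
```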

We found an effect of cueing, F(1, 20) = 95.20, p < .01, ηp² = .8263, with faster responses to targets at cued (555 ms) than at uncued (576 ms) locations. We also found an effect of block, F(2, 40) = 3.478, p = .041, ηp² = .1481, with faster responses in Blocks 2 (562 ms) and 3 (559 ms) than in Block 1 (575 ms). The effect of cue type was not significant, F < 1.
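Partial eta squared can be recovered from any reported F as ηp² = F·df_effect / (F·df_effect + df_error). Because ηp² depends only on F and the ratio df_effect/df_error, it takes the same value for F(1, 20) and F(2, 40). The p value, by contrast, does depend on the degrees of freedom. With three blocks and 21 participants, block effects are tested on 2 and 40 df, and for two numerator df the upper tail of the F distribution has a closed form, (1 + 2f/df_error)^(−df_error/2). The minimal Python check below uses hypothetical function names.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


def p_value_df1_2(f: float, df_error: int) -> float:
    """Exact upper-tail p for F(2, df_error).

    Uses P(F > f) = (1 + 2f/df_error) ** (-df_error / 2), which holds only
    when the effect has 2 numerator degrees of freedom (e.g., block effects here).
    """
    return (1.0 + 2.0 * f / df_error) ** (-df_error / 2.0)
```

For example, `p_value_df1_2(5.057, 40)` returns about .011 and `p_value_df1_2(1.833, 40)` about .173, in line with the reported block effect in Experiment 2 and the Cueing × Block interaction in Experiment 1. Small differences in the last reported digit (e.g., .8264 vs. .8263 for cueing) are expected from rounding of F.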

The key interaction between cueing and cue type was significant, F(1, 20) = 85.72, p < .01, ηp² = .8108. The nonpredictive match cue captured attention (effect = 41 ms; 95% confidence interval [CI]: 35–46 ms), whereas the predictive mismatch cue did not (effect = 1 ms; 95% CI: −5 to 8 ms). None of the remaining interactions were significant [Cueing × Block: F(2, 40) = 1.833, p = .173, ηp² = .0839; Cue Type × Block: F < 1], including the critical three-way interaction among cueing, cue type, and block, F < 1, which is the effect that would have exposed capture via statistical learning (see Fig. 2).
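The text does not state how the 95% CIs around each capture effect were computed. One common approach is a paired-difference interval: take each participant's uncued minus cued mean RT and form mean ± t(.975, n − 1) × SD/√n. The Python sketch below implements that approach; it assumes this method rather than documenting the authors' own. The default critical value of 2.086 is the standard two-tailed .05 value for df = 20 (n = 21), and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple

# Two-tailed .05 critical value of t for df = 20 (n = 21 participants).
T_CRIT_DF20 = 2.086


def capture_effect_ci(uncued: List[float], cued: List[float],
                      t_crit: float = T_CRIT_DF20) -> Tuple[float, Tuple[float, float]]:
    """Mean capture effect (uncued minus cued RT) and its paired-difference CI.

    uncued[i] and cued[i] are participant i's condition means; pass the
    t_crit matching len(uncued) - 1 degrees of freedom.
    """
    if len(uncued) != len(cued):
        raise ValueError("need one uncued and one cued mean per participant")
    diffs = [u - c for u, c in zip(uncued, cued)]
    effect = mean(diffs)
    half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
    return effect, (effect - half_width, effect + half_width)
```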

Fig. 2

The relationship among cueing (line type), cue type (columns), and block number (x-axis) in Experiment 1. Error bars are half Fisher least significant differences computed from the interaction of these variables; overlap signifies a nonsignificant simple effect.
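The error bars follow Fisher's least significant difference (LSD) logic. Assuming the standard within-subject form, n is the number of participants, and MS_error and its degrees of freedom come from the Cueing × Cue Type × Block error term (df = (2 − 1)(2 − 1)(3 − 1)(21 − 1) = 40 here). Each bar then spans ±LSD/2 around a cell mean:

```latex
\mathrm{LSD} = t_{.975,\,df_{\mathrm{error}}}\sqrt{\frac{2\,MS_{\mathrm{error}}}{n}},
\qquad
\text{error bar} = \bar{X}_{\mathrm{cell}} \pm \tfrac{1}{2}\,\mathrm{LSD}
```

Two means whose bars do not overlap therefore differ by more than one LSD, which corresponds to a significant uncorrected pairwise comparison at α = .05. MS_error itself is not reported in the text.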

There were no concerns about speed–accuracy trade-offs (see Table 1). An ANOVA on error rates, collapsing across errors of commission and omission, revealed that only the cueing effect was significant, F(1, 20) = 11.26, p < .01, ηp² = .3601, with fewer errors for cued (3.20%) than for uncued (4.12%) targets. All other Fs < 2.027, all ps > .145.

Table 1 Mean error rates for all combinations of cueing, cue type, and block in Experiment 1

This experiment yielded results consistent with the idea that a salient color cue that mismatches the search-defining target color fails to reliably elicit a shift of attention, whereas a color cue that matches the search-defining target color does. This is the typical pattern. Accordingly, it seems that statistical learning is enslaved to top-down attentional control: because the mismatch cue apparently never attracted attention to its location, its relationship with the target location could not be learned.

Experiment 2

We now turned to the crux of the debate over attentional control settings: Would it be possible to expose latent capture from mismatch abrupt-onset cues via statistical learning? To test this, the green mismatch cue from Experiment 1 was replaced by an abrupt-onset cue, such that a random placeholder briefly flashed white prior to the target array.

Method

Participants

Twenty-one different undergraduate students (mean age = 21.52 years; one left-handed; 17 females, four males) from the University of Toronto consented to participate for course credit.

Stimuli and apparatus

These were identical to those of Experiment 1, except that the cueing array containing the green cue was replaced by an abrupt-onset (mismatch) cue: four filled white circles appeared around a single, randomly chosen placeholder prior to the target array.

Procedure

This was identical to that of Experiment 1.

Design

This was identical to the design of Experiment 1, except that cue type included a spatially nonpredictive red cue and a spatially predictive white (luminance = 167.26 cd/m²; x = 0.36, y = 0.36) abrupt-onset cue.

Results and discussion

Mean RTs were analyzed with a 3 (block) × 2 (cueing) × 2 (cue type) repeated measures ANOVA. Prior to the analysis, 3.11% and 0.32% of trials were excluded for errors of commission and omission, respectively. Trials with RTs more than 2.5 standard deviations above (2.5%) or below (0.06%) each participant’s mean were excluded as outliers.

We observed an effect of cueing, F(1, 20) = 88.57, p < .01, ηp² = .8158, with faster responses at cued (547 ms) than at uncued (574 ms) locations. We also observed an effect of block, F(2, 40) = 5.057, p = .011, ηp² = .2018, with faster responses in Blocks 2 (559 ms) and 3 (551 ms) than in Block 1 (572 ms). The effect of cue type was significant, F(1, 20) = 34.90, p < .01, ηp² = .6357, with faster responses following the nonpredictive match cue (553 ms) than following the predictive mismatch cue (569 ms).

Once again, the key interaction between cueing and cue type was significant, F(1, 20) = 55.88, p < .01, ηp² = .7364. Capture from the nonpredictive match cue (effect = 43 ms; 95% CI: 36–51 ms) was larger than capture from the predictive mismatch cue (effect = 9 ms; 95% CI: 2–17 ms). Critically, this relationship was qualified by a three-way interaction involving block, F(2, 40) = 5.48, p < .01, ηp² = .2151. The magnitude of capture from the nonpredictive match cue was unaffected by block, F(2, 40) = 1.294, p = .286, ηp² = .0608. However, the magnitude of capture from the predictive mismatch cue was affected by block, F(2, 40) = 5.843, p < .01, ηp² = .2261. There was no evidence for capture in Block 1 (effect = −5 ms; 95% CI: −17 to 7 ms), whereas capture did occur in Blocks 2 (effect = 16 ms; 95% CI: 7–26 ms) and 3 (effect = 16 ms; 95% CI: 4–29 ms; see Fig. 3). The remaining interactions were not significant [Cue Type × Block: F(2, 40) = 1.247, p = .298, ηp² = .0587; Cueing × Block: F(2, 40) = 1.204, p = .311, ηp² = .0568].
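The per-block capture effects are differences between cell means. A minimal Python sketch of that aggregation follows, assuming already-trimmed trial records with hypothetical field names. The participant-level effects it returns could then be averaged, or entered into an ANOVA, separately for each cue type and block.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def capture_by_block(trials: List[dict]) -> Dict[Tuple[object, str, int], float]:
    """Capture effect (uncued minus cued mean RT) per participant, cue type, and block.

    Each trial dict needs 'participant', 'cue_type', 'block', 'cueing'
    ('cued' or 'uncued'), and 'rt'.
    """
    cells: Dict[tuple, List[float]] = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["cue_type"], t["block"], t["cueing"])].append(t["rt"])
    effects = {}
    for (pp, cue_type, block, cueing), rts in cells.items():
        if cueing != "cued":
            continue
        uncued = cells.get((pp, cue_type, block, "uncued"))  # .get does not add keys
        if uncued:  # skip cells that have no uncued trials
            effects[(pp, cue_type, block)] = mean(uncued) - mean(rts)
    return effects
```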

Fig. 3

The relationship among cueing (line type), cue type (columns), and block number (x-axis) in Experiment 2. Error bars are half Fisher least significant differences computed from the interaction of these variables; overlap signifies a nonsignificant simple effect.

An ANOVA on error rates (see Table 2), collapsing across errors of commission and omission, revealed two effects, neither of which involved cueing. There was an effect of cue type, F(1, 20) = 5.43, p = .0304, ηp² = .2135, with fewer errors following the predictive mismatch cue (2.79%) than following the nonpredictive match cue (4.00%). Thus, the effect of cue type reflected a speed–accuracy trade-off. We also found an interaction between cue type and block, F(2, 40) = 4.807, p = .0135, ηp² = .1938. Any error rate advantage for the mismatch cue was strongest in Block 3 (2.75% vs. 5.55% for the match cue) and relatively weak in Blocks 1 (2.56% vs. 3.42%) and 2 (3.06% vs. 3.10%). There were no other significant effects (all Fs involving cueing < 1; max F = 2.15, all ps > .13).

Table 2 Mean error rates for all combinations of cueing, cue type, and block in Experiment 2

The results from Block 1 were in line with the contingent-capture literature: there was no evidence that responses were faster for targets appearing at the location of the abrupt-onset cue than for targets appearing elsewhere, whereas this was the case for the match cue. Importantly, this pattern changed in Blocks 2 and 3, at which point responses became faster when the target appeared at the location of the abrupt-onset cue than when it appeared elsewhere. As expected, the magnitude of capture from the match cue was unaffected by block, given that there was no relationship between it and the target location. This pattern suggests that the abrupt-onset cue was not ignored or filtered out, because its relationship with the target location was clearly learned.

General discussion

The goal was to test whether statistical learning could expose latent capture effects from mismatch color cues (Exp. 1) and abrupt-onset cues (Exp. 2) in the contingent-capture paradigm. To accomplish this, these mismatch cues were correlated with the target location, whereas cues that matched the target’s search-defining color were not. Both experiments showed capture by the match cue, and the magnitude of that effect was similar across all blocks of trials. Statistical learning did not expose a latent capture effect from the mismatch color cue, which did not reliably capture attention in any block. This finding is consistent with the notion that color singleton distractors that mismatch the target color fail to reliably capture attention. Statistical learning did expose a latent capture effect from the mismatch abrupt-onset cue, an effect that emerged after the first block of trials. The latter finding challenges the notion that the abrupt-onset cue in the classic contingent-capture paradigm is filtered out or ignored because it bears little resemblance to the target color, since here the relationship between it and the target location was clearly acquired.

Collectively, the findings suggest that attentional control settings are not so powerful that only those items matching the setting are attended, as latent capture from the mismatch abrupt-onset cue was exposed via statistical learning. Yet statistical learning is not so powerful that it undermines attentional control, in that attention was not reliably captured by the mismatch color cue, despite its relationship to the target location. The findings are generally consistent with the classic idea that abrupt-onset cues have a special status when it comes to attentional capture (e.g., Lamy & Egeth, 2003), contrary to inferences drawn from a number of historical contingent-capture cueing approaches.

An alternative interpretation could stress that abrupt-onset cues are not special per se. On the one hand, typical contingent-capture paradigms may simply be suboptimal for generating attentional control settings that discourage capture by abrupt-onset cues (e.g., Schönhammer & Kerzel, 2018). This would explain why statistical learning exposed a latent capture effect for the abrupt-onset cue. On the other hand, the mismatch color cue in the typical contingent-capture paradigm may not be sufficiently salient to reliably capture attention (e.g., Rangelov, Müller, & Zehetleitner, 2017). This would explain why statistical learning failed to expose a capture effect for the color mismatch cue. Regardless of these speculations, the present study, together with the recent studies that provided an impetus for it (Folk & Remington, 2015; Gaspelin et al., 2016), strongly suggests that the historical contingent-capture approach has underestimated the potential of an abrupt-onset cue to reliably capture attention. Accordingly, strong goal-driven claims that have been made on the basis of this paradigm must be softened.

Remaining to be understood is the opponent process or processes that can obscure capture. There are a number of reasons why a same-location cost might occur at a relatively short cue–target onset asynchrony (e.g., Schoeberl, Ditye, & Ansorge, 2018; Schönhammer & Kerzel, 2017). One is that there is some memory for the stimulus recently associated with the placeholder location (e.g., the abrupt-onset cue, in this case), which has to be updated when a new stimulus occurs within that location (e.g., the red letter target). This updating is presumed to be time-consuming (e.g., Carmel & Lamy, 2014, 2015). Another possibility could include a response-related component, such that a “do not respond” or “ignore” code becomes associated with the location or object at which the abrupt-onset cue occurred, leading to interference when a target appears there (e.g., Neill, Valdes, Terry, & Gorfein, 1992). Appeal to episodic retrieval processes may not always be necessary (e.g., Souto, Born, & Kerzel, 2018). Relatively low-level sensory interactions may also produce a cost. It is known, for example, that the visually responsive neurons in the superior colliculus react less vigorously when their receptive fields are repeatedly stimulated at short cue–target onset asynchronies, which can slow down responding (Fecteau & Munoz, 2005, 2006). These possibilities, among others—such as the speed with which a cued distractor can be rejected (Gaspelin et al., 2016)—need not be mutually exclusive.

Whatever the form of the masking process(es), the attentional capture shown here with statistical regularities strongly suggests that typical contingent-capture cueing paradigms have underestimated capture from abrupt-onset cues. To reveal latent capture, statistical learning approaches provide especially useful diagnostics, and accordingly, we recommend that they be used as litmus tests of whether salient stimuli have truly been ignored or filtered out.