There are at least two opposing views on how attention is controlled (Awh, Belopolsky, & Theeuwes, 2012). According to the bottom-up view of attentional control, characteristics of the stimulus determine where attention is deployed. This idea is most prominently advanced in saliency-map models of visual attention (Itti & Koch, 2001; Theeuwes, 2010; Treisman & Gelade, 1980; Wolfe, 1994). Itti and Koch (2001) proposed that computations of local contrast in various feature maps (for red, green, vertical, etc.) are combined in a master saliency map where information about the nature of the salient feature is lost. For instance, the local contrast between a single red element and the surrounding green elements would be fed into the master saliency map where a peak of saliency would result. However, it would no longer be known whether the peak results from a color singleton or some other unique element (e.g., a unique shape). Further, Itti and Koch (2001) postulate that attention is directed to the most salient element first.
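The combination step described above can be illustrated with a toy sketch. It is not the implementation of Itti and Koch (2001): the contrast measure (the proportion of other items that differ on a feature), the equal weighting of the feature maps, and all function names are assumptions made purely for illustration.

```python
# Toy illustration of a master saliency map (NOT the Itti & Koch model itself).
# Assumption: local contrast = proportion of other items that differ on a feature.

def feature_contrast(values):
    """Return one contrast value per item for a single feature dimension."""
    n = len(values)
    return [
        sum(v != other for j, other in enumerate(values) if j != i) / (n - 1)
        for i, v in enumerate(values)
    ]

def master_saliency(display):
    """Sum the contrast maps of all feature dimensions into one master map.

    `display` maps a feature name (e.g. "color") to a list of item values.
    The summed map no longer records which feature produced a peak.
    """
    maps = [feature_contrast(values) for values in display.values()]
    return [sum(per_item) for per_item in zip(*maps)]

def first_attended(display):
    """Index of the most salient item, which is assumed to be selected first."""
    saliency = master_saliency(display)
    return max(range(len(saliency)), key=saliency.__getitem__)

# One red item among green items, all circles.
color_singleton = {
    "color": ["green", "green", "red", "green", "green"],
    "shape": ["circle"] * 5,
}
# One square among circles, all green.
shape_singleton = {
    "color": ["green"] * 5,
    "shape": ["circle", "circle", "square", "circle", "circle"],
}

print(master_saliency(color_singleton))
print(master_saliency(shape_singleton))
```

In this sketch the two displays yield the same master map, with a peak at the singleton position, so the map alone does not reveal whether the peak stems from a color or a shape singleton.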

The idea of saliency maps is borne out in the additional singleton paradigm (Theeuwes, 1991, 1992) where observers are asked to search for a singleton target defined along a relevant perceptual dimension (e.g., shape), while a singleton distractor defined along an irrelevant dimension (e.g., color) is shown on some of the trials. If the distractor is more salient than the target, reaction times (RTs) increase. To explain this increase, Theeuwes (2010) suggested that attention was captured by the salient distractor against the intentions of the observer, confirming that the initial attentional selection was indeed controlled by bottom-up characteristics of the stimulus. This view is controversial as there is also ample evidence for top-down control of attention (Bacon & Egeth, 1994; Folk, Remington, & Johnston, 1992; Krummenacher & Müller, 2012; Wolfe, 1994; overview in Lamy, Leber, & Egeth, 2012).

The present contribution does not strive for a decision between top-down and bottom-up control theories but attempts to provide a description of the stimulus conditions that favor bottom-up control by salient distractors. In this context, Theeuwes’s (2004, 2010) attentional window account is important.

To respond to early opponents of bottom-up control theory (Bacon & Egeth, 1994), Theeuwes (2004) conjectured that attentional capture by salient stimuli is confined to the attentional window. That is, saliency computations only occur inside the attentional window, and the size of the attentional window is determined by the efficiency of visual search. In conditions where search is efficient, the attentional window is large and saliency computations take place across most of the visual field so that irrelevant-but-salient stimuli may capture attention. Empirically, efficient search results in flat search slopes with small increments in RT when the number of nontargets is increased. Typically, performance with search slopes smaller than 10 ms/item is considered efficient search (Wolfe & Horowitz, 2004). In contrast, the attentional window is small when search is inefficient and observers have to inspect the visual display item by item in order to locate the target. In this case, saliency computations are limited to only a small part of the visual field. Because the probability that the salient element is included in a narrow attentional window is smaller than with a large attentional window, attentional capture is reduced with inefficient compared to efficient search.
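The link between search slope and the qualitative prediction of the attentional window account can be written down as a minimal sketch. The 10 ms/item criterion is taken from the text; the function names and the example slopes are hypothetical and serve only as illustration.

```python
# Criterion for efficient search in ms/item (Wolfe & Horowitz, 2004, as cited in the text).
EFFICIENT_SLOPE_CRITERION = 10.0

def is_efficient(slope_ms_per_item, criterion=EFFICIENT_SLOPE_CRITERION):
    """True when the search slope is smaller than the criterion for efficient search."""
    return slope_ms_per_item < criterion

def window_account_prediction(slope_ms_per_item):
    """Qualitative prediction of the attentional window account for a given slope."""
    if is_efficient(slope_ms_per_item):
        return "large window: capture by salient distractor expected"
    return "small window: capture reduced"

# Hypothetical slopes, for illustration only.
for slope in (5.0, 25.0):
    print(slope, window_account_prediction(slope))
```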

Originally, Theeuwes (2004) proposed that the absence of attentional capture with heterogeneous search displays (feature search) in Bacon and Egeth (1994) was due to search being less efficient than with homogeneous search displays (singleton search). Consistent with this explanation, attentional capture reappeared when search efficiency was enhanced by increasing the number of nontarget elements in heterogeneous displays (Theeuwes, 2004). However, more recent investigations confirmed the difference between heterogeneous and homogeneous search displays without any concomitant changes in search efficiency (Barras & Kerzel, 2016; Kerzel & Barras, 2016; Leber & Egeth, 2006).

Further evidence for the attentional window hypothesis comes from experiments using a dual task procedure, where a go/no-go task was combined with a visual search task (Belopolsky & Theeuwes, 2010; Belopolsky, Zwaan, Theeuwes, & Kramer, 2007; see also Notebaert, Crombez, Van Damme, Durnez, & Theeuwes, 2013). To induce a small attentional window, observers had to perform the search task or withhold the response, depending on the stimulus shown at central fixation (e.g., a small circle or square). To induce a large attentional window, execution of the response depended on the shape of the entire search display (i.e., whether the search array formed an upward or downward arrow). Importantly, the stimulus displays were always the same and only the instruction differed. That is, the relevant stimulus for the go/no-go task determined whether attention was in a focused or diffuse state at the onset of the search display. Consistent with the attentional window account, attentional capture was found with a large but not with a small attentional window. This was true for inefficient search with slopes between 16 and 28 ms/item (Belopolsky et al., 2007) and also for efficient search with slopes between −3 and 6 ms/item (Belopolsky & Theeuwes, 2010).

Studies using the dual task procedure did not manipulate the efficiency of visual search directly. However, Theeuwes’s (2004, 2010) central claim was that the efficiency of visual search determined whether the attentional window was large or small. With efficient (or parallel) search, saliency was computed across the visual field because the attentional window was large, whereas it was confined to a small region with inefficient (or serial) search. Two studies have provided data that partially confirm this idea. Proulx and Egeth (2006) investigated search for a vertical line among tilted nontargets. The tilt of the line provided an easily quantifiable measure of target–nontarget similarity. With small tilt angles, target–nontarget similarity was high and search efficiency was low, whereas large tilt angles reduced target–nontarget similarity and increased search efficiency (see Duncan & Humphreys, 1989). Proulx and Egeth (2006) observed that attentional capture was larger with more efficient search, which is consistent with the attentional window account. However, search slopes ranged between 22 and 114 ms/item, which is steeper than the criterion of 10 ms/item for efficient search.

Ideally, the experimental conditions to address the attentional window account would include search slopes below and above the criterion for efficient search because the additional singleton paradigm typically involves efficient search (Theeuwes, 1991, 1992). A study using the modified spatial cueing paradigm developed by Folk et al. (1992) met this requirement. Gaspelin, Ruthruff, Lien, and Jung (2012) asked observers to search for a particular color in the target display. The nontarget colors were either similar to the target color, resulting in inefficient search with slopes between 21 and 26 ms/item, or they were dissimilar from the target color, resulting in efficient search with slopes between −1 and 8 ms/item. The target display was preceded by a cue display, and attentional capture was measured as the difference between valid and invalid cue conditions. For color cues different from the target color, the cueing effect was larger with efficient search than with inefficient search, as predicted by the attentional window account. However, this pattern was not confirmed when the color cue was replaced by an onset cue. Because both color and onset cues were salient events, the results are not entirely compatible with the attentional window account. Be that as it may, it should be kept in mind that the attentional window hypothesis was formulated with respect to cross-dimensional interference in visual search, and it is unclear how and whether it extends to the modified spatial cueing paradigm.

The purpose of the current study was to close the empirical gap in the literature concerning the effects of search efficiency on attentional capture in the additional singleton paradigm that are central to the claims formulated in the attentional window account. Theeuwes (2004) claimed that attentional capture in the additional singleton paradigm is reduced or absent with inefficient compared to efficient search. To date, however, there is no study besides Theeuwes (2004) that directly measured interference from a salient distractor in the additional singleton paradigm under conditions of efficient and inefficient search. In Theeuwes (2004), efficiency of visual search was manipulated by increasing the number of nontarget elements. Search efficiency increased when the number of nontarget elements was increased, probably because the nontarget elements were grouped, which made the target more salient (Bacon & Egeth, 1991; Utochkin, 2013). However, increasing the number of nontargets introduces a number of confounding factors. In particular, the eccentricity and the density of elements in the search array differed with few compared to many nontargets (see Theeuwes, 2004). In contrast, manipulations of target–nontarget similarity, as in two of the studies described above (Gaspelin et al., 2012; Proulx & Egeth, 2006), avoid these nuisance variables and therefore provide better control over the experimental stimuli.

Experiment

To provide a more adequate assessment of the attentional window account, we created a condition that resulted in efficient search and another condition that resulted in inefficient search by changing the similarity between target and nontarget shapes (see Fig. 1). Participants searched for a singleton shape target (a square) and reported the orientation of a line inside the shape by key press. Target–nontarget similarity was low when the nontarget shapes were circles and high when the nontarget shapes were diamonds, that is, squares rotated by 45° (Von Grünau, Dube, & Galera, 1994). The number of search elements (set size) varied between five and nine to evaluate search efficiency. The difference in RTs between set sizes of five and nine divided by the difference in set size (four) gives an estimate of the search slope. Search slopes were measured in milliseconds per item, and we expected larger search slopes with high than with low target–nontarget similarity. On half of the trials, a salient color singleton distractor was presented. Following predictions from the attentional window account, we expected stronger interference from the distractor when target–nontarget similarity was low and search was efficient compared to when target–nontarget similarity was high and search was inefficient.
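The slope estimate described above amounts to a single division. The following sketch assumes mean RTs in milliseconds; the function name and the example values are hypothetical.

```python
def search_slope(rt_small_ms, rt_large_ms, set_small=5, set_large=9):
    """Search slope in ms/item: RT difference divided by the set-size difference."""
    return (rt_large_ms - rt_small_ms) / (set_large - set_small)

# Hypothetical mean RTs (ms) at set sizes five and nine, for illustration only.
print(search_slope(800.0, 840.0))  # 10.0
```

Because slopes are usually computed per participant and then averaged, a slope computed from rounded group means can differ slightly from the reported mean slope.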

Fig. 1

Experimental stimuli (not to scale). Target was always the square. Similarity was high with diamond nontargets and low with circle nontargets. (Color figure online)

In one group of participants (random group), we varied the experimental factors target–nontarget similarity, set size, and distractor presence simultaneously and presented the resulting combinations in a randomly interleaved order. This design has the advantage of avoiding pretrial expectations or search strategies tailored to the respective visual displays. However, one may argue that the attentional window needs to be set at the onset of the trial, as in the dual-task experiments (Belopolsky & Theeuwes, 2010; Belopolsky et al., 2007). In this case, it would be crucial to allow for pretrial expectations regarding target–nontarget similarity so observers can adjust the size of the attentional window before the search display appears. Therefore, we blocked target–nontarget similarity in the second group of participants (blocked group).

Method

Participants

Forty undergraduate psychology students at the University of Geneva participated in exchange for class credit. Sample size was estimated based on a prior study of Proulx and Egeth (2006), who had 45 participants in their sample. Seventeen students were assigned to the random group and 23 students were assigned to the blocked group. Participants were naive as to the purpose of the experiment and reported normal or corrected-to-normal acuity and normal color vision. The experiment was approved by the local ethics committee, and informed consent was obtained prior to the experiment.

Stimuli

Subjects were seated in a dimly lit room, 80 cm from a 17-in. LCD monitor running at 60 Hz with 1,920 × 1,080 pixels resolution. The background was black. Five or nine shapes were presented at an eccentricity of 5° around the central fixation cross. The target shape was always a square, whereas the nontarget shapes were either diamonds or circles. The circle had a diameter of 2.2° and the square/diamond had a side length of 1.7°. Line width was ~0.06°. On 50% of the trials, one nontarget element, the distractor, had a distinct color (red among green or the opposite) and these trials are referred to as distractor-present trials. In the remaining 50% of trials, only a single color was presented. The size and line width of the shapes were adjusted to yield about the same number of lit pixels (1047, 1034, and 1036 for the square, diamond, and circle, respectively). Each shape contained either a vertical or a horizontal gray line of 0.7° length. The luminance of all stimuli was 16.6 cd/m². There was always a stimulus at the three o’clock position. Placement of display elements and distractor presence varied unpredictably from trial to trial. Further, the color of the nontargets (and distractor, if applicable) changed randomly from trial to trial between green and red to increase interference effects compared to constant colors (Kerzel & Barras, 2016). In a related EEG study, we used the same stimuli and confirmed that the saliency of the color singleton did not vary between high and low shape-based similarity (Barras & Kerzel, 2017). In that study, we asked observers to respond to the color singleton and varied the similarity between an irrelevant shape singleton (i.e., the square) and the nontarget shapes (i.e., circles vs. diamonds). We found RTs to the color singleton to be independent of the similarity between distractor and nontarget shapes, indicating that the saliency of the color singleton was unaffected by the saliency of the shape singleton.
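The display geometry can be sketched as follows. The viewing distance, eccentricity, and stimulus sizes come from the text. Equal spacing of the items on the circle is an assumption, as the text states only that one item always occupied the three o'clock position, and the function names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 80.0  # from the text
ECCENTRICITY_DEG = 5.0      # from the text

def size_deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Extent on the screen (cm) of a centered stimulus subtending `angle_deg`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

def item_positions(set_size, eccentricity_deg=ECCENTRICITY_DEG):
    """(x, y) item positions in degrees of visual angle relative to fixation.

    ASSUMPTION: items are equally spaced on the circle. Angle 0 corresponds
    to the three o'clock position, which is always occupied.
    """
    step = 2.0 * math.pi / set_size
    return [
        (eccentricity_deg * math.cos(i * step), eccentricity_deg * math.sin(i * step))
        for i in range(set_size)
    ]

print(round(size_deg_to_cm(1.7), 2))  # side length of the square/diamond in cm
print(round(size_deg_to_cm(2.2), 2))  # diameter of the circle in cm
for x, y in item_positions(5):
    print(round(x, 2), round(y, 2))
```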

Procedure

A trial started with the presentation of the fixation cross for a randomly selected duration between 0.85 and 1.2 s, followed by the presentation of the search display that stayed on the screen until a response was registered. Participants were told to find the square and report the orientation of the line inside the shape by pressing one of two keys (left or right arrow keys). They were instructed to ignore the color distractor and to respond as quickly as possible while maintaining accuracy above 90%. Anticipations (RTs < 0.2 s), late trials (RTs > 1.5 s), or choice errors were reported to the subject by visual feedback. Participants performed 64 practice trials before the experiment and were forced to take breaks of 15 s after blocks of 48 trials. Performance feedback was shown during breaks.
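The feedback categories described above can be expressed as a small classification function. The two RT thresholds come from the text. The order in which the checks are applied and all names are assumptions made for this sketch.

```python
ANTICIPATION_S = 0.2  # RTs below this count as anticipations (from the text)
LATE_S = 1.5          # RTs above this count as late trials (from the text)

def classify_trial(rt_s, response, correct_response):
    """Classify one trial for feedback purposes.

    ASSUMPTION: RT criteria are checked before response accuracy.
    """
    if rt_s < ANTICIPATION_S:
        return "anticipation"
    if rt_s > LATE_S:
        return "late"
    if response != correct_response:
        return "choice error"
    return "correct"

# Hypothetical trials, for illustration only.
print(classify_trial(0.15, "left", "left"))
print(classify_trial(0.80, "left", "right"))
print(classify_trial(0.80, "right", "right"))
```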

In the random group, all independent variables were randomly interleaved. In the blocked group, conditions with high and low target–nontarget similarity were blocked, and two interlaced blocks of 96 trials were run for each similarity condition (i.e., A-B-A-B). The condition in the first block was counterbalanced across participants. Participants completed 720 trials in the random group and 384 trials in the blocked group. The blocked group was run after the random group. During the initial analysis of the random group, we noticed that it was not necessary to run as many trials and therefore reduced the number of trials from 720 to 384. Running the analyses below only on the initial 384 trials from the random group does not affect the results.

Results

Trials with RTs longer than the online criterion for late trials (1.5 s) amounted to 2.1% of all trials. Outliers with RTs exceeding the respective condition mean by more than 2.5 standard deviations amounted to 2.7%. Late trials and outliers were removed from the data set to partially correct for the skewed distribution of RTs. Mean RTs are shown in Fig. 2. A 2 (group: random, blocked) × 2 (target–nontarget similarity: high, low) × 2 (set size: 5, 9) × 2 (distractor: present, absent) mixed-factors ANOVA was run on mean RTs. RTs were shorter with low than with high target–nontarget similarity (783 vs. 882 ms), F(1, 38) = 43.85, p < .001, ηp² = .536, and with a set size of five than with a set size of nine (810 vs. 855 ms), F(1, 38) = 55.82, p < .001, ηp² = .595. The interaction between target–nontarget similarity and set size, F(1, 38) = 31.82, p < .001, ηp² = .456, showed that the increase of RTs from a set size of five to a set size of nine was larger in the high-similarity condition (850 vs. 915 ms, search slope of 16.4 ms/item) than in the low-similarity condition (771 vs. 795 ms, search slope of 5.9 ms/item). Search slopes are shown in Fig. 2a, and one-sample t tests showed that both were significantly different from zero, ts(39) > 4.08, ps < .001, with Bonferroni correction. The different search slopes show that our manipulation of target–nontarget similarity was successful. Further, the salient-but-irrelevant color singleton interfered with search, resulting in shorter RTs when it was absent compared to when it was present (805 vs. 860 ms), F(1, 38) = 60.33, p < .001, ηp² = .614. Importantly, the interaction between target–nontarget similarity and distractor presence was significant, F(1, 38) = 10.54, p = .002, ηp² = .217, showing that interference was larger in the high-similarity condition (67 ms, 849 vs. 916 ms) than in the low-similarity condition (43 ms, 761 vs. 805 ms), but significantly different from zero in both cases, ts(39) > 6.29, ps < .001, with Bonferroni correction. The corresponding means are shown in Fig. 2b. No other effects were significant (ps > .104). In particular, the interaction of target–nontarget similarity and distractor presence was not further qualified by group (p = .253).
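The two-step exclusion of late trials and outliers can be sketched for the RTs of a single condition. The 1.5-s and 2.5-SD criteria come from the text. Several details are assumptions of this sketch and are not stated in the text: the mean and SD are computed after removing late trials, the sample SD is used, and the cut is applied once.

```python
from statistics import mean, stdev

def trim_rts(rts_ms, late_ms=1500.0, sd_cutoff=2.5):
    """Remove late trials, then RTs exceeding the condition mean by > sd_cutoff SDs.

    ASSUMPTIONS: mean and sample SD are computed on the trials that remain
    after removing late trials, and the outlier cut is applied in one pass.
    """
    kept = [rt for rt in rts_ms if rt <= late_ms]
    if len(kept) < 2:
        return kept
    limit = mean(kept) + sd_cutoff * stdev(kept)
    return [rt for rt in kept if rt <= limit]

# Hypothetical RTs (ms) from one condition: 19 typical trials, one slow
# outlier (1400 ms), and one late trial (1600 ms).
example = [790.0] * 10 + [810.0] * 9 + [1400.0, 1600.0]
trimmed = trim_rts(example)
print(len(example), len(trimmed))
```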

Fig. 2

Experimental results. (a) Mean reaction times for the interaction between set size and target–nontarget similarity. (b) Mean reaction times for the interaction between target–nontarget similarity and distractor presence. Error bars show the between-subject standard error of the mean.

Running the same ANOVA on the proportion of correct responses showed that participants were more accurate when the distractor was absent compared to when it was present (95% vs. 94%), F(1, 38) = 5.57, p = .024, ηp² = .128. The interaction between target–nontarget similarity and set size was significant, F(1, 38) = 6.09, p = .018, suggesting that participants were more accurate with a set size of five than of nine with high similarity (94% vs. 93%), whereas they were more accurate with a set size of nine than of five with low similarity (96% vs. 94%). No other effects were significant (ps > .109).
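The reported effect sizes can be checked against the F values with the standard relation between partial eta squared, F, and the degrees of freedom. The sketch below uses the statistics reported in the Results section; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (effect, F, reported partial eta squared), all with df = (1, 38), from the text.
reported = [
    ("similarity (RT)", 43.85, 0.536),
    ("set size (RT)", 55.82, 0.595),
    ("similarity x set size (RT)", 31.82, 0.456),
    ("distractor (RT)", 60.33, 0.614),
    ("similarity x distractor (RT)", 10.54, 0.217),
    ("distractor (accuracy)", 5.57, 0.128),
]

for name, f_value, eta_reported in reported:
    eta = round(partial_eta_squared(f_value, 1, 38), 3)
    print(name, eta, eta == eta_reported)
```

The same function applies to any effect for which only the F value and degrees of freedom are available.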

Discussion

According to the criterion of 10 ms/item, we successfully induced efficient and inefficient search by manipulating target–nontarget similarity. Contrary to our predictions, we observed that a salient-but-irrelevant color singleton caused more interference with inefficient than efficient search. The results were not significantly different between random and block-wise changes of target–nontarget similarity, ruling out that pretrial adjustments of the attentional window modulated interference from salient-but-irrelevant stimuli.

To defend bottom-up control theory, Theeuwes (2004, 2010) claimed that saliency computations only occur in the attentional window. When search is efficient (parallel), the attentional window is large, and salient distractors capture attention. When search is inefficient (serial), the attentional window is small, and salient distractors are likely to be outside the attentional window, which avoids attentional capture. Our results are at odds with the attentional window account. We observed larger interference with inefficient than with efficient search, which is the opposite of what the attentional window account predicts. Further, we showed that this result does not depend on expectations about the display type.

While our results are at odds with the attentional window account that is an integral part of Theeuwes’s (2010) theory, we will show that they are fully consistent with saliency-map accounts of attentional capture. Because the attentional window account was developed in the context of saliency-map theories, this contradiction seems surprising at first, but a closer analysis reveals that the attentional window account is not only problematic for saliency-map models (Itti & Koch, 2001), but also for Theeuwes’s (2010) theory itself. To understand the contradiction, it is necessary to reformulate effects of target–nontarget similarity and search efficiency in terms of target saliency (see Table 1).

Table 1 Summary of the relation between the size of the attentional window, search efficiency, target–nontarget similarity, target saliency, and interference from a distractor of fixed saliency

Target–nontarget similarity determines target saliency in a straightforward manner. When the nontarget shapes are dissimilar from the target shape (low target–nontarget similarity), the target shape will stand out among the nontarget shapes and is therefore salient. In contrast, the target is inconspicuous when the nontarget shapes are similar to the target shape (high target–nontarget similarity). Thus, another way of describing the present results is to say that the same color singleton distractor caused less interference when the shape target was salient (low target–nontarget similarity, efficient search) compared to when it was inconspicuous (high target–nontarget similarity, inefficient search).

These results are fully consistent with classic studies by Theeuwes (1991, 1992), showing that distractors only capture attention when they are more salient than the target (see also Moher, Anderson, & Song, 2015). For instance, when a shape singleton target was less salient than a color singleton distractor, attentional capture occurred. In contrast, when a shape singleton target was more salient than a color singleton distractor, attentional capture was absent. These and our findings support the notion that attention always moves to the most salient element first, an assumption that is shared by saliency-map models (Itti & Koch, 2001) and Theeuwes’s (2010) theory.

Thus, our results are consistent with the assumption that the relative target saliency determines the magnitude of attentional capture. However, the attentional window account contradicts this conclusion. If the size of the attentional window is equated with target saliency, then the attentional window hypothesis states that a salient target (i.e., large attentional window, low target–nontarget similarity, efficient search) allows for stronger interference than an inconspicuous target (i.e., small attentional window, high target–nontarget similarity, inefficient search). This prediction runs counter to the present results, the tenets of saliency-map models (Itti & Koch, 2001) and also Theeuwes’s (2010) theory.

Even though our results are in line with saliency-map models, they seem to contradict the previous study by Proulx and Egeth (2006), who reported less attentional capture with higher target–nontarget similarity. As already pointed out, search in their study was always highly inefficient, with search slopes ranging between 22 and 114 ms/item. With the highest target–nontarget similarity, mean RTs in target-present trials with the smallest set size of three were already around 900 ms and increased to about 1,500 ms with a set size of nine. In contrast, RTs in the slowest condition of the present experiment were well below 1,000 ms. Thus, effects of saliency may have dissipated in the experiments by Proulx and Egeth (2006). Van Zoest, Donk, and Theeuwes (2004) showed that oculomotor capture by salient distractors occurs only for a short period of time after onset of the display and is gradually overcome by top-down influences. Thus, effects of saliency may have dissipated for long RTs with high target–nontarget similarity. However, this explanation would apply only to very long RTs (greater than 1,000 ms) because in the present study, interference from salient distractors did not dissipate over time. Rather, interference in the present study increased with longer RTs (in the high target–nontarget similarity condition) that were overall shorter than 1,000 ms. A restriction of the dissipation account to very long RTs is implausible, however, because the dissipation effect was found to be limited to very fast manual RTs (shorter than 300–400 ms; Hunt, von Mühlenen, & Kingstone, 2007; van Zoest & Kerzel, 2015).

Another difference with respect to some studies supporting the attentional window account is that we used the additional singleton paradigm (Theeuwes, 1991, 1992), whereas others used the irrelevant singleton paradigm. In the additional singleton paradigm, the salient-but-irrelevant feature never coincides with the target. Also, the distractor is shown on only a fraction of the trials to allow for a comparison of distractor-absent and distractor-present trials. The additional singleton paradigm was also used in the studies of Theeuwes (2004) and Belopolsky and Theeuwes (2010) described above. However, Proulx and Egeth (2006) and Belopolsky et al. (2007) used the irrelevant singleton paradigm initially developed by Jonides and Yantis (1988), where the salient-but-irrelevant feature and the target coincide according to baseline probability, creating a small incentive to attend to the distractor (Becker, 2007). Also, the salient-but-irrelevant feature is present on all trials, and search slopes are measured separately for trials where the target bears the salient feature and for trials where it does not. Attentional capture is visible in more efficient search (i.e., shallower search slopes) for salient than for nonsalient targets. In contrast, attentional capture in the additional singleton paradigm is evidenced by an increase in RTs in distractor-present compared to distractor-absent trials. Despite these procedural differences, the interpretations of the results from the additional and irrelevant singleton paradigms seem interchangeable. As a case in point, the same authors used one paradigm or the other (Belopolsky & Theeuwes, 2010; Belopolsky et al., 2007) to answer the same theoretical question. Nonetheless, we cannot rule out that the many procedural differences account for the differences between our results and those of Proulx and Egeth (2006).

In sum, we found predictions from the attentional window account to be inconsistent with the results of a direct comparison between efficient and inefficient search. Inefficient and efficient search are associated with small and large attentional windows, respectively. According to the attentional window account, a large attentional window promotes, while a small attentional window avoids, attentional capture. However, we observed larger and not smaller interference with a small attentional window (i.e., inefficient search). The results are consistent with the assumption that target saliency was reduced in displays resulting in inefficient search so that the relative distractor saliency was increased. According to saliency-map models, higher relative saliency causes larger interference, which is confirmed by the present results.

Overall, our results are consistent with bottom-up control theories of attention. However, the present results do not provide a test between bottom-up and top-down control theories because we did not manipulate top-down search goals at all. Instead, the present results describe the search type that favors stimulus-driven control. Attentional capture by salient stimuli was found to be stronger with stimuli that resulted in inefficient, difficult search compared to stimuli that resulted in efficient, easy search. These results are in line with computational models of visual selection (Itti & Koch, 2001; Navalpakkam & Itti, 2007) but contradict the attentional window account (Theeuwes, 2004, 2010).