Tasks require, on every level of analysis, selections among alternatives. The visual world contains many visual objects to be perceived, which in turn are open to several perceptual interpretations, afford different actions, and allow the achievement of multiple goals. In all these instances, a few options (stimuli, interpretations, actions, goals) are favored over their alternatives, and the favored ones dominate perception and action (Desimone & Duncan, 1995; Schneider, 1995; Schneider, Einhäuser, & Horstmann, 2013). The variable aspect of selectivity, that is, selectivity that is not simply determined by low-level structural aspects of the system such as the sensitivity of the sense organ, is called attention.

Attentional selection is often said to occur either intentionally, in a goal-driven and task-adaptive manner, or unintentionally, in a stimulus-driven manner (e.g., Egeth & Yantis, 1997; Jonides, 1981; Yantis, 1993). Intentional or goal-driven selection, on the one hand, is important for efficient performance whenever the utility of an action’s outcome depends on the speed and accuracy of its execution. Unintentional selection, on the other hand, is necessary because a stimulus irrelevant to the current task may still be important to other goals of the organism: the sudden ringing of the telephone, someone calling one’s name, a tiger only five steps ahead in the jungle – these are examples of events that require immediate attention, yet are often neither expected nor related to the current task. Intentional and unintentional selection need to work together to ensure the survival of the organism.

A hotly debated factor in stimulus-driven selection is perceptual salience. The term refers both to an objective characteristic of a stimulus and to the phenomenal experience of distinctness and conspicuousness. For the present purpose, we adopt the first use of the term, whereby perceptual salience denotes a perceptual inhomogeneity that renders some location, object, or event perceptually different from its surround. Examples of perceptual salience are a flash of light in the dark, a single cloud in the blue sky, or a solitary red apple on a green-leaved tree.

The debate on perceptual salience centers on the question of whether selection is biased to salient items in a purely stimulus-driven manner. It is undisputed that perceptually salient items are efficiently detected in a salience detection task – literally at first glance – and that attention can be directed immediately to the salient location (Yantis & Egeth, 1999). However, some theorists propose that salience drives selection also in the absence of an intention, or even against current intentions and goals. For example, according to influential neuro-computational models (Itti, Koch, & Niebur, 1998; Itti & Koch, 2001; Koch & Ullman, 1985; Wolfe, 1994), salience is automatically extracted and summed over stimulus features and spatial scales, and finally represented in a feature-unspecific saliency map. Shifts of attention and eye movements follow the gradient of activation in the saliency map, with the first fixation being made to the most salient location in a display, the second fixation to the second most salient location, and so forth.
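To make the saliency-map account concrete, the following Python sketch implements a deliberately simplified version of the idea: per-feature contrast maps are summed into one feature-unspecific map, and a winner-take-all rule with inhibition of return orders the fixations. The contrast measure (the fraction of other items with a different feature value) and all function names are illustrative assumptions; this is not the Itti and Koch implementation, which operates on images across multiple spatial scales.

```python
from typing import Dict, List


def feature_contrast(values: List[str]) -> List[float]:
    """Contrast of each item on one feature dimension: the fraction of the
    other items whose value differs (a crude stand-in for center-surround
    differences in saliency models)."""
    n = len(values)
    return [sum(v != w for w in values) / (n - 1) for v in values]


def saliency_map(features: Dict[str, List[str]]) -> List[float]:
    """Sum the per-feature contrast maps into one feature-unspecific map."""
    maps = [feature_contrast(vals) for vals in features.values()]
    return [sum(m[i] for m in maps) for i in range(len(maps[0]))]


def fixation_order(salience: List[float]) -> List[int]:
    """Winner-take-all with inhibition of return: repeatedly select the most
    salient item not yet visited (ties go to the lower index)."""
    remaining = list(range(len(salience)))
    order = []
    while remaining:
        winner = max(remaining, key=lambda i: salience[i])
        order.append(winner)
        remaining.remove(winner)  # inhibition of return: never revisit
    return order


# Eight rings of identical shape; item 3 is a red color singleton among green.
colors = ["green"] * 8
colors[3] = "red"
display = {"color": colors, "shape": ["ring"] * 8}

salience = saliency_map(display)
print(fixation_order(salience))  # [3, 0, 1, 2, 4, 5, 6, 7]
```

On this toy account, the first fixation always goes to the singleton, which is exactly the strong prediction the present experiment puts to test.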

The view that attention is directed quickly and involuntarily to the most salient location in a display (saliency capture hypothesis) receives support from studies showing that an irrelevant salient stimulus can induce reaction time (RT) costs (e.g., Kim & Cave, 1999; Theeuwes, 1992). These experiments show that an irrelevant and strongly salient color singleton interferes with target selection (Theeuwes, 1992) and attracts the observers’ gaze (Theeuwes, de Vries, & Godijn, 2003) during search for a less salient shape singleton. Another important source of evidence comes from the Saliency Model (e.g., Itti & Koch, 2001), where the correspondence between the model’s predictions and actual human eye movements on natural stimuli is taken as support for stimulus-driven selection by perceptual salience.

The controversy concerns the interpretation of such evidence. Concerning RT experiments, critics (e.g., Ansorge, Horstmann, & Scharlau, 2010; Bacon & Egeth, 1994; Burnham, 2007; Folk, Remington, & Johnston, 1992) argue that claims of goal-independence are often undermined by the tasks assigned to the participants: participants often searched for a salient stimulus of one type (e.g., a square among circles) while trying to ignore a salient stimulus of another kind (e.g., a red circle among green circles). According to the contingent capture hypothesis, interference by the irrelevant salient item (e.g., the color singleton) could be a side effect of the goal-directed search for the relevant salient item (e.g., the shape singleton). Attention to the irrelevant singleton would then not be purely bottom-up but rather the result of an imprecise top-down controlled filter. Moreover, the evidence from eye-movement studies that salient regions are often preferentially inspected has been criticized because the correspondence between gaze-fixation distributions and salience distributions is merely correlational (Tatler, Hayhoe, Land, & Ballard, 2001). For example, salient regions often coincide with object boundaries, so it is not clear whether the eyes are directed to salient locations or to objects (e.g., Einhäuser, Spain, & Perona, 2008). Salience information may thus be used as a means to find objects in scenes, in line with the contingent capture account that attention is ultimately controlled by the intentions and goals of the observer.

In conclusion, demonstrating stimulus-driven salience effects is complicated by the possibility that observers might deliberately use salience information in some way to accomplish their task. The only way to circumvent this problem conclusively is to test the effects of salience when the salient item occurs unexpectedly, without prior announcement, and for the first time. Moreover, care must be taken that aspects of the task do not implicitly incite participants to use salience. That is, the ultimate test of the saliency capture hypothesis is the unannounced first presentation of a salient stimulus after displays that contained only non-salient stimuli.

Evidence from unexpected singletons

Previous studies have examined attentional orienting towards a salient item on its unannounced first presentation to test the surprise capture hypothesis, which holds that expectancy-discrepant (“surprising”) objects can attract attention (e.g., Horstmann, 2005). According to this view, the visual system constantly monitors visual input for its match with the observer’s expectations and past experience (schemata), and guides attention to unexpected stimuli that violate expectations, provided that the unexpected stimulus is pre-attentively detectable (i.e., differs in an elementary feature from other stimuli). Critically, according to the surprise capture hypothesis, stimuli can attract attention by virtue of being novel or expectancy-discrepant, even when they are not part of the attentional set or in any way related to the target. Previous experiments tested this hypothesis by first presenting a search target in pre-critical trials to induce an expectation and then measuring performance in response to an unexpected change in the search display in a critical trial (see also Mack & Rock, 1998). Depending on the stimuli presented in the pre-critical trials, the display in the critical trial may or may not be expectancy discrepant. For example, if all display elements are consistently red in the pre-critical trials, a green display element in the critical trial should be expectancy discrepant. In contrast, if some pre-critical trials contain red items and some contain green items, both red and green are familiar, and a green item should not attract attention in the critical trial by virtue of its novelty.
Experiments of this type have demonstrated that attention is attracted to an unexpected salient stimulus when the salient feature is expectancy discrepant (Becker & Horstmann, 2011; Horstmann, 2002, 2005, 2006; Horstmann & Becker, 2008, 2011; Horstmann & Herwig, 2015, 2016; Retell, Venini, & Becker, 2015), using converging operations such as efficiency and accuracy gains in visual search, validity effects, and evidence from eye-tracking (for an overview see Horstmann, 2015).

While this may seem to be good evidence for saliency models, several findings suggest that salience is not sufficient to capture attention in this type of procedure. Salient items that were familiar rather than novel completely failed to capture attention or attracted attention only weakly (Becker & Horstmann, 2011; Horstmann, 2005). For instance, Horstmann (2005) found that a color singleton (e.g., red among green) captured attention in the critical trial only when the pre-critical trials did not contain the singleton feature (red), but failed to capture attention when the pre-critical phase comprised search displays in which the stimuli were either all red or all green. Correspondingly, Becker and Horstmann (2011) showed that apparent motion captured attention following pre-critical trials without apparent motion, but not after pre-critical trials in which all items in some displays showed apparent motion. In addition to these demonstrations that salience is not sufficient, salience also does not seem to be necessary to attract attention: Horstmann and Herwig (2016) demonstrated that the eyes are attracted by a novel feature even when the feature is not a singleton in the display.

There is, however, one experiment that demonstrated an effect of salience on the unannounced first presentation in which capture by the salient item could not be attributed to novelty. Becker and Horstmann (2011, Experiment 3) found that a salient object captured attention in a display in which every item – the salient stimulus (a novel shape) as well as the surrounding non-salient stimuli (novel motion) – had a novel feature, so that no item was singled out by novelty. This result cannot be explained by novelty prioritizing a location in the display. Rather, it points to a unique causal role of salience in biasing attention, at least within a completely novel display. To reconcile this effect of salience in an all-novel context with the absence of an effect of salience in an all-familiar context (Becker & Horstmann, 2011; Horstmann, 2005), the authors suggested that changing the entire display may establish a new exploratory search mode that uses salience to direct attention to the most informative locations in the display.

The evidence for the saliency capture effect must still be regarded as limited for two reasons. The first limitation is that Becker and Horstmann (2011) tested apparent motion, whereas current neuro-computational saliency models are typically designed to deal with static stimuli such as photos of scenes or landscapes (e.g., Itti & Koch, 2001). In a similar vein, the most hotly debated psychological experiments on the saliency hypothesis test color salience, not motion salience (e.g., Folk, Remington, & Johnston, 1992; Theeuwes, 1992, 2010; Yantis & Egeth, 1999). After all, motion may be special, for instance, because motion is a variable, temporally unstable state of objects, unlike color and shape, which define the identity of objects (e.g., Gibson, 1969). Thus, to show that salient items can indeed attract attention independently of the observer’s goals and independently of being a novelty singleton, it is desirable to demonstrate that the results of Becker and Horstmann (2011, Experiment 3) also apply to color.

The second limitation of previous work is that it does not provide exact information on the time course of capture by an unannounced salient item, because previous studies used a set-size manipulation to infer capture. Precise information about the time course is important, however, because proponents of salience capture emphasize an early onset (as early as 60 ms, and less than 150 ms) of an attentional shift to the salient stimulus (e.g., Kim & Cave, 1999; Theeuwes, Atchley, & Kramer, 2000; cf. also Theeuwes, 2010), or a high proportion of first saccades to the salient stimulus (Itti & Koch, 2001). If, however, as proposed by Becker and Horstmann (2011), orienting to the salient singleton is triggered by surprise, a later onset is expected. Both accuracy gains with short presentation times (Gibson & Jiang, 1998; Horstmann, 2006; Horstmann & Becker, 2008) and eye-tracking measures (Horstmann & Herwig, 2015, 2016) reveal latencies longer than 150 ms, on the order of 400 ms. Given that previous studies indicated salience capture only when all the items were novel, it is still an open question whether visual selection of the salient item was mediated mainly by a slow, expectancy-discrepancy mechanism or by a fast, saliency-capture mechanism.

Aims of the present study

The aim of this study was to test more decisively whether a singleton captures attention on its unannounced first presentation, and to provide detailed information on the time course of such capture. To obtain precise information about the time course of allocating attention to an unexpected salient item, we monitored the participants’ eye movements with an eye-tracker. Eye tracking allows the latency of gaze allocation to a stimulus to be measured with millisecond accuracy. Moreover, because gaze and attention are tightly coupled in time, tracking gaze position allows inferences about selective attention. Multiple studies have established that selective attention – the covert shift of attention – usually precedes and leads the eye movement in visual search and discrimination tasks (Deubel & Schneider, 1996; Kowler, Anderson, Dosher, & Blaser, 1995). Thus, even though eye movements are not identical to attention shifts (e.g., Grubert & Eimer, 2016), it seems reasonable to assume that the two measures are highly correlated, in particular in a visual search task (see also Hulleman & Olivers, 2016). Using eye tracking thus retains comparability with previous experimental saliency studies measuring covert shifts of attention, while additionally providing precise measurements of the spatial and temporal parameters of multiple selections during search. Moreover, with eye movements as the dependent measure, the present study is directly relevant to computational saliency models that model gaze-fixation sequences (e.g., Itti & Koch, 2001).

To ensure that capture by the salient item cannot be attributed to an observer’s intentions or to an item’s feature novelty, the experiment was designed such that (a) observers were not biased for or against the selection of a salient object, and (b) the salient object was not singled out by the novelty of its feature. In the present experiment, all items were initially (in the pre-critical trials) presented on gray color patches, and the participants’ task was to detect a ring without a gap among rings with small radial gaps (as in a Landolt C). On target-present trials, a ring without a gap was presented among multiple distractors, whereas target-absent trials displayed only distractors. On the critical trial, which was always a target-absent trial, one distractor was displayed on a red (or green) patch among green (or red) patches. Participants were not informed of this change and had not seen either of these colors before. Thus, the novel stimulus colors red and green were completely unexpected and certainly not part of the attentional set. This is the all-new group. In a second group, the one-old group, a single familiar (“old”) gray color patch was presented among novel (red or green) patches on the critical trial. Note that in both groups, the perceptually salient stimulus was not singled out by novelty: In the all-new group, the salient stimulus was as novel as the non-salient stimuli, and in the one-old group, the salient stimulus was the only familiar color in the display, while all non-salient stimuli had a novel color.

We consider the one-old and the all-new groups as roughly equivalent tests of the saliency capture hypothesis. Both groups satisfy the main requirements that (a) there is no attentional set for the singleton, because it is presented for the first time and unannounced, and (b) the singleton is not singled out by the novelty of its feature. Both groups should therefore reveal the effect of salience on selection, uncontaminated by intentions or feature novelty. Specifically, a saliency capture account predicts early effects on attention and eye movements, with the salient stimulus frequently being fixated first.

What are the predictions of the surprise capture account? Novel features are assumed to be expectancy discrepant, and for basic features such as color, expectancy discrepancy is assumed to directly increase the attentional priority of the stimulus location. In the all-new condition, on the one hand, this increase in priority does not favor any of the items, because the items at all locations have a novel feature. In the one-old condition, on the other hand, only the non-singleton stimuli should be prioritized by expectancy discrepancy; because the singleton has a familiar, expectancy-congruent feature, it should even be deprioritized relative to them. Therefore, we expect the singleton in the one-old group to be selected less often, or later, than the singleton in the all-new group.
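The contrast between the two accounts can be sketched for the eight-item critical displays. In this illustrative Python sketch, salience is approximated as the fraction of other items with a different color, and expectancy discrepancy is approximated as a binary flag for colors absent from the pre-critical trials; both proxies and all names are assumptions made for illustration, not models proposed in the cited work.

```python
from typing import Dict, List, Set


def color_contrast(colors: List[str]) -> List[float]:
    """Salience proxy: fraction of the other items with a different color."""
    n = len(colors)
    return [sum(c != d for d in colors) / (n - 1) for c in colors]


def novelty(colors: List[str], seen: Set[str]) -> List[float]:
    """Expectancy-discrepancy proxy: 1.0 for colors never shown before."""
    return [0.0 if c in seen else 1.0 for c in colors]


def describe(colors: List[str], seen: Set[str]) -> Dict[str, bool]:
    """Item 0 is the singleton; compare it with the seven other items."""
    sal, nov = color_contrast(colors), novelty(colors, seen)
    return {
        "singleton_most_salient": sal[0] > max(sal[1:]),
        "novelty_favors_singleton": nov[0] > max(nov[1:]),
        "novelty_favors_others": nov[0] < min(nov[1:]),
    }


SEEN = {"gray"}                    # only gray patches appeared pre-critically
all_new = ["red"] + ["green"] * 7  # novel singleton among novel items
one_old = ["gray"] + ["red"] * 7   # familiar singleton among novel items

print("all-new:", describe(all_new, SEEN))
print("one-old:", describe(one_old, SEEN))
```

Under these assumptions the singleton is equally salient in both groups, novelty is uninformative in the all-new group, and novelty points away from the singleton in the one-old group, which is why the surprise account predicts weaker or later singleton selection in the one-old group.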

Further predictions concern dwell times and RTs. Previous research found that expectancy-discrepant objects are looked at longer than expectancy-congruent objects (e.g., Horstmann & Herwig, 2015; Retell et al., 2015; Võ, Zwickel, & Schneider, 2010). This probably reflects partly heightened attentional priority in the temporal domain (Horstmann, 2015) and partly more elaborate processing of expectancy-discrepant objects (Schützwohl, 1998). We therefore predict longer average dwell times in the critical trial. Moreover, given that the familiar color is not in itself surprising, we predict that dwell times on the familiar feature singleton should be shorter than on the novel feature singleton (Horstmann & Herwig, 2015; Retell et al., 2015). Finally, as previous studies have found that surprising items produce long delays in mean RTs by interfering with late, decisional, and response-related processes, we expect RTs to be considerably longer on the critical trial than on comparable pre-critical trials (e.g., Horstmann, 2005; Meyer, Reisenzein, & Schützwohl, 1997).

The crucial question is whether attentional capture is fast or slow. Classical saliency capture predicts effects that emerge early in time (Kim & Cave, 1999; Theeuwes et al., 2000) and on the first fixation (Itti & Koch, 2001; Theeuwes et al., 2000). Previous studies with the surprise paradigm have found somewhat later effects, around 400 ms, that often did not affect the first fixation (Horstmann, 2006; Horstmann & Herwig, 2015, 2016). Thus, if selection of the singleton is based on saliency-driven processes, the singleton should affect visual search very early, attracting attention and gaze with the first fixation. Conversely, if gaze capture by the singleton is mediated by the slower, novelty-driven mechanism, the singleton should affect fixation patterns only later (e.g., Horstmann & Herwig, 2015).

Method

Participants

Forty students or visitors at Bielefeld University participated in the 10-min experiment. Participants were approached in the central hall of the university’s main building and asked to take part in a short experiment in return for €1. All reported normal or corrected-to-normal vision. The data of one participant were lost due to a computer failure.

Stimuli

The target was a ring 1.1° in diameter with a line width of 0.17° (viewing distance 71 cm). The distractors were identical to the target except for a small radial gap of 0.08° height. Sixteen different gap positions were evenly distributed between 22.5° and 360° (in steps of 22.5°). The rings were black, as was the background. The rings were presented on circular color patches with a diameter of 2.0° (Fig. 1). The standard color was gray (RGB: 59 %, 59 %, 59 %); the two deviant colors for the critical trial were red (RGB: 100 %, 20 %, 20 %) and green (RGB: 20 %, 100 %, 20 %). Eight stimuli (color patches plus search stimuli) were presented in each search display, evenly distributed on an imaginary circle with a radius of 8.36°.
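For readers reconstructing the displays, the visual angles above can be converted into physical sizes at the stated 71-cm viewing distance. This Python sketch assumes flat-screen geometry and computes extents as if centered on the line of sight, which slightly underestimates the physical size of items at 8.36° eccentricity; converting further into pixels would additionally require the monitor’s physical width, which is not reported here. Function names are hypothetical.

```python
import math
from typing import List, Tuple

VIEWING_DISTANCE_CM = 71.0


def extent_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical size subtending angle_deg, assuming it is centered on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def eccentricity_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen distance from fixation of a point at eccentricity angle_deg."""
    return distance_cm * math.tan(math.radians(angle_deg))


def item_centers(n_items: int = 8, radius_deg: float = 8.36) -> List[Tuple[float, float]]:
    """(x, y) centers in cm of n_items evenly spaced on a circle around fixation."""
    r = eccentricity_cm(radius_deg)
    return [
        (r * math.cos(2 * math.pi * k / n_items), r * math.sin(2 * math.pi * k / n_items))
        for k in range(n_items)
    ]


# Sixteen gap orientations, 22.5 deg apart, from 22.5 deg up to 360 deg.
GAP_POSITIONS_DEG = [22.5 * k for k in range(1, 17)]

print(f"ring diameter  ~ {extent_cm(1.1):.2f} cm")
print(f"patch diameter ~ {extent_cm(2.0):.2f} cm")
print(f"circle radius  ~ {eccentricity_cm(8.36):.2f} cm")
```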

Fig. 1. Schematic of display layout and sequence of events within a trial

Apparatus

Stimuli were presented on a 19-inch display monitor (100-Hz refresh rate, resolution 1,024 × 768 pixels) at a distance of 71 cm. A video-based tower-mounted eye-tracker (EyeLink 1000, SR Research, Ontario, Canada) with a sampling rate of 1 kHz was used to record eye movements. Participants’ heads were stabilized by a chin rest, and the right eye was monitored in all participants.

Procedure

The experiment comprised a single block of 48 trials: 32 pre-critical trials in which only gray color patches without a salient item were presented, and 16 trials with a salient color singleton, the first of which was the critical trial (only the pre-critical trials and the critical trial were analyzed for the purpose of the present study). Half of the displays in each group were target-present trials, and half were blank (target-absent) trials. The participants’ task was to report the presence or absence of the target with a corresponding key press (the left-arrow and down-arrow keys in the lower row of the keyboard, operated with the right index and middle fingers). The critical trial was always a target-absent trial. Target position was determined randomly, with all possible target positions realized equally often.
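A trial list with these constraints could be generated as follows. This Python sketch covers only the 32 pre-critical trials; the dictionary layout and the function name are hypothetical, and it assumes that “realized equally often” means each of the eight positions hosts the target in exactly two of the 16 target-present pre-critical trials.

```python
import random
from collections import Counter
from typing import Dict, List, Optional

N_POSITIONS = 8


def precritical_trials(n_trials: int = 32, seed: Optional[int] = None) -> List[Dict]:
    """Hypothetical trial list: half target-present, half target-absent, all
    patches gray, and each target position used equally often."""
    rng = random.Random(seed)
    n_present = n_trials // 2
    reps = n_present // N_POSITIONS  # 16 present trials / 8 positions = 2 each
    trials = [
        {"target_pos": pos, "colors": ["gray"] * N_POSITIONS}
        for pos in range(N_POSITIONS)
        for _ in range(reps)
    ]
    trials += [
        {"target_pos": None, "colors": ["gray"] * N_POSITIONS}
        for _ in range(n_trials - n_present)
    ]
    rng.shuffle(trials)  # random order of present/absent trials and positions
    return trials


trials = precritical_trials(seed=1)
print(Counter(t["target_pos"] for t in trials))
```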

Prior to the experiment, the eye-tracker was calibrated for each participant. Each trial began with a drift correction, in which participants fixated the center of the screen and confirmed fixation with a key press (left hand). The search display was then presented until a response key press was registered (right hand).

Design

Participants were randomly assigned to one of two groups that differed only in the critical trial. In the all-new group, which was analogous to Becker and Horstmann (2011, Experiment 3), all patches had a color different from that in the pre-critical trials, such that no patch was singled out by its novelty. The salient (singleton) color was red for half of the participants in this group and green for the other half, while the non-salient (non-singleton) color was green or red, respectively. For example, in the pre-critical trials all patches were gray, and on the critical trial the singleton was green and the other items were red (or vice versa). In the one-old group, the color of all stimuli except the singleton was changed in the critical trial. The new non-singleton color was red for half of the participants and green for the other half, whereas the singleton remained gray (the color of all stimuli in the preceding trials). The position of the target was random, as was the position of the salient color in target-absent trials.
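The group-specific critical display can be summarized in a small function. In this Python sketch, `critical_colors` and its arguments are hypothetical names; `new_color` denotes the singleton color in the all-new group and the non-singleton color in the one-old group, and the singleton position is drawn at random, as described above.

```python
import random
from typing import List

N_POSITIONS = 8
OTHER_NEW_COLOR = {"red": "green", "green": "red"}


def critical_colors(group: str, new_color: str, singleton_pos: int) -> List[str]:
    """Patch colors on the (target-absent) critical trial.

    all-new: the singleton gets new_color, all other patches the other new color.
    one-old: all other patches get new_color, the singleton stays gray.
    """
    if group == "all-new":
        colors = [OTHER_NEW_COLOR[new_color]] * N_POSITIONS
        colors[singleton_pos] = new_color
    elif group == "one-old":
        colors = [new_color] * N_POSITIONS
        colors[singleton_pos] = "gray"
    else:
        raise ValueError(f"unknown group: {group!r}")
    return colors


rng = random.Random(7)
# Counterbalancing: red for half of each group, green for the other half.
print(critical_colors("all-new", "green", rng.randrange(N_POSITIONS)))
print(critical_colors("one-old", "red", rng.randrange(N_POSITIONS)))
```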

Results

The first 16 trials were considered practice, leaving 16 pre-critical trials for the analysis. Gaze data were analyzed using the EyeLink Data Viewer (2.3.22), which parses eye position data into saccades and fixations according to an acceleration threshold (8,000 °/s²) and a velocity threshold (30 °/s). Fixations were defined as periods of 20 ms or more in which the eye data exceeded neither threshold, and each fixation was assigned to the nearest object in the display. Further preprocessing and statistical analyses were done using R 3.2.3 (R Core Team, 2014). Our main dependent variables were RT, fixation probability, fixation latency (or entry time, cf. Holmqvist et al., 2011), and fixation duration. Fixation latency was defined as the latency (relative to display onset as time zero) of the first fixation on a stimulus during a trial. Fixation durations within a continuous sequence of fixations on a stimulus were summed to obtain dwell time (cf. Holmqvist et al., 2011).
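The entry-time and dwell-time definitions can be sketched in Python, starting from fixations that have already been parsed (the parsing itself is done by the EyeLink software and is not reproduced here). The `Fixation` record, the coordinates, and the function names are hypothetical; each fixation is assigned to the nearest object, and consecutive fixations on the same object are merged into one visit whose dwell time is the sum of their durations.

```python
import math
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Fixation(NamedTuple):
    start_ms: float  # relative to display onset (time zero)
    end_ms: float
    x: float         # gaze position, same units as the object coordinates
    y: float


def nearest_object(fix: Fixation, objects: Sequence[Tuple[float, float]]) -> int:
    """Index of the display object closest to the fixation position."""
    return min(range(len(objects)),
               key=lambda i: math.hypot(fix.x - objects[i][0], fix.y - objects[i][1]))


def visits(fixations: Sequence[Fixation], objects) -> List[Tuple[int, float, float]]:
    """Merge consecutive fixations on the same object into visits.
    Returns (object index, entry time, summed fixation duration) per visit."""
    out: List[Tuple[int, float, float]] = []
    for f in fixations:
        obj = nearest_object(f, objects)
        duration = f.end_ms - f.start_ms
        if out and out[-1][0] == obj:
            _, entry, dwell = out[-1]
            out[-1] = (obj, entry, dwell + duration)
        else:
            out.append((obj, f.start_ms, duration))
    return out


def entry_and_first_dwell(fixations, objects) -> Dict[int, Tuple[float, float]]:
    """Per fixated object: (latency of first fixation, dwell time of first visit)."""
    result: Dict[int, Tuple[float, float]] = {}
    for obj, entry, dwell in visits(fixations, objects):
        result.setdefault(obj, (entry, dwell))  # keep only the first visit
    return result


objects = [(10, 0), (0, 10), (-10, 0), (0, -10)]
fixations = [Fixation(190, 400, 9, 1), Fixation(420, 600, 10, -1),
             Fixation(640, 850, 1, 9.5), Fixation(880, 1000, 9.8, 0.2)]
print(entry_and_first_dwell(fixations, objects))  # {0: (190, 390), 1: (640, 210)}
```

The return to object 0 at 880 ms starts a new visit, so it does not extend the first-visit dwell time.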

Performance in the pre-critical trials

Pre-critical trial performance was analyzed to summarize and review overall task performance. We expected very inefficient search as the task was designed such that target-distractor discrimination was much easier with the fovea at or near the response-relevant rings, which should result in individual fixations on candidate targets.

In general, RTs were expected to be longer and fixations to be more frequent on target-absent than target-present trials because with inefficient search, target-absent displays have to be examined exhaustively while search in target-present displays can be terminated once the target has been found, which is on average after half of the display has been examined. Group (all-new vs. one-old) was included as a factor in the analysis to check for possible group differences. Note that the groups should not differ in their performance in the pre-critical trials as these were the same in both groups.

Mean accuracy and reaction time

Accuracy was high in the pre-critical target-absent trials (.96) and somewhat lower in target-present trials (.91). A mixed 2 × 2 ANOVA of the accuracy scores with the variables target presence (present vs. absent) and group (one-old vs. all-new) revealed only a significant main effect of target presence, F (1, 37) = 5.60, p = .023, η²G = .08 (other Fs < 1.06, ps > .310). Only trials with correct responses were included in all following analyses of RTs and eye data. Three participants made manual errors in the critical trial, and their data were discarded altogether.

A corresponding ANOVA of correct manual RTs revealed only significant main effects of group, F (1, 34) = 4.44, p = .042, η²G = .10, and target presence, F (1, 34) = 188.73, p < .001, η²G = .36 (interaction: F (1, 34) = 1.49, p = .230). Responses in pre-critical trials were faster in the all-new group (1,995 ms) than in the one-old group (2,314 ms), and faster on target-present trials (1,820 ms) than on target-absent trials (2,473 ms). As the pre-critical trials were identical across the two groups, the RT difference between groups is most likely due to random sampling variability.

Probability of fixating on a stimulus before trial termination

The same 2 × 2 ANOVA on the mean proportion of fixated stimuli revealed a significant main effect of target presence only, F (1, 34) = 436.80, p < .001, η²G = .80 (other Fs < 1). The mean proportion of stimuli that were fixated rather than skipped was .57 in target-present trials and .91 in target-absent trials, largely in line with the predictions of a serial self-terminating search that finishes either on detection of the target (present trials) or on completion of a full scan of all stimuli (absent trials). The deviations from the theoretically expected values of .56 (Footnote 1) in present trials and 1.0 in absent trials suggest that the strategy was imperfectly implemented, with some skipping of distractors on absent trials.
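A theoretical value of about .56 for target-present trials follows from a simple expected-value argument, presumably the logic behind the cited value (Footnote 1 itself is not reproduced in this section). The sketch assumes eight items, each inspected at most once in random order, with the target equally likely to occupy any position in the inspection order.

```python
from fractions import Fraction


def expected_proportion_fixated(n_items: int, target_present: bool) -> Fraction:
    """Expected proportion of items fixated in a serial self-terminating search
    that inspects each item at most once, in random order."""
    if not target_present:
        return Fraction(1)  # exhaustive scan: every item is inspected
    # The target is equally likely to be the 1st, 2nd, ..., n-th item inspected,
    # so the expected number inspected is (1 + 2 + ... + n) / n = (n + 1) / 2.
    expected_inspected = Fraction(sum(range(1, n_items + 1)), n_items)
    return expected_inspected / n_items


p = expected_proportion_fixated(8, target_present=True)
print(p, float(p))  # 9/16 0.5625
```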

Number of fixations

A corresponding 2 × 2 ANOVA of the mean number of fixations per trial revealed a significant main effect of target presence only, F (1, 34) = 279.82, p < .001, η²G = .66 (other Fs < 1). On average, participants made 6.2 fixations in target-present trials and 9.6 fixations in target-absent trials, in line with a serial, effortful search.

Dwell times

Dwell times on targets and non-targets were examined (dwell time is the sum of fixation durations during the first continuous visit of a region). A corresponding 2 × 2 ANOVA of the mean dwell times revealed a significant main effect of target presence only, F (1, 34) = 82.80, p < .001, η²G = .17 (other Fs < 2.71). Average dwell time was longer in target-present than in target-absent trials (262 vs. 218 ms). This difference was due to longer dwell times on the target than on distractors in target-present trials (448 vs. 209 ms), t (35) = 11.00, p < .001.

Comparison of performances in the critical and the pre-critical trials

Only target-absent trials of the pre-critical block were included in the further analyses to match the critical trial, which was also a target-absent trial. As inefficient search results in highly variable RTs, performance on the pre-critical trials was always averaged to reduce noise and maximize the precision and reliability of the performance estimates.

Mean RT and error

There was no difference in manual errors between the pre-critical target-absent trials and the critical trial, t (36) < 1. To assess whether the novel display elicited the RT delay often found with surprising presentations, we first compared the mean correct RTs between the critical and pre-critical trials. A mixed 2 × 2 ANOVA with the variables trial type (pre-critical vs. critical) and group (one-old vs. all-new) revealed a main effect of trial type only, F (1, 34) = 64.02, p < .001, η²G = .42 (other Fs < 1.40), with a considerably longer mean RT in the critical trial (4,570 ms) than in the pre-critical trials (2,472 ms). These results reflect the predicted surprise-induced RT delay and indicate that this delay was comparable for the all-new and the one-old group.

Number of fixations and overall fixation probability

To examine whether the RT increase was due to participants making more fixations on the critical trial, the number of fixations and the fixation probability per stimulus were analyzed and compared between the pre-critical block of trials and the critical trial. A 2 × 2 ANOVA with the variables trial type (pre-critical vs. critical) and group (one-old vs. all-new) on the overall number of fixations revealed a significant main effect of trial type only, F (1, 34) = 59.14, p < .001, η²G = .39 (other Fs < 1), reflecting that many more fixations were made in the critical trial than in the pre-critical trials (15.2 vs. 9.6).

More fixations do not necessarily mean that the display was searched more thoroughly, as the surplus fixations could be made to already fixated stimuli. Thus, a 2 × 2 ANOVA with the same variables as before was computed over the average fixation probability per stimulus. It revealed a significant main effect of trial type only, F (1, 34) = 9.53, p = .004, η²G = .12 (other Fs < 1.52), reflecting an increase in the probability of fixating a stimulus from the pre-critical trials to the critical trial (.91 vs. .97).
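The distinction between the number of fixations and the proportion of stimuli fixated can be illustrated with two hypothetical fixation sequences: refixations inflate the first measure but not the second. The function name and the sequences are illustrative assumptions.

```python
from typing import Sequence, Tuple


def fixation_summary(fixated_items: Sequence[int], n_items: int = 8) -> Tuple[int, float]:
    """Return (number of fixations, proportion of distinct items fixated).
    Refixations raise the first value but not the second."""
    return len(fixated_items), len(set(fixated_items)) / n_items


# Hypothetical sequences of fixated item indices (one entry per fixation).
thorough = [0, 1, 2, 3, 4, 5, 6, 7]
revisiting = [0, 1, 0, 1, 2, 3, 2, 3, 4, 5, 4, 5]
print(fixation_summary(thorough))    # (8, 1.0)
print(fixation_summary(revisiting))  # (12, 0.75)
```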

To summarize, participants made more fixations on the critical trial than in the pre-critical trials, and skipped a stimulus less frequently.

Singleton fixation probability as a function of fixation sequence

To assess the time course of visual selection of the singleton, we analyzed the probability of selecting the singleton as a function of the position in the eye-movement sequence. The average probability of selecting a stimulus in pre-critical target-absent trials serves as a baseline. Figure 2 shows the cumulative probability of fixating a stimulus in the pre-critical trials (circles) and the singleton in the critical trial (triangles), separately for the one-old (blue) and the all-new (green) groups.
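The cumulative measures can be sketched per participant in Python. This is an assumed reconstruction, not the authors’ R code: each trial is represented as the sequence of fixated item indices starting with the first fixation on a display object, the baseline is the probability that a given stimulus has been fixated within the first k fixations (averaged over stimuli and pre-critical trials), and the critical value is 1 or 0 depending on whether the singleton was fixated within the first k fixations. The data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def baseline_cumulative(trials: Sequence[Sequence[int]], k: int, n_items: int = 8) -> float:
    """Baseline: probability that a given stimulus has been fixated within the
    first k fixations, averaged over stimuli and pre-critical trials."""
    return mean(len(set(seq[:k])) / n_items for seq in trials)


def singleton_cumulative(seq: Sequence[int], singleton: int, k: int) -> float:
    """1.0 if the singleton was fixated within the first k fixations, else 0.0."""
    return 1.0 if singleton in seq[:k] else 0.0


# Hypothetical data for one participant: fixated item indices per trial.
precritical = [[0, 1, 2, 3], [3, 3, 4, 5]]
critical, singleton = [5, 2, 7], 2

for k in (1, 2, 3):
    print(k, baseline_cumulative(precritical, k), singleton_cumulative(critical, singleton, k))
```

Averaging these per-participant values within each group yields curves of the kind plotted in Figure 2.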

Fig. 2. Cumulative proportion of 1st, 2nd, and 3rd fixations on the singleton stimulus in the critical trial and on the stimuli in the pre-critical trials

Figure 2 indicates that the first fixation was barely influenced by the singleton, and that an influence emerged at the second fixation such that the singleton was more likely to be fixated on the critical trial than a randomly chosen distractor on the pre-critical trials. In addition, this effect appeared to emerge earlier in the all-new than in the one-old group. We tested for each fixation whether there was a significant increase in fixation probability within each group (indicating attentional capture by salience), and whether the increase was higher in the all-new than in the one-old group (indicating an influence of familiarity of the singleton feature). One-tailed t-tests were used to maximize statistical power.
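The degrees of freedom reported below (18 and 16 within groups, 34 between groups) are consistent with paired tests on per-participant differences (critical minus baseline) and a pooled-variance comparison of those difference scores between groups. The Python sketch below computes the t statistics under that assumption; the function names and data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(critical: Sequence[float], baseline: Sequence[float]) -> Tuple[float, int]:
    """Paired t-test on per-participant differences (critical minus baseline)."""
    diffs = [c - b for c, b in zip(critical, baseline)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def independent_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's t-test with pooled variance, e.g., on difference scores of two groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


# Hypothetical per-participant values at one fixation ordinal.
t, df = paired_t([1, 1, 0, 1], [0.25, 0.25, 0.25, 0.25])
print(t, df)  # 2.0 3
```

If SciPy is available, the one-tailed p value for a directional hypothesis can be obtained with `scipy.stats.t.sf(t, df)`.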

For the first fixation, fixation probability did not change in either group, and the groups did not differ, ts < 1. For the second fixation, the singleton in the critical trial was fixated significantly more often than a non-singleton in the pre-critical trials, both in the one-old group, t (18) = 2.55, p = .01, and in the all-new group, t (16) = 6.43, p < .001, and the increase was larger in the all-new than in the one-old group, t (34) = 2.04, p = .025. The increase at the third fixation was significant for both the one-old group, t (18) = 6.03, p < .001, and the all-new group, t (16) = 11.1, p < .001, with no difference between the groups, t (34) = 1.00.

These results are more in line with a slow-acting, novelty-driven mechanism mediating eye movements to the salient distractor than with the fast-acting mechanism that has been proposed to underlie singleton capture and that is typically observed when attention is biased toward an expected target stimulus before it appears (e.g., Becker, Ansorge, & Horstmann, 2009; Becker, Horstmann, & Remington, 2011; Mulckhuyse, van Zoest, & Theeuwes, 2008).

Donk and van Zoest (2008) suggested that the effects of salience may be short-lived and therefore apparent only in first fixations with short latencies. On average, first-fixation latencies in the critical trial were short (189 ms). To test whether first-fixation latencies differed between the critical trial and the pre-critical trials, we conducted an ANOVA with group (one-old vs. all-new) and trial (pre-critical vs. critical) as variables. It revealed no significant main effects or interaction, Fs < 2.19, ps > .14. Next, we checked whether short-latency first fixations were more likely to select the singleton. A median split of the first-fixation latencies was used to define two sets of fixations with short and long latencies, respectively. The probability of fixating the singleton did not differ between fast and slow fixations (.12 vs. .26), t (35) = 1.11, p = .276.

The saliency model predicts frequent first fixations on the salient singleton. The analyses so far, however, did not support this prediction. One might argue that several spatially adjacent items can be selected at once with a single fixation, so that selecting the singleton does not require a fixation on the singleton itself, but only a fixation near it (Hulleman & Olivers, 2016; Venini, Remington, Horstmann, & Becker, 2014). We therefore tested whether first fixations were more likely to land on a stimulus near the singleton (i.e., on the singleton or on one of the two non-singletons flanking it) than on stimuli further away (i.e., the remaining five stimuli not adjacent to the singleton). The test, however, failed to support this proposition: 44 % of the first fixations landed near the singleton, which did not deviate significantly from the chance level of .375 for three out of eight positions, t (35) < 1. Including the next two adjacent stimuli yielded similar results: 67 % of the first fixations landed on the five positions around and including the singleton, which did not differ from the chance level of .625 for five out of eight positions, t (35) < 1. Thus, the first eye movement showed no directional bias toward the singleton.
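The chance levels in the proximity test follow from simple counting. The sketch below assumes that the eight stimuli occupied positions on a circle (consistent with every singleton having two flanking neighbors and five non-adjacent stimuli) and that each participant contributes a 0/1 indicator of whether the critical-trial first fixation landed in the neighborhood, tested against chance with a one-sample t-test. The names and the example indicators are hypothetical.

```python
import math
from statistics import mean, stdev

N_POSITIONS = 8  # eight stimulus positions, assumed to lie on a circle


def neighborhood(center, radius, n=N_POSITIONS):
    """Positions within `radius` steps of `center` on a ring of n positions."""
    return {(center + d) % n for d in range(-radius, radius + 1)}


def chance_level(radius, n=N_POSITIONS):
    """Probability that a randomly placed first fixation lands in the neighborhood."""
    return len(neighborhood(0, radius, n)) / n


def one_sample_t(values, mu):
    """t statistic and degrees of freedom for mean(values) against mu.
    Requires at least two values that are not all identical."""
    n = len(values)
    se = stdev(values) / math.sqrt(n)
    return (mean(values) - mu) / se, n - 1


print(chance_level(1))  # 0.375 (singleton plus its two flankers)
print(chance_level(2))  # 0.625 (five positions around and including it)

# Hypothetical per-participant indicators: did the first fixation land
# within one position of the singleton (singleton at position 0 here)?
first_fixations = [7, 3, 1, 5]
near = [int(f in neighborhood(0, 1)) for f in first_fixations]
t, df = one_sample_t(near, chance_level(1))
```

With 36 participants, such a test has 35 degrees of freedom, matching the t (35) values reported above.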

Singleton fixation latency

To test whether the singleton was selected earlier than expected by chance in the one-old and all-new groups, we analyzed the mean latencies of the first fixation on the non-singleton stimuli in the pre-critical trials (pre non-sing) as an estimate of unbiased selection, on the singleton in the critical trial (crit sing), and on the non-singletons in the critical trial (crit non-sing), separately for the one-old and the all-new groups. The results, depicted in Fig. 3, indicate that the singleton was selected earlier than non-salient stimuli, both when the non-salient items had a familiar color (pre-critical trials) and when they were presented in a new color (non-singleton stimuli in the critical trial).
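A first-visit latency can be derived from a trial's fixation record as sketched below. The sketch assumes that each fixation is stored as a stimulus index with its onset time in milliseconds from display onset, and that stimuli never fixated in a trial are excluded from the mean; both are assumptions, and the function names and data are hypothetical.

```python
def first_visit_latencies(fixations):
    """Map each fixated stimulus to the onset (ms from display onset) of its
    first fixation. fixations: list of (stimulus_index, onset_ms) in temporal
    order; stimuli that were never fixated are absent from the result."""
    latencies = {}
    for item, onset in fixations:
        latencies.setdefault(item, onset)  # keep only the earliest onset
    return latencies


def mean_latency(fixations, items):
    """Mean first-visit latency over the fixated stimuli among `items`,
    or None if none of them was fixated."""
    lat = first_visit_latencies(fixations)
    values = [lat[i] for i in items if i in lat]
    return sum(values) / len(values) if values else None


# Hypothetical critical trial with the singleton at position 5.
trial = [(3, 190), (5, 420), (3, 650), (0, 900)]
print(mean_latency(trial, [5]))                                 # 420.0
print(mean_latency(trial, [i for i in range(8) if i != 5]))     # 545.0
```

Averaging such values over trials and participants yields the pre non-sing, crit sing, and crit non-sing estimates compared in Fig. 3.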

Fig. 3

Mean latencies of first visits on pre-critical non-singleton stimuli (pre non-sing), the singleton in the critical trial (crit sing), and the non-singleton stimuli in the critical trial (crit non-sing), separately for the one-old and the all-new group. Error bars represent the standard error of the mean

For the statistical analysis, the singleton fixation latency in the critical trial was compared with the average latency of a stimulus fixation in the pre-critical trials as a measure of unbiased (i.e., random) selection. The 2 × 2 ANOVA with the variables trial type (pre-critical vs. critical) and group (one-old vs. all-new) revealed a significant main effect of trial type, F (1, 34) = 6.56, p < .001, η²G = .45. Neither the main effect of group, F (1, 34) = 3.23, p = .08, nor the Trial type × Group interaction, F (1, 34) < 1, was significant. The effect of trial type reflected longer latencies in the pre-critical trials than in the critical trial (1,084 vs. 553 ms).

Fixation duration (dwell time)

A first analysis compared average dwell times on stimuli in the pre-critical (target-absent) trials with the dwell time on the singleton in the critical trial (see Fig. 4). A 2 × 2 ANOVA with the variables trial type (pre-critical vs. critical) and group (one-old vs. all-new) revealed a main effect of trial type, F (1, 34) = 37.39, p < .001, η²G = .35, with longer dwell times in the critical trial (607 ms) than in the pre-critical trials (218 ms). The main effect of group was not significant, F (1, 34) = 2.71, p = .109. The interaction was significant, F (1, 34) = 4.47, p = .042, η²G = .06, indicating a stronger increase of dwell times on the singleton in the all-new group. The increases in dwell time are substantial, indicating that a major portion of the surprise-induced RT delay can be attributed to processes that begin after the search items are selected (e.g., perceptual analysis or response selection), rather than to a change in the pattern of search through the stimuli.
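Dwell time on a first visit can be computed in several ways. The sketch below assumes one common definition: the summed duration of the uninterrupted run of fixations on a stimulus when it is first entered, with later revisits ignored and fixations on no stimulus marked as None. Whether this matches the authors' exact procedure is an assumption, and the function name and data are hypothetical.

```python
def first_visit_dwell_times(fixations):
    """Summed duration of each stimulus's first visit.

    fixations: list of (stimulus_index or None, duration_ms) in temporal
    order; None marks a fixation that is not on any stimulus. A visit is an
    uninterrupted run of fixations on the same stimulus; once gaze leaves a
    stimulus, its first visit is over and later revisits are not counted.
    """
    dwell = {}
    closed = set()  # stimuli whose first visit has already ended
    prev = None
    for item, duration in fixations:
        if prev is not None and prev != item:
            closed.add(prev)  # gaze left prev, so its first visit ended
        if item is not None and item not in closed:
            dwell[item] = dwell.get(item, 0) + duration
        prev = item
    return dwell


# Hypothetical trial: two consecutive fixations on stimulus 2 form its first
# visit; the later return to stimulus 2 is a revisit and is not added.
trial = [(2, 180), (2, 150), (4, 200), (2, 300), (6, 250)]
print(first_visit_dwell_times(trial))  # {2: 330, 4: 200, 6: 250}
```

Summing consecutive fixations, rather than taking only the first fixation's duration, lets a single dwell time capture extended inspection of an item such as the surprising singleton.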

Fig. 4

Mean fixation durations (dwell times) for the first visits on pre-critical non-singleton stimuli (pre non-sing), the singleton in the critical trial (crit sing), and the non-singleton stimuli in the critical trial (crit non-sing), separately for the one-old and the all-new group. Error bars represent the standard error of the mean

Because the non-singletons in the critical trial all had a novel and probably expectancy-discrepant feature, dwell times on these items were expected to increase as well. A 2 × 2 ANOVA with the same variables as before revealed a main effect of trial type, F (1, 34) = 10.46, p < .001, η²G = .08, with longer dwell times in the critical trial (255 ms) than in the pre-critical trials (218 ms), and a main effect of group, F (1, 34) = 4.98, p = .03, η²G = .10, reflecting longer dwell times overall in the one-old than in the all-new group (256 ms vs. 215 ms). The interaction was not significant, F (1, 34) = 1.68, p = .204.

Discussion

We tested whether a salient stimulus captures attention on its unannounced first occurrence when there is no relevant attentional set for selecting salience, and when the salient stimulus is not singled out by a novel feature. The temporal and spatial dynamics of attention were measured using eye tracking.

The results show that the eyes fixated the singleton earlier than predicted under the assumption of unbiased selection (i.e., random selection as in the pre-critical trials), indicating that the singleton attracted attention and gaze. The mean latency of selecting the salient singleton was 553 ms. The first fixation was not influenced by the salient singleton, but most participants had fixated the singleton by the second fixation. The pattern for first fixations did not differ between fast and slow eye movements.

The results also showed a number of surprise-related changes in the critical trial. Both RT and average fixation duration (measured as dwell time) increased in the critical trial. Dwell time on the singleton was longer when its feature was novel (all-new group) rather than familiar (one-old group). Moreover, the total number of fixations was higher in the critical trial than in the pre-critical trials, and search was more exhaustive. There were also more early (first or second) fixations on the singleton in the all-new group than in the one-old group.

The first part of our discussion focuses on models and theories that regard salience as an important determinant of initial attentional selection (e.g., Itti & Koch, 2001; Theeuwes, 2010; Wolfe, 1994). These models sometimes make strong claims about salience and attentional priority. In particular, it has been claimed that salient stimuli capture covert attention in a strongly automatic fashion, with short latencies of less than 150 ms, and that they receive frequent first fixations.

The present results show that salience is in fact an important determinant of selection, confirming the main assumption of saliency models: the salient location was gazed at earlier than expected by chance. This appears to be the first demonstration of involuntary gaze attraction by salience, which cannot be explained by implicit, explicit, or uncontrolled attentional control settings, or by the novelty of the salient feature.

However, details of our results deviate from specific predictions of saliency models with respect to timing. Consider, for example, Itti and Koch’s (2001) neuro-computational saliency model. According to this model, the first fixation should be directed to the most salient location in the visual field, the second fixation to the second most salient location, and so on. In the present study, however, the salient item was fixated only after one or more intermediate fixations. This deviation from the predictions can hardly be explained by insufficient salience, as the display was very simple, containing a single salient item that differed strongly from the surrounding items.

Corresponding assumptions are widespread in current cognitive saliency models, which predict automatic selection of salient items within 150 ms of search display onset (Kim & Cave, 1999; Theeuwes, Atchley, & Kramer, 2000). The present study, by contrast, found that fixations were biased toward the salient item with a mean latency in the range of 500–600 ms. Clearly, covert and overt shifts of attention are not the same, and when covert shifts precede overt shifts (e.g., Deubel & Schneider, 1996), the covert shift must necessarily have occurred earlier. This, however, could hardly explain a delay of 300 ms or more.

The departure from the fast-selection prediction cannot be explained by strategic suppression or inhibition of saliency capture. For example, the lack of evidence for saliency capture at SOAs exceeding 150 ms (e.g., Folk et al., 1992) has been interpreted as reflecting quick disengagement of attention from the erroneously selected singleton and subsequent reorienting (e.g., Theeuwes et al., 2000). Strategic attempts to mitigate the deleterious impact of saliency capture, of course, make sense only when participants know or believe that attending to the salient stimulus would interfere with performance. This is plausible in experiments where participants know that the salient stimulus is non-predictive of the target or that it never coincides with the target. This was, however, not the case in the present experiments, so fast disengagement of attention cannot explain the results.

Müller, Geyer, Zehetleitner, and Krummenacher (2009) proposed and tested a different mechanism that might mitigate the effects of irrelevant salient distractors. On their account, participants can reduce the weight of the irrelevant dimension (e.g., color), such that saliency signals from that dimension are attenuated. Their study shows that salient items can in fact be ignored (see also Töllner, Müller, & Zehetleitner, 2012), but that doing so effectively takes an incentive and some practice. Participants in the present study, however, had neither an incentive nor practice to engage in strategic suppression of saliency signals. Conditions for saliency capture should thus have been ideal, as participants were unprepared for the presentation of the salient stimulus.

Why did other experiments (e.g., Kim & Cave, 1999; Theeuwes et al., 2000) reveal fast saliency capture while the present experiments did not? On the contingent capture account (e.g., Folk et al., 1992) and related accounts (e.g., Ansorge et al., 2010), previous findings of early saliency capture were not triggered by salience alone, but by a combination of bottom-up salience and top-down factors. Studies that find early salience effects typically use a task in which participants search for a target feature singleton defined on a different dimension from the interfering distractor singleton. Because salience defines the target, task-driven top-down factors are assumed to gate bottom-up salience signals to guide attention. This top-down influence is not reactive (as when a symbolic cue is followed; Müller & Rabbitt, 1989) but pre-emptive (Ansorge & Horstmann, 2007), which makes salience fast-acting. Such top-down factors were not active in the present experiment, which explains why early salience effects were absent.

We now turn to the late salience effects in the present experiment. The present results are in line with previous work showing that attention is biased toward the salient stimulus even if it is not singled out by feature novelty (Becker & Horstmann, 2011, Experiment 3), and they clarify the time course of this effect. Interestingly, the time course of gaze deployment is similar to that found in previous studies using the surprise paradigm, both when the singleton was the only element in the critical trial with a novel feature (Horstmann & Herwig, 2015) and when non-salient novel stimuli were presented in the critical trial (Horstmann & Herwig, 2016). This similarity in time course suggests a common origin. We will therefore discuss how novelty or expectancy discrepancy might have contributed to the gaze shift toward the salient stimulus.

In the following, we consider two hypotheses: (1) that salience is a “super-feature” that can be expectancy discrepant similar to ordinary features like color, size, etc., and (2) that surprise changes the attentional control setting to an exploration mode where salience plays an important role.

Regarding the first hypothesis, we have previously distinguished between surprise at the feature level and surprise at the conceptual level (e.g., Horstmann, 2002, 2015). Surprising features that are pre-attentively available can guide attention. Objects that deviate from expectations at a conceptual level but do not differ from expected items in their elementary features (e.g., a bowl with goldfish in the fridge) hold attention longer once attended, but do not guide attention (Võ & Henderson, 2009). Like specific features, salience signals are pre-attentively available (Yantis & Egeth, 1999; Found & Müller, 1996). It might thus be argued that a surprising salient item captures attention in the same manner as a surprising feature.

If this is true, it needs to be explained why attentional capture was absent or weak for the unannounced first occurrence of a salient stimulus when all features were familiar (Becker & Horstmann, 2011; Horstmann, 2005). One plausible option is that the first occurrence of a singleton is less expectancy-discrepant than the first occurrence of a particular color. In fact, all pre-critical displays contained salience, in that every stimulus was salient relative to the uniform black background. The pre-critical trials thus familiarized participants with salience signals to some degree. The idea that the first presentation of a singleton is less discrepant than the first presentation of a novel color is in line with the somewhat longer mean latency of the first fixation on the surprise stimulus in the present experiments (around 550 ms) than in previous experiments (around 400 ms; Horstmann & Herwig, 2015; note, however, that this cross-experiment comparison introduces uncertainties and needs to be confirmed independently by future experiments). Alternatively, or in addition, familiarity of features may dampen salience, while novelty of features may boost it.

The second hypothesis is that surprise switches the attentional control setting to an exploration mode. At the beginning of the trial, participants process the display elements according to their task relevance. When the appearance of the display in the critical trial differs greatly from that in the pre-critical trials, task-driven processing is interrupted and participants engage in a more exploratory search mode. There is evidence that eye movements often follow a salience gradient during free viewing, that is, in the absence of a specific task (Itti, Koch, & Niebur, 1998), which is reasonable because a scene's most discriminable objects are often found at salient regions (Einhäuser et al., 2008). Indeed, there were indications that participants may have entered a different search mode in the critical trial, as they made more fixations and searched more exhaustively. This may be taken as indirect evidence for the second hypothesis. One implication of the second hypothesis is that salience plays a role particularly in novel environments, but is not an important determinant of attention in a predictable and familiar environment.

To conclude, we found an attention-attracting effect of a salient singleton on its first unannounced occurrence. This is in line with the general idea that salient stimuli are prioritized for attentional selection. The details of this effect, however, are at odds with predictions of fast-acting singleton capture and in line with more slowly acting surprise capture. Together with previous results, the present results indicate that salience changes attentional priority particularly in novel environments.