It is known that visual fixation durations in continuous visual tasks such as reading and scene perception vary from less than 100 ms to several seconds, thereby resulting in a positively skewed distribution, with modal values between 200 and 350 ms (Rayner, 1998; Unema, Pannasch, Joos, & Velichkovsky, 2005). Durations vary a great deal from one fixation to the next (Buswell, 1935; Stratton, 1906; Yarbus, 1967). It has been suggested that fixation durations are determined by information processing (Groner & Groner, 1989; Just & Carpenter, 1980), cognitive processes (Shebilske, 1975), and eye-movement preprogramming (Buswell, 1935; Zingale & Kowler, 1987).

Visual attention is among these factors, as has been shown in a number of studies (e.g., Brockmole & Boot, 2009; Pannasch & Velichkovsky, 2009). In particular, we found that a combination of relatively long (>180-ms) fixations surrounded by small-amplitude saccades (i.e., saccades within the parafoveal range of 5°) significantly improves both recognition of foveated picture fragments (van der Linde, Rajashekar, Bovik, & Cormack, 2009; Velichkovsky, Joos, Helmert, & Pannasch, 2005) and reaction to sudden hazardous events in a simulated dynamic environment (Velichkovsky, Rothert, Kopf, Dornhoefer, & Joos, 2002). We attributed this difference in performance to the different modes of visual attention—focal versus ambient—and supposed that the underlying brain mechanisms are localized in ventral and dorsal parts of posterior cortex (Corbetta, Patel, & Shulman, 2008; Milner & Goodale, 2008). Since good visual acuity is limited to the parafoveal region (i.e., 5° of visual angle; see, e.g., Wyszecki & Stiles, 1982), it can be assumed that subsequent fixations remaining within this region are rather related to the processing of details and identification of objects (Velichkovsky et al., 2005). In contrast, visual fixations subsequent to large-amplitude saccades (>5°) are likely to be involved in processing of information about the spatial arrangement of rather undifferentiated visual “blobs” (cf. Trevarthen, 1968). This allows for classifying fixations on the basis of the prior saccadic amplitude: If the preceding amplitude is larger than 5°, this fixation is presumably in the service of the ambient attention mode, whereas for preceding saccadic amplitudes smaller than 5°, the fixation is assumed to belong to the focal attention mode (cf. Pannasch & Velichkovsky, 2009).
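The classification rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not the analysis code used in the study: the `Fixation` record, its field names, and the handling of an amplitude of exactly 5° (assigned to the focal mode here) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

AMPLITUDE_THRESHOLD_DEG = 5.0  # parafoveal limit named in the text


@dataclass
class Fixation:
    """Hypothetical record; field names are illustrative only."""
    duration_ms: float
    prev_saccade_amp_deg: Optional[float]  # None for the first fixation of a trial


def classify_fixation(fix: Fixation) -> Optional[str]:
    """Return 'focal', 'ambient', or None if there is no preceding saccade."""
    if fix.prev_saccade_amp_deg is None:
        return None
    # Text: preceding amplitude > 5 deg -> ambient, < 5 deg -> focal.
    # Exactly 5 deg is not specified in the text; it is treated as focal here.
    if fix.prev_saccade_amp_deg > AMPLITUDE_THRESHOLD_DEG:
        return "ambient"
    return "focal"
```

Note that the rule depends only on the saccade that precedes a fixation, so the first fixation of a trial cannot be classified.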

Various models can be developed to predict the spatial and temporal aspects of eye-movement control. In contrast to reading research, where considerable effort has been invested to predict when the eyes will move (Rayner, 2009), the focus of the work in scene perception has until very recently been on the question of where the eyes will move next (e.g., Itti & Koch, 2001; Tatler, Baddeley, & Vincent, 2006; Torralba, Oliva, Castelhano, & Henderson, 2006; Underwood, Foulsham, & Humphrey, 2009).

For fixation duration, direct and indirect control mechanisms have been proposed (Rayner, 1998). While direct control theories suppose that decisions about fixation termination are made during the ongoing fixation, indirect control theories suppose that the current fixation is determined by other factors. Direct control theories are supported by the fact that the available visual information influences the duration of fixations (Loftus, 1985; Mannan, Ruddock, & Wooding, 1995; Parkhurst, Culurciello, & Niebur, 2000). For example, Mannan et al. reported longer fixations for low-pass-filtered than for unfiltered scenes. A prolongation of fixations has also been found when the amount of either foveal or peripheral information was limited by a gaze-contingent mask (van Diepen & d’Ydewalle, 2003). Results from recent studies of scene perception led Henderson and colleagues to suggest a mixed control model for fixation durations (Henderson & Pierce, 2008; Henderson & Smith, 2009; Nuthmann, Smith, Engbert, & Henderson, 2010). In these experiments, the scene onset delay paradigm was applied—that is, the scene was replaced by a pattern mask during a saccade and reappeared in a subsequent fixation after various delays. The authors reported a prolongation for a certain proportion of fixations until the scene’s reappearance, but other fixations remained unaffected by the scene onset manipulation. The existence of the first group (“population” or “subpopulation”) of fixations was interpreted as evidence for direct control by the scene information, whereas the second group was considered as being under some form of indirect control.

The scene onset delay paradigm is similar to another well-known paradigm of eye tracking research, namely distractor presentation experiments, especially in their gaze-contingent version (Pannasch, Dornhoefer, Unema, & Velichkovsky, 2001; Reingold & Stampe, 2000). The distractor presentation experiments usually introduce sudden changes in the visual environment (Lévy-Schoen, 1969; Walker, Deubel, Schneider, & Findlay, 1997). Various studies in reading and scene perception have shown a reduction of the saccade probability in the interval from 90 to 120 ms following the change (Reingold & Stampe, 2000, 2002). This probability reduction can be viewed as a prolongation of fixation durations (Pannasch et al., 2001) but can also be interpreted as a delay of subsequent saccades (Reingold & Stampe, 2000) or a general inhibition of behavior found during the orienting response (cf. Sokolov, 1963). The latter view is supported by the fact that gaze-contingent distractors of different modalities have revealed habituation-like processes in eye movements and in cortical event-related potentials (ERPs; Graupner, Velichkovsky, Pannasch, & Marx, 2007; Pannasch et al., 2001; Velichkovsky & Pannasch, 2001).

We performed two distractor paradigm experiments to distinguish between direct and indirect control of fixation durations. In Experiment 1, gaze-contingent distractors appeared either early or late within selected fixations. The distractors were presented for different durations in order to allow for an experimental manipulation similar to the manipulation of scene onset delay. In the second experiment, two different types of holistic display changes occurred time-locked within the presentation, independently of the actual gaze behavior. Our experiments address two as-yet-unanswered questions. First, is it possible to determine which mechanism (direct vs. indirect) controls fixation duration? Second, can fixation control be modulated by two different attentional mechanisms?

We anticipated that both distractor paradigm experiments would replicate the findings obtained with the scene onset delay paradigm, demonstrating similar fixation behavior for changes of different qualities and quantities. Additionally, we predicted differential influences of visual changes, depending on the mode (focal vs. ambient) of attention. Recent neuroanatomical research has made a distinction between two frontoparietal networks of visual attention, a ventral one that interrupts and resets ongoing activity, and a dorsal attention network specialized for spatial selection of stimuli and responses (Corbetta et al., 2008). In order to reconcile this distinction with our view of the modes of visual attention, the focal attention mode should require ventral network activity, whereas ambient processing would be related to activity of the dorsal network. Regarding the present study, this implies that an interruption by the appearance of a distractor should have a stronger influence when the focal-processing mode is active, a claim that has been partly supported by previous experimental data (Pannasch & Velichkovsky, 2009).

Experiment 1

Method

Subjects

Nineteen students (13 female) from the Technische Universität Dresden, with a mean age of 24.2 years (range, 20–33 years), took part in this experiment. All subjects had normal or corrected-to-normal vision and received course credit for participation in the study, which was conducted in conformity with the Declaration of Helsinki.

Apparatus

The subjects were seated in a dimly illuminated, sound-attenuating room. Eye movements were sampled monocularly at 1 kHz using the SR EyeLink 1000 infrared eyetracking system, with online detection of saccades and fixations and a spatial accuracy of better than 0.5°. Saccades were identified by deflections in eye position in excess of 0.1°, with a minimum velocity of 30°/s and a minimum acceleration of 8,000°/s², maintained for at least 4 ms. Pictures were displayed using a GeForce 7300-GT card and a CRT display (19-in. Samtron 98 PDF) of 1,024 × 768 pixels at a refresh rate of 100 Hz. Viewed from a distance of 80 cm, the screen subtended a visual angle of 27.1° horizontally and 20.5° vertically.

Stimuli

Sixty digitized pieces of fine art by seventeenth- to nineteenth-century European painters served as the stimulus materials, since paintings are considered “maximal memory stores” and some of the “most valuable objects in human history” (Leyton, 2006, p. 2). In addition, the use of paintings allowed us to directly compare our results with those of previous work (Graupner et al., 2007; Pannasch et al., 2001). Stimuli were displayed with a size of 1,024 × 768 pixels. To systematically investigate the influences of temporally and spatially gaze-contingent visual distractions, a light blue annulus 2° in diameter (margin width = 0.345°) served as a distractor, while the paintings were shown in grayscale. The color of the distractor assured its visibility, whereas the annular form did not disturb the center of the fovea, covering only about 3.4% of the parafoveal region. Figure 1 shows an example of an image with a distractor.

Fig. 1 Example of an image with distractor. A Monk Approaching a Sleeping Colleague by Giuseppe Gambarini is reprinted with permission from the Dresden State Art Collections, Germany

Procedure

The subjects were informed that the purpose of the study was to investigate eye-movement patterns in the perception of art and were asked to study the images in order to be prepared for a subsequent recognition task. The subjects were aware of the presentation of the distractors but were instructed to ignore them. The experiment was run in four consecutive blocks, each containing 15 pictures, with a 5-min break after the second block. The full experiment took 90 min in total to complete. An initial nine-point calibration and validation was performed before the start of each block, and calibration was checked prior to every trial in the experiment. Subjects were initially given two study trials in order to get acquainted with the task.

The first distractor was always presented after an initial 5-s period of scene inspection. This initial phase without the experimental manipulation was included because systematic changes in fixation durations and saccadic amplitudes during the first few seconds of scene inspection have often been reported (e.g., Antes, 1974; Pannasch, Helmert, Roth, Herbold, & Walter, 2008; Unema et al., 2005). Once all 18 distractors (see below) had been shown, the image was replaced by a centrally presented cutout (size 100 × 100 pixels) belonging either to the trial image or to the image presented in a matched catch trial. Catch-trial images were either painted by the same artist or displayed a similar setting. The subjects had to judge whether the cutout was part of the previously seen image by clicking on-screen “yes” or “no” buttons using the mouse.

Distractors were presented during every fifth fixation in a trial. This presentation interval was selected in accordance with previous work (cf. Graupner et al., 2007; Pannasch & Velichkovsky, 2009) in order to ensure that distractors did not become predictable and to acquire enough unaffected fixations to create a baseline. Distractors were triggered by the fixation onset, with an onset latency of either 100 or 200 ms; they remained visible for 100, 200, or 300 ms and were centered on the coordinates of the actual fixation. For each trial, three distractors of each onset delay and duration combination were shown in a randomized order, resulting in a total of 18 distractors per image. If a fixation was terminated before reaching the predefined onset latency, the program waited for the next suitable fixation. Image presentation lasted until the respective number of distractors in each category were presented (on average, about 40 s).
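The distractor design for a single image (two onset latencies, three durations, three repetitions of each combination, randomized order) can be illustrated as follows. This is a sketch under stated assumptions: the function name and the use of a seeded `random.Random` instance are choices made for the example and do not describe the presentation software used in the experiment.

```python
import itertools
import random

LATENCIES_MS = (100, 200)       # distractor onset relative to fixation onset
DURATIONS_MS = (100, 200, 300)  # time the distractor remained visible
REPETITIONS = 3                 # repetitions per combination and image


def make_distractor_schedule(rng: random.Random):
    """Return a shuffled list of (latency_ms, duration_ms) pairs for one image."""
    conditions = list(itertools.product(LATENCIES_MS, DURATIONS_MS)) * REPETITIONS
    rng.shuffle(conditions)  # randomized order within the image
    return conditions


schedule = make_distractor_schedule(random.Random(1))
```

The schedule contains 2 × 3 × 3 = 18 entries, matching the number of distractors per image given in the text.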

Data analysis

Raw eye-movement data were preprocessed by removing fixations that occurred around eye blinks or outside the presentation screen. Only distracted fixations and the two adjacent nondistracted fixations (the latter serving as the baseline) were processed further. Experimentally manipulated fixations were removed if the actual distractor onset deviated from the predefined latency by more than 10 ms. To assure the comparability of the baseline with the distractor condition, fixations of a duration shorter than the respective distractor latency (100 or 200 ms) were excluded, leaving a total of 99,205 fixations (84%). Because of the positive skew in the distribution of fixation durations, median rather than mean values were used as a measure of central tendency. For statistical testing, the respective median values were subjected to repeated measures ANOVAs. In order to highlight possible mutual relationships, linear regression analyses were applied between fixations of different distractor durations (see below). Eta-squared values are reported as estimates of the effect size (Levine & Hullett, 2002).
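The baseline selection and the use of the median can be sketched as follows. The function name and the example durations are hypothetical, and the code covers only the latency-based exclusion, not the removal of blinks or off-screen fixations.

```python
from statistics import median


def baseline_median(durations_ms, latency_ms):
    """Median baseline duration after dropping fixations shorter than the latency."""
    # Fixations shorter than the distractor latency could never have received
    # a distractor, so they are excluded to keep the baseline comparable.
    kept = [d for d in durations_ms if d >= latency_ms]
    return median(kept)


# Invented durations (ms) with one long outlier to mimic the positive skew.
example = [80, 150, 220, 260, 900]
median_latency_100 = baseline_median(example, 100)
median_latency_200 = baseline_median(example, 200)
```

With these invented values, the stricter 200-ms cutoff yields a higher baseline median than the 100-ms cutoff, because more short fixations are excluded.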

Results and discussion

The subjects correctly judged whether cutout images matched the previous image with a score of 78% (SD = 6.6).

The first objective was to see whether the presentation of distractors led to distinct clusters of fixation durations. The second was to analyze the influences of distractor onset latency and distractor duration. The third was to compare the durations of fixations with distractors to those of baseline fixations, with regard to the influence of the preceding saccade amplitude.

The topmost histogram in Fig. 2a exemplifies the typical right-skewed distribution of unaffected fixations. The histograms in the subsequent rows show the fixation duration distributions influenced by distractors; each histogram represents a particular latency and duration combination. Figure 2b shows scatterplots of distractor and baseline fixations according to the distractor onset latency. Scatterplots for the distractor fixations are further subdivided by the distractor duration. For our later discussion, it should be mentioned here that baseline and distractor fixations covered the same range of durations.

Fig. 2 (a) Histograms of fixation durations for an exemplary baseline and the different distractor conditions (the baseline shown represents the latency 100 and duration 100 condition and fixations n – 1, where n equals the distractor presentation). In the histograms of the experimental conditions, the time of distractor presence is indicated in gray. (b) Scatterplots of fixation duration for each distractor latency and duration condition are shown, together with scatterplots for the baseline (BL) of each latency. The baselines represent the two adjacent fixations surrounding the distractor presentation

A clear influence of the experimental manipulation was found: Irrespective of the distractor latency and duration, the histograms in Figure 2a show a first dip 100–140 ms after the distractor onset and a second dip about 100–140 ms following the distractor offset. Accordingly, fixation distributions shift to the right if a distractor is shown; in other words, the fixation duration increases. Likewise, in the scatterplots of Fig. 2b, gaps in the fixation distributions represent the distractor influence. The timing of this inhibition is in line with previous reports (Graupner et al., 2007; Reingold & Stampe, 2004). The distribution of fixation durations can be characterized by three clusters: First, fixations that are terminated within 100 ms after the distractor onset compose the unaffected cluster, since there is not enough time for distractors to influence the fixation. Next, there are the fixations enclosed by the two gaps—the onset cluster—because those fixations are most likely affected only by the distractor onset (Fig. 2b). Finally, related to the distractor duration and therefore to the disappearance of the distracting event, a third, offset cluster was identified, containing fixations that were influenced both by the onset and offset of the distractor. The same subdivision can be recognized in the histograms of Fig. 2a; here, the clusters are separated by dips in the fixation distributions.
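One way to make the three-cluster description concrete is to assign each distracted fixation to a cluster from its duration and the distractor timing. The sketch below is an assumption-laden illustration: the fixed 100-ms lag is taken from the lower edge of the reported 100–140-ms dips, whereas in the data the clusters were identified from the dips in the observed distributions, not from a fixed cutoff.

```python
ASSUMED_LAG_MS = 100  # assumption: earliest time at which a visual event affects saccades


def assign_cluster(fixation_ms, latency_ms, distractor_ms, lag_ms=ASSUMED_LAG_MS):
    """Label a distracted fixation as 'unaffected', 'onset', or 'offset'."""
    onset_effect_ms = latency_ms + lag_ms                   # distractor onset + lag
    offset_effect_ms = latency_ms + distractor_ms + lag_ms  # distractor offset + lag
    if fixation_ms < onset_effect_ms:
        return "unaffected"  # ended before the onset could take effect
    if fixation_ms < offset_effect_ms:
        return "onset"       # affected by the onset only
    return "offset"          # affected by both onset and offset
```

For example, with a 100-ms latency and a 100-ms distractor, a 290-ms fixation falls into the onset cluster under this rule, whereas a 382-ms fixation falls into the offset cluster.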

The data shown in Fig. 2 suggest that the timing of the onset and offset clusters is influenced by the different distractor durations. To explore this relationship, differences between the distributions of the baseline and distractor fixation durations were calculated. The resulting difference (gray lines in Fig. 3) was either negative (decreased saccadic activity and fewer fixations than in the baseline) or positive (increased saccadic activity and more fixations than in the baseline). From each of the three aforementioned clusters, we selected the peaks of saccade activity (bullets in Fig. 3) along the difference lines to compute linear regressions (black lines in Fig. 3).
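The procedure just described (difference between two duration distributions, peak selection within a cluster, linear fit across distractor durations) might be sketched as below. The 20-ms bin width, the function names, and all example values are assumptions for illustration; the text does not state the bin width that was used.

```python
from collections import Counter

BIN_MS = 20  # hypothetical bin width


def histogram(durations_ms, bin_ms=BIN_MS):
    """Relative frequency per bin, keyed by the left bin edge in ms."""
    counts = Counter(int(d // bin_ms) * bin_ms for d in durations_ms)
    n = len(durations_ms)
    return {edge: c / n for edge, c in counts.items()}


def difference_curve(distractor_ms, baseline_ms, bin_ms=BIN_MS):
    """Distractor minus baseline frequencies; negative values mean fewer fixations."""
    h_d = histogram(distractor_ms, bin_ms)
    h_b = histogram(baseline_ms, bin_ms)
    edges = sorted(set(h_d) | set(h_b))
    return {e: h_d.get(e, 0.0) - h_b.get(e, 0.0) for e in edges}


def peak_in_window(diff, start_ms, end_ms):
    """Left edge of the bin with the largest difference within [start_ms, end_ms)."""
    window = {e: v for e, v in diff.items() if start_ms <= e < end_ms}
    return max(window, key=window.get)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

With one peak per distractor duration, each regression rests on only three points, so a fit across the three durations has a single residual degree of freedom.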

Fig. 3 Difference lines with saccade activity peaks and the resulting linear regression fits as a function of distractor latency and duration (see the text for details)

The fixation duration of the unaffected clusters remained constant across distractor durations at both onset latencies (latency 100: slope = –0.1, intercept = 160, R² = .25, p = .67; latency 200: slope = 0, intercept = 246, R² = 0, p = .90; Fig. 3, bottom regression lines in panels A and B). Fixation durations of the onset clusters as a function of distractor duration revealed a slight (but nonsignificant) increase for each distractor latency (latency 100: slope = 0.30, intercept = 260, R² = .75, p = .33; latency 200: slope = 0.30, intercept = 340, R² = .93, p = .33; Fig. 3, middle regression lines). For the offset clusters, a monotonic increase with respect to distractor duration was found for each distractor latency (latency 100: slope = 1.02, intercept = 280, R² = 1.0, p < .001; latency 200: slope = 1.05, intercept = 380, R² = 1.0, p < .001; Fig. 3, top regression lines). Although increased distractor duration to some extent influenced the onset cluster fixation duration, a significant effect was found only for the offset cluster. This one-to-one linear relationship for the offset cluster, meaning that each additional millisecond of distraction prolongs the fixation by about one millisecond, mirrors the findings from the scene onset delay paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009). We thus replicated the findings from the scene onset delay paradigm by presenting simple and irrelevant distractors. Since scene information remained stable, the findings can hardly be explained by the presence or absence of scene information; therefore, we interpret the prolongation of fixation durations as being caused by the onset and duration of the distractor.

To alternatively examine the distractor effect with respect to the different distractor latencies and durations, baseline fixation durations were subtracted from the fixation durations of the distractor conditions; the resulting difference values represent the prolongation of the fixation due to the shown distractor (Graupner et al., 2007). Additionally, distractor fixations were classified as either ambient or focal on the basis of the preceding saccade amplitude (Pannasch & Velichkovsky, 2009).

Difference values were entered into a 2 (latency: 100, 200) × 3 (duration: 100, 200, 300) × 2 (fixation type: ambient, focal) repeated measures ANOVA, which revealed significant main effects of latency, F(1, 18) = 41.47, p < .001, η² = .68, and fixation type, F(1, 18) = 14.21, p = .001, η² = .15, as well as a significant interaction for latency and fixation type, F(1, 18) = 5.71, p = .028, η² = .05. No reliable effects were found for duration, F(2, 36) = 2.81, p = .073. Regarding the main effect of latency, the difference values were larger if distractors appeared 100 ms after the fixation onset (latency 100, 86.3 ms; latency 200, 41.5 ms). Considering the influence of the fixation type, larger difference values were obtained for focal fixations (focal, 74.3 ms; ambient, 53.4 ms), replicating our previous findings (Pannasch & Velichkovsky, 2009). The interaction was based on the fact that difference values for the 200-ms latency condition differed by about 33 ms (focal, 57.9 ms; ambient, 25.1 ms), whereas those for the 100-ms latency condition differed only by about 9 ms (focal, 90.8 ms; ambient, 81.8 ms; see Fig. 4a).

Fig. 4 Mean differences and standard errors in relation to the fixation type for both distractor latencies (a), and ratios of focal to ambient fixations for both distractor latencies in each cluster (b)

Two different explanations are possible for the fact that the difference values are smaller for larger distractor latencies. Distractions appearing late within a fixation might be less influential because visual information is processed only during the beginning of a fixation. During later stages, mental activities associated with the processing of the fixated information and the programming of the next saccade take place. These processes should be less susceptible to visual disruptions. Although support for this idea comes from reading research (Morrison, 1984; Rayner, Liversedge, White, & Vergilino-Perez, 2003; Rayner & Pollatsek, 1992), it contradicts various studies that have used the distractor paradigm, where similar influences of the distractors at different times within the affected fixation have been reported (Pannasch et al., 2001; Reingold & Stampe, 2000, 2002). Possible spillover effects of the distractor appearance could be rejected, since comparing durations of fixation n – 1 and n + 1 relative to the distractor fixation with a paired t-test revealed no differences, t = 1.82, p = .086. Therefore, the alternative explanation is based on the baseline correction. As already mentioned, the baseline contains only fixations with durations above the distractor latency. Accordingly, the difference value in the 200-ms latency condition might be smaller because of the increased baseline fixation duration. This interpretation is also supported by the dips in the histograms and the gaps in the scatterplots in Fig. 2, which show similar influences of distractors of both latencies. We will come back to this issue in discussing the results of Experiment 2.

Finally, we examined whether the clusters we identified revealed different proportions of focal and ambient fixations. Since focal fixations are more susceptible to visual changes (Pannasch & Velichkovsky, 2009), we anticipated that greater proportions of focal fixations would be found in the onset and offset clusters. To answer this question, the ratio of the number of focal fixations to the number of ambient fixations was calculated; ratio values >1 indicate more focal than ambient fixations, whereas ratio values <1 represent the opposite. The distractor duration was ignored because of the nonsignificant findings in the previous analysis. Ratio values were entered into a 2 (latency: 100, 200) × 3 (cluster: unaffected, onset, offset) repeated measures ANOVA, revealing significant main effects of latency, F(1, 17) = 9.38, p = .007, η² = .23, and cluster, F(2, 34) = 16.56, p < .001, η² = .67, as well as a significant interaction, F(2, 34) = 5.56, p = .008, η² = .10. Altogether, more focal fixations were observed, as evidenced by ratio values above 1 (see Fig. 4b). This finding is in line with previous reports (e.g., Pannasch et al., 2008; Unema et al., 2005; Wedel, Pieters, & Liechty, 2008). More focal fixations occurred in the 200-ms latency condition (latency 100, 3.34; latency 200, 4.44), which likely resulted from our selection procedure. Focal processing is likely related to short saccadic amplitudes and long fixation durations (Velichkovsky et al., 2005); the exclusion of fixations of less than 100 or 200 ms might have an effect on ambient rather than focal fixations, which could result in an overrepresentation of focal fixations in the 200-ms latency condition.
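The ratio measure can be written out explicitly. The following is a minimal sketch with hypothetical function names and input format (pairs of cluster label and fixation type); it is not the original analysis code.

```python
from collections import Counter


def focal_to_ambient_ratio(fixation_types):
    """Number of focal fixations divided by the number of ambient fixations."""
    counts = Counter(fixation_types)
    if counts["ambient"] == 0:
        raise ValueError("ratio is undefined without ambient fixations")
    return counts["focal"] / counts["ambient"]


def ratios_by_cluster(fixations):
    """fixations: iterable of (cluster, fixation_type) pairs."""
    by_cluster = {}
    for cluster, fixation_type in fixations:
        by_cluster.setdefault(cluster, []).append(fixation_type)
    return {c: focal_to_ambient_ratio(types) for c, types in by_cluster.items()}
```

A ratio of 3.0, for instance, corresponds to three focal fixations for every ambient fixation in that cluster.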

Regarding the main effect of cluster, we obtained the expected increase in the proportion of focal fixations from the unaffected to the offset cluster (unaffected cluster, 2.99; onset cluster, 3.49; offset cluster, 5.18). This was confirmed by Bonferroni-corrected post hoc testing for the differences between the offset and unaffected clusters as well as for the offset and onset clusters, both ps < .005. Furthermore, the interaction revealed similar proportions of focal fixations for both latencies in the unaffected cluster (latency 100, 2.91; latency 200, 3.07), but the proportions differed if fixations were successfully manipulated (onset cluster: latency 100, 2.87; latency 200, 4.11; offset cluster: latency 100, 4.23; latency 200, 6.13; see Fig. 4b). With paired t-tests, we compared the ratio values of both latencies for the particular cluster and obtained significance for the onset and offset clusters, t > 2.65, all ps < .05, but not for the unaffected cluster, p > .05.

Taken together, the distractor effect was observed for both latencies, and fixation prolongation was greater for fixations preceded by short saccades. While the distractor duration was of negligible influence, we attribute the latency differences to the selection criteria in the baseline, lowering the general effect. Moreover, the proportion of focal fixations increased from the unaffected to the offset cluster and was largest in the offset cluster.

The aim of our next experiment was to explore the consequences of the full-display changes, similar to those used in the scene onset delay paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009). We investigated different types of time-locked changes occurring either early or late in the process of scene exploration. The introduction of this particular variable was motivated by the difference in the durations of fixations at the early and late stages of scene perception (see Pannasch et al., 2008; Unema et al., 2005).

Experiment 2

Method

Subjects

Twenty-five healthy volunteers (18 female) from the Technische Universität Dresden, with a mean age of 23.4 years (range, 19–40 years), took part in this experiment. All subjects had normal or corrected-to-normal vision, provided signed informed consent, and received course credit for participation in the study, which was conducted in conformity with the Declaration of Helsinki.

Apparatus

The same apparatus was used as in Experiment 1.

Stimuli and design

Subjects were presented with 99 digitized paintings by seventeenth- to nineteenth-century European artists; the stimuli were 1,024 × 768 pixels in size and had 24-bit color depth. The database of paintings from Experiment 1 was extended by 39 images of the same styles and periods to provide a sufficient number of trials for the purpose of this experiment. During each trial, an image was shown for 15 s but was interrupted either 1,500 or 5,000 ms after the trial start by the presentation of a random pixel mask for 300, 1,000, or 3,000 ms, followed by the reappearance of the original image. Interruption delays and durations were randomly assigned but equally distributed. At the end of each trial, a cutout of 100 × 100 pixels, taken either from the previous image or from a matched catch-trial image, was shown in the center of the screen. By pressing designated keyboard buttons, subjects had to classify each cutout as being a valid or nonvalid part of the previously seen image.

Procedure

Each subject was informed that the purpose of the study was to investigate eye-movement patterns in the perception of art. Subjects were asked to study the images in preparation for the subsequent recognition task. After performing a nine-point calibration and validation, three study trials were given before the start of the experimental phase. Before each trial, a calibration check was performed. In total, the session took 45 min to complete.

Data analysis

The same preprocessing procedures were applied as in Experiment 1.

Results and discussion

The subjects correctly judged whether the cutout images matched the previous image with a score of 73% (SD = 8.2).

In contrast to Experiment 1, no gaze-contingent experimental manipulations were used. Two display changes appeared, time-locked within each trial: replacement of the image by the random pixel matrix (start of the interruption, henceforth scene offset) and replacement of the random pixel matrix by the original image (end of the interruption, henceforth scene reonset). Since the changes were independent of gaze behavior, they could occur at any time within a fixation. To define an appropriate time interval for the analyses, we used the 75th percentile of the fixation duration of all affected fixations (= 688 ms), resulting in a time window of –700 to 0 ms before a display change (22% of the total data, N = 26,854).
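The selection of fixations around a time-locked display change can be illustrated as follows. This sketch assumes that a fixation is selected when it starts within the 700-ms window before the change, whether or not it is still ongoing when the change occurs; the record type and names are hypothetical.

```python
from dataclasses import dataclass

WINDOW_MS = 700  # rounded from the 75th percentile (688 ms) given in the text


@dataclass
class Fixation:
    """Hypothetical record; times are in ms relative to trial start."""
    onset_ms: float
    duration_ms: float


def fixations_in_window(fixations, change_ms, window_ms=WINDOW_MS):
    """Return (fixation, delay_ms) pairs for fixations starting in the window.

    delay_ms is the time from fixation onset to the display change.
    """
    selected = []
    for fix in fixations:
        delay_ms = change_ms - fix.onset_ms
        if 0 <= delay_ms <= window_ms:
            selected.append((fix, delay_ms))
    return selected
```

Plotting fixation duration against the delay value then gives one point per selected fixation, which is the kind of representation that a scatterplot of duration by change time requires.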

Figure 5 shows fixations plotted as a function of the time remaining before the onset of the display change. Each scatterplot in the figure represents a certain combination of type, delay, and duration of the display change. Although they generally look quite similar, an obvious difference was found for the scene reonset after a change duration of 300 ms (the two top-right panels): Only few fixations were found in the interval between 100 and 200 ms, which can be attributed to the saccadic inhibition caused by the previous scene offset. Moreover, fewer cases are present in the scene reonset condition, particularly for the longer change durations. This is also attributable to the previous display change: Since the preceding fixations were prolonged, the number of subsequent fixations is reduced.

Fig. 5 Fixation durations as a function of display change relative to the fixation onset, according to change type, change onset, and change duration. For ease of reference, the y-axis is cut off at 1,100 ms (longer fixations did occasionally occur)

The scatterplots in Fig. 5 reveal, for each distribution, a pronounced diagonal gap, dividing the distributions into an upper and a lower cluster of fixations. This gap is always present, regardless of whether the change happened early or late within a fixation. Whereas fixations in the lower cluster are terminated before the change, fixations in the upper cluster are prolonged by the visual change. We tested the linearity of cluster membership by assigning fixations to 40-ms bins along the x-axis and calculating modal values for each bin and cluster, as shown in Fig. 6. Whereas fixations in the lower clusters remained mostly around the same duration (~200 ms), fixations in the upper cluster were significantly prolonged. This is also reflected by the parameters of the linear regressions: Slopes in the lower clusters approached 0, but most of the slopes for the upper clusters were close to 1, demonstrating a one-to-one linear relationship for time of display change and fixation duration for both scene changes (see Table 1). This linear relationship replicates earlier findings (Pannasch et al., 2001; Reingold & Stampe, 2000, 2002) and clarifies the latency issue we discussed regarding the results of Experiment 1: At any time in a fixation, a distraction results in a prolongation effect very similar to the one we found in Experiment 1. Therefore, the discrepancies in the previous experiment obtained between both latencies when analyzing the difference values can be attributed to the baseline correction.
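The binning and regression step can be sketched as below for the points of a single cluster. The 40-ms bins along the x-axis follow the text; the 20-ms resolution used to estimate the modal duration, the function names, and the synthetic data in the usage note are assumptions for illustration.

```python
from collections import Counter, defaultdict

X_BIN_MS = 40     # bin width along the x-axis, as stated in the text
MODE_BIN_MS = 20  # hypothetical resolution for estimating the mode


def modal_duration(durations_ms, bin_ms=MODE_BIN_MS):
    """Center of the most frequent duration bin."""
    counts = Counter(int(d // bin_ms) for d in durations_ms)
    top_bin, _ = counts.most_common(1)[0]
    return (top_bin + 0.5) * bin_ms


def modes_per_bin(points, x_bin_ms=X_BIN_MS):
    """points: (delay_ms, duration_ms) pairs of one cluster -> (bin center, mode)."""
    by_bin = defaultdict(list)
    for delay_ms, duration_ms in points:
        by_bin[int(delay_ms // x_bin_ms)].append(duration_ms)
    return [((b + 0.5) * x_bin_ms, modal_duration(ds)) for b, ds in sorted(by_bin.items())]


def fit_line(xy):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(xy)
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    sxx = sum((x - mx) ** 2 for x, _ in xy)
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    slope = sxy / sxx
    return slope, my - slope * mx
```

If every fixation in a synthetic upper cluster ended a fixed 120 ms after the change, `fit_line(modes_per_bin(...))` would return a slope of 1, whereas a synthetic lower cluster with a constant duration would give a slope of 0. This is the pattern that the regression parameters are meant to distinguish.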

Fig. 6. Regression lines fit to the two modes in each 40-ms bin of display change relative to the fixation onset, according to change type, change onset, and change duration.

Table 1. Results of the regression analyses for change type, change onset, and change duration

It should be mentioned that not only the scatterplots of fixations in Fig. 5, but also the results for the regression analyses in Fig. 6, were highly similar to the findings by Henderson and Pierce (2008, see their Fig. 2a, b) and Henderson and Smith (2009, see their Figs. 2 and 4). The positive slope for fixation durations in the upper cluster was considered evidence for the direct control hypothesis. The authors concluded that the prolongation of fixations in the upper cluster was due to the absence of scene information (the offset happened during the preceding saccade), because the scene analysis therefore required more time. This explanation, however, cannot be applied to the present data: In 99.7% of the cases, the display changes (i.e., the disappearance or reappearance of the scene information) occurred during fixations. According to the mixed control hypothesis described by Henderson and colleagues (Henderson & Pierce, 2008; Henderson & Smith, 2009), differential influences on a fixation could be predicted for the offset versus the reonset of a scene. Whereas a scene offset should result in an immediate fixation termination, a fixation prolongation should be the result of the scene reonset. In contrast to that, both change types resulted in very similar fixation behaviors (see Figs. 5 and 6, Table 1).

To examine the distractor effect, we subtracted for each subject the fixation durations of the lower cluster (i.e., nondistracted fixations) from those of the upper cluster (i.e., distracted fixations). The obtained difference values were entered into a 2 (change type: scene offset, reonset) × 2 (onset delay: 1,500, 5,000) × 3 (change duration: 300, 1,000, 3,000) repeated measures ANOVA. Data analysis revealed significant main effects of change type, F(1, 24) = 56.47, p < .001, η² = .77, and change duration, F(2, 48) = 8.00, p = .001, η² = .15. No effect was obtained for onset delay, F(1, 24) < 1, and there was no significant interaction. Regarding the main effect of change type, smaller difference values were found for the scene offset (scene offset, 137 ms; scene reonset, 345 ms), suggesting a less pronounced distractor effect for the disappearance of a scene. The main effect of change duration revealed an increase in the distractor effect for longer changes (300-ms duration, 184 ms; 1,000-ms duration, 244 ms; 3,000-ms duration, 295 ms). Post hoc comparisons revealed significant differences between the durations (p < .05), except for 1,000 and 3,000 ms. In the case of the scene reonset condition, this increase, which is very similar to the findings of Henderson and colleagues (Henderson & Pierce, 2008; Henderson & Smith, 2009), supports the direct control hypothesis. The longer the scene reappearance was delayed, the longer the distracted fixations became.
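The per-subject difference values can be sketched as follows. This is an illustration only: it assumes that each cluster is summarized by its mean duration (the text does not state which summary statistic was used), and the function name `difference_values` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def difference_values(fixations):
    """Upper-minus-lower cluster difference per subject and condition.

    fixations: iterable of (subject, condition, cluster, duration_ms),
    where cluster is "upper" (distracted) or "lower" (nondistracted).
    Returns {(subject, condition): mean(upper) - mean(lower)}.
    Using the mean as the cluster summary is an assumption of this sketch.
    """
    cells = defaultdict(lambda: {"upper": [], "lower": []})
    for subject, condition, cluster, duration in fixations:
        cells[(subject, condition)][cluster].append(duration)

    result = {}
    for key, clusters in cells.items():
        # A difference is only defined when both clusters are populated.
        if clusters["upper"] and clusters["lower"]:
            result[key] = mean(clusters["upper"]) - mean(clusters["lower"])
    return result
```

The resulting values, one per subject and design cell, are the kind of input a repeated measures ANOVA would operate on.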

Figure 7a shows the relationship between change type and change duration. For the scene offset and the 300-ms change duration, the difference value (81 ms) is similar to the difference value of the 100-ms latency in Experiment 1, but for the two longer change durations, we obtained difference values twice as high (1,000-ms duration, 175 ms; 3,000-ms duration, 155 ms). This increase for longer change durations supports direct control of fixations (e.g., Henderson & Smith, 2009): Visual input is expected in some waiting state for a certain time interval; once the interval is exceeded, the fixation is terminated. This time interval can be understood as a sum of the baseline duration (about 330 ms) and the difference value (155–175 ms), resulting in approximately 500 ms. This could explain the difference between the short and the two longer change durations. However, for the scene reonset, the prolongation effect is much stronger. In some cases, the offset and reonset took place during one fixation, a distraction very similar to the manipulation in Experiment 1. Excluding the fixations that were affected twice (offset and reonset) reduced the gap between scene offset and reonset but did not eliminate the difference (see Fig. 7b). The filtered data were entered into a 2 (change type: scene offset, reonset) × 3 (change duration: 300, 1,000, 3,000 ms) repeated measures ANOVA, which revealed significant main effects of change type, F(1, 24) = 24.76, p < .001, η² = .36, and change duration, F(2, 48) = 17.08, p < .001, η² = .44, as well as a significant interaction, F(2, 48) = 8.93, p = .001, η² = .20.
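The exclusion of fixations that were affected twice can be illustrated with a short sketch. It assumes that each fixation is represented by its start and end time on the same clock as the display changes, and that a fixation counts as affected twice when both the scene offset and the scene reonset fall within it. The function names are hypothetical.

```python
def affected_twice(fix_start, fix_end, offset_time, reonset_time):
    """True if both the scene offset and the reonset occur within the fixation."""
    return (fix_start <= offset_time <= fix_end
            and fix_start <= reonset_time <= fix_end)


def filter_single_change(fixations, offset_time, reonset_time):
    """Keep only fixations (start_ms, end_ms) hit by at most one display change."""
    return [
        (start, end)
        for start, end in fixations
        if not affected_twice(start, end, offset_time, reonset_time)
    ]
```

Such double hits can only occur when the change duration is short relative to the fixation, for example with a 300-ms change and a fixation lasting longer than 300 ms.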

Fig. 7. (a) Mean differences and standard errors for change type and change duration, (b) corrected for fixations that were affected by both scene changes, and (c) in relation to the fixation types for both change types. (d) Ratios for fixations in the two clusters and in relation to both change types.

From the present data, we propose that in addition to saccadic inhibition caused by the scene change, other mechanisms are at work. We will elaborate on this proposal in the General Discussion section.

Next, we tested whether the assumed differences in the modes of attention also contributed to the present findings. As in Experiment 1, we distinguished between ambient and focal fixations, based on the amplitude of the preceding saccade. To provide a database with n ≥ 10 in each single cell, we collapsed the factors Change Onset and Change Duration. We conducted a 2 (change type: scene offset, reonset) × 2 (fixation type: ambient, focal) repeated measures ANOVA. Significant main effects were obtained for change type, F(1, 24) = 58.41, p < .001, η² = .94, and fixation type, F(1, 24) = 10.44, p = .004, η² = .06, but there was no interaction. As was expected, the difference values were again smaller for the scene offset (scene offset, 138 ms; scene reonset, 300 ms). Focal fixations were more sensitive to the scene change, expressed by a larger difference value (focal, 249 ms; ambient, 198 ms). Since there was no interaction, the proposed distinction of the two attentional modes is similar for both change types, which is also shown in Fig. 7c.
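The amplitude-based classification can be sketched in a few lines. The 5° criterion follows the description given earlier in the article; how a saccade of exactly 5° is treated is not specified there, so counting it as focal is an assumption of this sketch, as are the names used.

```python
PARAFOVEAL_LIMIT_DEG = 5.0  # separation criterion in degrees of visual angle


def fixation_type(preceding_amplitude_deg, limit=PARAFOVEAL_LIMIT_DEG):
    """Label a fixation by the amplitude of the saccade that preceded it.

    Amplitudes above the limit are labeled "ambient", all others "focal".
    Treating exactly 5 degrees as focal is an assumption of this sketch.
    """
    return "ambient" if preceding_amplitude_deg > limit else "focal"


def label_fixations(amplitudes_deg):
    """Apply fixation_type to a sequence of preceding saccade amplitudes."""
    return [fixation_type(a) for a in amplitudes_deg]
```

For instance, `label_fixations([1.2, 7.5])` labels the first fixation as focal and the second as ambient.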

Finally, to analyze the proportions of focal and ambient fixations in the different clusters, the ratio of the number of focal fixations to the number of ambient fixations was calculated for both change types and both clusters. Due to the non-gaze-contingent manipulation, in contrast to Experiment 1, no shifts in the proportion of ambient and focal fixations were expected. Ratio values were entered into a 2 (change type: scene offset, reonset) × 2 (cluster: upper, lower) repeated measures ANOVA. Significant effects were found for the factors Change Type, F(1, 24) = 30.71, p < .001, η² = .74, and Cluster, F(1, 24) = 5.88, p = .023, η² = .15, as well as for their interaction, F(1, 24) = 5.40, p = .029, η² = .11. Regarding the main effects, we obtained fewer focal fixations around the scene offset (scene offset, 1.66; scene reonset, 2.30) but an overall larger proportion of focal fixations in the upper cluster (upper, 2.12; lower, 1.83). According to the significant interaction, there was a reliable difference in the scene reonset condition (upper cluster, 2.56; lower cluster, 2.03), but for the scene offset, the proportions remained stable (upper cluster, 1.67; lower cluster, 1.64); see Fig. 7d. Post hoc paired t-tests revealed significance only for scene reonset differences, t = 2.98, p = .007, not for the scene offsets, p > .05.
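The ratio computation can be sketched as follows, assuming that every fixation has already been labeled with its change type, cluster, and fixation type. The function name `focal_ambient_ratios` and the tuple layout are hypothetical; in the analysis reported here the ratios would additionally be computed per subject.

```python
from collections import Counter


def focal_ambient_ratios(fixations):
    """Ratio of focal to ambient fixation counts per design cell.

    fixations: iterable of (change_type, cluster, fixation_type) tuples,
    with fixation_type being "focal" or "ambient".
    Returns {(change_type, cluster): n_focal / n_ambient}.
    Cells without ambient fixations are skipped (ratio undefined).
    """
    counts = Counter(fixations)
    cells = {(change_type, cluster) for change_type, cluster, _ in counts}

    ratios = {}
    for change_type, cluster in cells:
        n_ambient = counts[(change_type, cluster, "ambient")]
        if n_ambient:
            n_focal = counts[(change_type, cluster, "focal")]
            ratios[(change_type, cluster)] = n_focal / n_ambient
    return ratios
```

A ratio above 1 indicates that focal fixations outnumber ambient fixations in that cell.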

As in Experiment 1, a larger proportion of focal fixations was again observed (see also Pannasch et al., 2008; Unema et al., 2005; Wedel et al., 2008). Although the proportion remained stable across clusters for the scene offset, it was evidently influenced by the disappearance of the visual information, since more focal fixations were found around the scene reonset, especially in the upper cluster (Fig. 7d).

General discussion

The influence of visual changes on the duration of individual fixations was investigated. In Experiment 1, fixations were analyzed when gaze-contingent distractors of different onset latencies and durations appeared as rings around the center coordinates of a fixation, covering a small proportion of the scene. In Experiment 2, we examined fixation durations in situations where the image was replaced by a random pixel matrix—that is, an offset of useful scene information—and vice versa—corresponding to an onset of useful scene information. Of particular interest for our analysis was the mixed—direct and indirect—control hypothesis of fixation duration. The hypothesis is based on data reflecting two distinct subpopulations of fixations obtained with the scene onset delay paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009). We explored this hypothesis from the perspective of different modes of visual attention, as evidenced by a correlation between eye-movement patterns and performance in continuous visual tasks (Velichkovsky et al., 2005).

Provided that fixations are generally or at least in part under indirect control, no particular relation between distractor parameters and the duration of the affected fixations was expected (Morrison, 1984; Rayner & Pollatsek, 1981). The results of Experiment 1 do not support this hypothesis, since we obtained clear time-locked effects of the distractor onset and offset on the duration of fixations (Fig. 2), thereby supporting the direct control approach.

According to direct control theories, the duration of a fixation represents the time required for perceptual and cognitive processing, together with the planning of the next saccade. Although the point during a fixation in which the decision about the next saccade is made is still an open question (Rayner, 2009), for the present purpose it is sufficient to outline four possible alternatives: The decision is made (1) after the fixation onset, (2) in the middle of a fixation, (3) before the fixation is terminated, or (4) in parallel with all other processes that are completed during a fixation. Manipulating visual information during a fixation is likely to influence the processing of such information, but should be less influential for the saccade programming itself. Considering only the results of Experiment 1, Alternative 3 might seem a promising candidate, but according to Experiment 2, which covers a broader range of distractions within fixations, this does not seem to be the case. Distractors were found to have the same influence at any time within a fixation, which rules out Alternatives 1, 2, and 3. We are therefore in favor of Alternative 4, which assumes that scene information is processed while saccade planning is executed in parallel, as has been put forward in direct control theories (Morrison, 1984).

Our main findings here corroborate treating the prolongation of fixations as a manifestation of the orienting response (Graupner et al., 2007). Despite differences in the timing, size, and relevance of the distractors in both experiments, a dip in fixation distributions always appeared roughly 120 ms after a distractor or display change. On the basis of these dips in the fixation distribution, different clusters were identified. Connecting the peaks of the respective clusters by linear regression analyses revealed a monotonic increase of fixation durations in the upper (or “offset”) cluster as a function of the distractor duration. The slopes of the regression lines reflect, in a one-to-one manner, the timing of the display change. This relationship, found in the data of both our experiments, indicates that the prolongation of fixations is related to the change per se. Linearity was also observed for the regression of the lower cluster, but here the slopes were close to zero. This implies that these fixations were terminated before the fixation could be influenced (see Footnote 2). The question is then whether those fixations are under indirect control, as suggested by Henderson and colleagues (Henderson & Pierce, 2008; Henderson & Smith, 2009). Data from Experiment 1 can contribute to this discussion. According to the two dips in the distributions of fixation durations, one after distractor onset and another following distractor offset, three subpopulations of fixations were identified. While the observation of lower and upper clusters agrees with the assumptions of the mixed control hypothesis, it is difficult to explain the existence of the middle (or “onset”) cluster within this framework. These fixations are clearly under the direct control of the distractor onset. Accordingly, the occurrence of all three clusters can be explained in terms of direct control mechanisms.

Taken together, the present study of the distractor effect replicated previous results obtained with the scene onset delay paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009), even without the scene onset delay itself. The existence of distinct subpopulations of fixations was demonstrated by manipulating a part of the parafoveal information (Exp. 1) and by changing the full scene information (Exp. 2) at any time during the fixation. Therefore, the results cannot be reduced solely to the availability of useful visual scene information. Whenever a change happens within a fixation—regardless of its nature and/or importance for the ongoing information processing—an inhibitory mechanism reduces the probability of programming and executing the next saccade. Due to the observed direct relation between distracting events and saccadic inhibition, the present results support the direct control hypothesis.

Henderson and Pierce (2008) rejected saccadic delay (or inhibition) as a possible explanation of their results because of the occurrence of increased fixation durations for longer delays in the lower cluster. They argued that those unaffected fixations were lengthened by the absence of visual information, relative to zero and short delays. Following the predictions of the mixed control model, this is contradictory, because unaffected fixations (i.e., fixations in the lower cluster) are rather indirectly controlled and should therefore not be influenced by the presence or absence of visual information. Furthermore, a closer look at the fixation distributions in Henderson and Pierce (2008) and Henderson and Smith (2009) reveals that it is hardly possible to find differences between the zero-delay condition and the lower population for the longer delays, presumably due to the form of presenting the baseline (cf. the baseline histograms and scatterplots in Fig. 2).

The results of Experiment 2 allow for different explanations. The first is that the scene offset, in addition to eliciting saccadic inhibition, activates a waiting state in anticipation of further visual information. As discussed earlier, we assume about 500 ms as the maximum length for the waiting state (baseline plus difference value; see the lower lines in Fig. 7a, b). The scene reonset resembles the scene onset delay paradigm (e.g., Henderson & Smith, 2009): The scene reappearance also elicits saccadic inhibition, but a further prolongation (see Fig. 7b, upper line) is necessary to explore the “new” visual information (see, e.g., van Diepen & d’Ydewalle, 2003). Such processes may be required only subsequent to long interruptions, explaining the larger difference values in the 3,000-ms duration condition. The second explanation simply refers to the fact that fixation length in the lower cluster is limited by change duration (i.e., saccadic inhibition due to the scene reonset). Increasing change durations make it more likely that longer fixations will be observed. Further studies will contribute to this discussion, but the present results provide evidence that the gap in the fixation distributions is due to the visual change per se.

Finally, we assigned fixations to either an ambient or a focal processing mode, based on the amplitude of the preceding saccade, using the conventional radius of the parafoveal region (5° of visual angle) as a separation criterion (Velichkovsky et al., 2005). To investigate how individual fixations are controlled, we examined difference values with regard to the preceding saccade amplitude, and the ratio of focal to ambient fixations within each cluster.

Regarding the difference values of fixation durations, both experiments revealed a stronger influence of visual distraction for fixations attributed to the focal mode. These results extend earlier observations by Pannasch and Velichkovsky (2009) by showing similar fixation behavior for non-gaze-related changes in Experiment 2.

In line with earlier reports (e.g., Unema et al., 2005; Wedel et al., 2008), we observed a larger proportion of focal fixations. The overall difference in the ratio values between both experiments can be explained by the time course of fixation durations in scene perception: Early in scene inspection—the time interval analyzed in Experiment 2—a dominance of ambient processing has been found, whereas distractors in Experiment 1 were shown during a later period, when focal processing prevailed (Pannasch et al., 2008). The fact that the largest fraction of focal fixations was found in the offset cluster of Experiment 1 confirms that focal fixations are more susceptible to visual distraction. Additionally, we observed a difference regarding the onset latencies (see Fig. 4b), reflecting an increased likelihood of focal fixations for longer fixation durations (Velichkovsky et al., 2005). In the non-gaze-related scene offset in Experiment 2, we observed nearly equal ratios in both clusters. In addition, as a result of the scene offset, a shift to more focal fixations was predicted. The larger proportion of focal fixations for the scene reappearance confirms this shift, particularly in the upper cluster. This again supports the assumption that focal fixations are more susceptible to distracting events.

The idea that eye-movement behavior can be used to identify the mode of visual attention can be extended to previous studies that analyzed visual task performance (Velichkovsky, 2002; Velichkovsky et al., 2005) and relationships between saccadic amplitudes and fixation durations in the perception of complex images (Tatler & Vincent, 2008; Unema et al., 2005). This view agrees with the two-attentional-networks approach: Changes in the environment require increased activity in the ventral frontoparietal network of visual attention by interrupting the ongoing selection in the dorsal network (Corbetta et al., 2008). One can expect, therefore, that the processing and suppressing of the distractors requires special efforts, reflected in increased activity of the brain’s attentional mechanisms (Graupner, Pannasch, & Velichkovsky, 2011; Hickey, Di Lollo, & McDonald, 2009).

Our results demonstrate that virtually any form of visual distraction during a fixation prolongs its duration. If two distracting events occur within the same fixation, three subpopulations of fixations can be identified. This finding suggests exclusively direct control mechanisms for fixation durations, and thereby challenges the mixed control hypothesis. An alternative approach is to ask whether a fixation is more under the control of the ambient or focal mode of attention, on the basis of analysis of the amplitudes of neighboring saccades. This distinction not only differentiates between groups of affected and unaffected fixations but also allows for identifying the underlying processing mechanisms of an individual fixation, even within the subpopulations in question.