On the control of visual fixation durations in free viewing of complex images

Pannasch, Sebastian; Schulz, Johannes; Velichkovsky, Boris M.

doi:10.3758/s13414-011-0090-1

Published: 27 January 2011

Volume 73, pages 1120–1132, (2011)

Attention, Perception, & Psychophysics

Abstract

The mechanisms for the substantial variation in the durations of visual fixations in scene perception are not yet well understood. During free viewing of paintings, gaze-contingent irrelevant distractors (Exp. 1) and non-gaze-related time-locked display changes (Exp. 2) were presented. We demonstrated that any visual change—its onset and offset—prolongs the ongoing fixation (i.e., delays the following saccade), strongly suggesting that fixation durations are under the direct control of the stimulus information. The strongest influence of distraction was observed for fixations preceded by saccades within the parafoveal range (<5° of visual angle). We assume that these fixations contribute to the focal in contrast to the ambient mode of attention (Pannasch & Velichkovsky, Visual Cognition, 17, 1109–1131, 2009; Velichkovsky, Memory, 10, 405–419, 2002). Recent findings about two distinct “subpopulations of fixations,” one under the direct and another under the indirect control of stimulation (e.g., Henderson & Smith, Visual Cognition, 17, 1055–1082, 2009), are reconsidered in view of these results.

It is known that visual fixation durations in continuous visual tasks such as reading and scene perception vary from less than 100 ms to several seconds, thereby resulting in a positively skewed distribution, with modal values between 200 and 350 ms (Rayner, 1998; Unema, Pannasch, Joos, & Velichkovsky, 2005). Durations vary a great deal from one fixation to the next (Buswell, 1935; Stratton, 1906; Yarbus, 1967). It has been suggested that fixation durations are determined by information processing (Groner & Groner, 1989; Just & Carpenter, 1980), cognitive processes (Shebilske, 1975), and eye-movement preprogramming (Buswell, 1935; Zingale & Kowler, 1987).

Visual attention is among these factors, as has been shown in a number of studies (e.g., Brockmole & Boot, 2009; Pannasch & Velichkovsky, 2009). In particular, we found that a combination of relatively long (>180-ms) fixations surrounded by small-amplitude saccades (i.e., saccades within the parafoveal range of 5°) significantly improves both recognition of foveated picture fragments (van der Linde, Rajashekar, Bovik, & Cormack, 2009; Velichkovsky, Joos, Helmert, & Pannasch, 2005) and reaction to sudden hazardous events in a simulated dynamic environment (Velichkovsky, Rothert, Kopf, Dornhoefer, & Joos, 2002). We attributed this difference in performance to the different modes of visual attention—focal versus ambient—and supposed that the underlying brain mechanisms are localized in ventral and dorsal parts of posterior cortex (Corbetta, Patel, & Shulman, 2008; Milner & Goodale, 2008). Since good visual acuity is limited to the parafoveal region (i.e., 5° of visual angle; see, e.g., Wyszecki & Stiles, 1982), it can be assumed that subsequent fixations remaining within this region are rather related to the processing of details and identification of objects (Velichkovsky et al., 2005). In contrast, visual fixations subsequent to large-amplitude saccades (>5°) are likely to be involved in processing of information about the spatial arrangement of rather undifferentiated visual “blobs” (cf. Trevarthen, 1968). This allows for classifying fixations on the basis of the prior saccadic amplitude: If the preceding amplitude is larger than 5°, this fixation is presumably in the service of the ambient attention mode, whereas for preceding saccadic amplitudes smaller than 5°, the fixation is assumed to belong to the focal attention mode (cf. Pannasch & Velichkovsky, 2009).
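The classification rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example amplitudes are hypothetical, and the handling of an amplitude of exactly 5° (assigned to the focal class here) is an assumption, since the text specifies only "larger than" and "smaller than" 5°.

```python
from typing import List

# Parafoveal range used in the text to separate the two attention modes.
FOCAL_AMBIENT_THRESHOLD_DEG = 5.0


def classify_fixation(preceding_amplitude_deg: float) -> str:
    """Label a fixation by the amplitude of the saccade that preceded it."""
    # Assumption of this sketch: an amplitude of exactly 5 deg counts as focal.
    if preceding_amplitude_deg > FOCAL_AMBIENT_THRESHOLD_DEG:
        return "ambient"
    return "focal"


def classify_sequence(amplitudes_deg: List[float]) -> List[str]:
    """Classify every fixation in a sequence of preceding saccade amplitudes."""
    return [classify_fixation(a) for a in amplitudes_deg]


# Made-up amplitudes (in degrees of visual angle), for illustration only.
example_amplitudes = [1.2, 7.5, 4.9, 12.0]
labels = classify_sequence(example_amplitudes)
```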

Various models can be developed to predict the spatial and temporal aspects of eye-movement control. In contrast to reading research, where considerable effort has been invested to predict when the eyes will move (Rayner, 2009), the focus of the work in scene perception has until very recently been on the question of where the eyes will move next (e.g., Itti & Koch, 2001; Tatler, Baddeley, & Vincent, 2006; Torralba, Oliva, Castelhano, & Henderson, 2006; Underwood, Foulsham, & Humphrey, 2009).

For fixation duration, direct and indirect control mechanisms have been proposed (Rayner, 1998). While direct control theories suppose that decisions about fixation termination are made during the ongoing fixation, indirect control theories suppose that the duration of the current fixation is determined by factors other than the information processed during that fixation. Direct control theories are supported by the fact that the available visual information influences the duration of fixations (Loftus, 1985; Mannan, Ruddock, & Wooding, 1995; Parkhurst, Culurciello, & Niebur, 2000). For example, Mannan et al. reported longer fixations for low-pass-filtered than for unfiltered scenes. A prolongation of fixations has also been found when the amount of either foveal or peripheral information was limited by a gaze-contingent mask (van Diepen & d’Ydewalle, 2003). Results from recent studies of scene perception led Henderson and colleagues to suggest a mixed control model for fixation durations (Henderson & Pierce, 2008; Henderson & Smith, 2009; Nuthmann, Smith, Engbert, & Henderson, 2010). In these experiments, the scene onset delay paradigm was applied; that is, the scene was replaced by a pattern mask during a saccade and reappeared in a subsequent fixation after various delays. The authors reported a prolongation for a certain proportion of fixations until the scene’s reappearance, but other fixations remained unaffected by the scene onset manipulation. The existence of the first group (“population” or “subpopulation”) of fixations was interpreted as evidence for direct control by the scene information, whereas the second group was considered as being under some form of indirect control.

The scene onset delay paradigm is similar to another well-known paradigm of eye tracking research, namely distractor presentation experiments, especially in their gaze-contingent version (Pannasch, Dornhoefer, Unema, & Velichkovsky, 2001; Reingold & Stampe, 2000). The distractor presentation experiments usually introduce sudden changes in the visual environment (Lévy-Schoen, 1969; Walker, Deubel, Schneider, & Findlay, 1997). Various studies in reading and scene perception have shown a reduction of the saccade probability from 90 to 120 ms following the change (Reingold & Stampe, 2000, 2002). This probability reduction can be viewed as a prolongation of fixation durations (Pannasch et al., 2001) but can also be interpreted as a delay of subsequent saccades (Reingold & Stampe, 2000) or a general inhibition of behavior found during the orienting response (cf. Sokolov, 1963). The latter view is supported by the fact that gaze-contingent distractors of different modalities have revealed habituation-like processes in eye movements and in cortical event-related potentials (ERPs; Graupner, Velichkovsky, Pannasch, & Marx, 2007; Pannasch et al., 2001; Velichkovsky & Pannasch, 2001).

We performed two distractor paradigm experiments to distinguish between direct and indirect control of fixation durations. In Experiment 1, gaze-contingent distractors appeared either early or late within selected fixations. The distractors were presented for different durations in order to allow for an experimental manipulation similar to the manipulation of scene onset delay. In the second experiment, two different types of holistic display changes occurred time-locked within the presentation, independently of the actual gaze behavior. Our experiments address two as-yet-unanswered questions. First, is it possible to determine which mechanism (direct vs. indirect) controls fixation duration? Second, can fixation control be modulated by two different attentional mechanisms?

We anticipated that both distractor paradigm experiments would replicate the findings obtained with the scene onset delay paradigm, demonstrating similar fixation behavior for changes of different qualities and quantities. Additionally, we predicted differential influences of visual changes, depending on the mode (focal vs. ambient) of attention. Recent neuroanatomical research has made a distinction between two frontoparietal networks of visual attention, a ventral one that interrupts and resets ongoing activity, and a dorsal attention network specialized for spatial selection of stimuli and responses (Corbetta et al., 2008). In order to reconcile this distinction with our view of the modes of visual attention, the focal attention mode should require ventral network activity, whereas ambient processing would be related to activity of the dorsal network. Regarding the present study, this implies that an interruption by the appearance of a distractor should have a stronger influence when the focal-processing mode is active, a claim that has been partly supported by previous experimental data (Pannasch & Velichkovsky, 2009).

Experiment 1

Method

Subjects

Nineteen students (13 female) from the Technische Universität Dresden, with a mean age of 24.2 years (range, 20–33 years), took part in this experiment. All subjects had normal or corrected-to-normal vision and received course credit for participation in the study, which was conducted in conformity with the Declaration of Helsinki.

Apparatus

The subjects were seated in a dimly illuminated, sound-attenuating room. Eye movements were sampled monocularly at 1 kHz using the SR EyeLink 1000 infrared eyetracking system, with online detection of saccades and fixations and a spatial accuracy of better than 0.5°. Saccades were identified by deflections in eye position in excess of 0.1°, with a minimum velocity of 30°/s and a minimum acceleration of 8,000°/s², maintained for at least 4 ms. Pictures were displayed using a GeForce 7300-GT card and a CRT display (19-in. Samtron 98 PDF) of 1,024 × 768 pixels at a refresh rate of 100 Hz. Viewed from a distance of 80 cm, the screen subtended a visual angle of 27.1° horizontally and 20.5° vertically.
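The reported display geometry allows angular sizes to be converted into pixels. The following sketch assumes a flat screen viewed centrally and uses only the values given above (1,024 pixels across 27.1° at 80 cm); the function names are hypothetical, and the resulting pixel size of the 2° distractor is a derived estimate, not a value reported by the authors.

```python
import math

SCREEN_WIDTH_PX = 1024        # horizontal resolution reported above
SCREEN_WIDTH_DEG = 27.1       # horizontal visual angle reported above
VIEWING_DISTANCE_CM = 80.0    # viewing distance reported above


def extent_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent that subtends angle_deg when centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def deg_to_px(angle_deg: float,
              axis_px: int = SCREEN_WIDTH_PX,
              axis_deg: float = SCREEN_WIDTH_DEG,
              distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Convert a centrally viewed visual angle into pixels along one screen axis."""
    px_per_cm = axis_px / extent_cm(axis_deg, distance_cm)
    return extent_cm(angle_deg, distance_cm) * px_per_cm


# Implied physical screen width and the pixel diameter of a 2-deg annulus.
screen_width_cm = extent_cm(SCREEN_WIDTH_DEG, VIEWING_DISTANCE_CM)
distractor_diameter_px = deg_to_px(2.0)
```

Because the conversion is anchored to the reported screen angle, the viewing distance cancels out of `deg_to_px`; it is kept as a parameter only to make the geometry explicit.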

Stimuli

Sixty digitized pieces of fine art by seventeenth- to nineteenth-century European painters served as the stimulus materials, since paintings are considered “maximal memory stores” and some of the “most valuable objects in human history” (Leyton, 2006, p. 2). In addition, the use of paintings allowed us to directly compare our results with those of previous work (Graupner et al., 2007; Pannasch et al., 2001). Stimuli were displayed with a size of 1,024 × 768 pixels. To systematically investigate the influences of temporally and spatially gaze-contingent visual distractions, a light blue annulus 2° in diameter (margin width = 0.345°) served as a distractor, while the paintings were shown in grayscale. The color of the distractor assured its visibility, whereas the annular form did not disturb the center of the fovea, covering only about 3.4% of the parafoveal region. Figure 1 shows an example of an image with a distractor.

Procedure

The subjects were informed that the purpose of the study was to investigate eye-movement patterns in the perception of art and were asked to study the images in order to be prepared for a subsequent recognition task. The subjects were aware of the presentation of the distractors but were instructed to ignore them. The experiment was run in four consecutive blocks, each containing 15 pictures, with a 5-min break after the second block. The full experiment took 90 min in total to complete. An initial nine-point calibration and validation was performed before the start of each block, and calibration was checked prior to every trial in the experiment. Subjects were initially given two study trials in order to get acquainted with the task.

The first distractor was always presented after an initial 5-s period of scene inspection. This initial phase without the experimental manipulation was included because systematic changes in fixation durations and saccadic amplitudes during the first few seconds of scene inspection have often been reported (e.g., Antes, 1974; Pannasch, Helmert, Roth, Herbold, & Walter, 2008; Unema et al., 2005). Once all 18 distractors (see below) had been shown, the image was replaced by a centrally presented cutout (size 100 × 100 pixels) belonging either to the trial image or to the image presented in a matched catch trial. Catch-trial images were either painted by the same artist or displayed a similar setting. The subjects had to judge whether the cutout was part of the previously seen image by clicking on-screen “yes” or “no” buttons using the mouse.

Distractors were presented during every fifth fixation in a trial. This presentation interval was selected in accordance with previous work (cf. Graupner et al., 2007; Pannasch & Velichkovsky, 2009) in order to ensure that distractors did not become predictable and to acquire enough unaffected fixations to create a baseline. Distractors were triggered by the fixation onset, with an onset latency of either 100 or 200 ms; they remained visible for 100, 200, or 300 ms and were centered on the coordinates of the actual fixation. For each trial, three distractors of each onset delay and duration combination were shown in a randomized order, resulting in a total of 18 distractors per image. If a fixation was terminated before reaching the predefined onset latency, the program waited for the next suitable fixation. Image presentation lasted until the respective number of distractors in each category were presented (on average, about 40 s).
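The per-image distractor schedule described above (two onset latencies, three durations, three repetitions of each combination, shown in random order) could be generated as in the following sketch. The function name and the use of a seeded random generator are assumptions for illustration; the original presentation software is not described in the text.

```python
import random
from itertools import product

ONSET_LATENCIES_MS = (100, 200)   # delay after fixation onset
DURATIONS_MS = (100, 200, 300)    # how long the distractor stays visible
REPETITIONS = 3                   # each combination occurs three times per image


def build_schedule(rng: random.Random) -> list:
    """Return a shuffled list of (latency, duration) pairs for one image."""
    conditions = list(product(ONSET_LATENCIES_MS, DURATIONS_MS)) * REPETITIONS
    rng.shuffle(conditions)
    return conditions


# A fixed seed is used here only so that the example is reproducible.
schedule = build_schedule(random.Random(0))
```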

Data analysis

Raw eye-movement data were preprocessed by removing fixations that occurred around eye blinks or outside the presentation screen. Only distracted fixations and the two adjacent nondistracted fixations (the latter serving as the baseline) were processed further. Experimentally manipulated fixations were removed if the actual distractor onset deviated from the predefined latency by more than 10 ms. To assure the comparability of the baseline with the distractor condition, fixations of a duration shorter than the respective distractor latency (100 or 200 ms) were excluded, leaving a total of 99,205 fixations (84%) for analysis. Because of the positive skew in the distribution of fixation durations, median rather than mean values were used as a measure of central tendency. For statistical testing, the respective median values were subjected to repeated measures ANOVAs. In order to highlight possible mutual relationships, linear regression analyses were applied between fixations of different distractor durations (see below). Eta-squared values are reported as estimates of the effect size (Levine & Hullett, 2002).
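The exclusion rules and the use of medians can be illustrated with a short Python sketch. The function names and the example durations are hypothetical, and the ±10-ms criterion is read here as a tolerance on the deviation between planned and actual distractor onset, which is an interpretation of the text rather than a documented detail of the original analysis.

```python
from statistics import median


def keep_distracted(actual_latency_ms: float,
                    planned_latency_ms: float,
                    tolerance_ms: float = 10.0) -> bool:
    """Keep a distracted fixation only if the onset timing was within tolerance."""
    return abs(actual_latency_ms - planned_latency_ms) <= tolerance_ms


def baseline_for_latency(baseline_durations_ms, latency_ms):
    """Drop baseline fixations shorter than the distractor latency.

    The text excludes fixations "shorter than" the latency, so durations
    equal to the latency are kept in this sketch.
    """
    return [d for d in baseline_durations_ms if d >= latency_ms]


# Made-up baseline fixation durations (ms) with a positive skew.
baseline = [80, 150, 210, 260, 400, 950]

median_100 = median(baseline_for_latency(baseline, 100))
median_200 = median(baseline_for_latency(baseline, 200))
```

With these example values, the baseline median is higher for the 200-ms latency than for the 100-ms latency, simply because more short fixations are excluded.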

Results and discussion

The subjects correctly judged whether cutout images matched the previous image with a score of 78% (SD = 6.6).

The first objective was to see whether the presentation of distractors led to distinct clusters of fixation durations. The second was to analyze the influences of distractor onset latency and distractor duration. Third, we compared the durations of fixations with distractors to baseline fixations, with regard to the influence of the preceding saccade amplitude.

The topmost histogram in Fig. 2a exemplifies the typical right-skewed distribution of unaffected fixations. The histograms in the subsequent rows show the fixation duration distributions influenced by distractors; each histogram represents a particular latency and duration combination. Figure 2b shows scatterplots of distractor and baseline fixations according to the distractor onset latency. Scatterplots for the distractor fixations are further subdivided by the distractor duration. For our later discussion, it should be mentioned here that baseline and distractor fixations covered the same range of durations.

A clear influence of the experimental manipulation was found: Irrespective of the distractor latency and duration, the histograms in Figure 2a show a first dip 100–140 ms after the distractor onset and a second dip about 100–140 ms following the distractor offset. Accordingly, fixation distributions shift to the right if a distractor is shown; in other words, the fixation duration increases. Likewise, in the scatterplots of Fig. 2b, gaps in the fixation distributions represent the distractor influence. The timing of this inhibition is in line with previous reports (Graupner et al., 2007; Reingold & Stampe, 2004). The distribution of fixation durations can be characterized by three clusters: First, fixations that are terminated within 100 ms after the distractor onset compose the unaffected cluster, since there is not enough time for distractors to influence the fixation. Next, there are the fixations enclosed by the two gaps—the onset cluster—because those fixations are most likely affected only by the distractor onset (Fig. 2b). Finally, related to the distractor duration and therefore to the disappearance of the distracting event, a third, offset cluster was identified, containing fixations that were influenced both by the onset and offset of the distractor. The same subdivision can be recognized in the histograms of Fig. 2a; here, the clusters are separated by dips in the fixation distributions.

The data shown in Fig. 2 suggest that the timing of the onset and offset clusters is influenced by the different distractor durations. To explore this relationship, differences between the distributions of the baseline and distractor fixation durations were calculated. The resulting difference (gray lines in Fig. 3) was either negative (decreased saccadic activity and fewer fixations than in the baseline) or positive (increased saccadic activity and more fixations than in the baseline). From each of the three aforementioned clusters, we selected the peaks of saccade activity (bullets in Fig. 3) along the difference lines to compute linear regressions (black lines in Fig. 3).

The fixation duration of the unaffected clusters remained constant across distractor durations and onset latencies (latency 100: slope = –0.1, intercept = 160, R² = .25, p = .67; latency 200: slope = 0, intercept = 246, R² = 0, p = .90; Fig. 3, bottom regression lines in panels A and B). Fixation durations of the onset clusters as a function of distractor duration revealed a slight (but nonsignificant) increase for each distractor latency (latency 100: slope = 0.30, intercept = 260, R² = .75, p = .33; latency 200: slope = 0.30, intercept = 340, R² = .93, p = .33; Fig. 3, middle regression lines). For the offset clusters, a monotonic increase with respect to distractor duration was found for each distractor latency (latency 100: slope = 1.02, intercept = 280, R² = 1.0, p < .001; latency 200: slope = 1.05, intercept = 380, R² = 1.0, p < .001; Fig. 3, top regression lines). Although increased distractor duration to some extent influenced the onset cluster fixation duration, a significant effect was found only for the offset cluster. This one-to-one linear relationship for the offset cluster, in which each increase in distraction time prolongs the fixation by about the same amount, exactly mirrors the findings from the scene onset delay paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009). We thus replicated the findings from the scene onset delay paradigm by presenting simple and irrelevant distractors. Since scene information remained stable, the findings can hardly be explained by the presence or absence of scene information; therefore, we interpret the prolongation of fixation durations as being caused by the onset and duration of the distractor.
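To make the regression logic concrete, the following sketch fits an ordinary least-squares line to cluster peak positions as a function of distractor duration. The peak values are invented for illustration (they are not the authors' data), and the helper name `fit_line` is hypothetical. A slope near 1 corresponds to the one-to-one relationship reported for the offset cluster, and a slope near 0 to the constant unaffected cluster.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


distractor_durations_ms = [100, 200, 300]

# Hypothetical peak positions (ms), chosen only to illustrate the two patterns.
offset_peaks_ms = [380, 480, 580]       # shifts 1 ms per ms of distractor duration
unaffected_peaks_ms = [160, 160, 160]   # does not depend on distractor duration

offset_slope, offset_intercept = fit_line(distractor_durations_ms, offset_peaks_ms)
unaffected_slope, unaffected_intercept = fit_line(distractor_durations_ms, unaffected_peaks_ms)
```

With three distractor durations, each such fit rests on three points per latency.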

To alternatively examine the distractor effect with respect to the different distractor latencies and durations, baseline fixation durations were subtracted from the fixation durations of the distractor conditions; the resulting difference values represent the prolongation of the fixation due to the shown distractor (Graupner et al., 2007). Additionally, distractor fixations were classified as either ambient or focal on the basis of the preceding saccade amplitude (Pannasch & Velichkovsky, 2009).

Difference values were entered into a 2 (latency: 100, 200) × 3 (duration: 100, 200, 300) × 2 (fixation type: ambient, focal) repeated measures ANOVA, which revealed significant main effects of latency, F(1, 18) = 41.47, p < .001, η² = .68, and fixation type, F(1, 18) = 14.21, p = .001, η² = .15, as well as a significant interaction of latency and fixation type, F(1, 18) = 5.71, p = .028, η² = .05. No reliable effect was found for duration, F(2, 36) = 2.81, p = .073. Regarding the main effect of latency, the difference values were larger if distractors appeared 100 ms after the fixation onset (latency 100, 86.3 ms; latency 200, 41.5 ms). Considering the influence of the fixation type, larger difference values were obtained for focal fixations (focal, 74.3 ms; ambient, 53.4 ms), replicating our previous findings (Pannasch & Velichkovsky, 2009). The interaction was based on the fact that difference values for the 200-ms latency condition differed by about 33 ms (focal, 57.9 ms; ambient, 25.1 ms), whereas those for the 100-ms latency condition differed only by about 9 ms (focal, 90.8 ms; ambient, 81.8 ms; see Fig. 4a).
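The relation between the reported cell means and the main-effect and interaction values can be checked with simple arithmetic. The sketch below uses the four latency × fixation type cell means given above and averages them without weighting, which is an assumption; the published marginal values were presumably computed from subject-level data, so small rounding differences are to be expected.

```python
# Cell means of the difference values (ms), as reported in the text.
cell_means = {
    (100, "focal"): 90.8,
    (100, "ambient"): 81.8,
    (200, "focal"): 57.9,
    (200, "ambient"): 25.1,
}


def marginal_mean(level, position):
    """Unweighted mean over all cells whose key has `level` at `position`."""
    values = [v for key, v in cell_means.items() if key[position] == level]
    return sum(values) / len(values)


latency_100 = marginal_mean(100, 0)
latency_200 = marginal_mean(200, 0)
focal = marginal_mean("focal", 1)
ambient = marginal_mean("ambient", 1)

# Focal-minus-ambient difference within each latency (the interaction pattern).
gap_100 = cell_means[(100, "focal")] - cell_means[(100, "ambient")]
gap_200 = cell_means[(200, "focal")] - cell_means[(200, "ambient")]
```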

Two different explanations are possible for the fact that the difference values are smaller for larger distractor latencies. Distractions appearing late within a fixation might be less influential because visual information is processed only during the beginning of a fixation. During later stages, mental activities associated with the processing of the fixated information and the programming of the next saccade take place. These processes should be less susceptible to visual disruptions. Although support for this idea comes from reading research (Morrison, 1984; Rayner, Liversedge, White, & Vergilino-Perez, 2003; Rayner & Pollatsek, 1992), it contradicts various studies that have used the distractor paradigm, where similar influences of the distractors at different times within the affected fixation have been reported (Pannasch et al., 2001; Reingold & Stampe, 2000, 2002). Possible spillover effects of the distractor appearance could be rejected, since comparing durations of fixation n – 1 and n + 1 relative to the distractor fixation with a paired t-test revealed no differences, t = 1.82, p = .086. Therefore, the alternative explanation is based on the baseline correction. As already mentioned, the baseline contains only fixations with durations above the distractor latency. Accordingly, the difference value in the 200-ms latency condition might be smaller because of the increased baseline fixation duration. This interpretation is also supported by the dips in the histograms and the gaps in the scatterplots in Fig. 2, which show similar influences of distractors of both latencies. We will come back to this issue in discussing the results of Experiment 2.

Finally, we examined whether the clusters we identified revealed different proportions of focal and ambient fixations. Since focal fixations are more susceptible to visual changes (Pannasch & Velichkovsky, 2009), we anticipated that greater proportions of focal fixations would be found in the onset and offset clusters. To answer this question, the ratio of the number of focal fixations to the number of ambient fixations was calculated; ratio values >1 indicate more focal than ambient fixations, whereas ratio values <1 represent the opposite. The distractor duration was ignored because of the nonsignificant findings in the previous analysis. Ratio values were entered into a 2 (latency: 100, 200) × 3 (cluster: unaffected, onset, offset) repeated measures ANOVA, revealing significant main effects of latency, F(1, 17) = 9.38, p = .007, η² = .23 (see Footnote 1), and cluster, F(2, 34) = 16.56, p < .001, η² = .67, as well as a significant interaction, F(2, 34) = 5.56, p = .008, η² = .10. Altogether, more focal fixations were observed, as evidenced by ratio values above 1 (see Fig. 4). This finding is in line with previous reports (e.g., Pannasch et al., 2008; Unema et al., 2005; Wedel, Pieters, & Liechty, 2008). More focal fixations occurred in the 200-ms latency condition (latency 100, 3.34; latency 200, 4.44), which likely resulted from our selection procedure. Focal processing is likely related to short saccadic amplitudes and long fixation durations (Velichkovsky et al., 2005); the exclusion of fixations of less than 100 or 200 ms might have an effect on ambient rather than focal fixations, which could result in an overrepresentation of focal fixations in the 200-ms latency condition.
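The focal-to-ambient ratio described above can be computed as in the following sketch. The function name and the example amplitudes are hypothetical, and returning `None` when a cell contains no ambient fixations is a choice made for this illustration; the text does not say how such cases were handled.

```python
def focal_ambient_ratio(preceding_amplitudes_deg, threshold_deg=5.0):
    """Ratio of focal to ambient fixations for one subject and condition.

    A fixation is counted as ambient if the preceding saccade exceeded the
    threshold and as focal otherwise. Values above 1 mean more focal than
    ambient fixations.
    """
    n_ambient = sum(1 for a in preceding_amplitudes_deg if a > threshold_deg)
    n_focal = len(preceding_amplitudes_deg) - n_ambient
    if n_ambient == 0:
        # Ratio is undefined without ambient fixations (sketch-specific choice).
        return None
    return n_focal / n_ambient


# Made-up preceding saccade amplitudes (degrees) for one hypothetical cell.
example_ratio = focal_ambient_ratio([1.0, 2.0, 3.0, 8.0])
```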

Regarding the main effect of cluster, we obtained the expected increase in the proportion of focal fixations from the unaffected to the offset cluster (unaffected cluster, 2.99; onset cluster, 3.49; offset cluster, 5.18). This was confirmed by Bonferroni-corrected post hoc testing for the differences between the offset and unaffected clusters as well as between the offset and onset clusters, both ps < .005. Furthermore, the interaction revealed similar proportions of focal fixations for both latencies in the unaffected cluster (latency 100, 2.91; latency 200, 3.07), but the proportions differed if fixations were successfully manipulated (onset cluster: latency 100, 2.87; latency 200, 4.11; offset cluster: latency 100, 4.23; latency 200, 6.13; see Fig. 4b). With paired t-tests, we compared the ratio values of the two latencies within each cluster and found significant differences for the onset and offset clusters, ts > 2.65, ps < .05, but not for the unaffected cluster, p > .05.

Taken together, the distractor effect was observed for both latencies, and fixation prolongation was greater for fixations preceded by short saccades. While the distractor duration was of negligible influence, we attribute the latency differences to the selection criteria in the baseline, lowering the general effect. Moreover, the proportion of focal fixations increased from the unaffected to the offset cluster and was largest in the offset cluster.

The aim of our next experiment was to explore the consequences of full-display changes, similar to those used in the scene onset delay paradigm (Henderson & Pierce, 2008; Henderson & Smith, 2009). We investigated different types of time-locked changes occurring either early or late in the process of scene exploration. The introduction of this particular variable was motivated by the difference in the durations of fixations at the early and late stages of scene perception (see Pannasch et al., 2008; Unema et al., 2005).