Saccade control in natural images is shaped by the information visible at fixation: evidence from asymmetric gaze-contingent windows
When people view images, their saccades are predominantly horizontal and show a positively skewed distribution of amplitudes. How are these patterns affected by the information close to fixation and the features in the periphery? We recorded saccades while observers encoded a set of scenes with a gaze-contingent window at fixation: Features inside a rectangular (Experiment 1) or elliptical (Experiment 2) window were intact; peripheral background was masked completely or blurred. When the window was asymmetric, with more information preserved either horizontally or vertically, saccades tended to follow the information within the window, rather than exploring unseen regions, which runs counter to the idea that saccades function to maximize information gain on each fixation. Window shape also affected fixation and amplitude distributions, but horizontal windows had less of an impact. The findings suggest that saccades follow the features currently being processed and that normal vision samples these features from a horizontally elongated region.
Keywords: Eye movements; Scene perception; Attention; Saliency; Natural vision
The human visual environment is extremely rich. At any one time, people are faced with a continuous array of information comprising important or potentially useful items amidst a background of less informative noise. The visual system’s answer to this complexity is twofold. First, the retinas encode the whole visual field in a non-uniform manner: Spatial resolution is greatest at the fovea and decreases rapidly with eccentricity, so objects in central vision are processed in fine detail, while neural resources are spared the intensive task of representing the whole environment at this level of precision. Second, rapid eye movements (saccades) are programmed to align the high-resolution fovea with different parts of the visual array. The efficiency with which the visual system processes the parts of the environment most important for the current task therefore depends crucially on its ability to make efficient eye movements. Specifically, the eye guidance system must compute where to move the eyes in order to process important regions, but this computation can only be an estimate based on the low-resolution preview of the periphery. Although the resolution of the visual system drops off exponentially as a stimulus is moved further from the current fixation, researchers often divide the visual field into the fovea (within about 1° of fixation), the parafovea (between about 1° and 5° from fixation), and the periphery (more than 5° from fixation; see, e.g., Larson & Loschky, 2009).
In this study, we examined global changes in eye movements during a scene-encoding task, by manipulating the extent of the scene that could be processed on each fixation, using a gaze-contingent display. With this aim in mind, we will first review some of the previous research investigating eye guidance in scenes and the use of gaze-contingent displays.
Eye guidance in natural scenes
Two of the earliest studies of eye movements used pictures of natural scenes and identified two important facts about where people look in such images (Buswell, 1935; Yarbus, 1967). First, fixations are not uniformly distributed but cluster on points of interest (e.g. faces and objects). Second, eye movement patterns change depending on the viewer’s task. Subsequent researchers have sought to determine what aspects of the image or the task influence the decision of where to move the eyes (for a review, see the recent special issue: Tatler, 2009).
One approach has been to identify the features commonly found at fixated locations (Reinagel & Zador, 1999) and use these features to compute a saliency map of conspicuous points in the image (Itti & Koch, 2000). This model predicts that people will look at the most salient points in the image, and the implication is that the visual system is computing saliency from peripheral information and using this as an estimate of the most important places to fixate. The saliency map model can predict eye movements better than chance, and it has the advantage of being applicable to any arbitrary image (Foulsham & Underwood, 2008; Peters, Iyer, Itti, & Koch, 2005). However, even the highest estimates of the correlation between saliency and fixation are small, and because image-statistical approaches are fundamentally correlational, the question of whether saliency actually causes fixation selection often remains unanswered. Furthermore, the whole idea of a model of eye guidance based only on image features is called into question by the demonstration that eye movements are highly dependent on the observer’s task. In search tasks, for example, participants are able to ignore salient regions and look towards regions that are similar to the target, as well as to areas where they expect targets to be found, given the context (Chen & Zelinsky, 2006; Foulsham & Underwood, 2007; Torralba, Oliva, Castelhano, & Henderson, 2006).
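One generic way to quantify how well a saliency map "predicts" fixations (shown here as an illustration, not necessarily the metric used in the studies cited) is to treat saliency as a classifier that separates fixated from control locations and compute the area under the ROC curve. The sketch below assumes saliency values have already been sampled at fixated locations and at control locations (for example, random points); the function name `saliency_auc` and the toy values are hypothetical.

```python
def saliency_auc(fixated, control):
    """Area under the ROC curve for saliency at fixated vs control points.

    Equals the probability that a randomly chosen fixated location is more
    salient than a randomly chosen control location (ties count as 0.5).
    An AUC of 0.5 means chance-level prediction.
    """
    if not fixated or not control:
        raise ValueError("need at least one fixated and one control value")
    wins = 0.0
    for f in fixated:
        for c in control:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(fixated) * len(control))


# Toy values: fixated points are only modestly more salient than controls.
fix_vals = [0.62, 0.40, 0.55, 0.71, 0.38]
ctrl_vals = [0.50, 0.35, 0.60, 0.42, 0.30]
print(round(saliency_auc(fix_vals, ctrl_vals), 2))  # 0.72
```

The pairwise loop is quadratic, which is fine for a sketch; a rank-based computation gives the same value faster for large samples.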
The models of eye movements that have been developed on the basis of search data can be considered top-down, in the sense that they possess task-relevant knowledge (normally, target features) independent of the actual stimulus (Navalpakkam & Itti, 2005; Rao, Zelinsky, Hayhoe, & Ballard, 2002; Torralba et al., 2006; Zelinsky, 2008). For example, in the Rao et al. model, saccades are programmed to locations showing the highest correlation with the target, producing efficient search, as well as less intuitive eye movement behaviour, such as centre-of-gravity fixations that land between objects. The Torralba et al. model combines bottom-up saliency, guidance by target features, and contextual guidance: a spatial prior to bias attention towards areas where targets are likely to be found (when one searches for pedestrians, search should be concentrated on the street and not the sky). Kanan et al. (2009) extended these ideas with their saliency using natural statistics (SUN) model, which incorporates contextual guidance and probabilistic maps based on object appearance.
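The idea of contextual guidance can be illustrated with a simplified pointwise combination of a saliency map and a spatial prior. This is a sketch in the spirit of the Torralba et al. model, not their actual formulation: the multiplicative rule, the compressive exponent `gamma`, the 3 × 3 maps, and the helper names are all assumptions chosen for illustration.

```python
def combine_maps(saliency, context_prior, gamma=0.5):
    """Pointwise combination of a bottom-up saliency map with a contextual prior.

    Both maps are equal-sized lists of rows of non-negative values.
    gamma < 1 compresses saliency so the prior can dominate; the result
    is normalised to sum to 1.
    """
    combined = [
        [(s ** gamma) * p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(saliency, context_prior)
    ]
    total = sum(sum(row) for row in combined)
    if total == 0:
        raise ValueError("combined map is empty")
    return [[v / total for v in row] for row in combined]


def argmax_2d(grid):
    """Return (row, col) of the largest value in a list-of-rows grid."""
    return max(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: grid[rc[0]][rc[1]],
    )


# Toy scene: the most salient point is in the "sky" (row 0), but the
# prior says pedestrians appear at street level (row 2).
saliency = [[0.9, 0.1, 0.1],
            [0.2, 0.3, 0.2],
            [0.1, 0.4, 0.2]]
street_prior = [[0.0, 0.0, 0.0],
                [0.1, 0.1, 0.1],
                [0.8, 0.8, 0.8]]
guided = combine_maps(saliency, street_prior)
print(argmax_2d(saliency), argmax_2d(guided))  # (0, 0) (2, 1)
```

Saliency alone picks the sky; with the prior, the predicted first fixation moves to the most salient point at street level.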
Najemnik and Geisler (2005) took a different approach with their ideal observer model of search. Rather than programming eye movements to locations resembling the target, this model emphasizes that an optimal searcher will place fixations that maximize the information gained. For example, because the human visibility map is horizontally elongated (i.e. empirically measured detection performance drops off with eccentricity more rapidly above and below fixation), the ideal searcher will often move to the top or bottom of the display, where target presence is more uncertain, in order to maximize the information that can be gained given the horizontal visibility map. This model successfully matched several aspects of human performance at searching for a sinusoid amid 1/f noise, including search times and some global eye movement behaviours (Najemnik & Geisler, 2008). This approach highlights the importance of foveated models, which take into account the limited resolution in the periphery. The predictiveness of both top-down and bottom-up models improves when this fundamental anatomical feature of human vision is included (Parkhurst, Law, & Niebur, 2002; Peters et al., 2005; Zelinsky, 2008).
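Why a horizontally elongated visibility map can favour vertical moves is easiest to see in a toy simulation. The sketch below is loosely in the spirit of the ideal-searcher account but is not Najemnik and Geisler's model (their visibility maps were measured psychophysically, and their searcher updates a full Bayesian posterior). The grid, the Gaussian visibility function with spreads `SIGMA_X` and `SIGMA_Y`, the crude posterior, and the summation rule (pick the fixation that maximises the posterior-weighted sum of squared d') are all assumptions.

```python
import math

SIGMA_X, SIGMA_Y = 4.0, 2.0  # hypothetical visibility spreads (deg), wider horizontally


def visibility(dx, dy, d0=1.0):
    """Toy d' for a target offset (dx, dy) deg from fixation. It falls off
    faster vertically than horizontally (a horizontally elongated map)."""
    return d0 * math.exp(-(dx ** 2 / (2 * SIGMA_X ** 2) + dy ** 2 / (2 * SIGMA_Y ** 2)))


def next_fixation(locations, posterior):
    """Choose the candidate fixation k maximising sum_i p(i) * d'(i, k)^2,
    a simple summation heuristic for expected information gain."""
    def gain(k):
        kx, ky = k
        return sum(p * visibility(x - kx, y - ky) ** 2
                   for (x, y), p in zip(locations, posterior))
    return max(locations, key=gain)


# 9 x 9 grid of possible target locations, 2 deg apart, centred on (0, 0).
grid = [(x, y) for y in range(-8, 9, 2) for x in range(-8, 9, 2)]

# After one fixation at the centre, well-seen locations are largely ruled
# out: posterior proportional to 1 - visibility from the current fixation.
raw = [1.0 - visibility(x, y) for x, y in grid]
total = sum(raw)
posterior = [r / total for r in raw]

fx, fy = next_fixation(grid, posterior)
print(fx, fy)  # a point displaced mainly vertically from (0, 0)
```

Because the elongated map has already "covered" a horizontal band, the largest expected gain lies above or below the current fixation rather than beside it.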
Although these models are useful examples of top-down guidance, it is unclear how they can be applied to tasks other than search (or even to search where target features are not completely known). For example, Foulsham and Underwood (2007, 2008) used a memory-encoding task, in which participants were asked to view a series of scenes in preparation for a memory test; and Underwood, Jebbett, and Roberts (2004) and Foulsham, Kingstone, and Underwood (2008) used a picture–sentence verification task in which participants had to verify the accuracy of a sentence that appeared after the image it was describing. What determines where people look in these tasks, in which there is no explicit target? Some of the highest correlations between saliency and fixation are found in free-viewing tasks (although even in such tasks, the correlations are very weak and can be overridden by top-down demands; see, e.g., Einhäuser, Rutishauser, & Koch, 2008). This is perhaps because, in the absence of a target, visual saliency coincides with places that are useful for interpreting or remembering the scene. A rather different approach to investigating eye movements in such tasks is to consider the general spatial biases that occur in saccade selection. Fixations tend to be biased towards the centre of most images, and this can be dissociated from photographer bias and the distribution of visual features (Foulsham & Underwood, 2008; Tatler, 2007). Saccades tend to move horizontally, and their amplitudes show a characteristic, positively skewed distribution. The trend for horizontal saccades is seen even in square images, and it changes as the scene is rotated, demonstrating that it is related to scene content and is not a fundamental property of the oculomotor system (Foulsham et al., 2008). These systematic tendencies of eye movements in scenes can potentially predict where people will fixate just as well as image-based models (Tatler & Vincent, 2009).
It is therefore important to consider the causes of these tendencies and, in particular, the role of central and peripheral information. This article looks at how biases in saccade selection during an encoding task are altered by the use of a gaze-contingent viewing paradigm. We first review this technique.
The use of gaze-contingent displays
A gaze-contingent display is one that is updated in response to the viewer’s eye movements. This technique allows the experimenter to manipulate the information available at different eccentricities. In reading, the gaze-contingent moving-window design masks text outside a window that is centred on fixation. By varying the size of the window, the experimenter can find the smallest window with which reading proceeds normally and thereby estimate the perceptual span (McConkie & Rayner, 1975; see Rayner, 1998, for a review). In search, a gaze-contingent window has been used to manipulate the target-similar features present in the periphery (Pomplun, Reingold, & Shen, 2001). Masking reduced the degree to which search was guided, supporting the idea that guidance operates preattentively and in parallel. In scene perception, Saida and Ikeda (1979) and Shioiri and Ikeda (1989) used a moving-window design to identify the useful field of view for picture memorisation: the size of the area around fixation that is actually used for perception. Memory performance improved as the size of the window increased, and large windows of around 10° in diameter elicited performance similar to that for normal viewing. Interestingly, an analysis of eye movements suggested that the useful field of view tended to overlap on consecutive fixations, with around 75% of the saccades moving to points that could be processed on the previous fixation. Multi-resolutional displays take the gaze-contingent technique further by steadily degrading the resolution of the scene as eccentricity increases (Loschky & McConkie, 2002; Reingold, Loschky, McConkie, & Stampe, 2003). When the display's resolution falls off with eccentricity no more steeply than that of the human visual system, the degradation is not detected by the observer (Loschky, McConkie, Yang, & Miller, 2005).
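The core of a moving-window display can be sketched as a masking step that is repeated on every display refresh with the latest gaze sample. This is a minimal illustration assuming a rectangular window, a uniform fill value, and images stored as nested lists of pixel values; the function name `moving_window` is hypothetical, and a working system would also need low-latency gaze sampling and synchronisation with the monitor refresh.

```python
def moving_window(image, gaze_x, gaze_y, win_w, win_h, fill=128):
    """Return a copy of `image` in which every pixel outside a rectangular
    window centred on the gaze position is replaced by `fill`.

    image is a list of rows; coordinates and window size are in pixels.
    """
    half_w, half_h = win_w / 2.0, win_h / 2.0
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            inside = abs(x - gaze_x) <= half_w and abs(y - gaze_y) <= half_h
            new_row.append(value if inside else fill)
        out.append(new_row)
    return out


# 6 x 8 toy image of ones; horizontally elongated 5 x 3 pixel window at (3, 2).
img = [[1] * 8 for _ in range(6)]
masked = moving_window(img, gaze_x=3, gaze_y=2, win_w=5, win_h=3, fill=0)
for row in masked:
    print("".join(str(v) for v in row))
```

Swapping `win_w` and `win_h` turns the horizontal window into a vertical one, which is the manipulation at the heart of the present experiments.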
Some of these results were confirmed by Geisler, Perry, and Najemnik (2006), who varied the drop-off in peripheral resolution while observers searched for a target in noise. The authors’ ideal searcher model produced similar behaviour, given the same eccentricity limitations. Masking the periphery outside a moving window also seems to lengthen fixation durations (Greene, 2006), particularly with small windows (Loschky & McConkie, 2002). van Diepen and d'Ydewalle (2003) found that while masking the region at fixation affected fixation durations most severely (as would be expected if fixations are dominated by processing at the current location), peripheral masking also had an effect.
Despite this research, there are surprisingly few studies using gaze-contingent displays to study issues of saccade control in scenes, certainly as compared with reading research (Rayner, 2009, made a similar observation while reviewing the literature). In particular, despite evidence for asymmetries in eye movements (e.g., the predominance of horizontal saccades) and in the processing of information at fixation (which appears to be horizontally elongated; Najemnik & Geisler, 2005), no study has yet varied the shape and symmetry of a gaze-contingent window in scenes.
The present research
On the basis of previous research, we would expect scanning to suffer with the gaze-contingent window manipulations, leading to shorter saccades and longer fixations. The models discussed above suggest two potential determinants of saccade selection, and these lead to two hypotheses in the case of an asymmetric window. First, if saccades are targeted towards features whose importance can be detected from the current fixation location, currently visible regions of the scene will have a greater influence than unseen areas. In the window conditions, the only features available are within the window, so we would expect short saccades that target things within this region. If the features of potential saccade targets are represented as points on a spatial map, the saliency of points outside the window will be reduced (see also Loschky & McConkie, 2002). In the case of asymmetric windows, there are more features in one direction, and this should therefore result in a change in the distribution of saccade directions: more horizontal saccades with a horizontal window and more vertical saccades with a vertical window.
A second possibility is that saccades are chosen to maximize the new information gained on each fixation. How should we define information maximization in an encoding task? The best strategy in such a task would be to look at as much of the scene as possible. We therefore propose that maximizing the information gained means moving to a location where the most new features can be seen. If this were the case, we would expect more vertical saccades in the horizontal window condition and more horizontal saccades in the vertical window condition. This pattern would “reveal” more of the image with each gaze shift by moving the eyes to locations that were invisible (where information was zero) on the previous fixation. Our experiments investigated the effects of the differently shaped gaze-contingent windows on saccades during encoding, with particular emphasis on distinguishing between these possibilities.
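The information-maximization prediction can be made concrete with simple geometry. Under the simplifying assumptions that all scene regions are equally informative, that the masked background contributes nothing, and that screen edges can be ignored, the new information gained by a saccade is proportional to the area of the window at the new fixation that did not overlap the window at the previous one. The window size used here (12.5° × 3.1°) is the one given for the rectangular windows in the description of Experiment 2; the function name is hypothetical.

```python
def newly_revealed_area(win_w, win_h, dx, dy):
    """Area (deg^2) of a win_w x win_h rectangular window at the new fixation
    that was not visible through the window at the previous fixation, for
    a saccade vector (dx, dy) in degrees."""
    overlap = max(0.0, win_w - abs(dx)) * max(0.0, win_h - abs(dy))
    return win_w * win_h - overlap


# Horizontal window, 12.5 x 3.1 deg, and a 4-deg saccade in each direction.
W, H = 12.5, 3.1
print(newly_revealed_area(W, H, 4.0, 0.0))  # horizontal saccade: about 12.4
print(newly_revealed_area(W, H, 0.0, 4.0))  # vertical saccade: about 38.75
```

With a horizontal window, a 4° vertical saccade uncovers a completely new window's worth of scene, roughly three times what the same horizontal saccade uncovers; passing `H, W` instead reverses the pattern for the vertical window.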
The results of Najemnik and Geisler (2008) emphasised that, for a full picture of the scanning process, saccade direction, saccade amplitude and fixation location distributions need to be analysed. In their study, they found that although fixation positions were biased to the top and bottom of the display (consistent with their ideal information maximization model and a horizontal visibility map), horizontal saccades were most common. These results could be reconciled by looking at the saccade amplitudes: Infrequent but large vertical saccades moved fixation to the top or bottom of the display, which was then explored with smaller but more common horizontal movements. The authors speculated that this strategy may also be due to our experience with natural scenes, where objects are often found on the horizon. To test the generalizability of this claim, we also looked at the interaction between window shape and scene type (landscapes vs. interiors), since we had found differences in these stimuli previously (Foulsham et al., 2008).
Inclusion in this study was contingent on having normal vision (without glasses) and on completing a good calibration on the eyetracker. Sixteen participants (9 females; age range, 18–24 years) took part for course credit and gave their informed consent.
Stimuli, apparatus and design
Eighty colour photographs showing indoor or outdoor scenes were used, half of which were presented in both the encoding phase and the test phase. All the images were high resolution, were collected from the Internet and from commercially available collections, and were resized to 1,024 × 768 pixels. Each encoding image was matched with a correct sentence (describing the state or position of something in the scene; e.g. “There is a towel on the bath”) and an incorrect sentence (e.g. “There is a towel on the floor”).
Eye movements were recorded using the Eyelink 1000 eyetracker (SR-Research). Participants were seated at a chinrest that ensured a constant viewing distance of 60 cm from the screen and eliminated head movements. Stimuli and instructions were presented on a 19-in. monitor with a 60-Hz refresh rate, the frame of which was visible throughout the experiment. The screen subtended approximately 30° × 25° of visual angle. Images were shown full-screen, and participants used a gamepad to respond after each trial. Eye movement events were parsed using the default EyeLink 1000 algorithm, which identified saccades where the velocity of the eye position signal was greater than 30°/s and acceleration was above 8,000°/s².
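The velocity and acceleration criteria can be illustrated with a simplified event detector. This is not the EyeLink parser, which applies further criteria and its own filtering. It is a sketch assuming a gaze trace already converted to degrees and sampled at a hypothetical 1000 Hz, central-difference velocity, and the rule that a saccade is a run of samples above the velocity threshold whose peak acceleration exceeds the acceleration threshold. The pixel-to-degree helper uses the standard visual-angle formula; the screen width passed to it is a hypothetical value.

```python
import math


def deg_per_pixel(screen_width_cm, screen_width_px, distance_cm):
    """Visual angle (deg) subtended by one pixel at the screen centre."""
    cm_per_px = screen_width_cm / screen_width_px
    return math.degrees(2 * math.atan(cm_per_px / (2 * distance_cm)))


def detect_saccades(x, y, rate_hz=1000.0, vel_thresh=30.0, acc_thresh=8000.0):
    """Return (start, end) sample indices of detected saccades.

    x, y: gaze position in degrees, one sample every 1/rate_hz seconds.
    A candidate is a run of samples with speed > vel_thresh (deg/s); it is
    kept only if its peak acceleration exceeds acc_thresh (deg/s^2).
    """
    dt = 1.0 / rate_hz
    n = len(x)
    speed = [0.0] * n
    for i in range(1, n - 1):  # central-difference velocity
        vx = (x[i + 1] - x[i - 1]) / (2 * dt)
        vy = (y[i + 1] - y[i - 1]) / (2 * dt)
        speed[i] = math.hypot(vx, vy)
    accel = [0.0] * n
    for i in range(1, n - 1):  # central-difference acceleration of speed
        accel[i] = abs(speed[i + 1] - speed[i - 1]) / (2 * dt)

    saccades = []
    i = 0
    while i < n:
        if speed[i] > vel_thresh:
            start = i
            while i + 1 < n and speed[i + 1] > vel_thresh:
                i += 1
            if max(accel[start:i + 1]) > acc_thresh:
                saccades.append((start, i))
        i += 1
    return saccades


# Synthetic trace: fixate at 0 deg, 5-deg rightward saccade over 20 ms, fixate.
xs = [0.0] * 100 + [0.25 * k for k in range(21)] + [5.0] * 100
ys = [0.0] * len(xs)
print(detect_saccades(xs, ys))  # [(100, 120)]
```

A slow drift (for example, 10°/s) never crosses the velocity threshold and so produces no events.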
All the participants saw the same images in a random order. Four viewing conditions were used, and these were presented within participants in a blocked fashion, counterbalanced among participants: normal viewing, square gaze-contingent window, horizontal window and vertical window (see Fig. 1).
Across participants, each image appeared in all four conditions, and at encoding, images were equally likely to be paired with a correct or an incorrect sentence. At test, the images from encoding were presented again, interleaved with the same number of unseen images.
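The text states that block order was counterbalanced and that each image appeared in all four conditions across participants, but not how this was done. One common scheme, shown here purely as an assumption, is a balanced (Williams) Latin square for block order combined with a rotation of the four ten-image sets across conditions; with 16 participants, each of the four orders would be used four times. The function names are hypothetical.

```python
CONDITIONS = ["normal", "square", "horizontal", "vertical"]


def balanced_latin_square(n):
    """Williams design for an even number of conditions: each condition
    appears once in each position, and each ordered pair of conditions
    is adjacent exactly once."""
    if n % 2:
        raise ValueError("this construction needs an even n")
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    return [[(c + r) % n for c in first] for r in range(n)]


def block_order(participant):
    """Condition order for a participant; rows are reused cyclically."""
    square = balanced_latin_square(len(CONDITIONS))
    return [CONDITIONS[i] for i in square[participant % len(square)]]


def image_set_for(participant, condition_index, n_sets=4):
    """Rotate image sets so that, across participants, every set of ten
    images is seen under every viewing condition."""
    return (condition_index + participant) % n_sets


for p in range(4):
    print(p, block_order(p))
```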
Following calibration with a 9-point grid, two practice trials were given in order to familiarize the participants with the gaze-contingent display. The experiment proper then began with the encoding phase (Fig. 2).
Participants were shown four blocks of ten images, one block for each of the four viewing conditions. Each encoding trial began with a central fixation point, which participants were required to fixate before the trial began and which therefore ensured that scanning started in the centre. The image then appeared and remained on the screen for 10 s. Participants were instructed to inspect the scene and try to remember it for the sentence verification task. Following the image, a sentence appeared that could be correct or incorrect with regard to the previous scene. Participants were required to press one of two keys on the gamepad to indicate whether the sentence was correct or not, and this keypress terminated the display and initiated the next trial.
When all the encoding trials were complete, participants were given a surprise memory test for the images, which we will use as an additional measure of how well image encoding can proceed under the different viewing conditions. Participants were instructed to view each image and decide whether they had seen it previously in the encoding block. All 80 images (half of which were the ones seen at encoding) were then displayed in a random order. Each test trial began with a fixation point, followed by the presentation of the scene. The image remained on the screen until the participant made an old/new judgment by pressing one of two keys on the gamepad. The experimenter continued to monitor the validity of the eyetracker calibration, and it was recalibrated after encoding and whenever necessary to maintain a good calibration.
Analysis and results
We used participants’ memory as a preliminary indicator of encoding performance. In the subsequent recognition test, scenes were correctly recognized on 75% of trials (mean false alarm rate = 12%), and accuracy did not vary reliably with the viewing condition at encoding, F(3, 45) = 1.1, p = .34. There was a marginally reliable effect on correct recognition time, F(3, 45) = 2.7, p = .059. Recognition was fastest when the stimulus had been seen under normal conditions (mean RT = 2,990 ms) or when a square window had been used at encoding (3,014 ms). The asymmetric conditions were associated with the slowest performance (horizontal = 3,765 ms; vertical = 3,747 ms). Thus, there was some evidence that scene encoding was worse in the gaze-contingent conditions, although it was limited to a marginal slowing of recognition; this is consistent with previous reports (Saida & Ikeda, 1979). Our subsequent analyses concentrated on the way in which the scenes were scanned with saccadic eye movements.
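Hit and false alarm rates can be combined into a signal-detection sensitivity index. As a worked illustration, a hit rate of .75 and a false alarm rate of .12 give d' = z(.75) - z(.12), which is about 1.85. This calculation is an addition, not an analysis reported here: computing d' from group means is not the same as averaging per-participant values, and rates of exactly 0 or 1 would need a correction before the z-transform.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: d' = z(H) - z(FA)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Response bias c = -(z(H) + z(FA)) / 2; positive values are conservative."""
    z = NormalDist().inv_cdf
    return -(z(hit_rate) + z(fa_rate)) / 2


print(round(d_prime(0.75, 0.12), 2))    # 1.85
print(round(criterion(0.75, 0.12), 2))  # 0.25
```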
We first looked at eye movement measures across the whole trial: the number of fixations, their mean duration, and the mean amplitude of the saccades made. We then focused on our main question by looking at saccade direction and amplitude in the different conditions. In this study, we were concerned only with behaviour at encoding, where there was a fixed trial time. In each case, we compared the viewing conditions using a within-subjects ANOVA and, where necessary, compared each pair of conditions using post hoc Tukey tests, which control the familywise error rate across multiple comparisons.
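For readers reproducing the analysis, the F ratio of a one-way within-subjects ANOVA can be computed directly from the subjects × conditions table of scores. This is a minimal stdlib sketch of the textbook computation, without a sphericity correction and without the Tukey follow-ups; the function name is hypothetical. With 16 participants and 4 conditions it yields the degrees of freedom (3, 45) reported throughout, and a p value could then be obtained from the F distribution (for example, with `scipy.stats.f.sf`).

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the score of subject s in condition c.
    Returns (F, df_conditions, df_error).
    """
    n = len(data)      # subjects
    k = len(data[0])   # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


toy = [[1, 2, 3],
       [2, 3, 4],
       [3, 4, 6]]
print(rm_anova(toy))  # F is about 37 with df (2, 4)
```

Removing between-subject variance (`ss_subj`) from the error term is what distinguishes this from a between-subjects ANOVA.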
General eye movement measures (Table 1)
Table 1 Measures quantifying general eye movement behaviour during the scene-encoding task: number of fixations per trial, fixation duration (ms), and saccade amplitude (°)
There was a highly reliable effect of condition on the mean saccade amplitude, F(3, 45) = 117.96, p < .001. The gaze-contingent conditions were characterized by saccades several degrees shorter, on average, than in normal viewing, all qs(15) > 13, all ps < .01. However, saccades in the horizontal condition were not as short as those in the vertical and square conditions, both qs(15) > 8, both ps < .01; a horizontal window did not produce such a severe change in the size of scanning movements. The vertical and square conditions did not differ reliably.
To perform statistics, we divided the full range of directions into four 90° arcs, centred on the cardinal directions (see the shaded regions in Fig. 3). We first confirmed the symmetry of the plots in Fig. 3. There was no difference in the frequency of upward versus downward saccades, and no difference in the frequency of leftward versus rightward saccades, in any of the conditions, all ts(15) < 1. As a result, we collapsed all the saccades into two categories, vertical and horizontal, and computed the frequency of saccades in each category for each participant in each condition. Finally, we calculated the proportion of horizontal saccades (hereafter, the HVP, calculated as the frequency of horizontal saccades divided by the frequency of all saccades). An HVP of 1 would indicate that all the saccades made were horizontal, whilst an HVP of 0 would show complete dominance of vertical eye movements.
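The direction classification and the HVP can be sketched as follows. This assumes saccade vectors are taken as the displacement between consecutive fixation positions (in degrees), whereas the original analysis may have used the parser's saccade start and end points. Exact diagonals are counted as horizontal, an arbitrary tie rule, and the function names are hypothetical. The same pass collects amplitudes, so median amplitude per direction class comes for free.

```python
import math
from statistics import median


def classify(dx, dy):
    """'horizontal' if the saccade lies within 45 deg of left or right, else
    'vertical' (four 90-deg arcs centred on the cardinal directions)."""
    angle = math.degrees(math.atan2(dy, dx)) % 180.0  # fold opposite directions
    return "horizontal" if angle <= 45.0 or angle >= 135.0 else "vertical"


def saccade_stats(fixations):
    """From a sequence of fixation positions (x, y) in degrees, return the
    horizontal proportion (HVP) and the median amplitude per direction."""
    amps = {"horizontal": [], "vertical": []}
    for (x0, y0), (x1, y1) in zip(fixations, fixations[1:]):
        dx, dy = x1 - x0, y1 - y0
        amps[classify(dx, dy)].append(math.hypot(dx, dy))
    n_h, n_v = len(amps["horizontal"]), len(amps["vertical"])
    hvp = n_h / (n_h + n_v)
    medians = {k: median(v) for k, v in amps.items() if v}
    return hvp, medians


fix = [(0, 0), (5, 0), (5, 3), (0, 3), (0, 0), (2, 1)]
hvp, med = saccade_stats(fix)
print(hvp, med)  # 0.6 {'horizontal': 5.0, 'vertical': 3.0}
```

Because the angle is folded modulo 180°, the sign convention of the vertical screen axis does not affect the classification.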
In normal viewing, with no gaze-contingent window, there was a mean HVP of .70 (SE = .01). This quantifies the horizontal bias, which was present in all participants and reliably different from an equal proportion of saccades in each direction (one-sample t test against an HVP of .5), t(15) = 22.0, p < .001. Viewing condition had a reliable effect on the HVP, F(3, 45) = 11.44, p < .001. The vertical window (M ± SE = .59 ± .03) produced a significantly smaller horizontal bias than did normal viewing, q(15) = 5.4, p < .01. However, the horizontal and square conditions (.72 ± .01 and .67 ± .01, respectively) were not significantly different from normal viewing. The vertically oriented window elicited reliably more vertical and fewer horizontal saccades, leading to a lower HVP than for the other shapes, both qs(15) > 4, both ps < .05. The square window resulted in behaviour somewhere between that in the two rectangular conditions, with a less pronounced horizontal tendency than in the horizontal condition, q(15) = 4.9, p < .05. These findings demonstrate that the window shape modified the frequency of saccades made in different directions.
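The one-sample test against an HVP of .5 uses t = (M - .5)/SE with n - 1 degrees of freedom. With the rounded summary values given above (M = .70, SE = .01), this gives t of about 20, in line with the reported t(15) = 22.0 once rounding of M and SE is allowed for. A minimal stdlib sketch, taking per-participant HVPs as input (the toy values are hypothetical):

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu0=0.5):
    """t statistic and degrees of freedom for testing mean(values) against mu0."""
    n = len(values)
    se = stdev(values) / math.sqrt(n)
    return (mean(values) - mu0) / se, n - 1


t, df = one_sample_t([0.6, 0.7, 0.8])
print(round(t, 3), df)  # 3.464 2
```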
In light of the differences in saccade direction, it is pertinent to ask how amplitude and direction interact as a function of the window shape. If the changes in the saccade direction distribution were due to saccades that target locations within the window, we would expect the majority of saccades to have amplitudes of less than the extent of the viewing aperture. Saccades larger than this would have been made towards the masked background and would, therefore, indicate strategic or top-down selection.
The amplitude distributions were unimodal, with a few very small saccades (of 1° or less), a concentration of saccades between about 1.5° and 4° in amplitude, and a gradually decreasing frequency of larger eye movements. In normal viewing, horizontal and vertical saccades showed similar distributions, with a mode at about 1.5° and median amplitudes of 6.2° and 4.1° for horizontal and vertical saccades, respectively. The distribution for vertical saccades was sharper, with fewer long saccades: Only 17% of vertical saccades were over 8°, as compared with 30% of horizontal eye movements.
In comparison with normal viewing, the gaze-contingent window conditions had narrower distributions with a larger mode but fewer long saccades, resulting in lower medians. For example, the medians for horizontal and vertical saccades in the square condition were 3.4° and 3.0°, respectively: lower than in normal viewing, but still showing the same tendency for horizontal eye movements to be longer.
Looking at the bottom two panels in Fig. 5, it is clear that the asymmetrical window shapes led to a systematic change in saccade length. With a horizontal window, the distribution of horizontal saccades was more spread out and had a higher mode and more long-range saccades, leading to a higher average (median = 5.1°), relative to vertical eye movements (median = 2.9°). With the vertical shape, the opposite pattern was observed, and this was the only case where there were more large saccades moving vertically (median = 3.5°) than horizontally (median = 2.9°). An omnibus ANOVA performed on the participant medians confirmed that there was an effect of viewing condition, F(3, 45) = 76.6, p < .001. Direction was also reliable, with horizontal saccades resulting in a longer median, overall, than did vertical eye movements, F(1, 15) = 91.3, p < .001. However, these effects were qualified with an interaction, F(3, 45) = 89.4, p < .001. The median amplitude of horizontal saccades was greater than that of vertical saccades in normal viewing and on those trials with a square or a horizontal window, all ts(15) > 3.8, ps < .005. However, on trials with a vertical window, the median amplitude of vertical saccades was larger, t(15) = 3.6, p < .005.
The dotted lines in Fig. 5 indicate the extent of the moving window in each direction, which gives an idea of the frequency of saccades landing within versus beyond the window. We compared the landing site of each saccade with the coordinates of the aperture on the previous fixation. Although saccades in the gaze-contingent conditions were shorter than normal, about 50% of all the saccades went outside the window. In the square condition, 49% of the horizontal saccades and 37% of the vertical saccades went outside the window. The pattern in the horizontal condition (horizontal saccades, 37% outside the window; vertical saccades, 78%) was precisely the opposite of that seen on trials with a vertical window (horizontal saccades, 79%; vertical saccades, 32%). These observations suggest that it was perfectly possible for people to saccade to parafoveal or peripheral locations that were empty. In other words, the length of saccades was not completely curtailed by the presence of a gaze-contingent boundary, as evidenced by, for example, the tendency to make vertical eye movements beyond the edge of a horizontal window.
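The landing-site comparison can be sketched as a point-in-rectangle test against the window centred on the previous fixation. This assumes positions in degrees, the 12.5° × 3.1° rectangle given for the horizontal window in the description of Experiment 2, and that a landing exactly on the boundary counts as inside; the function names are hypothetical.

```python
def landed_outside(prev_fix, landing, win_w, win_h):
    """True if a saccade from prev_fix lands beyond the rectangular window
    (win_w x win_h deg) that was centred on prev_fix."""
    dx = landing[0] - prev_fix[0]
    dy = landing[1] - prev_fix[1]
    return abs(dx) > win_w / 2.0 or abs(dy) > win_h / 2.0


def proportion_outside(fixations, win_w, win_h):
    """Proportion of consecutive-fixation saccades landing outside the window."""
    pairs = list(zip(fixations, fixations[1:]))
    outside = sum(landed_outside(a, b, win_w, win_h) for a, b in pairs)
    return outside / len(pairs)


# Horizontal window: a 5-deg horizontal saccade stays inside,
# a 2-deg vertical one does not.
W, H = 12.5, 3.1
print(landed_outside((0, 0), (5, 0), W, H))  # False
print(landed_outside((0, 0), (0, 2), W, H))  # True
```

Computing this proportion separately for horizontal and vertical saccades gives percentages of the kind reported above.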
Conclusions from Experiment 1
There was a clear effect of the gaze-contingent viewing conditions, relative to normal viewing, and of the shape of the window. Image viewing with a moving window was characterized by more fixations and shorter saccades, and window shape had a differential effect on scanning direction and amplitude. There was a predominance of horizontal saccades in all the conditions, which suggests that this bias does not depend solely on the visual features in the periphery (because it was also found in the masked-background conditions). However, a change from a horizontal window to a vertical one did change the pattern of saccade directions. A vertical window led to more vertical saccades: Participants preferred to move toward regions about which they already had some visible information. The distribution of saccade amplitudes shifted according to the boundaries of the window, although a significant number of saccades went beyond this boundary (i.e. into empty space). In the context of the memorization task, the moving-window conditions tended to be detrimental to encoding, slowing subsequent recognition, which suggests that removing peripheral information had an impact on cognition.
In Experiment 1, the window conditions reduced the visual information available outside the aperture to zero (and in the case of the horizontal and vertical windows, this reduction was asymmetric). Complete masking of peripheral information is a rather artificial situation, and this may have been compounded in our experiment by the use of rectangular windows, which led to strong discontinuities and straight edges at the boundary of the window. It is possible that the predominance of horizontal and vertical saccades was affected by these properties of the moving window or that it was unnatural for participants to saccade into empty space.
In Experiment 2, we used a more subtle manipulation to control the information available for planning saccades. We had two aims with this additional experiment. First, we aimed to replicate the changes in saccades found with vertical versus horizontal windows in a moving-window display without straight edges and with a less pronounced discontinuity between the window and the surround. Specifically, we used an elliptical window, and rather than mask the background completely, we presented high-resolution information at fixation and a low-pass-filtered (i.e. blurred) version of the image as a background. Second, we tested whether the effects of window shape would be moderated by the amount of information in the periphery. With a blurred background, all possible saccade targets contain some information, and the saccadic system must decide whether to move within the window, where visual information is preserved, or into the periphery, where information is still present but degraded. As previously, we manipulated the extent of preserved information in different directions by using a horizontally or vertically oriented window, and we explored whether the changes in saccade direction and amplitude remained. If the pattern of saccades with different window shapes is different (for example, if the vertical window no longer produces a higher frequency of vertical saccades), it would suggest that when some peripheral features are present, they are used by the saccadic system, perhaps in computations to maximize the information gained. Moreover, any differences between the experiments would demonstrate the importance of having something in the periphery, as opposed to nothing at all. If the extent of peripheral information affects eye movements, the direction biases might also differ between scene types with different peripheral content. We therefore also looked at saccade direction in both landscapes and interior scenes, in both experiments.
Twenty-six participants (18 females) took part in this experiment, none of whom had taken part in Experiment 1. All the participants were students (age range, 18–24 years), who took part for course credit and had normal vision.
Stimuli, apparatus and design
The gaze-contingent display functioned in the same way as in Experiment 1. However, rather than a completely masked background, a low-pass-filtered version of the current stimulus appeared outside the window at fixation. The low-pass versions were produced in Adobe Photoshop by convolution with a Gaussian blur filter, the standard deviation of which was 0.5°. This relatively severe level of blur attenuated spatial frequencies greater than approximately 2 cycles/deg, frequencies that remain perceptible even at large eccentricities. This level of blur was chosen, on the basis of pilot studies, to be noticeable to participants while still enabling general scene content to be determined from the blurred image (for manipulations of the degree of peripheral blur necessary for participants to become aware of the manipulation, see Loschky et al., 2005). The window around fixation was a regular ellipse with major and minor axes of 12.5° and 3.1° (or vice versa for the vertical window), the same as the dimensions of the rectangular windows used previously.
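The display logic described above (intact image inside an ellipse centred on gaze, low-pass-filtered copy outside) can be sketched in NumPy. This is not the software used in the study: the blur is a plain separable Gaussian standing in for the Photoshop filter, `PX_PER_DEG` is a hypothetical screen resolution, and `render_frame` and the other function names are illustrative. `gaussian_gain` evaluates the standard amplitude response of a Gaussian with standard deviation sigma, exp(-2 * pi^2 * sigma^2 * f^2), to show how attenuation grows with spatial frequency for a 0.5° blur.

```python
import numpy as np

# Hypothetical display parameter (pixels per degree of visual angle);
# the study's actual screen geometry is not given in this section.
PX_PER_DEG = 30.0


def gaussian_gain(freq_cpd, sigma_deg=0.5):
    """Amplitude response of a Gaussian blur (SD sigma_deg, in degrees)
    at spatial frequency freq_cpd (cycles/deg)."""
    f = np.asarray(freq_cpd, dtype=float)
    return np.exp(-2.0 * np.pi ** 2 * sigma_deg ** 2 * f ** 2)


def gaussian_blur(img, sigma_deg=0.5, px_per_deg=PX_PER_DEG):
    """Separable Gaussian low-pass filter (a stand-in for the Photoshop filter)."""
    sigma_px = sigma_deg * px_per_deg
    radius = int(np.ceil(3 * sigma_px))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma_px) ** 2)
    kernel /= kernel.sum()  # unit gain at 0 cycles/deg
    padded = np.pad(img.astype(float), radius, mode="reflect")
    # Filter rows, then columns; "valid" mode trims the padding back off.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def elliptical_window(shape, gaze_xy, width_deg, height_deg, px_per_deg=PX_PER_DEG):
    """Boolean mask, True inside an ellipse centred on gaze_xy (x, y in pixels).
    width_deg and height_deg are full axis lengths in degrees."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    a = 0.5 * width_deg * px_per_deg   # horizontal semi-axis (px)
    b = 0.5 * height_deg * px_per_deg  # vertical semi-axis (px)
    gx, gy = gaze_xy
    return ((xx - gx) / a) ** 2 + ((yy - gy) / b) ** 2 <= 1.0


def render_frame(sharp, blurred, gaze_xy, horizontal=True):
    """Intact image inside the window, blurred image everywhere else."""
    w_deg, h_deg = (12.5, 3.1) if horizontal else (3.1, 12.5)
    mask = elliptical_window(sharp.shape, gaze_xy, w_deg, h_deg)
    return np.where(mask, sharp, blurred)


rng = np.random.default_rng(0)
scene = rng.random((200, 400))          # stand-in greyscale image
background = gaussian_blur(scene)       # computed once per image
frame = render_frame(scene, background, gaze_xy=(200, 100), horizontal=True)
```

In a live experiment, `render_frame` would be called on every screen refresh with the latest gaze sample; that loop and the eyetracker interface are omitted. Blurring once per image and only recompositing the mask keeps the per-frame cost low.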
We again used the picture–sentence verification task from Experiment 1. Participants saw all images in a random order, with the instructions that they should “look carefully at the pictures so as to verify the accuracy of the following sentence”.
The procedure began with instructions, and the eyetracker was calibrated as previously. Two practice trials were then given, in order to familiarize participants with the apparatus and task. The experiment proper then began, and each trial proceeded in exactly the same way as in Experiment 1. No subsequent memory test was given.
Analysis and Results
We look first at general eye movement parameters before examining the saccade direction and amplitude distributions. Our comparisons of interest were (1) whether the orientation of the window (horizontal or vertical) affected viewing and (2) whether the effect in these blurred conditions was the same as with the rectangular, masked windows in Experiment 1. If the effects from the first experiment were caused by a reluctance to orient to a blank mask, the results with a blurred periphery should be more similar to normal viewing and should show less of an effect of window shape.
General eye movement measures
The mean number of fixations per trial, fixation duration and saccade amplitude were subjected to a 2 × 2 mixed ANOVA with the within-subjects factor of window shape (horizontal or vertical) and the between-subjects factor of experiment. For both number and duration of fixations, there were no differences between conditions or experiments and no interactions, all Fs < 2.3, ps > .14. Moreover, independent t tests yielded no differences in these measures between either condition in Experiment 2 and normal viewing in Experiment 1, both ts(40) < 1.
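Because window shape has only two levels, the window shape × experiment interaction in the 2 × 2 mixed ANOVA described above reduces to an independent-samples comparison of each participant's horizontal-minus-vertical difference score, with F = t². The sketch below assumes one mean per participant per window and a pooled-variance t test; the function names are hypothetical.

```python
import numpy as np


def pooled_t(a, b):
    """Independent-samples t test with pooled variance; returns (t, df)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return float(t), df


def shape_by_experiment_interaction(exp1, exp2):
    """F and df for the window shape x experiment interaction.

    exp1, exp2: arrays of shape (n_participants, 2) holding each participant's
    mean for the [horizontal, vertical] window. With a two-level
    within-subjects factor, the interaction test equals a comparison of the
    groups' difference scores, so F(1, n1 + n2 - 2) = t ** 2.
    """
    e1 = np.asarray(exp1, dtype=float)
    e2 = np.asarray(exp2, dtype=float)
    t, df = pooled_t(e1[:, 0] - e1[:, 1], e2[:, 0] - e2[:, 1])
    return t ** 2, (1, df)
```

With 16 participants in Experiment 1 and 26 in Experiment 2, the degrees of freedom come out as (1, 40), matching the F tests reported here. The main effects still require the full ANOVA; this shortcut covers only the interaction.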
There was no main effect of experiment on saccade amplitude, F(1, 40) < 1: Masking and blurring led to saccades of a similar mean length. Although there was an interaction between experiment and window shape, F(1, 40) = 6.3, p < .05, saccades in Experiment 2 remained longer with a horizontal ellipse window (M = 4.4°, SE = 0.21) than with a vertical window (M = 4.0°, SE = 0.19), t(25) = 3.5, p = .001. This is the same difference as that seen in Experiment 1, although it was smaller. In both conditions, saccades remained shorter than those seen in normal viewing, both ts(40) > 7, both ps < .001. This is consistent with the findings from Experiment 1: Gaze-contingent windows led to shorter saccades.
Thus, the window manipulation in this experiment produced results similar to those in Experiment 1, although the number and duration of fixations were relatively less affected (as compared with normal viewing) than in the fully masked conditions in Experiment 1.
The mean HVP was .71 (SE = .01) in the horizontal condition and .59 (SE = .01) in the vertical condition. There were more vertical saccades and fewer horizontal ones when the elliptical window was vertically oriented, t(25) = 8.8, p < .001. The vertical condition was reliably different from normal viewing, t(40) = 7.0, p < .001, but the horizontal condition did not differ reliably, t(40) < 1. There was no evidence that the pattern was any less pronounced than in Experiment 1 (no effect of experiment and no interaction; both Fs(1, 40) < 1).
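HVP is defined with Experiment 1, outside this section. The sketch below assumes it is the proportion of saccades classified as horizontal among all saccades classified as horizontal or vertical, with direction taken from the displacement between consecutive fixations. The classification rule and the `tolerance_deg` parameter are assumptions for illustration, not the authors' stated procedure.

```python
import numpy as np


def saccade_vectors(fix_x, fix_y):
    """Displacements (dx, dy) between consecutive fixations, in degrees."""
    return np.diff(np.asarray(fix_x, float)), np.diff(np.asarray(fix_y, float))


def hvp(dx, dy, tolerance_deg=45.0):
    """ASSUMED definition: horizontal / (horizontal + vertical) saccades.

    A saccade is horizontal if its direction lies within tolerance_deg of the
    horizontal axis and vertical if within tolerance_deg of the vertical axis.
    With the default 45 deg every saccade goes to its nearer axis (exact
    diagonals are ignored); smaller values exclude oblique saccades.
    """
    if not 0 < tolerance_deg <= 45:
        raise ValueError("tolerance_deg must be in (0, 45]")
    dx = np.asarray(dx, float)
    dy = np.asarray(dy, float)
    # Angle from the horizontal axis, folded into the range 0..90 degrees.
    angle = np.degrees(np.arctan2(np.abs(dy), np.abs(dx)))
    n_h = np.sum(angle < tolerance_deg)
    n_v = np.sum(angle > 90.0 - tolerance_deg)
    return n_h / (n_h + n_v) if (n_h + n_v) else float("nan")


dx, dy = saccade_vectors([0, 4, 4, 8], [0, 0, 3, 3])  # right, up, right
print(hvp(dx, dy))  # 2 horizontal of 3 saccades
```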
Overall, saccades were marginally shorter with the blurred display than in Experiment 1 (Experiment 1, M = 3.6°; Experiment 2, M = 3.2°), F(1, 40) = 3.6, p = .065. Main effects of direction, F(1, 40) = 32.3, p < .001, and window shape, F(1, 40) = 27.8, p < .001, and an interaction between them, F(1, 40) = 368.8, p < .001, confirmed the pattern in Experiment 1: With a horizontal window, horizontal saccades peaked at higher amplitudes than did vertical saccades (Ms = 4.2° vs. 2.9°), but with a vertical window, the pattern was reversed and vertical eye movements were longer (Ms = 3.6° vs. 3.1°). There was also an interaction between experiment and direction, F(1, 40) = 4.4, p = .04. Horizontal saccades had a higher median than did vertical saccades, but this difference was less pronounced in Experiment 2 (Ms = 3.3° and 3.1° for horizontal and vertical saccades, respectively) than in Experiment 1 (Ms = 3.9° and 3.3°). Finally, there was a three-way interaction, F(1, 40) = 15.1, p < .001, indicating that the window-induced shift in the relative amplitudes of horizontal and vertical saccades differed between the two experiments. However, although the differences were less pronounced in Experiment 2, breaking down the effects within this experiment showed the same result: Direction interacted with window shape, F(1, 25) = 144.2, p < .001. With a horizontal ellipse window, horizontal saccades (M = 3.8°) had a larger median amplitude than did vertical saccades (M = 2.8°). With a vertical ellipse, the opposite was true (horizontal, M = 2.8°; vertical, M = 3.3°).
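The amplitude analysis compares median amplitudes of horizontal and vertical saccades under each window. A per-participant sketch, assuming each saccade is assigned to its nearer axis (an illustrative rule, not necessarily the authors'); the function names are hypothetical.

```python
import numpy as np


def median_amplitude_by_direction(dx, dy):
    """Median amplitude (deg) of horizontal and vertical saccades."""
    dx = np.asarray(dx, float)
    dy = np.asarray(dy, float)
    amp = np.hypot(dx, dy)
    groups = {"horizontal": np.abs(dx) > np.abs(dy),
              "vertical": np.abs(dy) > np.abs(dx)}
    return {name: float(np.median(amp[sel])) if sel.any() else float("nan")
            for name, sel in groups.items()}


def direction_by_window_contrast(m_hwin, m_vwin):
    """Interaction contrast: the horizontal-minus-vertical median difference
    under the horizontal window minus the same difference under the vertical one."""
    return ((m_hwin["horizontal"] - m_hwin["vertical"])
            - (m_vwin["horizontal"] - m_vwin["vertical"]))


# Experiment 2 group medians reported above (degrees).
contrast = direction_by_window_contrast({"horizontal": 3.8, "vertical": 2.8},
                                        {"horizontal": 2.8, "vertical": 3.3})
```

Applied to those group medians, the contrast is about 1.5°: a 1.0° horizontal advantage under the horizontal ellipse becomes a 0.5° vertical advantage under the vertical one. In practice the contrast would be computed per participant and then tested.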
How often did saccades move outside the window of preserved visibility? Saccades were just as likely to move outside the window in Experiment 2 as in Experiment 1, t(40) < 1. As before, the direction of these saccades followed the orientation of the window (see also the dashed lines in Fig. 8). With a horizontal window, a larger proportion of vertical than of horizontal saccades landed outside the window (69% vs. 37%), and with a vertical window, the opposite was true (39% of vertical saccades vs. 75% of horizontal saccades). Thus, blurring the peripheral information, as opposed to masking it completely, did not seem to affect the deployment of saccades outside the window. How were these eye movements controlled, given that the features at their landing sites were invisible or degraded? The next section looks at the properties of these saccades in more detail.
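Whether a saccade moved outside the window can be decided from its displacement alone, because the window was centred on the fixation from which the saccade was launched. A minimal sketch, assuming displacements in degrees and full window dimensions of 12.5° × 3.1° (swap them for the vertical window); the nearer-axis direction rule and the function names are assumptions.

```python
import numpy as np


def lands_outside(dx, dy, width_deg=12.5, height_deg=3.1, shape="ellipse"):
    """True where the landing point lies outside the window that was centred
    on the launch fixation. dx, dy: saccade displacement in degrees."""
    ax = np.abs(np.asarray(dx, float))
    ay = np.abs(np.asarray(dy, float))
    a, b = width_deg / 2.0, height_deg / 2.0
    if shape == "ellipse":
        return (ax / a) ** 2 + (ay / b) ** 2 > 1.0
    if shape == "rectangle":
        return (ax > a) | (ay > b)
    raise ValueError(f"unknown window shape: {shape!r}")


def percent_outside_by_direction(dx, dy, **window):
    """Percentage of horizontal and of vertical saccades landing outside."""
    dx = np.asarray(dx, float)
    dy = np.asarray(dy, float)
    out = lands_outside(dx, dy, **window)

    def pct(sel):
        return 100.0 * out[sel].mean() if sel.any() else float("nan")

    return pct(np.abs(dx) > np.abs(dy)), pct(np.abs(dy) > np.abs(dx))


# A 5 deg rightward and a 2 deg upward saccade, horizontal elliptical window.
print(percent_outside_by_direction([5.0, 0.0], [0.0, 2.0]))
```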
The control of saccades beyond the window
We performed additional analyses of all the eye movements from the horizontal and vertical conditions (since these were the most comparable between experiments). First, we checked whether the trends in saccade direction held for the saccades outside the window. All of these eye movements were targeted at masked or blurred regions. If they also followed the direction of the window, it would indicate control based on memory or expectations of what was there (in the case of a masked periphery) or on limited, low-spatial-frequency information (in Experiment 2).
It was most appropriate for this analysis to compare saccades of similar amplitude, so we looked at all saccades longer than 6.5° (which exceeds both the 6.25° semi-major axis of the elliptical windows and the roughly 6.4° half-diagonal of the 12.5° × 3.1° rectangular windows), placing their endpoints beyond the window in all conditions from both experiments (Experiment 1, N = 2,304 saccades; Experiment 2, N = 6,313). Window shape continued to have an effect on the direction of these long saccades, F(1, 40) = 40.5, p < .001. However, this interacted with experiment, F(1, 40) = 24.6, p < .001. In Experiment 1, even large saccades were more likely to be horizontal with a horizontal window (mean HVP = .84) than with a vertical window (mean HVP = .52), paired t test, t(15) = 5.1, p < .001. This same trend was reduced in Experiment 2 (horizontal window, HVP = .71; vertical window, HVP = .67), although it did reach one-tailed significance, t(25) = 1.77, p = .04. Thus, the differences in saccade direction in Experiment 1 were also found in large saccades that were targeted at locations outside the window, even though there was no difference in the information at these points. However, when peripheral information was blurred, window shape had less of an effect on large saccades. On those occasions in which gaze moved outside the window, an increase in the information in the periphery ameliorated the effect of the gaze-contingent window on saccade direction.
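The 6.5° criterion can be checked against the window geometry: the boundary point farthest from the centre is the end of the semi-major axis for an ellipse and a corner for a rectangle. A short check (the function name is illustrative):

```python
import math


def farthest_window_point(width_deg, height_deg, shape):
    """Largest distance (deg) from the window centre to its boundary."""
    a, b = width_deg / 2.0, height_deg / 2.0
    if shape == "ellipse":
        return max(a, b)          # end of the semi-major axis
    if shape == "rectangle":
        return math.hypot(a, b)   # half-diagonal, i.e. a corner
    raise ValueError(f"unknown window shape: {shape!r}")


for shape in ("rectangle", "ellipse"):
    print(shape, round(farthest_window_point(12.5, 3.1, shape), 2))
# rectangle 6.44
# ellipse 6.25
```

Both values fall below 6.5°, so a saccade longer than 6.5° cannot land inside either elongated window, whatever its direction.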
We also analysed the properties of all eye movements landing outside the window, in order to test two specific predictions about the control of these saccades. First, it might take longer to initiate a saccade to these locations, perhaps because masking or blurring reduces the saliency of points beyond the window boundary. To test this, we looked at the duration of the fixation preceding the eye movement: Systematically longer fixation durations would suggest an increased preparation time for these saccades. Second, given that the saccades landed on targets that were masked or blurred, they should not be ideally positioned and might lead to a corrective eye movement. For example, people rarely fixate an empty background, but they may have erroneously done so if these regions were masked. If this happened often, participants may have terminated the resulting fixation early and made a short saccade to a more optimal position. We therefore computed the duration of the following fixation and the amplitude of the following saccade, and we predicted shorter fixations and smaller saccades after saccades landing outside the window. In each case, we compared the average for saccades landing within the window with that for saccades landing outside, in masked viewing (Experiment 1) and blurred viewing (Experiment 2), collapsed across horizontal and vertical window shapes.
Saccades outside the window were not associated with reliably longer prior fixation durations in either Experiment 1 (outside, M = 245 ms; inside, M = 249 ms) or Experiment 2 (outside, M = 249 ms; inside, M = 250 ms), both ts < 1. However, in Experiment 1, the fixation following a saccade outside the window (M = 232 ms) was reliably shorter than one directed within the window (M = 257 ms), t(15) = 6.5, p < .001. The same trend was also reliable in Experiment 2 (outside, M = 242 ms; inside, M = 251 ms), t(25) = 2.3, p < .05. The median amplitude of the saccade following an eye movement outside of the window was slightly shorter than that following an eye movement within the window in both experiments. This difference was negligible in Experiment 1 (outside, M = 3.65°; inside, M = 3.69°), t(15) < 1, but reached significance in Experiment 2 (outside, M = 3.13°; inside, M = 3.36°), t(25) = 3.2, p < .005. To summarize these results, whether a saccade moved inside or outside the window made no difference to the previous fixation duration. However, in line with our predictions, the subsequent fixation was shorter in duration and the following saccade had a smaller amplitude if an eye movement went outside the window.
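The inside/outside comparison pairs each saccade with the fixation before it, the fixation after it, and the next saccade. A single-trial sketch, assuming fixations and saccades strictly alternate; per-participant values from such a function would then enter a paired t test like the t(15) and t(25) tests above. Function names are illustrative, and the median is used for the following amplitude, as in the text.

```python
import numpy as np


def around_saccades(fix_durations, sacc_amplitudes, landed_outside):
    """Prior/next fixation durations and next-saccade amplitude, split by
    whether each saccade landed outside the window.

    fix_durations: n + 1 fixation durations (ms); saccade i runs from
    fixation i to fixation i + 1. sacc_amplitudes: n amplitudes (deg).
    landed_outside: n booleans.
    """
    fix = np.asarray(fix_durations, dtype=float)
    amp = np.asarray(sacc_amplitudes, dtype=float)
    out = np.asarray(landed_outside, dtype=bool)
    if len(fix) != len(amp) + 1 or len(out) != len(amp):
        raise ValueError("expected n + 1 fixations and n flags for n saccades")
    # The last saccade has no following saccade, so use saccades 0..n-2.
    idx = np.arange(len(amp) - 1)
    summary = {}
    for label, keep in (("outside", out[idx]), ("inside", ~out[idx])):
        i = idx[keep]
        if i.size == 0:
            summary[label] = None
            continue
        summary[label] = {
            "prior_fixation_ms": float(fix[i].mean()),
            "next_fixation_ms": float(fix[i + 1].mean()),
            "next_amplitude_median_deg": float(np.median(amp[i + 1])),
        }
    return summary


def paired_t(x, y):
    """Paired t statistic and df for per-participant values x and y."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))), len(d) - 1


trial = around_saccades([200, 300, 250, 220, 260], [3, 8, 2, 4],
                        [False, True, False, False])
```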
The effect of scene type
We previously reported a reduced horizontal bias and more vertical saccades when participants viewed interior scenes than when they viewed landscapes (Foulsham et al., 2008). In the present research, the windowed conditions reduced the availability of peripheral scene content. Was saccade direction in these conditions sensitive to the type of scene? If the change in eye movements in interiors occurs because scene type is recognized from peripheral information, we would expect similar scanning in both landscapes and interiors in the moving-window conditions, particularly in Experiment 1, where the background was completely masked. Looking at the interaction between window shape and scene type will also help us explore the relationship of the windowed conditions to normal scene viewing.
Distribution of saccade endpoints
A residual question from our experiments concerns the location of fixations around the scene. This is important for two reasons. First, there are several biases known to affect the overall spatial distribution of fixations in an image, such as a central bias, and it is interesting to ask whether the gaze-contingent window modified these biases. Second, Najemnik and Geisler (2008) showed that both human observers and an ideal observer model tended to fixate in a “donut”-shaped region around the centre of a search display and, particularly, at the top and bottom of this ring. This pattern complemented their analyses of saccade amplitudes and direction: Although horizontal saccades were more likely in their study, they suggested that in order to maximize information, initial, infrequent vertical saccades moved fixation towards the top or bottom, which was then explored with more frequent but shorter horizontal saccades. In sum, fixations were most common at the top and bottom of the display, which, according to these authors, indicated optimal positioning of the horizontally elongated region of visibility. Thus, looking at the fixation distribution is a key way in which to distinguish between alternative interpretations of our own data.
The plots reveal several interesting trends. First, there is a strong central bias in all the conditions, which occurs in the first few saccades but becomes less pronounced in later saccades. Second, the most frequently inspected points are around the image horizontal, whereas the top and bottom of the image are more likely to be neglected. Third, in general, there was a leftward bias, particularly in the first five saccades, where 69% of all the saccades landed on the left side of the image. Finally, the moving-window conditions resulted in some differences in the distribution of saccade endpoints, particularly when one looks at the 6th–10th saccade in the trial. In both the square window and vertical window conditions, there was a strong asymmetry in the plots: Saccades were more likely to move to the left of the image than to the right. This contrasts with the distribution seen in normal viewing, which is more evenly distributed, and that in the horizontal window condition, which actually seemed to produce a rightward bias in saccades 6–15.
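The endpoint distributions and the leftward-bias percentage can be summarised numerically for a given range of saccades within the trial. The sketch assumes pixel coordinates with the origin at the top left and a 1-based saccade index within each trial; the grid size is an arbitrary illustrative choice and the function name is hypothetical.

```python
import numpy as np


def endpoint_summary(end_x, end_y, ordinal, image_w, image_h,
                     first=1, last=5, grid=(6, 8)):
    """Share of endpoints in the left half and a normalised 2D histogram
    (grid = rows, columns) for saccades numbered first..last in their trial."""
    x = np.asarray(end_x, dtype=float)
    y = np.asarray(end_y, dtype=float)
    k = np.asarray(ordinal)
    sel = (k >= first) & (k <= last)
    if not sel.any():
        raise ValueError("no saccades in the requested ordinal range")
    x, y = x[sel], y[sel]
    left_share = float(np.mean(x < image_w / 2.0))
    # Rows index vertical position, columns horizontal position.
    counts, _, _ = np.histogram2d(y, x, bins=grid,
                                  range=[[0, image_h], [0, image_w]])
    return left_share, counts / counts.sum()


share, heat = endpoint_summary([100, 200, 700], [100, 300, 500], [1, 2, 6],
                               image_w=800, image_h=600)
```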
Conclusions from Experiment 2
This experiment replicated the main finding from Experiment 1—that a vertically oriented window reliably reduced the horizontal bias—and extended that experiment in several ways. First, the effect remained in stimuli where peripheral information was blurred, rather than being completely removed, which is important because this is similar to the way that information is disrupted in natural vision. Second, the effect remained for large saccades that moved outside the window. Third, saccades outside the window were followed by shorter fixations and smaller saccades, demonstrating that decreased information at the saccade destination affected subsequent eye movements. Fourth, the gist of the scene moderated the pattern of saccade direction, even when peripheral information was masked, such that interiors led to fewer horizontal saccades and more vertical ones. Finally, although there was evidence for a central bias in the distribution of fixations, the concentration of fixations at the top and bottom of the display that was reported by Najemnik and Geisler (2008) was not seen in the task and stimuli used here.
We investigated how the shape of the information around fixation affected some of the patterns in eye movement scanning during an unconstrained encoding task. We will begin by characterising normal scanning, before discussing the effect of different gaze-contingent windows and the implications for models of eye guidance in scenes.
Normal and gaze-contingent scanning
In Experiment 1, we replicated some of the eye movement biases that have been seen in other image-viewing tasks. There was a strong central bias in normal viewing, and this was strongest at the start of scene viewing (during the first five saccades). This is likely because the starting viewing position was constrained by the experiment to be at the centre of the screen and, presumably, as time went on, people were more likely to have moved further from the centre. Other factors that have been suggested to contribute to the central bias are the distribution of salient features or objects in the scene (photographer bias) or orbital reserve, and Tseng, Carmi, Cameron, Munoz, and Itti (2009) and Tatler (2007) have considered these factors in detail. There was also a slight leftward bias at the start of viewing, which is consistent with the results of Dickinson and Intraub (2009), who recently reported a leftward asymmetry in scene perception. The preference to move to the left side of the image was found across a range of scenes and, therefore, seems unlikely to be caused by an uneven distribution of features or objects within the scene (for further discussion of the role of image features in saccade asymmetries, see Foulsham & Kingstone, 2010).
The saccades made in normal viewing also showed biases in direction and amplitude. There was a marked tendency for making horizontal saccades, rather than vertical or oblique eye movements. Most saccades were between about 2° and 7° in amplitude, indicating that they tended to target regions in the parafovea or extending into the periphery, but horizontal saccades were longer than vertical saccades, on average. Why were there more (and longer) horizontal saccades? The bias in the present study was probably exacerbated by the fact that images (and the visible monitor) were landscape in orientation, that scanning started in the centre, and that the image was always presented in the same egocentric reference frame (meaning that the horizontal position of the eyes and biases in the movement of the extraocular muscles may have had an effect). However, we have shown previously that a horizontal bias persists even in the absence of these cues (Foulsham et al., 2008). With random start locations and square images that were rotated from their canonical orientation, that study demonstrated that the horizontal bias was scene centred, rather than egocentric.
Gaze-contingent windows had some general effects on scanning, some of which have been reported elsewhere. First, the saccades made in these conditions were shorter, on average, than those in normal viewing, confirming the influence of peripheral information on saccade guidance and suggesting that a more conservative strategy was employed that targeted features within the window. This would also explain why the gaze-contingent conditions elicited somewhat less dispersed endpoint distributions and a greater central bias: The window curtailed long saccades, so that fixations remained closer to the centre for longer. Removal of peripheral information also had a detrimental effect on memory for the scenes: It took longer to recognise scenes that had been viewed through a gaze-contingent window, consistent with a detriment in encoding in these conditions (see Saida & Ikeda, 1979).
There was mixed evidence for an effect of peripheral masking on the number or duration of fixations. In Experiment 1, some of the window conditions resulted in more fixations, with a slightly lower average duration, than did those in normal viewing, but this was not found in Experiment 2. van Diepen and d'Ydewalle (2003) also found an increased number of fixations with peripheral masking of line drawings of scenes, although this study reported an increase in fixation durations under these conditions. Loschky and McConkie (2002) also reported longer fixation durations in viewing with a low-pass-filtered periphery, which we did not find here, perhaps because we used larger windows. This discrepancy might also occur because, in our study, viewing was limited by a fixed trial duration. The increased difficulty of the gaze-contingent encoding in Experiment 1 was reflected in an increase in the number of fixations, perhaps because each object or region of interest had to be fixated multiple times. This interpretation is consistent with a sequential model of attention in scene perception where fixation durations reflect processing close to fixation and are relatively unaffected by peripheral information. On the other hand, having low-resolution information in the periphery (Experiment 2) was sufficient for eliciting fixations that were not significantly more frequent or longer than in normal viewing.
The effect of window shape
In the introduction, we offered two possible hypotheses for how a horizontal and vertical window would change patterns in scanning direction. First, if saccades were guided in order to maximize the information gained on each fixation (defined as revealing new areas of the scene), a horizontal window would lead to more vertical saccades in order to avoid previously seen regions, with the opposite being true in the case of a vertical window. Second, if saccades were guided towards features that were currently visible, a horizontal window would lead to more horizontal saccades. Our findings point consistently to the latter explanation: A vertical window produced more vertical saccades than in the other conditions, even though there were fewer unseen areas to be explored by moving up and down. This difference between window conditions was found even on the very first saccade. The pattern of saccade amplitudes was also systematically related to the dimensions of the window: Saccades with an amplitude and direction matching the boundary of the window were made frequently. We can be confident that these findings are not artefacts of windows with straight edges and a completely masked periphery, because the findings were replicated in Experiment 2 with elliptical apertures and a blurred periphery. In addition, there was no evidence for a trade-off in terms of the horizontal bias and a tendency to fixate at the top and bottom of the display, as was found by Najemnik and Geisler (2008). In fact, the top and bottom of the scene were relatively neglected in all conditions.
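The information-maximization hypothesis can be made concrete for a rectangular window by computing how much new area a saccade would expose. The sketch compares only the window at the previous fixation with the window at the new one, ignoring earlier fixations and peripheral preview, so it is a simplification rather than a full model of either hypothesis; the function name is illustrative.

```python
def newly_revealed_area(dx, dy, width_deg=12.5, height_deg=3.1):
    """Area (deg^2) of the window at the new fixation that lay outside the
    window at the previous fixation; (dx, dy) is the saccade in degrees."""
    overlap = max(0.0, width_deg - abs(dx)) * max(0.0, height_deg - abs(dy))
    return width_deg * height_deg - overlap


horizontal_4deg = newly_revealed_area(4.0, 0.0)  # 4 x 3.1 = 12.4 deg^2
vertical_4deg = newly_revealed_area(0.0, 4.0)    # whole window, 38.75 deg^2
```

For the 12.5° × 3.1° horizontal window, a 4° horizontal saccade exposes 12.4 deg² of new area, whereas a 4° vertical saccade clears the window's height and exposes the full 38.75 deg². This is why information maximization predicts more vertical saccades with a horizontal window, the opposite of what was observed.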
Several conclusions can be made on the basis of the saccade direction and amplitude results. First, because a bias for horizontal saccades persisted even with a square window (where visible information was equal in all directions), this bias must be partly driven by experience or knowledge about landscape-oriented images and monitors. Second, this default bias for horizontal eye movements was modified by an asymmetric window, consistent with a strategy of targeting points that could already be seen within the high-resolution window. In the vertical window condition, there were more features above and below fixation, and fewer to the left or right, and so the higher frequency of vertical saccades might reflect a tendency for people to move towards the information within the window.
A potential problem with this interpretation is that there was a significant number of saccades that were large enough to be targeted outside the window. How were these saccades controlled, and how did masking and blurring affect their occurrence? Loschky and McConkie (2002) also found that reducing the radius of a gaze-contingent window shortened saccades, and they interpreted this pattern of results as evidence that peripheral filtering reduces the saliency of points outside the window, making them less likely to win the competition for the next saccade. Surprisingly, in our study, saccades were no more likely to move outside the window when the periphery was blurred than when it was completely masked. This further supports the argument that, when given the choice between high-resolution information and masked or above-threshold filtered information, the eye movement system tends to saccade within the window. Furthermore, even long saccades outside the window tended to go in the direction of the elongated boundary. One possibility is that these saccades target features on or near the window boundary but overshoot this destination, due to noise in the saccadic system. This would predict a distribution of amplitudes with the mode at the radius of the boundary, which is similar to what we find. It is also possible that participants were driven to “follow” partially seen objects or details which extended into the masked space, therefore allowing them to make a reasonable prediction about what was there before they planned their saccade.
Another possibility is suggested by the properties of the preceding and following fixations and of the following saccade. It has been suggested that eye movement events in scene viewing can be divided into local clusters of exploratory fixations of long duration with short saccades, separated by larger amplitude, “global” shifts to a new region (Unema et al., 2005). We found that saccades outside the window were followed by shorter fixations and smaller amplitude saccades than were saccades within the window. This suggests that saccades outside the window may have been qualitatively different, global shifts which moved the eyes from one period of local scanning (within the window) to another. Why, then, was the subsequent fixation atypically brief? It is likely that, because the information at this point was degraded at the start of the saccade, its positioning was suboptimal and, therefore, participants terminated the fixation early and made a small corrective saccade. It may be that viewing with a gaze-contingent window exaggerates the local/global viewing strategy, and this would be worth exploring in further research.
Implications for natural scene viewing
An additional point of interest concerns the relationship between the gaze-contingent conditions and normal viewing. Across several measures, a horizontal window led to less of a difference from normal scanning than did a vertical or square window. Specifically, in Experiment 1, a horizontal window had less of an impact on mean saccade amplitude and on the direction and amplitude distributions than did a vertical or square window. This was also true in Experiment 2, and it suggests that viewing with a horizontal window was less disruptive and more normal for participants. The implication here is that during normal vision, the visible region most important for eye guidance is elongated in the horizontal direction. This is consistent with the visibility maps measured by Najemnik and Geisler (2005), albeit in a rather different task.
Although the results emphasise the importance of currently visible features in our task, we should be cautious about making further claims about human eye movement strategies in natural viewing. Window shape is only one of several factors in determining the pattern of saccade directions, and we also confirmed that scene type makes a difference. More horizontal saccades were made when landscapes were viewed than when interiors were viewed, probably because interesting features were arranged along the horizontal, and this did not interact with window shape. The lack of information-seeking saccades may have occurred because participants did not have experience and knowledge about the visibility of the artificial windows and, so, resorted to a more feature-driven approach. It is highly likely that saccade targeting is based on multiple sources of information that can be weighted differently (see Brouwer & Knill, 2007, for an example of this cue integration in visually guided reaching). Therefore, the challenge for modelling is to combine the drive towards currently visible features and the desire to maximize information in a way that explains the present data. For example, perhaps locations close to the boundary of the window (which were frequently fixated in the present study) represent a trade-off between targeting visible features and revealing new information. This is another interesting avenue for future study.
In conclusion, the experiments reported here point to two important principles regarding the control of saccades in scene viewing. First, rather than aiming only to maximize the information on each saccade, the eye movement system in our encoding task targeted features within the regions of foveal or parafoveal visibility. Second, this region is elongated horizontally, which may be an important factor in the horizontal saccade bias that has been observed in natural scene perception. With this foundation in place, the opportunity for future investigations is vast. In addition to questions of local/global exploration and the modelling of saccade targeting, researchers can use the gaze-contingent window to examine the role of bottom-up and top-down features in search, scanning in different tasks and how changes in window shape might be used to enhance exploration in patient populations, such as those with left-side neglect.
This work was supported by NSERC grants to A.K. and a Commonwealth Postdoctoral fellowship to T.F. from the Government of Canada.
- Brouwer, A.-M., & Knill, D. C. (2007). The role of memory in visually guided reaching. Journal of Vision, 7, 1–12.
- Buswell, G. T. (1935). How people look at pictures: A study of the psychology of perception in art. Chicago: University of Chicago Press.
- Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2, Art. 2), 1–19.
- Kanan, C. M., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17(6/7), 979–1003.
- McConkie, G. W., & Rayner, K. (1975). Span of effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578–586.
- Saida, S., & Ikeda, M. (1979). Useful visual-field size for pattern perception. Perception & Psychophysics, 25, 119–125.
- Tatler, B. W. (2009). Eye guidance in natural scenes [Special issue]. Visual Cognition, 17(6/7).
- Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7, Art. 4), 1–16.
- Underwood, G., Jebbett, L., & Roberts, K. (2004). Inspecting pictures for information to verify a sentence: Eye movements in general encoding and in focused search. Quarterly Journal of Experimental Psychology, 57A, 165–182.
- Yarbus, A. L. (1967). Eye movements and vision. New York: Plenum.