Look-ahead fixations: anticipatory eye movements in natural tasks
Mennie, N., Hayhoe, M. & Sullivan, B. Exp Brain Res (2007) 179: 427. doi:10.1007/s00221-006-0804-0
During performance of natural tasks subjects sometimes fixate objects that are manipulated several seconds later. Such early looks are known as “look-ahead fixations” (Pelz and Canosa in Vision Res 41(25–26):3587–3596, 2001). To date, little is known about their function. To investigate the possible role of these fixations, we measured fixation patterns in a model-building task. Subjects assembled models in two sequences where reaching and grasping were interrupted in one sequence by an additional action. Results show look-ahead fixations prior to 20% of the reaching and grasping movements, occurring on average 3 s before the reach. Their frequency was influenced by task sequence, suggesting that they are purposeful and have a role in task planning. To see if look-aheads influenced the subsequent eye movement during the reach, we measured eye-hand latencies and found they increased by 122 ms following a look-ahead to the target. The initial saccades to the target that accompanied a reach were also more accurate following a look-ahead. These results demonstrate that look-aheads influence subsequent visuo-motor coordination, and imply that visual information on the temporal and spatial structure of the scene was retained across intervening fixations and influenced subsequent movement programming. Additionally, head movements that accompanied look-aheads were significantly smaller in amplitude (by 10°) than those that accompanied reaches to the same locations, supporting previous evidence that head movements play a role in the control of hand movements. This study provides evidence of the anticipatory use of gaze in acquiring information about objects for future manipulation.
Keywords: Eye movements · Complex action · Natural tasks · Prediction · Human
In everyday tasks, gaze is used actively to gather information for the control of actions. Such strategies have been shown in a variety of tasks such as driving (Land and Lee 1994), cricket (Land and McLeod 2000), walking (Patla and Vickers 1997), sandwich making (Hayhoe 2000, Hayhoe et al. 2003) and in tea making (Land et al. 1999). In the laboratory, tasks such as block copying (Ballard et al. 1992) and block manipulation (Johansson et al. 2001) have shown that gaze was directed to locations where information critical for manipulation was obtained. Subjects appear to use gaze to select the specific information required for that point in the task. Ballard et al. (1995) referred to this as a “just-in-time” strategy. This strategy is computationally efficient, since only a limited amount of information needs to be computed from the visual image within a fixation, and it is not necessary to maintain this information in working memory if it is no longer needed. However, it does not address the problem of coordinating larger behavioral sequences.
It seems clear that in natural behavior, subjects plan actions ahead of time, and information acquired at an earlier point in time is used in this planning. Land and Furneaux (1997) noted the need for some kind of visual buffer both in driving, where the current information influences the steering about 800 ms later, and in piano playing, where the fixations lead the note played by about a second. The reduction in temporal and spatial uncertainty afforded by the continuous presence of stimuli in ordinary behavior allows for the use of visual information acquired in fixations prior to the current one, to plan both eye and hand movements. Chun and colleagues (Chun 2000; Chun and Jiang 1998; Chun and Nakayama 2000) hypothesized that implicit memory structures may be needed for guiding eye (and presumably hand) movements around a scene. They argue that such guidance requires continuity of visual representations across different fixation positions. However, we have limited knowledge of the extent of such action planning, and of the extent to which memory representations are used for this purpose. When subjects repeatedly tap a sequence of locations on a table, they locate the targets more efficiently over repeated trials, and the entire fixate-and-tap sequence speeds up (Epelboim et al. 1995). This result is consistent with subjects representing the target locations in spatial memory, with these representations leading to more efficient tapping movements. There is also evidence that the locations of objects in a scene that would be used in the future are retained in memory, facilitating the targeting of subsequent saccades (Aivar et al. 2005), while Henderson et al. (2005) have shown that eye movements play a functional role in the learning of faces.
Another observation that suggests that subjects might be planning movements several seconds ahead in natural behavior is the occurrence of what have been termed “look-ahead fixations” (LAF). In a study of gaze during a hand-washing task, as subjects approached the wash basin they fixated the tap, soap, and paper towels in sequence, before returning to fixate the tap to guide contact with the hand (Pelz and Canosa 2001). These fixations on objects that were not being manipulated, but would be used a few seconds later, were called “look-aheads”. Since subjects did not look back at objects once they had finished with them (even though the objects remained in full view), it seems likely that these fixations were not random. Similar look-ahead fixations have also been observed in tea making (Land et al. 1999) and in sandwich making (Hayhoe et al. 2003), where about a third of the reaching and grasping movements were preceded by a fixation on the object a few seconds earlier.
The aim of the current study was to investigate the function of these look-ahead fixations. Pelz and Canosa (2001) interpreted look-ahead fixations in terms of their perceptual role, suggesting that they provide continuity of perceptual experience. Alternatively, they may be a consequence of the increased significance of the upcoming target, which may attract a fixation if vision is not required for another purpose. In the current study we investigate the possibility that fixating the location of a future target facilitates the programming of the next saccade, the next reach, or both. Facilitation of a subsequent reaching and grasping movement by a prior fixation is suggested by evidence that pointing accuracy to remembered locations is improved by prior fixations on the target (Terao et al. 2002). In addition, cells in posterior parietal cortex, supplementary eye fields, and pre-SMA often respond non-selectively to a planned movement to a target, independently of the effector (i.e. reach alone, saccade alone, or both; Calton et al. 2002; Fujii et al. 2002). Thus eye and hand movement plans appear to be intimately related at the level of neural control as well as psychophysically. If reaches are facilitated by previous fixations on the target, this facilitation may be manifest in reduced reach latencies or increased velocities (cf. Epelboim et al. 1995) as well as by increased accuracy.
In addition to facilitating motor action, it is also possible that look-ahead fixations reveal high-level visual plans involved in the execution of complex tasks. Hodgson et al. (2000) looked at gaze during problem solving in the Tower-of-London task. Subjects had to plan, but not execute, the sequence of moves required to solve the problem. They found that gaze strategy correlated with the discrete phases of problem solving, with the more efficient planners directing gaze more often to the problem-critical areas. Recently, it has also been observed that a patient with action disorganization syndrome (ADS) (Forde et al. 2004; Humphreys and Forde 1998) resulting from lesions in frontal cortex produced fewer anticipatory fixations in a tea-making task (Forde et al. 2006). Both studies are consistent with the idea that look-aheads reflect high-level planning for performing a complex task, or a series of complex tasks, and that such planning is present in everyday situations.
Extended sequences of behavior can be modeled as a series of sub-tasks, with “turn on faucet, reach for soap dispenser, dispense soap, hands under water stream” all being examples of sub-tasks within the main task of “Wash hands”. It is not clear, from the studies where look-aheads have been reported, if subjects were looking ahead to objects in the next sub-task in the sequence or to the next object of manipulation. The strategic use of gaze required to achieve behavioral goals needs some flexible control over the switching between different sub-tasks, and look-aheads could be a reflection of high level planning during a series of complex actions (sub-tasks) as opposed to planning the next reach and grasp. Perhaps, in extended sequences of behavior where subjects are free to move in relation to the objects, look-aheads reflect planning of the upcoming sequence of actions in the task (as in the Tower of London paradigm). This has important implications for theoretical models of everyday actions and for interpreting the behavior of patients with disorders of actions, such as Action Disorganization Syndrome, where failure to complete behavioral routines is attributed to degradation of stored action schema.
To examine the role of look-aheads in a more systematic way, we devised a paradigm that is simpler and more repetitive than unconstrained tasks like sandwich making or hand washing. We measured the eye, head, and hand movements of subjects while they assembled wooden models on a tabletop. One goal of the experiment was to quantify the extent to which look-ahead fixations occurred in the task, and to see if it was possible to influence their occurrence by manipulating the task structure. We therefore introduced a modest alteration in the order of sub-tasks, while preserving the relatively unconstrained context of natural behavior. The second goal of this study was to investigate whether look-ahead fixations influence motor planning of the subsequent reach and grasp to a set location.
Twelve undergraduates from the University of Rochester participated in this experiment. All had normal, or corrected to normal vision. All were right handed. Subjects gave informed consent prior to the experiment, which had been approved by the University of Rochester Research Subjects Review Board. All subjects were paid for their participation.
Design and procedure
Additionally, to investigate the possibility that look-ahead fixations were a behavioral marker of higher-level mechanisms used in everyday task planning, we asked subjects to join the four pieces together in two differing ways. In the first condition, subjects were asked to reach out and pick up piece 1 and piece 2, join them together, then join piece 3 to that structure, and then finally join piece 4 to that assemblage to complete the model. This was the Join-All condition. In the second condition, subjects again had to reach out to piece 1 and 2, this time joining them together into a T-shape, put that T-shaped structure down on the table and then reach out to piece 3 and 4, join those into a second T-shape, and then finally join both T-shapes together (Make-T condition).
Prior to commencement of the experiment, a white cloth occluded any view of the tabletop (see Fig. 1a). Inserted through the cloth and into small pre-positioned rubber blocks glued to the board were nine “push-pins”. These served as calibration points for the eye tracker, enabling the subjects to be calibrated on the plane of the working surface. Once calibration was completed, the cloth was removed and subjects began the task. A re-calibration procedure was performed after every five models, using the rubber blocks on the tabletop as the calibration points. All instructions were given verbally, prior to removal of the cloth obscuring the tabletop.
The tabletop display
Figure 1 shows the layout of the work surface. The overall dimensions of the board were 120 cm × 60 cm. The arrangement of the containers holding the model pieces is shown in Fig. 1b, with a total of 10 plastic containers mounted on the work surface. Each container was 23 cm (l) × 9 cm (w) × 4 cm (h). Four containers held the model pieces that had to be manipulated, two contained distractor pieces (full distractors), two remained empty (empty distractors) and two contained nuts and bolts, respectively. Inserted into the board in front of each container was a wooden peg that either had a number (between 1 and 4) or the letter X written on it (see Fig. 1c). The pegs served to identify each container, so for example if a peg had the number 1 on it then the container was number 1, and the pieces within that container were all piece 1. All other containers that did not contain a component piece were marked with an X. Figure 1d illustrates the layout of all pieces on the board. There were five identical pieces within containers 1–4, enabling the subjects to construct five models without interruption.
Recording eye, head and hand movements
Monocular (left) eye position was monitored using an Applied Science Laboratories 501 eye tracker with a scene camera. The video-based eye tracker is headband mounted, using IR reflection to provide an eye-in-head signal at a sampling rate of 60 Hz and accuracy of ∼1°. The scene camera mounted on the headband was positioned so that its field of view was coincident with the observer’s line of sight. The ASL 501 has a real-time delay of 50 ms when indicating point of gaze (as indicated through crosshairs on video frames). Also attached to the headband was a magnetic head tracking (MHT) receiver, a Polhemus Fastrak sampling position and orientation at 60 Hz. The MHT data and eye gaze data were integrated using the ASL’s EyeHead™ Integration software. This software combines the eye and head position data for the computation of gaze with respect to the scene space (the work surface). The EyeHead™ Integration package allows scene space to be defined through a set of planes, returning the intersection of gaze in the plane’s coordinates. In this instance, we used the plane of the workspace for our data collection, defining the x, y, z coordinates of the work surface with respect to a MHT transmitter mounted on the edge of the supporting table. Calibration of the eye tracker on the plane of the work surface was performed prior to removal of the white sheet using a nine-point calibration (see above). A small laser diode with a 2D holographic diffraction grating was attached to the headband of the eye tracker (Babcock and Pelz 2004). This was maneuverable and projected a 9-point grid in front of the subject, which served as a visual aid in stabilizing the head during calibration. Coordinates of all 10 containers, the work space and the put-down space (see Fig. 1d) were stored and used for analysis of eye movements over the work surface, while the movements of the headband MHT receiver within the field were used to provide head movement data.
Hand movements of the subjects were monitored by attaching two other Polhemus (Fastrak) motion receivers to the backs of fingerless gloves worn on each hand by the subject. Movements of these receivers within a field generated from a second transmitter mounted underneath the centre of the table (also sampling at 60 Hz) were captured and saved to the data files. To signify the beginning and end of each trial, we asked subjects to place both of their hands on the workspace on top of each other (and keep them stationary for a minimum of 3 s) after they had placed a completed model in the put-down area. This enabled a trial boundary to be determined as the period when both receivers were nearly stationary and in close proximity for 3 s or more.
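As a concrete illustration, the trial-boundary rule (both hand receivers nearly stationary and close together for at least 3 s) could be implemented along these lines. This is a sketch rather than the authors' code: the 1 cm/s "nearly stationary" cutoff and the 5 cm "close proximity" cutoff are assumed values for illustration, since the paper does not specify them.

```python
import numpy as np

def trial_boundaries(left_xyz, right_xyz, fs=60,
                     speed_thresh=1.0,   # cm/s; assumed stationarity cutoff
                     dist_thresh=5.0,    # cm; assumed proximity cutoff
                     min_dur=3.0):       # s; minimum hold, per the paper
    """Return start indices of runs where both hand receivers are
    nearly stationary and close together for at least min_dur seconds."""
    # per-sample speed of each receiver (cm/s) from finite differences
    speed_l = np.linalg.norm(np.diff(left_xyz, axis=0), axis=1) * fs
    speed_r = np.linalg.norm(np.diff(right_xyz, axis=0), axis=1) * fs
    # per-sample distance between the two receivers (cm)
    dist = np.linalg.norm(left_xyz[1:] - right_xyz[1:], axis=1)
    hold = (speed_l < speed_thresh) & (speed_r < speed_thresh) & (dist < dist_thresh)
    min_len = int(min_dur * fs)
    boundaries, run_start = [], None
    for i, h in enumerate(hold):
        if h and run_start is None:
            run_start = i
        elif not h and run_start is not None:
            if i - run_start >= min_len:
                boundaries.append(run_start)
            run_start = None
    if run_start is not None and len(hold) - run_start >= min_len:
        boundaries.append(run_start)
    return boundaries
```

With real Fastrak data, the thresholds would be tuned against the video record, as the paper describes for the other segmentation parameters.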
Saccades and fixations were identified using in-house software that combined the outputs of three algorithms: a velocity-based threshold, an adaptive velocity-based algorithm, which adjusts its threshold dependent on an estimate of the noise present in the velocity signal, and a two-state (saccade and fixation) hidden Markov model for identifying saccades, described further in Rothkopf and Pelz (2004). The parameters used for the threshold algorithms defined a fixation as when the eye velocity was under 40°/s for at least 45 ms. Note this is an initial parameter for the adaptive algorithm, since the threshold will shift dependent on noise levels. Additionally, fixations that occurred within 50 ms of one another and were within 0.75° of spatial separation were combined. The outputs of the three algorithms were combined using a simple voting scheme requiring that at least two of the algorithms agreed that a given data sample was a saccade or fixation for it to be marked as such. The eye-head integration output allowed the identification of the position of fixations on the tabletop and the object they fell on. Timing corrections were made to the eye and hand tracking data due to the latencies inherent in processing, 5 ms for the Polhemus Fastrak and 50 ms for the ASL 501. Reaches were analyzed to find the start, end, and target of the reaching and grasping movements. Hand motion was identified as a reach if the hand moved over 7 cm/s for a minimum of 300 ms, over a minimum distance of 5 cm. All trajectories satisfying these requirements had to initiate in the workspace and terminate in one of the containers for a model piece. Lastly, head rotational motion was analyzed using the raw heading signal from the motion tracker mounted on the ASL 501 headband. Head rotation was segmented using a velocity and time threshold requiring motion of 5°/s or greater for at least 300 ms to qualify as a head movement.
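A minimal sketch of this classification scheme, assuming each of the three detectors yields a per-sample boolean label (True = fixation): only the simple velocity criterion (under 40°/s for at least 45 ms) is implemented here, with the adaptive and hidden-Markov-model detectors taken as given inputs to the vote.

```python
import numpy as np

def threshold_fixations(velocity, fs=60, vel_thresh=40.0, min_dur=0.045):
    """Per-sample fixation labels from the simple velocity criterion:
    eye velocity under vel_thresh deg/s sustained for at least min_dur s."""
    slow = velocity < vel_thresh
    min_len = max(1, int(round(min_dur * fs)))
    out = np.zeros_like(slow)          # boolean array, all False
    i, n = 0, len(slow)
    while i < n:
        if slow[i]:
            j = i
            while j < n and slow[j]:
                j += 1
            if j - i >= min_len:       # keep only sufficiently long runs
                out[i:j] = True
            i = j
        else:
            i += 1
    return out

def majority_vote(labels_a, labels_b, labels_c):
    """Mark a sample as fixation only if at least two of the three
    detectors agree, mirroring the paper's simple voting scheme."""
    votes = labels_a.astype(int) + labels_b.astype(int) + labels_c.astype(int)
    return votes >= 2
```

In the actual pipeline the three inputs to the vote would come from the threshold, adaptive, and hidden-Markov-model detectors respectively; here any three boolean arrays of equal length will do.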
Parameters for eye, head, and hand analyses were obtained empirically and validated against frame-by-frame analysis of the video record. Combined, these analyses allowed look-ahead fixations to be identified automatically and eye-hand latencies to be computed.
Characterizing look-ahead fixations
In a task such as hand washing, look-ahead fixations are easier to define because an object (e.g. a faucet) is used once in the task. It is relatively easy to see when someone looks at it during manipulation, or several seconds before use. However, in a task that involves the repetitive use of objects from the same spatial location, the boundaries between looking ahead and looking back become blurred. The vast majority of fixations in our paradigm, as in most tasks, serve to assist actions, and the small percentage that do not fall into this category count as either look-aheads or look-backs if localized on task-specific objects. We classified all fixations that fell on the four relevant containers during the course of a reach and grasp as guiding (G) fixations. Additionally, as subjects often saccade to a target just prior to the initiation of a reach, we also classified as guiding fixations all fixations that were on a container and stayed on that container until initiation of the reach from the workspace. All other fixations that could not be classified as guiding fixations were then put into the categories of look-ahead fixations or look-backs. Look-ahead fixations (LAF) included all fixations that fell on a container in the 10 s period before the initiation of a reach from the workspace to that container (given that there was at least one intervening fixation on a separate area). This category does not include the guiding fixations. A look-back fixation (LB) was counted if a fixation occurred within a 10 s window after a reach and grasp sequence had been completed. Therefore, a look-ahead fixation is any fixation on an object prior to the fixations that actually assist the reach to, and manipulation of, that object; by definition, there must be at least one fixation between a look-ahead fixation and the ones that guide reaching and grasping that is not on the target object.
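The guiding/look-ahead/look-back taxonomy can be sketched in code. This is an illustrative simplification of the rules described here, not the authors' implementation: "guiding" is reduced to temporal overlap with the reach (the additional rule about fixations held until reach initiation is omitted), and the timestamps and container labels are hypothetical inputs.

```python
def classify_fixations(fixations, reaches, window=10.0):
    """Label container fixations as guiding ("G"), look-ahead ("LAF"),
    or look-back ("LB"); fixations not matching any rule get None.
    fixations: list of (t_start, t_end, container)
    reaches:   list of (t_start, t_end, container)
    """
    labels = []
    for f0, f1, fc in fixations:
        label = None
        for r0, r1, rc in reaches:
            if fc != rc:
                continue
            if f0 <= r1 and f1 >= r0:
                label = "G"              # overlaps the reach: guiding
                break
            if r0 - window <= f1 < r0 and any(
                    c2 != fc and f0 < s2 < r0 for s2, _, c2 in fixations):
                label = "LAF"            # within 10 s before the reach,
                                         # with an intervening fixation
            elif r1 < f0 <= r1 + window and label is None:
                label = "LB"             # within 10 s after the reach
        labels.append(label)
    return labels
```

For example, with a reach to container 4 between t = 5.5 and t = 7 s, a fixation on container 4 at t = 0–1 s (followed by a fixation elsewhere) counts as a look-ahead, one overlapping the reach counts as guiding, and one at t = 9–10 s counts as a look-back.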
Distribution of fixation durations during the task
Frequency of look-ahead fixations
Does the number of look-ahead fixations change over time?
Additionally, we looked at the relative frequency of look-ahead fixations across the three blocks that contributed to each task sequence (Fig. 6b). Again, there was no significant difference in the number of look-ahead fixations across time in either task. Using ANOVA on task and block position, there was no effect of task (F1 = 2.7, P = 0.15) and no effect of block position (F2 = 0.8, P = 0.48). There was a trend for these fixations to be more frequent in the first block. It is possible that this was a result of the initial large number of fixations that subjects directed over the containers as the sheet was removed at the onset of the first block. However, finding no decrease in look-ahead fixations over the course of the session (approx 1 h) or over the course of a task (approx 0.5 h) suggests that in this experimental context, look-ahead fixations are not significantly influenced by familiarity with the scene layout.
Does the number of look-ahead fixations change with task sequence?
According to our prediction, there should be no difference in the number of look-ahead fixations to any container across tasks apart from 3 and 4. We found a clear difference in the number of look-ahead fixations to piece 4 as a result of the manipulation (see Fig. 7), but no significant difference for piece 3 (Wilcoxon signed-rank; piece 3, z = −0.9, P = 0.93; piece 4, z = −2.49, P < 0.05). Importantly, there was no difference in the number of look-ahead fixations across tasks to any of the other pieces that were manipulated. Overall, the first two pieces received more look-ahead fixations than the last two. During the interval between each trial, and as shown in the video, subjects often made fixations on the first two pieces as they waited for the trial to start, accounting for more fixations on 1 and 2. This again shows the relevance of look-ahead fixations for the most immediate action, as subjects did not look ahead to pieces 3 or 4 during that phase. Figure 7 also shows that there were fewer look-ahead fixations to the nuts than to the bolts. It is possible that this is a consequence of the fact that the bolts were closer to the subjects than the nuts (see Fig. 2), perhaps making the bolts more behaviorally salient than the nuts.
If look-aheads only serve to guide the next upcoming reach, then we would only expect a decrease in look-aheads to piece 4, as the reach and grasp actions up to 3 do not differ in either task, even though the shape to be constructed out of 3 and 4 does (see Fig. 2). Conversely, if look-aheads reflect planning of more than one sequence of actions (i.e. more than one piece at a time), then we might expect a change in looking ahead to both 3 and 4, as making a T-shape is a different sub-task from that in the Join-All condition. That the frequency of look-ahead fixations to piece 3 did not change as a result of our manipulation suggests that look-ahead fixations are unlikely to reflect planning at the level of both pieces (i.e. they do not change because subjects have to make a T-shape as opposed to consecutively joining pieces to a larger structure). Finding a difference for piece 4 alone suggests that look-aheads reflect planning of one piece at a time, pointing to a “just-in-time” strategy with respect to the next object of manipulation and not to the overall configuration. A consequence of keeping pieces in fixed locations is that piece 4 was always on the left of the table, resulting in subjects using the left hand to reach and grasp piece 4. In this experiment, we cannot exclude the possibility that handedness contributed to the difference in frequency of look-aheads to the two components, although it seems unlikely given that there was no difference between pieces 1 and 2.
What are look-ahead fixations for?
Look-ahead fixations change the eye/hand latency
When reaching to a container to pick up a piece, subjects usually fixated the container to guide the movement. The eye/hand latency of this reach is the time between the initiation of the saccade from the workspace towards the target and the initiation of the accompanying reach. Although the saccade sometimes occurred after the reach had been initiated (a negative eye/hand latency), the dominant strategy was to look at the containers before initiating a reach (a positive eye/hand latency). In our analysis, hand motion was identified as a reach using the criteria described above (movement over 7 cm/s for a minimum of 300 ms, over a minimum distance of 5 cm); all trajectories satisfying these requirements that initiated in the workspace and terminated in one of the containers for a model piece were examined. Using repeated measures ANOVA, we looked at the eye/hand latency of all reaches that went to the six different locations. There was no effect of position (F1,5 = 1.9, P = 0.15). The data were therefore pooled and we looked at the effect of look-aheads on the subsequent eye/hand latency. The mean eye/hand latency for reaches not preceded by a look-ahead fixation was 230 ms, while those that were preceded by a look-ahead had a mean eye/hand latency of 353 ms. This difference was significant using repeated measures ANOVA (F1,11 = 10.32, P < 0.01).
In everyday tasks there are no start signals, and we cannot say if the increase in the eye/hand latency following a look ahead fixation was due to the eye departing earlier towards the target or if the hand was delayed. In an attempt to further explore this finding we compared the durations of reaches to see if they differed as a result of a preceding look-ahead fixation. We also looked at the durations of the individual guiding fixations and the total time spent guiding a reach and grasp (i.e. the sum of several guiding fixations on the object that might typically accompany a reach and grasp action).
The duration of a reach did not change depending on the occurrence of a preceding look-ahead fixation (Mean LAF = 0.64 s, No LAF = 0.61 s), ANOVA (F1,11 = 1.43, P = 0.26). There was also no significant change in the peak velocity of reaches to a container when there was a prior look (LAF 107 cm/s, No LAF 120 cm/s). However, the average time spent visually guiding a reach was greater (LAF 1.23 s, No LAF 0.82 s), ANOVA (F1,11 = 46.9, P < 0.001). This was not due to an overall increase in the mean duration of each guiding fixation but to an increase in the mean number of these fixations. There were, on average, 2.2 fixations used to guide a reach/grasp without a LAF, and 2.9 fixations with a LAF, ANOVA (F1,11 = 23.42, P < 0.001).
Look ahead fixations and the targeting of eye movements
We speculated that one potential benefit of looking ahead might be to facilitate the accuracy of an eye movement that guides action. Evaluating accuracy in our paradigm is difficult. The target is continuously present, and subjects always end up on the target. Consequently we looked at the eye movements that took more than one saccade to reach the container from the workspace. If a subject did not move gaze directly to the box with one accurate saccade from the workspace, then that saccade would land at some location in the vicinity of the box. Although most of these would be classified as hypometric saccades, this is not always the case in natural tasks, as some saccades land around all boundaries of a container.
We plotted the landing points for saccades that did not land directly on containers 3 and 4, and calculated the distance from the centre of the target container to these “intermediate” fixations (we excluded any that fell on the opposite half of the table, as they were the likely result of an eye movement towards the wrong container). It is not possible to know if subjects were in fact targeting the centre of the container, but it serves as a convenient reference point. The mean distance of the intermediate fixation to the centre of the containers was 33 cm in the absence of a look-ahead, and 22 cm following a look-ahead (ANOVA, F1,14 = 5.4, P < 0.05). There was no effect of container location (F = 0.38, P = 0.54). The results suggest that, following a look-ahead, the initial change in gaze landed 11 cm closer to the centre of the container. This figure is difficult to convert to degrees, as viewing distance fluctuates continually in natural tasks. As a guideline, if the viewing distance to the centre of the container were approximately 50 cm, 11 cm would be equivalent to 12.6°; at 65 cm, it would be 9.6°.
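The guideline conversion can be checked with the standard visual-angle formula θ = 2·atan(s / 2d) for a frontal extent s viewed at distance d; with s = 11 cm this closely reproduces the quoted values (12.6° at 50 cm, and about 9.7° at 65 cm, near the quoted 9.6°).

```python
import math

def cm_to_deg(extent_cm, distance_cm):
    """Full visual angle (degrees) subtended by a frontal extent
    at a given viewing distance: theta = 2 * atan(s / (2 * d))."""
    return math.degrees(2.0 * math.atan(extent_cm / (2.0 * distance_cm)))
```

For small angles this is close to the simpler approximation s/d in radians, which is why the exact choice of formula matters little at these distances.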
Additionally, when a reach was preceded by a look-ahead, 43% of the saccades that were initiated from within the workspace landed directly in the target container, while this figure dropped to 37% when there was no preceding look-ahead. This modest increase in initial saccades landing in the target box following a LAF was not significant (using a binomial test for equality of two proportions). Nonetheless, our findings suggest that when subjects look ahead to a target container before looking back to the model in hand, the subsequent saccade during the pickup of that piece lands significantly closer to the target: a somewhat greater proportion of initial saccades landed inside the container (counting as guiding fixations rather than “intermediate” fixations), while those that fell outside the container were 11 cm closer to the target centre.
Head movements and look-aheads
The most noticeable head movements in our study occurred when subjects looked left or right towards the two containers that contained pieces 3 and 4 (see Fig. 2b, d). To look at the amplitude of these head movements, we looked at the change in heading (the rotational component of the movement) when subjects looked to the left or right. This was the difference between the initial heading (direction the head was facing at onset of head movement) and the final heading (direction the head was facing at the end of a head movement). Directly ahead was 0°, ±90° was a heading to the far right and left, respectively. We also investigated the head/eye latency at the onset of the movement of gaze and the peak velocity of these movements.
There was no significant effect of looking ahead on the change in heading when reaching to containers 3 and 4 (LAF 23°; No LAF 26°). Also, there was no significant difference in the peak velocities of these head movements (LAF 54°/s; No LAF: 60°/s) and a similar result was found for the head/eye latency (LAF 153 ms, SE 80; No LAF 211 ms, SE 19). Overall, the mean was 204 ms, SE 19. While head movements can be initiated after the saccade (Abrams et al. 1990), in this study the dominant strategy was to initiate a head movement prior to executing a saccade.
While there was no effect of look-aheads on head movements accompanying a reach, head movements that accompanied a look-ahead (16°) differed in amplitude to those that accompanied a reach (25°), ANOVA (F1, 21 = 9.01, P < 0.01). This is consistent with other evidence suggesting that the head plays a special role in reaching and grasping movements (Smeets et al. 1996; Flanders et al. 1999).
The goal of this study was to explore the extent and role of anticipatory eye movements in ongoing natural behavior. We focused on the role of look-ahead fixations, which have been observed in previous studies of natural behavior, as a potential indicator of planning processes. As in previous studies, we found that look-ahead fixations occurred with a modest but significant frequency, before about 20% of the reaching and grasping movements to pick up a piece. Thus such anticipatory fixations are a common feature of natural behavior. Since subjects rarely fixated a location in the period following the reach (look-backs), it seems likely that look-aheads indeed reflect some aspect of task planning. The distribution of look-aheads in time was revealing: they clustered within the 3 s period before the initiation of the associated reach. Together with the difference in duration between look-aheads and look-backs, this indicates that some aspect of the next action is planned a few seconds ahead of time in natural behavior, as suggested by the coordination patterns in sandwich making (Hayhoe et al. 2003).
To explore whether look-ahead fixations reflected high-level planning for the task, we manipulated the task structure by separating the "joining" operations into two sets (two T's instead of a single four-bar object), while maintaining the same sequence of reaches, to see if this changed the frequency of look-ahead fixations to the last two pieces when they became part of a second set. Our manipulation was only partially successful: look-ahead fixations were significantly reduced for the fourth piece in the Make-T condition, but not for the third. Since this modest manipulation of task sequence changed the pattern of looking ahead, the finding supports the suggestion that these anticipatory looks reflect task planning. However, in this exploratory study it is not entirely clear what the plan is, and it appears to be more complex than our tentative prediction. The breakdown of component actions illustrated in Fig. 2 shows that asking subjects to make two different sets introduced two major differences between conditions. First, it changed the task of making one large four-bar object into one of making two smaller two-bar objects. Second, it resulted in the reach to piece 4 occurring earlier in the Make-T condition than in the Join-All condition (action 10 vs. action 7 in Fig. 2). However, the time of the reach to piece 3 did not really change (action 6), even though it was part of the second set in the Make-T condition. Taken together, this suggests that it was not the division of the task into two sets that influenced the number of anticipatory fixations, but rather temporal differences in the timing of the reaches.
These anticipatory "intrusions" of the next action suggest that the underlying control structure of natural tasks is not strictly sequential, but can be modeled as a set of loosely coupled micro-tasks capable of running on an opportunistic basis. A lack of strict sequentiality in the task sub-components might be a natural consequence of the different roles and time constraints of the eyes and hands: while the hands are occupied with attaching the pieces, the eye, if free, can move on to the next sub-task. Structures of this kind can be modeled by Partially Observable Markov Decision Processes (Ballard and Sprague 2005). Such models may be able to predict disorders in the sequentiality of task sub-components observed following frontal cortex damage (e.g. Schwartz et al. 1995; Forde and Humphreys 2002).
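The opportunistic-scheduling idea behind such models can be sketched, very loosely, as a greedy arbitration among micro-tasks whose state uncertainty grows while gaze is directed elsewhere. This is a toy simplification in the spirit of Sprague and Ballard's account, not the model from that paper; the task names, growth rates, and reward values below are all hypothetical:

```python
class MicroTask:
    """A loosely coupled sub-task (e.g. 'guide the hand', 'look ahead to
    the next piece'). Its state uncertainty accumulates while gaze is
    elsewhere and is reset by a fixation."""

    def __init__(self, name, growth, reward):
        self.name = name
        self.growth = growth        # uncertainty accumulated per time step
        self.reward = reward        # value of keeping this task's state known
        self.uncertainty = 0.0

    def expected_loss(self):
        # Expected cost of acting on a stale state estimate.
        return self.uncertainty * self.reward


def allocate_gaze(tasks, steps):
    """Greedy arbitration: at each time step, fixate the task whose
    uncertainty-driven expected loss is largest, resetting its uncertainty."""
    sequence = []
    for _ in range(steps):
        for t in tasks:
            t.uncertainty += t.growth
        target = max(tasks, key=lambda t: t.expected_loss())
        target.uncertainty = 0.0    # a fixation refreshes the estimate
        sequence.append(target.name)
    return sequence


tasks = [MicroTask("guide-hand", growth=1.0, reward=1.0),
         MicroTask("look-ahead", growth=0.4, reward=1.0)]
print(allocate_gaze(tasks, 8))
# → ['guide-hand', 'guide-hand', 'look-ahead', 'guide-hand',
#    'guide-hand', 'look-ahead', 'guide-hand', 'guide-hand']
```

With these toy parameters, occasional fixations on the "look-ahead" task intrude between runs of fixations devoted to guiding the hand, qualitatively mirroring the intermittent look-aheads observed in the data.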
Given that look-ahead fixations are quite common, we were interested to see if there was some more explicit manifestation of the planned action. It is possible that anticipatory fixations merely reflect the increased saliency of the next target, and occur when there is no conflicting demand on vision. However, it also seems likely that there is some behavioral advantage to early planning. In particular, locating the target with a prior saccade might facilitate the programming of the next saccade, the next reach, or both. Our results support facilitation of the next saccade. Saccades that accompanied the reaching and grasping movement were initiated about 120 ms earlier when a look-ahead had occurred, and, on trials where more than one saccade was required to locate the target, the initial saccade landed significantly closer to the target box.
We were unable to discern any facilitation of the reach, however. Terao et al. (2002) found that pointing accuracy to remembered locations is improved by prior fixations on the target. In our task, it is not really possible to measure accuracy of the movements, since the target is continuously present and reaches start from different positions within the workspace. Other measures, however, such as peak velocity and reach duration showed no effect of the prior fixation. In addition, there was no reduction in the fraction of the reaching and grasping movement that was guided by fixation on the target, as might be expected if the reach program was better defined by prior specification of the target. Rather, there was an increase in the duration of foveal guidance as a consequence of the earlier fixation on the target.
Another way that facilitation of the reach might have been manifest would be by a reduction in eye-hand latency for reaches preceded by a look-ahead fixation, for example, if the fixation allowed earlier programming of the reach. However, eye-hand latencies increased. As mentioned above, this is consistent with facilitation of the saccade, but not the reaching and grasping movement. Thus, although eye and hand movement plans are intimately related (Calton et al. 2002; Fujii et al. 2002; Jeannerod 1988; Carey 2000), the facilitation of subsequent eye movements is not accompanied by a corresponding facilitation of reaches in the present task context. It may be that the limiting factor was the length of time required to screw together the pieces, a task requiring both hands. During this period, visual monitoring is not required, and the eye is free to look elsewhere in the scene. Thus it may have been unrealistic to expect a speeding up of the hand movements. In the tapping task used by Epelboim et al. (1995), hand movements were limited only by tapping the previous target, and a speeding up of the movements was observed. Also, in the sandwich making task of Hayhoe et al. (2003), early hand movements relative to the eye were often observed when one hand was free to begin the next action while the eyes were supervising an ongoing action with the other hand.
Although facilitation of the hand movements was not observed, there is clearly some influence of the prior fixation on the following saccade. This is of interest with respect to the issue of what information is integrated across saccadic eye movements. It is generally thought that representations of the information from prior fixations are limited to a small number of objects and scene gist, and that the spatial information in such representations is ill-defined (e.g. Irwin 1991; Irwin and Andrews 1996; Irwin et al. 1990; Henderson and Hollingworth 1999; O'Regan 1992; O'Regan and Levy-Schoen 1983). However, in the present experiment, several fixations intervene between the look-ahead fixation and the subsequent re-fixation of the target. Thus the representation of information from a prior fixation facilitates a subsequent movement in a reference frame that is independent of eye position. Our study supports the growing body of evidence that information accumulated across saccades is more extensive and spatially precise than previously thought (Aivar et al. 2005; Chun and Nakayama 2000; De Graef et al. 2001; Hayhoe et al. 1992).
Other studies also show robust memory representations of multiple objects and their locations in images of scenes (Hollingworth and Henderson 2002; Melcher and Kowler 2001; Tatler et al. 2003). These representations appear to be somewhat longer-term memory representations, however. It is notable in our experiment that look-aheads did not change in frequency across blocks of trials, and seemed to be timed for just a few seconds ahead of the associated reach. It is also worth noting that the different emphasis of the earlier findings on the semantic nature and low spatial precision of inter-saccadic representations may reflect the different functional demands of the experimental paradigms involved. The current paradigm involves coordinated movements and planning, a context where an integrated representation of spatial position is clearly advantageous, whereas many of the earlier paradigms investigating integration across saccades simply required memory for the identity of objects, and their precise location was less important.
Finally, we found no evidence of a change in head movements during a reach following a prior look at the target. However, we did find that during a look-ahead fixation subjects made significantly smaller head movements than when they had to reach to the same target location. Our finding implies that during reaching and grasping, head movements can differ depending on the visual task, and that subjects typically align the head more with the direction of the reach than when simply looking. In earlier work we have also observed a correlation between head and hand trajectories (Smeets et al. 1996; Pelz et al. 2001) in a block-copying task. This suggests that the linkage between head and eye is very flexible and supports the notion of independent controllers for the eye and head downstream of the Superior Colliculus (Freedman et al. 1996). It also suggests that the head is important for control of the hand movements. Biguer et al. (1985) showed that pointing to an eccentric target was more accurate when the head was directed to the target. Flanders et al. (1999) showed that pointing errors to a remembered target in a step, stoop, and point movement were correlated with the direction of the head. They suggested that the head might act as a stable platform, or reference frame, for the hand movements in the context of whole-body movements. Alternatively, eye position in the head might be most accurate when the eye is centered in the orbit, and this might facilitate the transformation from visual to arm coordinates. The finding of a linkage between the head and hand also has implications for the suggestion that reaches are coded in eye-centered coordinates (Batista et al. 1999; Buneo et al. 2002), pointing rather to a head-centered encoding. Given that the cells in the Parietal Reach Region that code reaches have gain fields that are modulated by eye position in the head, this is not inconsistent with Batista and Andersen's findings.
Our results are consistent with the findings of Boussaoud et al. (1998), who found that eye position modulates cells that code reach direction in dorsal premotor cortex and suggested that reaches are coded in a head-centered frame, and with Flanders et al.'s (1999) suggestion that the head forms a stable platform for the hand in the context of a moving body. It is not clear, however, whether the important factor is the position of the eye in the orbit or the hand's relation to the head.
We conclude that look-ahead fixations are not incidental; they are an important aspect of the use of gaze in everyday life. In addition to its use for active online monitoring of manipulation, vision can also be proactive, gathering information ahead of time for future movements. Our findings suggest that the timing of these fixations is not accidental: the majority occur a few seconds prior to initiation of the object-related act, not after the object has been acted on. This implies that subjects are planning an action in the near future, and that look-ahead fixations reflect short-term memory processes rather than longer-term representations. While they do not occur before every reach, a look-ahead reliably predicts an upcoming action.
Our present experiment could not fully reveal the interplay between task structure and the nature of the planning indicated by look-aheads, although the findings suggest that this planning is more likely associated with the upcoming movement than with sub-tasks involving sets of movements. Look-aheads facilitate the subsequent use of gaze, indicating that information about the spatial structure of the scene was transferred across the intervening fixations and was available to, and used by, the oculomotor system. We could find no facilitation of the reaching and grasping movement as a result of these prior looks. Nevertheless, we cannot exclude the possibility that look-ahead fixations might facilitate reaches in different task contexts, or that they serve functions in addition to facilitating gaze.
Eye movements in natural tasks have been shown to reflect task demands and to operate using specific strategies. The main finding from these studies was that fixations were coupled to the progress of the task. This study extends those findings by examining look-ahead fixations, which are also coupled to the progress of the task but are predictive in that they occur prior to the visuo-motor components of the routine. We conclude that they are purposeful, that they reflect planning, and that the visual information processed during these fixations can be retained across intervening fixations, conferring spatial and temporal benefits on subsequent visuo-motor coordination. The trans-saccadic transfer of information from a look-ahead fixation has a functional role in the performance of the subject, and this strategy is present in everyday ongoing behavior.
This work was supported by NIH grants EY05729 and RR09283. Thanks to Constantin Rothkopf and Jeff Pelz for their assistance, and to five anonymous reviewers for their helpful comments. Parts of this work have been previously presented at the Vision Sciences Society (VSS) annual conference in Sarasota, Florida.