Spatial constancy of attention across eye movements is mediated by the presence of visual objects
Recent studies have shown that attentional facilitation lingers at the retinotopic coordinates of a previously attended position after an eye movement. These results are intriguing, because the retinotopic location becomes behaviorally irrelevant once the eyes have moved. Critically, in these studies participants were asked to maintain attention on a blank location of the screen. In the present study, we examined whether the continuing presence of a visual object at the cued location could affect the allocation of attention across eye movements. We used a trans-saccadic cueing paradigm in which the relevant positions could be defined or not by visual objects (simple square outlines). We find an attentional benefit at the spatiotopic location of the cue only when the object (the placeholder) has been continuously present at that location. We conclude that the presence of an object at the attended location is a critical factor for the maintenance of spatial constancy of attention across eye movements, a finding that helps to reconcile previous conflicting results.
Keywords: Attention, Saccades, Eye movements, Object-based attention, Remapping
In everyday life, we often find that we must keep track of several objects of interest: pedestrians and other cars as we approach an intersection, the ball and other players in sports, our friends among strangers in a swimming pool. While this tracking function itself is remarkable (Pylyshyn & Storm, 1988), it is even more remarkable that the tracking seems unimpeded by eye movements (Howe, Drew, Pinto, & Horowitz, 2011). We make about three eye movements each second, and each one shifts the retinal input and the retinotopic (eye-centered) coordinates of our targets of interest. Clearly, if we had to rediscover each target following every eye movement, sports like soccer or basketball would be much slower paced and activities like driving much more dangerous. How do we keep our attention locked onto each target as our eyes move? One mechanism that has been proposed to explain this ability consists in the updating (remapping) of target locations to compensate for each eye movement. This process has been documented physiologically in saccade and attention areas (Duhamel, Colby, & Goldberg, 1992; Fecteau & Munoz, 2006; Sommer & Wurtz, 2006; Wurtz, Joiner, & Berman, 2011) and has been demonstrated behaviorally with probes that reveal the location of attentional benefits before and after eye movements (Hunt & Cavanagh, 2011; Jonikaitis, Szinte, Rolfs, & Cavanagh, 2013; Rolfs, Jonikaitis, Deubel, & Cavanagh, 2011). A number of review papers have suggested that the remapping process offers a sparse form of visual constancy by predicting where targets will be in retinotopic coordinates following each eye movement (Berman & Colby, 2009; Cavanagh, Hunt, Afraz, & Rolfs, 2010; Hall & Colby, 2011; Mathôt & Theeuwes, 2011; Wurtz, 2008). 
However, other studies report that attention is updated slowly after the eye movement and that, initially, attention remains at the retinotopic location even though this location is behaviorally irrelevant following the saccade (Golomb, Chun, & Mazer, 2008; Golomb, Marino, Chun, & Mazer, 2011; Golomb, Pulido, Albrecht, Chun, & Mazer, 2010). The goal of this paper is to reconcile these two different outcomes by showing the critical role played by the presence of an object at the cued location.
In favor of predictive remapping occurring prior to the eye movement, several neurophysiological studies have shown activity at the predicted postsaccadic location of the target even before the saccade starts (Duhamel et al., 1992; Kusunoki & Goldberg, 2003; Sommer & Wurtz, 2006; Wurtz et al., 2011). More recent evidence of predictive remapping of attention has come from behavioral studies that investigated the dynamics of visual attention just before (Hunt & Cavanagh, 2011; Rolfs et al., 2011) or across (Jonikaitis et al., 2013) eye movements. These studies have shown that even before a saccade occurs, attention moves in the opposite direction, toward retinotopic locations that will align to the target’s location in space once the saccade is completed. These predictive attention shifts compensate for changes in retinal input by ensuring that attention is at the correct location after the saccade. Ultimately, remapping of attention should support the tracking of target locations across eye movements, as found by Howe and colleagues in a multiple object tracking paradigm (Howe et al., 2011).
In contrast, other studies report that attention is updated slowly and only after the eye movement. Golomb and colleagues developed a gaze-contingent paradigm in which participants performed an eye movement while keeping track of the position of a spatial cue (Golomb et al., 2008, 2011; Golomb, Pulido, et al., 2010). This paradigm was used to investigate the postsaccadic allocation of attention by presenting a probe at different time intervals after the saccade completion. The results showed a persisting attentional benefit at the retinotopic (eye-centered) coordinates of the cued location (even though task irrelevant) for 100–200 ms after an eye movement, along with growing facilitation at the spatiotopic location, reaching its maximum around 400 ms after saccade completion (see Casarotti, Lisi, Umiltà, & Zorzi, 2012, for simulations with a computational model that implements a spatial updating process). Neural signatures of this persisting retinotopic trace have been studied with EEG and fMRI for several different areas in human visual cortex (Golomb, Nguyen-Phuc, Mazer, McCarthy, & Chun, 2010; Talsma, White, Mathôt, Munoz, & Theeuwes, 2013).
Across these two groups of articles we find that attention can be shown to remap in advance of a saccade (Rolfs et al., 2011); conversely, attention lingers at the retinotopic location and is found only later at the spatiotopic location (Golomb and colleagues). Nevertheless, one factor may explain the conflict in these results. Specifically, in the studies that showed attention lingering in retinal coordinates, and only in these studies, participants were asked to maintain attention on a blank location of the screen and not on a visual object (Golomb et al., 2008, 2011). In contrast, in studies that showed a spatiotopic allocation of attention, the objects were present at the relevant locations before, during, and after the saccade. Golomb and colleagues (Golomb, Pulido, et al., 2010) have examined a case where a visual reference (a faint grid covering the whole display) was present throughout the trial and may have helped anchor a spatiotopic allocation. However, their grid encompassed both the retinotopic and spatiotopic locations and, possibly as a consequence, yielded mixed results. They did find a spatiotopic attentional benefit present at the earliest delay tested after the saccade (75 ms), but this was accompanied by a retinotopic facilitation that remained constant up to the later delay (400 ms after the saccade).
In the present study, we investigated the role of a spatially constrained object at the attended location. Objects are of primary importance in the organization of our perception (Feldman, 2003; Spelke, 1990), and it is known that visual objects can play a role in the deployment of spatial attention (Egly, Driver, & Rafal, 1994; Hollingworth, Maxcey-Richard, & Vecera, 2012). We examine whether the presence of a visual object at the attended location could support the maintenance of voluntary attention in spatiotopic coordinates across eye movements, and thereby reconcile the conflicting findings on this debated topic.
Ten volunteers participated in Experiment 1 (4 females, 6 males, including 1 author, ML; mean age, 28.7 years). All had normal or corrected-to-normal vision and gave their informed consent.
Stimuli, design, and procedure
Participants were seated in a silent and dimly lit room, with the head positioned on a chin rest at 60 cm in front of the computer screen. The experiment was run on a PC, using E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA). Eye movements were recorded with a Tobii T120 screen-based eyetracker (Tobii Technology, Sweden), which was used to present stimuli through its embedded 17-inch TFT monitor.
The experiment used a trans-saccadic cueing paradigm: first, one of four locations was cued, and then the fixation cross was displaced and participants made a saccade to refixate its new location (Fig. 1a). After the saccade, a probe was presented in one of the four locations and they made a speeded discrimination of its orientation (horizontal or vertical) on a standard keyboard using the left and right index finger (assignment of left-right key with horizontal-vertical response was counter-balanced across participants).
Each trial started with a black fixation cross appearing on a gray background, horizontally centered but displaced 4° above or below the center of the screen. As soon as the participant fixated the cross, the trial started and four black square outlines (squares were 2.5° wide) appeared, arranged at the 4 corners of a rectangle of 16° width and 8° height centered on the screen. After a delay (500 ms), the cue was presented. The cue consisted in a small black square appearing for 1,000 ms in one of the four sectors of the fixation cross, indicating the square in the corresponding quadrant of the screen (Fig. 1c). Then, 400 ms after the cue disappeared, the cross was displaced up or down (depending on its initial position) by 8°, and this jump signaled the participants to make a vertical saccade to the new fixation position. Participants were instructed to maintain fixation at the new position until response and to maintain attention focused on the cued square. The cue was spatially congruent with the probe on half of the trials (validly cued spatiotopic trials); the other half of trials was composed of an equal proportion of retinotopic trials (probe appearing at the retinotopic cued position) and control trials (Fig. 1c). We preferred to use a nonpredictive cue to avoid adding any involuntary effects to the voluntary orienting (Peterson & Gibson, 2011; Risko & Stolz, 2010). In control trials, the probe always appeared on the same side as the cue (left or right) to avoid additional costs of crossing the vertical meridian (Rizzolatti et al., 1987).
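The stimulus dimensions above are given in degrees of visual angle. As a minimal sketch (not the code used in the experiment), the on-screen extent that subtends a given angle at the 60 cm viewing distance can be obtained from size = 2 d tan(θ/2). The function name is hypothetical, and the stimulus is assumed to be centered on the line of sight.

```python
import math

def visual_angle_to_cm(angle_deg: float, distance_cm: float = 60.0) -> float:
    """On-screen size (cm) that subtends `angle_deg` at `distance_cm`.

    Assumes the stimulus is centered on the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

placeholder_cm = visual_angle_to_cm(2.5)  # side of one square outline
jump_cm = visual_angle_to_cm(8.0)         # vertical displacement of the cross
```

Converting centimeters to pixels additionally requires the physical size and resolution of the monitor.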
The probe stimulus was a Gabor patch (2.5 cycles/degree, contrast 100 %) presented at different delays after saccade completion (0 and 400 ms). In order to equate the task difficulty for each participant, the duration of probe presentation was adaptively adjusted online. The goal of the procedure was to keep the accuracy in the spatiotopic condition approximately within the range 65-85 %: if, after a spatiotopic trial, the global spatiotopic accuracy exceeded 85 % or was below 65 %, probe duration was respectively decreased or increased by one monitor refresh cycle (~16 ms).
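A minimal sketch of this kind of band rule is given below. It assumes that longer probes make the task easier, so that duration is shortened when the running spatiotopic accuracy is above the band and lengthened when it is below. The function name, the 60 Hz refresh constant, and the one-frame floor are illustrative assumptions, not details taken from the experiment code.

```python
REFRESH_MS = 1000.0 / 60.0  # one refresh cycle, assuming a 60 Hz monitor (~16 ms)

def adjust_probe_duration(duration_ms, n_correct, n_trials,
                          low=0.65, high=0.85, step_ms=REFRESH_MS):
    """Return the probe duration to use after a spatiotopic trial.

    Assumption: longer probes are easier, so the duration is shortened when
    running accuracy exceeds `high` and lengthened when it falls below `low`.
    """
    accuracy = n_correct / n_trials
    if accuracy > high:
        # Hypothetical floor: never go below one refresh cycle.
        return max(step_ms, duration_ms - step_ms)
    if accuracy < low:
        return duration_ms + step_ms
    return duration_ms  # inside the band: leave the duration unchanged
```

For example, `adjust_probe_duration(50.0, 9, 10)` shortens the probe because 90 % lies above the band, whereas 7 correct out of 10 leaves it unchanged.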
Eye movements were monitored with a sampling frequency of 120 Hz; trials in which subjects did not make the correct saccade were aborted and repeated within the same block. The saccade was considered completed as soon as a gaze position sample was detected 2.5° or closer to the second fixation point (the saccade target). The probe was immediately presented in the 0-ms delay condition or after 400 ms in the other condition. At the end of each trial, the whole sequence of sampled eye positions was automatically inspected. If in the initial fixation phase, or during the post-saccadic delay (only in the 400-ms delay condition), the gaze position deviated more than 2° from the fixation point for at least three consecutive samples (approximately 25 ms), the trial was repeated later in the same block. Note that the transport delay (between the eye event and the availability of the gaze position sample) in the Tobii T120 eyetracker can be as much as 30-35 ms (Tobii Technology AB, 2010); by adding the duration of one monitor refresh period (about 16.6 ms) and the eyetracker sampling period (approximately 8.3 ms), the effective delay between saccade completion and target presentation can increase up to 55-60 ms. Nonetheless, we will refer to the condition with shorter delay of probe presentation as “0 ms” for simplicity, because it is the condition without any extra delay added between the detection of gaze in the final position and target presentation. Previous studies (Golomb et al., 2008, 2011) found the strongest attentional benefits at the retinotopic location 75 ms after saccade completion. Thus, despite the unwanted but unavoidable delays, our “0 ms” condition represents an appropriate comparison to test the attentional updating against the longer delay (400 ms). In the second experiment, we use a different experimental setup and have a more precise control of time delays.
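The gaze criteria and the lag budget described above can be sketched as follows. This is an illustration under stated assumptions rather than the experiment code: gaze samples are taken to be (x, y) pairs already expressed in degrees, the function names are hypothetical, and the refresh period is computed for a 60 Hz display.

```python
import math

def saccade_completed(sample_xy, target_xy, radius_deg=2.5):
    """True once a gaze sample lies within `radius_deg` of the saccade target."""
    dx = sample_xy[0] - target_xy[0]
    dy = sample_xy[1] - target_xy[1]
    return math.hypot(dx, dy) <= radius_deg

def fixation_broken(samples, fix_xy, max_dev_deg=2.0, min_run=3):
    """True if gaze stays farther than `max_dev_deg` from `fix_xy` for at
    least `min_run` consecutive samples (3 samples is about 25 ms at 120 Hz)."""
    run = 0
    for x, y in samples:
        if math.hypot(x - fix_xy[0], y - fix_xy[1]) > max_dev_deg:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # the deviation was not sustained; start counting again
    return False

# Worst-case lag for the "0 ms" condition, from the figures quoted in the text.
transport_ms = (30.0, 35.0)   # eyetracker transport delay range
refresh_ms = 1000.0 / 60.0    # one refresh period (about 16.6 ms)
sample_ms = 1000.0 / 120.0    # one sampling period (about 8.3 ms)
worst_case_ms = tuple(t + refresh_ms + sample_ms for t in transport_ms)
```

Summing the three terms reproduces the 55-60 ms range given in the text.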
Each participant completed 384 trials: 192 trials for the spatiotopic condition, and 96 for each of the other conditions, in 2 experimental sessions on different days. Each session was divided into 4 blocks. Before each session, participants completed 40 pretest trials, consisting of only spatiotopic trials, in which the duration of the probe presentation was adapted according to a weighted up-down staircase procedure (Kaernbach, 1991) with targeted performance of 75 % correct responses. This quickly adjusted probe duration to individual sensitivity in order to move closer to the desired level of performance before the beginning of experimental trials. Pretest trials were only spatiotopic to ensure that participants correctly interpreted the central symbolic cue, and became more familiar with it, before the beginning of the experimental trials.
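In a weighted up-down procedure, the step taken after an error is larger than the step taken after a correct response by the factor p/(1 - p), so that the track settles where the probability of a correct response equals p; for p = 0.75 the ratio is 3:1. The sketch below illustrates this rule applied to probe duration. The starting value, step size, floor, and simulated observer are hypothetical and are not taken from the experiment.

```python
import random

def weighted_up_down(respond, start_ms=100.0, step_down_ms=8.0,
                     target=0.75, n_trials=40, floor_ms=8.0):
    """Run a weighted up-down track on probe duration.

    After a correct response the duration drops by `step_down_ms` (harder);
    after an error it rises by step_down_ms * target / (1 - target) (easier).
    `respond(duration_ms)` must return True for a correct response.
    Returns the list of durations presented on each trial.
    """
    step_up_ms = step_down_ms * target / (1.0 - target)
    duration = start_ms
    history = []
    for _ in range(n_trials):
        history.append(duration)
        if respond(duration):
            duration = max(floor_ms, duration - step_down_ms)
        else:
            duration += step_up_ms
    return history

# Toy observer (purely illustrative): accuracy rises linearly from chance
# (0.5) at 0 ms to 1.0 at 160 ms.
_rng = random.Random(1)

def toy_observer(duration_ms):
    p_correct = min(1.0, 0.5 + duration_ms / 320.0)
    return _rng.random() < p_correct

track = weighted_up_down(toy_observer)  # 40 pretest trials, as in the text
```

With these settings an error raises the duration by 24 ms and a correct response lowers it by 8 ms, which balances at 75 % correct.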
Trials in which the response time was shorter than 100 ms or longer than 2000 ms were excluded from subsequent analysis (0.2 % of total trials). The delay between the fixation cross changing position and the gaze position crossing a circular boundary at 2° from the initial fixation position was taken as a measure of the latency of the eye movement: trials with latency shorter than 100 ms or longer than 500 ms were excluded (5 % of total trials); the mean latency in the remaining trials was 206 ms (with a standard deviation across participants of 21 ms).
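The two exclusion windows can be expressed as a simple filter. The sketch below is illustrative only: the `Trial` record and the example values are hypothetical, and values exactly at a boundary are retained because the text excludes only trials strictly shorter or longer than the limits.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    rt_ms: float       # manual response time
    latency_ms: float  # saccade latency (cross jump to 2 deg boundary crossing)

def keep_trial(trial, rt_range=(100.0, 2000.0), latency_range=(100.0, 500.0)):
    """True if response time and saccade latency both fall inside their windows."""
    rt_ok = rt_range[0] <= trial.rt_ms <= rt_range[1]
    latency_ok = latency_range[0] <= trial.latency_ms <= latency_range[1]
    return rt_ok and latency_ok

# Hypothetical trials: the second fails on response time, the third on latency.
trials = [Trial(650, 210), Trial(80, 190), Trial(700, 520), Trial(900, 180)]
kept = [t for t in trials if keep_trial(t)]
```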
Probe duration was adjusted online, according to accuracy in the spatiotopic condition; the resulting mean probe duration was 79 ms (between-subject standard deviation 29 ms; min 34 ms, max 125 ms). To rule out any potential mismatch in probe duration across conditions, we performed a repeated-measures analysis of variance (ANOVA) on probe duration with probe position (spatiotopic, retinotopic, control) as within-subjects factor. No significant differences emerged [F(2, 18) = 0.5, p = 0.62], suggesting that our probe adjustment effectively matched accuracy across participants without significantly affecting the average duration between the experimental conditions.
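For readers unfamiliar with the test, the F ratio of a one-way repeated-measures ANOVA can be computed by removing between-subject variability from the error term. The sketch below is a generic textbook computation with no sphericity correction, not the analysis script used here, and the example data are toy values. With 10 participants and 3 conditions it yields the degrees of freedom (2, 18) reported above.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    `data[s][c]` is the score of subject s in condition c.
    Returns (F, df_condition, df_error).
    """
    n = len(data)     # number of subjects
    k = len(data[0])  # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition-by-subject residual

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

# Toy data: 3 subjects, 2 conditions (values chosen only for illustration).
f_value, df_cond, df_error = rm_anova_oneway([[1, 2], [2, 4], [3, 3]])
```

With only two conditions, the F value equals the square of the corresponding paired t statistic, which provides a quick consistency check.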
With a persisting object at the cued location, Experiment 1 yielded a stable attentional facilitation at the cued spatiotopic location immediately after an eye movement. We also found a significant difference between retinotopic and control trials, which could be attributed to the difference in eccentricity between the two conditions (retinotopic trials always appeared in the position closer to the fovea), but it might reflect a persisting retinotopic attentional focus. Accuracy in spatiotopic trials was higher than in retinotopic trials; however, when comparing retinotopic and spatiotopic trials matched for eccentricity, at the two delays, we find that the advantage of spatiotopic over retinotopic trials was significant only at the later delay, a result that is consistent with a growing spatiotopic facilitation after the saccade and/or a decaying retinotopic trace (Golomb et al., 2008).
One difference with respect to the task used by Golomb and colleagues (2008) is that in our design the saccade was always predictable. However, it is unlikely that the predictability of the saccade contributed to faster remapping, as other studies have found that attention is predictively remapped in advance of a saccade using unpredictable saccades (Jonikaitis et al., 2013; Rolfs et al., 2011).
If the continuous presence of a visual object at the attended location (i.e., the placeholder) is required to maintain attention at the spatiotopic location after a saccade, removing the object after the cue and before the eye movement (as was the case in the paradigm of Golomb et al., 2008) should impair the ability to sustain attention at the spatiotopic location, even in our simpler paradigm with only four possible probe locations. We tested this hypothesis in the second experiment, by modifying the paradigm used in experiment 1 so that the placeholders could either disappear after cue presentation or remain visible throughout the trial. These two conditions were randomly interleaved within each block. We also used two types of cues, a central, symbolic cue identical to the one used in experiment 1, and a peripheral cue, similar to the one used in previous studies (Golomb et al., 2008), to control for possible confounding effects related to the type of attentional cue.
Twelve volunteers participated in experiment 2 (5 females and 7 males, including 1 author, ML; mean age, 29.2). All had normal or corrected-to-normal vision and gave their informed consent. We added two participants with respect to experiment 1 to test higher order interactions (the interactions with placeholder presence and cue type) with possibly smaller effect size.
Stimuli, design, and procedure
Participants were seated in a silent and dimly lit room, with the head positioned on a chin rest at 60 cm in front of the computer screen. The experiment was run on an Apple MacPro Dual Intel-Core Xeon computer and stimuli were displayed on a 22-inch Formac ProNitron 22800 screen with a spatial resolution of 1440 by 1050 pixels and a vertical refresh rate of 120 Hz. The experimental software controlling stimulus display and response collection was implemented in Matlab (MathWorks, Natick, MA) using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Eye movements were recorded with an Eyelink 1000 Desktop Mount (SR Research, Osgoode, Ontario, Canada) with a sampling rate of 1 kHz and a minimal transport delay: <1.8 ms (SR Research, 2013).
The procedure was based on the one adopted in experiment 1 (Fig. 1b). In half of the trials we presented a central cue, identical to the one used in experiment 1; in the other half of the trials we presented a peripheral cue consisting in the outline of the cued placeholder increasing its thickness up to 3 times the original value (the internal area of the square remained constant during the increase; Fig. 1c); the two types of cue were randomly interleaved within blocks. Peripheral cues are faster in orienting attention, so we used a shorter duration (300 ms), similar to the duration of peripheral cues used in previous studies (Golomb et al., 2008).
After cue presentation, the four squares delimiting the relevant positions could either disappear or remain on the screen. Participants were instructed to ignore the disappearance of the placeholders whenever that occurred and to maintain attention focused on the cued spatial location whether or not the placeholders were still present. After 500 ms, the cross was displaced up or down, depending on its initial position, by 10°, and this jump signaled the participants to make a saccade to the new fixation position. Golomb and colleagues found an enhanced pattern of early retinotopic and late spatiotopic facilitation with larger eccentricities (at 11° vs. 5° eccentricity, Golomb et al., 2008, Supplemental material). We therefore increased the size of saccade and display (and thus also probe eccentricity) to maximize the possibility of stronger retinotopic effects at larger eccentricities; the 4 positions were now at the corners of a rectangle of 20° width and 10° height. The probe stimulus was again a Gabor patch (contrast 100 %, spatial frequency 2 cycles/degree) presented at different delays after saccade completion (0 and 400 ms). The duration of probe presentation was adjusted online by a standard staircase procedure with criterion performance of 75 % correct responses in spatiotopic trials and step of one monitor refresh cycle (~8 ms). One staircase was used for both spatiotopic trials with probe at higher and lower eccentricity. Probe duration was allowed to vary between 16 and 250 ms. We used a different procedure with respect to the previous experiment because of the higher monitor vertical refresh rate (120 Hz), which allowed for a finer modulation of probe duration.
Eye movements were recorded at 1000 Hz and also monitored online: trials in which participants did not make the correct saccade or in which gaze deviated more than 2° from the correct fixation point were aborted and repeated within the same block. The screen changes were synced to the eye movements in the same way as in the first experiment, but additionally we were able to perform a more detailed offline analysis (see Results and Supplemental material).
Each participant made 512 trials, 256 trials for the spatiotopic condition, and 128 for each of the other conditions, in 2 experimental sessions on different days; each session was divided into 4 blocks. Trials with different cueing conditions (spatiotopic, retinotopic, and control), type of cue (central or peripheral), and presence/absence of the placeholders were randomly interleaved within blocks. As in the first experiment, before each session, participants completed 40 pretest trials, consisting of only spatiotopic trials, in which the duration of the probe presentation was adapted according to a weighted up-down staircase procedure (Kaernbach, 1991) with criterion performance of 75 % correct responses.
We detected the saccades with an algorithm based on two-dimensional eye velocity (Engbert & Mergenthaler, 2006) and defined a response saccade as the first saccade that left a circular fixation region and landed inside a target-centered circular region (radii of 2°). We rejected trials with blinks or saccades >1° before the response saccade or after the saccade and before probe presentation in trials with the longer delay (400 ms). We also excluded trials in which the saccadic latency was shorter than 100 ms or longer than 500 ms; the mean latency in the remaining trials was 181 ms (with a standard deviation across participants of 36 ms). The precise time of probe onset was marked in the eye movement recordings and was compared to the saccade landing time (detected offline). In particular, in the 0-ms delay condition, we excluded trials in which the onset of the probe was delayed 20 ms or more with respect to the saccade landing time (supplementary Figure 1A). In all, 85 % of trials were included in subsequent analysis.
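Velocity-based detection of this kind can be sketched as follows. The code is a simplified illustration in the spirit of the cited algorithm, not the implementation used for the analysis: velocity is estimated over a five-sample window, a median-based spread estimate is computed separately for each axis, and samples are labeled saccadic when they fall outside an elliptic threshold. The threshold multiplier (`lam`), the minimum duration, and the synthetic trace are assumptions chosen for the example.

```python
import math
from statistics import median

def detect_saccades(x, y, dt_ms=1.0, lam=5.0, min_samples=6):
    """Return (start, end) sample indices of candidate saccades.

    x, y: gaze position in degrees, sampled every `dt_ms` milliseconds.
    """
    n = len(x)

    def velocity(p):
        # Five-sample moving estimate of velocity (deg/ms); edges stay at 0.
        v = [0.0] * n
        for i in range(2, n - 2):
            v[i] = (p[i + 2] + p[i + 1] - p[i - 1] - p[i - 2]) / (6.0 * dt_ms)
        return v

    def spread(v):
        # Median-based estimate of the velocity standard deviation.
        var = median(a * a for a in v) - median(v) ** 2
        return math.sqrt(max(var, 1e-12))  # guard against a zero threshold

    vx, vy = velocity(x), velocity(y)
    tx, ty = lam * spread(vx), lam * spread(vy)

    saccades, start = [], None
    for i in range(n):
        fast = (vx[i] / tx) ** 2 + (vy[i] / ty) ** 2 > 1.0
        if fast and start is None:
            start = i
        elif not fast and start is not None:
            if i - start >= min_samples:
                saccades.append((start, i - 1))
            start = None
    if start is not None and n - start >= min_samples:
        saccades.append((start, n - 1))
    return saccades

# Synthetic, noise-free trace at 1 kHz: fixation, a 10 deg vertical step
# spread over 40 ms, then fixation again.
y = [0.0] * 100 + [10.0 * (j + 1) / 40 for j in range(40)] + [10.0] * 100
x = [0.0] * len(y)
saccades = detect_saccades(x, y)
```

A response saccade would then be the first detected event that starts inside the fixation region and ends inside the target region (both with 2° radius).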
Trials in which the response time was shorter than 100 ms or longer than 2000 ms were excluded from subsequent analysis (1.8 % of total trials). The resulting mean probe duration was 44 ms (between-subject standard deviation 14 ms; min 24 ms, max 81 ms). A repeated-measures ANOVA on mean probe duration revealed a significant, although small, difference across cueing conditions [F(2, 22) = 5.28, p = 0.009]. In particular, the mean probe duration was slightly longer for the control condition with respect to spatiotopic (mean difference 1.68 ms) and retinotopic (mean difference 2.13 ms). This could have improved performance in control trials, but importantly there were no differences between mean probe durations in spatiotopic and retinotopic conditions [t(11) = 1.05, p = 0.31]. See Supplemental material for additional analysis that takes into account trial-by-trial variations in probe duration.
We conducted a repeated-measures ANOVA with mean accuracy as dependent variable and probe position (spatiotopic, retinotopic, control), delay (0, 400 ms), cue type (central, peripheral), and placeholders (present, absent) as within-subject factors. This analysis revealed significant main effects of probe position [F(2, 22) = 4.40, p = 0.02], delay [F(1, 11) = 110.60, p < 0.0001], and placeholders [F(1, 11) = 14.62, p = 0.001]. We found a significant interaction between delay and placeholders [F(1, 11) = 4.87, p = 0.049] and between probe position and placeholders [F(2, 22) = 6.88, p = 0.003] (Fig. 2b). We also found a significant three-way interaction between probe position, presence of placeholders, and delay [F(2, 22) = 8.24, p = 0.002] (Fig. 2c and d). The cue type did not yield a significant main effect [F(1, 11) = 2.77, p = 0.12] and did not interact significantly with any other factor [condition*delay*boxes*cue type: F(2, 22) = 0.07, p = 0.92; condition*boxes*cue type: F(2, 22) = 1.54, p = 0.23; condition*cue type: F(2, 22) = 1.95, p = 0.16; delay*cue type: F(1, 11) = 1.78, p = 0.21; boxes*cue type: F(1, 11) = 1.95, p = 0.29].
Follow-up comparisons (all t tests paired and two-tailed) revealed that, with the placeholder present, response accuracy was higher in spatiotopic trials than in both control [t(11) = 3.07, p = 0.01] and retinotopic trials [t(11) = 4.15, p = 0.002], whereas in contrast to experiment 1, there were no differences between retinotopic and control trials [t(11) = 0.45, p = 0.66]. On the other hand, in the placeholder-absent condition, there were no significant differences between spatiotopic and control trials [t(11) = 0.55, p = 0.59], spatiotopic and retinotopic trials [t(11) = 0.41, p = 0.69], or retinotopic and control trials [t(11) = 0.59, p = 0.56]. This pattern of results suggests that the presence of placeholders increased accuracy specifically in spatiotopic trials (Fig. 2b). This was further confirmed by a significant comparison between spatiotopic trials in the condition with versus without placeholders [t(11) = 5.71, p < 0.0001]; the same comparison did not reach significance for retinotopic [t(11) = 1.27, p = 0.23] or control [t(11) = 1.35, p = 0.20] trials.
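Each follow-up comparison is a paired t test on per-participant mean accuracies, with n - 1 degrees of freedom (11 for the 12 participants here). A minimal sketch of the statistic is shown below; the function name and the toy values are illustrative, and the two-tailed p value (not computed here) would be obtained from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Toy values for three hypothetical participants in two conditions.
t_value, df = paired_t([2, 4, 3], [1, 2, 3])
```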
We also found a significant three-way interaction between probe position, presence of placeholders, and delay. As a follow-up, we performed comparisons (paired and two-tailed) between the condition with placeholders present versus the condition with placeholders absent for each probe position (spatiotopic, retinotopic, control) and delay (0, 400 ms). Spatiotopic trials resulted in a greater accuracy in the placeholder present condition at both delays [delay 0: t(11) = 3.97, p = 0.002; delay 400: t(11) = 6.57, p < 0.0001]; no difference was found in control trials [delay 0: t(11) = 0.65, p = 0.52; delay 400: t(11) = 1.30, p = 0.21]; retinotopic trials had a greater accuracy in the placeholder present condition but only at the later delay [delay 0: t(11) = 1.67, p = 0.12; delay 400: t(11) = 3.71, p = 0.003]. The three-way interaction thus arises from the different effect of placeholder absence on retinotopic trials at the two delays. Comparison with the control conditions revealed a similar pattern: in the condition without placeholders, accuracy in retinotopic trials was significantly lower than control trials at the later delay [t(11) = 2.44, p = 0.03] but no significant difference emerged at the earlier delay [t(11) = 1.23, p = 0.24].
We performed additional planned comparisons between retinotopic and spatiotopic trials with the same probe eccentricity, separate for each condition. Response accuracy was higher at the spatiotopic location at both delays in the placeholder present condition [delay 0, t(11) = 5.33, p = 0.0002; delay 400, t(11) = 2.58, p = 0.02] (Fig. 2d) but only at the later delay in the placeholder absent condition [t(11) = 2.22, p = 0.048] (Fig. 2c). Accuracy at the retinotopic and spatiotopic location did not differ at earlier delay in the placeholder absent condition [t(11) = 0.61, p = 0.55].
The results of this second experiment are in agreement with our hypothesis. Analysis of response accuracy revealed that the continuous presence of placeholders enhanced performance at the spatiotopic location. The facilitation was specific to the spatiotopic location and thus not the result of a general improvement due to reduced spatial uncertainty (the presence of the placeholders had no effect on performance at the control location). Contrary to previous studies (Golomb et al., 2008), we did not find any spatiotopic facilitation in the condition with no placeholders, suggesting that participants failed to maintain attention at that location. Additionally, the three-way interaction revealed that the presence or absence of placeholders had a different effect at the retinotopic location at the two delays: when the placeholders were removed before the saccade, accuracy in retinotopic trials dropped with respect to the condition with placeholders present, but only at the longer delay. At the shorter delay with placeholders absent, accuracy was greater at the cued retinotopic location (Fig. 2c), a pattern that resembles the retinotopic trace results from Golomb and colleagues (2008), but the difference with spatiotopic trials did not reach statistical significance in our data.
In contrast to the results of the first experiment, there was no retinotopic benefit (accuracy was similar in retinotopic and control trials, both lower than spatiotopic trials) in the condition with placeholders present. It is known that attention spreads in a gradient fashion around a cue (Downing & Pinker, 1985); additionally the gradient seems to be asymmetrical, with larger costs for probes more peripheral than the cue (Shulman, Sheehy, & Wilson, 1986). Therefore, one possibility is that the difference between retinotopic and control trials in the first experiment might have been caused by the different position of the probe relative to the cue (probes are more peripheral than the cue in control trials, and vice versa in retinotopic trials). In the second experiment, this effect might have been reduced because of the larger (10° vs. 8°) distance between probe locations.
Another difference between the first and second experiment warrants some discussion. Specifically, in the second experiment, but not the first, there was a significant main effect of delay, indicating that participants responded more accurately with 400 ms as opposed to 0 ms delay before stimulus presentation. This discrepancy is likely due to the timing of the probe in the “0 ms” delay conditions in the two experiments: in the first experiment the presentation of the probe in the “0 ms” condition might have been delayed up to 55-60 ms due to timing lags in the eye movement monitoring and screen refresh (a delay that is still comparable with the persistence of retinotopic benefits found in previous studies; Golomb et al., 2008). In contrast, the different experimental setup used in the second experiment allowed for a finer temporal control of eye and trial events, and we were able to present stimuli almost immediately after the saccade (delay less than 20 ms confirmed by offline analyses; Supplementary Figure 1), thus making the early delay condition more challenging for participants. In any case, these less accurate responses for the short delay probe do not affect our conclusions concerning spatiotopic and retinotopic allocation of attention.
Our findings reveal that the presence of a visual object at the attended location is a critical factor for the maintenance of the spatial constancy of attention—the ability to sustain attention in spatiotopic coordinates across eye movements. When visual attention is directed to an object, the placeholder, it is remapped to the correct spatial location with each eye movement. In contrast, when attention is directed to an empty location, participants fail to maintain it at that location across eye movements, indicating a lack of appropriate remapping.
Additionally, when the placeholders are removed before the saccade, we find that performance at the retinotopic location has a different time course compared to when the placeholders remain present: response accuracy seems slightly enhanced at the shorter delay and then drops at the later one. It is interesting to note that this pattern is present uniquely in the retinotopic trials and mirrors the process of extinguishing the previous retinotopic representation that has been described in previous studies (Golomb et al., 2008, 2011; Golomb, Nguyen-Phuc, et al., 2010; Golomb, Pulido, et al., 2010). Could this later drop in response accuracy be related to inhibition of return (IOR)? Several arguments run counter to this interpretation. First, IOR typically follows exogenously but not endogenously generated shifts of attention (Klein, 2000; Lupianez, Klein, & Bartolomeo, 2006; Rafal, Calabresi, Brennan, & Sciolto, 1989). Second, in the typical time course of IOR, facilitation turns into inhibition 200-300 ms after the cue (Klein, 2000), whereas in our results accuracy drops only 400 ms after the saccade (around 850 ms after the cue). Third, IOR seems automatically encoded in spatiotopic coordinates, and it is typically found at both the retinotopic and spatiotopic coordinates of the cue immediately after the saccade, both with (Krüger & Hunt, 2013; Satel, Wang, Hilchey, & Klein, 2012) and without (Hilchey, Klein, Satel, & Wang, 2012; Pertzov, Zohary, & Avidan, 2010) placeholders, whereas in our data the inhibition is only found at the retinotopic location and at the later delay. Only one study (Mathôt & Theeuwes, 2010) reported a retinotopic-only IOR, but it was limited to the short delay after the saccade, and it turned into spatiotopic IOR at the longer delay.
An alternative view of our experiments is that the paradigm actually investigates memory rather than attention, because it involves the active maintenance of a location of interest across a saccade. There is an interesting overlap between attention and memory (Awh & Jonides, 2001), and several previous studies have used a spatial memory task to manipulate attention across saccades (Golomb et al., 2008; Golomb & Kanwisher, 2012; Golomb, Pulido, et al., 2010). However, because we did not ask participants to identify where the cue was, but only to identify a subsequent probe that may or may not appear at the same location as the cue, we have used a standard definition in the field and presented the cued performance as a measure of attention rather than memory. In particular, the cued location was continuously attended, rather than stored to be retrieved later. Unlike items stored in memory, continuously attended items do not, for example, show the decay functions of memory. Attended items show no decay at all until the focus of attention is distracted.
Overall, the pattern of results in the present study is consistent with the idea that remapping of attention across eye movements comprises two complementary but distinct processes: a rapid updating of the focus of attention to the new location, and a slower process of suppressing the attentional focus at the previous retinotopic location (Golomb, L’Heureux, & Kanwisher, 2014; Golomb, Pulido, et al., 2010). In our paradigm, the absence of placeholders seems to have prevented the updating process, but might not have blocked the suppressing process, resulting in a modulation over time of response accuracy at the retinotopic location. The idea that the two processes (spatiotopic updating and retinotopic suppression) might be relatively independent is consistent with another study by Golomb and colleagues (Golomb, Nguyen-Phuc, et al., 2010) in which they used EEG and fMRI to investigate the neural correlates of the retinotopic attentional trace: in that study participants responded only to probes presented at the central location, while ignoring the other locations. Both blood oxygen level-dependent signals and event-related potentials showed the strongest response enhancement for probes presented at the spatiotopic location, even at the shorter delay after the saccade. However, even though the spatiotopic location was facilitated at the shorter delay, they also found a robust enhancement for irrelevant probes presented at the retinotopic location, suggesting that the new spatiotopic location might be facilitated independently of when the attentional focus at the previous retinotopic location was extinguished.
Why would the rapid updating process fail to operate in the case of attention directed toward a blank location, as in the study of Golomb et al. (2008) and in the blank-field conditions of our experiments? It is important to remark that participants were explicitly told to focus on the cued spatiotopic location, irrespective of whether it was delimited or not by a placeholder. Nevertheless, participants failed to sustain attention at the spatiotopic location when a placeholder did not mark it. This suggests that activity in areas responsible for spatial updating across saccades is modulated by object-based properties in the image. It remains a challenge for future studies to determine how this modulation takes place: one hypothesis could be that the visual representation in these maps depends mostly on grouping cells in earlier visual cortices that perform some preattentive figure-ground segmentation (Qiu, Sugihara, & von der Heydt, 2007). These early grouping cells could bind basic visual features into larger compounds and provide the structure for top-down attentional selection (Mihalas, Dong, von der Heydt, & Niebur, 2011). An interesting question for future research is whether this attentional modulation becomes more effective in the periphery of the visual field.
One difference between our results and previous findings (Golomb et al., 2008, 2011; Golomb, Pulido, et al., 2010) is that we did not find evidence for a growing spatiotopic benefit in the condition with placeholders absent. However, in the original paradigm of Golomb et al., attention was manipulated through a spatial memory task, and this might have forced participants to voluntarily recover the original location of the cue after the saccade (perhaps on the basis of other spatial landmarks, like the monitor’s frame), leading to the late increase of the spatiotopic benefit. In our task, the memory of the cue location was not tested, and participants might have let attention spread over the entire cued hemifield (Hughes & Zimba, 1987) instead of recovering the specific cued location.
An alternative explanation of our findings could be that a visual object simply acted as a landmark and facilitated the localization of the cued location after the eye movement (Deubel, Schneider, & Bridgeman, 2002; Deubel, 2004). However, without an anticipatory attention shift (Jonikaitis et al., 2013; Rolfs et al., 2011) localization of the cued position, even if facilitated by landmarks, would need to be followed by an attention shift to that position. The typical duration of an attention shift has been estimated to be 150-200 ms (Khayat, Spekreijse, & Roelfsema, 2006; Montagnini & Castet, 2007), a duration much longer than the average probe duration in our second experiment (44 ms). It is unlikely that an attention shift occurred to the cued location after the saccade, because this would not have yielded the early spatiotopic benefit found here and in other studies (Golomb, Pulido, et al., 2010; Jonikaitis et al., 2013).
The present findings suggest that spatiotopic allocation of attention across eye movements relies on a mechanism evolved to keep track of the location of distinct objects across eye shifts (Cavanagh et al., 2010). Much evidence suggests that this mechanism consists in the rapid and anticipatory updating of a retinotopic map (Duhamel et al., 1992; Hunt & Cavanagh, 2011; Rolfs et al., 2011), and we argue that it operates on a structured representation of the visual input, in which the input has already been parsed into discrete objects. As a final remark, although it is widely accepted that attention can directly select visual objects, and not just spatial locations and visual features (Scholl, 2001), it is still unclear what qualifies as an “object” for attention (Marino & Scholl, 2005; Scholl, Pylyshyn, & Feldman, 2001). Our results suggest that studying the degree to which different compounds of visual features support sustaining attention at a spatiotopic location across eye movements might provide an alternative way to investigate “attentional objecthood” (Scholl et al., 2001).
This study was supported by grant no. 210922 from the European Research Council and by a grant from the University of Padova (Strategic Grant “NEURAT”) to MZ.