Visual percepts are radically different from the pattern of light that activates photoreceptors in the retina. For example, we perceive a 3D world of objects that are in motion relative to ourselves and to each other. However, the image on the retina is 2D, with no explicit representation of motion, objects, materials, or depth. Motion, material, shape, scene layout, and causation must be constructed on the basis of computations over the sequence of 2D images in the two eyes guided by assumptions about the image-to-world mapping.

Even seemingly low-level properties, such as position, are not simply detected; an object’s perceived position can appear radically shifted from its retinotopic position (Cavanagh & Anstis, 2013; De Valois & De Valois, 1991; Ramachandran & Anstis, 1990; Tse & Hsieh, 2006). In particular, when the internal grating of a peripherally moving Gabor patch drifts in a direction orthogonal to its envelope’s path, an observer perceives the Gabor’s trajectory to be tilted in the direction of the internal drift. Such a doubly drifting Gabor might appear to be moving in a direction 45 degrees offset from its actual path. This illusion has been known variously as the infinite regress illusion (Tse & Hsieh, 2006), the curveball illusion (Gurnsey & Biard, 2012; Shapiro, Lu, Huang, Knight, & Ennis, 2010; Kwon, Tadin, & Knill, 2015; Ueda, Abekawa, & Gomi, 2018), and the double-drift illusion (Cavanagh & Tse, 2019; Lisi & Cavanagh, 2015; Liu, Tse, & Cavanagh, 2018; Liu, Yu, Tse, & Cavanagh, 2019). Here, we will use the last of these names.

The double-drift illusion allows a dissociation of perceived and physical locations. By “perceived position” we mean where an object appears to be in the world as consciously experienced by the observer. By “physical position” or “veridical position” we mean the position of the object in the world or the stimulus on the screen. The visual system generally represents object positions correctly (i.e., we see objects at their physical locations in the world), but illusions like the double-drift illusion can induce differences between physical and perceived positions.

In this study, we used the double-drift illusion to induce large differences between perceived and physical positions in order to investigate whether attentional tracking operates over perceptual or physical position representations. Recent results have ruled out early visual cortex as the origin of the double-drift illusion (Cavanagh & Tse, 2019; Liu et al., 2018) and even suggest that it might arise outside of visual cortex entirely, perhaps in frontoparietal regions (Liu et al., 2019).

A similar paradigm, albeit with static Gabor envelopes, was used by Maus and colleagues (Maus, Fischer, & Whitney, 2011) as well as Dakin and colleagues (Dakin, Greenwood, Carlson, & Bex, 2011) to investigate the effect of perceived position shifts on crowding. When reporting the orientation of a Gabor patch that is flanked by other Gabor patches, crowding will reduce performance when the distance to the flankers is less than about one half of the eccentricity of the target. In these experiments, distances were fixed in physical coordinates, but the perceived distance was altered by drifting the internal textures of the Gabors toward or away from the target (De Valois & De Valois, 1991). When the perceived spacing was decreased due to the internal drift, crowding increased, suggesting that the region of crowding is defined over perceived positions (Dakin et al., 2011; Maus et al., 2011).

We build on these results to examine the spatial representation accessed by attention in a tracking task, using the double-drift illusion to introduce a dissociation between perceived and physical target-distractor spacing (see Fig. 1). Performance in multiple object tracking tasks depends on, among other factors, the distance between target and distractors (for a review, see Cavanagh & Alvarez, 2005). If the tracking of target locations during attentional tracking operates over representations in physical coordinates, no difference between the conditions should be observed. However, if attentional tracking operates on perceived object positions, tracking should be easier in the condition with increased perceived distance than in the condition with decreased perceived distance.

Fig. 1

Stimulus schematic. The Gabors rotated around a stationary point in the periphery. Internal drift made them appear closer to each other in the inward drift case (a and c) and farther from each other in the outward drift case (b and d). White arrows show internal drift, and black arrows indicate the Gabors’ envelope motion. After the staircase procedure to determine a baseline spacing for each participant, the physical distance was the same on all trials, while the perceptual distance varied according to the drift condition. Panels a and b show the physical stimuli with internal drift and equal physical distances; panels c and d show the percept caused by these stimuli, where the internal drift has led to an offset of the Gabors’ positions.

When a target and a distractor come too close to each other, it is not possible to individuate them, and the observer is more likely to lose track of the target (Cavanagh & Alvarez, 2005). This zone of interaction or pooling has been linked to the general phenomenon of crowding (He, Cavanagh, & Intriligator, 1996; Pelli, 2008). If two objects fall within the radius of the pooling region, their features are mixed and cannot be further individuated.

The present study addresses the question of whether the distance that limits target selection in attentional tracking is based on physical distance or perceptual distance. We find that perceptual, and not physical, target-distractor spacing underlies tracking performance, demonstrating that attentional tracking operates in perceptual coordinates. This could occur either at a stage prior to visual consciousness, if illusory shifts have already emerged at an earlier level, or at the stage of visual consciousness itself. In either case, attentional tracking operates on object locations after they have been converted to perceptual positions.

Method

Participants

We recruited 15 participants (10 women, five men, age range: 18–30 years, mean age: 22.4 years ± 4 years) from the Dartmouth community and reimbursed them with $10. Their vision was normal or corrected to normal. Participants volunteered and gave informed consent. The experimental protocol was approved by the Institutional Review Board at Dartmouth College.

Apparatus

Participants sat alone in a dark testing room, facing an LCD screen (15-in. wide, 1,280 × 1,024, 60 Hz). A chin rest was used in order to hold the distance to the screen constant at 57 cm. Stimuli were created in MATLAB (The MathWorks, Natick, MA, USA) using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997).

Eye movements of the right eye were monitored with a head-mounted eye tracker (EyeLink 2, SR Research, Oakville, ON, Canada; 500 Hz sampling rate).

Stimuli

The display consisted of a white fixation point in the middle of the screen (0.2 degrees of visual angle [dva] in diameter) and Gabor patches on a uniform gray background. The Gabor patches, sinusoidal gratings with a Gaussian envelope (σenv = 0.1 dva) and a spatial frequency of 2 cycles per degree, served as stimuli for all experiments described below.

Gabors were presented at the three corners of an imaginary equilateral triangle (see Fig. 1). The Gabor patches were oriented such that their internal grating was orthogonal to a hypothetical line from each Gabor’s center to the triangle’s center. This orientation was chosen to maximize the effect of internal drift on the perceived distance between them. The internal grating of all three Gabors drifted (at 4 Hz) toward their common center in the inward condition and directly away from it in the outward condition. In all experiments, the Gabor triangles rotated around their center, which was 8 dva away from fixation, with an angular velocity of 180°/s. Distances between Gabors here always refer to center-to-center distance, not to the gray space between visible parts of the Gabors.
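The stimulus geometry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study (which was written in MATLAB with the Psychophysics Toolbox): the function names, the starting angle, the counterclockwise rotation direction, and the cosine carrier phase are all hypothetical choices. The numerical constants are taken from the text.

```python
import math

SIGMA_ENV = 0.1             # dva, SD of the Gaussian envelope (from the text)
SPATIAL_FREQ = 2.0          # cycles per dva (from the text)
DRIFT_HZ = 4.0              # temporal frequency of the internal drift (from the text)
ROTATION_DEG_PER_S = 180.0  # angular velocity of the triangle (from the text)


def gabor_centers(triangle_center, spacing, t, start_deg=90.0):
    """Centers of the three Gabors at time t (in seconds).

    `spacing` is the center-to-center distance, so the circumradius of the
    equilateral triangle is spacing / sqrt(3). The starting angle and the
    counterclockwise direction are assumptions made for illustration.
    """
    radius = spacing / math.sqrt(3.0)
    base = math.radians(start_deg + ROTATION_DEG_PER_S * t)
    cx, cy = triangle_center
    return [
        (cx + radius * math.cos(base + k * 2.0 * math.pi / 3.0),
         cy + radius * math.sin(base + k * 2.0 * math.pi / 3.0))
        for k in range(3)
    ]


def gabor_contrast(px, py, gabor_xy, triangle_center, t, outward=True):
    """Signed contrast in [-1, 1] of one Gabor at the point (px, py)."""
    gx, gy = gabor_xy
    cx, cy = triangle_center
    # Unit vector from the triangle center to the Gabor center (radial axis).
    rx, ry = gx - cx, gy - cy
    norm = math.hypot(rx, ry)
    ux, uy = rx / norm, ry / norm
    dx, dy = px - gx, py - gy
    # The grating is orthogonal to the radial line, so the carrier
    # varies only along the radial axis.
    u = dx * ux + dy * uy
    envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * SIGMA_ENV ** 2))
    sign = 1.0 if outward else -1.0
    # Outward drift moves the carrier toward +u, inward drift toward -u.
    phase = 2.0 * math.pi * (SPATIAL_FREQ * u - sign * DRIFT_HZ * t)
    return envelope * math.cos(phase)
```

With these definitions, `gabor_centers((8.0, 0.0), 1.2, t)` returns three points that stay 1.2 dva apart while completing one revolution every 2 s, and the carrier inside each envelope moves at 4 Hz / 2 cycles per dva = 2 dva/s along the radial axis.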

Procedure

Pretest

In order to determine by how much internal drift would change the perceived distance between Gabors in our specific configuration, we devised a quick perceptual pretest utilizing the method of constant stimuli. Participants were asked to compare two of the triangular Gabor configurations described above. One of them consisted of Gabors with drift toward the virtual triangle center (inward), while the other Gabors drifted away from their virtual center (outward). While participants fixated in the middle of the screen, one triangle of Gabors appeared in the top right and the other in the bottom left quadrant of the screen. Inward and outward drifting Gabor triangles were presented pseudorandomly interleaved on the left and right side of the screen. Participants were tasked to report whether the spacing of the left or right triangle was wider in a two-alternative forced-choice (2AFC) design. Their responses were then recoded to mean whether the outward or the inward drifting triangle was perceived as more widely spaced.

All triangles were equilateral, but the physical distances between inward drifting Gabors were larger than those between the outward drifting Gabors by either 0, 0.264, 0.498, 0.762, 1.025, or 1.26 dva. The spacing between Gabors for the outward drifting Gabors was always 1.2 dva. Participants completed 144 trials (i.e., 24 trials per spacing difference). A psychometric function of these spacing differences can be fit to the frequency with which the outward drifting triangle was perceived to be farther apart. The point of subjective equality (PSE) on this psychometric curve is then equivalent to the average perceived distance shift caused by the internal drift of the Gabors.

Tracking experiment

We presented participants with one triangular configuration of drifting Gabors, which appeared pseudorandomly in one of the four quadrants of the screen. Participants were tasked with tracking one Gabor, while ignoring the other two. In contrast to most other tracking paradigms, where targets and distractors move in seemingly random directions, here, they always rotated smoothly around the center of the equilateral triangle (Alvarez & Cavanagh, 2005; Holcombe, Chen, & Howe, 2014; Störmer, Alvarez, & Cavanagh, 2014). In order to discourage tracking strategies involving only the start and end positions of the Gabors rather than actual tracking, the duration of motion varied from trial to trial. The physical distance between the three Gabor patches was first set to give an 80% tracking accuracy for the outward drifting condition using a staircase procedure and so it varied from participant to participant. This distance was then used to determine performance with both inward and outward internal drift.

The experiment consisted of a total of 120 trials, split evenly into three blocks. All trials were initiated by a button press, making the procedure entirely self-paced; participants could take a break between any two trials. Written instructions were provided at the beginning of every block. At the beginning of a trial, the to-be-tracked Gabor was indicated by flashing (offset and onset) for 1 s prior to motion onset. After another 250-ms pause, all three Gabors moved (i.e., rotated around their center) for a random duration of between 4.5 and 5.5 s. At the end of each trial, one of the Gabors flashed. The participant reported whether or not the same Gabor had flashed at the beginning and end of the trial. The guessing rate was 50% rather than 33.33% because the initially cued object was probed at the end of half the trials, while one of the two nontarget Gabors was probed at the end of the other half of the trials. Participants also had to keep their eyes fixated on the central spot in the middle of the screen while they covertly tracked the cued target. After participants had given their response, they received feedback and were informed about the number of trials they had answered correctly so far.

The first block of 40 trials was dedicated to a standard one-up/one-down staircasing procedure with the outward drifting Gabors where the distance between the Gabor patches was adjusted before every trial depending on the participant’s performance in the preceding trial. If they tracked the target correctly, the distance between Gabors decreased by 0.111 dva; if not, it increased by 0.147 dva. The starting distance was 1.5 dva and the minimum distance between Gabors was 0.7 dva, to avoid overlap. At the end of the first block, the reversal points of the staircasing procedure were averaged to give the distance at which a participant should perform with roughly 80% accuracy (García-Pérez, 1998; Kaernbach, 1991). The three Gabor patches were held at this distance in all remaining trials for both inward and outward drifting Gabors. The following two blocks had an equal number of trials of both conditions (i.e., 20 per block and condition).
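The update rule and the averaging of reversal points can be illustrated with a short sketch. The step sizes, starting distance, and minimum distance are taken from the text; the function name and the exact definition of a reversal point (the distance shown on the trial where the response changed from correct to incorrect or vice versa) are assumptions, since the text does not spell them out.

```python
STEP_DOWN = 0.111  # dva, decrease after a correct response (from the text)
STEP_UP = 0.147    # dva, increase after an error (from the text)
START = 1.5        # dva, starting distance (from the text)
MINIMUM = 0.7      # dva, lower bound that avoids overlap (from the text)


def run_staircase(responses):
    """Apply the one-up/one-down rule to a list of booleans (True = correct).

    Returns (distances, threshold): the distance shown on each trial and the
    mean of the reversal points, or None if no reversal occurred.
    """
    distance = START
    distances = []
    reversals = []
    last_direction = 0  # -1 after a correct response, +1 after an error
    for correct in responses:
        distances.append(distance)
        direction = -1 if correct else 1
        # Assumed definition: a reversal is a trial whose outcome differs
        # from the preceding trial's outcome.
        if last_direction != 0 and direction != last_direction:
            reversals.append(distance)
        last_direction = direction
        step = -STEP_DOWN if correct else STEP_UP
        distance = max(MINIMUM, distance + step)
    threshold = sum(reversals) / len(reversals) if reversals else None
    return distances, threshold
```

Feeding the 40 responses of the first block to `run_staircase` would give the sequence of distances shown and the averaged reversal distance that is then held fixed for the remaining blocks.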

The double-drift illusion works best in the visual periphery (Tse & Hsieh, 2006). In order to prevent participants from looking directly at the rotating Gabors, which would disrupt the illusion and help tracking performance, we monitored participants’ eye movements during the experiment. Eye-tracking data were used to exclude all trials in which participants looked away from fixation by more than 2 dva during tracking (an average of 14.5% of trials). Performance was collapsed across Blocks 2 and 3, while the staircasing data from Block 1 were analyzed separately.
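The exclusion criterion can be expressed as a simple filter over gaze samples. The sketch below assumes gaze positions have already been converted to dva relative to the screen, and that a single sample beyond the 2-dva limit is enough to exclude a trial; the text does not specify how blinks or isolated samples were handled, and the data layout and function names are hypothetical.

```python
import math

MAX_DEVIATION_DVA = 2.0  # exclusion limit (from the text)


def trial_is_valid(gaze_samples, fixation=(0.0, 0.0)):
    """True if every gaze sample stays within the limit around fixation."""
    return all(
        math.dist(sample, fixation) <= MAX_DEVIATION_DVA
        for sample in gaze_samples
    )


def summarize(trials):
    """Return (number of trials kept, proportion correct among kept trials).

    Each trial is a dict with keys 'gaze' (a list of (x, y) tuples in dva)
    and 'correct' (a bool); this layout is an assumption for illustration.
    """
    kept = [trial for trial in trials if trial_is_valid(trial["gaze"])]
    if not kept:
        return 0, None
    n_correct = sum(1 for trial in kept if trial["correct"])
    return len(kept), n_correct / len(kept)
```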

Results

Pretest

If internal drift had no impact on perceived distance between Gabors, participants should have reported equally spaced sets of Gabors as equally far apart for both the inward and outward drift conditions. However, internal drift caused participants to perceive Gabors as farther apart in the outward drift case and closer together in the inward drift case. At equal spacings, participants on average reported the inward drifting Gabors to be farther from each other only 10.3% of the time, which is significantly below the 50% level expected if there were no illusion, t(14) = 16.786, p < .001, Cohen’s d = 6.130. At the largest difference in Gabor spacings, when inward drifting Gabors were 1.26 dva farther apart than outward drifting Gabors, participants on average reported the inward drifting Gabors to be farther apart 96.9% of the time.

In order to find the point at which these two different sets of Gabors appeared equally spaced for each participant, we fit a Weibull function to the proportion of “inward appears farther” responses (see Fig. 2 for a representative participant’s data). We then used the fitted function to estimate the PSE by finding the spacing difference at which the function predicts 50% of responses to fall either way. The average PSE across participants was 0.46 dva, which was significantly different from zero, t(14) = 14.02, p < .001. Thus, the direction of the Gabors’ internal drift shifted the apparent spacing by about 38% (0.46/1.20), so that a set of Gabors with inward drift needed to be spaced by 1.66 dva to appear to have the same spacing as the set with outward drift that had 1.20 dva spacing.
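As an illustration of how a PSE can be read off a fitted Weibull function, the sketch below uses a two-parameter cumulative Weibull and a coarse least-squares grid search. Both are assumptions: the text does not state which parameterization or fitting method was used, and the function names and grid ranges are hypothetical. For this parameterization, the 50% point has the closed form scale × (ln 2)^(1/shape).

```python
import math

# Spacing differences tested in the pretest, in dva (from the text).
DIFFS = [0.0, 0.264, 0.498, 0.762, 1.025, 1.26]


def weibull(x, scale, shape):
    """Cumulative Weibull: 0 at x = 0, approaching 1 for large x."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def pse(scale, shape):
    """Spacing difference at which the function crosses 50%."""
    return scale * math.log(2.0) ** (1.0 / shape)


def fit_weibull(xs, proportions):
    """Coarse least-squares grid search; returns (scale, shape).

    The grid (scale 0.01 to 2.00 dva, shape 0.1 to 5.0) is an arbitrary
    choice for illustration; a real analysis would more likely use a
    maximum-likelihood fit.
    """
    best = None
    for i in range(1, 201):
        scale = i / 100.0
        for j in range(2, 101):
            shape = j / 20.0
            err = sum(
                (weibull(x, scale, shape) - p) ** 2
                for x, p in zip(xs, proportions)
            )
            if best is None or err < best[0]:
                best = (err, scale, shape)
    return best[1], best[2]
```

Given one participant’s proportions of “inward appears farther” responses at the six spacing differences, `pse(*fit_weibull(DIFFS, proportions))` would return that participant’s estimated PSE in dva.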

Fig. 2

Results from pretest and tracking experiment. a Sample fit of psychometric Weibull function with estimated PSE. b Average and individual PSEs. c Average and individual tracking performance for inward and outward drift conditions. All error bars correspond to ± 1 standard error of the mean.

Tracking experiment

If attentional tracking operates on perceived positions, tracking should be more difficult when the Gabors’ internal grating drifts inward, because the three Gabors are then perceived to be ~38% closer to one another. Indeed, on average, participants’ accuracy was 12.73 percentage points higher in the outward than in the inward condition, which was highly significant, t(14) = 4.415, p < .001, Cohen’s d = 1.077. There were no floor or ceiling effects in either condition. Mean tracking accuracy in the outward condition was 79.66% (SD = 12%), quite close to the baseline of 80% targeted by the staircase procedure of the first block, while the mean tracking accuracy in the inward condition was 66.93% (SD = 12%). See Fig. 2 for more details.

Discussion

Attentional tracking was notably impaired when the double-drift illusion appeared to move the target closer to the two distractors. Since physical target-distractor spacing was equal in both conditions, this effect could only have been driven by the perceived distance between targets and distractors. Consequently, our results suggest that the attentional tracking system, which determines the locus of attentional selection, is influenced at some stage by the perceptual representations of target location. Our results support the view that attentional selection and tracking occur late in the visual hierarchy, after the conversion to perceptual representations of object location (He & Nakayama, 1992; Hochstein & Ahissar, 2002; Özkan, Tse, & Cavanagh, 2020; Suzuki & Cavanagh, 1995). Our data do not support models of attentional tracking that describe tracking solely as an encapsulated, low-level visual process (e.g., the fingers of instantiation theory; Pylyshyn & Storm, 1988). This aligns with data that instead support higher level mechanisms of attentional tracking (Cavanagh & Alvarez, 2005; Oksama & Hyönä, 2004). Our account of attentional tracking as operating on high-level perceptual representations is also in line with recent findings linking attentional tracking ability with other higher cognitive processes (Tullo, Faubert, & Bertone, 2018).

Our results suggest that attentional tracking operates primarily on representations in the perceptual coordinates of conscious vision, and quite plausibly on the content of conscious perception itself. It seems that attentional tracking is unable to operate solely on the retinotopic representations that are used to construct conscious percepts (He & Nakayama, 1992; Hochstein & Ahissar, 2002; Suzuki & Cavanagh, 1995). This is consistent with the recent finding that pop-out in a visual search paradigm happens only among perceptual rather than stimulus-level double-drift oddballs (Özkan et al., 2020). Similarly, it has been shown that crowding does not affect representations in early visual areas (He et al., 1996) and instead operates over perceived positions (Dakin et al., 2011; Maus et al., 2011).

Our results are also consistent with attentional tracking performance that is seen when saccades are involved (Howe, Drew, Pinto, & Horowitz, 2011). Participants in that study had to execute multiple saccades while attentionally tracking. When the display shifted with the eye movement, preserving retinotopic locations, tracking was disrupted. However, when spatiotopic locations of the objects were preserved across saccades (i.e., nothing shifted on the screen during the saccade), participants performed better. These results show that it is possible to execute saccades while tracking, provided that the spatiotopic target locations are preserved. This suggests that attention tracks targets in their spatiotopic locations, corrected for eye movements, ruling out retinotopic locations. Our results here go further and demonstrate that attentional tracking selects from target representations that include illusory perceptual shifts.

Although our results clearly show that tracking in our task is influenced by perceptual coordinates, two other findings with the double-drift stimulus have suggested that attention might be in physical coordinates. First, Lisi and Cavanagh (2015) showed that saccades to the drifting Gabor land along a line parallel to the physical path of the Gabor rather than its perceived path. To the extent that spatial attention is linked to saccades (e.g., Awh, Armstrong, & Moore, 2006), we might expect attention to also be unaffected by the double-drift illusion. Second, in an fMRI study, Liu et al. (2019) found that activity in early visual areas allowed the decoding of the physical but not the perceptual positions. Liu and colleagues suggested that the attentional feedback went to the physical location, not the perceived location, because attention, like saccades, would be immune to the illusion.

The saccade results of Lisi and Cavanagh (2015) were replicated by Nakayama and Holcombe (2020). However, they also found that irrelevant transients (e.g., a flash of light that would grab attention) reset the illusion, bringing the perceived location back to the physical location. According to these authors, saccades targeted the physical location not because saccades were immune to the illusion, but because the attention drawn by the saccade had reset the illusion. As a result, the perceived location would be the same as the physical location at the time of the saccade. Their result challenges the claim that attention might be in physical coordinates. It also raises the question of whether the effects of attention during tracking in our task might eliminate the illusion. We can reject that notion on two grounds. First, attentional tracking is more like smooth pursuit than a saccade (Horowitz, Holcombe, Wolfe, Arsenio, & DiMase, 2004; Howe et al., 2011; Howe, Pinto, & Horowitz, 2010), and smooth pursuit has been shown not to affect the illusion (Cavanagh & Tse, 2019). Second, we did measure a significant effect of the illusion on the apparent size of the rotating trajectory and on the performance in tracking, which would have been impossible if attention had reset the illusion. Although ballistic attentional shifts may reset the illusion (Nakayama & Holcombe, 2020), we demonstrate that smooth attentional shifts do not.

Why do our results demonstrate that attentional tracking operates in perceptual coordinates while Liu et al. (2019) suggest it operates in physical coordinates? There are a number of possibilities, and we outline three here. First, it may be that when tracking is disrupted, the participants must rely on memory to recover the target, no matter which coordinate system attention operates on during tracking. Memory of location is most likely in perceptual coordinates—at least, we know that memory saccades are influenced by the double-drift illusion (Massendari, Lisi, Collins, & Cavanagh, 2018) even if immediate saccades are not. In this event, the remembered (perceptual) location in the outward motion case would be less crowded than in the inward motion case.

Second, the illusory shifts produced by our stimulus may fall in a smaller range, one that does affect saccades and so, by inference, attention. Specifically, the perceptual shifts we find with rotating Gabors are smaller than have been reported in other double-drift studies using linear trajectories. Possibly the presence of continual rotation or curved trajectories leads to a saturated offset or limited accumulation of offset errors. Whatever the reason, the offset we find is closer to that seen for static Gabors with internal drift (De Valois & De Valois, 1991). Notably, for offsets of this size with stationary Gabors with internal drift, saccades target the perceived positions (Kosovicheva, Wolfe, & Whitney, 2014; Schafer & Moore, 2007). Indeed, in Lisi and Cavanagh’s (2015) study, as well as in Nakayama and Holcombe’s (2020), although saccades did not follow the increasing offset of the illusory path, they did show a constant, small shift at all locations. This constant offset in the direction of the internal motion was similar in magnitude to that seen for static Gabors. In contrast to the double drift, the effects of this smaller position shift can be observed in V1 using fMRI (Schneider, Marquardt, Sengupta, de Martino, & Goebel, 2019; Whitney et al., 2003), although it is paradoxically in the opposite direction from the perceived shift.

Finally, there is a hybrid alternative. Attention may be guided to the physical location, but then must select the target from a map that carries perceived locations. The two locations are not very far apart, and the perceived location would fall within the selection zone of attention (Intriligator & Cavanagh, 2001) so that it can be tracked. Nevertheless, the perceived locations are influenced by the illusion and so are closer together in the inward condition, leading to a higher probability of interference. In the end, our data do not resolve which of these accounts is correct. These unanswered questions need to be addressed by future research.

A number of motion illusions demonstrate that perceived object positions are not merely detected but rather constructed depending on (among other things) an object’s motion (Cavanagh & Anstis, 2013; Cavanagh & Tse, 2019; De Valois & De Valois, 1991; Tse & Hsieh, 2006). Positions are, however, explicitly encoded on the retina. At some stage between these two extremes of the visual processing hierarchy there must be a conversion of position information from retinotopic to perceptual object locations. We do not fully understand where or how this occurs, but our data show that attentional tracking does not operate only on the early representations that would still be in physical coordinates. Instead, we show that its processing bottleneck is at a later stage where positions are coded in perceptual coordinates. This, together with the evidence from visual search (Özkan et al., 2020), suggests that attention accesses perceptual representations at a high level, perhaps at the level of consciousness.