Prior research on the speed of intentional influences on perception has predominantly investigated voluntary allocations of visual–spatial attention. In a typical paradigm, a symbolic instructional cue (e.g., a centrally presented arrow) informs the participant of the likely location of an upcoming target (Bashinski and Bacharach 1980; Hawkins et al. 1990; Posner, Snyder, and Davidson 1980). The time interval between the instructional cue and the presentation of the target (cue-lead time) allows the experimenter to control the time available for the intentional allocation of visual–spatial attention (Cheal and Lyon 1989, 1991, Muller and Rabbitt 1989, Posner 1980). A study using this paradigm showed that for untrained observers, the perceptual benefit of voluntary visual–spatial attention becomes statistically reliable within about 175 ms of cue-lead time, and asymptotically effective at about 400 ms (Muller and Rabbitt 1989). These results revealed the time course of intentionally allocating visual–spatial attention to a specific location.

Less is known, however, about the time course of the influence of intention on the construction of a percept. Here, we address this question using a “one-shot” ambiguous apparent-motion paradigm. For example, a trial begins with a frame comprising a vertical pair of squares. When these squares are replaced by their 90º-rotated (i.e., horizontal) version—the rotated frame—either a clockwise or counterclockwise rotation is momentarily perceived. Rotation in an instructed direction is typically seen if an instructional cue is presented sufficiently before the presentation of the rotated frame, so that the participant has enough time to intentionally engage appropriate processes to influence the percept that is generated.

We chose apparent rotation as our ambiguous stimulus for the following reasons. The ambiguous motion signal available to generate the percept of apparent rotation is given only at the moment when the rotated frame appears; therefore, we know that the process of percept generation depends on the visual signal presented at this specific moment. In contrast, the Necker cube, face–vase figure, and other static ambiguous stimuli continuously provide figural signals as long as they are displayed, and the percepts generated by these stimuli can evolve or change over time. Moreover, in our experience, each percept of apparent rotation is compelling, at least for relatively small stimuli, typically producing a percept of rotation definitively in one direction or the other. In contrast, in our experience, the percepts of Necker cube and other static ambiguous stimuli are not as distinct. For instance, a Necker cube may appear flat or oscillating between the two orientations, and a face–vase figure may look like neither a face nor a vase.

To examine how rapidly intention can influence the construction of apparent rotation, we presented an instructional cue (a tone) indicating whether participants should try to see clockwise or counterclockwise rotation with variable delays between the cue and the onset of the rotated frame. Because motion information is available only once the rotated frame is presented, the speed of perceptual control can be inferred by determining how the success of perceptual control depends on the delay between the cue and the presentation of the rotated frame (cue-lead time). Note that the visual system retroactively constructs a motion percept after the presentation of the rotated frame, whereas the resultant motion is perceived to occur before the rotated frame (similar to retroactive priming effects; see, e.g., Kahneman, Treisman, and Gibbs 1992). The intriguing result revealed by the present experiments is that intentional control over the construction of apparent motion is engaged so quickly that intention reliably influences the direction of perceived rotation, even if an instructional cue is presented simultaneously with the rotated frame.

Experiment 1

Method

Participants

Eight undergraduates at Northwestern University with normal hearing and normal or corrected-to-normal vision signed informed consent to participate for course credit or monetary compensation. We adhered to the Declaration of Helsinki in all procedures.

Stimuli

The visual stimuli are illustrated in Fig. 1. The initial frame consisted of a central fixation marker “ο” (0.22º × 0.22º of visual angle) and a pair of vertically arranged squares (each 0.22º × 0.22º of visual angle). The fixation marker and squares were dark (21.8 cd/m²) against a lighter background (51.7 cd/m²). Each square was equidistant from the central fixation marker (0.49º of visual angle from fixation to center of square). The rotated frame was a 90º-rotated version of the initial frame. The small-radius configuration produced compelling percepts of rotation in either the clockwise or counterclockwise direction at a viewing distance of approximately 50 cm. The auditory cues were two 17-ms complex tones (460 and 1480 Hz). On each cued trial, one of the two tones was randomly selected to be the cue tone with equal probability. Sounds were presented at a comfortable listening level well above threshold (~70 dB SPL).

Fig. 1

Schematic of the no-intention, perceptual-control, and post-intention tasks used in Experiment 1. In the initial, no-intention task, participants were instructed to ignore the tones. In the perceptual-control task, each of the cue tones was given a meaning; participants were instructed to try to see clockwise (CW) rotation when they heard a high tone, and to try to see counterclockwise (CCW) rotation when they heard a low tone. In the post-intention task, performed after the perceptual-control task, participants were again asked to ignore the cue tones.

Apparatus

The visual stimuli were presented on a 13-in. color CRT monitor, and the auditory stimuli were binaurally presented through the built-in computer speakers. The experiment was controlled using a MacIIcx (OS9) computer running Vision Shell (Micro ML Inc., Quebec, Canada).

Procedure

No-intention task

Participants initially performed one block (96 trials) of the no-intention task. Each trial began with the central fixation marker and the vertically arranged pair of squares. This vertical frame remained for 2,134 ms, until it was replaced by its 90º-rotated version, the rotated frame. Within this interval, we presented an auditory cue (either the 460-Hz or 1480-Hz tone) at a variable delay, so that the cue was equiprobably presented 0, 133, 267, 533, or 1,067 ms (cue-lead time) preceding the rotated frame (16 trials per cue-lead time). On the remaining 16 randomly interspersed trials, no auditory cue was presented (see below). The responses consisted of pressing one of two keys on a computer keyboard (one to indicate clockwise rotation, and the other to indicate counterclockwise rotation). No response-time limit was imposed. The no-intention task was used to ensure that no intrinsic association held between the low- and high-pitch auditory cues and perceived directions of rotation. The intertrial interval was 1,700 ms.
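The composition of one 96-trial block can be sketched as follows. This is only an illustrative reconstruction from the numbers given above, not the original Vision Shell code: the names `build_block`, `cue_lead_ms`, `tone_hz`, and `cue_onset_ms` are hypothetical, and the sketch assumes that the tone was drawn independently on every cued trial.

```python
import random

# Values taken from the Procedure; everything else is a hypothetical sketch.
CUE_LEAD_TIMES_MS = [0, 133, 267, 533, 1067]
TRIALS_PER_LEAD_TIME = 16
NO_CUE_TRIALS = 16
TONES_HZ = [460, 1480]
INITIAL_FRAME_MS = 2134  # duration of the vertical frame before the rotated frame


def build_block(rng):
    """Return one shuffled block: 5 cue-lead times x 16 trials + 16 no-cue trials."""
    trials = []
    for lead in CUE_LEAD_TIMES_MS:
        for _ in range(TRIALS_PER_LEAD_TIME):
            trials.append({
                "cue_lead_ms": lead,
                # Assumption: each tone is chosen independently with p = .5.
                "tone_hz": rng.choice(TONES_HZ),
                # Cue onset relative to trial start, so that the cue leads
                # the rotated frame by `lead` milliseconds.
                "cue_onset_ms": INITIAL_FRAME_MS - lead,
            })
    for _ in range(NO_CUE_TRIALS):
        trials.append({"cue_lead_ms": None, "tone_hz": None, "cue_onset_ms": None})
    rng.shuffle(trials)  # cued and no-cue trials are randomly interspersed
    return trials


block = build_block(random.Random(0))
```

With these values, a cue at the 0-ms cue-lead time starts at 2,134 ms after trial onset, that is, simultaneously with the rotated frame.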

Perceptual-control task

Participants subsequently performed four blocks (96 trials each) of the perceptual-control task. The stimuli were exactly the same as in the no-intention task, but participants were instructed to try to generate clockwise rotation when precued by the higher-frequency tone and counterclockwise rotation when precued by the lower-frequency tone. The importance of accurately reporting the perceived direction of rotation (rather than simply confirming the precued direction) was emphasized. The intermixed no-cue trials helped to encourage participants to report the direction of the actually perceived (rather than the intended) rotation on cued trials. We also conducted control experiments to estimate and rule out response bias (see Experiments 2 and 3).

Post-intention task

Participants then performed a block (96 trials) of the post-intention task. This task was the same as the no-intention task, but it was performed after the perceptual-control task. Participants were asked to ignore the meanings previously associated with the cues in the perceptual-control task, to make no intentional effort to control the percept, and simply to report the direction of rotation that they perceived on each trial. Performance on this task was compared with performance on the initial, no-intention task to determine whether performing the perceptual-control task produced a direct association between the pitches of the auditory cue and the corresponding rotation percepts (“cue recruitment”; see Haijiang et al. 2006).

Perceptual-control index

To analyze the degree of perceptual control beyond any general perceptual bias toward perceiving clockwise or counterclockwise rotations, we computed the perceptual-control index, which was the z-transformed (probit-transformed) proportion of trials on which a clockwise rotation was seen when the high-pitch cue was presented (consistent with intention in the perceptual-control task) minus the z-transformed proportion of trials on which a clockwise rotation was seen when the low-pitch cue was presented (inconsistent with intention in the perceptual-control task). This index would be positive and large if participants successfully controlled their rotation percepts; it would be 0 if participants had no control over their rotation percepts; and it would be negative if participants applied a cue contingency opposing our instructions. We calculated the perceptual-control index for each cue-lead time. Note that the z-transform diverges to −∞ and +∞ for proportions of 0 and 1, respectively. We therefore replaced 0 with 1/(2n) and 1 with 1 − 1/(2n) (where n = 16 was the number of trials per cue-lead time); this is a standard correction used in signal detection analyses (Macmillan and Creelman 1991).
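As a minimal sketch of this computation (not the analysis code actually used), the index can be written in a few lines of Python. The function names `corrected_proportion` and `perceptual_control_index` are hypothetical, and the sketch assumes that the inputs are already the proportions of "clockwise" reports for each cue pitch at a given cue-lead time.

```python
from statistics import NormalDist


def corrected_proportion(p, n=16):
    """Replace proportions of 0 and 1 by 1/(2n) and 1 - 1/(2n), as described in the text."""
    if p <= 0.0:
        return 1.0 / (2 * n)
    if p >= 1.0:
        return 1.0 - 1.0 / (2 * n)
    return p


def perceptual_control_index(p_cw_high, p_cw_low, n=16):
    """z(P(CW | high-pitch cue)) - z(P(CW | low-pitch cue))."""
    z = NormalDist().inv_cdf  # probit (inverse standard-normal CDF)
    return z(corrected_proportion(p_cw_high, n)) - z(corrected_proportion(p_cw_low, n))


no_control = perceptual_control_index(0.5, 0.5)   # equal proportions give an index of 0
ceiling = perceptual_control_index(1.0, 0.0)      # largest index attainable with n = 16
reversed_rule = perceptual_control_index(0.25, 0.75)  # negative: opposite contingency
```

Because of the correction, the index is bounded: with n = 16, perfect control yields z(31/32) − z(1/32), which is roughly 3.7 z units.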

Results

In the no-intention task, the auditory cues had no consistent effect on the perceived direction of rotation (Fig. 2, white circles with dashed lines), with the perceptual-control index being not significantly different from chance (0) at any cue-lead time [t(7)s < 0.792, ds < 0.28]. Thus, no a priori associations held between the pitch of the auditory cue and the perception of a clockwise or counterclockwise rotation.

Fig. 2

Mean perceptual-control indices (larger positive values indicate superior control, and 0 indicates lack of control) for participants who viewed the small-radius configuration in Experiment 1 (n = 8), in the no-intention (white circles with dashed lines), perceptual-control (black circles with solid lines), and post-intention (gray circles with dashed lines) tasks. Error bars represent ±1 standard error of the mean.

In the perceptual-control task, the ability to intentionally generate apparent motion in the cued direction monotonically decreased for shorter cue-lead times (Fig. 2, black circles with solid lines) [F(4, 28) = 15.63, p < 10⁻⁶, ηₚ² = .69, for the main effect of cue-lead time]. This systematic dependence on cue-lead time suggests that the control of ambiguous rotational apparent motion is more effective when a longer time is given to engage intentional control prior to the onset of the rotated frame. The dependence on cue-lead time also suggests that our participants were not entirely swayed by response bias; if they had been, they would have responded that they perceived rotation in the precued direction, regardless of cue-lead time. Remarkably, intention was engaged quickly enough to produce above-chance perceptual control even with a cue-lead time of 0 ms [t(7) = 4.46, p < .003, d = 1.58].

A potential alternative explanation for this apparently very fast intentional control of perceived rotation is that during the course of the perceptual-control task, the repeated intentional effort to generate the cued rotations might have resulted in the formation of an association between the pitch of the cue and the perceived direction of rotation (sometimes referred to as cue recruitment; see Haijiang et al. 2006). Performance on the post-intention task, in which participants were instructed only to report the perceived direction of rotation while ignoring the previously assigned meanings of the auditory cues, provides a measure of this association. It appears that performing the perceptual-control task produced a small effect consistent with an auditory–visual association, and this effect seems to have increased with cue-lead time (Fig. 2, gray circles with dashed lines); however, the perceptual-control indices in the post-intention task were not significantly greater than those in the initial, no-intention task for any cue-lead time [|t(7)|s < 2.04, |d|s < 0.722]. Importantly, the perceptual-control indices in the perceptual-control task were significantly higher than those in the post-intention task for every cue-lead time, including 0 ms [t(7)s > 3.60, ps < 0.01, ds > 1.27], providing evidence in support of rapid intentional control of rotational apparent motion over and above any potential cue-recruitment effects.

We conducted two additional experiments to demonstrate that the significant perceptual-control index for the 0-ms cue-lead time indeed indicates a surprisingly rapid influence of intention on the generation of motion percepts, over and above any potential effects of response bias.

Experiment 2

The first strategy that we used to rule out response bias was stimulus manipulation. It has been shown that longer interstimulus intervals are required to perceive apparent rotation across a wider spatial gap (Farrell, Larsen, and Bundesen 1982). We thus reasoned that it would require a longer cue-lead time to control a larger-radius apparent rotation. There is no obvious reason to expect that any response bias (i.e., a tendency to respond that the apparent rotation occurred in the cued direction, irrespective of the perceived direction of rotation) would be greater for a smaller-radius than for a larger-radius rotation. Thus, if we were to find that the perceptual-control index for the small-radius configuration (Experiment 1) was significantly greater than that for the large-radius configuration (this experiment) at the 0-ms cue-lead time, it would suggest that the rapid perceptual control demonstrated in Experiment 1 reflects perceptual control beyond response bias.

Method

Participants

A new group of eight undergraduates at Northwestern University with normal hearing and normal or corrected-to-normal vision signed informed consent to participate for course credit or monetary compensation.

Stimuli

The stimuli were the same as in Experiment 1, except for their larger dimensions. Each of the two squares was 4.36º × 4.36º of visual angle in size, and the center of each square was 8.71º of visual angle away from fixation.

Apparatus

The apparatus was the same as in Experiment 1.

Procedure

The no-intention, perceptual-control, and post-intention tasks were the same as in Experiment 1.

Results

In the no-intention task, auditory cues had no consistent effect on the perceived direction of rotation (Fig. 3, white squares with dashed lines), with the perceptual-control index being not significantly different from chance (0) at any cue-lead time [|t(7)|s < 1.75, |d|s < 0.40], except at 267 ms [t(7) = 5.30, p < .002, d = 1.87]. Thus, overall no a priori associations held between the pitches of the cue tones and the perception of clockwise or counterclockwise rotations.

Fig. 3

Mean perceptual-control indices (larger positive values indicate superior control, and 0 indicates lack of control) for participants who viewed the large-radius configuration in Experiment 2 (n = 8), in the no-intention (white squares with dashed lines), perceptual-control (black squares with solid lines), and post-intention (gray squares with dashed lines) tasks. Included for comparison are data from the perceptual-control task using the small-radius configuration (Experiment 1; black circles with solid lines). Error bars represent ±1 standard error of the mean.

In the perceptual-control task, the ability to intentionally control rotational apparent motion monotonically decreased for shorter cue-lead times (Fig. 3, black squares with solid lines) [F(4, 28) = 22.42, p < 10⁻⁷, ηₚ² = .76, for the main effect of cue-lead time]. This systematic dependence on cue-lead time again suggests that participants’ responses reflect the degree of intentional control rather than a simple response bias. Importantly, perceptual control was significantly worse for the large-radius configuration in this experiment than for the small-radius configuration in Experiment 1 (reproduced in Fig. 3; black circles with solid lines), especially for the shortest cue-lead times [t(14) = 2.73, p < .02, d = 1.36, for 0 ms; t(14) = 2.23, p < .05, d = 1.12, for 133 ms; and t(14) = 2.23, p < .05, d = 1.11, for 267 ms].
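For readers who wish to relate the reported t and d values, the figures above appear numerically consistent with the standard conversions d = t/√n for tests against chance and d = t·√(1/n₁ + 1/n₂) for comparisons between the two groups of participants. This is an inference from the reported numbers rather than a stated method, and the function names below are hypothetical.

```python
import math


def d_one_sample(t, n):
    """Cohen's d implied by a one-sample (or paired) t statistic with n participants."""
    return t / math.sqrt(n)


def d_independent(t, n1, n2):
    """Cohen's d implied by an independent-samples t statistic (pooled SD)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Experiment 1, 0-ms cue-lead time against chance: t(7) = 4.46, reported d = 1.58.
d_exp1_zero = d_one_sample(4.46, 8)

# Small- vs. large-radius configuration at 0 ms: t(14) = 2.73, reported d = 1.36.
d_radius_zero = d_independent(2.73, 8, 8)
```

Both conversions reproduce the reported effect sizes to within the rounding of the t values.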

Our primary goal was to estimate the bona fide intentional control of the small-radius configuration at a 0-ms cue-lead time. Perceptual control for the large-radius configuration at the 0-ms cue-lead time was marginally above chance [t(7) = 2.30, p < .056, d = 0.81], but to be conservative, we assumed that this effect was entirely due to response bias. Because stimulus radius was manipulated in separate experiments with different groups of participants, no participant was aware of our manipulation of stimulus radius. Thus, the significant improvement in the perceptual-control index (from 0.42 to 1.42 in z units) for the small-radius configuration relative to the large-radius configuration at the 0-ms cue-lead time provides a conservative estimate of the amount of rapid intentional control of rotation perception, over and above any potential response bias.

Performance on the post-intention task, in which participants were instructed only to report the perceived direction of rotation while ignoring the meanings previously assigned to the auditory cues, again provided some evidence that the repeated intentional effort to generate the cued rotations resulted in the formation of an association between the pitch of the cue and the perceived direction of rotation. Similar to what we obtained for the small-radius configuration (Experiment 1, see Fig. 2), the cue produced a small effect consistent with the association that increased with longer cue-lead times (Fig. 3, gray squares with dashed lines). In this case, the perceptual-control indices in the post-intention task were significantly greater than those in the initial no-intention task for several cue-lead times [t(7) = 3.98, p < .006, d = 1.41, for 133 ms; t(7) = 2.43, p < .05, d = 0.86, for 533 ms; and t(7) = 4.26, p < .004, d = 1.50, for 1,067 ms]. The perceptual-control indices for the perceptual-control task were significantly higher than those for the post-intention task only for the cue-lead times of 533 ms or longer [t(7) = 2.98, p < .03, d = 1.05, for 533 ms, and t(7) = 3.04, p < .02, d = 1.07, for 1,067 ms], suggesting that bona fide perceptual control of the large-configuration stimulus requires a preparation time greater than 267 ms (the cue-lead time one step shorter than 533 ms). Taken together, these results may suggest that relatively brief experience with intentionally controlling rotation percepts on the basis of auditory cues can generate a cross-modal association (or cue recruitment), so that auditory pitch can systematically influence visual motion perception (see the Discussion section).

Experiment 3

We showed in Experiment 1 that people can rapidly control the direction of perceived rotation at a 0-ms cue-lead time for the small-radius configuration, over and above any cue-recruitment effect. In Experiment 2, we indirectly ruled out response bias by demonstrating that perceptual control was significantly worse for a large-radius configuration, reasoning that, because response bias is unlikely to depend on stimulus size, the superior intentional control of the small-radius configuration should reflect genuine intentional control. The goal of Experiment 3 was twofold. First, we more directly examined the potential role of response bias for the rapid intentional control of the small-radius configuration by randomly intermixing a negative cue-lead time condition, in which the auditory cue was presented after the rotated frame. Any above-chance perceptual-control index for the negative cue-lead time would provide a measure of response bias. The second goal was to replicate the rapid perceptual control at the 0-ms cue-lead time that we obtained in Experiment 1.

Method

Participants

A new group of 14 undergraduates at Northwestern University with normal hearing and normal or corrected-to-normal vision signed informed consent to participate for course credit.

Stimuli

The stimuli were the same as in Experiment 1, except that we intermixed an additional negative cue-lead time condition in which the cue-lead time was −400 ms. In a pilot experiment in which we backward-masked the rotated frame, we found that the reliable perception of rotation from the small-radius configuration required that the rotated frame be exposed for about 200 ms prior to the mask, suggesting that it takes about 200 ms for the visual system to construct a rotation percept from this stimulus configuration. We thus set the negative cue-lead time at −400 ms so that the auditory cue was presented soon after, but reliably after, the rotation percept would have been constructed.

Apparatus

The visual stimuli were presented on a 19-in. color CRT monitor, and the auditory stimuli were binaurally presented through Sennheiser HD 280 Pro headphones. The experiment was controlled using MATLAB (7.6.0, Mathworks) running on Mac OS 10.6.4 on an Apple MacBook.

Procedure

The procedure was the same as in Experiment 1, except that participants performed two blocks (112 trials per block, with the addition of 16 trials for the −400-ms cue-lead time) of the perceptual-control task. Participants performed neither the no-intention task nor the post-intention task.

Results

The ability to intentionally control rotational apparent motion monotonically decreased for shorter cue-lead times [Fig. 4; F(4, 52) = 4.65, p < .003, ηₚ² = .26, for the main effect of cue-lead time (including only the zero and positive cue-lead times)], replicating the results of Experiments 1 and 2. We also replicated the rapid perceptual control of the small-radius configuration at the 0-ms cue-lead time [t(13) = 2.71, p < .02, d = 0.72]. We noted that the perceptual-control indices were overall lower in this experiment (about 1.5 z units; see Fig. 4) than in Experiment 1 (almost 2.5 z units; Fig. 2). A possible explanation is that participants in this experiment had less practice than those in Experiment 1. Participants in this experiment performed a total of only 160 trials of the perceptual-control task (excluding the negative cue-lead time trials and no-cue trials), whereas participants in Experiment 1 performed 320 trials of the perceptual-control task (excluding the no-cue trials).

Fig. 4

Mean perceptual-control indices (larger positive values indicate superior control, and 0 indicates lack of control) for participants who viewed the small-radius configuration in Experiment 3 (n = 14), in which trials with a negative cue-lead time (−400 ms) were intermixed among trials with zero and positive cue-lead times. The vertical line marks the 0-ms cue-lead time. Error bars represent ±1 standard error of the mean. Asterisks indicate cue-lead times at which perceptual control was significantly above chance (see the text for details).

Critically, the perceptual-control index for the 0-ms cue-lead time was significantly greater than that for the −400-ms cue-lead time [t(13) = 3.24, p < .007, d = 0.87], and the perceptual-control index for the −400-ms cue-lead time was not significantly different from chance [t(13) = −0.346, p > .734, d = −0.09]. During post-experiment debriefing, only one of the 14 participants indicated any awareness of the negative cue-lead time, and the statistical results remained the same without this participant. Thus, we have replicated the rapid control of rotational apparent motion at a 0-ms cue-lead time, and also ruled out response bias.

Discussion

The goal of this study was to investigate the time course of intentional control over percept generation. Prior studies have focused on how fast volition could allow the visual system to allocate visual–spatial attention (Cheal and Lyon 1989, 1991, Muller and Rabbitt 1989), sequentially change the location of the attentional focus (Horowitz et al. 2009, Wolfe, Alvarez, and Horowitz 2000), or continually track ambiguous apparent motion (Ramachandran and Anstis 1983, Verstraten, Cavanagh, and Labianca 2000).

Here, we determined how quickly volition could control the initial construction of a percept, using an auditory instructional cue in combination with a punctate, one-shot, bistable apparent-motion stimulus. Intentional control of perceived rotation monotonically declined when less time was available to prepare for and engage the control. Surprisingly, however, perceptual control of the small-radius configuration was still reliable even when an instructional cue was presented simultaneously with the rotated frame—that is, even when participants were given no preparation time. Is this indicative of rapid volitional control of visual motion perception?

We considered and ruled out two alternative possibilities. One was that participants could have built an association between the pitch of the cue and the desired rotation percept, potentially as a result of practice on trials with longer cue-lead times, which nearly always produced successful perceptual control. Although we found some evidence for such a cue-recruitment effect (e.g., Haijiang et al. 2006) in Experiments 1 and 2, we showed in Experiment 1 that the reliable perceptual control of the small-radius configuration at the 0-ms cue-lead time (and at all other cue-lead times) was significantly above the level of control attributable to cue recruitment.

We also considered response bias as an alternative explanation for the results; in other words, participants might have tended to respond in accordance with the auditory cue, irrespective of the actual rotational motion that they perceived. We indirectly ruled out response bias in Experiment 2 by showing that the perceptual-control indices at the shortest (0- to 267-ms) cue-lead times were significantly worse for the large-radius than for the small-radius configuration, consistent with the prior finding that a longer stimulus onset asynchrony is required to generate apparent rotation over a wider spatial gap (Farrell et al. 1982). Conservatively, the amount by which intentional control was significantly greater for the small-radius than for the large-radius configuration (an increase from 0.42 to 1.42 in z units) should reflect genuine perceptual control, rather than response bias.

We more directly ruled out response bias in Experiment 3, in which we also replicated the rapid perceptual control of the small-radius configuration for a 0-ms cue-lead time. A response-bias explanation would predict that the perceptual-control index would be high even if the auditory cue were presented after the rotated frame. On the contrary, despite the fact that our participants (except one) were unaware of the small proportion of randomly intermixed trials with a negative cue-lead time (an auditory cue presented 400 ms after the rotated frame), the perceptual-control index for the 0-ms cue-lead time was significantly greater than that for the negative cue-lead time, with the latter being not significantly different from chance. Taken together, these results suggest that people can detect and identify an auditory instructional cue, and then select and exert appropriate intentional control over the generation of a specific apparent rotation percept, all within the perceptually instantaneous time period during which the visual system retroactively constructs the percept of apparent motion.

Our results do not directly speak to the mechanisms governing this rapid intentional control over the generation of an apparent-rotation percept. We speculate here about two likely mechanisms: attentive tracking and a direct effect of intention on motion-processing mechanisms. An explanation along the lines of attentive tracking is that, in order to generate a clockwise (or counterclockwise) rotation, our participants might have shifted attention from the top square of the initial frame to the right (or left) square of the rotated frame. Such attention shifts have been shown to be closely associated with the generation of apparent-motion percepts (e.g., Cavanagh 1992, Horowitz and Treisman 1994, Lu and Sperling 1995, Xu, Suzuki, and Franconeri 2013). Studies on the attentive tracking of a single item within an ambiguously rotating apparent-motion display suggest that an experienced observer can perceive the tracked item to be convincingly moving in the precued direction up to a flicker rate of 4–8 Hz, corresponding to 62.5–125 ms per attention shift between successive motion frames (Verstraten et al. 2000). However, a naive observer needs 200–250 ms per motion frame when attention needs to be focused on the tracked item (Horowitz et al. 2004). As we will discuss below, even if an effective attentive-tracking strategy for our task did not require precise focusing of attention on the squares, so that Verstraten et al.’s (2000) faster tracking estimate was appropriate, attentive tracking might still not be fast enough to mediate the intention effects that we obtained for very short cue-lead times.

In our display, even without any intentional effort, either clockwise or counterclockwise rotation was perceived. Thus, for volitional control of rotation perception to be effective, an appropriately directed attention shift should occur before the visual system could automatically construct a rotation percept on the basis of mechanisms such as spontaneous asymmetric activation of neural populations tuned to opposing motion directions (see below) or spontaneous capture of attention by one of the squares in the rotated frame. Our pilot result, in which we backward-masked the rotated frame at various latencies and asked participants whether they saw compelling rotation, suggests that for the small-radius configuration, it takes about 200 ms of exposure of the rotated frame to reliably perceive rotational motion. When participants had sufficient time to prepare to shift attention in a specific direction with a long cue-lead time, it is reasonable to expect that they would have shifted attention while the rotation percept was still being generated. However, for shorter cue-lead times, participants might not have had sufficient time to both identify the desired direction of rotation and engage attentive tracking to generate the desired percept.

In particular, for the 0-ms cue-lead time, the rotated frame and the auditory cue were presented simultaneously. According to an attentive-tracking account, participants would presumably listen to the auditory cue, identify its pitch, translate the pitch to the corresponding direction in which to shift their attention, and then shift attention to the appropriate (right or left) square. It would have taken at least 100 ms to distinguish the frequency of the auditory cue, according to electroencephalographic studies of auditory frequency discrimination (Baldeweg et al. 1999, Menning, Roberts, and Pantev 2000). Verstraten et al. (2000), whose study yielded a relatively fast estimate of attentive tracking (see above), used participants who were experienced with attentive tracking tasks and who always shifted attention in the same direction. We used naive college students, and they did not know in which direction to shift attention until the auditory cue was presented. Thus, it could well have taken our participants substantially longer than Verstraten et al.’s estimate of 62.5 to 125 ms to eventually shift attention to the appropriate square, casting doubt on the attentive-tracking hypothesis for rapid intentional control, especially for the 0-ms cue-lead time. It is even less likely that our participants used eye movements to track rotation in the instructed direction, because it would have taken them even longer to initiate eye movements (e.g., 150–350 ms; Darrien et al. 2001, Yang, Bucci, and Kapoula 2002) than to track rotation with attention.
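The timing argument for the 0-ms cue-lead time can be made explicit with a rough latency budget. The sketch below is only a back-of-the-envelope illustration under a strong simplifying assumption, namely that pitch identification and the attention shift are strictly serial stages whose durations add; the constant and function names are hypothetical, and the durations are the estimates cited in the text.

```python
# All durations in milliseconds, measured from the onset of the rotated frame
# (which coincides with cue onset at the 0-ms cue-lead time).
PITCH_DISCRIMINATION_MIN_MS = 100.0            # lower bound from the EEG studies cited above
SHIFT_EXPERIENCED_MS = (62.5, 125.0)           # Verstraten et al. (2000)
SHIFT_NAIVE_MS = (200.0, 250.0)                # Horowitz et al. (2004)
PERCEPT_CONSTRUCTION_MS = 200.0                # pilot backward-masking estimate


def tracking_latency_bounds(shift_range_ms):
    """Earliest completion of an attention shift, assuming strictly serial stages."""
    fastest, slowest = shift_range_ms
    return (PITCH_DISCRIMINATION_MIN_MS + fastest,
            PITCH_DISCRIMINATION_MIN_MS + slowest)


experienced = tracking_latency_bounds(SHIFT_EXPERIENCED_MS)  # (162.5, 225.0)
naive = tracking_latency_bounds(SHIFT_NAIVE_MS)              # (300.0, 350.0)

# Does even the slowest shift finish within the ~200 ms construction window?
experienced_always_in_time = experienced[1] <= PERCEPT_CONSTRUCTION_MS
naive_ever_in_time = naive[0] <= PERCEPT_CONSTRUCTION_MS
```

Under these assumptions, even the estimate for experienced observers only partly fits within the approximately 200 ms during which the percept is constructed, and the estimate for naive observers does not fit at all, which is the sense in which attentive tracking seems too slow at the 0-ms cue-lead time.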

In addition to attentive tracking, our participants could have directly influenced motion-processing mechanisms. Visual areas such as MT and MST contain neurons that are tuned to specific directions of motion (including rotational directions). These motion-tuned neurons respond to apparent-motion displays (Mikami 1991, Mikami, Newsome, and Wurtz 1986) and mediate motion perception in monkeys and humans (Moutoussis et al. 2005, Newsome et al. 1985). When the rotated frame was presented in our study, it would provide equal motion energy in both clockwise and counterclockwise directions, thus making the neural populations tuned to these opposing directions compete for activation. In this case, the neural population that happened to be more strongly activated would determine the direction of the perceived apparent rotation. The phenomenon of flicker motion aftereffect is consistent with this idea (e.g., Culham et al. 2000, Nishida and Ashida 2000). For example, after viewing a display showing continuous clockwise rotation, an ambiguously rotating display appears to rotate in the opposite (counterclockwise) direction, suggesting that the relatively stronger activity of the unadapted counterclockwise-rotation-tuned neural population drives the perceived direction of rotation. Thus, if one could intentionally increase the relative activity of, for example, the clockwise-rotation-tuned neural population, that would make a directionally ambiguous stimulus appear to rotate in the clockwise direction. Indeed, it has been shown that attending to a specific direction of motion enhances the activity of neurons tuned to the attended direction of motion while reducing the activity of neurons tuned to the opposite direction in MT (e.g., Martinez-Trujillo and Treue 2004). 
It is thus reasonable to hypothesize, for example, that upon presentation of the high-pitch auditory cue, the intentional effort to induce a clockwise rotation would operate via the enhancement of the relative activation of the clockwise-rotation-tuned neural population. We speculate that this direct intentional enhancement of a motion-tuned neural population might be faster than the process of locating the left or right square in the rotated frame and shifting attention to it. If so, the fast intentional enhancement of an appropriate motion-tuned neural population might primarily mediate the rapid control of apparent motion at a 0-ms cue-lead time, whereas attentive tracking might primarily mediate control of apparent rotation at longer cue-lead times.
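
The competition idea described above can be sketched as a toy simulation. This is not a model fitted to our data: the unit drive, the noise level, the gain values, and the function name `p_clockwise` are all hypothetical choices made for illustration. Two direction-tuned populations receive equal drive from the ambiguous frame plus independent Gaussian noise, and the more active population determines the percept. A gain above 1 on the clockwise population stands in for intentional enhancement, and a gain below 1 stands in for adaptation.

```python
import random


def p_clockwise(gain_cw=1.0, noise_sd=0.2, n_trials=10_000, seed=0):
    """Proportion of simulated trials on which the clockwise population wins.

    Both populations receive the same unit drive (equal motion energy);
    gain_cw scales the clockwise drive. All parameter values are
    illustrative assumptions, not estimates from the study.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    wins = 0
    for _ in range(n_trials):
        cw = gain_cw * 1.0 + rng.gauss(0.0, noise_sd)
        ccw = 1.0 + rng.gauss(0.0, noise_sd)
        if cw > ccw:
            wins += 1
    return wins / n_trials


if __name__ == "__main__":
    print("no bias:           ", p_clockwise(gain_cw=1.0))
    print("intentional boost: ", p_clockwise(gain_cw=1.1))
    print("after CW adaptation:", p_clockwise(gain_cw=0.9))
```

In this sketch, an unbiased competition yields clockwise percepts on roughly half of the trials, a modest gain increase shifts the proportion toward clockwise, and a gain decrease shifts it toward counterclockwise, in the manner of the motion aftereffect.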

We found that intentional control was rapid for the small-radius configuration (reliable perceptual control beyond cue recruitment for all cue-lead times, including the 0-ms cue-lead time), whereas intentional control was slower for the large-radius configuration (reliable perceptual control beyond cue recruitment only for the longest cue-lead times of 533 and 1,067 ms). Whereas attentive tracking might have been effective for both the small-radius and large-radius configurations, the strategy of directly influencing a motion-tuned neural population might have been more effective for the small-radius configuration. The large spatial gap in the large-radius configuration might have been too wide to directly engage the direction-selective neural mechanisms in MT and MST, whereas the small spatial gap of rotation in the small-radius configuration was within the scale known to activate MT neurons (Mikami et al. 1986). Thus, we speculate that the rapid intentional control unique to the small-radius configuration was primarily mediated by a direct intention effect on direction-tuned neural populations, whereas the slower intentional control for longer cue-lead times obtained for both the small-radius and large-radius configurations was primarily mediated by attentive tracking.

Interestingly, our results provide some evidence of cue recruitment. After completing a number of perceptual-control trials, participants developed a weak tendency to see rotations in the cued directions on the post-intention trials, even though they were instructed to ignore the auditory cues. The cue-recruitment effect largely occurred for longer cue-lead times, and it was absent for the 0-ms cue-lead time (see Figs. 2 and 3). This selectivity for longer cue-lead times may suggest that the cue-recruitment effect is primarily due to the formation of an association between an auditory pitch and a direction-of-attention shift, rather than to the formation of an association between an auditory pitch and activation of a specific direction-tuned neural population. If true, this might in turn suggest that visual–spatial attention mechanisms are especially amenable to forming experiential associations with auditory signals, an implication consistent with our prior results indicating learned associations between auditory stimuli and visual–spatial attention (Mossbridge, Grabowecky, and Suzuki 2011).

In summary, people can control percept generation with surprising rapidity. Our demonstration of reliable perceptual control with a zero cue-lead time suggests that interpreting an instructional cue and intentionally influencing visual-motion mechanisms can occur entirely within the subjectively instantaneous time period during which the visual system retroactively constructs a motion percept. Interestingly, at the zero cue-lead time, intentional control is implemented after the presentation of the rotated frame, but the rotational motion that is influenced by the intention is perceived to occur before the presentation of the rotated frame. Our results thus demonstrate that an effect of intention can propagate backward in the time course of the construction of visual awareness.