People spontaneously move their bodies to music (e.g., Wallin, Merker, & Brown, 2000) but not to visual flashes. Body movements tend to synchronize with auditory rather than visual rhythms (e.g., Patel, Iversen, Chen, & Repp, 2005; Repp & Penel, 2004), and movements reciprocally influence the perception of auditory rhythms (e.g., Phillips-Silver & Trainor, 2005, 2007). Auditory learning of brief time intervals transfers to motor production of those intervals (Meegan, Aslin, & Jacobs, 2000). Furthermore, auditory processing tends to yield superior temporal sensitivity and to dominate visual processing in the perception of timing and duration (e.g., Aschersleben & Bertelson, 2003; Burr, Banks, & Morrone, 2009; Droit-Volet, Tourret, & Wearden, 2004; Hirsh & Sherrick, 1961; Morein-Zamir, Soto-Faraco, & Kingstone, 2003; Penney, Gibbon, & Meck, 2000; Shams, Kamitani, & Shimojo, 2000; Shipley, 1964; Wearden, Todd, & Jones, 2006). These results suggest that the superior auditory timing mechanisms may be uniquely associated with the production of coordinated action, which often requires precisely timed movements.

While people are engaged in rhythmic movements such as walking, running, or dancing, auditory signals such as footsteps provide feedback about the quality of temporal coordination of the ongoing action. It would thus be beneficial to increase auditory temporal sensitivity while one is engaged in action. In contrast, visual temporal signals relevant to action tend to be motion signals (e.g., trying to strike a moving target) rather than signals of temporal intervals. We thus hypothesized that initiating action would selectively enhance auditory, but not visual, sensitivity for temporal intervals.

Previous research has demonstrated that initiating action influences the perception of action-related stimuli. For example, initiating action may attenuate perceptual information congruent with the action, enhance perceptual information deviant from the action, or bias perception to be more congruent with the action (see Schutz-Bosbach & Prinz, 2007, for a review). It has also been shown that initiating action can bias the perceived time of onset of subsequently presented stimuli (e.g., Haggard & Clark, 2003; Park, Schlag-Rey, & Schlag, 2003; Yarrow & Rothwell, 2003; also, see below). However, the effect of initiating action on the precision of time perception has not been investigated.

Timing in the subsecond range is processed using perceptual mechanisms rather than with cognitive strategies such as counting (e.g., Grondin, Meilleur-Wells, & Lachance, 1999). Precise processing of brief durations is crucial for producing action, as well as for motion discrimination, speech recognition, haptic exploration, and conditioning (see, e.g., Mauk & Buonomano, 2004, and Buonomano & Karmarkar, 2002, for reviews). We thus investigated how initiating action influences auditory and visual temporal sensitivity to durations in the subsecond range.

We measured temporal sensitivity using a standard temporal-bisection task (e.g., Church & Deluty, 1977; Wearden, 1991). A sequence of three brief clicks (in the auditory timing task) or three brief flashes (in the visual timing task) was presented within 507 ms. The temporal location of the second click or flash was randomly varied, and the participants’ task was to respond as to whether the second stimulus was temporally closer to the first or the third stimulus.

If this temporal-bisection task were performed with perfect sensitivity, all clicks or flashes earlier than 253 ms would be classified as being closer to the first click or flash, whereas all clicks or flashes later than 253 ms would be classified as being closer to the third click or flash. Thus, if the proportion of “closer-to-Click 3” or “closer-to-Flash 3” responses were plotted as a function of the time of presentation of the second click or flash, performance with perfect temporal sensitivity would yield a step function with an infinitely steep slope at 253 ms. In general, a steeper slope of this temporal-bisection function indicates greater temporal sensitivity.

We determined how being either active or passive while performing an auditory (Exp. 1) or a visual (Exp. 2) temporal-bisection task influenced this slope. In the active condition, the participant voluntarily initiated the first click (or flash) via a keypress, whereas in the passive condition, the computer started each stimulus sequence.

Experiment 1

Does initiating action enhance auditory temporal sensitivity?

Method

Participants

The participants in all of these experiments were undergraduate students at Northwestern University, who gave informed consent to participate for partial course credit, had normal or corrected-to-normal visual acuity and normal hearing, and were tested individually in a dimly lit room. Sixteen students (nine women, seven men) participated in this experiment.

Stimuli

Auditory clicks (~76 dB SPL) were monaurally presented through two loudspeakers placed on each side of the display monitor. Throughout the experiment, pink noise (70 dB SPL) was presented through headphones so that the participants could not hear their keypress sounds. The clicks were clearly audible over the pink noise. Although visual stimuli were not presented in this experiment, participants were still seated 60 cm from a color CRT monitor (1,024 × 768, 75 Hz), so that the general environments were similar between this experiment and the visual temporal-bisection experiment (Exp. 2).

On each trial, three clicks (each lasting 13 ms) were presented in a sequence. The interval (onset to onset) between the first and third clicks was always 507 ms. The temporal location of the second click was varied symmetrically around the midpoint (253 ms relative to the first click), including nine intervals from the onset of the first click: 147, 173, 200, 227, 253, 280, 307, 333, and 360 ms. These values were chosen to be separated by a fixed integer multiple of the monitor video-frame duration so that the visual flashes in Experiment 2 could be presented at the same temporal intervals.
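The listed intervals can be reproduced from the 75-Hz frame duration. The sketch below is an illustration rather than the authors' stimulus code: the frame counts (11 to 27 frames in steps of 2, and 38 frames for the full sequence) are inferred from the rounded millisecond values in the text, and the names are hypothetical.

```python
# Assumption: frame counts are inferred from the rounded values in the text;
# the original stimulus code is not described at this level of detail.
REFRESH_HZ = 75  # monitor refresh rate reported in the Stimuli section


def frames_to_ms(n_frames: int) -> int:
    """Duration of n_frames video frames, rounded to the nearest millisecond."""
    return round(n_frames * 1000 / REFRESH_HZ)


# Second-stimulus onsets: 11, 13, ..., 27 frames after the first onset
second_onset_frames = list(range(11, 28, 2))
second_onset_ms = [frames_to_ms(n) for n in second_onset_frames]

total_ms = frames_to_ms(38)  # onset-to-onset interval, first to third stimulus
click_ms = frames_to_ms(1)   # one frame, matching the 13-ms click duration
flash_ms = frames_to_ms(2)   # two frames, matching the 27-ms flash in Exp. 2

print(second_onset_ms, total_ms, click_ms, flash_ms)
```

Under this reading, adjacent intervals differ by two video frames (about 26.7 ms), and the veridical midpoint of 253 ms corresponds to 19 frames.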

Procedure

In the passive condition, the experimenter initiated the first trial, and each subsequent trial began after a 1-s intertrial interval following the participant’s response. In the active condition, participants initiated each trial by pressing the space bar on the computer keyboard, usually with the thumb of the dominant hand. Because participants rested their fingers on the space bar, they did not need to (and did not) look down at the keyboard when they initiated a trial (verified by the experimenter looking at their eyes). In either condition, participants made an unspeeded response as to whether the second click was closer to the first click (by pressing the “z” key with the left index finger) or to the third click (by pressing the “/” key with the right index finger).

The passive and active conditions were blocked. In each block of 90 trials, the nine temporal locations of the second click were randomly intermixed across trials. The two passive and two active blocks were alternated in either of the two possible sequences (passive–active–passive–active or active–passive–active–passive). The two block sequences were counterbalanced across participants. Five practice trials were given prior to each block.

Analysis

For each of the two conditions (passive and active) for each participant, we computed the temporal-bisection function, that is, the proportion of trials on which the second click was perceived to be closer to the third click, as a function of the temporal location of the second click. A logistic function of the form \( y = 1/\left[ 1 + e^{-(x-a)/b} \right] \) is typically fit to a sigmoidal psychometric function of this sort, with \( a \) and \( b \) as free parameters. In this way, we estimated the bias and slope for each temporal-bisection function.
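As an illustration of this kind of fit, the sketch below estimates a and b from response counts. The text does not say which fitting procedure was used, so the binomial maximum-likelihood fit via Newton's method is an assumption, and the counts are synthetic, generated from a known logistic rather than taken from the experiment. The 20 trials per location follow from the design (two 90-trial blocks per condition, nine locations).

```python
import math


def logistic(x, a, b):
    """Predicted proportion of 'closer to third' responses at time x (ms)."""
    return 1.0 / (1.0 + math.exp(-(x - a) / b))


def fit_logistic(times_ms, n_third, n_trials, iterations=50):
    """Maximum-likelihood estimates of a (PSE) and b (assumed fitting method).

    Internally fits logit(p) = c0 + c1 * z with z = (x - mean(x)) / 100,
    then converts back: b = 100 / c1 and a = mean(x) - c0 * b.
    """
    x_mean = sum(times_ms) / len(times_ms)
    z = [(x - x_mean) / 100.0 for x in times_ms]
    c0, c1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and negative Hessian (h) of the binomial log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, k, n in zip(z, n_third, n_trials):
            p = 1.0 / (1.0 + math.exp(-(c0 + c1 * zi)))
            w = n * p * (1.0 - p)
            g0 += k - n * p
            g1 += (k - n * p) * zi
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * delta = g
        c0 += (h11 * g0 - h01 * g1) / det
        c1 += (h00 * g1 - h01 * g0) / det
    b = 100.0 / c1
    a = x_mean - c0 * b
    return a, b


times_ms = [147, 173, 200, 227, 253, 280, 307, 333, 360]
n_trials = [20] * len(times_ms)
# Synthetic (noise-free, hypothetical) counts from a logistic with a=250, b=30
n_third = [n * logistic(x, 250.0, 30.0) for x, n in zip(times_ms, n_trials)]

a_hat, b_hat = fit_logistic(times_ms, n_third, n_trials)
print(round(a_hat, 3), round(b_hat, 3))
```

Because the synthetic counts are noise-free, the fit should recover the generating parameters. With real data, a least-squares fit to the proportions would be an equally plausible choice and may give slightly different estimates.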

The value of a (a.k.a. the point of subjective equality, or PSE) indicates the temporal location where the function crosses the 50 % level—that is, the temporal location that participants perceived to be equidistant from the first and third clicks (i.e., the perceived midpoint). Thus, a value of a smaller than 253 ms (the veridical midpoint) indicates a bias for the first interval (from Click 1 to Click 2) to be perceived as longer, whereas a value of a larger than 253 ms indicates a bias for the first interval to be perceived as shorter.

The value of b is proportional to the just noticeable difference (\( \mathrm{JND} \approx 1.0986 \times b \), defined as the temporal deviation from the PSE classified at 75 % accuracy), and inversely related to the slope of the temporal-bisection function at the PSE, with the slope given by \( \left( dy/dx \right)_{x=a} = 1/(4b) \). Note that a smaller JND and a steeper slope indicate greater temporal sensitivity.
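Both constants follow from the logistic form \( y = 1/[1 + e^{-(x-a)/b}] \). A brief derivation, taking the JND as the distance from the PSE to the 75 % point:

\[
\frac{1}{1 + e^{-\mathrm{JND}/b}} = 0.75
\;\Rightarrow\;
e^{-\mathrm{JND}/b} = \frac{1}{3}
\;\Rightarrow\;
\mathrm{JND} = b \ln 3 \approx 1.0986\, b
\]

\[
\frac{dy}{dx} = \frac{1}{b}\, y\,(1 - y),
\qquad
\left. \frac{dy}{dx} \right|_{x=a} = \frac{1}{b} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4b}
\]

The second line uses \( y(a) = 1/2 \), so the slope at the PSE and the JND are both determined by b alone.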

Results

The mean slope (Fig. 1b) of the temporal-bisection function (Fig. 1a) was significantly steeper in the active than in the passive condition, t(15) = 3.476, p < 0.004, d = 0.869 (with JNDs being smaller in the active condition for most participants; Fig. 1c). The mean PSEs did not differ significantly between the active (251 ms, SE = 3 ms) and passive (249 ms, SE = 5 ms) conditions, t(15) = 0.587, n.s. Thus, for auditory timing, initiating action increases temporal sensitivity without biasing temporal judgments.
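The reported effect size matches the common paired-samples convention in which Cohen's d is the mean difference divided by the standard deviation of the differences, so that d = t/√n. The text does not state which convention was used; this is an inference from the reported numbers. The sketch below shows the computation, with hypothetical function names and example slopes that are not data from the experiment.

```python
import math


def paired_t_and_d(active, passive):
    """Paired t statistic and Cohen's d (mean difference / SD of differences)."""
    diffs = [x - y for x, y in zip(active, passive)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in diffs) / (n - 1))
    d = mean / sd
    t = d * math.sqrt(n)  # equivalent to mean / (sd / sqrt(n))
    return t, d


def d_from_paired_t(t, n):
    """Recover Cohen's d from a paired t statistic and the number of pairs."""
    return t / math.sqrt(n)


# Consistency check against the reported statistic: t(15) = 3.476, n = 16
d_exp1 = d_from_paired_t(3.476, 16)

# Hypothetical slopes for four participants, for illustration only
t_demo, d_demo = paired_t_and_d([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
print(round(d_exp1, 3), round(t_demo, 3), round(d_demo, 3))
```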

Fig. 1

Results of the auditory-timing task (Exp. 1). a Temporal-bisection functions for the active and passive conditions (computed from the average values of the a and b parameters obtained from fitting a logistic function to each participant’s data; see the “Method” Section). The inset shows the psychometric functions and their logistic fits for one participant, as an example. We obtained good logistic fits across participants (mean \( r^2 \) = 0.956, SD = 0.047). b The slopes of the temporal-bisection functions estimated from logistic fits, with error bars representing ±1 SEM (adjusted for repeated measures comparison). c The just noticeable differences (JNDs) in the active condition plotted against those in the passive condition (one point per participant). Most points are below the diagonal, indicating that JNDs were reduced in the active relative to the passive condition

Experiment 2

Does initiating action enhance visual temporal sensitivity?

Method

This experiment was the same as Experiment 1, except that we recruited 16 new undergraduate students (ten women, six men) and that visual flashes replaced the auditory clicks. A black circle (0.02 cd/m², 3.5° diameter) was flashed in the middle of the screen (each flash lasted 27 ms, or two video frames at 75 Hz) against a white (100 cd/m²) background.

Results

The mean slope (Fig. 2b) of the temporal-bisection function (Fig. 2a) was significantly shallower in the active than in the passive condition, t(15) = 2.598, p < 0.021, d = 0.650 (with JNDs being larger in the active condition for most participants; Fig. 2c). The mean PSE was marginally longer in the active condition (255 ms, SE = 4 ms) than in the passive condition (248 ms, SE = 2 ms), t(15) = 2.029, p < 0.061. The first interval was thus perceived to be slightly shorter in the active than in the passive condition. This trend did not replicate a previous result suggesting that manual action dilates visual time perception (Park et al., 2003; cf. Yarrow & Rothwell, 2003, for an additional failure to replicate). Thus, for visual timing, initiating action reduces temporal sensitivity without substantially biasing temporal judgments.

Fig. 2

Results of the visual-timing task (Exp. 2). a Temporal-bisection functions for the active and passive conditions (computed from the average values of the a and b parameters obtained from fitting a logistic function to each participant’s data; see the Exp. 1 “Method” Section). The inset shows the psychometric functions and their logistic fits for one participant, as an example. We obtained good logistic fits across participants (mean \( r^2 \) = 0.950, SD = 0.046). b The slopes of the temporal-bisection functions estimated from logistic fits, with error bars representing ±1 SEM (adjusted for repeated measures comparison). c The just noticeable differences (JNDs) in the active condition plotted against those in the passive condition (one point per participant). Most points are above the diagonal, indicating that JNDs were increased in the active relative to the passive condition

Taken together, the results from Experiments 1 and 2 suggest that initiating an action as simple as a keypress significantly increases auditory temporal sensitivity but reduces visual temporal sensitivity [F(1, 30) = 17.889, p < 0.0005, \( \eta_p^2 \) = 0.374, for the across-experiment interaction]. The opposite effects of action on the two modalities rule out the possibility that voluntarily initiating a trial might have indirectly increased temporal resolution by increasing arousal and/or reducing uncertainty regarding the onset of each trial, because these general effects should have influenced the auditory and visual timing tasks equivalently.
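The partial eta squared reported for the interaction can be recovered from the F ratio and its degrees of freedom using the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). The function name below is hypothetical, and the check is a consistency illustration, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Reported across-experiment interaction: F(1, 30) = 17.889
eta_p2 = partial_eta_squared(17.889, 1, 30)
print(round(eta_p2, 3))
```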

There was, however, a potential confound in the active condition. Besides the fact that participants voluntarily initiated the action, the keypress to initiate each trial in the active condition was accompanied by a tactile sensation on the finger. Temporal processes in the auditory and tactile modalities are closely associated (e.g., Fujisaki & Nishida, 2009; Yau, Olenczak, Dammann, & Bensmaia, 2009). Thus, it is possible that the advantage of the active condition for auditory time perception could have been mediated by the accompanying tactile sensation. To evaluate this possibility, we conducted a control experiment (Exp. 3) in which we included two versions of the passive condition in the auditory timing task. One version was identical to the passive condition in Experiment 1—the no-tactile-pulse condition. In the other version, a tactile pulse designed to be perceptually similar to the sensation of pressing the space bar was applied to the finger at the onset of the first click to simulate the tactile sensation accompanying a keypress—the with-tactile-pulse condition. If the tactile sensation from a keypress, rather than the active initiation of each stimulus sequence, enhanced auditory temporal sensitivity in the active condition in Experiment 1, the tactile pulse should similarly enhance auditory temporal sensitivity.

Experiment 3

Does tactile sensation from a keypress enhance auditory temporal sensitivity in the absence of action initiation?

Method

This experiment was the same as Experiment 1, except that we recruited 16 new undergraduate students (five women, 11 men), and that the with-tactile-pulse condition replaced the active condition. The pulse was generated using an exposed speaker cone (e.g., van Ee, van Boxtel, Parker, & Alais, 2009) and was delivered to the index finger of the dominant hand. Participants gave verbal responses (i.e., spoke “1” to indicate that the second click was closer to the first click, and spoke “3” to indicate that the second click was closer to the third click), and these responses were recorded by the experimenter.

Results

The mean slopes (Fig. 3b) of the auditory temporal-bisection functions (Fig. 3a) were equivalent in the with-tactile-pulse and no-tactile-pulse conditions, t(15) = 1.724, n.s.; if anything, the mean slope was numerically smaller (and the JNDs were larger for more participants; Fig. 3c) in the with-tactile-pulse condition. Thus, the increased auditory temporal sensitivity in the active condition in Experiment 1 was not attributable to the tactile sensation from a keypress.

Fig. 3

Results of the control experiment in which the passive condition of the auditory-timing task was repeated with or without a tactile pulse applied at the first click sound (Exp. 3). a Temporal-bisection functions for the with-tactile-pulse and no-tactile-pulse conditions (computed from the average values of the a and b parameters obtained from fitting a logistic function to each participant’s data; see the Exp. 1 “Method” Section). The inset shows the psychometric functions and their logistic fits for one participant, as an example. We obtained good logistic fits across participants (mean \( r^2 \) = 0.957, SD = 0.049). b The slopes of the temporal-bisection functions estimated from logistic fits, with error bars representing ±1 SEM (adjusted for repeated measures comparison). c The just noticeable differences (JNDs) in the with-tactile-pulse condition plotted against those in the no-tactile-pulse condition (one point per participant). The points are not predominantly above or below the diagonal, indicating that the tactile pulse did not systematically influence JNDs

The mean PSE was significantly shorter in the with-tactile-pulse condition (236 ms, SE = 4 ms) than in the no-tactile-pulse condition (261 ms, SE = 4 ms), t(15) = 4.341, p < 0.0006. The tactile pulse thus perceptually expanded the first interval. It is interesting to note that this effect of tactile sensation was significantly diminished, t(30) = 3.452, p < 0.002, d = 1.229, when a similar tactile sensation accompanied a voluntary keypress (the active condition in Exp. 1). In fact, the perception of the time interval was veridical in Experiment 1 [i.e., not significantly deviated from 253 ms; t(15) = 0.640, n.s.]. Although it is unclear why a tactile sensation caused the first interval to be overestimated, the fact that this overestimation did not occur in the active condition suggests that anticipating a tactile sensation as an expected consequence of each keypress may have attenuated the impact of the tactile sensation (e.g., see Schutz-Bosbach & Prinz, 2007, for similar examples of action-related perceptual attenuation effects).

Future research will be necessary to understand how initiating action produces differential effects on the slope and PSE of the temporal-bisection function. Here we note that these differential effects are consistent with our recent finding that flicker adaptation alters the PSE of a temporal-bisection function but has little effect on the slope in a visual timing task (Ortega, Guzman-Martinez, Grabowecky, & Suzuki, in press), suggesting that perceived temporal intervals and temporal sensitivity are mediated by dissociable mechanisms, at least in the visual modality.

Discussion

The auditory modality is typically superior to and dominant over the visual modality for the processing of temporal intervals in the subsecond range (e.g., Aschersleben & Bertelson, 2003; Burr et al., 2009; Droit-Volet et al., 2004; Morein-Zamir et al., 2003; Penney et al., 2000; Shams et al., 2000; Wearden et al., 2006), a range that is important for producing coordinated motor movements (e.g., Buonomano & Karmarkar, 2002; Mauk & Buonomano, 2004). Accordingly, action and auditory temporal processing are closely associated. Action tends to synchronize with auditory rather than visual rhythmic signals (e.g., Patel et al., 2005; Repp & Penel, 2004); people across different cultures move their bodies to the rhythms of music (e.g., Wallin et al., 2000); and body movements influence auditory perception of rhythmic structures in both adults (Phillips-Silver & Trainor, 2007) and infants (Phillips-Silver & Trainor, 2005). Thus, the auditory processing of timing and the production of action are closely associated.

Because auditory signals (e.g., footsteps) provide feedback about the temporal coordination of ongoing movements (e.g., walking, running, or dancing), we hypothesized that initiating action might increase auditory sensitivity to temporal intervals. We confirmed this hypothesis by demonstrating that initiating the simple action of a keypress significantly increases auditory temporal sensitivity. The result suggests the possibility that motor mechanisms and auditory temporal processing form a synergistic loop to enhance the temporal precision of body movements.

We have no straightforward explanation as to why initiating action impaired visual temporal sensitivity. It is possible that a delay of up to 14 ms between the keypress and the onset of the first visual flash (due to the 75-Hz refresh rate of the display monitor) might have disrupted visual temporal processing. It is also possible that initiating action interferes with visual temporal processing. For example, visual processing of temporal information is nearly always in the form of visual motion perception, and precise motion perception is typically important before and during, rather than after, initiating action. For instance, one needs to focus visual attention on the movements of a teammate before passing the ball to her. It is thus possible that visual temporal processing might suffer a refractory lapse immediately after initiating action. Although future research will be necessary to consider these and other possibilities, it is not surprising that action would produce differential effects on auditory and visual temporal processing, because it has been shown that the visual system has its own temporal mechanisms, potentially mediated by LGN magno cells and low-level cortical visual neurons (e.g., Ayhan, Bruno, Nishida, & Johnston, 2011; Bruno, Ayhan, & Johnston, 2010; Johnston, Arnold, & Nishida, 2006; Johnston et al., 2008; Ortega et al., in press).

Finally, our results leave open some important questions that should be investigated in future research to understand how initiating action enhances auditory temporal sensitivity. For example, did initiating action improve temporal-bisection performance by enhancing the encoding of the first click (which occurred simultaneously with the keypress), the second click, or the third click, and/or by enhancing the encoding of the interclick intervals? Which processes involved in initiating action, such as intending versus executing action (Maimon & Assad, 2006), are responsible for enhancing auditory temporal sensitivity? Whatever the relevant intention or motor processes might be, they are likely to interact with the superior temporal cortex, thought to mediate auditory perception of timing (e.g., Bueti, van Dongen, & Walsh, 2008).