Methods
Participants
Twenty-seven students from Kyushu Sangyo University participated in the experiment (three were female, the mean age was 21.8 years, and one was left-handed but used the computer mouse with the right hand). About half of them (thirteen) were assigned to the synchronous feedback condition, in which the sensory feedback of MA and MV was always subjectively synchronous (50 ms). The remaining half (fourteen) were assigned to the delayed feedback condition, in which the sensory feedback of MA and MV was always somewhat delayed (150 ms). These values were chosen because they yielded consistent TREs in previous reports (Sugano et al. 2012, 2014; see Footnote 1). All participants had normal hearing and normal or corrected-to-normal vision. Written informed consent was obtained from each participant. The experiment was approved by the Local Ethics Committee of Kyushu Sangyo University and followed the Declaration of Helsinki.
Stimuli and apparatus
Participants sat at a desk in a dimly lit booth looking at a 17-inch CRT monitor (Fujitsu FMV-DP97W3G running at a 100-Hz refresh rate) at a viewing distance of approximately 60 cm. The visual stimulus was a 1-cm white square (30 ms duration, 9 cd/m²) on a black background (0 cd/m²) on the CRT monitor. The auditory stimulus was a 2000-Hz pure tone pip (30 ms duration with 2 ms rise/fall slope) presented via headphones (Sony MDR-Z900) at 74 dB(A). A 1-cm red square (30 ms duration, 3 cd/m²) and a 2250-Hz pure tone pip [30 ms duration with 2 ms rise/fall slope, 74 dB(A)] were used for catch trials (see “Design and procedure”). White noise was continuously presented via headphones at 68 dB(A) to mask the faint sound of mouse presses. A gaming mouse (Logitech G300) was used to obtain high temporal resolution (2-ms polling interval). Stimulus presentation and response detection were controlled by E-prime software running on a general PC/AT personal computer (Compaq EVO D300). The temporal resolution of stimulus control and response detection was ~ 1 ms as verified by a multiple-trace oscilloscope.
Design and procedure
Test type (pre- vs. post-test) and exposure modality (visual vs. auditory) were within-subject factors, while exposure delay (50 vs. 150 ms) was a between-subjects factor to avoid crosstalk between exposure delays.
The experimental procedure is shown in Fig. 2. We adopted an exposure-test paradigm that consisted of a pre-test, followed by a (mini-)exposure phase and a post-test. The mini-exposure phase and the post-test were repeated many times so that the build-up of adaptation could be measured. In the pre-test, participants pressed a mouse button with their right hand in synchrony with a pacer (a flash on the CRT display or a tone via headphones). The pacers were delivered 9 times at a constant inter-stimulus interval (ISI) of 750 ms. Participants skipped the first two pacing signals to get into the rhythm and then tried to synchronize their mouse presses with the following 7 pacers. The pre-test contained 20 trials, 10 with visual pacers and 10 with auditory pacers, presented in pseudo-alternating order.
After completion of the pre-test, the exposure/post-test trials began. During exposure, participants made 9 voluntary finger taps (no pacer) while trying to maintain an inter-tap interval of ~750 ms. Each tap was followed by a flash or tone (the same stimulus as in the ST task) at either 50 ms (synchronous) or 150 ms (delayed), depending on condition. The modality of the feedback signal (visual or auditory) was kept constant within a trial, but alternated in a pseudo-random fashion across trials. To make sure that participants attended to the feedback signal, they were asked at the end of each exposure phase whether or not a deviant feedback signal (a red flash or a high tone) had been presented (a catch trial). The post-test immediately followed the exposure phase and was the same as the pre-test, in that participants tapped in sync with a pacer. The modality of the pacer matched the modality of the feedback signal of the preceding exposure phase (auditory pacers after MA-exposure and visual pacers after MV-exposure). There were 40 exposure/post-test trials, 20 visual and 20 auditory, all presented in pseudo-alternating order. Testing lasted about 1 h including instructions and practice.
Results
Trials from the practice session were excluded from further analysis. The tap-asynchrony during the synchronous tapping task was defined as the timing difference between the tap and the pacer. It was negative if the tap preceded the pacer. Missing responses and abnormal tap-asynchronies (outside the range from −375 to 375 ms, that is, more than half the ISI of the pacer) were eliminated from the analysis (0.5 % of the total number of taps). Individual tap-asynchronies that fell outside the range of each participant’s mean plus/minus two standard deviations for each modality were treated as outliers on a per-participant basis and were also eliminated from the analysis. The remaining tap-asynchronies were averaged over trials for each experimental condition.
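The two exclusion steps can be illustrated with a minimal sketch. It assumes that the tap-asynchronies of one participant and one modality are available as a plain list (with `None` marking a missing response) and that the sample standard deviation was used; neither detail is stated above, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev

ISI_MS = 750.0               # pacer inter-stimulus interval (from the text)
HALF_ISI_MS = ISI_MS / 2.0   # 375 ms cut-off for abnormal tap-asynchronies


def clean_asynchronies(asynchronies, n_sd=2.0):
    """Apply both exclusion steps to one participant's data for one modality.

    `asynchronies` holds tap-minus-pacer times in ms; None marks a missing
    response. Using the sample SD (statistics.stdev) is an assumption.
    """
    # Step 1: drop missing responses and values outside -375..375 ms.
    in_range = [a for a in asynchronies
                if a is not None and -HALF_ISI_MS <= a <= HALF_ISI_MS]
    if len(in_range) < 2:
        return in_range  # an SD cannot be computed from fewer than two values
    # Step 2: drop values beyond mean +/- n_sd standard deviations.
    m, sd = mean(in_range), stdev(in_range)
    return [a for a in in_range if abs(a - m) <= n_sd * sd]


# Hypothetical data: one missing response, one abnormal value (420 ms),
# and one outlier (200 ms) that only the second step removes.
taps = [-40.0, -55.0, None, -30.0, 420.0, -45.0, -50.0, -35.0, 200.0, -60.0]
print(clean_asynchronies(taps))
```

Note that the mean and standard deviation in the second step are computed after the first step has removed the abnormal values, which is one possible reading of the order of operations described above.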
Mean tap-asynchronies
Table 1 shows the mean tap-asynchronies during the tapping task for each condition before (pre-test) and after (post-test) exposure to delayed feedback. The temporal recalibration effect (TRE) was defined as the change in tap-asynchrony from pre- to post-test. If the TRE is negative, it means that the anticipation tendency became greater after exposure to delayed feedback (as expected). The mean TREs for each condition are also shown in Table 1.
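The sign convention of the TRE can be made concrete with a short sketch. The function name and the example values are hypothetical; only the definition (post-test minus pre-test mean tap-asynchrony) comes from the text.

```python
from statistics import mean


def temporal_recalibration_effect(pre_asynchronies, post_asynchronies):
    """TRE in ms: change in mean tap-asynchrony from pre-test to post-test.

    A negative value means that taps preceded the pacer by more after
    exposure than before, i.e. anticipation became greater.
    """
    return mean(post_asynchronies) - mean(pre_asynchronies)


# Hypothetical cleaned tap-asynchronies (ms) for one participant and modality.
pre = [-40.0, -50.0, -45.0]
post = [-80.0, -85.0, -75.0]
print(temporal_recalibration_effect(pre, post))  # -35.0
```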
Table 1 Mean tap-asynchrony and TRE (in milliseconds) per condition in experiment 1 and experiment 2
The TREs were entered into a mixed-model ANOVA with exposure delay (lag-50 vs. lag-150 ms) as a between-subjects factor and exposure modality (visual vs. auditory) as a within-subject factor. The ANOVA showed that the main effect of exposure delay was significant, F(1, 25) = 16.1, p < 0.001. The main effect of exposure modality and the interaction between the two factors did not reach significance, both p’s > 0.05. These results thus indicate that the shift in tap-asynchrony was greater after exposure to delayed feedback (mean TRE = −36.7 ms) than after exposure to synchronous feedback (mean TRE = −7.3 ms), irrespective of the sensory modality.
To test whether the TRE in each condition was significantly different from zero, one-sample t tests (one sided, as there was a clear prediction) were conducted on the TRE for each condition. Both the MV-TRE (−35.4 ms) and the MA-TRE (−37.9 ms) in the MV-delay and MA-delay conditions were significantly less than zero, t(13) = 6.6 and t(13) = 9.1, respectively, both p’s < 0.001, whereas the MV-TRE (−8.4 ms) and the MA-TRE (−6.2 ms) in the MV-sync and MA-sync conditions were not significantly different from zero, t(12) = 0.8, p = 0.206, and t(12) = 1.3, p = 0.104, respectively. If the TRE differences between the delay and sync conditions are expressed as a proportion of the 100-ms difference in delay, they are 27.0 % for the MV-TRE and 31.7 % for the MA-TRE. This is shown as the “TRE diff per modality” in Table 1. Thus, delayed visual and auditory feedback induced equally large TREs to visual and auditory pacers, whereas synchronous feedback did not elicit a TRE.
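The reported proportions can be reproduced from the condition means, assuming that the denominator is the 100-ms difference between the two exposure delays (150 ms and 50 ms). This reading is an inference, but it matches both reported percentages:

```latex
\[
\text{MV: } \frac{\lvert -35.4 - (-8.4) \rvert}{150 - 50}
  = \frac{27.0}{100} = 27.0\,\%,
\qquad
\text{MA: } \frac{\lvert -37.9 - (-6.2) \rvert}{150 - 50}
  = \frac{31.7}{100} = 31.7\,\%.
\]
```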
Build-up of TRE
Although the MV-TRE and the MA-TRE are shown to be equal in size, this does not necessarily mean that their underlying mechanisms are the same. For example, if their build-up were different, it might imply that different mechanisms are involved. To examine how the TRE built up across trials, mean TREs per trial block were calculated by subtracting the mean tap-asynchrony in the pre-test from the tap-asynchronies for each trial block in the post-test. One trial block consisted of 4 adjacent trials (2 visual and 2 auditory trials). Figure 3 shows how the TREs built up across trial blocks. A nonlinear exponential decay function, p0 + p2 × (1 − exp(−p1 × x)), was fitted to the mean TRE per trial block, where p0 reflects the “initial TRE” of adaptation at the first trial block (x = 0), p1 reflects the “recalibration rate” (the greater, the faster the decay), and p2 reflects the “max span of TRE”, which is the difference between the “initial TRE” and the “final TRE” after adaptation was completed (x → ∞). The fitting was carried out using the nls function in the statistical package R version 3.1.0 (R Core Team 2014) with the NL2SOL algorithm, which gave the nonlinear least squares estimates of the fitting parameters. The fitted lines are also shown in Fig. 3. As can be seen, the MV-TRE and the MA-TRE built up in a very similar way and had not yet reached a plateau even at the end of the experiment (20 exposure trials = ~2-min exposure for each modality).
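To illustrate what fitting this function involves, the sketch below estimates p0, p1 and p2 by least squares. It is not the NL2SOL routine used in the study: as a simpler stand-in, it scans a grid of p1 values and solves for p0 and p2 in closed form at each one, which is possible because the model is linear in p0 and p2 once p1 is fixed. The function names, the grid and the synthetic data are all hypothetical.

```python
import math


def buildup_model(x, p0, p1, p2):
    """TRE at trial block x: p0 + p2 * (1 - exp(-p1 * x))."""
    return p0 + p2 * (1.0 - math.exp(-p1 * x))


def fit_buildup(xs, ys, p1_grid=None):
    """Least-squares estimates (p0, p1, p2) via a grid search over p1.

    For a fixed p1 the model is a straight line in g = 1 - exp(-p1 * x),
    so p0 (intercept) and p2 (slope) follow from ordinary linear regression.
    """
    if p1_grid is None:
        p1_grid = [0.01 * k for k in range(1, 301)]  # 0.01 to 3.00 (assumed range)
    n = len(xs)
    y_mean = sum(ys) / n
    best = None
    for p1 in p1_grid:
        g = [1.0 - math.exp(-p1 * x) for x in xs]
        g_mean = sum(g) / n
        sgg = sum((gi - g_mean) ** 2 for gi in g)
        if sgg == 0.0:
            continue  # regressor is constant; slope is undefined
        sgy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, ys))
        p2 = sgy / sgg
        p0 = y_mean - p2 * g_mean
        sse = sum((yi - buildup_model(xi, p0, p1, p2)) ** 2
                  for xi, yi in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, p0, p1, p2)
    return best[1], best[2], best[3]


# Synthetic, noise-free TREs for 10 trial blocks (x = 0..9), generated from
# hypothetical parameters; these are not the values reported in the study.
blocks = list(range(10))
tres = [buildup_model(x, -8.0, 0.5, -30.0) for x in blocks]
p0, p1, p2 = fit_buildup(blocks, tres)
print(round(p0, 2), round(p1, 2), round(p2, 2))
```

With 40 exposure/post-test trials and 4 trials per block, there are 10 trial blocks, which is why the example uses x = 0 to 9. A grid search is coarse compared with an iterative optimizer, but it avoids the need for starting values.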
To verify the similarity of their build-up more formally, we used a bootstrap method to calculate a confidence interval for each parameter (Efron and Tibshirani 1986; Motulsky and Christopoulos 2003). Figure 4 shows the estimated values of the parameters and their 95 % bootstrap confidence intervals (CI) after 2000 simulations. As shown in Fig. 4, all the 95 % CIs overlapped between MV and MA, except for the “initial TRE” (p0) and the “recalibration rate” (p1) under the MV-delay and MA-delay conditions (Fig. 4b), showing that the MV-TRE was initially somewhat smaller (p0 = −8.1), but built up faster (p1 = 0.744) than the MA-TRE (p0 = −23.6, p1 = 0.206).
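The resampling scheme behind these intervals is not described here, so the following is only a generic sketch of a percentile bootstrap CI under stated assumptions: observations are resampled with replacement, a statistic is recomputed on each resample, and the 2.5th and 97.5th percentiles of the 2000 estimates are taken as the interval. The statistic in the example is a simple mean; in the analysis above it would be a fitted parameter (p0, p1 or p2). All names and data values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic(data)`.

    Resamples `data` with replacement `n_boot` times and returns the lower
    and upper percentile bounds of the resulting estimates.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_boot)]
    upper = estimates[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


# Hypothetical TREs (ms) for 14 participants; not data from the study.
tres = [-52.0, -31.0, -44.0, -18.0, -39.0, -27.0, -60.0,
        -35.0, -22.0, -41.0, -48.0, -30.0, -36.0, -25.0]
low, high = bootstrap_ci(tres, mean)
print(low, high)
```

Two estimates are then commonly treated as different when their intervals do not overlap, which is the criterion applied to the MV and MA parameters above.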
Variability of tap-asynchronies
Some researchers have suggested that a lowered sensitivity to delays is the first stage of temporal recalibration (Navarra et al. 2005; Winter et al. 2008). This might be reflected in an increase in tapping variability before subjective simultaneity shifts. To examine whether the TRE emerged with or without a change in variability, we examined whether variability increased after exposure to delayed feedback. We calculated a mean within-trial standard deviation (WSD) of tap-asynchrony as a measure for tapping variability. The mean WSD is the averaged standard deviation of tap-asynchrony within a trial of 7 individual taps. The group-averaged WSDs are shown in Table 2. We analysed ΔWSD as the difference of the mean WSD from the pre-test to the post-test as an indicator of whether variability increased or not. The mean ΔWSDs for each experimental condition are also shown in Table 2.
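The variability measure can be sketched as follows, assuming that each trial is a list of its (up to 7) retained tap-asynchronies and that the sample standard deviation is used; the text does not say which SD estimator was applied. Function names and example values are hypothetical.

```python
from statistics import mean, stdev


def mean_wsd(trials):
    """Mean within-trial SD (ms): SD of the taps in each trial, averaged over trials."""
    return mean([stdev(taps) for taps in trials])


def delta_wsd(pre_trials, post_trials):
    """Change in mean WSD from pre-test to post-test; positive means more variable."""
    return mean_wsd(post_trials) - mean_wsd(pre_trials)


# Hypothetical trials of 7 tap-asynchronies each (ms); the post-test taps
# are spread twice as widely around their mean as the pre-test taps.
pre = [[-10.0, 0.0, 10.0, -10.0, 0.0, 10.0, 0.0]] * 2
post = [[-20.0, 0.0, 20.0, -20.0, 0.0, 20.0, 0.0]] * 2
print(round(mean_wsd(pre), 3), round(delta_wsd(pre, post), 3))
```

Because the standard deviation is taken within each trial before averaging, slow drifts of the mean asynchrony across trials do not inflate this measure.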
Table 2 Mean within-trial standard deviation (WSD) of tap-asynchrony and ΔWSD (in milliseconds) per condition in experiment 1 and experiment 2
Before analysing ΔWSD, raw WSDs under the synchronous feedback conditions (MV-sync and MA-sync) were analysed to see whether there was a modality difference. They were entered into a repeated-measures ANOVA with exposure modality and test type (pre- vs. post-test) as within-subject factors. The ANOVA showed that the main effect of exposure modality was significant, F(1, 12) = 48.3, p < 0.001, while the other effects were not (all p’s > 0.05), indicating that tapping with auditory pacers yielded more stable responses (mean WSD = 32.7 ms) than tapping with visual pacers (mean WSD = 40.7 ms).
The mean ΔWSDs were entered into a mixed-model ANOVA with exposure delay (lag-50 vs. lag-150 ms) as a between-subjects factor and exposure modality (visual vs. auditory) as a within-subject factor. The ANOVA showed that the effect of exposure delay was significant, F(1, 25) = 4.8, p = 0.037, indicating that ΔWSD was significantly greater after exposure to delayed feedback (5.4 ms) than after exposure to synchronous feedback (1.1 ms). This is shown as “ΔWSD diff per modality” in Table 2. No other effects were significant (all p’s > 0.05). As for the TRE, one-sample t tests (one sided, as there was a clear prediction) on the mean ΔWSD for each condition showed that in the MV-delay and MA-delay conditions, the mean MV-ΔWSD (6.9 ms) and MA-ΔWSD (3.9 ms) were significantly greater than zero, t(13) = 3.3, p = 0.003, and t(13) = 2.9, p = 0.006, respectively. Variability in the MV-sync and MA-sync conditions did not change (MV-ΔWSD = 1.9 ms, MA-ΔWSD = 0.4 ms, all p’s > 0.05).
Discussion
The results of experiment 1 show that exposure to visual and auditory delayed feedback induced equally large TREs to visual and auditory pacers with similar build-up courses (see Footnote 2). This fits previous studies reporting equal amounts of temporal recalibration across modalities (Heron et al. 2009; Sugano et al. 2010). The bootstrap analysis, however, showed that the build-up rate for the MV-delay was somewhat faster than for the MA-delay. The results also show that the variability in timing increased after exposure to delayed feedback, regardless of modality. It is known that variability corresponds to the precision of interval discrimination (Ivry and Hazeltine 1995; Hazeltine et al. 1997; Krause et al. 2010; Merchant et al. 2013; van Vugt and Tillmann 2014). This increase in variability is therefore in line with the notion that adaptive shifts in subjective simultaneity are accompanied (or preceded) by a widening of the “window of simultaneity” (Navarra et al. 2005, 2007).
In experiment 2, we further examined whether exposure to visual and auditory delayed feedback yields identical TREs. Importantly, the apparent similarity between the MA- and MV-TRE does not imply that the same underlying mechanism is causing these effects, because it is in essence a null finding. To examine this in more detail, in experiment 2 we mixed synchronous feedback in one modality with delayed feedback in the other modality. Participants were thus exposed to synchronous auditory feedback interleaved with delayed visual feedback, or vice versa. This allowed us to test whether adaptation to auditory or visual delayed feedback is modality specific, or whether synchronous feedback in one modality can erase the TRE in the other modality. To be more precise, we envisaged three potential outcomes: (i) If only a general or shared component is involved in MA and MV temporal recalibration, one expects that mixing synchronous with delayed feedback will diminish the TREs in both modalities. (ii) If only sensory-specific components are involved in MA and MV temporal recalibration, one expects that the delayed modality displays a sensory-specific TRE, but not the synchronous one. (iii) A third possibility is related to the notion that audition dominates vision in time (Welch and Warren 1980; see Vroomen and Keetels 2010 for review), which was already evident in the smaller tapping variability in MA than in MV (Table 2). In this view, it is the timing of the auditory feedback that determines both the MA- and MV-TRE: delayed auditory feedback (combined with synchronous visual feedback) thus induces an MA- and MV-TRE, but delayed visual feedback (combined with synchronous auditory feedback) does not induce a TRE in either the auditory or the visual modality.