Procedure
Prior to experimental trials, participants sat in a chair facing a fixation point and were instructed to rotate their head to a second fixation point located 20° to the right and then back again to the left. A sound (1,000 Hz sinusoidal waveform; 80 dB; 50 ms), repeated every 700 ms, was used as a reference for the speed with which to move the head: participants were instructed to face each fixation point on every beat. Thirty beats were presented in total during this acoustical training period prior to each block, and participants were instructed to move their heads in accordance with this trained speed and displacement for the subsequent experimental trials.
Figure 1 schematically shows the presentation of the stimuli in each trial. For each trial, participants were instructed to move their heads to the right and then back again to the left at the trained speed. Head movements were made following the offset of a “go” sound (sinusoidal waveform at 200 Hz, not 2,000 Hz; 80 dB), which also triggered a comparison stimulus. The duration of the go stimulus (i.e., intertrial interval) was 3 s, with an additional random 0–1.5 s duration to prevent anticipatory head movements. On account of the reaction time latencies relative to the go signal, comparison stimuli could occur before or after the head movement (cf. Barnett-Cowan and Harris 2011). A comparison sound stimulus was presented between 0 and 650 ms after the go stimulus offset.
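The trial timing described above can be sketched in a few lines of Python. This is only an illustration: the text says the extra go duration is "random" and that the comparison sound falls "between 0 and 650 ms" after go offset, so the uniform sampling used here is an assumption, and the names `make_trial` and `make_block` are hypothetical.

```python
import random

GO_BASE_S = 3.0             # fixed part of the go-sound duration (from the text)
GO_JITTER_MAX_S = 1.5       # upper bound of the random extra duration (from the text)
SOUND_DELAY_MAX_MS = 650.0  # latest comparison-sound onset after go offset (from the text)

def make_trial(rng):
    """Return one hypothetical trial schedule.

    Uniform sampling is an assumption; the source only states the ranges.
    """
    go_duration_s = GO_BASE_S + rng.uniform(0.0, GO_JITTER_MAX_S)
    sound_delay_ms = rng.uniform(0.0, SOUND_DELAY_MAX_MS)
    return {"go_duration_s": go_duration_s, "sound_delay_ms": sound_delay_ms}

def make_block(n_trials=100, seed=0):
    """Build a block of trial schedules (100 experimental trials per block)."""
    rng = random.Random(seed)
    return [make_trial(rng) for _ in range(n_trials)]

block = make_block()
```

Because the head movement itself starts only after the participant's reaction time to the go offset, a sound delay drawn from this range can land either before or after movement onset, which is what produces both negative and positive SOAs.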
After each trial, participants were asked “which stimulus started first?” Participants responded by pressing either the left or right arrow key on a keyboard to indicate “sound first” or “head movement first”, respectively. Participants were instructed to attend equally to the sounds and head movement. There were three experimental blocks where, in each block, head movement was paired with one sound condition (one block containing brief sounds, one containing long square sounds, and one containing long raised-cosine sounds). The order of these three conditions was randomized across participants. There were 100 experimental trials in each block, which were preceded by 10 practice trials. Participants closed their eyes after being trained to move their head at a given speed and kept them closed for the duration of each block. Data collection took approximately 12 min for each block. Participants were allowed to take as long as they needed to make their judgments. Testing occurred within 1 h.
Data analysis
The percentage of responses in which sound was selected as occurring first was plotted as a function of SOA, with negative SOAs signaling that the head movement took place before the presentation of the sound. A two-parameter sigmoidal logistic curve (Eq. 2) was fitted to the data using SigmaPlot (version 9). The inflection point of the logistic curve \( (x_{0} ) \) was taken as the point of subjective simultaneity (PSS), and the standard deviation \( (b) \) was taken as a measure of the just noticeable difference (JND), which provides an index of precision:
$$ y = \frac{100}{1 + e^{-\left(\frac{x - x_{0}}{b}\right)}}\% \tag{2} $$
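As an illustration of how the two parameters of Eq. 2 map onto PSS and JND, the following minimal sketch evaluates the logistic function and recovers \( x_{0} \) and \( b \) by a least-squares search over a grid of candidate values. The grid search is a stand-in chosen for transparency; it is not the fitting routine used by SigmaPlot, and the data, grid ranges, and function names are hypothetical.

```python
import math

def logistic_pct(soa_ms, x0, b):
    """Eq. 2: predicted percentage of 'sound first' responses at a given SOA."""
    return 100.0 / (1.0 + math.exp(-(soa_ms - x0) / b))

def fit_logistic(soas, pct_sound_first, x0_grid, b_grid):
    """Least-squares fit by exhaustive search over candidate (x0, b) pairs."""
    best = None
    for x0 in x0_grid:
        for b in b_grid:
            sse = sum((logistic_pct(s, x0, b) - p) ** 2
                      for s, p in zip(soas, pct_sound_first))
            if best is None or sse < best[0]:
                best = (sse, x0, b)
    return best[1], best[2]

# Hypothetical, noise-free data generated from x0 = -100 ms and b = 60 ms.
soas = list(range(-400, 401, 50))
data = [logistic_pct(s, -100.0, 60.0) for s in soas]

# Candidate values in 5 ms steps (an arbitrary choice for this sketch).
pss, jnd = fit_logistic(soas, data, range(-200, 1, 5), range(10, 151, 5))
```

At \( x = x_{0} \) the exponent is zero and the curve passes through 50 %, which is why the inflection point serves as the PSS; a larger \( b \) flattens the curve, corresponding to less precise judgments.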
Statistical analysis included one-sample t-tests for the PSSs of each condition to confirm significant deviations from an SOA of 0 (i.e., the point of true simultaneity). One-way repeated-measures ANOVAs were carried out to examine differences in the PSSs and JNDs due to temporal envelope shape and duration of the auditory stimuli. Bonferroni adjustments were made for pairwise comparisons between means. For data in which a normal distribution could not be assumed, a nonparametric Friedman test was employed.
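For readers unfamiliar with the Friedman test, the sketch below computes its test statistic from within-participant ranks. It is a simplified illustration, assuming no correction for ties, and it stops at the statistic: the p value would come from a chi-square distribution with k − 1 degrees of freedom, which is not computed here. The data and the function names are hypothetical and are not taken from the study.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            out[order[pos]] = avg_rank
        i = j + 1
    return out

def friedman_chi2(rows):
    """rows[p][c] is the score of participant p in condition c.

    Returns the Friedman statistic without tie correction.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for c, r in enumerate(ranks(row)):
            rank_sums[c] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical JNDs (ms) for five participants: brief, long square, long raised-cosine.
example = [
    [40.0, 60.0, 90.0],
    [25.0, 45.0, 50.0],
    [80.0, 70.0, 120.0],
    [30.0, 66.0, 95.0],
    [55.0, 90.0, 85.0],
]
chi2 = friedman_chi2(example)
```

Because the statistic depends only on ranks within each participant, it is insensitive to the skewed distributions that rule out the parametric ANOVA.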
Results
Differences in PSS
The average PSSs derived from TOJs for active head movement paired with brief (−73.0 ms, s.e. 18.4), long square (−99.8 ms, s.e. 16.9) and long raised-cosine (−114.7 ms, s.e. 24.2) sounds are shown in Fig. 2a. Pairwise comparison tests confirmed that the significant effect of sound type on the PSS (F(2,28) = 3.5, p = 0.043) was driven by the long raised-cosine sound, which was significantly different from brief square sounds (p = 0.042) but not from the long square sounds (p = 0.358). The difference in PSS between brief and long square sounds was not significant (p = 0.196). All PSSs were significantly different from true simultaneity (one-sample t-tests, all p < 0.001), confirming that head movement must precede all sound types in order to be regarded as simultaneous.
Differences in JND
JNDs were not normally distributed (Shapiro–Wilk test, p < 0.05). The median JNDs derived from TOJs for active head movement paired with brief (44.35 ms, 25 % = 22.1, 75 % = 79.9), long square (66.5 ms, 25 % = 39.1, 75 % = 89.6) and long raised-cosine (95.0 ms, 25 % = 43.0, 75 % = 116.9) sounds are shown in Fig. 2b. Pairwise comparison tests confirmed that the significant effect of sound type on the JND (\( \chi^{2}(2) = 10.5 \), p = 0.005) was driven by the long raised-cosine sound, which was significantly different from brief (p < 0.05) but not from long square (p > 0.05) sounds. The difference between long square and brief sounds was not significant (p > 0.5). These results indicate that, in general, participants were less precise when judging the timing of a sound of continuously changing intensity.
Discussion
We originally speculated that the results of previous studies showing that vestibular stimulation must precede other sensory stimulation in order to be perceived as simultaneous (Barnett-Cowan and Harris 2009, 2011; Sanders et al. 2011) were attributable to a lack of equivalence in temporal envelope duration and shape between the brief pulses used and the longer vestibular signals. We therefore matched the auditory stimuli more closely to the vestibular signal and predicted that doing so would enhance participants’ ability to accurately perceive simultaneity and would thus displace the PSS toward the point of true simultaneity. Instead of reducing this lead time, the time required for a head movement to precede an auditory stimulus actually increased by up to an additional 42 ms.
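The 42 ms figure follows from the mean PSSs reported in the Results, taking the brief sound as the baseline:

$$ \mathrm{PSS}_{\text{raised-cosine}} - \mathrm{PSS}_{\text{brief}} = (-114.7\ \mathrm{ms}) - (-73.0\ \mathrm{ms}) = -41.7\ \mathrm{ms} \approx -42\ \mathrm{ms} $$

$$ \mathrm{PSS}_{\text{long square}} - \mathrm{PSS}_{\text{brief}} = (-99.8\ \mathrm{ms}) - (-73.0\ \mathrm{ms}) = -26.8\ \mathrm{ms} $$

The negative sign indicates that the head movement had to lead the sound by that much more than it did for the brief sound.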
What can account for this additional lead time? Vos and Rasch (1981) posited a threshold model to understand the perceptual onset of musical tones. In order to perceive the onset of a tone, a certain perceptual threshold level, relative to the maximum amplitude, must be exceeded during the rise portion of the stimulus. An important factor influencing the timing of perceptual onsets is the rise time; for instance, if tones have simultaneous physical onsets but different rise times, the perceptual onsets will not occur simultaneously. A raised-cosine temporal envelope has a shallow slope, while a brief pulse has an extremely steep slope. According to the threshold model put forth by Vos and Rasch, the onset of the raised-cosine stimulus will be perceived as occurring later than the brief tone. As this can only lead to a pattern of results where the PSS would move toward true simultaneity, the threshold model cannot explain our results. Jaśkowski (1993), however, did find a curious but unexplained result where triangular stimuli that reached peak intensity earlier than mid-duration can be perceived as occurring before the onset of a square stimulus of equal duration. Although Jaśkowski did not provide an explanation for the effect, a mechanism responsible for it could also explain why a head movement would require additional lead time when preceding a raised-cosine stimulus to be synchronously perceived.
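The prediction of the threshold model can be made concrete with a minimal sketch. It assumes a full-cycle raised-cosine envelope, \( a(t) = 0.5\,(1 - \cos(2\pi t/D)) \), which peaks at the midpoint of a sound of duration \( D \), and treats a square envelope as reaching any threshold at its physical onset. The duration and threshold values used below are hypothetical and are not taken from the study or from Vos and Rasch (1981).

```python
import math

def raised_cosine_crossing_ms(duration_ms, threshold):
    """Time after physical onset at which the envelope
    a(t) = 0.5 * (1 - cos(2*pi*t / D)) first reaches `threshold`,
    expressed as a fraction (0..1) of the peak amplitude."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a fraction between 0 and 1")
    return duration_ms / (2.0 * math.pi) * math.acos(1.0 - 2.0 * threshold)

def square_crossing_ms(duration_ms, threshold):
    """A square envelope is at full amplitude from its physical onset."""
    return 0.0

# Hypothetical values: a 1,000 ms sound and a threshold at 10 % of the peak.
delay_ms = (raised_cosine_crossing_ms(1000.0, 0.10)
            - square_crossing_ms(1000.0, 0.10))
```

Under these assumptions `delay_ms` is positive for any nonzero threshold: the perceptual onset of the raised-cosine sound lags that of the square sound. A later perceptual onset would reduce the lead the head movement needs, shifting the PSS toward true simultaneity, which is the opposite of the observed shift.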
The influence of stimulus duration on processing time is less conclusive in the literature. Jaśkowski (1991) showed that the onset of a shorter stimulus shifts toward the offset of its paired longer stimulus, causing a delay in processing of the shorter stimulus and thus a shift of the PSS (Jaśkowski 1991, 1992), but that this effect is largely diminished for discrepancies of more than 500 ms (Jaśkowski 1991). Efron (1970a, b, c), however, found no such effects of duration. Recently, Boenke et al. (2009) attempted to resolve these inconsistencies in the literature. Contrary to the findings of Jaśkowski (1991, 1992) and consistent with Efron (1970a, b, c), their results established that duration does not change the PSS and therefore cannot account for discrepancies between auditory and visual processing. It should be noted, however, that when Boenke and colleagues assessed PSS values on an individual level, duration did have an effect. The direction of this effect was not consistent across subjects, and it is therefore difficult to draw conclusions from this finding.
Given the inconsistency in the literature regarding the potential effects of rise time and duration on perceived temporal order, a second experiment was conducted by pairing the different sound stimuli with each other to assess whether possible significant differences between these stimuli could explain the results in experiment 1. In keeping with the results of Jaśkowski (1993), we predicted that a long square sound should be perceived as simultaneous with a long raised-cosine sound as the peak of the long raised-cosine sound occurs at the midpoint (i.e., not in the early portion) of the temporal envelope. We also predicted that a brief sound should be simultaneously perceived with a long square sound as would be expected by the observation of Jaśkowski (1991) that duration discrepancies greater than 500 ms should not affect the PSS.