Experimental Brain Research, Volume 198, Issue 2, pp 221–231

Perceived timing of vestibular stimulation relative to touch, light and sound


    • Multisensory Integration Laboratory, Department of Psychology, Centre for Vision Research, York University
  • Laurence R. Harris
    • Multisensory Integration Laboratory, Department of Psychology, Centre for Vision Research, York University
Research Article

DOI: 10.1007/s00221-009-1779-4

Cite this article as:
Barnett-Cowan, M. & Harris, L.R. Exp Brain Res (2009) 198: 221. doi:10.1007/s00221-009-1779-4


Different senses have different processing times. Here we measured the perceived timing of galvanic vestibular stimulation (GVS) relative to tactile, visual and auditory stimuli. Simple reaction times for perceived head movement (438 ± 49 ms) were significantly longer than to touches (245 ± 14 ms), lights (220 ± 13 ms), or sounds (197 ± 13 ms). Temporal order and simultaneity judgments both indicated that GVS had to occur about 160 ms before other stimuli to be perceived as simultaneous with them. This lead was significantly less than the relative timing predicted by reaction time differences, which is compatible with an incomplete tendency to compensate for differences in processing times.


Keywords: Audition, Galvanic vestibular stimulation, Multisensory, Simultaneity judgments, Temporal order judgments, Touch, Vestibular, Vision


Knowing whether or not the different components that make up an event occur at the same time is not as easy as it may seem. Each stage of processing sensory information takes a certain amount of time that is unique for each sensory modality. Several psychophysical and physiological studies have shown that the temporal processing of tactile, visual, and auditory signals differs, but that these differences may be at least partially accounted for by the brain when perceiving the relative timing of the multisensory components of an event that occur at the same time. Here we explore how the brain temporally processes vestibular signals in comparison with tactile, visual, and auditory signals.

Based on what is known about the transduction of stimuli presented to the different senses some preliminary predictions can be made about the perceived relative timing of the different components of multisensory events. The transduction of vestibular and auditory stimuli is extremely fast due to the kinetics of hair cells which have latencies of approximately 40 μs (Corey and Hudspeth 1979). Similarly, the transduction of tactile stimuli is very fast with transduction latencies ranging from 500 μs to 2.6 ms (Alvarez-Buylla and de Arellano 1952). In contrast, the transduction latencies of photoreceptors are 15–93 ms (Kuffler 1953). Thus, if transduction time were the only differential delay among the senses we would predict that vestibular, auditory and tactile stimuli presented at the same time should be perceived as simultaneous but vestibular, auditory, and touch stimuli should all need to be delayed relative to a visual stimulus in order to be perceived as simultaneous with it. However, in addition to differences in transduction latencies (King and Palmer 1985; Pöppel et al. 1990) asynchrony among sensory signals can also be associated with differences in axonal length (Bekesy 1963; Bergenheim et al. 1996; Harrar and Harris 2005), stimulus intensity (Roufs 1963; Wilson and Anstis 1969; Craig and Baihua 1990; Diederich 1995), and attention (Spence et al. 2001). How then can the brain correctly assess whether different attributes of an event occur at the same time?

There have been many attempts to answer this question by identifying the amount of asynchrony required for stimulus pairs to appear simultaneous. Temporal order judgments (TOJs) are a common method of measuring perceptual latency. TOJ experiments have found that a visual stimulus will often need to precede a sound presented within a few meters in order for the pair to be perceived as synchronous (Hirsh and Sherrick 1961; Engel and Dougherty 1971; Jaskowski et al. 1990; Kopinska and Harris 2004; Keetels et al. 2007; Jaekl and Harris 2007; Harrar and Harris 2008, see van Eijk et al. 2008 for a recent review), that a touch must precede a light (Hirsh and Sherrick 1961; Spence et al. 2001, 2003; Harrar and Harris 2005, 2008; Shore et al. 2006) and that a touch must precede a sound (Hirsh and Sherrick 1961; Zampini et al. 2005; Navarra et al. 2007; Harrar and Harris 2008). Although the perceived latencies derived from reaction times (RTs) may not directly predict TOJs (Rutschmann and Link 1964; Jaskowski et al. 1990) it seems that, in general, differences between processing times are usually reflected in the perceived relative timing of sensory events. We now wish to investigate how vestibular processing fits into this model. Differences in processing times would suggest that a visual stimulus would need to be presented substantially before a vestibular stimulus in order for the stimuli to be perceived as simultaneous but that auditory stimuli within a few meters or tactile stimuli on the hand would need to be presented at approximately the same time as a vestibular stimulus (Bekesy 1963; Bergenheim et al. 1996; Harrar and Harris 2005).

When information in different senses is regarded as coming from a single event the brain is sometimes able to compensate for the processing time differences (Engel and Dougherty 1971; Sugita and Suzuki 2003; Kopinska and Harris 2004) to create a veridical perception of simultaneity independent of differences in the timing of the individual signals. The perception of true simultaneity despite differences in processing times between modalities is a form of perceptual constancy called “simultaneity constancy” (Kopinska and Harris 2004). Whether a simultaneity constancy mechanism might contribute to temporal processing of vestibular signals is unknown.

There has been some investigation into whether vestibular signals might affect the spatial location of other sensory targets. Results from these studies depend on the method used to generate vestibular signals. Caloric-vestibular stimulation (by injection of iced water into the auditory canal) will shift the subjective straight ahead in the direction of the stimulated ear while the perceived location of visual targets will move in the opposite direction (Lewald and Karnath 2000, 2001). The perception of rhythm varies with vestibular stimulation produced by self-motion (Israël et al. 2004; Capelli et al. 2007; Capelli and Israel 2007). While vestibular stimulation may bias attention which in turn might affect the perceived onset difference among pairs of stimuli from other senses (Figliozzi et al. 2005), the perceived timing of vestibular signals relative to other sensory stimuli has not been directly assessed. Those studies that have indirectly investigated the perceived timing of vestibular stimuli have done so by measuring perceived phase shifts between vestibular and visual motion. Shifts in phase between vestibular and large field visual motion are not noticeable when visual motion lags vestibular motion by up to 133 ms (for movement of 2°/s at 1 Hz; Grant and Lee 2007). Finally, there is some evidence to suggest that the perception of time itself is disturbed by vestibular input. The perceived timing of vestibular stimulation compared to other stimuli however has not been directly assessed.

In addition to using TOJs to measure perceived simultaneity, synchronicity judgments (SJs) have also been a useful psychophysical approach. While both approaches measure perceived simultaneity they often yield different just noticeable differences (JNDs) (Mitrani et al. 1986; Schneider and Bavelier 2003; Vatakis et al. 2008) reflecting less uncertainty for making TOJs compared to SJs. Thus, only by measuring both TOJs and SJs can one fully assess the perceived timing of sensory stimuli compared with each other.

In this paper we dissociate the vestibular signal from other aspects of movement by stimulating the vestibular system directly using galvanic vestibular stimulation (GVS) (Buys 1909; Goldberg et al. 1984; Fitzpatrick and Day 2004). GVS is administered by delivering a controlled current through electrodes placed over the mastoid processes and will typically evoke an illusory head movement (see Fitzpatrick and Day 2004 for a review). We first measure simple RTs to our vestibular, auditory, visual and tactile stimuli. From these RTs we make predictions about the relative timing of these stimuli necessary for them to appear simultaneous. We then present GVS-touch, GVS-light and GVS-sound stimulus pairs and use TOJs and SJs to compare the measured and predicted temporal relationships for the perception of simultaneity.

General methods


Ten participants (six males, four females) aged 24–45 years participated in this study and gave their informed written consent according to the guidelines of the York University Research Ethics Board. Participants reported having no auditory, visual, vestibular or other neurological disorders. Participants received no feedback regarding their performance in any of the experiments. All participants were paid $10/h for participation.

Galvanic vestibular stimulation

Vestibular stimuli were generated by a GVS system (Good Vibrations Engineering Ltd., Nobleton, Ontario, Canada). Electrodes were positioned over the mastoid process behind each of the participants’ ears with a reference electrode positioned on the forehead (Fig. 1a). The electrodes were 1.25″ diameter round carbon-conductor electrodes (9000 series electrodes; Empi Recovery Sciences, St. Paul, Minnesota, USA). The GVS system could be triggered by the experimenter and a copy of the signal sent to the electrodes was passed through a custom-built interface including an opto-isolator and recorded at 250 Hz by a Cambridge Electronic Design 1401 computer system (CED1401, Cambridge, England). The CED1401 interface box was controlled by a PC and was also used to control presentation of all stimuli and to record responses. The GVS system was armed to deliver one 1,200 ms cycle of alternating positive and negative Gaussian waveforms ±2.5 mA that were out of phase between the ears (Fig. 1b). This stimulation, which was similar to that used by Trainor et al. (2009), induced illusory side-to-side head movement or illusory head rotation about the longitudinal body axis. The head actually remained stationary as observed by the experimenter. In addition to the illusory head movement evoked by GVS, our participants also experienced a tingling sensation at the site of the electrodes. This sensation has previously been reported (e.g. Lobel et al. 1998; Trainor et al. 2009) and is the result of the percutaneous current being directly applied to the skin. In pilot tests, we found that participants perceived the tingling sensation as being distinctly out of phase with perceived illusory head movement with reaction times to the tingling sensation occurring well before reaction times to illusory head movement.
Fig. 1

a GVS electrode placement demonstrated on the senior author (details see text). b GVS electrode current plotted as a function of time for each electrode relative to the reference electrode on the forehead. Current values were updated every 25 ms
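
The stimulus described above can be sketched numerically. The following is a minimal illustration only: it assumes that each half of the 1,200 ms cycle is a single Gaussian lobe centred in its 600 ms half-cycle, with a hypothetical standard deviation of 100 ms and the positive lobe first (the text states neither the lobe width nor the lobe order). The 25 ms sampling step follows the update interval given in the Fig. 1 caption, and the function name `gvs_waveform` is illustrative rather than part of the GVS system.

```python
import math

PEAK_MA = 2.5      # peak current from the text (mA)
CYCLE_MS = 1200    # duration of one full cycle from the text (ms)
STEP_MS = 25       # update interval from the Fig. 1 caption (ms)
SIGMA_MS = 100.0   # ASSUMPTION: lobe width is not given in the source


def gvs_waveform(sign=1.0):
    """Return (times_ms, currents_mA) for one electrode.

    ASSUMPTION: a positive Gaussian lobe centred at 300 ms followed by a
    negative lobe centred at 900 ms. Passing sign=-1.0 gives the
    opposite-phase waveform for the other ear.
    """
    half = CYCLE_MS / 2
    times = list(range(0, CYCLE_MS + 1, STEP_MS))
    currents = []
    for t in times:
        pos = math.exp(-0.5 * ((t - half / 2) / SIGMA_MS) ** 2)
        neg = math.exp(-0.5 * ((t - 3 * half / 2) / SIGMA_MS) ** 2)
        currents.append(sign * PEAK_MA * (pos - neg))
    return times, currents


times, left = gvs_waveform(+1.0)
_, right = gvs_waveform(-1.0)
```

Under these assumptions the two electrode currents are mirror images of each other at every sample, which is one way to read "out of phase between the ears".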

Touch stimulation

Touch stimuli consisted of 50 ms bursts of 200 Hz vibration (bone conduction vibrators, Oticon/Phonic Ear Ltd. BC, 462 BE 3 PIN, Mississauga, Canada). The tactile vibrator was held between the index finger and thumb of the right hand. The tactile vibrator was driven by a 3311A Function Generator (Hewlett Packard, Palo Alto, California, USA) which was triggered using a relay tripped by a signal from the CED1401 interface box controlled by the PC computer.

Light stimulation

Light stimuli used in assessing perceived temporal delays among multisensory stimuli are typically delivered using LEDs. This presented a problem with the present study as LEDs can appear to move during illusory head motion (Taylor and McCloskey 1991) and can thus provide conflicting information concerning the perceived timing of vestibular stimulation. To eliminate the possibility of using this visual cue, participants sat under a hemispherical dome (Fig. 2a) and received a diffuse flash of white light. To deliver the flash, we used an externally triggered digital stroboscope (Shimpo model DT-315A, Itasca, Illinois, USA) equipped with a xenon flash lamp with a rise time to 100% light output of 20 μs and a similar decline resulting in a total flash duration of about 40 μs. The strobe was mounted on top of the 6 mm thick white plastic dome (57 cm radius) with the light directed towards the top of the participant’s head (Fig. 2a). The light diffused throughout the plastic dome. The strobe light was triggered using the CED1401 interface box controlled by a PC. Participants sat with their eyes 30 cm from the side wall of the plastic dome which had a mean illuminance of 4 lux while the flash was off and a mean illuminance of 283 lux during presentation of the flash as measured using a calibrated photocell and a Minolta™ illuminance meter T-10.
Fig. 2

Apparatus. a Participants sat inside a plastic hemisphere. A strobe light was mounted pointing downwards to provide a diffuse flash of light as required. b Participants wore headphones for presentation of sound stimuli. c Participants held a vibrotactile stimulator between their thumb and index finger for presentation of touch stimuli

Sound stimulation

Sound stimuli consisted of 50 ms bursts of 2,000 Hz, 73 dB tones generated using the CED1401 interface box controlled by a PC and delivered using headphones (Grado Labs SR-80, Brooklyn, New York, USA). Participants wore ear plugs during all tasks in order to mask noise generated by the strobe light and the tactile vibrator. All participants reported that they could hear the sound stimulus while not being able to hear the other equipment.

Response buttons

For RT trials, participants pressed a button using their left hand. For temporal order and synchronicity judgments, participants lifted their feet from foot pedals under the left and right foot.

Experiment 1

Measuring reaction times

The difference between the RTs to GVS, touch, light and sound stimuli presented alone provides a crude and indirect estimate of their relative processing times (Exner 1868; Kopinska and Harris 2004). Differences in RTs predict the delay time between a pair of stimuli that needs to be added for them to appear simultaneous (Gibbon and Rutschmann 1969; Kopinska and Harris 2004). We therefore measured RTs to individual stimuli.


RTs were collected in two separate blocks. In the GVS RT trials, stimuli were triggered manually by the experimenter in response to a signal light presented by the CED1401 with an inter-trial interval that varied between 500 and 1,500 ms. Participants were required to press a button as fast as they could relative to the onset of illusory head movement while keeping their eyes closed to ensure that responses were not based on visual feedback arising from any compensatory eye movements that might be evoked from GVS. RTs outside 100–1,100 ms were excluded. GVS and RTs were recorded via the CED1401 on a PC computer. Data collection took no longer than 5 min to complete 30 RT trials.

For touch, light and sound RT trials, stimuli were randomly interleaved. Participants were required to press a button as fast as they could relative to the onset of any stimulus. Participants kept their eyes open during these trials. The inter-trial interval was varied between 500 and 1,500 ms. RTs outside 100–500 ms were excluded. Data collection took less than 10 min to complete 90 RT trials (30/modality).
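
The exclusion rule for the two RT blocks can be expressed as a simple filter. This is a minimal sketch, assuming that RTs exactly on a window boundary are kept (the text only says that RTs "outside" the window were excluded); the example values are hypothetical and are not data from the study.

```python
from statistics import mean

# Acceptance windows (ms) stated in the text for each modality.
RT_WINDOWS_MS = {
    "GVS": (100, 1100),
    "touch": (100, 500),
    "light": (100, 500),
    "sound": (100, 500),
}


def filter_rts(rts_ms, modality):
    """Keep only RTs inside the acceptance window for the given modality.

    ASSUMPTION: boundary values are treated as inside the window.
    """
    low, high = RT_WINDOWS_MS[modality]
    return [rt for rt in rts_ms if low <= rt <= high]


# Hypothetical raw RTs (ms) for illustration only.
example_gvs = [85, 410, 455, 1250, 390]
kept = filter_rts(example_gvs, "GVS")
print(kept, mean(kept))
```

Here the 85 ms and 1,250 ms responses would be dropped before averaging, since they fall outside the 100–1,100 ms window used for GVS.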

Statistical analysis comprised a 4(Modality: GVS/touch/light/sound) × 1(RT) repeated measures ANOVA to determine differences in RTs among stimuli. A second 4(Modality: GVS/touch/light/sound) × 1(RT) repeated measures ANOVA was used to determine differences in RT standard deviations among stimuli. Bonferroni adjustments were made for pairwise comparisons between means.


The RTs to GVS, touch, light and sound are shown in Fig. 3. A significant effect of modality was found (F(3,7) = 13.55, p = 0.003). RTs to GVS (mean = 438 ms, SE = 50 ms) were longer than RTs to touch (p = 0.017), light (p = 0.008) and sound (p = 0.004). RTs to touch (mean = 245 ms, SE = 14 ms) were longer than RTs to light (p = 0.017) and sound (p = 0.003). Finally, RTs to light (mean = 220 ms, SE = 14 ms) were longer than RTs to sound (mean = 197 ms, SE = 13 ms; p = 0.049).
Fig. 3

Average reaction times (RTs) to GVS, touch, light and sound; error bars are standard errors. Significant differences are indicated by asterisks (*p < 0.05, **p < 0.01)

A significant effect of modality was found among the RT standard deviations for GVS (mean σ = 85.2 ms, SE = 11.4 ms), touch (mean σ = 54.7 ms, SE = 6.8 ms), light (mean σ = 55.5 ms, SE = 4.7 ms) and sound (mean σ = 64.8 ms, SE = 6.6 ms) (F(3,7) = 5.106, p = 0.035), however, with only the standard deviations for GVS being significantly greater than for touch (p = 0.028).


RTs to GVS were slower than RTs to touches, lights and sounds by 193–241 ms. This large difference in RT between GVS and the other sensory stimuli predicts that for vestibular stimulation to appear simultaneous with any other stimulus GVS onset must precede the onset of other stimuli by approximately 220 ms unless timing differences among the senses are compensated for in the brain by some kind of simultaneity constancy mechanism.
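
The prediction of an approximately 220 ms lead can be reproduced from the rounded group mean RTs reported in the results above. This is a minimal sketch of that arithmetic; the function name `predicted_pss` is illustrative, and the sign convention (negative means GVS first) follows the one used later for the PSS.

```python
# Mean simple RTs (ms) reported in Experiment 1.
MEAN_RT_MS = {"GVS": 438, "touch": 245, "light": 220, "sound": 197}


def predicted_pss(other):
    """Predicted PSS (ms) for a GVS-<other> pair: other RT minus GVS RT.

    Negative values mean GVS must be presented first.
    """
    return MEAN_RT_MS[other] - MEAN_RT_MS["GVS"]


predictions = {m: predicted_pss(m) for m in ("touch", "light", "sound")}
mean_lead = -sum(predictions.values()) / len(predictions)

print(predictions)       # {'touch': -193, 'light': -218, 'sound': -241}
print(round(mean_lead))  # 217
```

The average predicted lead of about 217 ms across the three pairs is consistent with the "approximately 220 ms" stated in the text.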

We used a Gaussian vestibular signal of 1,200 ms duration to approximate a natural head movement signal. However, the other sensory signals were square wave signals of 50 ms duration. In order to control for this we had five of the participants perform 30 RTs to a 50 ms square wave GVS stimulus consisting of +2.5 mA presented over the left mastoid and −2.5 mA over the right mastoid. RTs to the square wave vestibular stimulus (mean = 401 ms, SE = 91 ms) were not significantly different than RTs to the Gaussian signal (mean = 409 ms, SE = 85 ms) (t(1,4) = 0.403, p = 0.708). RTs to the two vestibular signals were highly correlated with each other (r = 0.979, p = 0.004). These results confirm that the observed slow RT to GVS is not a methodological artifact of the signal we initially chose to use.

RTs to GVS were collected in a separate session from RTs to touch, light and sound, and hence were collected under focused rather than divided attention. Could this explain discrepancies in RT? Spence et al. (2001) demonstrated that attention speeds reaction time to the attended stimulus in accordance with the law of prior entry (Titchener 1908). This would suggest that RTs to GVS could potentially be even slower than reported here if collected under divided attention. Thus, although attention undoubtedly affects RT to GVS, it does not account for why RT to GVS is so slow compared to RTs to touch, light or sound. Why RTs to GVS might be so slow is further discussed in the general discussion of this paper.

We found that reaction times to touch were significantly longer than reaction times to light and sound. This is contrary to Harrar and Harris (2008) who found that reaction times to touch were faster than reaction times to light and sound. Further, temporal order judgment experiments have found that touch is generally perceived faster than light (Hirsh and Sherrick 1961; Spence et al. 2001, 2003; Harrar and Harris 2005, 2008; Shore et al. 2006) and sound (Hirsh and Sherrick 1961; Zampini et al. 2005; Navarra et al. 2007; Harrar and Harris 2008). The reaction time to touch that we report here of 245 ms is similar to the 235 ms measured by Harrar and Harris (2008) (RT values, as opposed to RT differences, confirmed through personal correspondence with Harrar). However, the reaction times to sound and light (197 and 220 ms respectively) were much faster than those reported by Harrar and Harris (2008): 253 and 249 ms respectively. Harrar and Harris (2008) also used a divided attention paradigm in which subjects were asked to respond to a stimulus, but did not know which modality would be stimulated. Thus attention differences (Spence et al. 2001) could not be the explanation for this difference. Rather, we attribute this disparity to differences in stimulus intensity among studies (Roufs 1963; Wilson and Anstis 1969; Craig and Baihua 1990; Diederich 1995). For example, reaction time to light decreases with increases in luminance (Rains 1963; Schiefer et al. 2001). Thus it is not surprising that the reaction time to light as reported by Harrar and Harris (2008), which was in response to an LED in a lit room, is longer than in response to a strobe flash filling most of the visual field in an otherwise dark room. We attribute the disparity in reaction times to sound to white noise that was played in addition to sound stimuli by Harrar and Harris (2008).

Experiment 2

Temporal order and synchronicity judgments

To test whether the results of the RT experiments predicted the relative timings of GVS relative to other stimuli necessary for them to be perceived as simultaneous, we ran a series of TOJ and SJ tasks. Stimuli consisted of GVS stimulation paired with touch (GVS-touch), light (GVS-light) or sound (GVS-sound).


Participants sat in a chair, held the tactile stimulator for GVS-touch trials, wore earphones for GVS-sound trials, and sat within the light dome for GVS-light trials. Participants were allowed to take as long as they needed to make their judgments and responded using foot pedals. Data collection took approximately 10 min for each trial block (120 trials). A total of six blocks were conducted for GVS-touch, GVS-light and GVS-sound, TOJ and SJ trials. GVS stimuli were triggered manually by the experimenter in response to a signal light presented by the CED1401 with an inter-trial interval that varied between 500 and 1,500 ms. Touch, light or sound stimuli were subsequently triggered by the CED1401 with a random stimulus onset asynchrony (SOA) between 300 and 1,000 ms after signal light offset. Because the GVS onset had an average onset time of about 430 ms after the signal light offset, the other stimuli occurred within a range of about 130 ms before to 570 ms after the GVS onset. The order of all conditions, including the RT blocks of trials in “Experiment 1”, was randomized across all participants and testing occurred over the course of several non-consecutive days. Participants were required to take a break of at least one hour after completing two trial blocks. Participants kept their eyes closed during GVS-touch and GVS-sound trials to ensure that responses were not based on visual feedback arising from any compensatory eye movements that might have been evoked from GVS. Participants kept their eyes open during GVS-light trials.
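
The range of asynchronies quoted above follows directly from the onset times. This is a minimal sketch of that arithmetic, assuming the sign convention used for the PSS in this paper (negative means GVS first); the function name `soa_relative_to_gvs` is illustrative.

```python
# Values from the text (ms, measured from signal-light offset).
OTHER_ONSET_RANGE_MS = (300, 1000)  # onset range of touch, light or sound
MEAN_GVS_ONSET_MS = 430             # average manually triggered GVS onset


def soa_relative_to_gvs(other_onset_ms, gvs_onset_ms=MEAN_GVS_ONSET_MS):
    """SOA in ms as GVS onset minus other onset (negative = GVS first)."""
    return gvs_onset_ms - other_onset_ms


earliest_other, latest_other = (
    soa_relative_to_gvs(t) for t in OTHER_ONSET_RANGE_MS
)
print(earliest_other, latest_other)  # 130 -570
```

That is, the other stimulus could lead GVS by up to about 130 ms or follow it by up to about 570 ms, on average. Because GVS was triggered manually, the actual SOA on each trial depends on the recorded GVS onset rather than on this mean.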

For TOJ trials, participants were asked to answer the two alternative forced-choice question: “Which stimulus appeared first?” Participants responded by lifting their left foot to indicate ‘touch, light or sound first’, or their right foot to indicate ‘head movement first’. For SJ trials, participants were asked the two alternative forced-choice question: “Were the stimuli synchronous or not?” Participants responded by lifting their right foot to indicate ‘synchronous’, and their left foot to indicate ‘asynchronous’.

Data analysis

For TOJs and SJs, the percentage of trials on which a particular stimulus was chosen was plotted as a function of SOA. Using SigmaPlot 9.0 a two-parameter, cumulative Gaussian (Eq. 1) was fitted to TOJ data and a three-parameter, Gaussian (Eq. 2) was fitted to SJ data.
$$ y = \frac{100}{1 + e^{-\left(\frac{x - x_{0}}{b}\right)}}\,\% \qquad (1) $$
$$ y = a\,e^{-0.5\left(\frac{x - x_{0}}{b}\right)^{2}} \qquad (2) $$

The inflection points of the cumulative Gaussians (x0 for TOJs, Eq. 1) or the peaks of the Gaussians (x0 for SJs, Eq. 2) were taken as the point of subjective simultaneity (PSS). The standard deviation (b) was taken as the JND. RT predictions were derived by taking RT values (from “Experiment 1” above) for GVS away from the RT values for touch, light or sound separately (i.e., negative means GVS took longer).
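
The roles of x0 and b in Eq. 1 and Eq. 2 can be illustrated with a short script. This is a minimal sketch, not the SigmaPlot procedure used in the study: the parameter values and synthetic data are hypothetical, and the coarse grid search in `fit_toj` is only a stand-in for a proper least-squares fit.

```python
import math


def toj_curve(x, x0, b):
    """Eq. 1: percentage of 'first' responses at SOA x (ms)."""
    return 100.0 / (1.0 + math.exp(-(x - x0) / b))


def sj_curve(x, x0, b, a):
    """Eq. 2: percentage of 'synchronous' responses at SOA x (ms)."""
    return a * math.exp(-0.5 * ((x - x0) / b) ** 2)


def fit_toj(soas, percents):
    """Coarse grid-search least-squares fit of Eq. 1; returns (x0, b)."""
    best = None
    for x0 in range(-400, 401, 10):
        for b in range(10, 301, 10):
            sse = sum(
                (toj_curve(x, x0, b) - p) ** 2 for x, p in zip(soas, percents)
            )
            if best is None or sse < best[0]:
                best = (sse, x0, b)
    return best[1], best[2]


# Hypothetical parameters for illustration only (not fitted values).
x0, b, a = -160.0, 100.0, 80.0

print(toj_curve(x0, x0, b))    # 50.0: the inflection point is the PSS
print(sj_curve(x0, x0, b, a))  # 80.0: the peak is the PSS

# Synthetic, noise-free TOJ data generated from the same parameters.
soas = list(range(-600, 601, 100))
data = [toj_curve(x, x0, b) for x in soas]
pss, jnd = fit_toj(soas, data)
print(pss, jnd)                # -160 100
```

With noise-free synthetic data the grid search recovers the generating x0 (PSS) and b (JND) exactly; with real response percentages a continuous optimiser would be more appropriate.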

Statistical analysis comprised a series of one-way t tests for each PSS value relative to an SOA of 0 ms to confirm significant differences from true simultaneous presentation of stimuli. A 3(Task: RT PSS prediction, TOJ PSS and SJ PSS) × 3(Modality: GVS-touch, GVS-light, GVS-sound) repeated measures ANOVA was used to determine differences in PSS among GVS-paired stimuli from predictions from RTs and across measures, and a 2(Task) × 3(Modality) repeated measures ANOVA was used to determine differences in JND among GVS-paired stimuli and across measures. Bonferroni adjustments were made for pairwise comparisons between means.


The results of TOJs and SJs made for GVS-touch, GVS-light, and GVS-sound stimulus pairs are shown in Fig. 4a. Psychometric functions fitted to each participant’s data are plotted as well as the group average. Note that for all conditions, the PSS (indicated by the solid vertical line) is displaced from true simultaneity (0 ms; dashed vertical line) in the negative direction. This means that GVS needed to precede touch, light or sound stimuli by approximately 160 ms in order for the pair to be perceived as simultaneous.
Fig. 4

a Average TOJ cumulative Gaussian (top row) and SJ Gaussian (bottom row) curves. The three columns are arranged according to stimulus pair (GVS-touch, GVS-light and GVS-sound), where positive and negative SOA values indicate which of the stimuli was presented first, as shown by the inserted cartoons. The individual participants’ curves (grey lines) are best fits through the means of the percentage of times one stimulus was perceived to be first, plotted as a function of SOA. The thicker black curves are reconstructed from the average PSS and JND of the ten participants. The solid vertical lines represent the average PSS. The dashed vertical lines represent the point of true simultaneity (SOA = 0 ms). The dotted vertical lines represent the predicted PSS from differences in RTs. b PSS data from a plotted as a function of SOA with standard error bars. c JND data from a plotted as a function of SOA with standard error bars

Differences in PSS

The PSSs derived from TOJs and SJs for GVS-touch, GVS-light, and GVS-sound pairs are shown in Fig. 4b, where they are compared to predictions derived from the RTs obtained in “Experiment 1”. The RT PSS predictions for GVS-touch, GVS-light and GVS-sound pairs were all significantly negative. The TOJ PSS for GVS-touch, GVS-light and GVS-sound pairs were also all significantly negative indicating that GVS had to occur first before any of the other three stimuli in order to be perceived as simultaneous. The SJ PSSs for GVS-touch and GVS-light pairs were significantly negative but the GVS-sound pair was not significantly different from true simultaneity. These results and descriptive statistics are presented in Table 1. Results from the repeated measures ANOVA found no main effect of task (RT prediction vs. TOJ vs. SJ) or of modality (touch vs. light vs. sound). No significant interaction or pairwise comparisons were observed. These results indicate that in general GVS must be presented before a touch, light or a sound by approximately 160 ms in order for the stimulus pair to be perceived as simultaneous. This is true for both TOJs and SJs and was approximately predicted from RT differences.
Table 1

Mean points of subjective simultaneity (PSSs) in milliseconds (negative indicates GVS first) with standard errors and one-way t tests relative to 0 ms for GVS-touch, GVS-light and GVS-sound pairs for TOJs and SJs

[Table 1 body not recovered. Columns: Mean PSS (ms), SE (ms).]

Differences in JND

The mean JNDs derived from TOJs and SJs for GVS-touch, GVS-light, and GVS-sound pairs are compared in Fig. 4c. Descriptive statistics are summarized in Table 2. A significant main effect was found for task (F(1,9) = 15.98, p = 0.003) but not for modality. This indicates that in general SJ JNDs were higher than TOJ JNDs for GVS-other stimulus pairs and that this effect did not significantly change across sensory modalities.
Table 2

Mean just noticeable differences (JNDs) in milliseconds with standard errors for GVS-touch, GVS-light and GVS-sound pairs for TOJs and SJs

[Table 2 body not recovered. Columns: Mean JND (ms), SE (ms).]

This is the first study to compare the perceived timing of vestibular stimulation directly with that of touch, light and sound stimuli using reaction times, temporal order judgments and judgments of simultaneity. All these measures indicated that the time to perceive vestibular stimulation is much longer than it is for the other senses. Reaction times to vestibular stimulation were 193–241 ms longer than they were to touches, lights and sounds. Correspondingly, vestibular stimulation had to be delivered substantially before other stimuli in order to be perceived as simultaneous with those stimuli.

Why is the perception of vestibular stimulation so slow?

The slow time to respond to galvanic stimulation (438 ± 49 ms) was not expected from the extremely fast vestibular transduction times and the very short time that it takes to generate reflexive eye and postural correction movements (as little as 20 ms, Lorente de No 1933). Despite this, it would seem that vestibular signals are not accessible to temporal perception as quickly as other sensory signals. In contrast to these slow detection times, the latency of responses in the cortex after electrical stimulation of the vestibular nerve is only about 6 ms (de Waele et al. 2001). It is this expediency in arriving at the cortex that enables vestibular signals to be available to update a world-centered frame of reference that continuously modulates visual responses in area 7a of the parietal cortex whenever the head moves (Snyder et al. 1998). Zeki (1998) suggested that conscious awareness of a stimulus is dependent upon the activity of the region of the brain that represents the stimulus. There is ample evidence of the existence of vestibular cortical areas in both primates (Guldin and Grüsser 1998) and closely corresponding areas in humans found using GVS and fMRI (Bucher et al. 1998; Lobel et al. 1998; Brandt and Dieterich 1999; Bense et al. 2001). In fact, vestibular information is simultaneously processed in parallel in different cortical areas (de Waele et al. 2001). However, it may be that Zeki’s rule does not apply to this distinctive sense. Angelaki and Cullen (2008) suggested that “because of the strong and extensive multimodal convergence with other sensory and motor signals, vestibular stimulation does not give rise to a separate and distinct conscious sensation” (Angelaki and Cullen 2008, p. 126). The present experiments were designed explicitly to remove such converging signals which therefore might explain the long delays in this unnatural case.

Another reason for the long vestibular delay may relate to the fact that the vestibular signal coming from the end organ is one of velocity despite the vestibular system being an acceleration transducer (Fernandez and Goldberg 1971). Perceptual processes require knowing where the head is rather than its velocity and this requires integration across time. So one factor that might contribute to the perceptual delay of vestibular signals is the sampling time of this integration.

Partial compensation for processing time differences

In order for the brain to reconstruct the actual time of a multisensory event from information arriving at the senses with various latencies, some allowances must be made for the variable delays among sensory signals. To accomplish this task, some neural mechanism must exist that is capable of resynchronizing asynchronous signals and that underlies our ability to perceive simultaneity correctly. The ability to perceive simultaneous events correctly despite sensory variation is known as simultaneity constancy (Kopinska and Harris 2004).

Here we found that, when vestibular stimulation was paired with simultaneous touch, light or sound stimuli, the vestibular stimulation was inevitably perceived as following the other stimuli. To appear simultaneous, the vestibular stimulation had to be delivered first by about 160 ms, a shorter time than predicted by the differences in simple reaction times to vestibular and other stimuli. This suggests a partial compensation for the sensory processing differences between vestibular and other senses: a move towards correct perception of simultaneity.
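The size of the shortfall can be illustrated with simple arithmetic on the mean reaction times reported in this article. The sketch below is only an illustration using group means; the analysis in the paper was carried out per participant, and the function and variable names are hypothetical.

```python
# Mean simple reaction times (ms) as reported in this article.
MEAN_RT_MS = {"GVS": 438, "touch": 245, "light": 220, "sound": 197}

# Approximate GVS lead (ms) needed for perceived simultaneity (TOJs and SJs).
OBSERVED_GVS_LEAD_MS = 160


def predicted_gvs_lead(rt_ms, other):
    """Lead (ms) GVS would need if the PSS simply equalled the RT difference."""
    return rt_ms["GVS"] - rt_ms[other]


def shortfall_table(rt_ms, observed_lead_ms):
    """Return {stimulus: (predicted lead, predicted minus observed)} in ms."""
    table = {}
    for other in rt_ms:
        if other == "GVS":
            continue
        predicted = predicted_gvs_lead(rt_ms, other)
        table[other] = (predicted, predicted - observed_lead_ms)
    return table


if __name__ == "__main__":
    rows = shortfall_table(MEAN_RT_MS, OBSERVED_GVS_LEAD_MS)
    for other, (predicted, gap) in rows.items():
        print(f"{other}: predicted lead {predicted} ms, "
              f"observed ~{OBSERVED_GVS_LEAD_MS} ms, gap {gap} ms")
```

On these group means the RT-predicted leads are 193, 218 and 241 ms for touch, light and sound, each longer than the observed lead of about 160 ms.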

This partial compensation is shown in Fig. 5, where each participant’s PSS for a particular stimulus pair is plotted as a function of the PSS predicted from the RT difference, for both TOJs and SJs. Figure 5 compares the simultaneity constancy hypothesis prediction with the no-compensation hypothesis prediction for any given stimulus pair. Simultaneity constancy requires that true simultaneity is correctly perceived (i.e., PSS = 0) despite variations in neural processing times, which corresponds to a slope of 0. No compensation predicts that the PSS will depend entirely on neural processing times, and that the PSS should therefore be equal to the RT difference, resulting in a slope of 1. The regression lines for TOJs and SJs had slopes of 0.49 and 0.36 and regression coefficients of 0.24 (p = 0.006) and 0.13 (p = 0.054), respectively. Taking compensation as one minus the slope, this suggests a compensation for the differences between the perceived timing of vestibular processing and that of other stimuli of between 51% (TOJs) and 64% (SJs).
Fig. 5

PSSs (negative indicates GVS first) of GVS paired with touch, light and sound from TOJ (white dots) and SJ (black dots), plotted as a function of RT differences (other stimulus RT minus GVS RT) for all participants. Two predictions are shown: “no compensation”, in which the PSS is directly predicted from the RT differences (slope = 1), and “complete compensation”, in which the PSS is unaffected by RT differences (slope = 0). The regressions through the TOJ (dashed grey line) and SJ (dashed black line) data had slopes of 0.49 and 0.36, respectively, suggesting partial compensation
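The compensation percentages quoted above follow from the regression slopes if compensation is read as the distance of the slope from the no-compensation prediction (slope = 1) towards the complete-compensation prediction (slope = 0). A minimal sketch of that conversion, with hypothetical names, might look like this:

```python
# Regression slopes of PSS against RT difference, as reported in this article.
SLOPES = {"TOJ": 0.49, "SJ": 0.36}


def percent_compensation(slope):
    """Convert a PSS-vs-RT-difference slope into percent compensation.

    slope = 1 means the PSS equals the RT difference (0% compensation);
    slope = 0 means the PSS ignores the RT difference (100% compensation).
    """
    return (1.0 - slope) * 100.0


if __name__ == "__main__":
    for task, slope in SLOPES.items():
        print(f"{task}: slope {slope:.2f} -> "
              f"{percent_compensation(slope):.0f}% compensation")
```

This reproduces the 51% (TOJ) and 64% (SJ) figures; it says nothing about the intercepts of the regressions, which are not reported in this section.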

Why only a partial compensation? The pairs of stimuli used in this experiment were not chosen to have any natural connection to each other. The only reason that the flashes and beeps might have been considered as part of a ‘multisensory event’ including the vestibular stimulation was their close temporal relation. The vestibular system responds to head movement, and so the onset of movement of other stimuli, especially when synergistic with an actual or simulated head movement, may provide situations where stimuli are more easily grouped. If vestibular stimulation were accepted as part of a genuine multisensory event, more temporal compensation towards the correct interpretation of simultaneity might occur. For example, Trainor et al. (2009) recently demonstrated that GVS can disambiguate unaccented auditory rhythm patterns in a manner analogous to natural physical movement (Phillips-Silver and Trainor 2005, 2007).

Another reason why the difference in timing between vestibular and other stimuli might be only partially compensated is the size of the challenge faced by the simultaneity constancy mechanism. Indeed, the RT differences reported here may be an underestimate of processing time differences (see “Experiment 1” discussion above).

Comparison of simultaneity and temporal order judgments

Simultaneity judgments had significantly higher JNDs than temporal order judgments. This result has also recently been reported for judgments of temporal misalignments of audiovisual speech patterns (Vatakis et al. 2008). The larger JNDs associated with simultaneity judgments indicate that there is a range of time differences which are regarded as simultaneous, but can nonetheless be assigned correct temporal order. This is consistent with temporal order judgments being made first, but then needing to meet a higher criterion before the stimuli are considered simultaneous (Allan 1975). In this regard, the perception of vestibular stimulation appears to be subject to the same treatment as other stimuli.

GVS versus natural head movements

During natural head movements, proprioceptive information about head movement is provided not only by the vestibular system, but also by proprioceptive organs in the neck muscles and joints (Biguer et al. 1988; Roll et al. 1991; Taylor and McCloskey 1991; Fitzpatrick and Day 2004). While GVS provides a means of stimulating the vestibular system directly, it simultaneously stimulates all receptors from the otoliths and semicircular canals in a rather non-ecological manner. Despite the unnatural signal provided by GVS, it can evoke compensatory eye movements (Pfaltz 1967; Brantberg and Magnusson 1990; Aw et al. 2006) as well as balance responses throughout the body (Lund and Broberg 1983; Day et al. 1997), causing participants to sway towards the anodal side of stimulation when standing and to experience illusory movement when seated (see Fitzpatrick and Day 2004 for a review). We did not control for head motion by restraining the head, as this might reduce the intensity of illusory movement (Lobel et al. 1998) and generate secondary sensory feedback regarding head motion from, for example, resistance against a bite bar. This raises some concern regarding the additional information conveyed by evoked actual head movements. EMG responses to GVS are significantly reduced (Day et al. 1997) and perceived illusory motion is enhanced (Fitzpatrick and Day 2004) when participants are seated as opposed to standing. Although we did not observe head movements in response to our GVS, if small head corrections were evoked then the EMG activity associated with them would occur within 6.7–9.8 ms (Watson and Colebatch 1998). Thus, if EMG activity is involved in perceiving illusory head movement, then vestibular and EMG signals would be closely linked in time, and long-latency perception evoked from either of these could potentially be confused.
The perceived timing of the natural stimuli associated with physical head movement or direct neck muscle stimulation should also be investigated in order to confirm whether such a large latency in the perceived onset of head motion is still found.

Practical implications

That vestibular stimulation goes unperceived so much longer than stimulation of the other senses may have practical applications in calibrating virtual reality environments and vestibular prostheses. We have shown that the perception of vestibular stimulation lags behind vision by 120–160 ms, which is comparable to the 133 ms vestibular phase error threshold observed in virtual reality experiments (Grant and Lee 2007). The unexpected delay in the perceived timing of vestibular stimulation, despite activity occurring in the cortex considerably before perceptual reports, represents an important caveat when interpreting brain activity thought to underlie perception.


This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). M. Barnett-Cowan was supported by a PGS-D3 NSERC Scholarship and a Canadian Institutes of Health Research Vision Health Science Training Grant. Our thanks go to Michael Jenkin for technical assistance, to Jeff Sanderson who helped conduct experiments and to David Shore for comments on this project.

Copyright information

© Springer-Verlag 2009