Storing upright turns: how visual and vestibular cues interact during the encoding and recalling process
Many previous studies have focused on how humans combine inputs provided by different modalities for the same physical property. However, it is not yet very clear how different senses providing information about our own movements combine in order to provide a motion percept. We designed an experiment to investigate how upright turns are stored, and particularly how vestibular and visual cues interact at the different stages of the memorization process (encoding/recalling). Subjects experienced passive yaw turns stimulated in the vestibular modality (whole-body rotations) and/or in the visual modality (limited lifetime star-field rotations), with the visual scene turning 1.5 times faster when combined (unnoticed conflict). Then they were asked to actively reproduce the rotation displacement in the opposite direction, with body cues only, visual cues only, or both cues with either the same or a different gain factor. First, we found that in none of the conditions did the reproduced motion dynamics follow those of the presentation phase (Gaussian angular velocity profiles). Second, the unimodal recalling of turns was largely uninfluenced by the other sensory cue that it could be combined with during the encoding. Therefore, turns in each modality, visual and vestibular, are stored independently. Third, when the intersensory gain was preserved, the bimodal reproduction was more precise (reduced variance) and lay between the two unimodal reproductions. This suggests that when both visual and vestibular cues are available, they combine in order to improve the reproduction. Fourth, when the intersensory gain was modified, the bimodal reproduction resulted in a substantially larger change for the body than for the visual scene rotations, which indicates that vision prevails for this rotation displacement task when a matching problem is introduced.
Keywords: Self-motion; Yaw rotations; Spatial orientation; Vestibular; Multisensory integration
Humans move around in a world where their different senses provide a set of cues about self-motion that allows them to have good estimations of changes in body position and heading over time. The spatial information contained in each of these cues is often derived from the same physical property of the environment, and therefore can be redundant. For instance, passive upright turns create a rotation of the visual and acoustic scene sensed by the visual and auditory system, respectively (external signals), together with a set of inertial-related cues sensed by the vestibular system and the somatosensory receptors (internal signals). In recent years, technological advances have allowed an increased number of studies on the way these signals are combined by the central nervous system in order to correctly perceive self-motion in space. However, only a few studies have assessed how sensory cues might interact during the memorization process. This paper focuses on the encoding and recalling stages of the self-motion memorization process using a reproduction task. The encoding stage consisted in the storage of an upright turn presented passively, and the recalling stage consisted in remembering the stored turn to reproduce it actively. Specific manipulations were done in order to assess how distinct sensory cues available during the rotations might interact during the encoding stage, and combine to improve the reproduction during the recalling stage.
Self-motion perception with vision
Since the early 1950s, many studies have emphasized how processing optic flow alone provides humans with a very efficient mechanism to guide their movements in a stable environment. Although early work emphasized the importance of heading retrieval (Warren et al. 1991a, b), a general steering heuristic requiring far less complex processing of optic flow has been proposed (Wann and Land 2000). The discrimination of the traveled distance between two simulated visual translations can be performed with great accuracy, even when the velocity profile and duration vary (Bremmer and Lappe 1999; Frenz et al. 2003). Nevertheless, when testing a reproduction task, the same authors found significant overshoots for distances of up to 8 m and undershoots for larger distances (Bremmer and Lappe 1999; Frenz and Lappe 2005). In a recent study, using a passive velocity profile controlled by computer for the reproduction task, this range effect was largely cancelled when the visual stimulation was identical during the encoding and the recalling stages (Mossio et al. 2008). Furthermore, the large inter-subject differences reported earlier were explained in this work by highlighting the use of distinct strategies among the subjects. Indeed, some could cope with velocity profile manipulations, exhibiting an integration process independent of the profile, while others could not reproduce accurately when the velocity profile was manipulated. In the present study, we will test the capacity to reproduce turns with an active control of the velocity profile. Therefore, we assumed that either subjects would cope with the velocity profile they use in order to perform the task, or they would try to reproduce the same velocity profile as during the encoding. Finally, for more complex visually simulated movements in which heading is not tangent to the trajectory, combining an initial vestibular stimulation allows humans to accurately disambiguate the analysis of optic flow (Bertin et al. 2000; Bertin and Berthoz 2004). This result provides evidence for a multisensory advantage for the self-motion analysis.
Self-motion perception with the vestibular system
In complete darkness, the acceleration signals provided by the vestibular system and proprioception allow humans and other mammals to integrate self-motion and to keep track of their position in the environment (Mittelstaedt and Mittelstaedt 1980; Loomis et al. 1999). Distances as well as the temporal velocity profiles of passive linear translations performed in darkness can be reproduced with great accuracy (Berthoz et al. 1995; Israël et al. 1997). When subjects were verbally instructed of the total turn to perform, the joystick-controlled on-axis body rotations they produced were significantly undershot; in other words, the subjects felt that they had moved further than the actual physical movement (Israël et al. 1995). Similarly, the reproduction with a pointer of orientation change for on-axis rotations and curved paths was overestimated (Ivanenko et al. 1997). These findings suggest that humans tend to overestimate the rotation sensation given by the vestibular and somatosensory systems, although this is compensated when turns are reproduced and therefore matched within the same modality (Siegler et al. 2000). Curiously, in this study, the dynamics of the presented rotations, that is the temporal velocity profile, were not reproduced, contrary to what has been reported for forward translations. Neither the same motion duration nor peak velocity was used for the reproductions. In a later study, Israël and collaborators investigated whether this could stem from the different processing of the signals provided by the otoliths and the canals. They tested on- and off-axis rotations with the same paradigm to find that the addition of otolithic cues lowered the performance and that the dynamics were closer to the presented one (Israël et al. 2005). The present study focused on pure whole-body on-axis yaw rotations stimulating mostly the semi-circular canals. 
We checked whether vision could also contribute to the storage of the rotation dynamics in order to reproduce velocity profiles.
Multisensory interactions for spatial orientation
How humans integrate cues provided by different senses for the same physical property has been studied for many decades. According to Welch and Warren’s ‘modality appropriateness hypothesis’, the reliability of each signal should be taken into account, with the relative dominance of a modality depending upon the precision of its signal in the given task context (Welch and Warren 1980). In a similar line of research, an increasingly popular model based on the maximum likelihood estimator (MLE) was recently proposed (Ernst and Banks 2002; Ernst and Bülthoff 2004). This Bayesian framework for sensory integration puts forward the following principle: the available signals provided for the same physical property are combined into a percept defined by the weighted sum of the signals, with weights proportional to the inverse of their respective variances (measured in unimodal conditions).
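The weighting principle described above can be illustrated with a minimal sketch. The function name and the numerical values are hypothetical and chosen only for illustration; they are not data from this study.

```python
def mle_combine(estimates, variances):
    """Combine unimodal estimates with weights proportional to 1/variance.

    Returns the combined estimate and the normalized weights.
    """
    inverse_variances = [1.0 / v for v in variances]
    total = sum(inverse_variances)
    weights = [iv / total for iv in inverse_variances]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights


# Hypothetical unimodal estimates of a turn (degrees) and their variances.
visual_estimate, visual_variance = 60.0, 4.0
vestibular_estimate, vestibular_variance = 50.0, 12.0

combined, weights = mle_combine(
    [visual_estimate, vestibular_estimate],
    [visual_variance, vestibular_variance],
)
print(combined, weights)
```

With these illustrative values the more reliable (lower variance) visual cue receives three times the weight of the vestibular cue, so the combined estimate lies closer to the visual one.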
In recent years, a large number of studies that used direct perception tasks have reported that the MLE model very accurately predicts the observed performance when modalities are combined. Nevertheless, only a few have assessed its validity for motion perception and spatial orientation in general. When dealing with general movements that include translations and tilting, the central nervous system is faced with an ambiguity in the interpretation of incoming signals. Indeed, all linear accelerometers measure gravitoinertial force (GIF), which is the sum of gravitational force (tilt) and inertial force due to linear acceleration (translation). A growing literature has studied and modeled how additional cues combine with the GIF measured by the otoliths to extract self-motion and orientation: from the semi-circular canals signal and graviceptors (Zupan et al. 2000; Merfeld et al. 2001), to visual input (MacNeilage et al. 2007), to top-down cognitive expectations (Wertheim et al. 2001).
Finally, little research has focused on the visuo-vestibular interactions involved in everyday life navigation. Unlike in weightlessness or in a dense milieu such as water, in which navigation involves rotations about any axis and peculiar gravitational conditions (Vidal et al. 2003, 2004), natural navigation over planar surfaces includes only translations and upright yaw rotations. Recent electrophysiological studies on non-human primates found a neural structure—the dorsal medial superior temporal area (MSTd)—that provides a functional link between brain activity and heading perception based on inertial motion cues (Gu et al. 2007). In the very same structure, a subpopulation of neurons with congruent heading preferences for visual and vestibular cues strongly correlated with monkeys’ perceptual decisions when both modalities are present (Gu et al. 2008). Curiously, the processing of visual and vestibular cues in MSTd during self-rotations showed substantial qualitative differences compared to translations (Takahashi et al. 2007). Indeed, the large majority of neurons showed a maximal incongruence in the rotation direction preference for visual and vestibular cues, suggesting that the integration of these cues for robust perception cannot take place in MSTd. The role of vestibular signals in this area could then be restricted to disambiguate optic flow resulting from self-motion from that produced by eye, head and body rotations.
These neurophysiological results are consistent with the dissociation between visual and vestibular storage that was found in an older human behavioral study. After traveling a virtual path where physically turning the body was driving the visual orientation, when a multiplicative factor was introduced between body and visual rotations, subjects could reproduce each sensory turn separately depending on the task context (Lambrey et al. 2002). The observed locomotor reproduction of turns matched the ones performed during the presentation, whereas drawings of the paths matched the visual experience. These results concern rather high-level mechanisms, and one of the motivations of the present study was to test if this apparent task-dependent behavior is to be related to the different primary modality involved in each task rather than the task itself.
We designed an experiment to investigate how humans store upright turns, and particularly how vestibular and visual cues available during the rotations interact at the encoding and recalling stages of the memorization process. In this experiment, subjects experienced a turn and they were instructed to reproduce the turn amplitude backwards by controlling the velocity with a joystick. The sensory contexts of the encoding and recalling stages, which could include visual and/or vestibular cues about the turn, were manipulated. Furthermore, a gain factor between these modalities was used during the presentation phase in order to dissociate the two modalities and to allow us to infer on which sensory basis the resulting reproduction was done. On the one hand, we addressed the issue of the reproduction strategy: will the velocity profile of the encoding stage be reproduced for turns as was reported for linear translations in darkness, and will the reproduced profiles differ according to the sensory context available? On the other hand, we addressed several issues about the sensory interactions in the memorization process. Concerning the encoding stage, will unimodal reproduction vary according to the encoding sensory context? Will each of the cues available in the stimulation be recovered accurately? Although visual and vestibular cues combine at a very early stage of processing in the brainstem, this integration concerns mostly low-level control of eye movements and does not imply that the initial unimodal information is lost. We, therefore, expect a large degree of independence in the storage of visual and vestibular cues, which should result in very little interaction during the encoding. Concerning the recalling stage, will the combination of modalities improve the reproduction of turns? Will visual cues prevail when a modality matching conflict is introduced during the reproduction? 
If an integration process occurs, it should result in better performance when both cues are available.
Materials and methods
Twelve naïve subjects (nine male and three female) between 18 and 28 years participated in this experiment. Most of them were students and all but one were right handed. Subjects were paid for their participation time according to the Max Planck regulations and were informed that they could interrupt the experiment at any time.
Visual rotations were projected on a flat screen subtending 87° of horizontal field of view, centered on the subject’s line of sight. The three-dimensional geometry of the visual stimulus was adjusted to the real perspective. A limited lifetime star field was projected on the surface of a virtual vertical cylinder with a radius of 6 m. The virtual observer was placed on the cylinder’s axis, therefore at a simulated distance of 6 m from the dots. Visual rotations corresponded to the virtual rotation of the cylinder around its axis. The star field was composed of approximately 600 simultaneously visible dots, each subtending 0.2°. Dots were randomly positioned and displayed for 2 s, including a fade-in and fade-out period of 300 ms. After extinction, each dot was replaced by another randomly positioned dot.
Body rotations were performed with a hexapod motion platform (Maxcue from MotionBase™) remote-controlled by computer. The seat was adjusted so that the yaw rotations performed with the platform were around the vertical axis that passed between the two ears of the subject.
The subject reproduced the turns with a joystick positioned in front of them. Tilting the handle leftward or rightward rotated the platform and/or the visual display. The rotation speed varied according to the tilt angle, i.e., the more the handle was tilted the faster the rotation was (with a visual or body maximum velocity of 40 or 26.7°/s, respectively). During both passive and reproduced turns, subjects were instructed to look at a fixation cross that was displayed in the center of the screen. A notifying sound was played through the headphones when the fixation cross appeared. Eye movements were recorded with an SMI iView eye tracker in order to verify subjects’ fixations. During the whole experiment, subjects had to wear noise-cancelling headphones that played a masking noise, which was a recording of the sound made by the platform legs when moving. This suppressed the noise of the motion platform that could have been used as an additional cue for the motion speed.
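The velocity command can be sketched as follows, assuming a linear mapping from the normalized joystick tilt to the rotation velocity and maximum velocities of 40°/s (visual) and 40/1.5 ≈ 26.7°/s (body). The function name, the normalization of the tilt to the range [-1, 1] and the clamping are assumptions made for this illustration, not a description of the actual control software.

```python
VISUAL_MAX_VELOCITY = 40.0                       # deg/s, at full tilt
VISUO_VESTIBULAR_GAIN = 1.5                      # visual / body rotation
BODY_MAX_VELOCITY = VISUAL_MAX_VELOCITY / VISUO_VESTIBULAR_GAIN  # about 26.7 deg/s


def joystick_velocity(tilt, modality):
    """Map a normalized tilt in [-1, 1] to an angular velocity in deg/s.

    Negative tilt means leftward, positive means rightward (assumed sign
    convention). Values outside [-1, 1] are clamped.
    """
    tilt = max(-1.0, min(1.0, tilt))
    if modality == "visual":
        return tilt * VISUAL_MAX_VELOCITY
    if modality == "body":
        return tilt * BODY_MAX_VELOCITY
    raise ValueError("modality must be 'visual' or 'body'")


# The same tilt drives both displays with the 1.5 gain preserved.
print(joystick_velocity(0.5, "visual"), joystick_velocity(0.5, "body"))
```

Under this assumed mapping, an identical tilt profile produces visual and body rotations whose amplitudes differ by the same factor of 1.5 as in the bimodal presentations.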
The experiment consisted of two experimental sessions, separated by a minimum of 10 days to prevent learning effects from transferring across sessions. All trials had the same structure, including a presentation phase where subjects had to memorize a turn (encoding stage) and a reproduction phase where they were asked to reproduce this turn backwards (recalling stage).
The second session was similar to the first except that this time stimulations were always unimodal. The presentation consisted of purely visual or purely vestibular turns and the reproduction phase was done in the exact same modality as the presentation phase, again in the opposite direction. In this experimental session, only two conditions were studied (see Fig. 2): V to V and B to B labeled according to the studied modality.
Subjects were never informed before the reproduction in which condition they would be tested. Reproduced turns were validated by pressing the joystick’s trigger, then within a few seconds the platform was slowly repositioned (5°/s) in order to have a maximal reproduction range available for the following trial, which then started automatically. The delay between the presentation and the reproduction phase was deliberately chosen to be small. A longer delay would have allowed the semi-circular canals to return to a stable activation state before the next stimulation, but this would have required about 30 s and the memory decay would then have become problematic. Furthermore, another study found no difference between waiting or not for this stable state before reproducing body turns (Siegler et al. 2000).
Presented turns could take two directions (leftward or rightward), and three different amplitudes (45° and/or 30°, 60° and/or 40°, 75° and/or 50° for the visual and/or body rotations, respectively). These rotation angles were chosen so as to cover a natural range of turns in everyday life locomotion. As mentioned earlier, the difference between visual and body rotations stems from the unnoticed conflict that was used in order to disentangle these modalities. The angular velocity of the rotations followed Gaussian profiles, which are known to be a natural profile for head rotations (see Appendix A of Supplementary material). Furthermore, in order to avoid an easy correlation between the turn amplitude and either the total rotation duration or the peak velocity, both the peak velocity and the total rotation duration varied according to the final amplitudes. Rotation durations were 5.4, 6.1, and 6.6 s; visual (body) peak velocities were 19.5°/s (13.0°/s), 23.6°/s (15.7°/s) and 27.4°/s (18.2°/s) for the three rotation amplitudes, respectively. The velocity range was chosen to cover natural heading changes resulting from full body turns during navigation. The velocity profiles that were used for the presentations, i.e., Gaussian profiles, are quite similar to single cycle sinusoids with frequencies ranging from 0.076 to 0.1 Hz. At these frequencies, the sensitivity thresholds range roughly between 1.75 and 2.25°/s (Grabherr et al. 2008), which is far below the velocities experienced. Finally, although these frequencies lie at the limits of the sensitivity plateau where the threshold is about 0.75°/s, they still fall within a range where the vestibular system is fully operational. Note that in the second experimental session, presentations were unimodal and reproductions were tested using the same modality. Therefore, there were no conflicting cues although the tested visual and body turns were not the same. 
This difference was preserved in order to allow for within subject comparisons with the first experimental session conditions. Note also that in all conditions, the gaze fixation was used in order to prevent subjects from using the vestibulo-ocular reflex (body rotations) or target pursuit (visual rotations) to perform the task.
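The relation between the amplitudes, peak velocities and durations listed above can be checked with a short sketch. It assumes a Gaussian velocity profile centered at mid-duration whose width is set so that its full integral equals the turn amplitude; the exact parametrization used in the experiment is given in Appendix A and may differ, and the function names are hypothetical.

```python
import math


def gaussian_sigma(amplitude_deg, peak_velocity_deg_s):
    """Width (s) of a Gaussian velocity profile whose integral is the amplitude."""
    return amplitude_deg / (peak_velocity_deg_s * math.sqrt(2.0 * math.pi))


def gaussian_velocity(t, duration_s, peak_velocity_deg_s, sigma_s):
    """Angular velocity (deg/s) at time t, with the peak at mid-duration."""
    z = (t - duration_s / 2.0) / sigma_s
    return peak_velocity_deg_s * math.exp(-0.5 * z * z)


def integrated_amplitude(duration_s, peak_velocity_deg_s, sigma_s, rate_hz=60):
    """Midpoint-rule integral of the profile over the rotation duration."""
    dt = 1.0 / rate_hz
    n = int(round(duration_s * rate_hz))
    return sum(
        gaussian_velocity((i + 0.5) * dt, duration_s, peak_velocity_deg_s, sigma_s) * dt
        for i in range(n)
    )


# Visual turns from the text: (amplitude deg, peak velocity deg/s, duration s).
visual_turns = [(45.0, 19.5, 5.4), (60.0, 23.6, 6.1), (75.0, 27.4, 6.6)]
recovered = []
for amplitude, peak, duration in visual_turns:
    sigma = gaussian_sigma(amplitude, peak)
    recovered.append(integrated_amplitude(duration, peak, sigma))
print(recovered)
```

Under this assumption each rotation duration spans roughly three standard deviations on either side of the peak, so the truncated profile recovers the nominal amplitude to within a fraction of a degree.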
At the beginning of each experimental session, a set of practice trials including two trials per condition was conducted in order to ensure that the task was clearly understood. During these trials, subjects were instructed to play around with the joystick and feel the effect it had on their motion in the various sensory contexts studied. The data from these trials were not collected. Six repetitions of each turn × condition combination were performed, corresponding to a total number of 144 trials and 72 trials in the first and second session, respectively. The order was randomized for each subject and trials were blocked in groups of 16. At the end of each block, subjects could rest a while without leaving the platform; and after 3 blocks, they could rest for up to 10 min outside of the experimental room.
The trajectory of the reproduction was recorded for each trial, i.e., the instantaneous visual and/or body orientation sampled at a frame rate of 60 Hz. The final orientation determined the reproduced rotation amplitude and the amplitude ratio (reproduced/presented rotation amplitude). MATLAB scripts were used to process these files in order to analyze the reproduction dynamics. The angular velocity profiles of all trials were plotted for each condition, and the best Gaussian fit for each of these profiles was computed, providing the root mean square error (RMSE) as an indicator of goodness of fit. The motion duration and maximum angular velocity (peak velocity) were also extracted.
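The extraction of these measures from a recorded trajectory can be sketched as follows. This is an illustrative Python version, not the original MATLAB scripts; the function name, the finite-difference velocity estimate and the 1°/s velocity threshold used to delimit the motion are assumptions.

```python
def reproduction_metrics(orientation_deg, presented_amplitude_deg,
                         rate_hz=60.0, velocity_threshold_deg_s=1.0):
    """Amplitude ratio, peak velocity and motion duration of one trial.

    orientation_deg: orientation samples (deg) recorded at rate_hz.
    The motion is taken to span the first to the last sample interval
    whose speed exceeds velocity_threshold_deg_s (assumed criterion).
    """
    dt = 1.0 / rate_hz
    speed = [abs(b - a) / dt for a, b in zip(orientation_deg, orientation_deg[1:])]
    reproduced = abs(orientation_deg[-1] - orientation_deg[0])
    moving = [i for i, s in enumerate(speed) if s > velocity_threshold_deg_s]
    duration = (moving[-1] - moving[0] + 1) * dt if moving else 0.0
    return {
        "amplitude_ratio": reproduced / presented_amplitude_deg,
        "peak_velocity": max(speed) if speed else 0.0,
        "duration": duration,
    }


# Synthetic trace: 2 s at a constant 20 deg/s, then 0.5 s at rest.
trace = [20.0 * i / 60.0 for i in range(121)] + [40.0] * 30
metrics = reproduction_metrics(trace, presented_amplitude_deg=40.0)
print(metrics)
```

The synthetic trace is only a sanity check: a 40° reproduction of a 40° turn gives an amplitude ratio of 1, a peak velocity of 20°/s and a motion duration of 2 s.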
Several repeated-measures ANOVA designs were used to analyze the statistical effects and significance when comparing the different experimental conditions. Student’s t tests were also used in order to compare observations with single reference values. Post hoc analyses were performed with Tukey’s HSD test.
Turn direction had no significant effect in any of the statistical tests performed. We therefore disregarded the direction factor, and leftward and rightward data were pooled together for all the analyses presented hereafter.
Modality interactions during the encoding stage
Figure 6b shows the within-condition variances of the amplitude ratio, averaged across subjects, together with the variance predicted by the MLE for the bimodal reproduction. The classic equation linking the unimodal variances with the expected bimodal variance was used to compute these predictions (Ernst and Banks 2002); further details on how this model was applied to our experimental data can be found in Appendix B of Supplementary material. The reproduction condition had a significant effect on these variances (F(2,22) = 6.29; p < 0.01). Post hoc tests showed that the bimodal reproduction variance was significantly lower than that of both unimodal reproductions (p < 0.05 for VB to V and p < 0.01 for VB to B). The measured bimodal reproduction variance was significantly higher than that predicted by the MLE for the measured unimodal variances (2.8% instead of 1.7% predicted, Student’s t test: t(11) = 3.5; p < 0.005). These results indicate that variability is reduced when bimodal information is available as compared to unimodal information, but this improvement is not optimal as defined by this Bayesian framework for sensory fusion.
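In the standard MLE formulation, the predicted bimodal variance is the product of the two unimodal variances divided by their sum, which is always lower than either unimodal variance. A minimal sketch of this prediction follows; the unimodal variances used here are hypothetical placeholders rather than the measured values, and the way the model was applied to the data is described in Appendix B.

```python
def mle_bimodal_variance(variance_a, variance_b):
    """Bimodal variance predicted by the MLE from two unimodal variances."""
    return variance_a * variance_b / (variance_a + variance_b)


# Hypothetical unimodal variances of the amplitude ratio (arbitrary units).
visual_variance = 4.0
body_variance = 6.0

predicted_variance = mle_bimodal_variance(visual_variance, body_variance)
print(predicted_variance)
```

A measured bimodal variance that falls below both unimodal variances but stays above this prediction corresponds to the pattern reported here: an improvement with two cues that does not reach the optimum defined by the model.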
Reproduction with a modified sensory gain
Turn reproduction strategy
Ruling out the motor strategy
In the conditions where presentation was bimodal, the experienced turns were identical whether the reproduction sensory context used the same or a different gain. Since the reactivity of the joystick was adjusted so as to have the same rotation speed for the visual stimulation and the platform relative to the expected angle, subjects could have used a motor strategy in order to reproduce turns. For all conditions except the one with a modified gain (VB to VBdiff), reproducing the same dynamic tilt profile with the joystick would lead to the same answers across conditions. Accordingly, subjects could simply have estimated the “size” of the turn presented and then generated the same motor profile regardless of the condition, assuming that the joystick response was constant across conditions. The analysis of the reproduction’s peak velocity shows that this was not the case: peak velocities of visual reproductions (VB to V or V to V) were significantly higher than those of vestibular reproductions (VB to B or B to B), by about 23% on average.
It is important to note that the joystick range did not limit the peak velocities of the reproduction profile’s plateaus: in all conditions the tilt angle was well below the maximum (see Fig. 4b), with a maximum of about 55% tilt for visual reproduction. In fact, when controlling the platform rotation with the joystick, subjects felt more comfortable not turning too fast, whereas with visual turns they could turn as fast as they wished without feeling shaken by the physical motion. This provides an additional argument against the possible use of a motor strategy allowing turn reproductions regardless of the sensory context available.
Velocity profiles are not reproduced
During the encoding stage, subjects could have memorized the entire velocity profile instead of just storing its temporal integration corresponding to the total turn amplitude. Then, in order to reproduce the turn amplitude backwards as they were instructed to, they could just roll back the velocity profile, converting it into a motor command to tilt the joystick. This strategy has been reported for vestibular reproduction of linear translations (Berthoz et al. 1995), and was interpreted by the authors as an indication of how self-motion might be encoded.
In the current experiment, we wanted to check whether the whole velocity profile would also be stored for on-axis turns using different sensory contexts, namely visual, vestibular or visuo-vestibular motions. To that end, we purposely left this velocity profile strategy available: not only was the velocity linearly controlled by the tilt angle of the joystick (velocity command), but the reactivity of the joystick was also kept constant in a congruent fashion across the experimental conditions. Despite these efforts, subjects in our experiment reproduced turns with velocity profiles that differed in many respects from the presented Gaussian velocity profile. In all the sensory contexts that we studied, neither the motion duration, nor the peak velocity, nor the best Gaussian fit for each single reproduction was compatible with an attempt to reproduce the velocity profile of the turn presented (see Fig. 4a, b). The dynamics of the reproduction observed in the present experiment rather follow a standard trapezoidal velocity profile, with some small corrections at the end suggesting an ongoing temporal integration process, which is consistent with what was found for varying velocity profiles for visual translations (Mossio et al. 2008). The peak velocity maintained during the plateaus was, as discussed before, highly dependent on the sensory context of the reproduction.
This observation leads to the conclusion that the velocity profile might not be stored and that for turn reproduction tasks, a temporal integration is performed during the encoding stage, which is then matched to the temporal integration of the produced turn. Accordingly, the findings reported in Berthoz et al. (1995) could result from an experimental bias related to the chosen velocity profiles: trapezoidal, triangular and rectangular. These profiles all fall into the trapezoidal category, the last two being just special cases (trapezoidal without plateau and trapezoidal with steep acceleration and deceleration slopes), which is precisely the profile that subjects tended to use in our experiment. It seems then that using such profiles would naturally lead to a reproduction with a similar profile, i.e., a trapezoidal profile.
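The geometric argument that triangular and rectangular profiles are limiting cases of a trapezoid can be made explicit with a small sketch. It assumes a symmetric trapezoid described by a peak velocity, a ramp duration and a plateau duration; the function names and numerical values are hypothetical.

```python
def trapezoid_velocity(t, peak, ramp_s, plateau_s):
    """Velocity at time t of a symmetric trapezoidal profile.

    ramp_s = 0 gives a rectangular profile, plateau_s = 0 a triangular one.
    """
    total = 2.0 * ramp_s + plateau_s
    if t < 0.0 or t > total:
        return 0.0
    if t < ramp_s:
        return peak * t / ramp_s             # acceleration ramp
    if t <= ramp_s + plateau_s:
        return peak                           # plateau
    return peak * (total - t) / ramp_s        # deceleration ramp


def trapezoid_amplitude(peak, ramp_s, plateau_s):
    """Integral of the profile, i.e. the total turn amplitude."""
    return peak * (ramp_s + plateau_s)


# Three profiles of the same family (peak 20 deg/s, hypothetical timings).
trapezoidal = trapezoid_amplitude(20.0, ramp_s=1.0, plateau_s=2.0)
triangular = trapezoid_amplitude(20.0, ramp_s=1.5, plateau_s=0.0)
rectangular = trapezoid_amplitude(20.0, ramp_s=0.0, plateau_s=3.0)
print(trapezoidal, triangular, rectangular)
```

Because a single parametrization covers all three shapes, a subject who spontaneously produces trapezoidal profiles would appear to reproduce any of them, which is the possible bias discussed above.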
Encoding interactions: sensory-independent storage
In order to determine the possible interactions between visual and vestibular senses during the encoding of turns, we designed two sets of unimodal reproduction conditions: one with unimodal presentations (visual or vestibular), and the other with bimodal presentations using a visuo-vestibular gain factor of 1.5.
On the one hand, the free reproduction of pure visual turns was very accurate whereas a significant underestimation of about 15% was observed for pure vestibular turns. As reported in the dynamic analysis, this could stem from the distinct motor behaviors observed: a reduced peak velocity was used when rotating the platform as compared to rotating the visual field, whereas the motion duration itself was similar. These performances provide the baseline characteristics observed for totally independent unimodal reproductions.
On the other hand, when simultaneous visual and vestibular turns are presented during the encoding stage, one could expect that these senses interact during the encoding such that each unimodal reproduction would be biased towards that of the other presented modality. Since the visual field rotations turned faster than the corresponding body rotations, such interaction would lead to an overestimation when reproducing body turns and an underestimation when reproducing visual turns. Interestingly, these biases were not observed. There was no difference whatsoever in visual reproduction whether the encoding was purely visual or visuo-vestibular, which indicates that body turns did not interfere with the storage of visual turns. Similarly, there was no significant difference in the vestibular reproduction. Surprisingly, there is a very small tendency to underestimate the reproduction of small body turns when vision was also available during the encoding, which goes in the opposite direction to the expected interaction. Nevertheless, to a large extent, visual turns did not interfere with the storage of vestibular turns.
Taken together, these results allow us to conclude that subjects could accurately extract each modality from the encoding of the presented visuo-vestibular turns in order to reproduce it. Therefore, whether or not a combination of cues occurs, the unimodal visual and vestibular information is not lost during the encoding process, which indicates that each modality is stored independently. In fact, merged visuo-vestibular information could also have been stored, but this third encoding would be largely independent of the unimodal encoding. This is consistent with recent findings that showed a dissociation between the processing of visual and vestibular cues in MSTd (Takahashi et al. 2007), suggesting that for self-rotations, the integration of these cues when both are present should occur in another cerebral structure. This storage independence confirms, using the same low-level reproduction task, what had been reported previously using distinct tasks (Lambrey et al. 2002), and is in line with what was reported for independent sensory modalities (Hillis et al. 2002).
Recalling interactions: bimodal reproduction
Two reproduction conditions using different gain factors between the visual scene and the body rotations were studied in order to assess how the modalities interact at the recalling of visuo-vestibular turns. In a first condition, we kept this gain factor at 1.5 as during the presentation, which allowed us to test with a classical multisensory approach how the availability of these two modalities might have improved the reproduction. In a second condition, we deliberately changed this gain factor so as to make the matching of both modalities impossible. This last condition was designed to evaluate where the trade-off between the two modality matchings would be placed.
The performance observed when subjects had to reproduce the turns in the same sensory context (controlling both the body and visual scene rotations with the same discrepancy as during the presentation) shared some of the characteristics of an optimal combination of the signals provided by each of the senses. On the one hand, the average position of the bimodal reproduction of turns lay in between those of the visual and vestibular reproductions alone. In other words, the vestibular part of the bimodal reproduction was pulled towards that of vision alone, and simultaneously the visual part was pulled towards that of body alone. Consequently, since the vestibular reproduction was globally undershot compared to the visual reproduction, the vestibular part and the visual part of the bimodal reproduction were increased and decreased, respectively. In a similar fashion, the peak velocity of the reproduction observed for the bimodal condition was influenced by both intervening sensory cues, corresponding to a significantly faster reproduction than with the body only, but slower than with vision only. On the other hand, the variance of the reproduced turn amplitudes with bimodal stimulation was significantly smaller than the one observed with either unimodal stimulation, whether vestibular or visual.
These findings show that there is an advantage of matching both modalities over matching a single modality, which indicates that a sensory integration process is actually taking place. The multimodal contribution to spatial mechanisms has also been shown in an earlier study for mental rotations (Vidal et al. 2009). Nevertheless, this integration was not optimal as defined in the Bayesian MLE framework, the variance still being significantly higher than what the model predicts. We believe that optimal sensory integration is not achieved because of the indirect nature of the tested perception. All previous studies that found a combination of cues consistent with the MLE model used direct perceptual tasks, in the sense that the physical property to be evaluated did not rely on cues that had to be integrated over time: the cues could be directly sensed. In contrast, for turn perception, the estimated measure, e.g., the rotation amplitude, is not available at once. A temporal integration of visual and/or vestibular cues is required in order to estimate the total amplitude. Furthermore, since the velocity was not constant in our experiment, one can expect the sensory weighting in the integration to evolve: slower rotations could favor the visual input, whereas faster rotations could favor the vestibular input. For these reasons, we believe that the sensory integration in such an indirect perceptual task is not straightforward, which is why it did not follow the MLE predictions.
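For reference, the MLE benchmark mentioned above can be sketched in a few lines. This is the standard reliability-weighted (inverse-variance) combination rule, not code or data from the study: the function name, the argument names, and the numbers are hypothetical, and the two unimodal estimates are assumed to be expressed in a common unit (for instance, reproduced amplitude divided by presented amplitude).

```python
def mle_combination(est_visual, var_visual, est_vestibular, var_vestibular):
    """Reliability-weighted combination of two unimodal estimates.

    Returns the combined estimate, its predicted variance, and the
    weight given to the visual cue.
    """
    # Each cue is weighted by its reliability, i.e. its inverse variance.
    rel_visual = 1.0 / var_visual
    rel_vestibular = 1.0 / var_vestibular
    w_visual = rel_visual / (rel_visual + rel_vestibular)
    w_vestibular = 1.0 - w_visual

    # The combined estimate lies between the two unimodal estimates.
    combined = w_visual * est_visual + w_vestibular * est_vestibular

    # The predicted variance is lower than either unimodal variance.
    combined_var = (var_visual * var_vestibular) / (var_visual + var_vestibular)
    return combined, combined_var, w_visual


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    est, var, w = mle_combination(1.10, 0.01, 0.90, 0.03)
    print(f"combined={est:.3f} variance={var:.4f} visual weight={w:.2f}")
```

Under this rule, a bimodal variance that is reduced but still above the predicted value, as reported here, counts as integration that falls short of optimality.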
Matching problem: vision prevails
If we consider the task as a within-modality matching between the presented stimulus stored in memory and the one being reproduced, we could wonder what happens when both modalities can no longer be matched simultaneously. Modifying the visuo-vestibular gain in the reproduction phase created this situation. Changing from a gain of 1.5 between modalities to a gain of 1.0 introduced a compulsory mismatch of 50% that subjects had to distribute between the visual and the vestibular reproduction. One could expect two extreme strategies to be adopted to face this problem. At one extreme, subjects would evenly distribute this mismatch between the two modalities, which would happen if the importance given to each of these modalities was equivalent for this task. At the other extreme, subjects would try to minimize the mismatch introduced for one particular modality, which would be the prevailing modality for this specific task. The distribution of the mismatch measured in the different gain conditions was closer to the second extreme. Indeed, 84.5% of the mismatch was attributed to the vestibular reproduction whereas only 15.5% was attributed to the visual reproduction, which shows that subjects preferred to adjust the visual rotation correctly, disregarding the much larger vestibular rotation that was associated with it. This finding is valid within the studied velocity range, which covers natural heading changes resulting from full body turns. For higher velocities such as those experienced during head rotations, the visual system might no longer be precise and the vestibular contribution could become predominant.
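The arithmetic behind such a partition can be illustrated with a small sketch. It assumes, hypothetically, that each modality's error is measured against the presented rotation of that same modality; the study may have used a different baseline (for example, the reproduction obtained with the preserved gain). The function name and the angles are illustrative and are not data from the experiment.

```python
def mismatch_shares(presented_body, presentation_gain,
                    reproduced_body, reproduction_gain):
    """Split the imposed mismatch between body and visual rotations.

    Assumption (hypothetical): each error is taken relative to the
    presented rotation in the same modality. Returns the pair
    (body_share, visual_share), which sums to 1.
    """
    # Visual rotations follow from the body rotation and the gain.
    presented_visual = presentation_gain * presented_body
    reproduced_visual = reproduction_gain * reproduced_body

    body_error = abs(reproduced_body - presented_body)
    visual_error = abs(reproduced_visual - presented_visual)
    total = body_error + visual_error
    return body_error / total, visual_error / total


if __name__ == "__main__":
    # Illustrative case: a 60 deg body turn presented with a 90 deg visual
    # turn (gain 1.5), then reproduced with gain 1.0 by turning 85.35 deg.
    body_share, visual_share = mismatch_shares(60.0, 1.5, 85.35, 1.0)
    # Shares come out at approximately 0.845 and 0.155.
    print(round(body_share, 3), round(visual_share, 3))
```

In this illustration the reproduced rotation sits much closer to the remembered visual turn than to the remembered body turn, which is the pattern described above as vision prevailing.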
The purpose of our work was to reinforce the connection between self-motion perception and memorization within the multisensory framework. We focused on the general issue of the sensory basis on which humans can memorize visuo-vestibular turns, and how these senses interact during the encoding and recalling stages.
First, recalling a memorized turn does not rely on the reproduction of the velocity profile, which suggests that for this task the velocity profile might not be stored. Instead, a temporal integration is performed during the encoding stage, which would then be matched to the temporal integration of the produced turn, regardless of the velocity profile used. Second, the unimodal recalling of turns, either visual or vestibular, is independent of the other sensory cue that it might be combined with during the encoding stage. Therefore, turns in each modality (visual and vestibular) are stored independently. Third, when the intersensory gain was preserved during recall, the visuo-vestibular bimodal reproduction of turns was more precise (reduced variance) and lay between the two unimodal reproductions. This suggests that with both visual and vestibular cues available, these combine in order to improve the reproduction (trade-off between modalities). As discussed before, the predictions of the MLE model did not apply, possibly because of the fundamental difference between the instantaneous sensing of a physical property (direct perception) and, as in our experiment, a perception that requires integration over time (indirect perception). Fourth, modifying the reproduction gain resulted in a substantially larger change for the reproduced body rotations than for the reproduced visual scene rotations. Therefore, within the studied dynamic range, which corresponds to a natural range of whole-body rotation speeds involved during navigation, vision prevails when a visual/vestibular matching problem is introduced.
This research was supported by the Max Planck Society. The authors wish to thank Daniel Berger for his support with the experimental set-up, and for his helpful comments on earlier versions of the manuscript.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Loomis JM, Klatzky RL, Golledge RG, Philbeck JW (1999) Human navigation by path integration. In: Wayfinding behavior. The Johns Hopkins University Press, Baltimore and London, pp 125–151
- Vidal M, Lipshits M, McIntyre J, Berthoz A (2003) Gravity and spatial orientation in virtual 3D-mazes. J Vestib Res Equilib Orientat 13:273–286