In this experiment we investigated the effect of task-induced attention to a particular modality on the integration of visual and inertial cues during actively reproduced turns. These effects were compared between trials in which cue conflicts were perceived and trials in which they were not.
We found that asking participants to use one of the two modalities while ignoring the other significantly affected relative cue weighting. However, the influence of the ignored cue could not be completely suppressed. The cross-modal bias of vision on the inertial rotation estimate was stronger than the bias of inertial cues on the visual rotation estimate.
We also found that the amount of influence that the task had on the relative cue weights in the responses correlated with an awareness of the cue conflict. In trials in which no cue conflict was noticed, the effect of the task on the responses was not significant. If, however, conflicts were noticed, attending to one modality increased its weight in the combined estimate. In the following, we discuss the different experimental findings in more detail.
Over- and under-rotations in the turn reproduction task
We found that participants under-rotated during large rotation trials (25°) in all four conditions BO, VO, BA, and VA. Under-rotations for actively returning a passively presented whole-body rotation have also been observed in a previous study by Israël et al. (1996), though for much larger rotation angles (180°). Such an under-rotation is surprising at first glance: if both the passively presented and the actively reproduced turns were misestimated similarly, any under- or overestimation of the turn angles should cancel out in the turn reproduction task. The under-rotation we find must thus come from a difference in how the two rotations in a trial are misestimated. The effect could be explained by a stronger underestimation of passive than of active turns. Alternatively, it could be explained by a stronger underestimation of the first compared to the second turn, for example, if the memory of the first rotation decays over time. A third possibility is that since active returns were under the control of the participants, their motion profiles differed from a raised cosine, which could influence the perceived size of the rotation.
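To make the cancellation argument explicit, a minimal sketch with illustrative gain parameters (not quantities we measured): suppose the passive turn $\theta$ is perceived with gain $g_p$, the active return with gain $g_a$, and the participant rotates until the perceived return matches the memorized first turn,

\[ g_a\,\theta_{resp} = g_p\,\theta \quad\Rightarrow\quad \theta_{resp} = \frac{g_p}{g_a}\,\theta . \]

Any common misestimation ($g_p = g_a$) cancels exactly; the under-rotation we observe for 25° turns implies $g_p < g_a$, i.e., the passive (or memorized) turn is effectively estimated as smaller relative to the active return.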
Jürgens and Becker (2006) proposed a model for human rotation perception, based on experimental evidence, in which a “default velocity” represents a top-down prior which is integrated with bottom-up sensory cues in a Bayesian framework. This default velocity could be adapted to the average velocity during an experiment, and draw responses towards an average value. Jürgens and Becker found that this “tendency towards the mean” is reduced when more sensory cues are available, which is consistent with the idea that the tendency towards the mean is caused by a top-down Bayesian prior.
A similar prior, representing a very small or even zero rotation (0°), could be the reason for the underestimation in our experiment. The Bayesian framework predicts that the less reliable the sensory estimate, the stronger the effect of this prior. Since the first turn has to be kept in memory, a decrease in the reliability of the memorized first turn would increase the influence of a zero-rotation prior on that turn, reducing its effective size, so that participants would turn less in their active response.
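Under standard Gaussian assumptions, this shrinkage can be sketched as follows (the variances here are illustrative, not measured in this study): combining a zero-mean prior with variance $\sigma^2_{prior}$ and a memorized sensory estimate $\hat{\theta}$ with variance $\sigma^2_{mem}$ yields the posterior mean

\[ \theta_{post} = \frac{\sigma^2_{prior}}{\sigma^2_{prior} + \sigma^2_{mem}}\,\hat{\theta}, \]

so as memory noise $\sigma^2_{mem}$ grows over the retention interval, the estimate of the first turn is pulled increasingly towards 0°, and the active return shrinks accordingly.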
A different reason is suggested by the results of a study by Lappe et al. (2007). They showed that whether traveled distances are under- or overestimated depends on the task given to the participant. The difference is explained by leaky integration, during the movement, of either the distance from the starting point or the remaining distance to a target. If participants had to travel to a previously indicated target, they did not travel far enough (they thought they had traveled farther than they actually did); when they had to indicate the starting position after moving, they placed it too close (they thought they had moved less than they actually did).
In our experiment, both cases applied. Specifically, during the passively presented rotation, participants had to keep track of the starting position, and during the active return they had to update the remaining angle to the target. Following Lappe et al. (2007), participants would underestimate the passive rotation (believing they had turned less than they actually did) and overestimate their active return (believing they had turned farther than they actually did). Consequently, both effects would add up and cause an under-rotation in the response, which is consistent with our findings.
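A minimal sketch of such a leaky integrator (with hypothetical gain $k$ and leak rate $\alpha$; Lappe et al. (2007) fit a closely related model to their distance data): the represented angle $x$ is updated at each step by

\[ x \leftarrow x + k\,\Delta\theta - \alpha\,x, \]

where $\Delta\theta$ is the angle moved in that step, added when tracking the angle from the start (passive phase) and subtracted when counting down the remaining angle to the target (active return). The leak term $-\alpha x$ makes the tracked angle from the start undershoot the true angle, and makes the remaining angle reach zero too early, so both phases push the response towards under-rotation.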
Effect of turn size on cue weights
We found a significant effect of the target angle on the cue weights, with a higher visual weight for small rotations (10° target angle) and a lower visual weight for large rotations (25° target angle), while the opposite was true for inertial cue weights.
In this experiment, all passively presented rotations were 3 s long, independent of the rotation angle. This means that larger rotations also had higher accelerations and velocities than smaller rotations. Therefore, we could not determine whether the differences in cue weighting depended on rotation angle, velocity, or acceleration.
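For reference, a sketch of the kinematics, assuming the raised cosine describes the velocity profile of the passive rotations: a turn of amplitude $A$ completed in $T = 3$ s then has

\[ v(t) = \frac{A}{T}\left(1 - \cos\frac{2\pi t}{T}\right), \]

with mean velocity $A/T$ and peak velocity $2A/T$. The 10° rotations thus had a mean velocity of about 3.3°/s (peak 6.7°/s) and the 25° rotations about 8.3°/s (peak 16.7°/s), with peak acceleration $2\pi A/T^2$ scaling in the same 2.5:1 ratio.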
If the MLE framework is correct, cue weights should follow the respective reliabilities of the individual cues. Our results would then imply that visual rotation estimates were relatively more reliable for small angles and inertial estimates relatively more reliable for larger angles. This is plausible: the smallest of our rotations (3.3°/s average) were quite close to vestibular thresholds, whereas the largest rotations (8.3°/s average) were well above them. The threshold for such cosine yaw rotations of about 3 s duration is around 1.5°/s in darkness and 0.55°/s in the presence of a visual target (Benson et al. 1989; Benson and Brown 1989), although the variability across participants is high. The threshold for visual motion is lower, in the range of 0.3°/s (Mergner et al. 1995).
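For reference, the MLE prediction under Gaussian noise (a sketch; the single-cue variances are illustrative, not quantities we report here): with visual and inertial variances $\sigma^2_{vis}$ and $\sigma^2_{in}$, the optimal visual weight is

\[ w_{vis} = \frac{1/\sigma^2_{vis}}{1/\sigma^2_{vis} + 1/\sigma^2_{in}} = \frac{\sigma^2_{in}}{\sigma^2_{vis} + \sigma^2_{in}}, \qquad w_{in} = 1 - w_{vis}, \]

so any factor that raises $\sigma^2_{vis}$ relative to $\sigma^2_{in}$ as turns get larger, such as the screen limitation discussed next, should lower the visual weight, as we observe.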
Also, for larger visual rotations, the visual target region went off the projection screen, which may have reduced the reliability of the estimate of the size of the visual turn. It is therefore reasonable to assume that the reliability of the inertial cue increases with respect to the reliability of the visual cue as the turn size increases.
Effects of attending to a modality and noticing cue conflicts
Participants responded differently when they were instructed to use visual cues than when they were instructed to use inertial cues for their response. This suggests that the cues are not always mandatorily fused into a single “combined percept” of the rotation. Instead, our results indicate that participants put more weight on the modality that was to be attended according to the task instructions. Further analyses showed that the effect of task-defined attention on the response was significant only in trials in which participants reported noticing a difference between the visual and the inertial rotation, and not in trials in which they reported noticing no difference. This suggests that participants could use task-defined attention to reduce the influence of a to-be-ignored cue on the response, particularly when they noticed that it conflicted with the to-be-attended cue. However, even when participants noticed cue conflicts, the ignored modality still had an effect on the response. Thus, participants did not have access to pure sensory signals from the individual modalities during cue conflict conditions and could not simply use one cue for the response while completely ignoring the other. This is consistent with findings by Bertelson and Radeau (1981) on visual-auditory cue integration, which showed that a cross-modal influence on stimulus localization can still occur even when cue conflicts are noticed.
In another experiment investigating the influence of a concurrent visual stimulus on auditory localization, Wallace et al. (2004) asked participants after each trial whether they had perceived the signals in the two modalities as coming from the same or from different sources. Comparable to the results in our study, they found that the reported auditory stimulus location was strongly drawn towards the location of the visual stimulus only when participants reported that they had perceived visual and auditory stimuli as a unified event, but not when they were perceived as separate events. Also, the response variability was lower when visual and auditory events were perceived as unified. This is consistent with our results, as we also found lower response variances when participants did not perceive a conflict between visual and inertial cues. We found an indication of true integration, as evidenced by a reduction of the response variance in comparison to the single-cue conditions, only in the VA condition when the visual cue was to be attended, and only if no cue conflicts were reported. In that condition, the variance of the responses was not statistically distinguishable from what would be expected if visual and inertial cues were optimally integrated in a maximum-likelihood fashion.
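The benchmark for this last comparison is the standard MLE prediction, sketched here under Gaussian assumptions: optimal integration yields a combined variance

\[ \sigma^2_{vi} = \frac{\sigma^2_{vis}\,\sigma^2_{in}}{\sigma^2_{vis} + \sigma^2_{in}} \le \min\left(\sigma^2_{vis},\, \sigma^2_{in}\right), \]

so a response variance below the better single-cue variance, as in the VA condition without reported conflicts, is a signature of true integration rather than of switching between cues.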
Lambrey and Berthoz (2003) also investigated the effect of awareness of conflicts between visual and body yaw rotations on cue weights, but in a different way. They did not instruct the participants to use one of the modalities and ignore the other, nor did they tell them about the cue conflicts. During the experiment, the participants were questioned repeatedly to find out at what point they became aware of the conflict. The experimenters then compared the cue weights before and after the participants became aware of the conflict. They found that about half of their participants had a bias towards using visual cues, and the other half towards inertial cues; after the participants became aware of the conflict, the bias towards the preferred cue increased. Our results agree with those findings, and additionally show that task-induced attention can select the preferred cue. Our results also show that the strength of the attentional bias can change on a trial-by-trial basis as a function of the current awareness of a conflict when participants are not naïve about the possible occurrence of cue conflicts.
Helbig and Ernst (2008) performed a visual-haptic cue integration experiment to investigate the effect of attention to a modality on multimodal integration. They differentially manipulated the resources available for processing visual and haptic signals by introducing a secondary task. They did not find an effect of the secondary task on cue weights, showing that visual-haptic cue integration was immune to their manipulation of attentional resources. Since they kept cue conflicts small enough that participants did not notice them, these results are also consistent with the present study.
Some caveats
There are a few caveats when interpreting the results of this study. First, since participants always rotated back actively, we could not control the duration and movement profile of the second turn in each trial. Many participants, but not all, tried to imitate the raised-cosine rotation profile of the passive rotations. The variability in the motion profile of their active turns might add to the response variability we find within and across participants. Also, the rather short delay between the passive rotation and the active return might contaminate our results with rotation aftereffects. Such aftereffects cannot, however, explain any response differences as a function of task instruction or conflict awareness. Moreover, Siegler et al. (2000), who presented much larger and longer rotations to their participants, found no difference in response accuracy between yaw turns reproduced immediately after presentation and those reproduced after post-rotatory sensations had ended.
We found a correlation between becoming aware of a cue conflict and the strength of the task-defined attentional bias towards one of the modalities. However, we could not determine from this study whether this is a causal relationship and, if so, in which direction the causation runs. Further experiments will be needed to evaluate whether noticing a cue conflict strengthens top-down attentional influences, or whether participants become aware of cue conflicts more often when they are more attentive.
Neural basis of self-motion perception
The neural basis of the multimodal perception of self-motion in humans is still obscure. Most imaging methods do not allow movement of the participant’s head, and imaging methods that would be feasible during self-motion, e.g., EEG and near-infrared spectroscopy (NIRS), have a rather low spatial resolution. Some studies have investigated brain activity in response to large-field optic flow stimuli (Brandt et al. 1998; Beer et al. 2002; Kleinschmidt et al. 2002; Deutschländer et al. 2004; Wall and Smith 2008) and interactions of visual and vestibular self-motion signals by using caloric stimulation (Deutschländer et al. 2002). Since the stimuli used are not true self-motion stimuli, it remains unclear whether the same results would be obtained with actual self-motion. Some of the observed effects might be attributable to discrepancies of vestibular and visual stimulation.
Animal studies have shown that vestibular and visual signals already interact at the level of the vestibular nuclei (Henn et al. 1974) and that several separate but interconnected regions in the cortex process vestibular information. Most importantly, these include the posterior insular region termed the “parieto-insular vestibular cortex” (PIVC) as well as regions in parietal, somatosensory, cingulate, and premotor cortices (Guldin and Grüsser 1998).
Studies on the cortical processing of self-motion stimuli using single-cell recordings in animals have focused on only a few regions, in particular area MSTd (Froehler and Duffy 2002; Gu et al. 2007; Takahashi et al. 2007; Morgan et al. 2008; Britten 2008) and the ventral intraparietal area VIP (Bremmer et al. 2002a, b; Britten 2008). It has been suggested that cortical area MSTd, which contains cells that are sensitive to large fields of visual motion and also to vestibular stimulation, could be a central cortical area for the integration of visual and vestibular self-motion signals. A subpopulation of the neurons in MSTd shows responses to visual and vestibular translations that are consistent with a Bayesian cue integration model (Gu et al. 2007; Morgan et al. 2008). For rotations, however, such cells were not found (Takahashi et al. 2007): virtually all cells in MSTd that are sensitive to both visual and vestibular rotations prefer opposite rotation directions for the two modalities. This suggests that instead of integrating visual and vestibular stimuli, cells in MSTd remove head rotations from the visual motion signal, so that the resulting responses represent movement of objects in space while discounting self-rotation. MSTd is therefore most likely not the brain region in which visual and vestibular signals of self-rotation are integrated.
For yaw rotations like those investigated in this experiment, it has been shown in rats that a special system of interconnected subcortical and cortical regions maintains and updates a heading signal based on visual, vestibular, and somatosensory cues: the so-called “head-direction cell system” (Taube et al. 1990a, b; Taube and Bassett 2003; Zugaro et al. 2000). Although mostly studied in rats, such cells have also been found in primates (Robertson et al. 1999), which suggests that they might also be present in humans and might play a larger role in the cortical processing of yaw self-rotations than currently recognized.