Distortions of perceived auditory and visual space following adaptation to motion
Adaptation to visual motion can induce marked distortions of the perceived spatial location of subsequently viewed stationary objects. These positional shifts are direction specific and exhibit tuning for the speed of the adapting stimulus. In this study, we sought to establish whether comparable motion-induced distortions of space can be induced in the auditory domain. Using individually measured head-related transfer functions (HRTFs), we created auditory stimuli that moved either leftward or rightward in the horizontal plane. Participants adapted to unidirectional auditory motion presented at a range of speeds and then judged the spatial location of a brief stationary test stimulus. All participants displayed direction-dependent and speed-tuned shifts in perceived auditory position relative to a ‘no adaptation’ baseline measure. To permit direct comparison between effects in different sensory domains, measurements of visual motion-induced distortions of perceived position were also made using stimuli equated in positional sensitivity for each participant. Both the overall magnitude of the observed positional shifts and the nature of their tuning with respect to adaptor speed were similar in each case. In a third experiment, participants adapted to visual motion prior to making auditory position judgements. As in the previous experiments, shifts in the direction opposite to that of the adapting motion were observed. These results add to a growing body of evidence suggesting that the neural mechanisms that encode visual and auditory motion are more similar than previously thought.
Keywords: Auditory · Visual · Motion · Position · Adaptation
Physical objects moving in the environment are rarely perceived by a single sensory system in isolation. Rather, when an object translates across external space in front of an observer, salient sources of motion information are typically available to both the visual and auditory systems. It is well established that the human visual system contains specialised and dedicated mechanisms for analysing object motion (Van Essen and Maunsell 1983; Albright and Stoner 1995). In contrast, our understanding of the processes and structures underlying auditory motion processing remains comparatively poor. Unlike the visual system, where the inputs for position and motion coding are derived from the pattern of light falling on the retina, input to the auditory system contains no inherent spatial organisation. Instead, the position of a sound source is computed based on frequency and timing information transmitted by the cochlea. Three cues are used to encode auditory spatial position and its change over time (motion): interaural timing and level differences that arise from the separation of the ears by the solid mass of the head (Rayleigh 1907), and spectral cues that arise from frequency filtering caused by the interaction of sound waves with the trunk, shoulders and pinnae (Békésy 1960). As a result, any map of auditory space is fundamentally computational, requiring reconstruction based on these cues (King 1993).
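To make the geometry of the interaural timing cue concrete, the short sketch below computes the ITD predicted by Woodworth's spherical-head approximation. This is a standard textbook model offered purely as an illustration; the present study instead used individually measured HRTFs, and the head radius and speed of sound below are assumed values.

```python
import numpy as np

# Woodworth's spherical-head approximation of the interaural time
# difference (ITD). Illustrative only: the study itself used individually
# measured HRTFs rather than this closed-form model.
HEAD_RADIUS = 0.0875    # m, assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at ~20 °C)

def woodworth_itd(azimuth_deg: float) -> float:
    """ITD in seconds for a source at the given azimuth (0° = straight ahead)."""
    theta = np.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + np.sin(theta))

# A source only 10° off the midline yields an ITD of roughly 89 μs,
# an order of magnitude above the ~9 μs discrimination threshold
# (Klumpp and Eady 1956) cited later in the paper.
print(f"{woodworth_itd(10.0) * 1e6:.1f} us")
```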
Two alternative accounts of how the auditory system might encode motion have emerged. The first, termed the ‘snapshot’ hypothesis, suggests that motion information is derived indirectly by tracking changes in object position over time (e.g. Grantham 1986). This could be achieved using the same mechanisms involved in the localisation of stationary auditory stimuli, without recourse to dedicated motion sensors. According to this hypothesis, auditory motion perception should be limited by the auditory system’s sensitivity in detecting auditory features and the precision with which those features are localised in space and time. Proponents of the ‘snapshot’ hypothesis point to the fact that both the detectability (Wilcott and Gales 1954; Grantham and Leuthke 1988; Xiao and Grantham 1997) and the spatial resolution (Harris and Sargeant 1971; Grantham 1986) of moving targets tend to be similar to, or poorer than, comparable measures for stationary targets.
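The snapshot account lends itself to a minimal simulation, given here as an illustration rather than a model from any of the studies cited: speed is recovered by regressing a series of noisy positional estimates against time, so the fidelity of the inferred motion is bounded by static localisation precision.

```python
import numpy as np

# 'Snapshot' motion estimation: infer speed from successive noisy position
# estimates. All parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
true_speed = 16.0                   # °/s
times = np.linspace(0.0, 1.0, 20)   # 20 positional snapshots over 1 s
loc_noise = 1.0                     # °, roughly a static localisation threshold
positions = true_speed * times + rng.normal(0.0, loc_noise, times.size)

# The slope of a straight-line fit to position versus time is the speed estimate.
est_speed, _ = np.polyfit(times, positions, 1)
print(f"estimated speed: {est_speed:.1f} deg/s")
```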
Alternatively, it has been argued that the auditory system does possess dedicated neural structures which directly encode attributes of motion in a manner analogous to the visual system (Perrott and Marlborough 1989; Warren et al. 2002). Two of the fundamental properties of neurons comprising the cortical substrate of visual motion analysis are selectivity for the two components of velocity: direction and speed. Directional selectivity is present as early as the primary visual cortex (V1) but is a dominant characteristic of extra-striate cortical area V5/MT, where in excess of 90% of neurons show marked directional preferences (Zeki 1974; Albright and Desimone 1987). V5/MT neurons also exhibit speed tuning that is invariant to changes in the stimulus pattern, such as spatial frequency, orientation or temporal frequency (Maunsell and Van Essen 1983; Perrone and Thiele 2001; Liu and Newsome 2003; Priebe et al. 2003). If auditory motion is encoded in a similar manner to its visual counterpart, one might expect to find neurons with comparable tuning properties in the auditory system.
A number of physiological studies have provided evidence consistent with the existence of motion sensitive mechanisms in the auditory systems of both cat and monkey. Neurons exhibiting directional selectivity to moving auditory stimuli have been shown to exist, both subcortically (Altman 1968; Altman et al. 1970) and in the primary auditory cortex (AI) (Ahissar et al. 1992). Additionally, recent evidence suggests that some AI neurons also show properties that could be considered consistent with speed tuning (Jenison et al. 2001). However, the proportion of cells exhibiting these qualities is extremely modest compared with the visual system, and it remains to be seen whether a true auditory analogue of V5/MT exists—i.e. an area dedicated to the analysis of motion.
Perhaps the most compelling psychophysical support for direct motion encoding in the visual system comes from the visual motion aftereffect (vMAE). Following prolonged exposure to motion in a particular direction, subsequent viewing of a stationary object elicits the non-veridical percept of motion in the opposite direction. This effect is readily demonstrable, extremely robust and widely accepted as evidence for adaptation of specialised motion-detecting mechanisms (Barlow and Hill 1963). By manipulating the characteristics of adapting and test stimuli, it is possible to demonstrate aftereffects with at least two distinct sets of properties. The first becomes apparent when a stationary test stimulus is used and is thought to reflect adaptation of motion-sensitive cells in V1 (Maffei et al. 1973). The second occurs when a dynamic (or flickering) test stimulus is used and displays properties consistent with adaptation in extra-striate areas of visual motion analysis, most likely V5/MT (Van Wezel and Britten 2002).
Efforts to demonstrate an auditory motion aftereffect (aMAE) have met with mixed results. Early studies typically failed to demonstrate an effect that was both convincing and reliable between observers (e.g. Grantham and Wightman 1979; Grantham 1989; Reinhardt-Rutland 1992). In contrast, the results of recent studies provide more compelling evidence that a robust and replicable aMAE does indeed exist (Grantham 1998; Dong et al. 2000; Neelon and Jenison 2003, 2004). Critically, studies showing a strong aMAE have used stimuli containing all possible sources of motion information, produced either by physically moving a sound source through space (Dong et al. 2000), by filtering sounds through generic head-related transfer functions (HRTFs) (Grantham 1998), or by filtering sounds with individually measured HRTFs (Neelon and Jenison 2003, 2004). In contrast, studies failing to find a substantial aMAE have typically opted to simulate auditory motion via dynamic manipulation of interaural timing and loudness cues (Grantham and Wightman 1979; Ehrenstein 1994; Reinhardt-Rutland 1992). Although these stimuli can produce a sensation of lateral motion, it is possible that they provide suboptimal stimulation to the auditory system.
In recent years, it has been further demonstrated that adaptation to visual motion produces shifts in the perceived position of subsequently presented objects in a direction opposite to that of the adapting stimulus (Snowden 1998; McGraw et al. 2002, 2004; Whitney 2005). These effects demonstrate a direct interaction between neural mechanisms that encode motion and those that contribute to the representation of object position. Shifts in perceived visual position exhibit many tuning properties consistent with dynamic MAEs, but can also be induced using stimulus configurations that do not directly result in the perception of illusory motion (Whitney et al. 2003; McGraw et al. 2004). This has led to the suggestion that these effects constitute a new and distinct class of motion aftereffect (Whitney and Cavanagh 2003).
In this study, we ask if analogous positional distortions also exist in the auditory domain. We demonstrate that adapting to auditory motion induces shifts in the perceived position of stationary auditory stimuli (Experiment 1). We then compare the properties of these shifts with those found in the visual system (Experiment 2). Finally, we investigate whether adapting to motion in one sensory domain (visual) results in shifts in perceived position in another sensory domain (auditory) (Experiment 3).
Four participants (three male) between the ages of 24 and 40 took part. Participants RWD, NWR and PVM were the authors of this study. Participant LKS was naïve to the purposes of the experiment. Each participant gave their informed consent prior to their inclusion in the study. All participants had normal hearing as assessed by standard audiological techniques and normal visual acuity.
Static auditory localisation stimuli were generated by convolving 200 ms bursts of bandpass filtered Gaussian noise (8th order Butterworth filter, 200 Hz–12 kHz passband; 44.1 kHz sampling rate) with a given pair of head-related impulse responses (HRIRs), and delaying each channel by the appropriate interaural time difference (ITD). Adapting stimuli were created in an equivalent fashion, with the exception that HRIRs and ITDs were sequentially updated in 0.1° steps to produce motion in the azimuthal plane. Each motion stimulus traversed a fixed angular extent of 20° azimuth, centred immediately in front of the participant. A range of speeds (2, 4, 8, 16 and 32°/s) was produced by systematically manipulating the length of each noise burst. Both static and moving stimuli were presented via Sennheiser HD 250 headphones at 84 dB SPL and cosine ramped at onset and offset over 5 ms. These stimuli mimicked the complete set of localisation cues normally available to each observer in the free field and produced compelling percepts of sound externalised in space.
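A minimal sketch of this stimulus pipeline is given below, under several simplifying assumptions: the HRIRs are taken to be pre-loaded arrays, the ITD is applied as a whole-sample delay (the sub-sample interpolation implied by 0.1° steps is omitted), and all names are illustrative rather than taken from the authors' code.

```python
import numpy as np
from scipy.signal import butter, sosfilt, fftconvolve

FS = 44100  # Hz, sampling rate stated in the text

def make_static_burst(hrir_left, hrir_right, itd_samples, duration_s=0.2,
                      ramp_s=0.005, rng=None):
    """Bandpass Gaussian noise burst rendered at one location via an HRIR pair.

    hrir_left / hrir_right: measured head-related impulse responses (arrays);
    itd_samples: additional interaural delay applied to one ear, in samples.
    """
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(int(duration_s * FS))

    # 200 Hz-12 kHz Butterworth bandpass; scipy doubles the order for
    # bandpass designs, so N=4 yields the 8th-order filter described above.
    sos = butter(4, [200, 12000], btype="bandpass", fs=FS, output="sos")
    noise = sosfilt(sos, noise)

    # 5 ms raised-cosine onset and offset ramps
    r = int(ramp_s * FS)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(r) / r))
    noise[:r] *= ramp
    noise[-r:] *= ramp[::-1]

    # Spatialise by convolving each channel with its HRIR, then delay one ear
    left = fftconvolve(noise, hrir_left)
    right = np.concatenate([np.zeros(itd_samples), fftconvolve(noise, hrir_right)])
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=1)  # stereo signal, shape (n, 2)
```

Note that with the sweep extent fixed at 20°, the duration of a moving stimulus follows directly from its speed (duration = 20°/speed): 10 s at 2°/s but only 0.625 s at 32°/s.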
Participants were required to judge the relative horizontal positions of the two localisation patches (i.e. did the upper or lower patch appear offset to the right?). A procedure comparable to Experiment 1 was used, whereby the physical offset of the patches was controlled using the method of constant stimuli (7 offsets, 40 trials per offset) and logistic fits to participants’ psychometric functions allowed quantification of both the point of subjective equality (PSE) and the just-noticeable difference (JND). To further aid comparison with the results from Experiment 1, a preliminary experiment was conducted in which baseline performance was measured for a range of different visual patch sizes. Changing the standard deviation of the Gaussian envelope (σ) provides a robust means of manipulating positional sensitivity (Toet and Koenderink 1988; Whitaker et al. 2002). Mapping out this relationship allowed us to individually tailor the visual localisation stimuli for each participant such that positional sensitivity was equivalent to that obtained on the auditory task (Experiment 1). In subsequent adaptation conditions, visual motion stimuli were presented for periods identical to those used in Experiment 1 (60 s initial adaptation plus 10 s top-up adaptation) and motion-induced shifts in perceived visual position were measured using the methods described previously. Because simultaneous motion adaptation was induced in opposite directions above and below fixation, the magnitude of the induced positional offsets was halved in order to allow comparison with the results of Experiment 1.
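As an illustration of this analysis step, the sketch below fits a two-parameter logistic function to hypothetical response proportions to recover the PSE and JND. The data are invented, a simple least-squares fit stands in for whatever fitting routine the authors used, and the 50–75% JND criterion is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, pse, scale):
    """Proportion of 'rightward' responses as a function of physical offset."""
    return 1.0 / (1.0 + np.exp(-(x - pse) / scale))

# Hypothetical data: 7 offsets (°) and the proportion of 'right' responses
# out of 40 trials at each offset
offsets = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
p_right = np.array([0.05, 0.10, 0.30, 0.55, 0.80, 0.95, 0.975])

(pse, scale), _ = curve_fit(logistic, offsets, p_right, p0=[0.0, 1.0])

# PSE: the offset judged 'right' on 50% of trials. JND (one common
# convention): the offset change that moves the response rate to 75%.
jnd = scale * np.log(3.0)
print(f"PSE = {pse:.2f} deg, JND = {jnd:.2f} deg")
```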
Participants were required to complete the auditory localisation task employed in Experiment 1 (7 positions, 40 trials per position, presented using the method of constant stimuli), with the important difference that instead of adapting to auditory motion prior to making auditory localisation judgements, participants adapted to unidirectional (leftward or rightward) visual motion. A single adapting velocity (16°/s) was selected, as this velocity successfully induced shifts in Experiments 1 and 2. Adaptation durations were consistent with those used in Experiments 1 and 2 (60 s initial adaptation, 10 s top-up adaptation between each trial). Data were analysed in the same way as in Experiments 1 and 2.
Auditory localisation thresholds were first obtained for the baseline localisation task (without motion adaptation) to allow comparison between HRTF techniques and free-field procedures. Localisation thresholds were 0.63° (SEM ± 0.10) for PVM, 1.14° (SEM ± 0.25) for RWD, 0.51° (SEM ± 0.11) for NWR and 1.20° (SEM ± 0.23) for LKS. These values compare favourably with localisation thresholds obtained in free-field environments, which are commonly of the order of 1° (Middlebrooks and Green 1991; Chandler and Grantham 1992).
The demonstration of an auditory aftereffect following adaptation to visual motion is consistent with previous findings from Kitagawa and Ichihara (2002) and Ehrenstein and Reinhardt-Rutland (1996). Kitagawa and Ichihara examined the effect of adapting to auditory and visual stimuli, moving in depth, on stationary test stimuli. Following adaptation to visual motion, the authors found that subsequently presented auditory test stimuli of fixed volume appeared to modulate in loudness. However, adapting to an auditory stimulus moving in depth had no measurable effect on static visual test stimuli. It is unclear whether this is because auditory motion adaptation does not induce visual aftereffects, or whether the lack of an aftereffect was a consequence of the paucity of motion cues present in the adapting stimulus used. Our current experimental setup does not easily allow us to resolve this issue. Adapting to auditory motion may shift both the visual test stimulus itself and the visual reference stimulus against which its position is judged. It is for this reason that we used a stable internal auditory reference (perceived midline) for our cross modal experiment. Ensuring the stability of an internal visual reference is considerably more challenging and requires that eye position is held constant. This is impossible to guarantee without the use of image stabilisation devices. The effect of auditory motion adaptation on perceived visual position is certainly an interesting question, and one that will be addressed in greater detail in future work.
Taken together, these experiments demonstrate that adaptation to auditory motion produces shifts in the perceived position of stationary test stimuli that are comparable with those observed in the visual domain. The observed shifts in perceived position are direction specific and band-pass tuned for adaptor speed. The localisation shifts that occur following adaptation to both visual and auditory motion demonstrate some striking similarities. Apart from the fact that both are direction specific, the magnitudes of the shifts are similar when expressed in units of sensitivity, and both show band-pass tuning that is largely governed by adaptor speed. The shape of speed tuning for visual stimuli is governed by the spatial frequency of the adapting pattern and consists of two distinct regions. Initially, the magnitude of the motion-induced positional offset increases with adaptor speed and is essentially independent of adaptor spatial frequency. However, for each individual spatial frequency the peak offset occurs at a specific adaptor speed. Beyond this peak, stimulus visibility is reduced and the magnitude of the positional offset declines. The reduction in the size of the positional offset is entirely consistent with changes in spatiotemporal contrast sensitivity which occur as stimulus velocity is increased (Kelly 1979). The fact that auditory stimuli show similar band-pass speed tuning is somewhat surprising given the superior temporal processing properties of the auditory system. Differences in acoustic and photochemical transduction times in each sensory system set very different limits on the upper velocity of motion that can be encoded (Lennie 1981; King and Palmer 1985). Thresholds for discriminating ITDs are in the region of 9 μs (Klumpp and Eady 1956) and psychoacoustical studies demonstrate that human observers are more than capable of detecting velocities up to 360°/s (Chandler and Grantham 1992)—well beyond the 32°/s upper limit of our auditory shifts in position. This suggests that the locus of our effect might lie in cortical areas with poorer temporal processing capabilities than those typically encountered in the auditory system. Further support for this notion can be drawn from our third experiment, where we demonstrated cross-modal effects (adapting to visual motion and testing with spatially localised auditory stimuli) that were also direction specific and similar in magnitude.
It is well known that the perceived location of an auditory stimulus can be influenced by stationary sounds presented in other regions of auditory space. Following a period of adaptation to a stationary sound located in a particular region of the auditory field, human listeners typically misperceive the location of subsequent sounds in a direction consistent with them appearing spatially repulsed from the adapted location (Taylor 1962; Thurlow and Jack 1973; Carlile et al. 2001). These effects are thought to reflect selective adaptation of auditory neurons tuned to particular regions of auditory space. Indeed, analogous effects are also found in the visual system where adaptation to a static visual target has a marked influence upon the perceived position of subsequently presented stimuli (Whitaker et al. 1997). However, static adaptation effects of this type are extremely unlikely to explain the present findings. Although each period of auditory motion adaptation was centred on the listener’s midline, the starting position of the first motion sweep in the series was randomised on each presentation. This meant that whilst both leftward and rightward adapting sequences traversed identical spatial extents, importantly, the start and end points were randomly distributed across the adapted region of auditory space. Therefore, no single auditory location was systematically subjected to the adapting stimuli either at onset or offset. Nonetheless, we found robust direction-dependent shifts in perceived position. Clearly these results are a consequence of adaptation to coherent unidirectional motion, rather than to any individual spatial components contributing to the motion sequence.
The neural processes that mediate motion-induced positional shifts have recently been investigated in both the visual and auditory systems. Studies of neurons in the cat (Fu et al. 2004) and monkey (Sundberg et al. 2006) visual cortex have suggested that motion signals have the capacity to induce dynamic shifts in the spatial representation of receptive fields (RFs). Similarly, a very recent study examining auditory neurons in the owl’s optic tectum has also demonstrated a dynamic shift of spatial RFs in the presence of auditory motion (Witten et al. 2006). Critically, however, in each sensory domain the reported RF shift is in opposite directions. In the visual studies, the induced displacement in RF spatial representation occurs in a direction opposite to that of the motion signals inducing it. At first glance this seems surprising, since it is in exactly the opposite direction to behavioural measurements of motion-induced positional shifts in human subjects, where the perceived location of a stationary object is displaced in the direction of motion (Ramachandran and Anstis 1990; De Valois and De Valois 1991; Arnold et al. 2007). However, as retinotopic location is derived from a population response (e.g. a vector average code, Georgopoulos et al. 1986), in order to produce a shift in the positional average derived from all active neurons, individual neurons would need to modify their spatial profiles in the opposite direction, whilst retaining their original positional labels within the map. In this scheme, shifts in the spatial profiles of individual neurons in one direction result in a mislocalisation of the vector average response in the opposite direction (Sundberg et al. 2006). Although visual RF shift models are compatible with the psychophysical observations we report here, where the spatial displacement of both visual and auditory test stimuli is in the opposite direction to that of the adapting stimulus, recent attempts to replicate the observations from animal studies using human functional neuroimaging (fMRI) have failed. Specifically, rather than showing a displacement in the spatial preference of cortical RFs when motion signals are present, the retinotopic map of visual space found in the primary visual cortex (V1) remains invariant when measured with stimuli moving in different directions (Liu et al. 2006).
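The population-response argument can be made concrete with a small numerical example. This is a sketch under assumed Gaussian receptive fields, not a model from any of the studies cited: displacing every RF opposite to the motion direction, while leaving each neuron's positional label fixed, shifts the vector-average estimate in the direction of motion.

```python
import numpy as np

# Labelled-line population code: each neuron keeps its positional label, but
# its Gaussian receptive field (RF) is displaced opposite to the motion
# direction. All parameter values are illustrative assumptions.
labels = np.linspace(-10.0, 10.0, 201)  # preferred positions, °
rf_width = 2.0                          # RF standard deviation, °
rf_shift = -0.5                         # RF displacement, ° (opposite to motion)
stimulus = 0.0                          # true position of a stationary test, °

def decode(rf_centres):
    """Vector-average readout: activity-weighted mean of the position labels."""
    activity = np.exp(-0.5 * ((stimulus - rf_centres) / rf_width) ** 2)
    return np.sum(labels * activity) / np.sum(activity)

print(decode(labels))             # baseline: decodes ~0.0°
print(decode(labels + rf_shift))  # shifted RFs: decodes ~+0.5°, with the motion
```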
The RF shifts reported in the auditory domain, on the other hand, are qualitatively different to their visual counterparts (Witten et al. 2006). That is, the receptive field is displaced in the direction of motion, and presumably in the direction opposite to this following a period of adaptation, though this has not been explicitly tested. Furthermore, in the auditory domain, the behavioural and neurophysiological measures differ considerably in their speed tuning. We show that auditory motion-induced shifts in perceived location are band-pass tuned for speed, becoming minimal by 32°/s. In contrast, auditory motion-induced RF shifts recorded in the owl tectum are about twice as large and increase systematically as a function of speed, reaching a maximum value by 32°/s (Witten et al. 2006). Given this difference in operational characteristics, it seems likely that these motion-induced shifts serve different functional roles. Witten and colleagues argue that the speed-dependent nature of their RF shifts suggests they play an important role in adaptively compensating for unavoidable processing delays that are inherent to the analysis of moving objects. The band-pass tuning of our behavioural effects is obviously inconsistent with such a role. Moreover, auditory RF shifts are mediated by a displacement of both edges of the receptive field (Witten et al. 2006), whilst motion-induced spatial misperceptions measured psychophysically, in the visual domain at least, are brought about by changes in apparent contrast at only one edge of the stimulus (Arnold et al. 2007; Tsui et al. 2007). Clearly, the physiological processes that mediate motion-induced positional shifts remain uncertain. What we do know is that despite reported differences in the underlying neurophysiology gathered from different species, the perceived displacement of stationary visual and auditory stimuli following motion adaptation is very similar in humans.
Within the visual domain, virtually every strand of previous research places the locus of interaction between motion and position at the level of area V5/MT. Adaptation-induced positional shifts display a lack of specificity for basic stimulus properties such as spatial frequency, orientation and contrast, consistent with adaptation occurring at this level (McGraw et al. 2002; Whitney 2005). In addition, disruption of ongoing cortical activity (using transcranial magnetic stimulation) immediately following motion adaptation dramatically reduces the magnitude of the perceived spatial shifts that normally occur when it is delivered to V5/MT, but has little or no effect when delivered to earlier cortical areas (V1) (McGraw et al. 2004). As yet, no auditory analogue of V5/MT has been firmly established. Candidate areas that have been investigated include the planum temporale (PT) (Baumgart et al. 1999), the premotor cortex (PMC) (Griffiths et al. 2000) and the right parietal cortex (rPC) (Griffiths et al. 1998). However, most studies that have implicated these areas have contrasted neuronal activity associated with auditory motion with that of stationary control stimuli. Such a comparison is problematic, since selective cortical activity might still simply represent a response to source location, or changes therein, rather than motion per se. Indeed, when coherent motion is compared with a more appropriate control, which presents stationary stimuli that randomly change location over time, neuronal activity in areas such as PT and rPC is very similar, arguing against either being the seat of a specialised auditory motion area (Smith et al. 2004, 2007).
The inability to identify a dedicated auditory motion complex is not incompatible with the existence of a specialised auditory motion processing system. The visual and auditory systems may simply display differences in the spatial concentration of direction-selective neurons. In area V5/MT virtually all neurons are direction selective, whereas in AI motion-selective neurons are typically interspersed with non-motion-selective neurons (Ahissar et al. 1992). The fact that motion-selective neurons are fewer in number and dispersed over wider cortical areas may make any auditory motion processing system more difficult to detect using current neuroimaging techniques. An alternative and intriguing possibility is that auditory positional shifts are mediated by cortical areas traditionally thought to analyse visual motion. One candidate is area V5/MT, which has been shown to be active when subjects are presented with both auditory (Poirier et al. 2005) and tactile (Hagen et al. 2002) motion. Such a mechanism might explain our results by updating auditory position with respect to ongoing motion activity in the visual cortex. Results from our third experiment are consistent with this possibility, as the magnitude of auditory position shifts is similar regardless of whether participants adapt to visual or to auditory motion. However, alternative interpretations are possible. A recent investigation has shown that another form of auditory aftereffect (a loudness aftereffect), resulting from exposure to radial visual motion, is mediated by a high-level attention-guided integrative spatial mechanism that operates between the visual and auditory systems (Hong and Papathomas 2006).
The existence of a specialised auditory motion processing system remains a matter of some debate. Psychoacoustical studies have produced conflicting evidence. Some support a system that infers motion direction and speed from a series of positional estimates accumulated over time—the so-called snapshot hypothesis (Grantham 1986). Others point to the reliable perception of motion properties such as acceleration and deceleration, where the time and distance travelled by motion stimuli are equated, as evidence for more sophisticated motion processing than that offered by a simple position-based mechanism (Perrott and Marlborough 1989; Perrott et al. 1992). Our results are clearly more compatible with the latter proposal. In the visual domain it is well established that motion detectors can be influenced by adaptation and this process results in a variety of perceptual aftereffects, such as the motion aftereffect (Barlow and Hill 1963) and adaptation-induced changes in spatial position (Snowden 1998; McGraw et al. 2002; Whitney and Cavanagh 2003; McGraw et al. 2004; Whitney 2005). However, systems that track features over time are not thought to induce such effects in static test stimuli (Anstis 1980; Badcock and Derrington 1985). Here we show that the perceived location of static auditory stimuli is directly modified by motion adaptation. We also show that these effects share similar properties with their visual counterparts. These results add to a growing literature highlighting considerable overlap in the processing mechanisms that encode visual and auditory motion.
This work was funded by The Wellcome Trust, UK. We are grateful to Bose Ltd., Knowles Europe and Oto-tech Limited for providing equipment for this experiment.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Békésy GV (1960) Experiments in hearing. McGraw Hill, New York
- Grantham DW (1998) Auditory motion aftereffects in the horizontal plane: the effects of spectral region, spatial sector and spatial richness. Acta Acustica 84:337–347
- Grantham DW, Wightman FL (1979) Auditory motion aftereffects. Percept Psychophys 26:403–408
- Pralong D, Carlile S (1996) Generation and validation of virtual auditory space. In: Carlile S (ed) Virtual auditory space: generation and applications. Landes, Austin, pp 109–152
- Rayleigh L (1907) On our perception of sound direction. Philos Mag 13:214–232
- Wightman F, Kistler D (2005) Measurement and validation of human HRTFs for use in hearing research. Acta Acustica 91:429–439
- Wilcott RC, Gales RS (1954) Comparison of the masked thresholds of a simulated moving and stationary auditory signal. J Exp Psychol 26:136
- Zeki SM (1974) Functional organization of a visual area in the posterior bank of the superior temporal sulcus of the rhesus monkey. J Physiol (Lond) 236:549–573