Crossmodal integration; Multisensory integration; Intermodal; Heteromodal; Polymodal; Supramodal
Multimodal (or multisensory) integration refers to the neural integration or combination of information from different sensory modalities (the classic five senses of vision, hearing, touch, taste, and smell, and, perhaps less obviously, proprioception, kinesthesis, pain, and the vestibular senses), which gives rise to changes in behavior associated with the perception of and reaction to those stimuli [1,2]. Information is typically integrated across sensory modalities when the sensory inputs share certain common features. For example, although vision is concerned with a certain frequency band of the electromagnetic energy spectrum, and hearing with changes in pressure at the ears, stimulus features such as spatial location, movement, intensity, timing, and duration, as well as higher-order features such as meaning and identity, can apply equally to information from several (or all) sensory modalities. Crossmodal integration is often used synonymously with multimodal integration; however, the latter term has various other associations in different disciplines, including in describing the use of more than one measuring system. The former term, crossmodal, may therefore be preferable.
Multimodal integration is more often used to refer to integrative processes operating at the systems level, studied most commonly using brain imaging techniques alongside behavioral and perceptual measurements. Multisensory integration, on the other hand, tends to refer to the combinatorial effects of stimulation of two or more senses on the activity of single neurons, measured electrophysiologically in experimental animals. Since multisensory integration is more commonly used in the context of single-cell recordings, often made under anesthetized conditions, causal relationships to the behavioral outcomes of multisensory integration are less certain, although this is currently an area attracting considerable research interest.
An extensive body of experimental research has shown that many cognitive systems operate in a multimodal manner. Such systems include those responsible for selective attention and orientation to external stimuli, along with both more elementary perceptual effects and higher-level cognitive systems such as memory. For example, hearing another person speak in natural conversation while seeing the speaker's lip movements is an everyday instance of multimodal integration, involving both low-level perceptual features, such as detecting sounds and lip movements, and higher-level linguistic and semantic factors.
In a typical experiment designed to study multimodal attentional orienting, participants may be asked to pay attention and respond only to tactile stimuli presented to a certain hand (e.g., their left hand), and to ignore both tactile stimuli presented to the other hand (i.e., the right hand) and visual stimuli presented to either hand. Typically, visual stimuli presented close to the attended hand result in larger activation (as measured, for example, using electroencephalographic (EEG) or functional magnetic resonance imaging (fMRI) techniques) than visual stimuli presented on the unattended side. This is true even though the visual stimuli are not relevant to the participants’ task. These, and many other similar results, suggest that the mechanisms of spatial attention may operate in a multimodal or supramodal fashion, facilitating the detection and discrimination of stimuli from a given location regardless of the stimulus modality. The behavioral and neurophysiological effects of attending to a primary modality on responses to a secondary modality, however, are usually smaller than the effects in the primary modality itself. This latter result suggests that unimodal and multimodal perceptual and attentional mechanisms operate in concert.
In order for multisensory or multimodal integration to occur, information must first have been processed within the component unimodal sensory systems. The level and extent of this prior unimodal processing, however, depends on the system under study. In the superior colliculus (SC), for example, visual and auditory inputs are integrated very early on: the retina sends visual projections directly to the SC, while auditory inputs reach the SC only a few synapses after initial sensory transduction at the cochlea. Conversely, stimuli that are involved in multisensory integration in the cerebral cortex may undergo substantial unimodal processing prior to integration, lasting many tens or even hundreds of milliseconds. Recent research, on the other hand, is beginning to detail the extent to which different sensory processing streams interact at very early stages of processing – as early as 45 ms in the case of visual and auditory processing. This physiological evidence is supported by the existence of distinct anatomical connections between the primary sensory areas of several different sensory systems. Increasingly, it appears that multimodal integration and interaction is the rule, not the exception, at all levels of processing.
Following many years of detailed study on the integration of multisensory inputs in neurons of the superior colliculus, several guiding principles of multisensory integration have emerged. These principles have since been applied to determine whether a particular brain region is involved specifically in multisensory integration, both at the level of single cells, in neurophysiology, and at the level of the whole brain, in neuroimaging.
First, inputs to any given neuron or brain area must typically arrive at the same time in order to be integrated and to have significant behavioral consequences. Depending on the specific brain area, “at the same time” typically refers to a “temporal window for multisensory integration.” The width of the temporal window, that is, the maximum temporal delay between the arrivals of inputs from different sensory modalities, may be on the order of 100–300 ms.
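The temporal rule amounts to a simple coincidence check on arrival times. A minimal sketch, using a nominal 200 ms window chosen from the 100–300 ms range cited above (function name and values are illustrative, not from any particular study):

```python
def within_temporal_window(t_visual_ms, t_auditory_ms, window_ms=200.0):
    """Toy check of the temporal rule: two inputs are candidates for
    integration only if their arrival times differ by no more than the
    temporal window (here a nominal 200 ms)."""
    return abs(t_visual_ms - t_auditory_ms) <= window_ms

print(within_temporal_window(50.0, 180.0))  # True: arrivals 130 ms apart
print(within_temporal_window(50.0, 400.0))  # False: arrivals 350 ms apart
```

In practice the window is not a hard cutoff: the probability and strength of integration fall off gradually as the delay between inputs grows.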
Second, in many brain areas, particularly those concerned with spatial representations of visual, tactile, and auditory stimuli, multisensory integration is enhanced for those stimuli that arise from the same external spatial location as compared to different locations. The “same” location in the case of audio-visual speech integration, for example, would be the speaker's mouth, from which both visual and auditory signals arise. In the case of visual and tactile stimuli, the same location might refer, for instance, to the lower left portion of the visual field, and to the animal's front left leg, or the lower-left side of the organism's face.
Third, one important aspect of multisensory integration at the neural level relates to the relative strength of inputs from different sensory modalities and the relative amplification that occurs in the process of multisensory integration. This “principle of inverse effectiveness” states that the relative enhancement due to multisensory integration is larger for those stimuli that produce weak sensory effects on their own, and is smaller for stimuli that cause strong activations at the neural level.
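In the single-neuron literature this relative enhancement is conventionally expressed as a percentage change of the multisensory response over the best unisensory response. A minimal sketch, with made-up firing rates purely for illustration:

```python
def enhancement_index(multisensory, best_unisensory):
    """Conventional multisensory enhancement index: the percentage by which
    the multisensory response exceeds the best unisensory response."""
    return 100.0 * (multisensory - best_unisensory) / best_unisensory

# Weak unisensory responses -> large proportional enhancement
weak = enhancement_index(multisensory=12.0, best_unisensory=4.0)    # 200.0 %
# Strong unisensory responses -> smaller proportional enhancement
strong = enhancement_index(multisensory=60.0, best_unisensory=50.0)  # 20.0 %
print(weak, strong)
```

The two hypothetical neurons illustrate inverse effectiveness: the absolute increase is similar (8 vs. 10 spikes/s), but the proportional enhancement is an order of magnitude larger for the weakly driven neuron.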
The superior colliculus in the midbrain integrates visual, somatosensory, and auditory inputs in the generation and control of spatial orienting behaviors, particularly those concerning eye and head movements. The SC has been studied intensively as a model system for multisensory integration in animals. More recent research has examined the extent to which the spatial and temporal rules of multisensory integration, as measured in the SC, also apply to higher-order behaviors and cognition.
The posterior parietal cortex contains multiple cortical regions, which respond in a variety of ways to visual, somatosensory, auditory, proprioceptive, and vestibular inputs, such as Brodmann's areas 5 and 7 (or the superior and inferior parietal lobules, respectively), and the multiple, heterogeneous areas within the intraparietal sulcus (e.g., ventral, anterior, and medial intraparietal areas). Somatosensory information is processed initially in Brodmann's areas 3 and 1, the primary somatosensory cortices. Somatosensory processing then proceeds posteriorly through areas 2 and 5, into the anterior or medial bank of the intraparietal sulcus. Visual stimuli are processed initially in the primary and secondary visual cortices, proceeding along the dorsal and ventral visual streams. In the intraparietal sulcus (IPS), the dorsal visual stream meets the somatosensory processing stream. Neurons in area 5 have been shown to integrate proprioceptive stimuli with visual information in the representation and updating of postural information.
At the fundus of the intraparietal sulcus, the ventral intraparietal area (VIP) contains a variety of neurons with responses ranging from purely somatosensory to purely visual. The lateral intraparietal area (LIP), on the posterior or lateral bank of the intraparietal sulcus is thought to be involved in the planning, generation, and control of eye movements. This area, dubbed the “parietal eye field” because of its close functional association with the frontal eye fields, integrates multisensory information in generating eye movements to expected, current, and remembered target locations originally specified in a variety of different possible sensory modalities. Other areas in the intraparietal sulcus display a variety of multisensory responses. The anterior intraparietal area (AIP) integrates visual and somatosensory information in planning and generating object-related movements such as grasping, while the medial intraparietal area (MIP), as part of the parietal reach region (PRR) is involved in the generation and control of reaching movements.
Neurons in the superior temporal sulcus in macaques and humans, and in the anterior ectosylvian gyrus in cats, respond to stimulation in a number of sensory modalities, but have been studied particularly in connection with audiovisual speech and vocalizations. This area is often activated in studies that pair audible speech with visible lip movements.
The premotor cortex in the frontal lobe is thought to integrate multisensory information involved in the planning and execution of movements. A small portion of the ventral premotor cortex, known as the polysensory zone, responds to somatosensory, visual, and auditory inputs, and seems to be involved in representing multisensory “peripersonal space” – the space immediately surrounding certain parts of our bodies, particularly the hands and face. This area is connected to functionally similar areas in the posterior parietal cortex such as the ventral intraparietal area. Neurons in the polysensory zone of the premotor cortex respond both to objects approaching a certain portion of the animal's skin (i.e., a visual receptive field surrounding the neuron's corresponding somatosensory receptive field), and to the generation of defensive or avoidance movements in response to these objects.
Certain areas of the orbitofrontal cortex also respond to multisensory stimuli, particularly those concerned with appetitive rewards, such as food, flavors, tastes, and aromas, along with emotionally salient multisensory signals.
Multiple feedforward and feedback connections between the frontal and parietal cortices, subserving the processing of multisensory information and the planning and execution of movements following multisensory stimulation, probably constitute a network of multimodal perception-action or attentional systems. Neural studies of multimodal integration have historically been based largely on the findings of multisensory integration in the SC concerning the generation of orientation movements. Since these early studies, however, research has unveiled numerous brain areas that process and integrate information from a number of sensory systems. Each of these areas seems to be specialized for particular domains of environmental stimuli, or for particular forms of action. Underlying the various approaches to the study of multisensory integration is the hope that general rules of multisensory integration can be discovered that apply to a wide range of behavioral situations, and across a variety of distinct brain regions.
Methods to Measure this Event/Condition
Multisensory integration is typically measured via single-unit recordings in cats, ferrets, barn owls, or macaque monkeys. Both anesthetized and awake behaving preparations have been used, often in conjunction with behavioral studies in the same species and under similar stimulus conditions, or with human studies under similar experimental conditions.
A variety of neuroanatomical and neurophysiological techniques have been used in the model system of the SC, including single-unit recording and stimulation, lesion studies, tract-tracing, and cooling and other forms of inactivation and deactivation of the colliculi themselves, or of brain regions projecting to or receiving projections from the SC. These studies have shown, for example, that selective lesions or deactivation of the SC abolishes the integration of auditory and visual information arising in those regions of space that the affected portion of the SC represented; multisensory integration in those parts of the SC left intact was unaffected. Early work on the SC focused on the developmental time-course of multisensory integration, the temporal and spatial characteristics of the stimuli required for effective multisensory integration, the spatial arrangement of the different multisensory representations in the SC, and the ways in which this particular organization comes about (i.e., determined in part genetically, but influenced very strongly by visual and multisensory experience throughout development).
A number of behavioral methods are available to measure multimodal integration in human participants, including reaction-time measures, threshold determination, two (or more)-alternative forced-choice measures (speeded or unspeeded), and signal detection analyses, all of which have been used in studies of the sensory modalities in isolation for many years. The variety of experimental techniques available for studying multimodal integration in healthy human participants, as well as in brain-damaged neuropsychological patients, is now considerable, and an adequate summary is beyond the scope of this article. However, certain important recent trends can be highlighted.
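One widely used analysis of speeded reaction-time data in this field is Miller's race-model inequality, which asks whether responses to bimodal (redundant) targets are faster than any "race" between independent unimodal processes could produce: at every time t, P(RT ≤ t) for the bimodal condition must not exceed the sum of the unimodal probabilities. A minimal sketch with made-up reaction times (the data and function names are illustrative only):

```python
from bisect import bisect_right

def ecdf(sorted_sample, t):
    """Empirical cumulative probability P(RT <= t) for a sorted sample."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)

def race_bound_violated(rt_audio, rt_visual, rt_av, t):
    """True if the bimodal CDF exceeds the race-model bound at time t,
    which is taken as evidence of genuine multisensory integration
    rather than mere statistical facilitation."""
    bound = min(1.0, ecdf(rt_audio, t) + ecdf(rt_visual, t))
    return ecdf(rt_av, t) > bound

# Illustrative (made-up) reaction times in ms, sorted
a  = sorted([310, 325, 340, 360, 380])   # auditory-only trials
v  = sorted([300, 330, 350, 370, 390])   # visual-only trials
av = sorted([250, 270, 290, 305, 320])   # audiovisual trials

print(race_bound_violated(a, v, av, t=290))  # True for these toy data
```

In real analyses the bound is tested across a range of quantiles with appropriate statistics; the sketch shows only the core comparison.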
Modern neuroimaging techniques, such as fMRI, positron emission tomography (PET), and magnetoencephalography (MEG), are increasingly being used to address questions concerning multisensory integration. Such experiments often require the development of new stimulation equipment that can be brought into the scanner environment itself. Studying the effects of tactile, olfactory, and gustatory stimulation in the scanner has involved overcoming some difficult technical problems, due to the very strong electromagnetic fields involved in fMRI, and to the very sensitive equipment required to detect small changes in electrical (EEG) or magnetic (MEG) fields over the scalps of human participants. But it is now possible to present stimuli in a number of sensory modalities simultaneously to participants lying in the scanner while they perform simple behavioral tasks. This line of research will provide much-needed theoretical and empirical links between the neurophysiological literature derived from experimental animals, typically macaque monkeys, cats, or ferrets on the one hand, and the human behavioral, psychophysical, and neuropsychological literature on the other hand. It will be crucial to know, for example, to what extent the principles and properties of multisensory integration at the single-neuron level measured in experimental animals can be related to the principles and properties derived from human behavioral studies. In short, do those behaviors that reflect multimodal integration depend directly on cells displaying multisensory integration?
There has been much interest in the effects of a variety of brain lesions in adult humans on multimodal integration and associated behaviors. Several neuropsychological syndromes that have traditionally been studied as if they were unimodal deficits, such as tactile extinction and unilateral visuospatial neglect, have, upon closer inspection, been found to be multisensory in nature (crossmodal extinction). For example, many patients with unilateral visuospatial neglect often have deficits in the detection of auditory and tactile stimuli that occur on their affected side, in addition to visual impairments. Similarly, patients suffering from tactile extinction (a condition where contralesional tactile stimuli are easily detected in isolation, but when two stimuli are presented together on opposite sides of the body, the detection of the contralesional stimulus is impaired), may also have impairments in detecting tactile stimuli on the contralesional hand, for example, when a simultaneous visual stimulus is presented near to the ipsilesional hand. The discovery of deficits that cut across the senses in disorders that have typically been thought of as being confined to a single sensory modality, suggests that disorders such as neglect and extinction may also be characterized as disorders of supramodal functions and processes such as spatial perception, and attention, rather than as impairments of a more sensory-specific (modality-specific) perceptual nature.
Another important line of research involving human participants involves examining the multimodal consequences of sensory-specific impairments, for example in blind and deaf adults and children. Such work has shown that impairments in a single modality have rather intriguing consequences for other sensory systems. In neuroimaging experiments, for example, it has been shown that, when blind participants read Braille, their visual cortex is activated, suggesting a functional role for “visual” cortex in complex tactile spatial discrimination. Such visual activations are not observed when participants with unimpaired vision read Braille, nor in people who lose their sight late in life (i.e., after puberty). This neural activation was shown to be functionally relevant to the Braille reading task by the significant disruptive effects of transcranial magnetic stimulation (TMS) over the occipital cortex of the same volunteers who exhibited visual activations during Braille reading. Additionally, and perhaps more strikingly, it has recently been shown that normal participants, when completely blindfolded for 5 days while learning to recognize Braille characters, also show activation in the visual cortex, along with an improved ability to learn the Braille task. These changes were not seen in normal participants who were blindfolded only during the learning and testing phases of the experiment, suggesting that this form of neural plasticity may take several days to take effect. Several different forms of crossmodal plasticity seem to be operating – one that occurs only in those patients who lose their sight before puberty, and another that results from the short-term recruitment of visual cortex following temporary blindfolding. Further research is needed to understand what neural mechanisms underpin these physiological changes.
The consequences of such findings for our views of the functional organization of the brain could hardly be more important: the assignment of visual cortex as strictly visually responsive may be rather premature. Rather, the visual cortex may be specialized for processing detailed spatial information in order to make complex spatial discriminations (e.g., in reading Braille). In normal circumstances, visual inputs are functionally the most useful for such spatial discrimination tasks, but in the absence of input from the eyes, inputs from the tactile and auditory receptors may help to perform such tasks. A similar line of research on the multimodal consequences of hearing impairments has reached analogous conclusions.
Finally, multimodal integration is increasingly being approached from a mathematical modeling perspective, particularly with regard to modeling the precision and reliability of information arising from different sensory modalities. Bayesian and maximum-likelihood methodologies have been used to model a variety of phenomena in multimodal integration. Such work suggests that the central nervous system integrates information from the different sensory modalities in a statistically optimal fashion, weighting each modality according to the variability of its responses under increasingly noisy stimulus conditions. This relates well to the foregoing conclusions of the work in blind and deaf people – the brain is a highly interconnected network dealing with vast quantities of information, and different neural subsystems are able to share that information effectively, to the best advantage of the organism. The visual cortex will receive and process auditory and tactile inputs given a certain amount of visual deprivation, in order to process the relevant information and complete the designated task. Under less extreme conditions of visual deprivation, where the quality of the visual signal is degraded (e.g., on a misty day, or when we remove correcting lenses), information from other modalities may be weighted more strongly in the performance of certain cognitive tasks.
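The core of the maximum-likelihood account is that each cue is weighted by its reliability (the inverse of its variance), so that the combined estimate is always at least as reliable as the best single cue. A minimal sketch of two-cue combination (the numbers are illustrative, not experimental data):

```python
def mle_combine(est_a, var_a, est_b, var_b):
    """Reliability-weighted (maximum-likelihood) combination of two
    independent, Gaussian-noise cues. Weights are proportional to the
    inverse variances; the combined variance is smaller than either."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)
    w_b = 1.0 - w_a
    combined_est = w_a * est_a + w_b * est_b
    combined_var = (var_a * var_b) / (var_a + var_b)
    return combined_est, combined_var

# Vision precise (small variance), touch noisy: the combined estimate
# lies closer to the visual estimate, and its variance shrinks.
est, var = mle_combine(est_a=10.0, var_a=1.0, est_b=14.0, var_b=4.0)
print(est, var)  # approximately 10.8, 0.8
```

Degrading one cue (increasing its variance) automatically shifts weight toward the other, which is the formal counterpart of the "misty day" example above.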
In conclusion, multimodal integration is an exciting and rapidly developing field of enquiry that spans numerous academic disciplines, from basic neuroscience, to medicine, physiology, psychology, cognitive science, and mathematical modeling. From each of these disciplines, multiple well-developed methodological approaches are now available to facilitate the study of the multimodal brain. By progressively questioning the assumption that we are born with only five senses, and that, throughout life, these five senses are both anatomically and phenomenologically distinct, the study of multimodal integration is beginning to provide intriguing answers to historically difficult questions.