Abstract
The perception of objects is a cognitive function of prime importance. In everyday life, object perception benefits from the coordinated interplay of vision, audition, and touch. The different sensory modalities provide both complementary and redundant information about objects, which may improve recognition speed and accuracy in many circumstances. We review crossmodal studies of object recognition in humans that mainly employed functional magnetic resonance imaging (fMRI). These studies show that visual, tactile, and auditory information about objects can activate cortical association areas that were once believed to be modality-specific. Processing converges either in multisensory zones or via direct crossmodal interaction of modality-specific cortices without relay through multisensory regions. We integrate these findings with existing theories about semantic processing and propose a general mechanism for crossmodal object recognition: the recruitment and location of multisensory convergence zones vary depending on the information content and the dominant modality.
Acknowledgements
This research was funded by a Horowitz Foundation fellowship (A.A.), the Bundesministerium für Bildung und Forschung (BMBF; K.v.K., M.J.N.), the Volkswagenstiftung (K.v.K.), and the Max Planck Society (M.J.N.). The authors thank Nikolas Francis, Axel Kohler (for help with the figures), Lotfi Merabet, Wolf Singer, Lars Muckli, and three anonymous reviewers (for their helpful comments on earlier versions of this paper). Reprint requests and remarks should be addressed to Marcus Johannes Naumer (H.J.Naumer@med.uni-frankfurt.de) or to Amir Amedi (aamedi@bidmc.harvard.edu).
Additional information
A. Amedi, K. von Kriegstein, N. M. van Atteveldt, M. S. Beauchamp, and M. J. Naumer contributed equally to this work.
Cite this article
Amedi, A., von Kriegstein, K., van Atteveldt, N.M. et al. Functional imaging of human crossmodal identification and object recognition. Exp Brain Res 166, 559–571 (2005). https://doi.org/10.1007/s00221-005-2396-5