Computational Modeling of Multisensory Object Perception

Multisensory Object Perception in the Primate Brain

Abstract

Computational modeling, largely based on advances in artificial intelligence and machine learning, has helped further the understanding of some of the principles and mechanisms of multisensory object perception. Furthermore, this theoretical work has led to the development of new experimental paradigms and to important new questions. The last 20 years have seen an increasing emphasis on models that explicitly compute with uncertainties, a crucial aspect of the relation between sensory signals and states of the world. Bayesian models allow for the formulation of such relationships and also of explicit optimality criteria against which human performance can be compared. They therefore allow us to ask how close human performance comes to a specific formulation of best performance. Perhaps even more importantly, Bayesian methods allow different models to be compared quantitatively by how well they account for observed data. The success of such techniques in explaining perceptual phenomena has also led to a large number of new open questions, especially about how the brain is able to perform computations that are consistent with these functional models and about the origin of the algorithms in the brain. We briefly review some key empirical evidence of crossmodal perception and proceed to give an overview of the computational principles evident from this work. The presentation of current modeling approaches to multisensory perception considers Bayesian models, models at an intermediate level, and neural models implementing multimodal computations. Finally, this chapter specifically emphasizes current open questions in theoretical models of multisensory object perception.

Author information

Correspondence to Constantin Rothkopf.

Copyright information

© 2010 Springer Science + Business Media, LLC

About this chapter

Cite this chapter

Rothkopf, C., Weisswange, T., Triesch, J. (2010). Computational Modeling of Multisensory Object Perception. In: Kaiser, J., Naumer, M. (eds) Multisensory Object Perception in the Primate Brain. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5615-6_3
