Computational Modeling of Multisensory Object Perception

  • Constantin Rothkopf
  • Thomas Weisswange
  • Jochen Triesch


Computational modeling, largely based on advances in artificial intelligence and machine learning, has helped further the understanding of some of the principles and mechanisms of multisensory object perception. Furthermore, this theoretical work has led to the development of new experimental paradigms and to important new questions. The last 20 years have seen an increasing emphasis on models that explicitly compute with uncertainties, a crucial aspect of the relation between sensory signals and states of the world. Bayesian models allow for the formulation of such relationships and also of explicit optimality criteria against which human performance can be compared. They therefore allow answering the question of how close human performance comes to a specific formulation of best performance. Perhaps even more importantly, Bayesian methods allow different models to be compared quantitatively by how well they account for observed data. The success of such techniques in explaining perceptual phenomena has also led to a large number of new open questions, especially about how the brain is able to perform computations that are consistent with these functional models and about the origin of these algorithms in the brain. We briefly review some key empirical evidence of crossmodal perception and then give an overview of the computational principles evident from this work. The presentation of current modeling approaches to multisensory perception considers Bayesian models, models at an intermediate level, and neural models implementing multimodal computations. Finally, this chapter specifically emphasizes current open questions in theoretical models of multisensory object perception.
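As a concrete illustration of a model that computes with uncertainties and supplies an explicit optimality criterion, the following is a minimal sketch of the standard ideal-observer rule for fusing independent Gaussian cues, the kind of model tested for visual-haptic size judgments by Ernst and Banks (2002). It assumes Gaussian cue noise, independent noise across cues, and a flat prior; the function name `combine_cues` and all numbers are hypothetical illustrations, not the chapter's own implementation or data.

```python
import math


def combine_cues(estimates, sigmas):
    """Minimum-variance (maximum-likelihood) fusion of independent Gaussian cues.

    Assumption: cue i reports estimates[i] corrupted by Gaussian noise with
    standard deviation sigmas[i], and the prior over the world state is flat.
    Returns the fused estimate, its standard deviation, and the cue weights.
    """
    if not estimates or len(estimates) != len(sigmas):
        raise ValueError("need exactly one sigma per estimate")
    # Reliability of a cue is its inverse variance.
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    # Each cue is weighted by its share of the total reliability.
    weights = [r / total for r in reliabilities]
    fused = sum(w * x for w, x in zip(weights, estimates))
    # Fused variance is the inverse of the summed reliabilities, so it is
    # smaller than the variance of even the most reliable single cue.
    fused_sigma = math.sqrt(1.0 / total)
    return fused, fused_sigma, weights


# Hypothetical visual and haptic size estimates (mm) and their noise levels.
size, sd, (w_v, w_h) = combine_cues([55.0, 52.0], [1.0, 2.0])
print(f"fused size = {size:.2f} mm, sd = {sd:.3f} mm, w_v = {w_v:.2f}")
```

The predicted fused standard deviation is the benchmark: comparing it with a measured two-cue discrimination threshold is one way to ask how close human performance comes to this optimum. Alternative models, such as relying on the single best cue or weighting cues equally, predict a larger spread whenever the cue reliabilities differ, which is what makes the models empirically distinguishable.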


Copyright information

© Springer Science + Business Media, LLC 2010

Authors and Affiliations

  • Constantin Rothkopf (1)
  • Thomas Weisswange (1)
  • Jochen Triesch (1)
  1. Frankfurt Institute for Advanced Studies (FIAS), Goethe University Frankfurt, Frankfurt am Main, Germany
