Biological Cybernetics, Volume 103, Issue 3, pp 213–226

Bayesian networks and information theory for audio-visual perception modeling

  • Patricia Besson
  • Jonas Richiardi
  • Christophe Bourdin
  • Lionel Bringoux
  • Daniel R. Mestre
  • Jean-Louis Vercher
Original Paper


Thanks to their different senses, human observers acquire multiple pieces of information from their environment. Complex cross-modal interactions occur during this perceptual process. This article proposes a framework to analyze and model these interactions through a rigorous and systematic data-driven process. This requires considering the general relationships between the physical events or factors involved in the process, not only in quantitative terms, but also in terms of the influence of one factor on another. We use tools from information theory and probabilistic reasoning to derive relationships between the random variables of interest, where the central notion is that of conditional independence. Using mutual information analysis to guide the model elicitation process, we obtain a probabilistic causal model encoded as a Bayesian network. We exemplify the method using data collected from human subjects in an audio-visual localization task, and we show that it yields a well-motivated model with good predictive ability. The model elicitation process offers new prospects for the investigation of the cognitive mechanisms of multisensory perception.
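As an illustration of the conditional-independence notion mentioned above, the following minimal sketch estimates mutual information and conditional mutual information from discrete samples. It is not the authors' implementation: the plug-in (empirical frequency) estimator, the function names, and the toy variables `source`, `visual`, and `audio` are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z)."""
    n = len(zs)
    total = 0.0
    for z, cz in Counter(zs).items():
        idx = [i for i, v in enumerate(zs) if v == z]
        total += (cz / n) * mutual_information([xs[i] for i in idx],
                                               [ys[i] for i in idx])
    return total


# Hypothetical toy data: a binary source position and two noisy observations
# that are each correct 3 times out of 4, independently given the source.
source, visual, audio = [], [], []
for s in (0, 1):
    for v in (0, 1):
        for a in (0, 1):
            weight = (3 if v == s else 1) * (3 if a == s else 1)
            source += [s] * weight
            visual += [v] * weight
            audio += [a] * weight

mi = mutual_information(visual, audio)                       # positive: shared cause
cmi = conditional_mutual_information(visual, audio, source)  # close to zero
print(f"I(V;A) = {mi:.4f} bits, I(V;A|S) = {cmi:.4f} bits")
```

In this toy setting the two observations are dependent marginally but independent once the source is known, which is the pattern that would suggest a common-cause structure in a Bayesian network. With real, finite data the plug-in estimate is biased upward, so a threshold or statistical test would be needed before declaring independence.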


Keywords: Graphical model · Information theory · Mutual information · Causal Bayesian networks · Model elicitation · Decision process





Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Patricia Besson (1)
  • Jonas Richiardi (2, 3)
  • Christophe Bourdin (1)
  • Lionel Bringoux (1)
  • Daniel R. Mestre (1)
  • Jean-Louis Vercher (1)

  1. Institute of Movement Sciences, CNRS & Université de la Méditerranée, Marseille, France
  2. Medical Image Processing Laboratory, EPFL, Lausanne, Switzerland
  3. University of Geneva, Geneva, Switzerland
