Bayesian networks and information theory for audio-visual perception modeling

  • Original Paper
  • Published in: Biological Cybernetics

Abstract

Thanks to their different senses, human observers acquire multiple kinds of information from their environment. Complex cross-modal interactions occur during this perceptual process. This article proposes a framework to analyze and model these interactions through a rigorous and systematic data-driven process. This requires considering the general relationships between the physical events or factors involved in the process, not only in quantitative terms, but also in terms of the influence of one factor on another. We use tools from information theory and probabilistic reasoning to derive relationships between the random variables of interest, where the central notion is that of conditional independence. Using mutual information analysis to guide the model elicitation process, we obtain a probabilistic causal model encoded as a Bayesian network. We exemplify the method using data collected from human subjects in an audio-visual localization task, and we show that it yields a well-motivated model with good predictive ability. The model elicitation process offers new prospects for investigating the cognitive mechanisms of multisensory perception.
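The role that mutual information can play in checking conditional independence may be illustrated with a minimal sketch. It is not the authors' procedure: the plug-in (frequency-count) estimator, the synthetic binary chain X -> Z -> Y, the noise level, and all function and variable names are assumptions made for illustration only. In such a chain, X and Y share information, but that information should vanish once Z is known, which is the kind of pattern that would justify omitting a direct X-Y edge in a Bayesian network.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z), in bits."""
    n = len(zs)
    total = 0.0
    for z, nz in Counter(zs).items():
        sub_x = [x for x, zz in zip(xs, zs) if zz == z]
        sub_y = [y for y, zz in zip(ys, zs) if zz == z]
        total += (nz / n) * mutual_information(sub_x, sub_y)
    return total


def noisy_copy(bit, flip_prob, rng):
    """Return the bit, flipped with probability flip_prob."""
    return 1 - bit if rng.random() < flip_prob else bit


# Hypothetical synthetic data: a chain X -> Z -> Y of noisy binary copies.
rng = random.Random(0)
n_samples = 20000
xs = [rng.randint(0, 1) for _ in range(n_samples)]
zs = [noisy_copy(x, 0.1, rng) for x in xs]
ys = [noisy_copy(z, 0.1, rng) for z in zs]

mi_xy = mutual_information(xs, ys)
cmi_xy_given_z = conditional_mutual_information(xs, ys, zs)

print(f"I(X;Y)   = {mi_xy:.4f} bits")
print(f"I(X;Y|Z) = {cmi_xy_given_z:.4f} bits")
```

On real data the plug-in estimate of a conditional mutual information is never exactly zero, so a decision about independence would additionally need a threshold or a statistical test; that choice is left open here.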

Author information

Corresponding author

Correspondence to Patricia Besson.

About this article

Cite this article

Besson, P., Richiardi, J., Bourdin, C. et al. Bayesian networks and information theory for audio-visual perception modeling. Biol Cybern 103, 213–226 (2010). https://doi.org/10.1007/s00422-010-0392-8

  • DOI: https://doi.org/10.1007/s00422-010-0392-8
