Audition as a Trigger of Head Movements

Part of the Modern Acoustics and Signal Processing book series (MASP)


In realistic multimodal environments, audition and vision are the two prominent sensory modalities that work together to provide humans with the best possible perceptual understanding of the environment. Yet, when designing artificial binaural systems, this collaboration is often not honored. Instead, substantial effort is made to construct the best-performing purely auditory scene-analysis systems, sometimes with goals and ambitions that reach beyond human capabilities. It is often not considered that what enables us to perform so well in complex environments is the ability (i) to use more than one source of information, for instance, visual in addition to auditory information, and (ii) to make assumptions about the objects to be perceived on the basis of a priori knowledge. In fact, the human capability of inferring information from one modality to another helps substantially to analyze efficiently the complex environments that humans face every day. Along this line of thinking, this chapter addresses the effects of attention reorientation triggered by audition. Accordingly, it discusses mechanisms that lead to appropriate motor reactions, such as head movements that orient our visual sensors toward an audiovisual object of interest. After presenting some of the neuronal foundations of multimodal integration and motor reactions linked to auditory-visual perception, some ideas and issues from the field of robotics are tackled. This is accomplished by referring to computational modeling. Thereby, some of the biological bases that underlie active multimodal perception are discussed, and it is demonstrated how these can be taken into account when designing artificial agents endowed with human-like perception.



This work has been supported by the European FP7 TWO!EARS project, ICT-618075. We also thank two anonymous reviewers for their previous comments on this work.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. CNRS, Institut des Systèmes Intelligents et de Robotique (ISIR), Sorbonne Université, Paris, France
