Physiological Models of Auditory Scene Analysis

Part of the Springer Handbook of Auditory Research book series (SHAR, volume 35)


Human listeners are remarkably adept at perceiving speech and other sounds in unfavorable acoustic environments. Typically, the sound source of interest is contaminated by other acoustic sources, and listeners are therefore faced with the problem of unscrambling the mixture of sounds that arrives at their ears. Nonetheless, human listeners can segregate one voice from a mixture of many voices at a cocktail party, or follow a single melodic line in a performance of orchestral music. Much as the visual system must combine information about edges, colors and textures in order to identify perceptual wholes (e.g., a face or a table), so the auditory system must solve an analogous auditory scene analysis (ASA) problem in order to recover a perceptual description of a single sound source (Bregman 1990). Understanding how the ASA problem is solved at the physiological level is one of the greatest challenges of hearing science, and is one that lies at the core of the “systems” approach of this book.


Keywords: Channel Selection; Neural Oscillator; Tone Sequence; Target Tone; Global Inhibitor


  1. Alain C, Izenberg A (2003) Effects of attentional load on auditory scene analysis. J Cogn Neurosci 15:1063–1073.
  2. Alain C, Schuler BM, McDonald KL (2002) Neural activity associated with distinguishing concurrent auditory objects. J Acoust Soc Am 111:990–995.
  3. Anstis S, Saida S (1985) Adaptation to auditory streaming of frequency-modulated tones. J Exp Psychol Hum Percept Perform 11:257–271.
  4. Assmann PF, Summerfield Q (1990) Modeling the perception of concurrent vowels: vowels with different fundamental frequencies. J Acoust Soc Am 88:680–697.
  5. Baird B (1997) Synchronized auditory and cognitive 40 Hz attentional streams, and the impact of rhythmic expectation on auditory scene analysis. In: Jordan M, Kearns M, Solla S (eds), Neural Information Processing Systems, Vol. 10. Cambridge, MA: MIT Press, pp 3–10.
  6. Barker J (2007) Robust automatic speech recognition. In: Wang DL, Brown GJ (eds), Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Piscataway, NJ: IEEE Press/Wiley Interscience.
  7. Barlow HB (1972) Single units and cognition: a neuron doctrine for perceptual psychology. Perception 1:371–394.
  8. Barth DS, MacDonald KD (1996) Thalamic modulation of high-frequency oscillating potentials in auditory cortex. Nature 383:78–81.
  9. Basar E, Basar-Eroglu C, Karakas S, Schurmann M (2000) Brain oscillations in perception and memory. Int J Psychophysiol 35:95–124.
  10. Beauvois MW, Meddis R (1991) A computer model of auditory stream segregation. Q J Exp Psychol 43A:517–541.
  11. Beauvois MW, Meddis R (1996) Computer simulation of auditory stream segregation in alternating tone sequences. J Acoust Soc Am 99:2270–2280.
  12. Berthommier F, Meyer G (1997) Improving amplitude modulation maps for F0-dependent segregation of harmonic sounds. In: Proceedings of EUROSPEECH, Rhodes, Greece, September 22–25, pp 2483–2486.
  13. Bodden M (1993) Modeling human sound-source localization and the cocktail party effect. Acta Acust 1:43–55.
  14. Braasch J (2002) Localization in the presence of a distractor and reverberation in the frontal horizontal plane: II. Model algorithms. Acta Acust/Acustica 88:956–969.
  15. Bregman AS (1990) Auditory Scene Analysis. Cambridge, MA: MIT Press.
  16. Bregman AS, Rudnicky AI (1975) Auditory segregation: stream or streams? J Exp Psychol Hum Percept Perform 1:263–267.
  17. Brosch M, Budinger E, Scheich H (2002) Stimulus-related gamma oscillations in primate auditory cortex. J Neurophysiol 87:2715–2725.
  18. Brown GJ, Cooke MP (1994) Computational auditory scene analysis. Comput Speech Lang 8:297–336.
  19. Brown GJ, Cooke MP (1998) Temporal synchronization in a neural oscillator model of primitive auditory stream segregation. In: Rosenthal DF, Okuno HG (eds), Computational Auditory Scene Analysis. Mahwah, NJ: Lawrence Erlbaum, pp 87–101.
  20. Brown GJ, Palomäki KJ (2005) A computational model of the speech reception threshold for laterally separated speech and noise. In: Proceedings of Interspeech, Lisbon, September 4–8, pp 1753–1756.
  21. Brown GJ, Wang DL (1997) Modelling the perceptual segregation of concurrent vowels with a network of neural oscillators. Neural Netw 10:1547–1558.
  22. Cariani P (2001) Neural timing nets. Neural Netw 14:737–753.
  23. Cariani P (2003) Recurrent timing nets for auditory scene analysis. Proc IJCNN-2003, Portland, OR, July 20–24.
  24. Carlyon RP, Cusack R, Foxton JM, Robertson IH (2001) Effects of attention and unilateral neglect on auditory stream segregation. J Exp Psychol Hum Percept Perform 27:115–127.
  25. Chang P (2004) Exploration of Behavioural, Physiological and Computational Approaches to Auditory Scene Analysis. MSc Thesis, The Ohio State University, Department of Computer Science and Engineering.
  26. Colburn HS (1973) Theory of binaural interaction based on auditory-nerve data. I. General strategy and preliminary results on interaural discrimination. J Acoust Soc Am 54:1458–1470.
  27. Cooke M, Green P, Josifovski L, Vizinho A (2001) Robust automatic speech recognition with missing and unreliable acoustic data. Speech Commun 34:267–285.
  28. Cosp J, Madrenas J (2003) Scene segmentation using neuromorphic oscillatory networks. IEEE Trans Neural Netw 14:1278–1296.
  29. Crick F (1984) Function of the thalamic reticular complex: the searchlight hypothesis. Proc Natl Acad Sci U S A 81:4586–4590.
  30. Culling JF, Darwin CJ (1994) Perceptual and computational separation of simultaneous vowels: cues arising from low-frequency beating. J Acoust Soc Am 95:1559–1569.
  31. Culling JF, Summerfield Q (1995) Perceptual separation of concurrent speech sounds: Absence of across-frequency grouping by common interaural delay. J Acoust Soc Am 98:785–797.
  32. Darwin CJ, Bethell-Fox CE (1977) Pitch continuity and speech source attribution. J Exp Psychol Hum Percept Perform 3:665–672.
  33. de Cheveigné A (1993) Separation of concurrent harmonic sounds: fundamental frequency estimation and a time-domain cancellation model of auditory processing. J Acoust Soc Am 93:3271–3290.
  34. de Cheveigné A (1997) Concurrent vowel identification. III. A neural model of harmonic interference cancellation. J Acoust Soc Am 101:2857–2865.
  35. de Cheveigné A, Kawahara H, Tsuzaki M, Aikawa K (1997) Concurrent vowel identification. I. Effects of relative amplitude and F0 difference. J Acoust Soc Am 101:2839–2847.
  36. Durlach NI (1963) Equalization and cancellation theory of binaural masking level differences. J Acoust Soc Am 35:1206–1218.
  37. Edmonds BA, Culling JF (2005) The spatial unmasking of speech: evidence for within-channel processing of interaural time delay. J Acoust Soc Am 117:3069–3078.
  38. Elhilali M (2004) Neural Basis and Computational Strategies for Auditory Processing. PhD Thesis, Department of Electrical and Computer Engineering, University of Maryland.
  39. Elhilali M, Shamma SA (2008) A cocktail party with a cortical twist: how cortical mechanisms contribute to sound segregation. J Acoust Soc Am 124:3751–3771.
  40. FitzHugh R (1961) Impulses and physiological states in models of nerve membrane. Biophys J 1:445–466.
  41. Gaik W (1993) Combined evaluation of interaural time and intensity differences: Psychoacoustic results and computer modeling. J Acoust Soc Am 94:98–110.
  42. Ghazanfar AA, Chandrasekaran C, Logothetis NK (2008) Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. J Neurosci 28:4457–4469.
  43. Ghitza O (2007) Using auditory feedback and rhythmicity for diphone discrimination of degraded speech. In: Proceedings of ICPhS, Saarbrücken, August 6–10.
  44. Girau B, Torres-Huitzil C (2007) Massively distributed digital implementation of an integrate-and-fire LEGION network for visual scene segmentation. Neurocomputing 70:1186–1197.
  45. Gray CM, König P, Engel AK, Singer W (1989) Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature 338:334–337.
  46. Grossberg S, Govindarajan KK, Wyse L, Cohen MA (2004) ARTSTREAM: a neural network model of auditory scene analysis and source segregation. Neural Netw 17:511–536.
  47. Hecht-Nielsen R (1998) A theory of the cerebral cortex. In: Proceedings of the International Conference on Neural Information Processing (ICONIP), Burke, VA, October 21–23. Amsterdam: IOS Press, pp 1459–1464.
  48. Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160:106–154.
  49. Jeffress LA (1948) A place theory of sound localization. J Comp Physiol Psychol 41:35–39.
  50. Jones MR, Kidd G, Wetzel R (1981) Evidence for rhythmic attention. J Exp Psychol Hum Percept Perform 7:1059–1073.
  51. Kollmeier B, Koch R (1994) Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. J Acoust Soc Am 95:1593–1602.
  52. Langner G, Schreiner CE (1988) Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. J Neurophysiol 60:1799–1822.
  53. Lindemann W (1986) Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. J Acoust Soc Am 80:1608–1622.
  54. Liu C, Wheeler BC, O’Brien WD, Bilger RC, Lansing CR, Feng AS (2000) Localization of multiple sound sources with two microphones. J Acoust Soc Am 108:1888–1905.
  55. Liu F, Yamaguchi Y, Shimizu H (1994) Flexible vowel recognition by the generation of dynamic coherence in oscillator neural networks: Speaker-independent vowel recognition. Biol Cybern 71:105–114.
  56. Lyon RF (1983) A computational model of binaural localization and separation. In: Proceedings of IEEE ICASSP, Boston, April 14–16, pp 1148–1151.
  57. Marr D (1982) Vision. San Francisco: WH Freeman.
  58. McAdams S, Bregman AS (1979) Hearing musical streams. Comput Music J 3:26–43.
  59. McCabe SL, Denham MJ (1997) A model of auditory streaming. J Acoust Soc Am 101:1611–1621.
  60. McDonald K, Alain C (2005) Contribution of harmonicity and location to auditory object formation in free field: Evidence from event-related brain potentials. J Acoust Soc Am 118:1593–1604.
  61. McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748.
  62. Meddis R, Hewitt MJ (1991) Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I. Pitch identification. J Acoust Soc Am 89:2866–2882.
  63. Meddis R, Hewitt MJ (1992) Modeling the identification of concurrent vowels with different fundamental frequencies. J Acoust Soc Am 91:233–245.
  64. Meddis R, O’Mard LP (2006) Virtual pitch in a computational physiological model. J Acoust Soc Am 120:3861–3869.
  65. Miller MI, Sachs MB (1983) Representation of stop consonants in the discharge patterns of auditory nerve fibers. J Acoust Soc Am 74:502–517.
  66. Milner PM (1974) A model for visual shape recognition. Psychol Rev 81:521–535.
  67. Nagumo J, Arimoto S, Yoshizawa S (1962) An active pulse transmission line simulating nerve axon. Proc IRE 50:2061–2070.
  68. Norris M (2003) Assessment and Extension of Wang’s Oscillatory Model of Auditory Stream Segregation. PhD Thesis, University of Queensland, School of Information Technology and Electrical Engineering.
  69. Palmer SE (1999) Vision Science. Cambridge, MA: MIT Press.
  70. Palomäki KJ, Brown GJ, Wang DL (2004) A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Commun 43:361–378.
  71. Ribary U, Ioannides AA, Singh KD, Hasson R, Bolton JP, Lado F, Mogilner A, Llinás R (1991) Magnetic field tomography of coherent thalamocortical 40-Hz oscillations in humans. Proc Natl Acad Sci U S A 88:11037–11041.
  72. Roman N, Wang DL, Brown GJ (2003) Speech segregation based on sound localization. J Acoust Soc Am 114:2236–2252.
  73. Rouat J, Loiselle S, Pichevar R (2007) Towards neurocomputational speech and sound processing. Lect Notes Comput Sci 4391:58–77.
  74. Sagi B, Nemat-Nasser SC, Kerr R, Hayek R, Downing C, Hecht-Nielsen R (2001) A biologically motivated solution to the cocktail party problem. Neural Comput 13:1575–1602.
  75. Scheffers MTM (1983) Sifting Vowels. PhD Thesis, University of Groningen.
  76. Singer W (1993) Synchronization of cortical activity and its putative role in information processing and learning. Ann Rev Physiol 55:349–374.
  77. Slaney M, Lyon RF (1990) A perceptual pitch detector. In: Proceedings of ICASSP-1990, Albuquerque, NM, April 3–6, pp 357–360.
  78. Spence CJ, Driver J (1994) Covert spatial orienting in audition: Exogenous and endogenous mechanisms. J Exp Psychol Hum Percept Perform 20:555–574.
  79. Spieth W, Curtis JF, Webster JC (1954) Responding to one of two simultaneous messages. J Acoust Soc Am 26:391–396.
  80. Stern RM, Trahiotis C (1992) The role of consistency of interaural timing over frequency in binaural lateralization. In: Cazals Y, Horner K, Demany L (eds), Auditory Physiology and Perception. Oxford: Pergamon, pp 547–554.
  81. Sussman ES, Ritter W, Vaughan HG (1999) An investigation of the auditory streaming effect using event-related brain potentials. Psychophysiology 36:22–34.
  82. Sussman ES, Horváth J, Winkler I, Orr M (2007) The role of attention in the formation of auditory streams. Percept Psychophys 69:136–152.
  83. Terman D, Wang D (1995) Global competition and local cooperation in a network of neural oscillators. Physica D 81:148–176.
  84. Todd NPM (1996) An auditory cortical theory of primitive auditory grouping. Network Comput Neural Syst 7:349–356.
  85. Van der Pol B (1926) On “relaxation oscillations”. Philos Mag 2:978–992.
  86. Van Noorden LPAS (1975) Temporal Coherence in the Perception of Tone Sequences. PhD Thesis, Eindhoven University of Technology.
  87. Von der Malsburg C (1981) The Correlation Theory of Brain Function. Internal Report No. 81-2, Max-Planck-Institut for Biophysical Chemistry, Göttingen, Germany.
  88. Von der Malsburg C, Schneider W (1986) A neural cocktail-party processor. Biol Cybern 54:29–40.
  89. Wallach H (1940) The role of head movements and vestibular and visual cues in sound localization. J Exp Psychol 27:339–368.
  90. Wang DL (1996) Primitive auditory segregation based on oscillatory correlation. Cogn Sci 20:409–456.
  91. Wang DL, Brown GJ (1999) Separation of speech from interfering sounds using oscillatory correlation. IEEE Trans Neural Netw 10:684–697.
  92. Wang DL, Brown GJ (2006) Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Piscataway, NJ: IEEE Press/Wiley Interscience.
  93. Wang DL, Terman D (1995) Locally excitatory globally inhibitory oscillator networks. IEEE Trans Neural Netw 6:283–286.
  94. Warren RM (1970) Perceptual restoration of missing speech sounds. Science 167:392–393.
  95. Wrigley SN, Brown GJ (2004) A computational model of auditory selective attention. IEEE Trans Neural Netw 15:1151–1163.

Copyright information

© Springer-Verlag US 2010

Authors and Affiliations

  1. Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, Sheffield, UK
