Modeling the Cocktail Party Problem

  • Mounya Elhilali
Part of the Springer Handbook of Auditory Research book series (SHAR, volume 60)


Modeling the cocktail party problem entails developing a computational framework able to describe what the auditory system does when faced with a complex auditory scene. Although this ability is intuitive and omnipresent in humans and animals alike, translating it into a quantitative model remains a challenge. This chapter discusses the difficulties facing the field in defining the theoretical principles that govern auditory scene analysis and in reconciling current perceptual and physiological knowledge with its formulation in computational models. It reviews some of the computational theories, algorithmic strategies, and neural infrastructure proposed in the literature for developing information systems capable of processing multisource sound inputs. Because disciplines with divergent interests have taken up the cocktail party problem, the body of literature modeling this effect is equally diverse and multifaceted. The chapter surveys the approaches used in modeling auditory scene analysis, from biomimetic models to strictly engineering systems.


Computational auditory scene analysis; Feature extraction; Inference model; Multichannel audio signal; Population separation; Receptive field; Source separation; Stereo mixture; Temporal coherence



Dr. Elhilali’s work is supported by grants from The National Institutes of Health (NIH: R01HL133043) and the Office of Naval Research (ONR: N000141010278, N000141612045, and N000141210740).

Compliance with Ethics Requirements

Mounya Elhilali declares that she has no conflict of interest.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Laboratory for Computational Audio Perception, Center for Speech and Language Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, USA
