Monaural Source Separation Using Spectral Cues

  • Barak A. Pearlmutter
  • Anthony M. Zador
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3195)

Abstract

The acoustic environment poses at least two important challenges. First, animals must localise sound sources using a variety of binaural and monaural cues; second, they must separate sources into distinct auditory streams (the “cocktail party problem”). Binaural cues include interaural intensity and phase disparities. The primary monaural cue is the spectral filtering introduced by the head and pinnae via the head-related transfer function (HRTF), which imposes different linear filters upon sources arising at different spatial locations.

Here we address the second challenge, source separation. We propose an algorithm that exploits the monaural HRTF to separate spatially localised acoustic sources in a noisy environment. We assume that each source has a unique position in space and is therefore subject to preprocessing by a different linear filter. We also assume prior knowledge of weak statistical regularities present in the sources. This framework can incorporate various aspects of acoustic transfer functions (echoes, delays, multiple sensors, frequency-dependent attenuation) in a uniform fashion, treating them as cues for, rather than obstacles to, separation. To accomplish this, sources are represented sparsely in an overcomplete basis. The framework can also be extended to make predictions about the neural representations required to separate acoustic sources.
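As a rough illustration of the idea, and not the authors' implementation, the sketch below separates two sources from a single mixture when each source passes through a different known linear filter and is sparse in a dictionary. Everything specific here is an assumption made for the example: the helper names (`conv_matrix`, `ista`, `separate`), the two short FIR filters standing in for location-dependent HRTFs, the identity dictionary (sources sparse in time), and the use of L1-penalised least squares solved by iterative soft thresholding as the sparse solver.

```python
import numpy as np

def conv_matrix(h, n):
    """n x n matrix applying the causal FIR filter h (output truncated to n samples)."""
    H = np.zeros((n, n))
    for k, hk in enumerate(h):
        H += hk * np.eye(n, k=-k)  # h[k] on the k-th subdiagonal
    return H

def ista(A, y, lam, n_iter=5000):
    """Minimise 0.5*||y - A c||^2 + lam*||c||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value squared)
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - y))                        # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    return c

def separate(y, filters, D, lam=0.01):
    """Estimate one source per filter from the single mixture y.

    The filtered dictionaries [H_1 D, H_2 D, ...] are stacked side by side into
    one overcomplete dictionary; a sparse code for y is found in it, and each
    block of coefficients is mapped back through D to give one source estimate.
    """
    n, m = len(y), D.shape[1]
    A = np.hstack([conv_matrix(h, n) @ D for h in filters])
    c = ista(A, y, lam)
    return [D @ c[i * m:(i + 1) * m] for i in range(len(filters))]

# Toy data (all values invented for illustration): two sparse spike trains.
n = 64
s1 = np.zeros(n)
s1[[5, 20, 40]] = [1.0, -1.5, 0.8]
s2 = np.zeros(n)
s2[[12, 30, 50]] = [1.2, 0.9, -1.0]

# Hypothetical stand-ins for the filters imposed at two spatial locations.
h1 = np.array([1.0, 0.0, 0.6])
h2 = np.array([1.0, -0.7])

# Single-sensor mixture: each source is filtered, then the results are summed.
y = conv_matrix(h1, n) @ s1 + conv_matrix(h2, n) @ s2

est1, est2 = separate(y, [h1, h2], np.eye(n))
print("max abs error, source 1:", np.abs(est1 - s1).max())
print("max abs error, source 2:", np.abs(est2 - s2).max())
```

In this toy setting the filters are what make the problem solvable: the two sources occupy the same sensor, but a spike filtered by `h1` looks different from a spike filtered by `h2`, so the sparsest explanation of the mixture assigns each event to the correct source. Real HRTFs, noise, and dictionaries for natural sounds would require longer filters and a more careful solver.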



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Barak A. Pearlmutter (1)
  • Anthony M. Zador (2)
  1. Hamilton Institute, National University of Ireland, Maynooth, Co. Kildare, Ireland
  2. Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
