Abstract
The acoustic environment poses at least two important challenges. First, animals must localise sound sources using a variety of binaural and monaural cues; second, they must separate sources into distinct auditory streams (the “cocktail party problem”). Binaural cues include interaural intensity and phase disparities. The primary monaural cue is the spectral filtering introduced by the head and pinnae via the head-related transfer function (HRTF), which imposes different linear filters upon sources arising at different spatial locations.
Here we address the second challenge, source separation. We propose an algorithm that exploits the monaural HRTF to separate spatially localised acoustic sources in a noisy environment. We assume that each source has a unique position in space and is therefore subject to preprocessing by a different linear filter. We also assume prior knowledge of weak statistical regularities present in the sources. The framework can incorporate various aspects of acoustic transfer functions (echoes, delays, multiple sensors, frequency-dependent attenuation) in a uniform fashion, treating them as cues for, rather than obstacles to, separation. To accomplish this, sources are represented sparsely in an overcomplete basis. The framework can also be extended to make predictions about the neural representations required to separate acoustic sources.
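The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the position-dependent filters are known short FIR filters, that every source is sparse in a shared dictionary `D`, and that the sparse coefficients are found by iterative soft thresholding (ISTA) applied to an L1-penalised least-squares objective. The solver, the toy filters, the spike dictionary, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def filter_matrix(h, n):
    """n-by-n matrix applying the causal FIR filter h (convolution truncated to n samples)."""
    H = np.zeros((n, n))
    for k, hk in enumerate(h):
        H += hk * np.eye(n, k=-k)  # k-th subdiagonal carries tap h[k]
    return H


def ista(A, y, lam=0.05, n_iter=500):
    """Minimise 0.5*||A c - y||^2 + lam*||c||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - y))  # gradient step on the quadratic term
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return c


def separate(y, filters, D, lam=0.05, n_iter=500):
    """Estimate one signal per filter from the single-channel mixture y."""
    n = len(y)
    # Each source contributes the block H_i D; stacking the blocks side by side
    # gives an overcomplete system even when D itself is square.
    A = np.hstack([filter_matrix(h, n) @ D for h in filters])
    c = ista(A, y, lam, n_iter)
    k = D.shape[1]
    return [D @ c[i * k:(i + 1) * k] for i in range(len(filters))]


# Toy usage: two spike sources seen through two hypothetical filters.
n = 64
D = np.eye(n)  # hypothetical dictionary of unit spikes
filters = [np.array([1.0, 0.5]), np.array([1.0, -0.5, 0.25])]
x1 = np.zeros(n)
x1[10] = 1.0
x2 = np.zeros(n)
x2[40] = -1.0
y = filter_matrix(filters[0], n) @ x1 + filter_matrix(filters[1], n) @ x2

estimates = separate(y, filters, D)
remix = sum(filter_matrix(h, n) @ x for h, x in zip(filters, estimates))
rel_residual = np.linalg.norm(remix - y) / np.linalg.norm(y)
print(rel_residual)
```

In this sketch the filters act as the separation cue: a coefficient can only explain the mixture through the filter attached to its block, so the sparsity penalty favours assigning each event to the source whose filter matches it best.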
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pearlmutter, B.A., Zador, A.M. (2004). Monaural Source Separation Using Spectral Cues. In: Puntonet, C.G., Prieto, A. (eds) Independent Component Analysis and Blind Signal Separation. ICA 2004. Lecture Notes in Computer Science, vol 3195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30110-3_61
DOI: https://doi.org/10.1007/978-3-540-30110-3_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23056-4
Online ISBN: 978-3-540-30110-3