Abstract
Robust localization of speech sources is required for a wide range of applications, among them hearing aids and teleconferencing systems. This chapter focuses on binaural approaches that estimate the spatial positions of multiple competing speakers in adverse acoustic scenarios by exploiting only the signals reaching both ears. A set of experiments is conducted to systematically evaluate the impact of reverberation and interfering noise on speaker-localization performance. In particular, the spatial distribution of the interfering noise has a considerable effect and is most detrimental when the noise field contains strong directional components. In these conditions, interfering noise might be erroneously classified as a speaker position. This observation highlights the need to combine the localization stage with a decision about the underlying source type in order to enable robust localization of speakers in noisy environments.
Notes
- 1. Knowles electronic manikin for acoustic research, KEMAR.
- 2. Although the problem of moving sources is not covered in this chapter, the MATLAB toolbox ROOMSIMOVE for simulating RIRs for moving sources can be found at http://www.irisa.fr/metiss/members/evincent/software.
- 3. The corresponding MATLAB code can be found at http://www.umiacs.umd.edu/labs/cvl/pirl/vikas/Current_research/time_delay_estimation/time_delay_estimation.html
Acknowledgments
The authors are indebted to two anonymous reviewers for their constructive suggestions.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
May, T., van de Par, S., Kohlrausch, A. (2013). Binaural Localization and Detection of Speakers in Complex Acoustic Scenes. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_15
DOI: https://doi.org/10.1007/978-3-642-37762-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4
eBook Packages: Engineering, Engineering (R0)