Binaural Localization and Detection of Speakers in Complex Acoustic Scenes

May, T.; van de Par, S.; Kohlrausch, A.

doi:10.1007/978-3-642-37762-4_15

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

The robust localization of speech sources is required for a wide range of applications, among them hearing aids and teleconferencing systems. This chapter focuses on binaural approaches that estimate the spatial position of multiple competing speakers in adverse acoustic scenarios by exploiting only the signals reaching both ears. A set of experiments is conducted to systematically evaluate the impact of reverberation and interfering noise on speaker-localization performance. In particular, the spatial distribution of the interfering noise has a considerable effect on performance and is most detrimental when the noise field contains strong directional components. In these conditions, the position of an interfering noise source might be erroneously classified as a speaker position. This observation highlights the need to combine the localization stage with a decision about the underlying source type in order to enable robust localization of speakers in noisy environments.

Notes

1. Knowles electronic manikin for acoustic research, KEMAR.
2. Although the problem of moving sources is not covered in this chapter, the MATLAB toolbox ROOMSIMOVE for simulating RIRs for moving sources can be found at http://www.irisa.fr/metiss/members/evincent/software.
3. The corresponding MATLAB code can be found at http://www.umiacs.umd.edu/labs/cvl/pirl/vikas/Current_research/time_delay_estimation/time_delay_estimation.html

Acknowledgments

The authors are indebted to two anonymous reviewers for their constructive suggestions.

Author information

Authors and Affiliations

Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
T. May
University of Oldenburg, Oldenburg, Germany
S. van de Par
Eindhoven University of Technology, Eindhoven, The Netherlands
A. Kohlrausch
Philips Research Europe, Eindhoven, The Netherlands
A. Kohlrausch

Corresponding author

Correspondence to A. Kohlrausch.

Editor information

Editors and Affiliations

Fak. Elektrotechnik, LS Allgm.Elektrotechn.+Akustik, Univ. Bochum, Bochum, Germany
Jens Blauert

About this chapter

Cite this chapter

May, T., van de Par, S., Kohlrausch, A. (2013). Binaural Localization and Detection of Speakers in Complex Acoustic Scenes. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_15

DOI: https://doi.org/10.1007/978-3-642-37762-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4
eBook Packages: Engineering, Engineering (R0)
