
Binaural Localization and Detection of Speakers in Complex Acoustic Scenes

Chapter in: The Technology of Binaural Listening

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

Robust localization of speech sources is required in a wide range of applications, including hearing aids and teleconferencing systems. This chapter focuses on binaural approaches that estimate the spatial positions of multiple competing speakers in adverse acoustic scenarios by exploiting only the signals that reach the two ears. A set of experiments is conducted to systematically evaluate the impact of reverberation and interfering noise on speaker-localization performance. In particular, the spatial distribution of the interfering noise has a considerable effect on localization performance, and noise is most detrimental when the noise field contains strong directional components. Under these conditions, the position of an interfering noise source might be erroneously classified as a speaker position. This observation highlights the need to combine the localization stage with a decision about the underlying source type in order to localize speakers robustly in noisy environments.
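The abstract describes localization that exploits only the two ear signals. As a rough, generic illustration (a common textbook baseline, not the auditory-model-based method developed in this chapter), the sketch below estimates the interaural time difference (ITD) with GCC-PHAT, maps it to azimuth with Woodworth's spherical-head formula, and pools frame-wise estimates in an azimuth histogram so that several sources can appear as separate peaks. The head radius of 8.75 cm, the speed of sound of 343 m/s, the 5-degree bins, the frame length, the sign convention, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat(left, right, fs, max_tau):
    """Estimate the ITD in seconds with GCC-PHAT (illustrative sketch).

    Positive values mean the right-ear signal lags the left-ear signal.
    max_tau bounds the lag search and must exceed 1/fs.
    """
    n = len(left) + len(right)              # zero-pad to avoid circular wrap
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cross = R * np.conj(L)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(round(max_tau * fs))
    # Reorder so that index k corresponds to lag k - max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

def itd_to_azimuth(itd, head_radius=0.0875, c=343.0):
    """Map an ITD to azimuth in degrees with Woodworth's formula
    tau = (a / c) * (theta + sin(theta)), inverted by a 0.1-degree grid search.
    Assumed convention: positive azimuth = source on the left (right ear lags).
    """
    theta = np.radians(np.linspace(-90.0, 90.0, 1801))
    tau = head_radius / c * (theta + np.sin(theta))
    return float(np.degrees(theta[np.argmin(np.abs(tau - itd))]))

def localize_sources(left, right, fs, n_sources, frame_len=512):
    """Pool frame-wise azimuth estimates into 5-degree bins and return the
    centers of the n_sources most populated bins (hypothetical helper)."""
    max_tau = 0.001  # assumption: larger than any physical ITD for a human head
    azimuths = []
    for start in range(0, len(left) - frame_len + 1, frame_len):
        seg = slice(start, start + frame_len)
        azimuths.append(itd_to_azimuth(gcc_phat(left[seg], right[seg], fs, max_tau)))
    counts, edges = np.histogram(azimuths, bins=np.arange(-90, 95, 5))
    top = np.argsort(counts)[::-1][:n_sources]
    return sorted(float(0.5 * (edges[i] + edges[i + 1])) for i in top)

# Usage: white noise reaching the right ear 5 samples after the left ear
rng = np.random.default_rng(0)
fs, d, N = 16000, 5, 8000
x = rng.standard_normal(N + d)
left, right = x[d:d + N], x[:N]
itd = gcc_phat(left, right, fs, 0.001)
print(itd, itd_to_azimuth(itd), localize_sources(left, right, fs, n_sources=1))
```

Nothing in this sketch decides whether a histogram peak belongs to speech or to noise, so a directional noise source produces a peak just as a speaker does. This is the failure mode the abstract points to; a source-type decision applied to each candidate peak would have to be added on top of such a localization stage.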


Notes

  1. Knowles Electronics Manikin for Acoustic Research (KEMAR).

  2. Although moving sources are not covered in this chapter, the MATLAB toolbox ROOMSIMOVE for simulating room impulse responses (RIRs) of moving sources is available at http://www.irisa.fr/metiss/members/evincent/software.

  3. The corresponding MATLAB code is available at http://www.umiacs.umd.edu/labs/cvl/pirl/vikas/Current_research/time_delay_estimation/time_delay_estimation.html


Acknowledgments

The authors are indebted to two anonymous reviewers for their constructive suggestions.

Author information

Correspondence to A. Kohlrausch.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

May, T., van de Par, S., Kohlrausch, A. (2013). Binaural Localization and Detection of Speakers in Complex Acoustic Scenes. In: Blauert, J. (ed.) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_15


  • DOI: https://doi.org/10.1007/978-3-642-37762-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37761-7

  • Online ISBN: 978-3-642-37762-4

  • eBook Packages: Engineering (R0)
