Improved sound source localization in horizontal plane for binaural robot audition

Kim, Ui-Hyun; Nakadai, Kazuhiro; Okuno, Hiroshi G.

doi:10.1007/s10489-014-0544-y

Published: 18 June 2014

Volume 42, pages 63–74 (2015)

Applied Intelligence

Ui-Hyun Kim¹, Kazuhiro Nakadai² & Hiroshi G. Okuno¹

Abstract

An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with binaural robots equipped with two microphones inside artificial pinnae. The conventional SSL method based on the GCC-PHAT method has two main problems when used on a binaural robot platform: 1) diffraction of sound waves with multipath interference caused by the contours of the robot head, which affects localization accuracy, and 2) front-back ambiguity, which limits the localization range to half the horizontal space. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae for localization over the entire azimuth. Experiments conducted using two dummy heads equipped with small or large pinnae showed that localization errors were reduced by 8.91° (3.21° vs. 12.12°) on average with the new time delay factor compared with the conventional GCC-PHAT method and that the success rate for front-back disambiguation using the pinnae amplification effect was 29.76 % better in relative terms (93.46 % vs. 72.02 %) on average over the entire azimuth than with a conventional head-related transfer function (HRTF)-based method.
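The conventional GCC-PHAT baseline named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names, the speed of sound, and the microphone spacing are assumptions chosen for illustration. The spherical-head delay uses Woodworth's classical textbook approximation as a stand-in for a diffraction-aware delay model; the abstract does not give the paper's own time delay factor, so the two should not be equated.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (in seconds) with GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    spec_sig = np.fft.rfft(sig, n=n)
    spec_ref = np.fft.rfft(ref, n=n)
    cross = spec_sig * np.conj(spec_ref)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def azimuth_free_field(tau, mic_distance):
    """Free-field model (no head): tau = d * sin(theta) / c, solved for theta in degrees."""
    s = np.clip(tau * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(s))


def delay_spherical_head(theta_deg, head_radius):
    """Woodworth approximation (assumed stand-in): tau = (r / c) * (theta + sin(theta)).

    Valid for a far-field source with |theta| <= 90 degrees.
    """
    theta = np.radians(theta_deg)
    return head_radius / SPEED_OF_SOUND * (theta + np.sin(theta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    ref = rng.standard_normal(1024)
    sig = np.concatenate((np.zeros(5), ref[:-5]))  # `sig` lags `ref` by 5 samples
    tau = gcc_phat(sig, ref, fs)
    print("estimated delay (s):", tau)
    print("free-field azimuth (deg):", azimuth_free_field(tau, 0.18))
```

For a source at 90° and a head of radius r, the spherical-head path term θ + sin θ is about 2.57, whereas the free-field term for two microphones 2r apart is 2 sin θ = 2. The delay around the head is therefore roughly 1.29 times the free-field delay, which is why a free-field mapping from delay to azimuth becomes biased for lateral sources on a robot head.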

Acknowledgments

This research was partially supported by a Grant-in-Aid for Scientific Research (KAKENHI No. 24220006) from the Japan Society for the Promotion of Science (JSPS).

Author information

Authors and Affiliations

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto-shi, Japan
Ui-Hyun Kim & Hiroshi G. Okuno
Honda Research Institute Japan Co., Ltd., Wako-shi, Japan
Kazuhiro Nakadai

Corresponding author

Correspondence to Ui-Hyun Kim.

About this article

Cite this article

Kim, UH., Nakadai, K. & Okuno, H.G. Improved sound source localization in horizontal plane for binaural robot audition. Appl Intell 42, 63–74 (2015). https://doi.org/10.1007/s10489-014-0544-y

Published: 18 June 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s10489-014-0544-y
