Improved Sound Source Localization and Front-Back Disambiguation for Humanoid Robots with Two Ears

  • Ui-Hyun Kim
  • Kazuhiro Nakadai
  • Hiroshi G. Okuno
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7906)


An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with humanoid robots equipped with two microphones inside artificial pinnae. The conventional SSL method based on the GCC-PHAT method has two main problems when used on a humanoid robot platform: 1) diffraction of sound waves with multipath interference caused by the shape of the robot head and 2) front-back ambiguity. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae for localization over the entire azimuth. Experiments conducted using a humanoid robot showed that localization errors were reduced by 9.9° on average with the improved method and that the success rate for front-back disambiguation was 32.2% better on average over the entire azimuth than with a conventional HRTF-based method.
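For orientation, the conventional baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of plain GCC-PHAT time-delay estimation followed by a free-field conversion to azimuth, not the authors' improved method: the spherical-head delay factor and the pinna-based disambiguation are not reproduced here. The function names `gcc_phat` and `itd_to_azimuth`, the 0.18 m microphone spacing, and the 343 m/s speed of sound are assumptions chosen for the example.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)                   # cross-power spectrum
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)                # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)

def itd_to_azimuth(tau, mic_distance=0.18, c=343.0):
    """Free-field model (assumed): returns (front, back) azimuth candidates in degrees.

    0 deg is straight ahead; positive angles point toward the `ref` microphone.
    """
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    front = float(np.degrees(np.arcsin(s)))
    # The same delay is produced by the mirror direction behind the head.
    back = 180.0 - front if front >= 0 else -180.0 - front
    return front, back

# Usage: white noise delayed by 5 samples at 16 kHz.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(5), ref[:-5]))
tau = gcc_phat(sig, ref, fs, max_tau=0.18 / 343.0)
front, back = itd_to_azimuth(tau)
```

A two-microphone delay estimate always yields two candidate directions, one in front and one behind, which is the front-back ambiguity named in the abstract. The free-field conversion also ignores diffraction around the head, which is the first problem the abstract identifies.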


Intelligent robot audition, human-robot interaction, sound source localization, front-back disambiguation





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ui-Hyun Kim (1)
  • Kazuhiro Nakadai (2)
  • Hiroshi G. Okuno (1)
  1. Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto-shi, Japan
  2. Honda Research Institute Japan Co., Ltd., Wako-shi, Japan
