Robot with Two Ears Listens to More than Two Simultaneous Utterances by Exploiting Harmonic Structures

  • Yasuharu Hirasawa
  • Toru Takahashi
  • Tetsuya Ogata
  • Hiroshi G. Okuno
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6703)


In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds the number of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but automatic speech recognition performs poorly on the separated speech signals because of the spectral distortion these methods introduce. In this paper, a two-stage separation method that improves separation quality at low computational cost is presented. The first stage uses an L1-norm minimization method to extract the harmonic structures. The second stage exploits the reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.
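To make the under-determined setting concrete, the following is a minimal sketch of L1-norm minimization for a single time-frequency bin with M microphones and N > M sources. It is an illustration under stated assumptions, not the authors' implementation: the function name `l1_min_separate` is hypothetical, the mixing matrix `A` is assumed to be known in advance, and the minimum is searched only over solutions with at most M active sources. For real-valued data that search finds the exact L1 minimum; for complex-valued spectra it is generally only an approximation.

```python
import itertools
import numpy as np

def l1_min_separate(x, A):
    """Estimate source coefficients for one time-frequency bin.

    x : (M,) observation vector (one value per microphone)
    A : (M, N) mixing matrix with N > M (assumed known; hypothetical input)
    Returns an (N,) vector s with at most M nonzero entries and A @ s == x.
    """
    M, N = A.shape
    best_s, best_cost = None, np.inf
    # Try every subset of M sources, solve the resulting square system,
    # and keep the candidate whose coefficients have the smallest L1 norm.
    for idx in itertools.combinations(range(N), M):
        sub = A[:, idx]
        if np.linalg.cond(sub) > 1e8:  # skip nearly singular subsets
            continue
        coeffs = np.linalg.solve(sub, x)
        cost = np.abs(coeffs).sum()
        if cost < best_cost:
            best_cost = cost
            best_s = np.zeros(N, dtype=complex)
            best_s[list(idx)] = coeffs
    if best_s is None:
        raise ValueError("no well-conditioned subset of mixing vectors")
    return best_s

# Two microphones, three sources; only the third source is active.
A = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])
x = A @ np.array([0.0, 0.0, 2.0])
s = l1_min_separate(x, A)
```

Applied independently to every bin of a short-time Fourier transform, this kind of estimate yields one spectrogram per source. Because each bin is assigned to at most M sources, the resulting spectra can be distorted, which is the weakness the abstract attributes to L1-norm minimization methods.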


Speech Signal; Sound Source; Humanoid Robot; Acoustic Feature; Independent Component Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yasuharu Hirasawa (1)
  • Toru Takahashi (1)
  • Tetsuya Ogata (1)
  • Hiroshi G. Okuno (1)
  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
