Robust Ego Noise Suppression of a Robot

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6096)


This paper describes an architecture that enables a robot to perform automatic speech recognition even while it is moving. The system consists of three blocks: (1) a multi-channel noise reduction block comprising consecutive stages of microphone-array-based sound localization, geometric source separation and post-filtering, (2) a single-channel template subtraction block and (3) a speech recognition block. In this work, we specifically investigate a missing-feature-theory-based automatic speech recognition (MFT-ASR) approach in block (3). It uses spectrotemporal elements derived from blocks (1) and (2) to measure the reliability of the audio features and to generate masks that filter out unreliable speech features. We evaluate the proposed technique on a robot in terms of word error rate. Furthermore, we present a detailed analysis of recognition accuracy to determine the optimal parameters. The proposed MFT-ASR implementation attains significantly higher recognition performance than both the single-channel and the multi-channel noise reduction methods.
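The abstract does not specify how reliability is measured or how the masks are formed. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' method. It implements block (2) as plain spectral subtraction of a pre-recorded ego-noise template with a spectral floor. It then scores each time-frequency bin by a hypothetical reliability measure: the relative disagreement between the multi-channel output of block (1) and the template-subtraction output of block (2). That score is thresholded into a hard mask, or passed through a sigmoid to give a soft mask. Finally, the mask is applied in a diagonal-Gaussian log-likelihood, which is how missing-feature recognizers typically ignore unreliable dimensions. Block (1) itself (localization, geometric source separation, post-filter) is not implemented, so the `ms` array in the usage lines is made-up data. For simplicity, the masks are applied directly to the same dimensions rather than to recognizer features. All function names and parameters (`alpha`, `floor`, `threshold`, `slope`) are hypothetical.

```python
import numpy as np


def template_subtraction(noisy_mag, template_mag, alpha=1.0, floor=0.1):
    """Subtract a pre-recorded ego-noise template from a noisy magnitude spectrogram.

    Both inputs have shape (frames, bins). `alpha` (over-subtraction) and
    `floor` (spectral floor relative to the noisy input) are hypothetical
    tuning parameters, not values taken from the paper.
    """
    cleaned = noisy_mag - alpha * template_mag
    # Clamp each bin to a small fraction of the input so nothing goes negative.
    return np.maximum(cleaned, floor * noisy_mag)


def reliability_mask(ms_mag, ts_mag, threshold=0.5, slope=10.0, soft=False):
    """Hypothetical reliability measure: relative disagreement between the
    multi-channel (ms) and template-subtraction (ts) enhanced spectra.

    Returns values in [0, 1], where 1 = reliable and 0 = unreliable.
    """
    eps = 1e-12
    # For non-negative magnitudes this ratio lies in [0, 1].
    disagreement = np.abs(ms_mag - ts_mag) / (np.maximum(ms_mag, ts_mag) + eps)
    if soft:
        # Sigmoid soft mask: close to 1 when disagreement is well below threshold.
        return 1.0 / (1.0 + np.exp(slope * (disagreement - threshold)))
    return (disagreement < threshold).astype(float)


def masked_log_likelihood(x, mask, mean, var):
    """Log-likelihood of one feature vector under a diagonal Gaussian,
    weighting each dimension by its mask value.

    With a binary mask this is exact marginalization: a dropped dimension
    integrates to 1 and contributes log(1) = 0 to the sum.
    """
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return float(np.sum(mask * per_dim))


if __name__ == "__main__":
    noisy = np.array([[1.0, 2.0, 0.5], [3.0, 1.0, 0.2]])
    template = np.full_like(noisy, 0.5)
    ms = np.array([[0.6, 1.4, 0.4], [2.4, 0.1, 0.02]])  # made-up block (1) output
    ts = template_subtraction(noisy, template)
    mask = reliability_mask(ms, ts)
    print(mask)  # hard mask: [[1, 1, 0], [1, 0, 1]]
    print(masked_log_likelihood(ts[0], mask[0], np.ones(3), np.ones(3)))
```

A hard mask discards a bin entirely. A soft mask only down-weights it, which avoids making an all-or-nothing decision near the threshold.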


Keywords: Ego noise, noise reduction, robot audition, speech recognition, missing feature theory, microphone array





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Honda Research Institute Japan Co., Ltd., Saitama, Japan
  2. Honda Research Institute Europe GmbH, Offenbach, Germany
  3. Department of Mechanical and Environmental Informatics, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan
