Ego noise cancellation of a robot using missing feature masks

Abstract

We describe an architecture that enables a robot to recognize speech by cancelling its ego noise, even while the robot is moving. The system consists of three blocks: (1) a multi-channel noise reduction block, comprising consecutive stages of microphone-array-based sound localization, geometric source separation and post-filtering; (2) a single-channel noise reduction block utilizing template subtraction; and (3) an automatic speech recognition block. In this work, we specifically investigate a missing feature theory-based automatic speech recognition (MFT-ASR) approach in block (3). This approach uses spectro-temporal elements derived from blocks (1) and (2) to measure the reliability of the acoustic features, and generates masks that filter out unreliable acoustic features. We evaluate the system on a robot using word correct rates. Furthermore, we present a detailed analysis of recognition accuracy to determine optimal parameters. The proposed MFT-ASR approach resulted in significantly higher recognition performance than single-channel or multi-channel noise reduction methods.
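The mask generation idea in block (3) can be illustrated with a minimal sketch. The abstract does not give the reliability formula, so everything below is an assumption made for illustration: the function name `missing_feature_mask`, the reliability measure (the fraction of each spectro-temporal element's energy that remains after subtracting an estimated noise energy), and the `threshold` and `steepness` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def missing_feature_mask(noisy_energy, estimated_noise_energy,
                         threshold=0.5, soft=False, steepness=10.0):
    """Return a reliability mask with one value per spectro-temporal element.

    Assumed reliability measure (illustrative, not the paper's definition):
    the share of the observed energy left after removing the noise estimate.
    """
    noisy = np.asarray(noisy_energy, dtype=float)
    noise = np.asarray(estimated_noise_energy, dtype=float)
    eps = 1e-12  # guards against division by zero in silent elements

    # Reliability in [0, 1]: near 1 means speech dominates, near 0 means noise dominates.
    reliability = np.clip((noisy - noise) / (noisy + eps), 0.0, 1.0)

    if soft:
        # Soft mask: a sigmoid centred on the threshold gives graded weights.
        return 1.0 / (1.0 + np.exp(-steepness * (reliability - threshold)))

    # Hard mask: 1.0 marks a feature as reliable, 0.0 as unreliable.
    return (reliability > threshold).astype(float)


# Toy input: rows are time frames, columns are frequency bands.
noisy = [[4.0, 1.0], [2.0, 2.0]]
noise = [[1.0, 0.9], [0.5, 1.9]]
hard_mask = missing_feature_mask(noisy, noise)
soft_mask = missing_feature_mask(noisy, noise, soft=True)
```

In this toy input the first band of each frame keeps most of its energy after the noise estimate is removed and is marked reliable, while the second band is dominated by noise and is masked. A recognizer built on missing feature theory would then down-weight or ignore the masked features when scoring acoustic models.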

Abbreviations

  • ANN: Artificial Neural Network
  • ASR: Automatic Speech Recognition
  • BSS: Blind Source Separation
  • DoA: Direction of Arrival
  • GSS: Geometric Source Separation
  • HMM: Hidden Markov Model
  • MCRA: Minima Controlled Recursive Averaging
  • MFCC: Mel-Frequency Cepstral Coefficients
  • MFM: Missing Feature Mask
  • MFT: Missing Feature Theory
  • MMSE: Minimum Mean Square Estimation
  • MSLS: Mel-Scale Log Spectrum
  • MUSIC: MUltiple SIgnal Classification
  • NN: Nearest Neighbour
  • PF: Post-Filtering
  • SE: Speech Enhancement
  • SS: Spectral Subtraction
  • SSL: Sound Source Localization
  • SSS: Sound Source Separation
  • TS: Template Subtraction
  • WCR: Word Correct Rate
  • WF: Wiener Filtering

Author information

Corresponding author

Correspondence to Gökhan Ince.

Cite this article

Ince, G., Nakadai, K., Rodemann, T. et al. Ego noise cancellation of a robot using missing feature masks. Appl Intell 34, 360–371 (2011). https://doi.org/10.1007/s10489-011-0285-0

Keywords

  • Ego noise
  • Noise reduction
  • Robot audition
  • Automatic speech recognition
  • Missing feature theory
  • Microphone array