Evaluation of Hands-Free Large Vocabulary Continuous Speech Recognition by Blind Dereverberation Based on Spectral Subtraction by Multi-channel LMS Algorithm

Wang, Longbiao; Odani, Kyohei; Kai, Atsuhiko

doi:10.1007/978-3-642-23538-2_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Previously, Wang et al. [1] proposed a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. Preliminary experiments showed that this method is effective for isolated word recognition in a reverberant environment. However, the robustness of this dereverberation method and the factors that affect it were not investigated. In this paper, we analyze the factors that affect compensation parameter estimation for the method, such as the number of channels (the number of microphones), the length of reverberation to be suppressed, and the length of the utterance used for parameter estimation, and evaluate them on large vocabulary continuous speech recognition (LVCSR). We conducted speech recognition experiments on distorted speech signals simulated by convolving multi-channel impulse responses with clean speech. For LVCSR, the proposed method with beamforming reduces the word error rate by 19.2% relative to conventional cepstral mean normalization with beamforming. The experimental results also show that the proposed method is robust in a variety of reverberant environments for both isolated and continuous speech recognition and under various parameter estimation conditions.
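The abstract's simulation setup and its evaluation metric can be illustrated with a minimal sketch. It is not the authors' code: the function names, the synthetic exponentially decaying impulse responses, the 16 kHz sampling rate, and the word error rates passed to `relative_error_reduction` are all hypothetical choices made for illustration, and the sketch covers only the data simulation and the metric, not the MCLMS-based dereverberation itself.

```python
import numpy as np


def simulate_reverberant(clean, impulse_responses):
    """Convolve one clean signal with each channel's impulse response."""
    clean = np.asarray(clean, dtype=float)
    return [np.convolve(clean, np.asarray(h, dtype=float))
            for h in impulse_responses]


def relative_error_reduction(baseline_wer, proposed_wer):
    """Relative reduction (%) of the word error rate against a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


def synthetic_impulse_response(length, decay, rng):
    """Toy impulse response: white noise under an exponential decay.

    This is a stand-in for a measured room impulse response, not real data.
    """
    h = rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    h[0] = 1.0  # direct path
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for 1 s of clean speech, assuming a 16 kHz sampling rate.
    clean = rng.standard_normal(16000)
    # Four hypothetical microphone channels.
    irs = [synthetic_impulse_response(4000, 0.002, rng) for _ in range(4)]
    channels = simulate_reverberant(clean, irs)
    print(len(channels), channels[0].shape)
    # Hypothetical word error rates, chosen only to show the formula.
    print(relative_error_reduction(50.0, 40.0))
```

Each simulated channel is longer than the clean signal by the impulse response length minus one sample, which is the reverberant tail that a dereverberation method would aim to suppress.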

References

1. Wang, L., Kitaoka, N., Nakagawa, S.: Distant-talking speech recognition based on spectral subtraction by multi-channel LMS algorithm. IEICE Trans. Information and Systems E94-D(3), 659–667 (2011)
2. Furui, S.: Cepstral analysis technique for automatic speaker verification. IEEE Trans. Acoust. Speech Signal Processing 29(2), 254–272 (1981)
3. Raut, C., Nishimoto, T., Sagayama, S.: Adaptation for long convolutional distortion by maximum likelihood based state filtering approach. In: Proc. of ICASSP-2006, vol. 1, pp. 1133–1136 (2006)
4. Jin, Q., Schultz, T., Waibel, A.: Far-field speaker recognition. IEEE Trans. ASLP 15(7), 2023–2032 (2007)
5. Huang, Y., Benesty, J., Chen, J.: Optimal step size of the adaptive multi-channel LMS algorithm for blind SIMO identification. IEEE Signal Processing Letters 12(3), 173–175 (2005)
6. Huang, Y., Benesty, J., Chen, J.: Acoustic MIMO Signal Processing. Springer, Heidelberg (2006)
7. Nakamura, S., Hiyane, K., Asano, F., Nishiura, T., Yamada, T.: Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition. In: Proc. of LREC 2000, pp. 965–968 (May 2000)
8. Itou, K., Yamamoto, M., Takeda, K., Takezawa, T., Matsuoka, T., Kobayashi, T., Shikano, K., Itahashi, S.: JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research. J. Acoust. Soc. Jpn (E) 20(3), 199–206 (1999)

Author information

Authors and Affiliations

Department of Systems Engineering, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, 432-8561, Japan
Longbiao Wang, Kyohei Odani & Atsuhiko Kai

Editor information

Editors and Affiliations

Department of Computer Sciences, University of West Bohemia, Univerzitní 22, 306 14, Pilsen, Czech Republic
Ivan Habernal
Faculty of Applied Sciences, Dept. of Computer Science and Engineering, University of West Bohemia, Univerzitni 8, 306 14, Pilsen, Czech Republic
Václav Matoušek

Cite this paper

Wang, L., Odani, K., Kai, A. (2011). Evaluation of Hands-Free Large Vocabulary Continuous Speech Recognition by Blind Dereverberation Based on Spectral Subtraction by Multi-channel LMS Algorithm. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_17

DOI: https://doi.org/10.1007/978-3-642-23538-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)
