Robust Feature Combination for Speech Recognition Using Linear Microphone Array in a Car
Abstract
Speech recognition in a car environment must be robust in two respects. The first is robustness to noisy acoustic conditions, which has been one of the most popular research topics in in-vehicle speech recognition. The second is robustness to unstable calibration of the audio input, which has attracted much less attention. As a result, recognition performance can degrade greatly in a real application if an input device such as a microphone array is badly calibrated. We propose robust feature combination in the MFCC domain using speech inputs from a linear microphone array. The approach enables speech recognition applications in car environments that are practical and robust to both noise and calibration errors. Even simple MFCC averaging is effective, and a new algorithm, hypothesis-based feature combination (HBFC), improves performance further. We also extend cepstral variance normalization to variance re-scaling, which makes the feature combination approach more robust. Experiments on data recorded in a moving car confirm the advantages of the proposed algorithms.
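As a rough illustration of the simplest idea mentioned in the abstract, the sketch below averages MFCC frames across microphone channels and then applies standard per-coefficient mean and variance normalization. It is only a minimal sketch under stated assumptions: the function names `average_mfcc` and `normalize_mean_variance` are hypothetical, the input is synthetic random data standing in for time-aligned MFCC frames, and neither HBFC nor the variance re-scaling extension is reproduced here, since the abstract does not specify them.

```python
import random
from statistics import fmean, pstdev


def average_mfcc(channels):
    """Average MFCC vectors frame by frame across microphone channels.

    channels: list of per-microphone feature sequences, each a list of
    frames, each frame a list of coefficients. All channels are assumed
    to be time-aligned and to have the same shape.
    """
    num_frames = len(channels[0])
    num_coeffs = len(channels[0][0])
    return [
        [fmean(ch[t][k] for ch in channels) for k in range(num_coeffs)]
        for t in range(num_frames)
    ]


def normalize_mean_variance(features, eps=1e-8):
    """Standard cepstral mean and variance normalization over an utterance.

    Each coefficient track is shifted to zero mean and scaled to unit
    variance. eps guards against division by zero for constant tracks.
    """
    columns = list(zip(*features))  # one tuple per coefficient track
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in features
    ]


# Synthetic stand-in data (assumption): 4 microphones, 50 frames, 13 coefficients.
random.seed(0)
clean = [[random.gauss(0, 1) for _ in range(13)] for _ in range(50)]
channels = [
    [[c + random.gauss(0, 0.5) for c in frame] for frame in clean]
    for _ in range(4)
]
combined = normalize_mean_variance(average_mfcc(channels))
```

In this sketch the averaging is done after feature extraction rather than on the waveforms, which mirrors the abstract's emphasis on combining in the MFCC domain; how the real system aligns frames or weights channels is not stated in the source.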
Keywords
Speech recognition; Microphone array; MFCC; Feature combination; Hypothesis; Variance normalization; GMM
Acknowledgments
The authors are thankful to Professor Sadaoki Furui of Tokyo Institute of Technology and Professor Tetsunori Kobayashi of Waseda University for their valuable comments. This work was supported in part by the New Energy and Industrial Technology Development Organization (NEDO), Japan.