Abstract
Automatic speech recognition (ASR) systems trained on speech from close-talking microphones generally fail to recognize far-field speech. In this paper, we present a Hilbert envelope based feature extraction technique that alleviates the artifacts introduced by room reverberation. The proposed technique models the temporal envelopes of the speech signal in narrow sub-bands using Frequency Domain Linear Prediction (FDLP). ASR experiments on far-field speech show that the proposed FDLP features yield significant performance improvements over other robust feature extraction techniques (an average relative improvement of 43% in word error rate).
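The core idea of FDLP, fitting an all-pole model to the cosine transform of a signal so that the model's response traces the temporal envelope of one sub-band, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the rectangular sub-band window, the model order, the small diagonal loading, and the helper names `dct_ii` and `fdlp_envelope` are all assumptions made for the example.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II computed from an explicit cosine basis (O(N^2))."""
    n = len(x)
    k = np.arange(n)[:, None]          # DCT bin index
    m = np.arange(n)[None, :]          # time sample index
    c = np.cos(np.pi * (m + 0.5) * k / n) @ x * np.sqrt(2.0 / n)
    c[0] /= np.sqrt(2.0)
    return c

def fdlp_envelope(x, band, order=8, n_points=None):
    """Approximate the squared temporal envelope of x in one sub-band.

    band  : (lo, hi) range of DCT bins; a rectangular window is assumed here.
    order : all-pole model order (hypothetical default).
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) if n_points is None else n_points
    lo, hi = band
    c = dct_ii(x)[lo:hi]               # sub-band of the DCT sequence

    # Autocorrelation of the DCT coefficients for lags 0..order.
    r = np.array([np.dot(c[:len(c) - k], c[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-6                 # assumed diagonal loading for stability

    # Solve the Toeplitz normal equations R a = -r[1:] for the predictor.
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a = np.concatenate(([1.0], np.linalg.solve(r[lags], -r[1:])))
    gain = np.dot(r, a)                # prediction error energy

    # Evaluating the model on [0, pi) sweeps over time, not frequency,
    # because the linear prediction was done in the frequency domain.
    omega = np.pi * (np.arange(n_points) + 0.5) / n_points
    A = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

# Example: a tone burst centred on sample 350 of a 512-sample frame.
t = np.arange(512)
burst = np.exp(-0.5 * ((t - 350) / 20.0) ** 2) * np.sin(2 * np.pi * 0.25 * t)
envelope = fdlp_envelope(burst, band=(192, 320))
```

Because the prediction is carried out on frequency-domain coefficients, the poles of the model mark locations in time where the sub-band energy peaks, which is the dual of ordinary linear prediction marking spectral peaks.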
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thomas, S., Ganapathy, S., Hermansky, H. (2008). Hilbert Envelope Based Features for Far-Field Speech Recognition. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_11
DOI: https://doi.org/10.1007/978-3-540-85853-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85852-2
Online ISBN: 978-3-540-85853-9
eBook Packages: Computer Science (R0)