Abstract
The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise in the interior of a car. We compare two Kalman filtering approaches that aim to improve noise robustness: Switching Linear Dynamic Models (SLDM) and Autoregressive Switching Linear Dynamical Systems (AR-SLDS). Unlike previous work, which is restricted to white noise, we evaluate both modeling concepts in a noisy speech recognition task that also takes into account colored noise produced by different driving conditions and car types. We thereby demonstrate that speech enhancement based on Kalman filtering outperforms all standard de-noising techniques considered herein, such as Wiener filtering, Histogram Equalization, and Unsupervised Spectral Subtraction.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schuller, B., Wöllmer, M., Moosmayr, T., Ruske, G., Rigoll, G. (2008). Switching Linear Dynamic Models for Noise Robust In-Car Speech Recognition. In: Rigoll, G. (eds) Pattern Recognition. DAGM 2008. Lecture Notes in Computer Science, vol 5096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69321-5_25
DOI: https://doi.org/10.1007/978-3-540-69321-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69320-8
Online ISBN: 978-3-540-69321-5
eBook Packages: Computer Science (R0)