Switching Linear Dynamic Models for Noise Robust In-Car Speech Recognition

Schuller, Björn; Wöllmer, Martin; Moosmayr, Tobias; Ruske, Günther; Rigoll, Gerhard

doi:10.1007/978-3-540-69321-5_25

Björn Schuller¹, Martin Wöllmer¹, Tobias Moosmayr², Günther Ruske¹ & Gerhard Rigoll¹

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5096)

Included in the following conference series:

Joint Pattern Recognition Symposium


Abstract

The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise inside a car. We compare two Kalman filtering approaches that aim to improve noise robustness: Switching Linear Dynamic Models (SLDM) and Autoregressive Switching Linear Dynamical Systems (AR-SLDS). Unlike previous work, which was restricted to white noise, we evaluate these modeling concepts in a noisy speech recognition task that also takes into account colored noise produced by different driving conditions and car types. We thereby demonstrate that speech enhancement based on Kalman filtering outperforms all standard de-noising techniques considered here, including Wiener filtering, Histogram Equalization, and Unsupervised Spectral Subtraction.
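
The abstract centers on Kalman filtering over a switching linear dynamic model. The following is a minimal, illustrative Python sketch of that general idea, not the authors' method: the abstract does not give these details, so it assumes a scalar feature, a linear additive-Gaussian observation model (y = x + w) with known noise variance, hand-set parameters, and the GPB1 approximation, which collapses the regime mixture to a single Gaussian after every frame. Feature-domain systems commonly use a non-linear observation model instead. All function and parameter names (sldm_filter, a, b, q, r, trans, prior) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sldm_filter(
    y: Sequence[float],
    a: Sequence[float],                # hypothetical per-regime dynamics coefficient
    b: Sequence[float],                # hypothetical per-regime offset
    q: Sequence[float],                # hypothetical per-regime process-noise variance
    r: float,                          # assumed known, stationary observation-noise variance
    trans: Sequence[Sequence[float]],  # regime transition matrix, rows sum to 1
    prior: Sequence[float],            # regime probabilities one frame before y[0]
    m0: float = 0.0,                   # state mean one frame before y[0]
    p0: float = 1.0,                   # state variance one frame before y[0]
) -> Tuple[List[float], List[List[float]]]:
    """GPB1 filtering for a scalar switching linear dynamic model (a sketch).

    Clean feature:  x_t = a[s] * x_{t-1} + b[s] + v,  v ~ N(0, q[s])
    Noisy feature:  y_t = x_t + w,                    w ~ N(0, r)
    Returns the filtered clean-feature means and per-frame regime posteriors.
    """
    k = len(a)
    m, p = m0, p0
    weights = list(prior)
    means: List[float] = []
    probs: List[List[float]] = []
    for obs in y:
        # Propagate regime probabilities through the Markov chain.
        pred_w = [sum(weights[i] * trans[i][j] for i in range(k)) for j in range(k)]
        post_m, post_p, joint = [], [], []
        for j in range(k):
            # Kalman prediction under regime j.
            mj = a[j] * m + b[j]
            pj = a[j] ** 2 * p + q[j]
            # Kalman update with the noisy observation.
            s = pj + r
            gain = pj / s
            post_m.append(mj + gain * (obs - mj))
            post_p.append((1.0 - gain) * pj)
            # Unnormalized regime posterior: prior times predictive likelihood.
            joint.append(pred_w[j] * gaussian_pdf(obs, mj, s))
        total = sum(joint)
        weights = [jw / total for jw in joint]
        # Collapse the K-component mixture to one Gaussian by moment matching.
        m = sum(w * mj for w, mj in zip(weights, post_m))
        p = sum(w * (pj + (mj - m) ** 2) for w, mj, pj in zip(weights, post_m, post_p))
        means.append(m)
        probs.append(weights)
    return means, probs
```

With a single regime the recursion reduces to an ordinary Kalman filter; with several regimes, the per-frame posteriors indicate which linear dynamic is currently active. GPB1 is the cheapest collapsing scheme; more accurate approximations trade extra computation for accuracy.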



Author information

Authors and Affiliations

Institute for Human-Machine Communication, Technische Universität München, 80290, München, Germany
Björn Schuller, Martin Wöllmer, Günther Ruske & Gerhard Rigoll
BMW Group, Forschungs- und Innovationszentrum, Akustik, Komfort und Werterhaltung, 80788, München, Germany
Tobias Moosmayr

Editor information

Gerhard Rigoll

About this paper

Cite this paper

Schuller, B., Wöllmer, M., Moosmayr, T., Ruske, G., Rigoll, G. (2008). Switching Linear Dynamic Models for Noise Robust In-Car Speech Recognition. In: Rigoll, G. (ed.) Pattern Recognition. DAGM 2008. Lecture Notes in Computer Science, vol 5096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69321-5_25

DOI: https://doi.org/10.1007/978-3-540-69321-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69320-8
Online ISBN: 978-3-540-69321-5
eBook Packages: Computer Science, Computer Science (R0)
