Switching Linear Dynamic Models for Noise Robust In-Car Speech Recognition

  • Conference paper
Pattern Recognition (DAGM 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5096)

Abstract

The performance of speech recognition systems degrades strongly in the presence of background noise, such as driving noise in the interior of a car. We compare two Kalman filtering approaches that attempt to improve noise robustness: Switching Linear Dynamic Models (SLDM) and Autoregressive Switching Linear Dynamical Systems (AR-SLDS). Unlike previous work, which is restricted to white noise, we evaluate both modeling concepts in a noisy speech recognition task that also includes colored noise produced by different driving conditions and car types. We demonstrate that speech enhancement based on Kalman filtering outperforms all standard de-noising techniques considered here, such as Wiener filtering, Histogram Equalization, and Unsupervised Spectral Subtraction.
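For readers unfamiliar with the Kalman filtering idea that both compared approaches build on, the following is a minimal sketch of a scalar Kalman filter that recovers a slowly varying signal from noisy observations. It is not the SLDM or AR-SLDS method of the paper: it assumes a single first-order linear model with known parameters, whereas switching models combine several such linear models under a discrete hidden state. All names and parameter values (`kalman_denoise`, `simulate`, `a`, `q`, `r`) are hypothetical and chosen only for illustration.

```python
import random


def kalman_denoise(observations, a=0.95, q=0.1, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t.

    q is the assumed process-noise variance (w_t), r the assumed
    observation-noise variance (v_t). Returns the filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction step: propagate the estimate and its variance.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update step: blend prediction and observation via the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


def simulate(n, a=0.95, q=0.1, r=1.0, seed=0):
    """Draw a clean AR(1) signal and a noisy observation of it (toy data)."""
    rng = random.Random(seed)
    clean, noisy = [], []
    x = 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        clean.append(x)
        noisy.append(x + rng.gauss(0.0, r ** 0.5))
    return clean, noisy


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((s - t) ** 2 for s, t in zip(u, v)) / len(u)


if __name__ == "__main__":
    clean, noisy = simulate(2000)
    filtered = kalman_denoise(noisy)
    print("MSE of noisy observations:", mse(clean, noisy))
    print("MSE after Kalman filtering:", mse(clean, filtered))
```

On this toy data the filter and the simulator share the same model, which is the most favorable case; in a real speech front end the model parameters would have to be learned from data.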

Editor information

Gerhard Rigoll

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schuller, B., Wöllmer, M., Moosmayr, T., Ruske, G., Rigoll, G. (2008). Switching Linear Dynamic Models for Noise Robust In-Car Speech Recognition. In: Rigoll, G. (eds) Pattern Recognition. DAGM 2008. Lecture Notes in Computer Science, vol 5096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69321-5_25

  • DOI: https://doi.org/10.1007/978-3-540-69321-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69320-8

  • Online ISBN: 978-3-540-69321-5

  • eBook Packages: Computer Science, Computer Science (R0)
