Cross-Probability Model Based on GMM for Feature Vector Normalization

  • Chapter
In-Vehicle Corpus and Signal Processing for Driver Behavior

Abstract

To develop a robust speech-based man–machine interface for cars, multi-environment model-based linear normalization (MEMLIN) was presented earlier and shown to be effective at compensating for environment mismatch. MEMLIN is an empirical feature vector normalization technique that models the clean and noisy spaces with Gaussian mixture models (GMMs). A critical point of the technique is the cross-probability model: the probability of a clean-model Gaussian, given a noisy-model Gaussian and the noisy feature vector. In previous works the cross-probability model was approximated as time independent and estimated in a training process. In this chapter, an estimation based on GMMs is considered for MEMLIN instead. Experiments with the SpeechDat Car and Aurora2 databases were carried out to study the performance of the proposed estimation of the cross-probability model. MEMLIN reaches mean improvements in word error rate (WER) of 75.53% with SpeechDat Car and 62.49% with a reduced set of the Aurora2 database; these rise to 82.86% and 67.52%, respectively, when the proposed cross-probability model is applied. Although the technique behaves satisfactorily, decoding with clean acoustic models still produces a mismatch because the normalization is not perfect. Retraining the acoustic models in the normalized space is therefore proposed, reaching a mean improvement of 97.27% with the SpeechDat Car database.
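The role of the cross-probability model can be illustrated with a minimal sketch. It is not the chapter's implementation, and the abstract does not give the normalization formula. The sketch assumes the form in which MEMLIN is usually described: the clean vector is estimated by subtracting from the noisy vector a sum of bias terms, each weighted by the posterior of a noisy-model Gaussian and by the cross-probability of a clean-model Gaussian. For brevity it assumes scalar features, a single environment, and the earlier time-independent cross-probability table p(s_x | s_y). All names (`gaussian_pdf`, `noisy_posteriors`, `memlin_normalize`, `cross_prob`, `bias`) are hypothetical.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def noisy_posteriors(y, weights, means, variances):
    """Posterior p(s_y | y) of each Gaussian in the noisy-space GMM.

    Assumes y is close enough to the model that the total likelihood
    does not underflow to zero.
    """
    likes = [w * gaussian_pdf(y, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(likes)
    return [like / total for like in likes]


def memlin_normalize(y, noisy_gmm, cross_prob, bias):
    """Estimate the clean feature from the noisy feature y.

    noisy_gmm  : (weights, means, variances) of the noisy-space GMM
    cross_prob : cross_prob[sy][sx] = p(s_x | s_y), assumed time independent
    bias       : bias[sy][sx] = assumed shift between the noisy Gaussian sy
                 and the clean Gaussian sx
    """
    post = noisy_posteriors(y, *noisy_gmm)
    correction = 0.0
    for sy, p_sy in enumerate(post):
        for sx, p_sx in enumerate(cross_prob[sy]):
            correction += bias[sy][sx] * p_sy * p_sx
    return y - correction


# Toy model: two noisy Gaussians, two clean Gaussians (illustrative values).
toy_gmm = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
toy_cross_prob = [[1.0, 0.0], [0.0, 1.0]]
toy_bias = [[1.0, 0.0], [0.0, 3.0]]
x_hat = memlin_normalize(0.0, toy_gmm, toy_cross_prob, toy_bias)
```

In this sketch `cross_prob[sy][sx]` is a fixed table. A GMM-based estimate of the kind the abstract proposes would instead make that weight depend on the noisy feature vector `y` itself; how the chapter computes it is not stated here.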

Acknowledgments

This work has been supported by the Spanish National Project TIN 2005-08660-C04-01.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Buera, L., Miguel, A., Lleida, E., Ortega, A., Saz, Ó. (2009). Cross-Probability Model Based on GMM for Feature Vector Normalization. In: Takeda, K., Erdogan, H., Hansen, J.H.L., Abut, H. (eds) In-Vehicle Corpus and Signal Processing for Driver Behavior. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79582-9_14

  • DOI: https://doi.org/10.1007/978-0-387-79582-9_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-79581-2

  • Online ISBN: 978-0-387-79582-9

  • eBook Packages: Engineering, Engineering (R0)
