Cross-Probability Model Based on GMM for Feature Vector Normalization

Buera, Luis; Miguel, Antonio; Lleida, Eduardo; Ortega, Alfonso; Saz, Óscar

doi:10.1007/978-0-387-79582-9_14

Abstract

Multi-environment model-based linear normalization (MEMLIN) was presented earlier as part of a robust speech-based man–machine interface for cars, and it proved effective at compensating for environment mismatch. MEMLIN is an empirical feature vector normalization technique that models the clean and noisy spaces with Gaussian mixture models (GMMs). A critical element is the cross-probability model: the probability of a clean-model Gaussian given a noisy-model Gaussian and the noisy feature vector. In previous work, the cross-probability model was approximated as time independent and estimated in a training process. In this chapter, a GMM-based estimate is considered instead. Experiments with the SpeechDat Car and Aurora2 databases were carried out to study the performance of the proposed estimate of the cross-probability model. MEMLIN obtains mean improvements in word error rate (WER) of 75.53% with SpeechDat Car and 62.49% with a reduced set of the Aurora2 database (82.86% and 67.52%, respectively, if the time-independent cross-probability model is applied). Although the technique behaves satisfactorily, decoding with clean acoustic models produces a mismatch because the normalization is not perfect. Retraining the acoustic models in the normalized space is therefore proposed, which reaches a mean improvement of 97.27% with the SpeechDat Car database.
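As a rough illustration of the quantities named in the abstract, the sketch below normalizes one noisy feature vector. It is a minimal sketch under stated assumptions, not the chapter's implementation: it assumes diagonal-covariance GMMs, and it assumes the clean estimate is the noisy vector minus a sum of bias vectors weighted by the noisy-Gaussian posterior and the cross-probability. The function names, the bias-vector formulation and all toy values are hypothetical. The cross-probability is passed in as a fixed matrix, which corresponds to the time-independent approximation.

```python
import numpy as np

def gmm_posteriors(y, weights, means, variances):
    """Posterior p(s_y | y) of each Gaussian in a diagonal-covariance GMM."""
    diff = y - means                                   # shape (K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)                       # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def normalize(y, noisy_gmm, cross_prob, bias):
    """Estimate a clean vector from the noisy vector y (assumed formulation).

    noisy_gmm  : (weights, means, variances) of the noisy-space GMM
    cross_prob : cross_prob[j, i] = p(s_x = i | s_y = j), fixed over time
    bias       : bias[j, i] = bias vector (length D) for the Gaussian pair
    """
    p_sy = gmm_posteriors(y, *noisy_gmm)               # shape (Ky,)
    pair_weight = p_sy[:, None] * cross_prob           # shape (Ky, Kx)
    correction = np.einsum("ji,jid->d", pair_weight, bias)
    return y - correction

# Toy values, chosen only to exercise the code (hypothetical).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
cross_prob = np.array([[0.9, 0.1], [0.2, 0.8]])        # each row sums to 1
bias = np.full((2, 2, 2), 0.5)                         # same bias for every pair
y = np.array([0.3, -0.2])

x_hat = normalize(y, (weights, means, variances), cross_prob, bias)
print(x_hat)
```

Because every pair shares the same bias vector in this toy setup and the pair weights sum to one, the correction equals that bias vector, which gives a simple sanity check. A GMM-based estimate of the cross-probability would presumably make `cross_prob` depend on `y`; the abstract does not give those details, so they are not sketched here.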

Acknowledgments

This work has been supported by the Spanish National Project TIN 2005-08660-C04-01.

Author information

Authors and Affiliations

Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
Luis Buera

About this chapter

Cite this chapter

Buera, L., Miguel, A., Lleida, E., Ortega, A., Saz, Ó. (2009). Cross-Probability Model Based on GMM for Feature Vector Normalization. In: Takeda, K., Erdogan, H., Hansen, J.H.L., Abut, H. (eds) In-Vehicle Corpus and Signal Processing for Driver Behavior. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79582-9_14

DOI: https://doi.org/10.1007/978-0-387-79582-9_14
Published: 06 October 2008
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79581-2
Online ISBN: 978-0-387-79582-9
eBook Packages: Engineering, Engineering (R0)
