Abstract
In order to develop a robust speech-based man–machine interface for cars, multi-environment model-based linear normalization (MEMLIN) was presented earlier and proved effective at compensating for environment mismatch. MEMLIN is an empirical feature vector normalization technique that models the clean and noisy spaces with Gaussian mixture models (GMMs). A critical element is the probability of a clean-model Gaussian given the noisy-model Gaussian and the noisy feature vector, referred to as the cross-probability model. In previous work, the cross-probability model was approximated as time-independent and estimated in a training process. In this chapter, an estimation based on a GMM is considered instead. Experiments with the SpeechDat Car and Aurora2 databases were carried out to study the performance of the proposed estimation of the cross-probability model, obtaining important improvements: mean word error rate (WER) improvements of 75.53 and 62.49% for MEMLIN with SpeechDat Car and a reduced set of the Aurora2 database, respectively (82.86 and 67.52% if the time-independent cross-probability model is applied). Although the behaviour of the technique is satisfactory, decoding with clean acoustic models produces a residual mismatch because the normalization is not perfect. Therefore, retraining the acoustic models in the normalized space is proposed, reaching a mean improvement of 97.27% with the SpeechDat Car database.
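The abstract describes the cross-probability model only in words. The Python/NumPy sketch below illustrates where such a model could enter a MEMLIN-style normalization, under assumptions that go beyond the abstract: a single noisy environment (the multi-environment technique additionally combines several environments), diagonal-covariance Gaussians, a correction of the form x_hat = y - sum over s_y, s_x of p(s_y | y) p(s_x | y, s_y) r(s_x, s_y) with bias vectors r, and, for the frame-dependent variant, one Gaussian per (s_y, s_x) pair standing in for a full GMM. All function names (normalize_frame, fixed_cross_prob, gmm_cross_prob) are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def log_gauss_diag(y, means, variances):
    """Log-density of vector y under each diagonal Gaussian (last axis = feature dim)."""
    diff = y - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances) + diff ** 2 / variances, axis=-1)


def gmm_posteriors(y, weights, means, variances):
    """p(s_y | y) for every Gaussian s_y of the noisy GMM, computed in the log domain."""
    log_p = np.log(weights) + log_gauss_diag(y, means, variances)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()


def fixed_cross_prob(matrix):
    """Time-independent approximation: p(s_x | y, s_y) ~ p(s_x | s_y), ignoring y.
    matrix has shape (S_y, S_x) and each row sums to 1 (learned in training)."""
    return lambda y: matrix


def gmm_cross_prob(prior, pair_means, pair_vars):
    """Hypothetical frame-dependent cross-probability model:
    p(s_x | y, s_y) proportional to p(s_x | s_y) * N(y; mu[s_y, s_x], var[s_y, s_x]).
    One Gaussian per (s_y, s_x) pair stands in for a full GMM (assumption).
    prior: (S_y, S_x); pair_means, pair_vars: (S_y, S_x, D)."""
    def cross_prob(y):
        log_p = np.log(prior) + log_gauss_diag(y, pair_means, pair_vars)
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)  # each row: distribution over s_x
    return cross_prob


def normalize_frame(y, noisy_gmm, biases, cross_prob):
    """Single-environment, bias-vector estimate (assumed form):
    x_hat = y - sum_{s_y} sum_{s_x} p(s_y | y) p(s_x | y, s_y) r[s_x, s_y].
    noisy_gmm = (weights, means, variances); biases: (S_x, S_y, D)."""
    p_sy = gmm_posteriors(y, *noisy_gmm)  # shape (S_y,)
    p_cross = cross_prob(y)               # shape (S_y, S_x)
    correction = np.einsum("j,ji,ijd->d", p_sy, p_cross, biases)
    return y - correction


# Toy usage: one noisy Gaussian, two clean Gaussians, 1-D features.
noisy_gmm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
biases = np.array([[[2.0]], [[4.0]]])  # r[s_x=0, s_y=0] = 2, r[s_x=1, s_y=0] = 4
y = np.array([10.0])

static = fixed_cross_prob(np.array([[0.25, 0.75]]))
print(normalize_frame(y, noisy_gmm, biases, static))  # [6.5]

dynamic = gmm_cross_prob(np.array([[0.5, 0.5]]),
                         np.array([[[0.0], [10.0]]]),  # y sits on the s_x = 1 pair
                         np.ones((1, 2, 1)))
print(normalize_frame(y, noisy_gmm, biases, dynamic))  # close to 6.0
```

Swapping fixed_cross_prob for gmm_cross_prob is the only change between the two variants in this sketch: the time-independent model returns the same p(s_x | s_y) table for every frame, while the frame-dependent one re-weights that table according to how well the current noisy vector fits each (s_y, s_x) pair.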
Acknowledgments
This work has been supported by the Spanish National Project TIN 2005-08660-C04-01.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Buera, L., Miguel, A., Lleida, E., Ortega, A., Saz, Ó. (2009). Cross-Probability Model Based on Gmm for Feature Vector Normalization. In: Takeda, K., Erdogan, H., Hansen, J.H.L., Abut, H. (eds) In-Vehicle Corpus and Signal Processing for Driver Behavior. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79582-9_14
DOI: https://doi.org/10.1007/978-0-387-79582-9_14
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79581-2
Online ISBN: 978-0-387-79582-9
eBook Packages: Engineering, Engineering (R0)