Science China Information Sciences, Volume 54, Issue 12, pp 2481–2491

A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression

Research Papers Special Focus

Abstract

Handling variable, non-stationary ambient noise is a challenging task for automatic speech recognition (ASR) systems. To address this issue, one can use multi-style, noise condition independent (CI) model training on speech data collected in diverse noise environments, or uncertainty decoding techniques. An alternative approach is to explicitly approximate the continuous trajectory of Gaussian component mean and variance parameters against the varying noise level, for example, using variable parameter hidden Markov models (VP-HMMs). This paper investigates a more generalized form of variable parameter HMMs (GVP-HMMs). In addition to Gaussian component means and variances, it can also provide more compact trajectory modeling for tied linear transformations. An alternative noise condition dependent (CD) training algorithm is also proposed to handle the bias towards the training noise condition distribution. Consistent error rate gains were obtained over conventional VP-HMM mean and variance only trajectory modeling on a medium vocabulary Mandarin Chinese in-car navigation command recognition task.
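
As a rough illustration of the trajectory idea described in the abstract, the sketch below fits a polynomial in the noise level to the mean of a single Gaussian component and then evaluates that polynomial at an unseen noise level. It is not the estimation procedure of the paper: the function names, the use of SNR in dB as the noise-level variable, the second-order polynomial, and the synthetic data are all assumptions made for the example, and a plain least-squares fit stands in for whatever training criterion the actual system uses.

```python
import numpy as np


def fit_mean_trajectory(snr_db, means, order=2):
    """Least-squares polynomial fit of a Gaussian mean against noise level.

    snr_db : shape (N,), noise levels at which the mean was observed (assumed dB)
    means  : shape (N, D), the D-dimensional mean observed at each level
    Returns coefficients of shape (order + 1, D), highest power first.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    means = np.asarray(means, dtype=float)
    # np.polyfit fits each column of a 2-D y independently.
    return np.polyfit(snr_db, means, deg=order)


def predict_mean(coeffs, snr_db):
    """Evaluate the fitted trajectory at a single scalar noise level."""
    # Powers run from the highest order down to 0, matching np.polyfit.
    powers = float(snr_db) ** np.arange(coeffs.shape[0] - 1, -1, -1)
    return powers @ coeffs


# Synthetic (hypothetical) data: a 2-dimensional mean that drifts with SNR.
snr_db = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
means = np.stack(
    [
        1.0 + 0.1 * snr_db + 0.01 * snr_db**2,  # dimension 0: quadratic drift
        -2.0 + 0.05 * snr_db,                   # dimension 1: linear drift
    ],
    axis=1,
)

coeffs = fit_mean_trajectory(snr_db, means, order=2)
mean_at_unseen_level = predict_mean(coeffs, 12.5)
print(mean_at_unseen_level)
```

The same pattern could in principle be applied to variances or to the elements of a tied linear transformation, which is the generalization the abstract refers to; variances would additionally need a constraint to stay positive, which this sketch does not address.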

Keywords

non-stationary noise; generalized variable parameter HMM; noise robust speech recognition

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences/The Chinese University of Hong Kong, Hong Kong, China
  2. Cambridge University Engineering Department, Cambridge, UK
