Feature Space VTS with Phase Term Modeling

  • Maxim Korenevsky
  • Aleksei Romanenko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)

Abstract

A new variant of the Vector Taylor Series (VTS) feature compensation algorithm is proposed. It uses the phase-sensitive speech distortion model and treats the phase term as a multivariate Gaussian with unknown mean vector and covariance matrix. These parameters are estimated under the Maximum Likelihood principle using the EM algorithm. The EM parameter update formulas are derived, as well as the MMSE estimate of the clean speech features. Experiments on the Aurora2 database show that taking the phase term into account and estimating its parameters from data yields a relative WER reduction of about 20% compared to the phase-insensitive version of VTS. The proposed method is also compared to VTS with a constant phase vector, and this approximation is shown to be very efficient.
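
For orientation, the phase-sensitive distortion model can be sketched for a single log-mel filterbank channel. This is a minimal illustration of the commonly used form of the model, not the paper's own derivation: the abstract does not state the exact feature domain or notation, so the function name `distort_log_mel` and the symbols `x`, `n`, `h` and `alpha` are assumptions made here. In the power domain the model is |Y|^2 = |X|^2 |H|^2 + |N|^2 + 2 alpha |X| |H| |N|, and setting `alpha = 0` recovers the phase-insensitive model.

```python
import math


def distort_log_mel(x, n, h=0.0, alpha=0.0):
    """Noisy log-power of one mel channel under the phase-sensitive model.

    x, n, h: log-power of clean speech, additive noise and channel
             (hypothetical names chosen for this sketch).
    alpha:   phase term; 0 gives the phase-insensitive model.
    """
    d = n - x - h  # log noise-to-speech power ratio
    # 1 + |N|^2/(|X||H|)^2 + 2*alpha*|N|/(|X||H|), all in log-power terms
    ratio = 1.0 + math.exp(d) + 2.0 * alpha * math.exp(0.5 * d)
    if ratio <= 0.0:
        # Happens only for strongly negative alpha (e.g. alpha = -1 with d = 0)
        raise ValueError("phase term makes the noisy power non-positive")
    return x + h + math.log(ratio)


if __name__ == "__main__":
    # Equal speech and noise power: the phase term shifts the noisy feature
    for alpha in (-0.5, 0.0, 0.5):
        print(alpha, distort_log_mel(x=0.0, n=0.0, alpha=alpha))
```

The sketch treats `alpha` as a fixed scalar. The method in the abstract instead models the phase term as a Gaussian random vector whose mean and covariance are estimated with EM, which this example does not attempt to reproduce.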

Keywords

Robust speech recognition, Feature compensation, Vector Taylor series, Distortion model, Phase-sensitive, Aurora2

Notes

Acknowledgments

This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0033 (ID RFMEFI57514X0033).

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. ITMO University, Saint Petersburg, Russia
  2. STC-Innovations Ltd., Saint Petersburg, Russia