Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series: International Conference on Speech and Computer (SPECOM)

Abstract

In this paper, we study the application of time delay neural networks (TDNNs) to acoustic modeling for large vocabulary continuous Russian speech recognition. We created TDNNs with the p-norm nonlinearity, varying the number of hidden layers and the number of units per hidden layer. The acoustic models were trained on our own Russian speech corpus of phonetically balanced phrases, with a total duration of more than 30 h. The TDNN-based acoustic models were evaluated on a very large vocabulary continuous Russian speech recognition task. The experiments showed that the TDNN models outperformed the baseline deep neural network models in terms of word error rate.
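The paper provides no code, but the two ingredients named in the abstract are standard and easy to sketch: a TDNN layer splices input frames at fixed temporal offsets before an affine transform, and the p-norm nonlinearity reduces each group of affine outputs to its p-norm. The following is a minimal NumPy sketch under stated assumptions, not the authors' Kaldi recipe: the context offsets, layer sizes, and group size are illustrative (Kaldi's p-norm components commonly use p = 2 with a group size of 10).

```python
import numpy as np

def pnorm(x, group_size=10, p=2):
    """p-norm nonlinearity: each output unit is the p-norm over a
    group of `group_size` affine outputs; p = 2 is the common choice."""
    t, d = x.shape
    assert d % group_size == 0
    groups = x.reshape(t, d // group_size, group_size)
    return np.linalg.norm(groups, ord=p, axis=-1)

def tdnn_layer(x, weights, bias, context, group_size=10, p=2):
    """One TDNN layer: splice frames at the given temporal offsets,
    apply an affine transform, then the p-norm nonlinearity.
    `x` is (num_frames, input_dim); `context` e.g. (-2, -1, 0, 1, 2)."""
    t, _ = x.shape
    lo, hi = -min(context), max(context)
    # Repeat edge frames so every offset is defined at the boundaries.
    padded = np.pad(x, ((lo, hi), (0, 0)), mode="edge")
    spliced = np.concatenate(
        [padded[lo + c : lo + c + t] for c in context], axis=1)
    return pnorm(spliced @ weights + bias, group_size, p)

# Toy usage: 40-dim features, context {-2..2},
# 1000 affine outputs reduced to 100 p-norm outputs.
rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 40))
W = rng.standard_normal((40 * 5, 1000)) * 0.01
b = np.zeros(1000)
hidden = tdnn_layer(feats, W, b, context=(-2, -1, 0, 1, 2))
print(hidden.shape)  # (50, 100)
```

Stacking several such layers with progressively wider contexts lets the upper layers see a long temporal window while each layer stays narrow, which is the efficiency argument for TDNNs over fully spliced DNN inputs.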

Acknowledgments

This research was partially supported by the Council for Grants of the President of the Russian Federation (project No. MK-1000.2017.8) and by the Russian Foundation for Basic Research (project No. 15-07-04322).

Author information

Corresponding author: Irina Kipyatkova.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kipyatkova, I. (2017). Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_35

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
