Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series: International Conference on Speech and Computer (SPECOM)

Abstract

In this paper, we study the application of time delay neural networks (TDNNs) to acoustic modeling for large vocabulary continuous Russian speech recognition. We created TDNNs with the p-norm nonlinearity, varying the number of hidden layers and the number of units per hidden layer. The acoustic models were trained on our own Russian speech corpus of phonetically balanced phrases, with a total duration of more than 30 h. The TDNN-based acoustic models were evaluated on a very large vocabulary continuous Russian speech recognition task. The experiments showed that the TDNN models outperformed the baseline deep neural network models in terms of word error rate.
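The paper provides no code, but the two ingredients named in the abstract are standard and easy to sketch: a TDNN layer splices input frames at fixed temporal offsets before an affine transform, and the p-norm nonlinearity reduces each group of affine outputs to its p-norm. The following is a minimal NumPy sketch under stated assumptions, not the authors' Kaldi recipe: the context offsets, layer sizes, and group size are illustrative (Kaldi's p-norm components commonly use p = 2 with a group size of 10).

```python
import numpy as np

def pnorm(x, group_size=10, p=2):
    """p-norm nonlinearity: each output unit is the p-norm over a
    group of `group_size` affine outputs; p = 2 is the common choice."""
    t, d = x.shape
    assert d % group_size == 0
    groups = x.reshape(t, d // group_size, group_size)
    return np.linalg.norm(groups, ord=p, axis=-1)

def tdnn_layer(x, weights, bias, context, group_size=10, p=2):
    """One TDNN layer: splice frames at the given temporal offsets,
    apply an affine transform, then the p-norm nonlinearity.
    `x` is (num_frames, input_dim); `context` e.g. (-2, -1, 0, 1, 2)."""
    t, _ = x.shape
    lo, hi = -min(context), max(context)
    # Repeat edge frames so every offset is defined at the boundaries.
    padded = np.pad(x, ((lo, hi), (0, 0)), mode="edge")
    spliced = np.concatenate(
        [padded[lo + c : lo + c + t] for c in context], axis=1)
    return pnorm(spliced @ weights + bias, group_size, p)

# Toy usage: 40-dim features, context {-2..2},
# 1000 affine outputs reduced to 100 p-norm outputs.
rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 40))
W = rng.standard_normal((40 * 5, 1000)) * 0.01
b = np.zeros(1000)
hidden = tdnn_layer(feats, W, b, context=(-2, -1, 0, 1, 2))
print(hidden.shape)  # (50, 100)
```

Stacking several such layers with progressively wider contexts lets the upper layers see a long temporal window while each layer stays narrow, which is the efficiency argument for TDNNs over fully spliced DNN inputs.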

Acknowledgments

This research was partially supported by the Council for Grants of the President of the Russian Federation (project No. MK-1000.2017.8) and by the Russian Foundation for Basic Research (project No. 15-07-04322).

Author information

Corresponding author: Irina Kipyatkova.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kipyatkova, I. (2017). Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_35

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
