On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model

Soutner, Daniel; Zelinka, Jan; Müller, Luděk

doi:10.1007/978-3-319-11581-8_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

In this paper, we present a new hybrid NN/HMM speech recognition system with an NN-based acoustic model and an RNN-based language model. The neural-network-based acoustic model computes posterior probabilities for the states of context-dependent acoustic units. A recurrent neural network with the maximum entropy extension was used as the language model. This hybrid NN/HMM system was compared with our previous hybrid NN/HMM system, which used a standard n-gram language model. In our experiments, we also compared it with a standard GMM/HMM system. System performance was evaluated on a British English speech corpus, and the results were compared with previous work.
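The abstract does not state how the acoustic model's state posteriors enter the HMM decoder. In the usual hybrid NN/HMM approach they are divided by the state priors to give scaled likelihoods, and the minimal sketch below assumes that convention, with priors estimated from a frame-level training alignment. The helper names `state_priors` and `scaled_log_likelihoods` and the flooring constants are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

# Assumed standard hybrid NN/HMM conversion (not taken from the paper):
# the network outputs P(s | x_t) for each HMM state s, but the decoder
# needs p(x_t | s). By Bayes' rule,
#   p(x_t | s) = P(s | x_t) * p(x_t) / P(s),
# and p(x_t) is identical for every state, so it can be dropped.


def state_priors(alignment, num_states, floor=1e-8):
    """Estimate P(s) as the relative frequency of each state in a
    frame-level training alignment (hypothetical helper)."""
    counts = Counter(alignment)
    total = len(alignment)
    # Floor unseen states so that log P(s) stays finite.
    return [max(counts[s] / total, floor) for s in range(num_states)]


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Return log P(s | x_t) - log P(s) for every frame t and state s,
    i.e. log p(x_t | s) up to a per-frame constant (hypothetical helper)."""
    return [
        [math.log(max(p, floor)) - math.log(q) for p, q in zip(frame, priors)]
        for frame in posteriors
    ]


# Toy usage: 3 states, a 10-frame alignment and 2 frames of network output.
priors = state_priors([0, 0, 1, 1, 1, 2, 2, 2, 2, 2], num_states=3)
scores = scaled_log_likelihoods([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]], priors)
```

Dividing by the prior compensates for unbalanced state frequencies in the training data: in the second toy frame, the posterior ratio between states 2 and 1 is 2:1, but after division by the priors it drops to 1.2:1, because state 2 is the more frequent state in the alignment.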

Author information

Authors and Affiliations

Faculty of Applied Sciences, New Technologies for the Information Society, University of West Bohemia, Univerzitní 22, 306 14, Plzeň, Czech Republic
Daniel Soutner, Jan Zelinka & Luděk Müller

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Soutner, D., Zelinka, J., Müller, L. (2014). On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_39

DOI: https://doi.org/10.1007/978-3-319-11581-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science; Computer Science (R0)