Abstract
In this paper, we present a new hybrid NN/HMM speech recognition system with a neural-network-based acoustic model and an RNN-based language model. The acoustic model computes posterior probabilities of the states of context-dependent acoustic units. The language model is a recurrent neural network (RNN) with the maximum entropy extension. We compare this system with our previous hybrid NN/HMM system, which used a standard n-gram language model, and with a standard GMM/HMM system. System performance was evaluated on a British English speech corpus and compared with results from previous work.
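The two components named in the abstract can be illustrated with a small sketch. It assumes the conventional hybrid recipe, in which the network's state posteriors are divided by state priors to obtain scaled likelihoods for HMM decoding. It also assumes a toy Elman-style RNN language model whose maximum entropy extension is reduced to direct connections from the previous word to the output layer; RNNME implementations usually hash longer n-gram histories instead. All names (`scaled_log_likelihoods`, `TinyRNNME`, `sentence_log_prob`), layer sizes, and random weights are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def scaled_log_likelihoods(log_posteriors, log_priors, prior_scale=1.0):
    """Assumed standard hybrid recipe: turn NN state posteriors into HMM
    emission scores, log p(x|s) - log p(x) = log p(s|x) - log p(s)."""
    return [lp - prior_scale * lq for lp, lq in zip(log_posteriors, log_priors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNNME:
    """Hypothetical toy RNN LM with a simplified maximum entropy extension:
    direct bigram weights from the previous word to every output word.
    Weights are random; this is not a trained model."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.U = mat(hidden_size, vocab_size)   # input word -> hidden
        self.W = mat(hidden_size, hidden_size)  # previous hidden -> hidden (recurrence)
        self.O = mat(vocab_size, hidden_size)   # hidden -> output
        self.D = mat(vocab_size, vocab_size)    # direct (max-ent) previous word -> output
        self.reset()

    def reset(self):
        self.h = [0.0] * self.hidden_size

    def step(self, word):
        """Consume one word id and return P(next word | history) as a list."""
        # U @ one_hot(word) is simply column `word` of U.
        self.h = [
            sigmoid(self.U[j][word]
                    + sum(self.W[j][k] * self.h[k] for k in range(self.hidden_size)))
            for j in range(self.hidden_size)
        ]
        # RNN output scores plus the direct maximum entropy term.
        logits = [
            sum(self.O[i][j] * self.h[j] for j in range(self.hidden_size)) + self.D[word][i]
            for i in range(self.vocab_size)
        ]
        return softmax(logits)


def sentence_log_prob(model, word_ids):
    """Sum of log P(w_t | w_1 .. w_{t-1}) for t >= 2, from a fresh hidden state."""
    model.reset()
    total = 0.0
    for prev, nxt in zip(word_ids, word_ids[1:]):
        total += math.log(model.step(prev)[nxt])
    return total
```

For example, `sentence_log_prob(TinyRNNME(5, 4), [0, 1, 2, 3])` returns a negative log-probability for a four-word toy sentence, and `scaled_log_likelihoods` maps posteriors of 0.7 and 0.3 with uniform priors of 0.5 to log 1.4 and log 0.6.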
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Soutner, D., Zelinka, J., Müller, L. (2014). On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_39
DOI: https://doi.org/10.1007/978-3-319-11581-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)