
On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)


Abstract

In this paper, we present a new hybrid NN/HMM speech recognition system with an NN-based acoustic model and an RNN-based language model. The neural-network-based acoustic model computes posteriors for the states of context-dependent acoustic units. A recurrent neural network with the maximum entropy extension was used as the language model. We compared this hybrid NN/HMM system with our previous hybrid NN/HMM system, which uses a standard n-gram language model, and with a standard GMM/HMM system. System performance was evaluated on a British English speech corpus and compared with results from previous work.
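The abstract names two components without showing how they fit together. The sketch below is a rough illustration of each, built on stated assumptions and not taken from the paper.

The first function assumes the common hybrid NN/HMM recipe. In that recipe, the network's state posteriors P(s|x) are divided by state priors P(s) to obtain scaled likelihoods that an HMM decoder can use.

The second function is a toy recurrent LM step with a maximum-entropy part. It implements the RNNME idea as a direct connection from the input history to the output layer. Here that connection is simplified to bigram weights; as far as we understand, RNNME hashes longer n-gram histories instead.

All function names, matrix layouts and toy weights are hypothetical.

```python
import math

def scaled_log_likelihoods(posteriors, priors):
    """Turn NN state posteriors P(s|x) into scaled log-likelihoods.

    By Bayes' rule p(x|s) = P(s|x) p(x) / P(s). Since p(x) is the same for
    every state, log P(s|x) - log P(s) can stand in for log p(x|s) when
    decoding (assumed standard hybrid practice, not confirmed by the paper).
    """
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnnme_step(word, h_prev, U, W, V, D):
    """One step of a toy RNN language model with a maximum-entropy part.

    word   -- index of the current word w_t
    h_prev -- previous hidden state (list of floats)
    U      -- input weights, U[i][w]   (hidden x vocab)
    W      -- recurrent weights, W[i][j] (hidden x hidden)
    V      -- output weights, V[k][i]  (vocab x hidden)
    D      -- hypothetical direct bigram weights, D[w][k] (vocab x vocab),
              a simplified stand-in for hashed n-gram direct connections
    Returns (distribution over the next word, new hidden state).
    """
    n_hidden = len(h_prev)
    # Hidden layer: h_t = sigmoid(U[:, w_t] + W h_{t-1})
    h = [sigmoid(U[i][word] + sum(W[i][j] * h_prev[j] for j in range(n_hidden)))
         for i in range(n_hidden)]
    # Output layer: RNN scores V h_t plus the direct (maximum-entropy) scores
    scores = [sum(V[k][i] * h[i] for i in range(n_hidden)) + D[word][k]
              for k in range(len(V))]
    return softmax(scores), h

# Toy usage: 3 HMM states for the acoustic part; 2 hidden units, 3-word vocabulary.
acoustic = scaled_log_likelihoods([0.7, 0.2, 0.1], [0.5, 0.25, 0.25])
U = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
W = [[0.5, 0.0], [0.0, 0.5]]
V = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
D = [[0.0, 1.0, -1.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.2]]
probs, h = rnnme_step(0, [0.0, 0.0], U, W, V, D)
```

In a real decoder, the acoustic scores and the language-model log-probabilities would typically be combined with a language-model weight. The abstract does not give the values used in the paper.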




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Soutner, D., Zelinka, J., Müller, L. (2014). On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_39


  • DOI: https://doi.org/10.1007/978-3-319-11581-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
