International Conference on Speech and Computer

SPECOM 2015: Speech and Computer, pp. 42–50

A Comparison of RNN LM and FLM for Russian Speech Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9319)

Abstract

In this paper, we describe our research on recurrent neural network (RNN) language models (LMs) for N-best list rescoring in automatic continuous Russian speech recognition and compare them with a factored language model (FLM). We experimented with RNNs with different numbers of units in the hidden layer. For FLM creation, we used five linguistic factors: word, lemma, stem, part-of-speech, and morphological tag. All models were trained on a text corpus of 350M words. We also performed linear interpolation of the RNN LM and the FLM with the baseline 3-gram LM. We achieved a relative WER reduction of 8 % with the FLM and of 14 % with the RNN LM with respect to the baseline model.
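The rescoring scheme summarized above can be illustrated with a short sketch (not the authors' code): the probabilities of the baseline 3-gram LM and the RNN LM are linearly interpolated per word, and the resulting LM score is combined with the acoustic score to re-rank an N-best list. The functions p_ngram and p_rnn below are hypothetical placeholders for the two models; the interpolation weight and LM scale are illustrative values, not those used in the paper.

    # Minimal sketch of N-best rescoring with a linearly interpolated LM.
    # p_ngram(word, history) and p_rnn(word, history) are hypothetical
    # callables returning conditional word probabilities from a 3-gram LM
    # and an RNN LM, respectively.
    import math
    from typing import Callable, List, Sequence, Tuple

    ProbFn = Callable[[str, Sequence[str]], float]

    def interpolated_logprob(words: Sequence[str], p_ngram: ProbFn,
                             p_rnn: ProbFn, lam: float = 0.5) -> float:
        """Sentence log-probability under P = lam*P_rnn + (1-lam)*P_3gram."""
        total = 0.0
        for i, w in enumerate(words):
            history = words[:i]
            total += math.log(lam * p_rnn(w, history)
                              + (1.0 - lam) * p_ngram(w, history))
        return total

    def rescore_nbest(nbest: List[Tuple[List[str], float]], p_ngram: ProbFn,
                      p_rnn: ProbFn, lam: float = 0.5,
                      lm_weight: float = 10.0) -> List[str]:
        """Re-rank (hypothesis, acoustic log-score) pairs; return the best hypothesis."""
        scored = [(ac + lm_weight * interpolated_logprob(words, p_ngram, p_rnn, lam),
                   words)
                  for words, ac in nbest]
        return max(scored)[1]

In practice the interpolation weight would be tuned on held-out data; the value 0.5 here is only a placeholder.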

Keywords

Recurrent neural networks · Language models · Automatic speech recognition · Russian speech

Acknowledgments

This research is partially supported by the Council for Grants of the President of Russia (Projects No. MK-5209.2015.8 and MD-3035.2015.8), by the Russian Foundation for Basic Research (Projects No. 15-07-04415 and 15-07-04322), and by the Government of the Russian Federation (Grant No. 074-U01).


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. SPIIRAS, St. Petersburg, Russia
  2. SUAI, St. Petersburg, Russia
  3. University ITMO, St. Petersburg, Russia