
Mongolian Speech Recognition Based on Deep Neural Networks

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2015, NLP-NABD 2015)

Abstract

Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are needed. Speech recognition research has recently improved substantially with the introduction of Deep Neural Networks (DNNs). In this study, we build a DNN-based Mongolian LVCSR system. Experimental results show that, for Mongolian speech recognition, the DNN-based models outperform conventional models based on Gaussian Mixture Models (GMMs) by a large margin. Compared with the best GMM-based model, the DNN-based model achieves a relative improvement of over 50 %, making it the new state-of-the-art system in this field.
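The abstract reports its gain as a relative improvement but does not name the metric. For LVCSR the usual choice is word error rate (WER), so the sketch below assumes WER and shows how it, and the relative improvement of one system over a baseline, are typically computed. The function names and the toy transcripts are hypothetical illustrations, not the paper's code, data, or results.

```python
from typing import List, Sequence


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    (Levenshtein distance over words) turning `reference` into `hypothesis`."""
    # At the start of iteration i, prev[j] is the edit distance between
    # reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)  # i deletions against an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref_word
                          curr[j - 1] + 1,     # insertion of hyp_word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer(references: List[List[str]], hypotheses: List[List[str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference")
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r) for r in references)
    return errors / words


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up placeholder transcripts, not Mongolian data from the paper.
    refs = [["w1", "w2", "w3", "w4"], ["w5", "w6", "w7", "w8"]]
    baseline_hyps = [["w1", "x", "w3"], ["w5", "w6", "y", "w8"]]    # 3 errors
    improved_hyps = [["w1", "w2", "w3"], ["w5", "w6", "w7", "w8"]]  # 1 error

    base = wer(refs, baseline_hyps)  # 3 / 8 = 0.375
    new = wer(refs, improved_hyps)   # 1 / 8 = 0.125
    print(f"baseline WER {base:.3f}, new WER {new:.3f}, "
          f"relative improvement {relative_improvement(base, new):.1%}")
    # prints: baseline WER 0.375, new WER 0.125, relative improvement 66.7%
```

Under this definition, a relative improvement above 50 % means the DNN-based system makes fewer than half as many word errors as the GMM-based baseline.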



Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (No. 61263037), the Natural Science Foundation of Inner Mongolia (No. 2014BS0604), and the program of high-level talents of Inner Mongolia University.

Author information


Correspondence to Feilong Bao.


Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, H., Bao, F., Gao, G. (2015). Mongolian Speech Recognition Based on Deep Neural Networks. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_15


  • DOI: https://doi.org/10.1007/978-3-319-25816-4_15


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science, Computer Science (R0)
