Donggan Speech Recognition Based on Convolution Neural Networks

  • Conference paper
Data Science (ICPCSEE 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1058)

Abstract

The Donggan language, a special variant of Mandarin, is spoken by the Donggan people in Central Asia and comprises a Gansu dialect and a Shaanxi dialect. This paper proposes a convolutional neural network (CNN) based speech recognition method for the Donggan Shaanxi dialect. A text corpus and a pronunciation dictionary were designed for the Donggan Shaanxi dialect, and the corresponding speech corpus was recorded. The acoustic models of the Donggan Shaanxi dialect were then trained with a CNN. Experimental results demonstrate that the proposed CNN-based method achieves a lower word error rate than the monophone hidden Markov model (HMM) based method, the triphone HMM-based method, and the deep neural network (DNN) based method.
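The abstract compares systems by word error rate (WER) but does not say how it was scored. The sketch below shows the conventional definition, assuming whitespace-separated words and equal cost for substitutions, deletions, and insertions; the function name `word_error_rate` and the sample strings are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and all three edit
    operations cost 1. This is the usual convention, not necessarily
    the exact scoring setup used by the authors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up strings: one substitution and one deletion against four words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.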

Corresponding author

Correspondence to Hongwu Yang.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xu, H., You, Y., Yang, H. (2019). Donggan Speech Recognition Based on Convolution Neural Networks. In: Cheng, X., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2019. Communications in Computer and Information Science, vol 1058. Springer, Singapore. https://doi.org/10.1007/978-981-15-0118-0_44

  • DOI: https://doi.org/10.1007/978-981-15-0118-0_44

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0117-3

  • Online ISBN: 978-981-15-0118-0

  • eBook Packages: Computer Science, Computer Science (R0)
