Donggan Speech Recognition Based on Convolution Neural Networks

Xu, Haiyan; You, Yuren; Yang, Hongwu

doi:10.1007/978-981-15-0118-0_44

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1058)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

The Donggan language, a special variant of Mandarin, is spoken by the Donggan people in Central Asia and comprises a Gansu dialect and a Shaanxi dialect. This paper proposes a convolutional neural network (CNN) based speech recognition method for the Donggan Shaanxi dialect. A text corpus and a pronunciation dictionary were designed for the Donggan Shaanxi dialect, and the corresponding speech corpus was recorded. The acoustic models of the Donggan Shaanxi dialect were then trained with a CNN. Experimental results demonstrate that the proposed CNN-based method achieves a lower word error rate than the monophone hidden Markov model (HMM) based method, the triphone HMM-based method, and the DNN-based method.
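
The abstract compares systems by word error rate (WER). The following is a minimal sketch of that metric, assuming the conventional definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and whitespace tokenization are assumptions made for illustration; the scoring tool actually used in the paper is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein dynamic program).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not data from the paper.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.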

Author information

Authors and Affiliations

College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, 730070, China
Haiyan Xu, Yuren You & Hongwu Yang

Corresponding author

Correspondence to Hongwu Yang.

Editor information

Editors and Affiliations

Guilin University of Technology, Guilin, China
Xiaohui Cheng
Northeast Forestry University, Harbin, China
Weipeng Jing
Harbin University of Science and Technology, Harbin, China
Xianhua Song
National Academy of Guo Ding Institute of Data Science, Harbin, China
Zeguang Lu

About this paper

Cite this paper

Xu, H., You, Y., Yang, H. (2019). Donggan Speech Recognition Based on Convolution Neural Networks. In: Cheng, X., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2019. Communications in Computer and Information Science, vol 1058. Springer, Singapore. https://doi.org/10.1007/978-981-15-0118-0_44

DOI: https://doi.org/10.1007/978-981-15-0118-0_44
Published: 13 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0117-3
Online ISBN: 978-981-15-0118-0
eBook Packages: Computer Science, Computer Science (R0)
