Abstract
Speech recognition is a popular research direction in artificial intelligence. With the development of deep learning, speech recognition has gradually shifted from traditional methods to end-to-end recognition based on deep learning. Most current speech recognition models achieve high recognition accuracy for mainstream languages, but their structures are relatively complex and contain many parameters, making them unsuitable for recognizing isolated words in low-resource languages. Based on deep learning, we use a simple and effective end-to-end model to recognize isolated words in the Wa language, a minority language. The encoder combines a simplified deep convolutional neural network (VGG) with a BiLSTM: the VGG network extracts deep features from the audio signal, and the BiLSTM further encodes them. The decoder supports two decoding methods, CTC and Attention, which can be used individually or jointly. We conduct experiments with this model on our Wa isolated-word speech dataset, and the results show that it achieves good recognition performance: the WER is below 20% whether decoding is performed individually or jointly.
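The encoder described above can be sketched as follows. This is a minimal PyTorch illustration, not the paper's exact configuration: the number of conv blocks, channel counts, and hidden size are assumptions chosen only to show the shape flow from VGG-style downsampling into a BiLSTM.

```python
import torch
import torch.nn as nn

class VGGBiLSTMEncoder(nn.Module):
    """Simplified VGG front-end followed by a BiLSTM (illustrative sizes)."""
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        # Two VGG-style blocks; each MaxPool2d halves both time and frequency.
        self.vgg = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Frequency axis is reduced 4x, so the BiLSTM input is 64 * (n_mels // 4).
        self.blstm = nn.LSTM(64 * (n_mels // 4), hidden,
                             batch_first=True, bidirectional=True)

    def forward(self, x):          # x: (batch, time, n_mels) filterbank features
        x = x.unsqueeze(1)         # add channel dim: (batch, 1, time, n_mels)
        x = self.vgg(x)            # (batch, 64, time // 4, n_mels // 4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        out, _ = self.blstm(x)     # (batch, time // 4, 2 * hidden)
        return out

enc = VGGBiLSTMEncoder()
feats = torch.randn(2, 100, 80)    # batch of 2 utterances, 100 frames each
h = enc(feats)
print(h.shape)                     # (2, 25, 512): time reduced 4x, 2*hidden dims
```

The encoder output would then feed both decoding branches: a linear CTC projection over it, and an attention decoder that attends to it, with the two scores combined during joint decoding.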
Acknowledgements
This work is supported by Major Science and Technology Project of Yunnan Province (No. 202002AD080001), National New Liberal Arts Research and Reform Practice Project (No. 2021180030), Yunnan Innovation Team of Education Informatization for Nationalities, and Scientific Technology Innovation Team of Educational Big Data Application Technology in University of Yunnan Province.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, J., Gan, J., Chen, K., Wu, D., Pan, W. (2023). Recognition Method of Wa Language Isolated Words Based on Convolutional Neural Network. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol 13657. Springer, Cham. https://doi.org/10.1007/978-3-031-20102-8_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20101-1
Online ISBN: 978-3-031-20102-8