Deep Learning of Intelligent Speech Recognition in Power Dispatching

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 885))

Abstract

Building a speech acoustic model on Long Short-Term Memory (LSTM) networks further improves speech recognition. Moreover, the connectionist temporal classification (CTC) training method performs better by mapping the acoustic input directly to the phoneme or label sequence of the speech. This paper combines CTC and LSTM to build a speech recognition model for power dispatching and experimentally compares the LSTM-CTC method with traditional GMM-HMM methods, RNN-based speech recognition methods, and unidirectional LSTM networks. The results show that the LSTM-CTC framework achieves higher recognition accuracy than the other methods and also generalizes well, making it better suited to speech recognition in power dispatching.
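The core of CTC training mentioned in the abstract is the forward (alpha) recursion, which sums the probabilities of all frame-level alignments that collapse to a given label sequence. The sketch below is not from the paper; it is a minimal, illustrative pure-Python implementation of that recursion, assuming per-frame softmax outputs are already available as `probs[t][k]` and that index 0 is the blank symbol.

```python
def ctc_forward(probs, labels, blank=0):
    """Probability of `labels` given per-frame output distributions
    `probs` (probs[t][k]), via the CTC forward (alpha) recursion."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]          # interleave blanks: l' = (b, l1, b, l2, ..., b)
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s > 0:
                a += alpha[t - 1][s - 1]
            # the "skip" transition is allowed only between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # a valid alignment must end in the last label or the trailing blank
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
```

For example, with two frames, a vocabulary of {blank, "a"} and frame posteriors `[[0.4, 0.6], [0.3, 0.7]]`, the three alignments that collapse to "a" (blank-a, a-a, a-blank) sum to 0.88, which `ctc_forward(probs, [1])` returns. In an LSTM-CTC system such as the one the paper describes, `probs` would be produced by the LSTM's softmax layer, and the negative log of this quantity is the training loss.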



Acknowledgement

This paper is part research results of project ‘Natural language processing and machine learning technology in the application research of dispatching operation (SGHZ0000DKJS1700141)’, which is supported by the Foundation of Central Branch of National Power Net.

Author information

Corresponding author

Correspondence to Jianzhong Dou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dou, J. et al. (2019). Deep Learning of Intelligent Speech Recognition in Power Dispatching. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_13
