Deep Learning of Intelligent Speech Recognition in Power Dispatching

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 885))

Abstract

Building a speech acoustic model on Long Short-Term Memory (LSTM) networks further improves speech recognition. Moreover, the connectionist temporal classification (CTC) training method performs better by mapping the acoustic input directly to the phoneme or label sequence of the speech. This paper combines CTC and LSTM to build a speech recognition model for power dispatching and experimentally compares the LSTM-CTC method with traditional GMM-HMM methods, RNN-based speech recognition methods, and unidirectional LSTM networks. The results show that the LSTM-CTC framework achieves higher recognition accuracy than the other methods and also generalizes well, making it better suited to speech recognition in power dispatching.
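The core of CTC training mentioned in the abstract is the forward (alpha) recursion, which sums the probabilities of all frame-level alignments that collapse to a given label sequence. The sketch below is not from the paper; it is a minimal, illustrative pure-Python implementation of that recursion, assuming per-frame softmax outputs are already available as `probs[t][k]` and that index 0 is the blank symbol.

```python
def ctc_forward(probs, labels, blank=0):
    """Probability of `labels` given per-frame output distributions
    `probs` (probs[t][k]), via the CTC forward (alpha) recursion."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]          # interleave blanks: l' = (b, l1, b, l2, ..., b)
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s > 0:
                a += alpha[t - 1][s - 1]
            # the "skip" transition is allowed only between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # a valid alignment must end in the last label or the trailing blank
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
```

For example, with two frames, a vocabulary of {blank, "a"} and frame posteriors `[[0.4, 0.6], [0.3, 0.7]]`, the three alignments that collapse to "a" (blank-a, a-a, a-blank) sum to 0.88, which `ctc_forward(probs, [1])` returns. In an LSTM-CTC system such as the one the paper describes, `probs` would be produced by the LSTM's softmax layer, and the negative log of this quantity is the training loss.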



Acknowledgement

This paper is part research results of project ‘Natural language processing and machine learning technology in the application research of dispatching operation (SGHZ0000DKJS1700141)’, which is supported by the Foundation of Central Branch of National Power Net.

Author information

Corresponding author

Correspondence to Jianzhong Dou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dou, J. et al. (2019). Deep Learning of Intelligent Speech Recognition in Power Dispatching. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent, Interactive Systems and Applications. IISA 2018. Advances in Intelligent Systems and Computing, vol 885. Springer, Cham. https://doi.org/10.1007/978-3-030-02804-6_13
