
Application of intelligent speech analysis based on BiLSTM and CNN dual attention model in power dispatching

  • Original Article
  • Published in Nanotechnology for Environmental Engineering

Abstract

As the most natural carrier of language and emotion, speech is widely used in speech recognition applications such as smart home devices and vehicle navigation. With the continuous improvement of China's comprehensive national strength, the power industry has also entered a new stage of vigorous development. Since electricity underpins production and daily life, the adoption of speech processing technology in this sector is a general trend. To better meet the practical needs of power grid dispatching, this paper applies speech processing technology to the field of smart grid dispatching. After testing and evaluating the recognition rate of an existing speech recognition system, a speech emotion recognition technique based on a BiLSTM and CNN dual attention model is proposed, suited to the human–machine interaction system used in intelligent dispatching. First, the Mel-spectrogram sequence of the speech signal is extracted as the input to the BiLSTM network, which then extracts the temporal context features of the signal. On this basis, a CNN extracts high-level emotional features from these low-level features and completes the emotion classification of the speech signal. Emotion recognition tests were conducted on three emotion databases: eNTERFACE'05, RML and AFEW6.0. The experimental results show that the average recognition rates of the proposed technique on the three databases are 55.82%, 88.23% and 43.70%, respectively. In addition, the proposed technique is compared with traditional speech emotion recognition methods and with methods based on BiLSTM or CNN alone, which verifies its effectiveness.
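To make the described pipeline concrete, the following is a minimal sketch in PyTorch of a Mel-spectrogram → BiLSTM → CNN → attention → classifier chain. It is an illustration under stated assumptions, not the authors' published architecture: the class name BiLSTMCNNAttention, the layer sizes, the number of emotion classes, and the specific temporal/channel attention formulation are all illustrative choices.

```python
# Hypothetical sketch of the described pipeline: Mel-spectrogram frames ->
# BiLSTM temporal encoder -> CNN over the BiLSTM feature map ->
# dual (temporal + channel) attention pooling -> emotion classifier.
# Layer sizes and attention design are assumptions, not the paper's exact model.
import torch
import torch.nn as nn


class BiLSTMCNNAttention(nn.Module):
    def __init__(self, n_mels=64, lstm_hidden=128, n_classes=6):
        super().__init__()
        # BiLSTM extracts temporal context from the Mel-spectrogram sequence
        self.bilstm = nn.LSTM(input_size=n_mels, hidden_size=lstm_hidden,
                              num_layers=2, batch_first=True, bidirectional=True)
        # CNN extracts higher-level features from the BiLSTM output map
        self.conv = nn.Sequential(
            nn.Conv1d(2 * lstm_hidden, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Temporal attention: weight each frame before pooling
        self.time_attn = nn.Linear(128, 1)
        # Channel attention: re-weight feature channels (squeeze-and-excite style)
        self.chan_attn = nn.Sequential(nn.Linear(128, 32), nn.ReLU(),
                                       nn.Linear(32, 128), nn.Sigmoid())
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):
        # x: (batch, time, n_mels) Mel-spectrogram sequence
        h, _ = self.bilstm(x)                    # (batch, time, 2*lstm_hidden)
        h = self.conv(h.transpose(1, 2))         # (batch, 128, time)
        h = h.transpose(1, 2)                    # (batch, time, 128)
        a_t = torch.softmax(self.time_attn(h), dim=1)   # frame weights
        pooled = (a_t * h).sum(dim=1)                   # (batch, 128)
        pooled = pooled * self.chan_attn(pooled)        # channel re-weighting
        return self.classifier(pooled)                  # (batch, n_classes)


# Example: a batch of 8 utterances, 300 frames, 64 Mel bands, 6 emotion classes
model = BiLSTMCNNAttention()
logits = model(torch.randn(8, 300, 64))
print(logits.shape)  # torch.Size([8, 6])
```

In practice, the Mel-spectrogram input could be computed with a standard front end such as librosa.feature.melspectrogram; the frame length, hop size, and number of Mel bands used in the paper are not reproduced here.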

Author information

Corresponding author

Correspondence to Liu Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shibo, Z., Danke, H., Feifei, H. et al. Application of intelligent speech analysis based on BiLSTM and CNN dual attention model in power dispatching. Nanotechnol. Environ. Eng. 6, 59 (2021). https://doi.org/10.1007/s41204-021-00148-7
