
Application of intelligent speech analysis based on BiLSTM and CNN dual attention model in power dispatching

  • Original Article
  • Published in Nanotechnology for Environmental Engineering

Abstract

As the most natural carrier of language and emotion, speech is widely used in speech recognition applications such as smart home devices and vehicle navigation. With the continuous improvement of China's comprehensive national strength, the power industry has also entered a new stage of vigorous development. Since electricity underpins production and daily life, the adoption of speech processing technology in this sector is a general trend. To better meet the practical needs of power grid dispatching, this paper applies speech processing technology to the field of smart grid dispatching. After testing and evaluating the recognition rate of an existing speech recognition system, a speech emotion recognition technique based on a BiLSTM and CNN dual attention model is proposed, suited to the human–machine interaction system used in intelligent dispatching. First, the Mel-spectrogram sequence of the speech signal is extracted as the input to the BiLSTM network, which then extracts the temporal context features of the signal. On this basis, a CNN extracts high-level emotional features from these low-level features and completes the emotion classification of the speech signal. Emotion recognition tests were conducted on three emotion databases: eNTERFACE'05, RML and AFEW6.0. The experimental results show that the average recognition rates of the proposed technique on the three databases are 55.82%, 88.23% and 43.70%, respectively. In addition, the proposed technique is compared with traditional speech emotion recognition methods and with methods based on BiLSTM or CNN alone, which verifies its effectiveness.
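To make the described pipeline concrete, the following is a minimal sketch in PyTorch of a Mel-spectrogram → BiLSTM → CNN → attention → classifier chain. It is an illustration under stated assumptions, not the authors' published architecture: the class name BiLSTMCNNAttention, the layer sizes, the number of emotion classes, and the specific temporal/channel attention formulation are all illustrative choices.

```python
# Hypothetical sketch of the described pipeline: Mel-spectrogram frames ->
# BiLSTM temporal encoder -> CNN over the BiLSTM feature map ->
# dual (temporal + channel) attention pooling -> emotion classifier.
# Layer sizes and attention design are assumptions, not the paper's exact model.
import torch
import torch.nn as nn


class BiLSTMCNNAttention(nn.Module):
    def __init__(self, n_mels=64, lstm_hidden=128, n_classes=6):
        super().__init__()
        # BiLSTM extracts temporal context from the Mel-spectrogram sequence
        self.bilstm = nn.LSTM(input_size=n_mels, hidden_size=lstm_hidden,
                              num_layers=2, batch_first=True, bidirectional=True)
        # CNN extracts higher-level features from the BiLSTM output map
        self.conv = nn.Sequential(
            nn.Conv1d(2 * lstm_hidden, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Temporal attention: weight each frame before pooling
        self.time_attn = nn.Linear(128, 1)
        # Channel attention: re-weight feature channels (squeeze-and-excite style)
        self.chan_attn = nn.Sequential(nn.Linear(128, 32), nn.ReLU(),
                                       nn.Linear(32, 128), nn.Sigmoid())
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):
        # x: (batch, time, n_mels) Mel-spectrogram sequence
        h, _ = self.bilstm(x)                    # (batch, time, 2*lstm_hidden)
        h = self.conv(h.transpose(1, 2))         # (batch, 128, time)
        h = h.transpose(1, 2)                    # (batch, time, 128)
        a_t = torch.softmax(self.time_attn(h), dim=1)   # frame weights
        pooled = (a_t * h).sum(dim=1)                   # (batch, 128)
        pooled = pooled * self.chan_attn(pooled)        # channel re-weighting
        return self.classifier(pooled)                  # (batch, n_classes)


# Example: a batch of 8 utterances, 300 frames, 64 Mel bands, 6 emotion classes
model = BiLSTMCNNAttention()
logits = model(torch.randn(8, 300, 64))
print(logits.shape)  # torch.Size([8, 6])
```

In practice, the Mel-spectrogram input could be computed with a standard front end such as librosa.feature.melspectrogram; the frame length, hop size, and number of Mel bands used in the paper are not reproduced here.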

Author information

Corresponding author

Correspondence to Liu Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shibo, Z., Danke, H., Feifei, H. et al. Application of intelligent speech analysis based on BiLSTM and CNN dual attention model in power dispatching. Nanotechnol. Environ. Eng. 6, 59 (2021). https://doi.org/10.1007/s41204-021-00148-7
