Speech Emotion Recognition Model Based on CRNN-CTC

Zhu, Zijiang; Dai, Weihuang; Hu, Yi; Wang, Junhua; Li, Junshan

doi:10.1007/978-3-030-53980-1_113


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1244)

Included in the following conference series:

International Conference on Applications and Techniques in Cyber Security and Intelligence


Abstract

The CRNN (Convolutional Recurrent Neural Network) deep learning model is currently a typical speech emotion recognition technique. When this model is applied, a speech sequence of any length is mapped to a single emotion label. However, emotional information is generally distributed unevenly across the frames of a speech sample, which degrades the model's recognition performance. To address this problem, this paper proposes a speech emotion recognition model based on CRNN-CTC (Convolutional Recurrent Neural Network-Connectionist Temporal Classification). Building on the CRNN model, the speech samples are first divided into emotional frames and non-emotional frames, and the CTC method is then used to make the network focus its learning on the emotional frames, avoiding the performance loss caused by learning from non-emotional frames. Experimental results show that the model achieves a weighted average recall (WAR) of 70.11% and an unweighted average recall (UAR) of 69.53%. Compared with the CRNN model, speech emotion recognition performance is significantly improved.
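The abstract does not give implementation details. The following is a minimal pure-Python sketch of one common way to use CTC for utterance-level emotion labels, assuming (a) class index 0 of each frame's softmax output is the CTC blank, read here as a "non-emotional" frame, and (b) each utterance's target is the one-token sequence [emotion]. Under these assumptions every valid alignment has the form blank* emotion+ blank*, and the CTC likelihood is a three-state forward recursion. It also computes WAR (recall weighted by class frequency, i.e. overall accuracy) and UAR (unweighted mean of per-class recalls) as they are commonly defined. All function names and the `BLANK` convention are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

# Hypothetical convention (not from the paper): class index 0 is the CTC
# blank, read here as a "non-emotional" frame; indices 1..K-1 are emotions.
BLANK = 0


def ctc_single_label_prob(probs: Sequence[Sequence[float]], emotion: int) -> float:
    """Return P(target == [emotion] | frames) under CTC.

    probs[t][k] is the per-frame softmax output for class k at frame t.
    Valid alignments have the form blank* emotion+ blank*, so this is the
    CTC forward (alpha) recursion over the extended label [blank, emotion, blank].
    Probability space is used for readability; real implementations use log space.
    """
    if emotion == BLANK:
        raise ValueError("emotion index must differ from the blank index")
    a_lead = probs[0][BLANK]    # all frames so far are leading blanks
    a_emo = probs[0][emotion]   # currently inside the run of emotion frames
    a_tail = 0.0                # emotion run finished, now in trailing blanks
    for frame in probs[1:]:
        pb, pe = frame[BLANK], frame[emotion]
        a_lead, a_emo, a_tail = (
            a_lead * pb,            # stay in leading blanks
            (a_lead + a_emo) * pe,  # enter or continue the emotion run
            (a_emo + a_tail) * pb,  # leave the run or stay in trailing blanks
        )
    # A valid path must end inside the emotion run or in the trailing blanks.
    return a_emo + a_tail


def ctc_loss(probs: Sequence[Sequence[float]], emotion: int) -> float:
    """Negative log-likelihood of the single-token target [emotion]."""
    p = ctc_single_label_prob(probs, emotion)
    return math.inf if p == 0.0 else -math.log(p)


def predict_emotion(probs: Sequence[Sequence[float]], num_classes: int) -> int:
    """Pick the emotion whose one-token CTC target is most likely."""
    return max(range(1, num_classes), key=lambda e: ctc_single_label_prob(probs, e))


def war_uar(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """WAR = recall weighted by class frequency (overall accuracy);
    UAR = unweighted mean of the per-class recalls."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label sequences")
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    war = sum(hits.values()) / len(y_true)
    uar = sum(hits[c] / n for c, n in support.items()) / len(support)
    return war, uar


if __name__ == "__main__":
    frames = [[0.6, 0.4, 0.0], [0.6, 0.4, 0.0]]  # 2 frames; classes: blank, e1, e2
    print(ctc_single_label_prob(frames, 1))       # ≈ 0.64 = 0.4*0.4 + 0.4*0.6 + 0.6*0.4
    print(predict_emotion(frames, num_classes=3))  # 1
    print(war_uar([1, 1, 1, 2], [1, 1, 1, 1]))     # (0.75, 0.5)
```

Because the blank may absorb any number of frames before and after the emotion run, frames scored as blank affect the likelihood only through their blank probability; this is one way to read the abstract's claim that CTC lets the network concentrate on emotional frames. The last example also shows why UAR can fall below WAR: always predicting the majority class gives high overall accuracy but zero recall on the minority class.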



Acknowledgements

This work was supported by the Characteristic Innovation Project of Colleges and Universities of Guangdong Province (Natural Science, No. 2019KTSCX235).

Author information

Authors and Affiliations

School of Information Science and Technology, South China Business College of Guangdong University of Foreign Studies, Guangzhou, 510545, China
Zijiang Zhu, Yi Hu, Junhua Wang & Junshan Li
Human Resources Department, South China Business College of Guangdong University of Foreign Studies, Guangzhou, 510545, China
Weihuang Dai


Corresponding author

Correspondence to Weihuang Dai.

Editor information

Editors and Affiliations

Distributed System and Security Research Cluster, Faculty of Science, Engineering and Built Environment, Deakin University, Geelong, VIC, Australia
Jemal H. Abawajy
Department of Information Systems and Cyber Security, The University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo
Shanghai University of Medicine and Health Sciences, Shanghai, China
Zheng Xu
University of Oklahoma, Norman, OK, USA
Mohammed Atiquzzaman


About this paper

Cite this paper

Zhu, Z., Dai, W., Hu, Y., Wang, J., Li, J. (2021). Speech Emotion Recognition Model Based on CRNN-CTC. In: Abawajy, J., Choo, KK., Xu, Z., Atiquzzaman, M. (eds) 2020 International Conference on Applications and Techniques in Cyber Intelligence. ATCI 2020. Advances in Intelligent Systems and Computing, vol 1244. Springer, Cham. https://doi.org/10.1007/978-3-030-53980-1_113


DOI: https://doi.org/10.1007/978-3-030-53980-1_113
Published: 13 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53979-5
Online ISBN: 978-3-030-53980-1
eBook Packages: Intelligent Technologies and Robotics (R0)
