Abstract
The CRNN (Convolutional Recurrent Neural Network) is currently a typical deep learning model for speech emotion recognition. When this model is applied, each speech sequence, regardless of its length, is mapped to a single emotion label. However, the emotional information in a speech sample is generally distributed unevenly across frames, which degrades the model's recognition performance. To address this problem, this paper proposes a speech emotion recognition model based on CRNN-CTC (Convolutional Recurrent Neural Network-Connectionist Temporal Classification). Building on the CRNN model, the speech samples are first divided into emotional and non-emotional frames, and the CTC method is then used to make the network focus its learning on the emotional frames, avoiding the performance loss caused by learning from non-emotional frames. Experimental results show that the model achieves a weighted average recall (WAR) of 70.11% and an unweighted average recall (UAR) of 69.53%, a significant improvement in speech emotion recognition performance over the CRNN model.
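The abstract does not specify the loss formulation or the decoding step. The following pure-Python code is a minimal sketch that assumes the CTC blank symbol plays the role of the "non-emotional frame" label and that each utterance's target is a single emotion label. It shows the CTC forward (alpha) recursion, best-path decoding that discards blank frames, and the two reported metrics. The names `BLANK`, `ctc_loss`, `greedy_decode` and `war_uar` are hypothetical. This is an illustration of the standard CTC technique, not the authors' implementation. WAR is computed as overall accuracy (per-class recalls weighted by class frequency), and UAR is computed as the plain mean of per-class recalls, which is the usual convention in speech emotion recognition.

```python
import math
from collections import Counter

BLANK = 0  # assumption: class 0 is the CTC blank, read here as a "non-emotional" frame


def _logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs, target):
    """CTC negative log-likelihood computed with the forward (alpha) recursion.

    log_probs: T frames, each a list of per-class log-probabilities
               (BLANK plus emotion classes 1..C).
    target:    label sequence without blanks, e.g. [emotion] for one utterance.
    """
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]  # interleave blanks: b, l1, b, l2, ..., b
    S, T = len(ext), len(log_probs)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]  # a path may start on the leading blank...
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]  # ...or directly on the first label
    for t in range(1, T):
        prev = alpha
        alpha = [-math.inf] * S
        for s in range(S):
            terms = [prev[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(prev[s - 1])  # advance by one symbol
            # skipping a blank is only allowed between two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                terms.append(prev[s - 2])
            alpha[s] = _logsumexp(*terms) + log_probs[t][ext[s]]
    # valid paths end on the last label or on the trailing blank
    total = _logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def greedy_decode(log_probs):
    """Best-path decoding: per-frame argmax, merge repeats, drop blank frames."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    out, prev = [], None
    for k in best:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def war_uar(y_true, y_pred):
    """WAR = overall accuracy; UAR = unweighted mean of per-class recalls."""
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)
    war = sum(correct.values()) / len(y_true)
    uar = sum(correct[c] / n for c, n in support.items()) / len(support)
    return war, uar


if __name__ == "__main__":
    # 4 frames; classes: 0 = blank/non-emotional, 1 and 2 = hypothetical emotions
    probs = [[0.8, 0.1, 0.1], [0.2, 0.1, 0.7], [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]]
    log_probs = [[math.log(p) for p in frame] for frame in probs]
    print(ctc_loss(log_probs, [2]))
    print(greedy_decode(log_probs))  # [2]: blank frames are ignored
    print(war_uar(["angry"] * 3 + ["sad"], ["angry"] * 4))  # (0.75, 0.5)
```

In the usage example, the first and last frames are assigned to the blank class and contribute nothing to the decoded label. This mirrors, under the stated assumption, how CTC lets a model spend non-emotional frames on the blank symbol instead of forcing every frame to carry the utterance's emotion. The metric example also shows why WAR and UAR diverge on imbalanced data: always predicting the majority class gives 75% WAR but only 50% UAR.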
Acknowledgements
This work was supported by the Characteristic Innovation Project of Colleges and Universities of Guangdong Province (Natural Science, No. 2019KTSCX235).
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, Z., Dai, W., Hu, Y., Wang, J., Li, J. (2021). Speech Emotion Recognition Model Based on CRNN-CTC. In: Abawajy, J., Choo, KK., Xu, Z., Atiquzzaman, M. (eds) 2020 International Conference on Applications and Techniques in Cyber Intelligence. ATCI 2020. Advances in Intelligent Systems and Computing, vol 1244. Springer, Cham. https://doi.org/10.1007/978-3-030-53980-1_113
DOI: https://doi.org/10.1007/978-3-030-53980-1_113
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53979-5
Online ISBN: 978-3-030-53980-1
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)