
Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm

Published in: International Journal of Speech Technology

Abstract

Human interactions carry emotional cues that reveal the emotion expressed by a speaker. Because vocal expressions of emotion vary from one speaker to another, misinterpretation is possible; a speech emotion recognizer can be used to determine the emotion the speaker expresses. Speech conveys the emotional state of humans alongside the syntax and semantic content of linguistic sentences, so recognizing human emotion from speech signals is feasible. Speech emotion recognition is a crucial and challenging task in which feature extraction plays a prominent role in overall performance. Determining emotional states from speech signals is difficult for several reasons. The first issue for any speech emotion system is selecting the best features, that is, features powerful enough to distinguish the various emotions. Differences in language, pronunciation, sentence structure, speaking style, and speaker add further difficulty, since these characteristics affect pitch and energy and thereby directly alter most of the extracted features. Redundant features and high computational cost further hinder emotion recognition. Rather than the words themselves, the vocal changes and the communicative stress placed on the words should be the primary consideration. To address these issues, an Enhanced Cat Swarm Optimization (ECSO) algorithm is proposed for feature selection, which has not been used in any existing speech emotion recognition approach. The proposed approach achieves excellent performance in terms of accuracy, recognition rate, sensitivity, and specificity.
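The full article is behind a paywall, but the general idea of cat-swarm-based feature selection can be illustrated. The sketch below is a minimal, hypothetical binary cat swarm optimizer: cats (candidate feature masks) alternate between a seeking mode (local exploration by flipping bits in candidate copies) and a tracing mode (moving toward the global best). The relevance-score fitness function, all parameter values, and the simplifications here are assumptions for illustration only, not the paper's actual ECSO algorithm.

```python
import random

def fitness(mask, relevance, alpha=0.3):
    # Reward the total relevance of selected features; penalize subset size
    # so the optimizer prefers compact feature subsets.
    score = sum(r for m, r in zip(mask, relevance) if m)
    return score - alpha * sum(mask)

def cat_swarm_select(relevance, n_cats=20, iters=50, mixture_ratio=0.2,
                     smp=5, cdc=0.2, seed=0):
    """Binary cat swarm optimization for feature selection (sketch).

    mixture_ratio: fraction of cats placed in tracing mode each step.
    smp: seeking memory pool (number of candidate copies per seeking cat).
    cdc: counts of dimensions to change (per-bit flip probability).
    """
    rng = random.Random(seed)
    n = len(relevance)
    cats = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_cats)]
    best = max(cats, key=lambda c: fitness(c, relevance))[:]
    for _ in range(iters):
        for i, cat in enumerate(cats):
            if rng.random() > mixture_ratio:          # seeking mode
                candidates = []
                for _ in range(smp):
                    copy = cat[:]
                    for j in range(n):
                        if rng.random() < cdc:
                            copy[j] ^= 1              # flip one feature bit
                    candidates.append(copy)
                cats[i] = max(candidates, key=lambda c: fitness(c, relevance))
            else:                                     # tracing mode
                # Move toward the global best by copying some of its bits.
                cats[i] = [b if rng.random() < 0.5 else g
                           for b, g in zip(cat, best)]
            if fitness(cats[i], relevance) > fitness(best, relevance):
                best = cats[i][:]
    return best
```

With per-feature relevance scores `[0.9, 0.1, 0.8, 0.05, 0.7]` and the size penalty above, the optimizer converges on the mask selecting only the three high-relevance features. A real system would replace the toy fitness with a classifier-based score (e.g., cross-validated recognition accuracy on the selected features).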


Figures 1–6 are available in the full article.


Author information


Corresponding author

Correspondence to M. Gomathy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gomathy, M. Optimal feature selection for speech emotion recognition using enhanced cat swarm optimization algorithm. Int J Speech Technol 24, 155–163 (2021). https://doi.org/10.1007/s10772-020-09776-x
