
Four-stage feature selection to recognize emotion from speech signals

Published in: International Journal of Speech Technology

Abstract

Feature selection plays an important role in emotion recognition from speech signals because it improves classification accuracy by choosing the best uncorrelated features. In the wrapper method of feature selection, candidate features are evaluated by a classifier. A high-dimensional feature vector increases the computational complexity of the classifier and also hampers the training of classifiers that require inversion of the covariance matrix. We propose a four-stage feature selection method that avoids the curse of dimensionality through the principle of divide and conquer. In the proposed method, the dimension of the feature vector is reduced at every stage, so that even classifiers whose training is hindered by high feature dimensionality can be used to evaluate the features. Experimental results show that the four-stage feature selection method improves classification accuracy. Another way to improve classification accuracy is to combine several classifiers with a fusion technique. The class-specific multiple classifiers scheme is one such method; it improves classification accuracy by selecting the best-performing feature set and classifier for each emotional class. In this work, we improve the performance of the class-specific multiple classifiers scheme by embedding the proposed feature selection method in its structure.
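The staged, divide-and-conquer idea can be illustrated with a short sketch. This is a minimal illustration only, assuming scikit-learn, a greedy forward wrapper search inside each feature group, linear discriminant analysis as the evaluating classifier, and arbitrary values for group_size and n_stages; it is not the authors' exact four-stage procedure.

```python
# Illustrative sketch of staged (divide-and-conquer) wrapper feature selection.
# Group sizes, number of stages, and the greedy search are assumptions for the
# example, not the published algorithm.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score


def greedy_wrapper(X, y, candidate_idx, clf, cv=5):
    """Forward-select features from candidate_idx by cross-validated accuracy."""
    selected, best_score = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in candidate_idx:
            if f in selected:
                continue
            trial = selected + [f]
            score = cross_val_score(clf, X[:, trial], y, cv=cv).mean()
            if score > best_score:
                best_score, best_f, improved = score, f, True
        if improved:
            selected.append(best_f)
    return selected


def staged_selection(X, y, clf, group_size=20, n_stages=4):
    """Split the feature set into small groups, run the wrapper inside each
    group, pool the survivors, and repeat for a fixed number of stages, so the
    classifier never has to be trained on the full high-dimensional vector."""
    survivors = np.arange(X.shape[1])
    for _ in range(n_stages):
        groups = np.array_split(survivors, max(1, len(survivors) // group_size))
        pooled = []
        for g in groups:
            pooled += greedy_wrapper(X, y, list(g), clf)
        survivors = np.array(pooled)
        if len(survivors) <= group_size:  # small enough for one final pass
            break
    return greedy_wrapper(X, y, list(survivors), clf)
```

With, for example, clf = LinearDiscriminantAnalysis() and an utterance-level feature matrix X of shape (n_utterances, n_features), staged_selection(X, y, clf) returns the indices of the retained features. In a class-specific scheme of the kind described above, such a staged search would plausibly be run once per emotional class (with one-vs-rest labels) and the resulting per-class classifiers combined with a fusion rule, though the exact fusion step is not shown here.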



Author information

Correspondence to A. Milton.

Cite this article

Milton, A., Selvi, S.T. Four-stage feature selection to recognize emotion from speech signals. Int J Speech Technol 18, 505–520 (2015). https://doi.org/10.1007/s10772-015-9294-4
