Four-stage feature selection to recognize emotion from speech signals

Milton, A.; Selvi, S. Tamil

doi:10.1007/s10772-015-9294-4

Published: 29 July 2015

Volume 18, pages 505–520 (2015)

International Journal of Speech Technology

A. Milton¹ & S. Tamil Selvi²

Abstract

Feature selection plays an important role in emotion recognition from speech signals because it improves classification accuracy by choosing the best uncorrelated features. In the wrapper method of feature selection, the features are evaluated by a classifier. A high-dimensional feature vector increases the computational complexity of the classifier and also hampers the training of classifiers that require the inverse of a covariance matrix. We propose a four-stage feature selection method that avoids the curse of dimensionality through the principle of divide and conquer. In the proposed method, the feature vector is kept short at every stage, so that classifiers whose training is affected by a large feature dimension can also be used to evaluate the features. Experimental results show that the four-stage feature selection method improves classification accuracy. Another way to improve classification accuracy is to combine several classifiers with a fusion technique. The class-specific multiple classifiers scheme is one such method: it improves classification accuracy by pairing the best-performing feature set and classifier for each emotional class. In this work, we improve the performance of the class-specific multiple classifiers scheme by embedding the proposed feature selection method in its structure.
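The abstract does not say what the four stages are, so the following is only a minimal sketch of the divide-and-conquer idea behind staged wrapper selection, not the authors' method. It assumes two stages instead of four, greedy forward selection as the search, a nearest-centroid classifier scored by leave-one-out accuracy as the wrapper, and synthetic data in which only column 0 carries class information. The names `loo_accuracy`, `forward_select` and `staged_select` are hypothetical.

```python
import random


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`."""
    correct = 0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # squared distance from the held-out sample to a class centroid
            return sum((x_i[c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        correct += min(sums, key=sq_dist) == y_i
    return correct / len(X)


def forward_select(X, y, candidates):
    """Greedy wrapper search: add the best column until accuracy stops improving."""
    selected, best = [], -1.0
    remaining = list(candidates)
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [c]), c) for c in remaining]
        top_score, top_col = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_col)
        remaining.remove(top_col)
        best = top_score
    return selected, best


def staged_select(X, y, groups):
    """Stage 1: select inside each small group. Stage 2: select among the survivors."""
    survivors = []
    for group in groups:
        chosen, _ = forward_select(X, y, group)
        survivors.extend(chosen)
    return forward_select(X, y, survivors)


# Synthetic data (an assumption for illustration): 40 samples, 12 features,
# column 0 separates the two classes, the other columns are uniform noise.
rng = random.Random(0)
y = [0] * 20 + [1] * 20
X = [[10.0 * label + rng.uniform(-1, 1)] + [rng.uniform(-1, 1) for _ in range(11)]
     for label in y]
groups = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]

selected, accuracy = staged_select(X, y, groups)
print(selected, accuracy)
```

In this sketch the first stage never shows the classifier more than four features at a time, and the second stage sees only the survivors of each group. That is the sense in which a staged scheme can keep the dimension small enough for classifiers that are sensitive to it.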

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, St. Xavier’s Catholic College of Engineering, Chunkankadai, Tamil Nadu, 629003, India
A. Milton
Department of Electronics and Communication Engineering, National Engineering College, Kovilpatti, Tamil Nadu, 628503, India
S. Tamil Selvi

Corresponding author

Correspondence to A. Milton.

About this article

Cite this article

Milton, A., Selvi, S.T. Four-stage feature selection to recognize emotion from speech signals. Int J Speech Technol 18, 505–520 (2015). https://doi.org/10.1007/s10772-015-9294-4

Received: 13 April 2015
Accepted: 23 July 2015
Published: 29 July 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10772-015-9294-4
