Abstract
The recognition of a speaker's emotional state is a multidisciplinary research area that has received great interest in recent years. One of its most important goals is to improve voice-based human-machine interaction. Recent work in this domain uses prosodic features and the spectral characteristics of the speech signal with standard classification methods. Moreover, the performance improvements obtained with these traditional methods have reached a limit. In this paper, the spectral characteristics of emotional signals are used to group emotions. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models and Multilayer Perceptrons are tested. These classifiers are evaluated in different configurations with different features in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method improves performance by 6.35% over the standard classifiers.
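The general idea of a hierarchical classifier can be illustrated with a minimal sketch: a first stage assigns an utterance to a group of emotions, and a second stage picks one emotion inside that group. Everything specific here is an assumption made for illustration and is not taken from the chapter: the group names, the emotion labels, the synthetic two-dimensional features, and the single diagonal Gaussian per class, which stands in for the GMM, HMM and MLP classifiers named in the abstract.

```python
import numpy as np


class DiagonalGaussianClassifier:
    """Maximum-likelihood classifier with one diagonal Gaussian per class.

    Simplified stand-in (an assumption) for the GMM/HMM/MLP classifiers.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small floor keeps every variance strictly positive.
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Log-likelihood of each sample under each class model: shape (n, k).
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * np.sum(
            diff ** 2 / self.vars_ + np.log(2 * np.pi * self.vars_), axis=2
        )
        return self.classes_[np.argmax(log_lik, axis=1)]


class HierarchicalClassifier:
    """Two-stage classifier: predict the group first, then the emotion."""

    def __init__(self, groups):
        # groups: dict mapping a group name to its emotion labels (hypothetical).
        self.groups = groups
        self.label_to_group = {e: g for g, es in groups.items() for e in es}

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        group_labels = np.array([self.label_to_group[e] for e in y])
        # Stage 1 separates groups; stage 2 has one classifier per group.
        self.stage1_ = DiagonalGaussianClassifier().fit(X, group_labels)
        self.stage2_ = {}
        for g in self.groups:
            mask = group_labels == g
            self.stage2_[g] = DiagonalGaussianClassifier().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        pred_groups = self.stage1_.predict(X)
        out = np.empty(len(X), dtype=object)
        for g in self.groups:
            mask = pred_groups == g
            if mask.any():
                out[mask] = self.stage2_[g].predict(X[mask])
        return out


# Toy data: four synthetic "emotions" in a 2-D feature space (illustrative only).
rng = np.random.default_rng(0)
groups = {"group_a": ["anger", "joy"], "group_b": ["sadness", "boredom"]}
centers = {"anger": [4, 4], "joy": [4, 2], "sadness": [-4, 2], "boredom": [-4, 4]}
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 30)

model = HierarchicalClassifier(groups).fit(X, y)
pred = model.predict(X)
print("training accuracy:", np.mean(pred == y))
```

In a sketch like this, each stage could use a different feature set or a different classifier type, which is one way to read the "multiple feature" aspect of the abstract; the chapter's actual configuration is not specified here.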
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Albornoz, E.M., Milone, D.H., Rufiner, H.L. (2010). Multiple Feature Extraction and Hierarchical Classifiers for Emotions Recognition. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_20
DOI: https://doi.org/10.1007/978-3-642-12397-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9
eBook Packages: Computer Science (R0)