Abstract
This work investigates the performance of an automatic emotion recognizer that uses biologically motivated features: perceived loudness features proposed by Zwicker, robust RASTA-PLP features, and novel long-term modulation spectrum-based features. Single classifiers, each using only one feature type, and multi-classifier systems using all three types are examined with two classifier fusion techniques. All experiments use the standard Berlin Database of Emotional Speech, which comprises recordings of seven different emotions, to evaluate the proposed multi-classifier system. Performance is compared with earlier work and with human recognition performance. The results show that simple fusion techniques can improve performance significantly, outperforming the classifiers used in earlier work. The generalization ability of the proposed system is further investigated in a leave-one-speaker-out experiment, which reveals a strong ability to recognize emotions expressed by unknown speakers. Moreover, similarities were found between earlier speech analysis and the automatic emotion recognition results.
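As an illustration only, the following minimal sketch shows what decision-level fusion of three single-feature classifiers and a leave-one-speaker-out split might look like. The abstract does not name the two fusion techniques, so the mean and product rules used here are assumptions, as are the function names, the label order, and the example score vectors.

```python
import numpy as np

# Seven emotion classes of the Berlin database; the label order is an assumption.
EMOTIONS = ["anger", "boredom", "disgust", "fear", "happiness", "sadness", "neutral"]


def fuse_mean(scores):
    """Average the per-classifier class scores (mean rule, assumed here)."""
    return np.mean(scores, axis=0)


def fuse_product(scores, eps=1e-12):
    """Multiply the per-classifier class scores and renormalise (product rule, assumed here)."""
    prod = np.prod(np.asarray(scores) + eps, axis=0)
    return prod / prod.sum()


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) once per speaker."""
    speaker_ids = np.asarray(speaker_ids)
    for spk in np.unique(speaker_ids):
        test = np.flatnonzero(speaker_ids == spk)
        train = np.flatnonzero(speaker_ids != spk)
        yield spk, train, test


# Hypothetical class scores of three single-feature classifiers for one utterance.
loudness = np.array([0.40, 0.05, 0.05, 0.10, 0.30, 0.05, 0.05])
rasta_plp = np.array([0.30, 0.05, 0.05, 0.05, 0.45, 0.05, 0.05])
modulation = np.array([0.50, 0.05, 0.05, 0.05, 0.25, 0.05, 0.05])
scores = np.stack([loudness, rasta_plp, modulation])

# Fused decision: the class with the highest combined score.
mean_label = EMOTIONS[int(np.argmax(fuse_mean(scores)))]
product_label = EMOTIONS[int(np.argmax(fuse_product(scores)))]
```

In this made-up example the RASTA-PLP classifier alone would pick happiness, while both fusion rules pick anger, because the other two classifiers agree on it. Holding out every utterance of one speaker at a time is what makes the evaluation speaker-independent.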
References
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., and Weiss, B. (2005). A Database of German Emotional Speech. In Proceedings of Interspeech.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., and Taylor, J. (2001). Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine, 18:32–80.
Dellaert, F., Polzin, T., and Waibel, A. (1996). Recognizing Emotion in Speech. In Proceedings of International Conference on Spoken Language Processing (ICSLP 1996).
Devillers, L., Vidrascu, L., and Lamel, L. (2005). Challenges in Real-Life Emotion Annotation and Machine Learning based Detection. Neural Networks, 18:407–422.
Fragopanagos, N. and Taylor, J. (2005). Emotion Recognition in Human-Computer Interaction. Neural Networks, 18:389–405.
Hermansky, H. (1996). Auditory Modeling in Automatic Recognition of Speech. In Proceedings of Keele Workshop.
Hermansky, H. and Morgan, N. (1994). Rasta Processing of Speech. IEEE Transactions on Speech and Audio Processing, Special Issue on Robust Speech Recognition, 2(4):578–589.
Hermansky, H., Morgan, N., Bayya, A., and Kohn, P. (1991). Rasta-PLP Speech Analysis. Technical report, ICSI Technical Report TR-91-069.
Ho, T. K. (2002). Multiple Classifier Combination: Lessons and Next Steps. Series in Machine Perception and Artificial Intelligence, 47:171–198.
Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). On the Relative Importance of Various Components of the Modulation Spectrum for Automatic Speech Recognition. Speech Communication, 28:43–55.
Kohonen, T. (2001). Self Organizing Maps. Springer, New York.
Krishna, H. K., Scherer, S., and Palm, G. (2007). A Novel Feature for Emotion Recognition in Voice Based Applications. In Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction (ACII2007), pages 710–711.
Kuncheva, L. (2004). Combining Pattern Classifiers: Methods and Algorithms. Wiley, New York.
Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., and Narayanan, S. S. (2004). Emotion Recognition Based on Phoneme Classes. In Proceedings of International Conference on Spoken Language Processing (ICSLP 2004).
Murray, I. R. and Arnott, J. L. (1993). Toward the Simulation of Emotion in Synthetic Speech: A Review of the Literature on Human Vocal Emotion. Journal of the Acoustical Society of America, 93(2):1097–1108.
Nicholson, J., Takahashi, K., and Nakatsu, R. (2000). Emotion Recognition in Speech Using Neural Networks. Neural Computing and Applications, 9(4):290–296.
Oudeyer, P.-Y. (2003). The Production and Recognition of Emotions in Speech: Features and Algorithms. International Journal of Human-Computer Studies, 59(1–2):157–183.
Petrushin, V. (1999). Emotion in Speech: Recognition and Application to Call Centers. In Proceedings of Artificial Neural Networks in Engineering.
Rabiner, L. R. and Schafer, R. W. (1978). Digital Processing of Speech Signals. Prentice-Hall Signal Processing Series. Prentice-Hall, Upper Saddle River.
Scherer, K. R., Johnstone, T., and Klasmeyer, G. (2003). Handbook of Affective Sciences – Vocal Expression of Emotion, Chapter 23, pages 433–456.
Scherer, S., Schwenker, F., and Palm, G. (2008). Emotion Recognition from Speech Using Multi-Classifier Systems and RBF-Ensembles, Chapter 3, pages 49–70.
Whissell, C. (1989). The Dictionary of Affect in Language, Volume 4 of Emotion: Theory, Research and Experience. Academic Press, New York.
Yacoub, S., Simske, S., Lin, X., and Burns, J. (2003). Recognition of Emotions in Interactive Voice Response Systems. In Proceedings of the European Conference on Speech Communication and Technology (Eurospeech).
Zwicker, E., Fastl, H., Widmann, U., Kurakata, K., Kuwano, S., and Namba, S. (1991). Program for Calculating Loudness According to DIN 45631 (ISO 532B). Journal of the Acoustical Society of Japan, 12(1):39–42.
Copyright information
© 2009 Springer-Verlag US
About this chapter
Cite this chapter
Scherer, S., Schwenker, F., Palm, G. (2009). Classifier Fusion for Emotion Recognition from Speech. In: Kameas, A., Callaghan, V., Hagras, H., Weber, M., Minker, W. (eds) Advanced Intelligent Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76485-6_5
DOI: https://doi.org/10.1007/978-0-387-76485-6_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76484-9
Online ISBN: 978-0-387-76485-6
eBook Packages: Engineering, Engineering (R0)