Classifier Fusion for Emotion Recognition from Speech

Scherer, Stefan; Schwenker, Friedhelm; Palm, Günther

doi:10.1007/978-0-387-76485-6_5

Abstract

This work investigates the performance of an automatic emotion recognizer that uses biologically motivated features: perceived loudness features proposed by Zwicker, robust RASTA-PLP features, and novel long-term modulation spectrum-based features. Single classifiers using only one feature type and multi-classifier systems using all three types are examined with two classifier fusion techniques. All experiments use the standard Berlin Database of Emotional Speech, which comprises recordings of seven different emotions, to evaluate the proposed multi-classifier system. The performance is compared with earlier work as well as with human recognition performance. The results show that simple fusion techniques can improve performance significantly, outperforming other classifiers used in earlier work. The generalization ability of the proposed system is further investigated in a leave-one-speaker-out experiment, which reveals a strong ability to recognize emotions expressed by unknown speakers. Moreover, similarities were found between earlier speech analysis and the automatic emotion recognition results.
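The abstract does not name the two fusion techniques, so the following is only a minimal sketch of what simple classifier fusion and a leave-one-speaker-out split might look like. It assumes each single-feature classifier emits one score per emotion class, and it uses the mean rule and the product rule as illustrative stand-ins, not as the chapter's actual method. All function names and the example scores are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def fuse_mean(scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class scores of several classifiers (mean rule)."""
    n = len(scores)
    return [sum(column) / n for column in zip(*scores)]


def fuse_product(scores: Sequence[Sequence[float]]) -> List[float]:
    """Multiply the per-class scores of several classifiers (product rule),
    then renormalize so the fused scores sum to one."""
    fused = []
    for column in zip(*scores):
        product = 1.0
        for value in column:
            product *= value
        fused.append(product)
    total = sum(fused)
    return [p / total for p in fused] if total > 0 else fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


def leave_one_speaker_out(
    speakers: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_speaker, train_indices, test_indices), holding out
    every utterance of one speaker per fold."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


# Hypothetical scores for one utterance over three classes, one row per
# feature type (loudness, RASTA-PLP, modulation spectrum). Made-up numbers.
example_scores = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
]
mean_label = predict(fuse_mean(example_scores))
product_label = predict(fuse_product(example_scores))

# Hypothetical speaker label for each of four utterances.
folds = list(leave_one_speaker_out(["a", "a", "b", "c"]))
```

Splitting by speaker rather than by utterance keeps every recording of the held-out speaker out of training, which is the condition needed to test recognition of emotions from unknown speakers.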

Author information

Authors and Affiliations

Institute of Neural Information Processing, Ulm University, Ulm, Germany
Stefan Scherer, Friedhelm Schwenker & Günther Palm

Corresponding author

Correspondence to Stefan Scherer.

About this chapter

Cite this chapter

Scherer, S., Schwenker, F., Palm, G. (2009). Classifier Fusion for Emotion Recognition from Speech. In: Kameas, A., Callagan, V., Hagras, H., Weber, M., Minker, W. (eds) Advanced Intelligent Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76485-6_5

DOI: https://doi.org/10.1007/978-0-387-76485-6_5
Published: 31 March 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76484-9
Online ISBN: 978-0-387-76485-6
eBook Packages: Engineering (R0)