Classifier Fusion for Emotion Recognition from Speech

  • Stefan Scherer
  • Friedhelm Schwenker
  • Günther Palm


This work investigates the performance of an automatic emotion recognizer that uses biologically motivated features: perceived loudness features proposed by Zwicker, robust RASTA-PLP features, and novel long-term modulation spectrum-based features. Single classifiers that use only one feature type and multi-classifier systems that use all three are examined with two classifier fusion techniques. All experiments use the standard Berlin Database of Emotional Speech, which comprises recordings of seven different emotions, to evaluate the proposed multi-classifier system. The performance is compared with earlier work and with human recognition performance. The results show that simple fusion techniques can improve performance significantly, outperforming other classifiers used in earlier work. The generalization ability of the proposed system is further investigated in a leave-one-speaker-out experiment, which reveals a strong ability to recognize emotions expressed by unknown speakers. Moreover, similarities between earlier speech analysis and the automatic emotion recognition results were found.
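The abstract does not name the two fusion techniques, so the following is only a minimal sketch of what simple decision-level fusion and a leave-one-speaker-out split could look like. Averaging of class probabilities and majority voting are assumed stand-ins, not necessarily the methods used in the chapter; all function names and the toy numbers are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def fuse_average(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors of several classifiers.

    Assumed fusion rule for illustration; each inner sequence holds one
    classifier's probabilities over the same ordered set of classes.
    """
    n_classifiers = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / n_classifiers
            for c in range(n_classes)]


def fuse_majority(votes: Sequence[int], n_classes: int) -> int:
    """Majority vote over predicted class indices (ties go to the lowest index)."""
    counts = [0] * n_classes
    for v in votes:
        counts[v] += 1
    return counts.index(max(counts))


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def leave_one_speaker_out(
    speakers: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_speaker, train_indices, test_indices) for each speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


# Toy example: three single-feature classifiers (loudness, RASTA-PLP,
# modulation spectrum) scoring one utterance over three emotion classes.
loudness = [0.5, 0.3, 0.2]
rasta_plp = [0.2, 0.5, 0.3]
mod_spectrum = [0.1, 0.6, 0.3]

fused = fuse_average([loudness, rasta_plp, mod_spectrum])
fused_label = argmax(fused)
voted_label = fuse_majority(
    [argmax(loudness), argmax(rasta_plp), argmax(mod_spectrum)], n_classes=3
)
```

In this toy case the loudness classifier alone would pick class 0, while both fusion rules pick class 1. In a leave-one-speaker-out run, every classifier would be trained on the `train` indices of a split and the fused decision scored on the `test` indices, so the held-out speaker is never seen during training.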


Modulation spectrum features · RASTA-PLP · Zwicker loudness




  1. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., and Weiss, B. (2005). A Database of German Emotional Speech. In Proceedings of Interspeech.
  2. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., and Taylor, J. (2001). Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine, 18:32–80.
  3. Dellaert, F., Polzin, T., and Waibel, A. (1996). Recognizing Emotion in Speech. In Proceedings of International Conference on Spoken Language Processing (ICSLP 1996).
  4. Devillers, L., Vidrascu, L., and Lamel, L. (2005). Challenges in Real-Life Emotion Annotation and Machine Learning based Detection. Neural Networks, 18:407–422.
  5. Fragopanagos, N. and Taylor, J. (2005). Emotion Recognition in Human-Computer Interaction. Neural Networks, 18:389–405.
  6. Hermansky, H. (1996). Auditory Modeling in Automatic Recognition of Speech. In Proceedings of Keele Workshop.
  7. Hermansky, H. and Morgan, N. (1994). RASTA Processing of Speech. IEEE Transactions on Speech and Audio Processing, Special Issue on Robust Speech Recognition, 2(4):578–589.
  8. Hermansky, H., Morgan, N., Bayya, A., and Kohn, P. (1991). RASTA-PLP Speech Analysis. Technical report, ICSI Technical Report TR-91-069.
  9. Ho, T. K. (2002). Multiple Classifier Combination: Lessons and Next Steps. Series in Machine Perception and Artificial Intelligence, 47:171–198.
  10. Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). On the Relative Importance of Various Components of the Modulation Spectrum for Automatic Speech Recognition. Speech Communication, 28:43–55.
  11. Kohonen, T. (2001). Self Organizing Maps. Springer, New York.
  12. Krishna, H. K., Scherer, S., and Palm, G. (2007). A Novel Feature for Emotion Recognition in Voice Based Applications. In Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction (ACII 2007), pages 710–711.
  13. Kuncheva, L. (2004). Combining Pattern Classifiers: Methods and Algorithms. Wiley, New York.
  14. Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., and Narayanan, S. S. (2004). Emotion Recognition Based on Phoneme Classes. In Proceedings of International Conference on Spoken Language Processing (ICSLP 2004).
  15. Murray, I. R. and Arnott, J. L. (1993). Toward the Simulation of Emotion in Synthetic Speech: A Review of the Literature on Human Vocal Emotion. Journal of the Acoustical Society of America, 93(2):1097–1108.
  16. Nicholson, J., Takahashi, K., and Nakatsu, R. (2000). Emotion Recognition in Speech Using Neural Networks. Neural Computing and Applications, 9(4):290–296.
  17. Oudeyer, P.-Y. (2003). The Production and Recognition of Emotions in Speech: Features and Algorithms. International Journal of Human-Computer Studies, 59(1–2):157–183.
  18. Petrushin, V. (1999). Emotion in Speech: Recognition and Application to Call Centers. In Proceedings of Artificial Neural Networks in Engineering.
  19. Rabiner, L. R. and Schafer, R. W. (1978). Digital Processing of Speech Signals. Prentice-Hall Signal Processing Series. Prentice-Hall, Upper Saddle River.
  20. Scherer, K. R., Johnstone, T., and Klasmeyer, G. (2003). Handbook of Affective Sciences – Vocal Expression of Emotion, Chapter 23, pages 433–456.
  21. Scherer, S., Schwenker, F., and Palm, G. (2008). Emotion Recognition from Speech Using Multi-Classifier Systems and RBF-Ensembles, Chapter 3, pages 49–70.
  22. Whissell, C. (1989). The Dictionary of Affect in Language, Volume 4 of Emotion: Theory, Research and Experience. Academic Press, New York.
  23. Yacoub, S., Simske, S., Lin, X., and Burns, J. (2003). Recognition of Emotions in Interactive Voice Response Systems. In Proceedings of the European Conference on Speech Communication and Technology (Eurospeech).
  24. Zwicker, E., Fastl, H., Widmann, U., Kurakata, K., Kuwano, S., and Namba, S. (1991). Program for Calculating Loudness According to DIN 45631 (ISO 532B). Journal of the Acoustical Society of Japan, 12(1):39–42.

Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  • Stefan Scherer (1)
  • Friedhelm Schwenker (1)
  • Günther Palm (1)

  1. Institute of Neural Information Processing, Ulm University, Ulm, Germany
