
Classifier Fusion for Emotion Recognition from Speech

  • Chapter
Advanced Intelligent Environments

Abstract

The intention of this work is to investigate the performance of an automatic emotion recognizer built on biologically motivated features, namely perceived loudness features as proposed by Zwicker, robust RASTA-PLP features, and novel long-term modulation spectrum-based features. Single classifiers trained on only one feature type and multi-classifier systems utilizing all three types are examined, the latter using two classifier fusion techniques. All experiments are carried out on the standard Berlin Database of Emotional Speech, which comprises recordings of seven different emotions, and the performance of the proposed multi-classifier system is compared with earlier work as well as with human recognition performance. The results reveal that even simple fusion techniques improve the performance significantly, outperforming the classifiers used in earlier work. The generalization ability of the proposed system is further investigated in a leave-one-speaker-out experiment, which uncovers a strong ability to recognize emotions expressed by unknown speakers. Moreover, similarities between earlier speech analysis studies and the automatic emotion recognition results were found.
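
As a rough illustration of the kind of pipeline the abstract describes, the following sketch combines three feature-specific classifiers by averaging their class posteriors and evaluates the combination in a leave-one-speaker-out loop. It is only a minimal sketch, not the authors' implementation: the random placeholder feature matrices, the logistic-regression base classifiers, and the mean-rule fusion are assumptions standing in for the chapter's actual features (Zwicker loudness, RASTA-PLP, modulation spectrum) and fusion methods.

    # Minimal sketch of decision-level classifier fusion with a
    # leave-one-speaker-out evaluation. The feature extractors are NOT
    # implemented here; random placeholder matrices stand in for the
    # three per-utterance feature sets described in the chapter.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(0)
    n_utt, n_classes, n_speakers = 200, 7, 10        # 7 emotions, as in the Berlin database
    y = rng.integers(0, n_classes, n_utt)            # placeholder emotion labels
    speakers = rng.integers(0, n_speakers, n_utt)    # placeholder speaker ids

    # One placeholder feature matrix per feature type (dimensions are arbitrary).
    feature_sets = {
        "loudness":   rng.normal(size=(n_utt, 24)),
        "rasta_plp":  rng.normal(size=(n_utt, 39)),
        "modulation": rng.normal(size=(n_utt, 57)),
    }

    accuracies = []
    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(feature_sets["loudness"], y, speakers):
        # Train one base classifier per feature type on all speakers but one.
        posteriors = []
        for X in feature_sets.values():
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[train_idx], y[train_idx])
            posteriors.append(clf.predict_proba(X[test_idx]))
        # Fusion by averaging class posteriors (the "mean" combination rule).
        fused = np.mean(posteriors, axis=0)
        y_pred = clf.classes_[np.argmax(fused, axis=1)]
        accuracies.append(np.mean(y_pred == y[test_idx]))

    print(f"leave-one-speaker-out accuracy: {np.mean(accuracies):.3f}")

Majority voting over the individual decisions, or any of the combination rules surveyed by Kuncheva (2004), could be substituted for the mean rule without changing the evaluation loop.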



References

  • Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., and Weiss, B. (2005). A Database of German Emotional Speech. In Proceedings of Interspeech.

  • Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., and Taylor, J. (2001). Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine, 18:32–80.

  • Dellaert, F., Polzin, T., and Waibel, A. (1996). Recognizing Emotion in Speech. In Proceedings of the International Conference on Spoken Language Processing (ICSLP 1996).

  • Devillers, L., Vidrascu, L., and Lamel, L. (2005). Challenges in Real-Life Emotion Annotation and Machine Learning Based Detection. Neural Networks, 18:407–422.

  • Fragopanagos, N. and Taylor, J. (2005). Emotion Recognition in Human-Computer Interaction. Neural Networks, 18:389–405.

  • Hermansky, H. (1996). Auditory Modeling in Automatic Recognition of Speech. In Proceedings of the Keele Workshop.

  • Hermansky, H. and Morgan, N. (1994). RASTA Processing of Speech. IEEE Transactions on Speech and Audio Processing, Special Issue on Robust Speech Recognition, 2(4):578–589.

  • Hermansky, H., Morgan, N., Bayya, A., and Kohn, P. (1991). RASTA-PLP Speech Analysis. Technical Report TR-91-069, ICSI.

  • Ho, T. K. (2002). Multiple Classifier Combination: Lessons and Next Steps. Series in Machine Perception and Artificial Intelligence, 47:171–198.

  • Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). On the Relative Importance of Various Components of the Modulation Spectrum for Automatic Speech Recognition. Speech Communication, 28:43–55.

  • Kohonen, T. (2001). Self-Organizing Maps. Springer, New York.

  • Krishna, H. K., Scherer, S., and Palm, G. (2007). A Novel Feature for Emotion Recognition in Voice Based Applications. In Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction (ACII 2007), pages 710–711.

  • Kuncheva, L. (2004). Combining Pattern Classifiers: Methods and Algorithms. Wiley, New York.

  • Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., and Narayanan, S. S. (2004). Emotion Recognition Based on Phoneme Classes. In Proceedings of the International Conference on Spoken Language Processing (ICSLP 2004).

  • Murray, I. R. and Arnott, J. L. (1993). Toward the Simulation of Emotion in Synthetic Speech: A Review of the Literature on Human Vocal Emotion. Journal of the Acoustical Society of America, 93(2):1097–1108.

  • Nicholson, J., Takahashi, K., and Nakatsu, R. (2000). Emotion Recognition in Speech Using Neural Networks. Neural Computing and Applications, 9(4):290–296.

  • Oudeyer, P.-Y. (2003). The Production and Recognition of Emotions in Speech: Features and Algorithms. International Journal of Human Computer Interaction, 59(1–2):157–183.

  • Petrushin, V. (1999). Emotion in Speech: Recognition and Application to Call Centers. In Proceedings of Artificial Neural Networks in Engineering.

  • Rabiner, L. R. and Schafer, R. W. (1978). Digital Processing of Speech Signals. Prentice-Hall Signal Processing Series. Prentice-Hall, Upper Saddle River.

  • Scherer, K. R., Johnstone, T., and Klasmeyer, G. (2003). Vocal Expression of Emotion. In Handbook of Affective Sciences, Chapter 23, pages 433–456.

  • Scherer, S., Schwenker, F., and Palm, G. (2008). Emotion Recognition from Speech Using Multi-Classifier Systems and RBF-Ensembles, Chapter 3, pages 49–70.

  • Whissel, C. (1989). The Dictionary of Affect in Language. Volume 4 of Emotion: Theory, Research and Experience. Academic Press, New York.

  • Yacoub, S., Simske, S., Lin, X., and Burns, J. (2003). Recognition of Emotions in Interactive Voice Response Systems. In Proceedings of the European Conference on Speech Communication and Technology (Eurospeech).

  • Zwicker, E., Fastl, H., Widmann, U., Kurakata, K., Kuwano, S., and Namba, S. (1991). Program for Calculating Loudness According to DIN 45631 (ISO 532B). Journal of the Acoustical Society of Japan, 12(1):39–42.


Author information


Corresponding author

Correspondence to Stefan Scherer.



Copyright information

© 2009 Springer-Verlag US

About this chapter

Cite this chapter

Scherer, S., Schwenker, F., Palm, G. (2009). Classifier Fusion for Emotion Recognition from Speech. In: Kameas, A., Callaghan, V., Hagras, H., Weber, M., Minker, W. (eds) Advanced Intelligent Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76485-6_5

  • DOI: https://doi.org/10.1007/978-0-387-76485-6_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-76484-9

  • Online ISBN: 978-0-387-76485-6

  • eBook Packages: Engineering, Engineering (R0)
