Multiple Classifier Systems for the Classification of Audio-Visual Emotional States

  • Michael Glodek
  • Stephan Tschechne
  • Georg Layher
  • Martin Schels
  • Tobias Brosch
  • Stefan Scherer
  • Markus Kächele
  • Miriam Schmidt
  • Heiko Neumann
  • Günther Palm
  • Friedhelm Schwenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6975)

Abstract

Research activities in the field of human-computer interaction have increasingly addressed the integration of some form of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures; the classification of human emotions should therefore be treated as a multimodal pattern recognition problem. The aim of this paper is to investigate multiple classifier systems that utilize audio and visual features to classify human emotional states. To this end, a variety of features have been derived. From the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP features have been used. In addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence, as defined in the AVEC data set. As classifier architectures, multiple classifier systems are applied; these have proven to be accurate and robust against missing and noisy data.
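The multiple classifier systems described above combine the decisions of modality-specific classifiers (audio and video). A common fusion scheme in such systems is decision-level fusion by weighted averaging of class posteriors; the sketch below illustrates that idea in plain Python. The function name `fuse`, the example posteriors, and the weights are hypothetical and not taken from the paper, which may use a different combination rule.

```python
# Minimal sketch of decision-level fusion in a multiple classifier
# system: each modality-specific classifier emits a vector of class
# posteriors, and the ensemble forms their weighted average.
def fuse(posteriors, weights=None):
    """posteriors: list of per-classifier probability vectors
    (all of the same length); weights default to uniform."""
    n = len(posteriors)
    if weights is None:
        weights = [1.0 / n] * n
    k = len(posteriors[0])
    fused = [0.0] * k
    for w, p in zip(weights, posteriors):
        for i in range(k):
            fused[i] += w * p[i]
    return fused

# Hypothetical audio and video classifier outputs for a binary label
audio = [0.7, 0.3]   # e.g. posteriors for (high, low) Arousal
video = [0.4, 0.6]
print(fuse([audio, video]))              # uniform weights -> [0.55, 0.45]
print(fuse([audio, video], [0.8, 0.2]))  # trust audio more -> [0.64, 0.36]
```

Weighting one modality more heavily is one way such systems stay robust when another modality is noisy or missing, as the abstract claims for the proposed architectures.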

Keywords

Gaussian Mixture Model · Emotion Recognition · Emotional Intelligence · Multi Layer Perceptron · Critical Band



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Michael Glodek
  • Stephan Tschechne
  • Georg Layher
  • Martin Schels
  • Tobias Brosch
  • Stefan Scherer
  • Markus Kächele
  • Miriam Schmidt
  • Heiko Neumann
  • Günther Palm
  • Friedhelm Schwenker

All authors: Institute of Neural Information Processing, Ulm University, Ulm, Germany
