Multiple Classifier Systems for the Classification of Audio-Visual Emotional States

Glodek, Michael; Tschechne, Stephan; Layher, Georg; Schels, Martin; Brosch, Tobias; Scherer, Stefan; Kächele, Markus; Schmidt, Miriam; Neumann, Heiko; Palm, Günther; Schwenker, Friedhelm

doi:10.1007/978-3-642-24571-8_47


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6975)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction


Abstract

Research in human-computer interaction has increasingly addressed the integration of some form of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures, so the classification of human emotions should be treated as a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems that use audio and visual features to classify human emotional states. To this end, a variety of features have been derived. From the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP have been used. In addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. Multiple classifier systems are applied as classifier architectures; these have been shown to be accurate and robust against missing and noisy data.
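
As a rough illustration of why combining several classifiers can tolerate a missing modality, the sketch below averages per-classifier class probabilities and skips any classifier that produced no output. This is not the fusion scheme used in the paper, which the abstract does not specify; the function name `fuse_probabilities`, the two-class setup, and the example values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def fuse_probabilities(
    outputs: Sequence[Optional[Sequence[float]]],
) -> Optional[List[float]]:
    """Average class-probability vectors, ignoring classifiers with no output.

    Hypothetical helper: each entry in `outputs` is one classifier's
    probability vector, or None if that modality was unavailable.
    """
    available = [o for o in outputs if o is not None]
    if not available:
        return None  # no classifier could make a decision
    n_classes = len(available[0])
    if any(len(o) != n_classes for o in available):
        raise ValueError("all classifiers must report the same number of classes")
    # Unweighted mean over the classifiers that did respond.
    return [sum(o[k] for o in available) / len(available) for k in range(n_classes)]


# Assumed two-class outputs (low, high) for a single label such as Arousal.
audio = [0.25, 0.75]
video = None  # assumed case: the visual classifier has no input for this frame
fused = fuse_probabilities([audio, video])
```

When both modalities are present, the same call averages them; when one is absent, the decision falls back to the remaining classifier instead of failing.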

Author information

Authors and Affiliations

Institute of Neural Information Processing, Ulm University, 89081, Ulm, Germany
Michael Glodek, Stephan Tschechne, Georg Layher, Martin Schels, Tobias Brosch, Stefan Scherer, Markus Kächele, Miriam Schmidt, Heiko Neumann, Günther Palm & Friedhelm Schwenker

Editor information

Editors and Affiliations

University of Memphis, 202 Psychology Building, 38152, Memphis, TN, USA
Sidney D’Mello & Arthur Graesser
Technische Universität München, Arcisstraße 21, 80333, München, Germany
Björn Schuller
Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI-CNRS), Bâtiment 508, 91403, Orsay Cedex, France
Jean-Claude Martin

About this paper

Cite this paper

Glodek, M. et al. (2011). Multiple Classifier Systems for the Classification of Audio-Visual Emotional States. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24571-8_47

DOI: https://doi.org/10.1007/978-3-642-24571-8_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24570-1
Online ISBN: 978-3-642-24571-8
eBook Packages: Computer Science, Computer Science (R0)
