On Annotation and Evaluation of Multi-modal Corpora in Affective Human-Computer Interaction

  • Markus Kächele (corresponding author)
  • Martin Schels
  • Sascha Meudt
  • Viktor Kessler
  • Michael Glodek
  • Patrick Thiam
  • Stephan Tschechne
  • Günther Palm
  • Friedhelm Schwenker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8757)


In this paper, we discuss affective human-computer interaction from a data-driven viewpoint. This comprises the collection of databases with emotional content, feasible annotation procedures, and software tools capable of conducting a suitable labeling process. We further discuss the evaluation of results computed by statistical classifiers. Based on this, we propose using fuzzy memberships to model affective user states and advocate corresponding fuzzy performance measures.
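The abstract does not specify which fuzzy performance measures are meant. As one possible illustration, assuming that both the annotation and the classifier output are given as per-class membership degrees in [0, 1], a soft confusion matrix can be accumulated with the minimum t-norm, and a fuzzy accuracy can be read off its diagonal. The function names `fuzzy_confusion_matrix` and `fuzzy_accuracy` are hypothetical, and this sketch is not presented as the authors' method.

```python
from typing import List, Sequence


def fuzzy_confusion_matrix(
    label_memberships: Sequence[Sequence[float]],
    predicted_memberships: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Soft confusion matrix (assumed construction): entry [i][j] sums
    min(label membership in class i, predicted membership in class j)
    over all samples, i.e. the minimum t-norm."""
    if len(label_memberships) != len(predicted_memberships):
        raise ValueError("label and prediction lists must have the same length")
    if not label_memberships:
        return []
    n_classes = len(label_memberships[0])
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    for y, p in zip(label_memberships, predicted_memberships):
        for i in range(n_classes):
            for j in range(n_classes):
                matrix[i][j] += min(y[i], p[j])
    return matrix


def fuzzy_accuracy(matrix: Sequence[Sequence[float]]) -> float:
    """Share of the total accumulated membership mass lying on the diagonal."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total if total > 0 else 0.0


# Usage: one clearly annotated sample and one ambiguous sample, two classes.
labels = [[1.0, 0.0], [0.3, 0.7]]
predictions = [[0.8, 0.2], [0.4, 0.6]]
cm = fuzzy_confusion_matrix(labels, predictions)
print(cm, fuzzy_accuracy(cm))
```

For this input the diagonal holds 1.7 of the total 2.6 membership mass, i.e. a fuzzy accuracy of roughly 0.65. With crisp one-hot labels and predictions, the minimum is 1 only where both are 1, so the construction reduces to the ordinary confusion matrix and accuracy; the fuzzy version additionally credits partial agreement on ambiguous affective samples.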


Keywords: Affective computing, Annotation, Machine learning, Human-computer interaction, Multimodal corpora, Fuzzy memberships



This paper is based on work done within the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG). The work of Markus Kächele is supported by a scholarship of the Landesgraduiertenförderung Baden-Württemberg at Ulm University.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

Markus Kächele, Martin Schels, Sascha Meudt, Viktor Kessler, Michael Glodek, Patrick Thiam, Stephan Tschechne, Günther Palm, and Friedhelm Schwenker
Institute of Neural Information Processing, Ulm University, Ulm, Germany