How Low Level Observations Can Help to Reveal the User’s State in HCI

  • Stefan Scherer
  • Martin Schels
  • Günther Palm
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6975)


For next-generation human-computer interaction (HCI), it is crucial to assess the affective state of a user. However, this user state is, even for human annotators, only indirectly inferable from background information, the observation of the interaction's progression, and the social signals produced by the interlocutors. In this paper, co-occurrences of directly observable patterns and different user states are examined in order to relate the former to the latter. This evaluation motivates a hierarchical label system in which labels of latent user states are supported by low-level observations. In an integration step, the dynamic patterns of occurrences of various social signals may then be used to infer the latent user state. We thus expect to advance the understanding of affective user states as compositions of lower-level observations for automatic classifiers in HCI.


Keywords: human computer interaction · annotation schemes · affective state · multiparty dialog





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stefan Scherer (1, 2)
  • Martin Schels (1)
  • Günther Palm (1)
  1. Institute of Neural Information Processing, Ulm University, Germany
  2. Speech Communication Lab, Trinity College Dublin, Ireland
