Human Behaviour in HCI: Complex Emotion Detection through Sparse Speech Features

  • Ingo Siegert
  • Kim Hartmann
  • David Philippou-Hübner
  • Andreas Wendemuth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8212)


To obtain a more human-like interaction with technical systems, these systems must adapt to the user's individual skills, preferences, and current emotional state. In human-human interaction (HHI) the behaviour of the speaker is characterised by semantic and prosodic cues, given as short feedback signals. These signals minimally communicate certain dialogue functions such as attention, understanding, confirmation, or other attitudinal reactions. Thus, these signals play an important role in the progress and coordination of interaction. They allow the partners to inform each other of their behavioural or affective state without interrupting the ongoing dialogue.

Vocal communication provides acoustic details revealing the speaker's feelings, beliefs, and social relations. Incorporating discourse particles (DPs) into human-computer interaction (HCI) systems will allow the detection of complex emotions, which are currently hard to access. Complex emotions in turn are closely related to human behaviour. Hence, integrating automatic DP detection and complex emotion assignment into HCI systems provides a first approach to integrating human behaviour understanding into HCI systems.

In this paper we present methods for extracting the pitch contour of DPs and for assigning complex emotions to observed DPs. We investigate the occurrences of DPs in naturalistic HCI and show that DPs may be assigned to complex emotions automatically. Furthermore, we show that DPs are indeed related to behaviour, exhibiting age- and gender-specific usage during naturalistic HCI. Additionally, we demonstrate that DPs may be used to automatically detect and classify complex emotions during HCI.
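The pitch-contour extraction referred to above can be illustrated with a simple autocorrelation-based fundamental-frequency estimator. This is a minimal sketch for orientation only, not the authors' actual method; the function names (`estimate_f0`, `pitch_contour`) and the frame and search-range parameters are hypothetical choices:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame
    via the autocorrelation method: find the lag of the largest
    autocorrelation peak within the plausible pitch range."""
    frame = frame - frame.mean()
    # One-sided autocorrelation (lag 0 .. len(frame)-1).
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)               # smallest lag = highest pitch
    lag_max = min(int(sr / fmin), len(ac) - 1)
    if lag_max <= lag_min:
        return 0.0
    peak = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / peak if ac[peak] > 0 else 0.0

def pitch_contour(signal, sr, frame_len=0.04, hop=0.01):
    """Slide a 40 ms window in 10 ms steps over the signal and
    collect the per-frame F0 estimates as the pitch contour."""
    n, h = int(frame_len * sr), int(hop * sr)
    return [estimate_f0(signal[i:i + n], sr)
            for i in range(0, len(signal) - n, h)]
```

For a clean 150 Hz tone sampled at 8 kHz, the contour stays close to 150 Hz in every frame; real DP realisations of "hm" would of course yield rising, falling, or level contours that the form-function mapping then interprets.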


Keywords: Prosodic Analysis · Companion Systems · Human-Computer Interaction · Discourse Particle · Pitch Contour · Classification





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Ingo Siegert¹
  • Kim Hartmann¹
  • David Philippou-Hübner¹
  • Andreas Wendemuth¹
  1. Cognitive Systems Group, IIKT, and Center for Behavioral Brain Sciences, Otto von Guericke University Magdeburg, Germany
