Journal on Multimodal User Interfaces, Volume 8, Issue 1, pp 75–86

Analysis of significant dialog events in realistic human–computer interaction

  • Dmytro Prylipko
  • Dietmar Rösner
  • Ingo Siegert
  • Stephan Günther
  • Rafael Friesen
  • Matthias Haase
  • Bogdan Vlasenko
  • Andreas Wendemuth
Original Paper


This paper addresses the automatic detection of significant dialog events (SDEs) in naturalistic HCI and the deduction of trait-specific conclusions relevant to the design of spoken dialog systems. We perform our investigations on the multimodal LAST MINUTE corpus, which contains recordings of naturalistic interactions. First, we use the textual transcripts to analyse interaction styles and discourse structures. We find indications that younger subjects prefer a more technical style in communication with dialog systems. Next, we model the subject’s internal success state with a hidden Markov model trained on the observed sequences of system feedback. This reveals that younger subjects interact significantly more successfully with technical systems. Aiming at automatic detection of specific subject reactions, we then semi-automatically annotate SDEs—phrases indicating irregular, i.e. non-task-oriented, subject behavior. We use both acoustic and linguistic features to build several trait-specific classifiers for dialog phases, which show markedly different accuracies across age and gender groups. The presented investigations coherently support the age-dependence of both expressiveness and problem-solving ability. This in turn induces design rules for future designated “companion” systems.
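The success-state modeling described above can be illustrated with a small sketch. Everything below is invented for illustration: the hidden states, feedback symbols, and probabilities are hand-set assumptions, not the parameters the paper estimates from the LAST MINUTE corpus, and Viterbi decoding stands in for the full trained pipeline (Baum-Welch parameter estimation is omitted for brevity).

```python
import math

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for a discrete HMM (log domain)."""
    n = len(pi)
    delta = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n):
            r = max(range(n), key=lambda r: prev[r] + math.log(A[r][s]))
            delta.append(prev[r] + math.log(A[r][s]) + math.log(B[s][o]))
            ptr.append(r)
        back.append(ptr)
    path = [max(range(n), key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hidden states: 0 = success, 1 = failure (the subject's internal state).
# Observations: 0 = positive system feedback, 1 = negative system feedback.
pi = [0.7, 0.3]
A  = [[0.85, 0.15],   # success tends to persist
      [0.30, 0.70]]   # failure episodes also persist
B  = [[0.9, 0.1],     # success state mostly yields positive feedback
      [0.2, 0.8]]     # failure state mostly yields negative feedback

feedback = [0, 0, 1, 1, 1, 0, 0]        # an example feedback sequence
states = viterbi(feedback, pi, A, B)    # decoded internal success states
```

With these parameters the decoded state sequence tracks the feedback closely, but the transition probabilities smooth over isolated feedback events, which is the point of using an HMM rather than reading each feedback symbol directly as a success indicator.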


Keywords: Human–computer interaction · Multimodal analysis · Companion technology



The study is performed in the framework of the Transregional Collaborative Research Centre SFB/TRR 62 “A Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG). Responsibility for the content lies with the authors.



Copyright information

© OpenInterface Association 2013

Authors and Affiliations

  • Dmytro Prylipko (1, corresponding author)
  • Dietmar Rösner (2)
  • Ingo Siegert (1)
  • Stephan Günther (2)
  • Rafael Friesen (2)
  • Matthias Haase (3)
  • Bogdan Vlasenko (1)
  • Andreas Wendemuth (1)

  1. IIKT & CBBS, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
  2. IWS & CBBS, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
  3. Department of Psychosomatic Medicine and Psychotherapy, Otto-von-Guericke University, Magdeburg, Germany
