Incremental Multimodal Feedback for Conversational Agents

  • Stefan Kopp
  • Thorsten Stocksmeier
  • Dafydd Gibbon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4722)


Just like humans, conversational computer systems should not listen silently to their input and only then respond. Instead, they should reinforce the speaker–listener link by attending actively and giving feedback on an utterance while perceiving it. Most existing systems produce direct feedback responses to distinct (e.g. prosodic) cues. We present a framework that conceives of feedback as a more complex system, resulting from the interplay of conventionalized responses to eliciting speaker events and the multimodal behavior that signals how internal states of the listener evolve. A model for producing such incremental feedback, based on multi-layered processes for perceiving, understanding, and evaluating input, is described.
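The architecture sketched in the abstract combines two feedback sources: conventionalized responses triggered by eliciting speaker events, and behavior driven by the listener's evolving internal state across perception, understanding, and evaluation layers. The following minimal sketch illustrates that interplay; it is not the authors' implementation, and all names, thresholds, and rules are hypothetical.

```python
# Hypothetical sketch of incremental listener feedback: layered appraisal of
# each input increment plus rule-based backchannels for eliciting cues.
from dataclasses import dataclass


@dataclass
class ListenerState:
    # Confidence values for the three processing layers named in the abstract.
    perceived: float = 0.0
    understood: float = 0.0
    accepted: float = 0.0


def update_state(state: ListenerState, increment: dict) -> ListenerState:
    """Evolve the listener state as one input increment is processed."""
    state.perceived = min(1.0, state.perceived + 0.2)
    if increment.get("recognized"):
        state.understood = min(1.0, state.understood + 0.3)
    if increment.get("plausible"):
        state.accepted = min(1.0, state.accepted + 0.3)
    return state


def select_feedback(state: ListenerState, increment: dict):
    """Combine cue-driven and state-driven feedback (illustrative rules)."""
    # Conventionalized response to an eliciting speaker event:
    if increment.get("cue") == "pitch_fall":
        if state.understood > 0.5:
            return "nod + 'ja'"        # signal understanding
        return "head tilt + 'hm?'"     # signal a perception/understanding problem
    # State-driven background behavior while listening:
    if state.perceived > 0.8:
        return "gaze at speaker"
    return None


state = ListenerState()
behaviors = []
for inc in [{"recognized": True},
            {"recognized": True, "plausible": True},
            {"cue": "pitch_fall"}]:
    state = update_state(state, inc)
    fb = select_feedback(state, inc)
    if fb:
        behaviors.append(fb)
```

Because the state is updated per increment, feedback can be emitted while the speaker's utterance is still unfolding, rather than only at utterance end.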


Keywords: Pitch Contour, Virtual Human, Listener State, Conversational Agent, Emotional Prosody





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Stefan Kopp (1)
  • Thorsten Stocksmeier (1)
  • Dafydd Gibbon (1)

  1. Artificial Intelligence Group, Faculty of Technology, and Faculty of Linguistics and Literature, University of Bielefeld, D-33594 Bielefeld, Germany
