Predicting Listener Backchannels: A Probabilistic Multimodal Approach

  • Louis-Philippe Morency
  • Iwan de Kok
  • Jonathan Gratch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5208)


During face-to-face interactions, listeners use backchannel feedback such as head nods to signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
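To make the sequence-labeling framing concrete, here is a minimal sketch (not the authors' implementation) of backchannel prediction with a two-state Hidden Markov Model: hidden states mark backchannel opportunities, observations are discretized speaker cues, and Viterbi decoding recovers the most likely state sequence. All state names, cue labels, and probabilities below are illustrative assumptions, not values learned from the paper's corpus.

```python
# Two hidden states: "bc" = backchannel opportunity (e.g. head nod), "no_bc" = none.
# Observations are hypothetical discretized speaker cues (prosody-style features).
STATES = ["no_bc", "bc"]
OBS = ["speech", "pause", "low_pitch"]

# Hand-picked illustrative parameters (a real system would learn these).
start_p = {"no_bc": 0.9, "bc": 0.1}
trans_p = {"no_bc": {"no_bc": 0.85, "bc": 0.15},
           "bc":    {"no_bc": 0.60, "bc": 0.40}}
emit_p  = {"no_bc": {"speech": 0.80, "pause": 0.15, "low_pitch": 0.05},
           "bc":    {"speech": 0.20, "pause": 0.40, "low_pitch": 0.40}}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    # V[t][s] = (probability of the best path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in STATES}]
    for t in range(1, len(observations)):
        row = {}
        for s in STATES:
            prev = max(STATES, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            row[s] = (V[t - 1][prev][0] * trans_p[prev][s]
                      * emit_p[s][observations[t]], prev)
        V.append(row)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return path[::-1]

cues = ["speech", "speech", "pause", "low_pitch", "speech"]
print(viterbi(cues))  # ['no_bc', 'no_bc', 'bc', 'bc', 'no_bc']
```

With these toy parameters the pause and low-pitch frames are labeled as backchannel opportunities, mirroring the intuition that prosodic cues such as pauses invite listener feedback; the paper's models additionally handle feature selection and richer multimodal inputs.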


Keywords: Feature Selection · Hidden Markov Model · Conditional Random Field · Feature Selection Algorithm · Conditional Random Field Model
These keywords were added by machine and not by the authors.




Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Louis-Philippe Morency (1)
  • Iwan de Kok (2)
  • Jonathan Gratch (1)
  1. Institute for Creative Technologies, University of Southern California, Marina del Rey, USA
  2. Human Media Interaction Group, University of Twente, Enschede, The Netherlands