
A probabilistic multimodal approach for predicting listener backchannels

Published in: Autonomous Agents and Multi-Agent Systems

Abstract

During face-to-face interactions, listeners use backchannel feedback, such as head nods, to signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels from the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are the automatic selection of relevant features and the optimal feature representation for probabilistic models. For the prediction of visual backchannel cues (i.e., head nods), our model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.




Author information

Corresponding author

Correspondence to Louis-Philippe Morency.

About this article

Cite this article

Morency, L.-P., de Kok, I., & Gratch, J. A probabilistic multimodal approach for predicting listener backchannels. Autonomous Agents and Multi-Agent Systems 20, 70–84 (2010). https://doi.org/10.1007/s10458-009-9092-y
