Autonomous Agents and Multi-Agent Systems

Volume 20, Issue 1, pp 70–84

A probabilistic multimodal approach for predicting listener backchannels

  • Louis-Philippe Morency
  • Iwan de Kok
  • Jonathan Gratch

Abstract

During face-to-face interactions, listeners use backchannel feedback such as head nods to signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels from the speaker's multimodal output features (e.g., prosody, spoken words, and eye gaze). The main challenges addressed in this paper are the automatic selection of relevant features and the optimal feature representation for probabilistic models. For the prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
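
To make the setup concrete, below is a minimal sketch of backchannel prediction framed as sequence labelling with a linear-chain Conditional Random Field. It is written against the sklearn-crfsuite package; the per-frame feature names (pause, falling pitch, gaze at listener, spoken word) and the labels are illustrative assumptions, not the paper's actual feature set or implementation.

```python
# Sketch: listener backchannel prediction as sequence labelling with a
# linear-chain CRF. Assumes the sklearn-crfsuite package; feature names,
# frame format, and labels are hypothetical, for illustration only.
import sklearn_crfsuite

def frame_features(pause, pitch_falling, gaze_at_listener, word):
    """Encode one time frame of speaker behaviour as a CRF feature dict."""
    return {
        "pause": pause,                        # speaker is silent (bool)
        "pitch_falling": pitch_falling,        # lowering-pitch region (bool)
        "gaze_at_listener": gaze_at_listener,  # eye-gaze cue (bool)
        "word": word,                          # current spoken word (str)
    }

# Toy training set: one speaker-feature sequence with per-frame labels,
# "BC" where the listener produced a backchannel (e.g., a head nod).
X_train = [[
    frame_features(False, False, False, "so"),
    frame_features(False, True, True, "right"),
    frame_features(True, True, True, ""),
]]
y_train = [["O", "O", "BC"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X_train, y_train)    # learn feature weights from labelled sequences
print(crf.predict(X_train))  # per-frame backchannel labels for each sequence
```

The sketch only shows the data shape a sequential probabilistic model consumes; the paper itself operates at a finer time scale and selects the relevant multimodal features automatically.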

Keywords

Listener backchannel feedback · Nonverbal behavior prediction · Sequential probabilistic model · Conditional random field · Head nod · Multimodal

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Louis-Philippe Morency (1)
  • Iwan de Kok (2)
  • Jonathan Gratch (1)
  1. Institute for Creative Technologies, University of Southern California, Marina del Rey, USA
  2. Human Media Interaction Group, University of Twente, Enschede, The Netherlands
