Prediction of Visual Backchannels in the Absence of Visual Context Using Mutual Influence

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8108)


Based on the phenomenon of mutual influence between participants in a face-to-face conversation, we propose a context-based prediction approach for modeling visual backchannels. Our goal is to create intelligent virtual listeners that can provide backchannel feedback, enabling natural and fluid interactions. In our proposed approach, we first anticipate the speaker's behaviors and then use this anticipated visual context to predict listener backchannel moments more accurately. We model the mutual influence between speaker and listener gestures using a latent variable sequential model. We compared our approach with state-of-the-art prediction models on a publicly available dataset and showed the importance of modeling the mutual influence between the speaker and the listener.
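The two-stage idea in the abstract (anticipate the speaker's visual behavior, then feed that anticipated context into the listener backchannel predictor) can be illustrated with a minimal sketch. This is not the latent variable sequential model used in the paper: it swaps in independent per-frame logistic scoring purely to show the data flow. The feature layout, the function names, and all weights below are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def anticipate_speaker_visual(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Stage 1 (assumed form): estimate, per frame, the probability of a
    speaker visual cue from non-visual features only."""
    return [
        sigmoid(sum(w * f for w, f in zip(weights, frame)) + bias)
        for frame in frames
    ]


def predict_backchannels(
    frames: Sequence[Sequence[float]],
    anticipated_visual: Sequence[float],
    weights: Sequence[float],
    visual_weight: float,
    bias: float,
    threshold: float = 0.5,
) -> Tuple[List[bool], List[float]]:
    """Stage 2 (assumed form): score listener backchannel likelihood from the
    same non-visual features plus the anticipated visual context."""
    probs = []
    for frame, visual in zip(frames, anticipated_visual):
        score = sum(w * f for w, f in zip(weights, frame))
        score += visual_weight * visual + bias
        probs.append(sigmoid(score))
    decisions = [p >= threshold for p in probs]
    return decisions, probs


# Hypothetical two-dimensional features per frame: [pause, speech_energy].
frames = [[1.0, 0.0], [0.0, 1.0]]
visual = anticipate_speaker_visual(frames, weights=[2.0, -2.0], bias=0.0)
decisions, probs = predict_backchannels(
    frames, visual, weights=[1.0, -1.0], visual_weight=2.0, bias=-2.0
)
```

In this toy setup, the frame containing a pause receives a high anticipated visual cue and crosses the backchannel threshold, while the frame with ongoing speech does not. A sequential model would additionally share information across neighboring frames, which this sketch deliberately omits.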


Keywords: nonverbal behavior, embodied conversational agent





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute for Creative Technologies, University of Southern California, Playa Vista, USA
