A probabilistic multimodal approach for predicting listener backchannels

Morency, Louis-Philippe; de Kok, Iwan; Gratch, Jonathan

doi:10.1007/s10458-009-9092-y

Published: 10 May 2009

Autonomous Agents and Multi-Agent Systems, Volume 20, pages 70–84 (2010)

Abstract

During face-to-face interactions, listeners use backchannel feedback such as head nods to signal to the speaker that the communication is working and that the speaker should continue. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
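
To make the idea of a sequential probabilistic predictor concrete, the following is a minimal sketch of forward filtering in a two-state Hidden Markov Model. It is not the authors' model: the state names, the three discrete speaker observations, and every probability below are hypothetical values chosen for illustration, whereas the paper learns its parameters from recorded human-to-human interactions and selects features automatically.

```python
# Illustrative two-state HMM. All names and numbers are assumptions made for
# this sketch; none of them come from the paper.
STATES = ("none", "backchannel")

# Prior probability of each hidden listener state at the first frame.
START = [0.9, 0.1]

# TRANS[i][j] = P(next state = j | current state = i).
TRANS = [[0.9, 0.1],
         [0.6, 0.4]]

# Hypothetical discrete encoding of the speaker's multimodal output.
OBS = {"speech": 0, "pause": 1, "pause+gaze": 2}

# EMIT[j][o] = P(observation = o | state = j).
EMIT = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]


def filter_backchannel(observations):
    """Return P(state = backchannel | observations up to frame t) for each t.

    Implements the normalized forward (filtering) recursion of an HMM.
    """
    n = len(STATES)
    belief = START[:]
    probs = []
    for t, name in enumerate(observations):
        o = OBS[name]
        if t > 0:
            # Prediction step: propagate the belief through the transitions.
            belief = [sum(belief[i] * TRANS[i][j] for i in range(n))
                      for j in range(n)]
        # Update step: weight by the likelihood of the observed feature.
        belief = [belief[j] * EMIT[j][o] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        probs.append(belief[1])
    return probs


probs = filter_backchannel(["speech", "speech", "pause", "pause+gaze"])
```

With these made-up parameters, the filtered probability of a backchannel opportunity rises when the speaker pauses and rises further when the pause coincides with gaze at the listener. A deployed predictor would threshold or peak-pick such a probability curve; a Conditional Random Field would instead model the label sequence discriminatively given the features.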

Author information

Authors and Affiliations

Institute for Creative Technologies, University of Southern California, 13274 Fiji Way, Marina del Rey, CA, 90292, USA
Louis-Philippe Morency & Jonathan Gratch
Human Media Interaction Group, University of Twente, P.O. Box 217, 7500AE, Enschede, The Netherlands
Iwan de Kok

Corresponding author

Correspondence to Louis-Philippe Morency.

About this article

Cite this article

Morency, LP., de Kok, I. & Gratch, J. A probabilistic multimodal approach for predicting listener backchannels. Auton Agent Multi-Agent Syst 20, 70–84 (2010). https://doi.org/10.1007/s10458-009-9092-y

Issue Date: January 2010
DOI: https://doi.org/10.1007/s10458-009-9092-y