Continuous interaction with a virtual human

Reidsma, Dennis; de Kok, Iwan; Neiberg, Daniel; Pammi, Sathish Chandra; van Straalen, Bart; Truong, Khiet; van Welbergen, Herwin

doi:10.1007/s12193-011-0060-x


Original Paper
Open access
Published: 27 May 2011

Volume 4, pages 97–118, (2011)

Journal on Multimodal User Interfaces


Abstract

This paper presents our progress in developing a Virtual Human capable of being an attentive speaker. Such a Virtual Human should be able to attend to its interaction partner while it is speaking—and modify its communicative behavior on-the-fly based on what it observes in the behavior of its partner. We report new developments concerning a number of aspects, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response eliciting behavior, and strategies for generating appropriate reactions to listener responses. On the basis of this progress, a task-based setup for a responsive Virtual Human was implemented to carry out two user studies, the results of which are presented and discussed in this paper.


References

Allwood J, Cerrato L (2003) A study of gestural feedback expressions. In: Paggio P, Jokinen K, Jönsson K (eds) 1st Nordic symposium on multimodal communication, pp 7–22
Anderson AH, Bader M, Bard EG, Boyle E, Doherty-Sneddon G, Garrod S, Isard S, Kowtko JC, McAllister J, Miller J, Sotillo C, Thompson H, Weinert R (1991) The HCRC Map Task corpus. Lang Speech 34:351–366
Bavelas JB, Coates L, Johnson T (2000) Listeners as co-narrators. J Pers Soc Psychol 79(6):941–952
Bavelas JB, Coates L, Johnson T (2002) Listener responses as a collaborative process: The role of gaze. J Commun 52(3):566–580
Benus S, Gravano A, Hirschberg J (2007) The prosody of backchannels in American English. In: Proceedings of the 16th international congress of phonetic sciences 2007, pp 1065–1068
Black AW, Tokuda K, Zen H (2002) An HMM-based speech synthesis system applied to English. In: Proc of 2002 IEEE SSW, Santa Monica, CA, USA
Brady PT (1968) A statistical analysis of on-off patterns in 16 conversations. Bell Syst Tech J 47:73–91
Carletta JC, Isard S, Doherty-Sneddon G, Isard A, Kowtko JC, Anderson AH (1997) The reliability of a dialogue structure coding scheme. Comput Linguist 23(1):13–31
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Clark HH (1996) Using language. Cambridge University Press, Cambridge
Clark HH, Brennan SE (1991) Grounding in communication. In: Resnick LB, Levine JM, Teasley SD (eds) Perspectives on socially shared cognition. American Psychological Association, Washington
Clark HH, Krych MA (2004) Speaking while monitoring addressees for understanding. J Mem Lang 50(1):62–81. doi:10.1016/j.jml.2003.08.004
Dhillon R, Bhagat S, Carvey H, Shriberg E (2004) Meeting recorder project: Dialog act labeling guide. Tech Rep ICSI Technical Report TR-04-002, International Computer Science Institute
Duncan S Jr (1972) Some signals and rules for taking speaking turns in conversation. J Pers Soc Psychol 23(2)
Duncan S Jr (1974) On the structure of speaker-auditor interaction during speaking turns. Lang Soc 3(2):161–180. doi:10.1017/s0047404500004322
Edlund J, Heldner M, Al Moubayed S, Gravano A, Hirschberg J (2010) Very short utterances in conversation. In: Proceedings of fonetik
Eyben F, Woellmer M, Schuller B (2010) openSMILE—the Munich versatile and fast open-source audio feature extractor. In: Proceedings of ACM multimedia, pp 1459–1462
French P, Local J (1983) Turn-competitive incomings. J Pragmat 7:17–38
Fujimoto DT (2007) Listener responses in interaction: a case for abandoning the term, backchannel. J Osaka Jogakuin 2 Year Coll 37:35–54
Goldwater S, Jurafsky D, Manning CD (2010) Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Commun 52:181–200
Goodwin C (1981) Conversational organization: interaction between speakers and hearers. Academic Press, San Diego
Goodwin C (1986) Between and within: alternative sequential treatments of continuers and assessments. Hum Stud 9(2–3):205–217. doi:10.1007/bf00148127
Gravano A, Hirschberg J (2009) Backchannel-inviting cues in task-oriented dialogue. In: Proceedings of interspeech, Brighton, pp 1019–1022
Gustafson J, Neiberg D (2010) Prosodic cues to engagement in non-lexical response tokens in Swedish. In: DiSS-LPSS Joint Workshop
Heldner M, Edlund J (2010) Pauses, gaps and overlaps in conversations. J Phonetics 38(4):555–568. doi:10.1016/j.wocn.2010.08.002
Heylen D (2006) Head gestures, gaze and the principles of conversational structure. Int J Humanoid Robot 3(3):241–267
Heylen D, Bevacqua E, Tellier M, Pelachaud C (2007) Searching for prototypical facial feedback signals. In: Pelachaud C, Martin JC, André E, Chollet G, Karpouzis K, Pelé D (eds) Proceedings of the 7th international conference intelligent virtual agents. Lecture notes in computer science, vol 4722. Springer, Berlin, pp 147–153. doi:10.1007/978-3-540-74997-4_14
Kendon A (1967) Some functions of gaze direction in social interaction. Acta Psychol 26:22–63
de Kok I, Heylen D (2011) The MultiLis corpus—dealing with individual differences of nonverbal listening behavior. In: Proceedings of COST 2102: toward autonomous, adaptive, and context-aware multimodal interfaces: theoretical and practical issues, pp 362–375
Kopp S (2010) Social resonance and embodied coordination in face-to-face conversation with artificial interlocutors. Speech Commun 52(6):587–597. doi:10.1016/j.specom.2010.02.007
Kopp S, Krenn B, Marsella SC, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson HH (2006) Towards a common framework for multimodal generation: the behavior markup language. In: Gratch J, Young MR, Aylett RS, Ballin D, Olivier P (eds) Proceedings of the 6th international conference on intelligent virtual agents. Lecture notes in computer science, vol 4133. Springer, Berlin, pp 205–217
Kurtic E, Brown GJ, Wells B (2010) Resources for turn competition in overlap in multi-party conversations: speech rate, pausing and duration. In: Proceedings of interspeech, pp 2550–2553
Lee CC, Lee S, Narayanan SS (2008) An analysis of multimodal cues of interruption in dyadic spoken interactions. In: Proceedings of interspeech, pp 1678–1681
ter Maat M, Truong KP, Heylen D (2010) How turn-taking strategies influence users’ impressions of an agent. In: Allbeck J, Badler NI, Bickmore T, Pelachaud C, Safonova A (eds) Proceedings of the 10th international conference on intelligent virtual agents, Philadelphia, Pennsylvania, USA. Lecture notes in computer science, vol 6356. Springer, Berlin, pp 441–453. doi:10.1007/978-3-642-15892-6_48
Manusov V, Trees AR (2002) “Are you kidding me?”: The role of nonverbal cues in the verbal accounting process. J Commun 52(3):640–656. doi:10.1111/j.1460-2466.2002.tb02566.x
McKinney MF, Moelants D, Davies MEP, Klapuri A (2007) Evaluation of audio beat tracking and music tempo extraction algorithms. J New Music Res 36(1):1–16
Neiberg D, Gustafson J (2010) The prosody of Swedish conversational grunts. In: Proc of Interspeech
Neiberg D, Truong KP (2011) Online detection of vocal listener responses with maximum latency constraints. In: Proc of ICASSP 2011
Nijholt A, Reidsma D, van Welbergen H, op den Akker H, Ruttkay ZM (2008) Mutually coordinated anticipatory multimodal interaction. In: Esposito A, Bourbakis NG, Avouris N, Hatzilygeroudis I (eds) Verbal and nonverbal features of human-human and human-machine interaction. Lecture notes in computer science, vol 5042. Springer, Berlin, pp 70–89
Norwine AC, Murphy OJ (1938) Characteristic time intervals in telephonic conversation. Bell Syst Tech J 17:281–291
Reidsma D (2008) Annotations and subjective machines—of annotators, embodied agents, users, and other humans. PhD thesis, University of Twente. doi:10.3990/1.9789036527262
Reidsma D, Truong K, van Welbergen H, Neiberg D, Pammi S, de Kok I, van Straalen B (2010) Continuous interaction with a virtual human. In: Salah AA, Gevers T (eds) Proceedings of the eNTERFACE’10 summer workshop on multimodal interfaces, pp 24–39
Sacks H, Schegloff E, Jefferson G (1974) A simplest systematics for the organization of turn-taking for conversation. Language 50:696–735
Schegloff E (2000) Overlapping talk and the organization of turn-taking for conversation. Lang Soc 29:1–63
Schlangen D, Skantze G (2009) A general, abstract model of incremental dialogue processing. In: Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics (EACL-09)
Schröder M (2010) The SEMAINE API: Towards a standards-based framework for building emotion-oriented systems. Adv Hum-Comput Interact 2010:319406. doi:10.1155/2010/319406
Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377
Schröder M, Charfuelan M, Pammi S, Türk O (2008) The MARY TTS entry in the Blizzard Challenge 2008. In: Proc of the Blizzard Challenge
Skantze G, Hjalmarsson A (2010) Towards incremental speech generation in dialogue systems. In: Proceedings of SIGdial
Thiebaux M, Marshall AN, Marsella SC, Kallmann M (2008) Smartbody: Behavior realization for embodied conversational agents. In: Proceedings of the 7th international conference on autonomous agents and multiagent systems, pp 151–158
Thórisson KR (2002) Natural turn-taking needs no manual: Computational theory and model, from perception to action. In: Granström B, House D, Karlsson I (eds) Multimodality in language and speech systems. Kluwer Academic, Dordrecht, pp 173–207
Toda T, Tokuda K (2007) A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Trans Inf Syst E90-D(5):816–824
Walker MB, Trimboli C (1982) Smooth transitions in conversational interactions. J Soc Psychol 117:305–306
Ward N (2006) Non-lexical conversational sounds in American English. Pragmat Cogn 14(1):129–182
Ward N, Tsukahara W (2000) Prosodic features which cue back-channel responses in English and Japanese. J Pragmat 32(8):1177–1207
van Welbergen H, Reidsma D, Ruttkay ZM, Zwiers J (2010a) Elckerlyc: A BML realizer for continuous, multimodal interaction with a virtual human. J Multimodal User Interfaces 3(4):271–284. doi:10.1007/s12193-010-0051-3
van Welbergen H, Reidsma D, Zwiers J (2010b) A demonstration of continuous interaction with Elckerlyc. In: Proceedings of the third workshop on multimodal output generation, CTIT Workshop Proceedings. vol WP2010, pp 51–57


Author information

Authors and Affiliations

Human Media Interaction, University of Twente, Postbus 217, 7500AE, Enschede, Netherlands
Dennis Reidsma, Iwan de Kok, Bart van Straalen, Khiet Truong & Herwin van Welbergen
Dept. of Speech, Music and Hearing, KTH Royal Institute of Technology, Lindstedtsv. 24, 100 44, Stockholm, Sweden
Daniel Neiberg
Language Technology Lab, German Research Center for Artificial Intelligence DFKI, Stuhlsatzenhausweg 3, D-66123, Saarbruecken, Germany
Sathish Chandra Pammi


Corresponding author

Correspondence to Dennis Reidsma.

Additional information

This paper is based upon a project report of the eNTERFACE’10 Summer Workshop on Multimodal Interfaces [42].

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Reidsma, D., de Kok, I., Neiberg, D. et al. Continuous interaction with a virtual human. J Multimodal User Interfaces 4, 97–118 (2011). https://doi.org/10.1007/s12193-011-0060-x


Received: 05 February 2011
Accepted: 29 April 2011
Published: 27 May 2011
Issue Date: July 2011
DOI: https://doi.org/10.1007/s12193-011-0060-x
