Abstract
Applications with intelligent conversational virtual humans, called Embodied Conversational Agents (ECAs), seek to bring human-like abilities into machines and establish natural human-computer interaction. In this paper we discuss the realization of multimodal ECA behaviors, which comprise speech and nonverbal behaviors. We present RealActor, an open-source, multi-platform animation system for real-time multimodal behavior realization in ECAs. The system employs a novel solution for synchronizing gestures and speech using neural networks, as well as an adaptive face animation model based on the Facial Action Coding System (FACS) for synthesizing facial expressions. Our aim is to provide a generic animation system that helps researchers create believable and expressive ECAs.
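To illustrate the kind of neural-network mapping the abstract alludes to, the sketch below runs a tiny one-hidden-layer feedforward network that predicts a gesture timing offset from two speech features. This is a hypothetical, minimal illustration with made-up weights and feature choices; it is not RealActor's actual model or training setup.

```python
import math

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer MLP with tanh activation
    and a single scalar output (e.g. a timing offset in seconds)."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return sum(w * hi for w, hi in zip(w2, h)) + b2

# Toy weights, purely illustrative (a real system would learn these
# from annotated speech/gesture data).
w1 = [[0.5, -0.3],
      [0.2,  0.8]]
b1 = [0.1, -0.1]
w2 = [0.7, -0.4]
b2 = 0.05

# Hypothetical input features: (word duration in seconds, stress level 0..1)
offset = mlp_forward([0.35, 0.9], w1, b1, w2, b2)
print(f"predicted gesture stroke offset: {offset:.3f} s")
```

The point of such a mapping is that gesture strokes must land near the stressed syllables they accompany; a learned regressor lets the realizer schedule animation ahead of the synthesized speech.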
Cite this article
Čereković, A., Pandžić, I.S. Multimodal behavior realization for embodied conversational agents. Multimed Tools Appl 54, 143–164 (2011). https://doi.org/10.1007/s11042-010-0530-2