Language Resources and Evaluation

, Volume 41, Issue 3–4, pp 367–388

Virtual agent multimodal mimicry of humans

  • George Caridakis
  • Amaryllis Raouzaiou
  • Elisabetta Bevacqua
  • Maurizio Mancini
  • Kostas Karpouzis
  • Lori Malatesta
  • Catherine Pelachaud


This work is about multimodal and expressive synthesis on virtual agents, based on the analysis of actions performed by human users. As input we consider the image sequence of the recorded human behavior. Computer vision and image processing techniques are incorporated in order to detect cues needed for expressivity features extraction. The multimodality of the approach lies in the fact that both facial and gestural aspects of the user’s behavior are analyzed and processed. The mimicry consists of perception, interpretation, planning and animation of the expressions shown by the human, resulting not in an exact duplicate rather than an expressive model of the user’s original behavior.


Emotional ECA synthesis Expressivity features Facial features extraction Gesture analysis Virtual agents 


  1. Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256–274.CrossRefGoogle Scholar
  2. Baenziger, T., Pirker, H., & Scherer, K. (2006). Gemep - Geneva multimodal emotion portrayals: A corpus for the study of multimodal emotional expressions. In L. Devillers et al. (Eds.), Proceedings of LREC’06 Workshop on corpora for research on emotion and affect (pp. 15–19). Italy: Genoa.Google Scholar
  3. Byun, M., & Badler, N. (2002). Facemote: Qualitative parametric modifiers for facial animations. In Symposium on Computer Animation, San Antonio, TX.Google Scholar
  4. Caridakis, G., Castellano, G., Kessous, L., Raouzaiou, A., Malatesta, L., Asteriadis, S., & Karpouzis, K. (2007). Multimodal emotion recognition from expressive faces, body gestures and speech. In Proceedings of the 4th IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI) 2007, Athens, Greece.Google Scholar
  5. Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Raouzaiou, A., & Karpouzis, K. (2006). Modeling naturalistic affective states via facial and vocal expressions recognition. In International Conference on Multimodal Interfaces (ICMI’06), Banff, Alberta, Canada, November 2–4, 2006.Google Scholar
  6. Chartrand, T. L., Maddux, W., & Lakin, J. (2005). Beyond the perception-behavior link: The ubiquitous utility and motivational moderators of nonconscious mimicry. In R. Hassin, J. Uleman, & J. A. Bargh (Eds.), The new unconscious (pp. 334–361). New York, NY: Oxford University Press.Google Scholar
  7. Chi, D., Costa, M., Zhao, L., & Badler, N. (2000). The emote model for effort and shape. In ACM SIGGRAPH ’00, pp. 173–182, New Orleans, LA.Google Scholar
  8. Donato, G., Bartlett, M., Hager, J., Ekman, P., & Sejnowski, T. (1999). Classifying facial actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 974–989.CrossRefGoogle Scholar
  9. Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. J. Power (Eds.), Handbook of cognition & emotion (pp. 301–320). New York: John Wiley.CrossRefGoogle Scholar
  10. Ekman, P., & Friesen, W. (1969). The repertoire of nonverbal behavioral categories – origins, usage, and coding. Semiotica, 1, 49–98.Google Scholar
  11. Ekman, P. & Friesen, W. (1978). The facial action coding system. San Francisco, CA: Consulting Psychologists Press.Google Scholar
  12. Hartmann, B., Mancini, M., & Pelachaud, C. (2002). Formational parameters and adaptive prototype instantiation for MPEG-4 compliant gesture synthesis. In Computer Animation’02, Geneva, Switzerland. IEEE Computer Society Press.Google Scholar
  13. Hartmann, B., Mancini, M., & Pelachaud, C. (2005a). Implementing expressive gesture synthesis for embodied conversational agents. In Gesture Workshop, Vannes.Google Scholar
  14. Hartmann, B., Mancini, M., Buisine, S., & Pelachaud, C. (2005b). Design and evaluation of expressive gesture synthesis for embodied conversational agents. In AAMAS’05. Utretch.Google Scholar
  15. Ioannou, S., Raouzaiou, A., Tzouvaras, V., Mailis, T., Karpouzis, K., & Kollias, S. (2005). Emotion recognition through facial expression analysis based on a neurofuzzy network. Special Issue on Emotion: Understanding & Recognition, Neural Networks, 18(4), 423–435.Google Scholar
  16. Juslin, P., & Scherer, K. (2005). Vocal expression of affect. In J. Harrigan, R. Rosenthal, & K. Scherer (Eds.), The new handbook of methods in nonverbal behavior research. Oxford, UK: Oxford University Press.Google Scholar
  17. Kochanek, D. H., & Bartels, R. H. (1984). Interpolating splines with local tension, continuity, and bias control. In H. Christiansen (Ed.), Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH’ 84 (pp. 33–41). New York, NY: ACM.
  18. Kopp, S., Sowa, T., & Wachsmuth, I. (2003). Imitation games with an artificial agent: From mimicking to understanding shape-related iconic gestures. In Gesture Workshop, pp. 436–447.Google Scholar
  19. Lakin, J., Jefferis, V., Cheng, C., & Chartrand, T. (2003). The Chameleon effect as social Glue: Evidence for the evolutionary significance of nonconscious mimicry. Journal of Nonverbal Behavior, 27(3), 145–162.CrossRefGoogle Scholar
  20. Martin, J.-C., Abrilian, S., Devillers, L., Lamolle, M., Mancini, M., & Pelachaud, C. (2005). Levels of representation in the annotation of emotion for the specification of expressivity in ECAs. In International Working Conference on Intelligent Virtual Agents, Kos, Greece, pp. 405–417.Google Scholar
  21. Ong, S., & Ranganath, S. (2005). Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 873–891.CrossRefGoogle Scholar
  22. Oviatt, S. (1999). Ten myths of multimodal interaction. Communications of the ACM, 42(11), 74–81.CrossRefGoogle Scholar
  23. Pelachaud, C., & Bilvi, M. (2003). Computational model of believable conversational agents. In M.-P. Huget (Ed.), Communication in multiagent systems, Vol. 2650 of Lecture notes in Computer Science (pp. 300–317). Springer-Verlag.Google Scholar
  24. Peters, C. (2005). Direction of attention perception for conversation initiation in virtual environments. In International Working Conference on intelligent virtual agents, Kos, Greece, pp. 215–228.Google Scholar
  25. Raouzaiou, A., Tsapatsoulis, N., Karpouzis, K., & Kollias, S. (2002). Parameterized facial expression synthesis based on MPEG-4. EURASIP Journal on Applied Signal Processing, 1(Jan), 1021–1038. Google Scholar
  26. Rapantzikos, K., & Avrithis, Y. (2005). An enhanced spatiotemporal visual attention model for sports video analysis. In International Workshop on content-based Multimedia indexing (CBMI), Riga, Latvia.Google Scholar
  27. Scherer, K., & Ekman, P. (1984). Approaches to emotion. Hillsdale: Lawrence Erlbaum Associates.Google Scholar
  28. Tekalp, A., & Ostermann, J. (2000). Face and 2-d mesh animation in mpeg-4. Signal Processing: Image Communication, 15, 387–421.CrossRefGoogle Scholar
  29. van Swol, L. (2003) The effects of nonverbal mirroring on perceived persuasiveness, agreement with an imitator, and reciprocity in a group discussion. Communication Research, 30(4), 461–480.CrossRefGoogle Scholar
  30. Wallbott, H. G., & Scherer, K. R. (1986). Cues and channels in emotion recognition. Journal of Personality and Social Psychology, 51(4), 690–699.CrossRefGoogle Scholar
  31. Wexelblat, A. (1995). An approach to natural gesture in virtual environments. ACM Transactions on Computer-Human Interaction, 2, 179–200.CrossRefGoogle Scholar
  32. Whissel, C. M. (1989). The dictionary of affect in language. In R. Plutchnik & H. Kellerman (Eds.), Emotion: Theory, research and experience: Vol. 4, The measurement of emotions. New York: Academic Press.Google Scholar
  33. Williams G. W. (1976). Comparing the joint agreement of several raters with another rater. Biometrics, 32, 619–627.CrossRefGoogle Scholar
  34. Wu, Y., & Huang, T. (2001). Hand modeling, analysis, and recognition for vision-based human computer interaction. IEEE Signal Processing Magazine, 18, 51–60.CrossRefGoogle Scholar
  35. Wu, Y., & Huang, T. S. (1999). Vision-based gesture recognition: A review. In The 3rd gesture workshop, Gif-sur-Yvette, France, pp. 103–115.Google Scholar

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • George Caridakis
    • 1
  • Amaryllis Raouzaiou
    • 1
  • Elisabetta Bevacqua
    • 2
  • Maurizio Mancini
    • 2
  • Kostas Karpouzis
    • 1
  • Lori Malatesta
    • 1
  • Catherine Pelachaud
    • 2
  1. 1.Image, Video and Multimedia Systems LaboratoryNational Technical University of AthensAthensGreece
  2. 2.LINC, IUT de MontreuilUniversite de Paris 8ParisFrance

Personalised recommendations