Hand Gesture Synthesis for Conversational Characters

Michael Neff
Reference work entry


This chapter focuses on the generation of animated gesticulation: co-verbal gestures designed to accompany speech. It begins with a survey of research on human gesture, discussing the various forms of gesture, their structure, and their timing requirements relative to speech. The two main problems in synthesizing gesture animation are determining what gestures a character should perform (the specification problem) and then generating the appropriate motion (the animation problem). Approaches to the specification problem have drawn on a range of inputs, including speech prosody, spoken text, and communicative intent, and employ both rule-based and statistical methods to determine gestures. Animation approaches have likewise spanned procedural, physics-based, and data-driven techniques in order to satisfy a significant set of expressive and coordination requirements. Fluid gesture animation must also reflect the conversational context, including listener behavior and floor management. The chapter concludes with a discussion of future challenges.
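To make the specification problem concrete, the sketch below shows a minimal rule-based gesture specifier of the kind the chapter surveys: lexical rules map words in the spoken text to gesture types, and a scheduling helper places the gesture stroke so that it does not lag its affiliated word. All rule patterns, function names, and the timing offset are illustrative assumptions, not taken from the chapter or any particular system.

```python
import re

# Hypothetical keyword-to-gesture rules, loosely in the spirit of rule-based
# specification systems; these mappings are illustrative only.
RULES = [
    (re.compile(r"\b(this|that|here|there)\b", re.I), "deictic"),  # pointing
    (re.compile(r"\b(big|huge|tiny|small)\b", re.I), "iconic"),    # depicting size
    (re.compile(r"\b(first|second|finally)\b", re.I), "beat"),     # rhythmic emphasis
]

def specify_gestures(words):
    """Return (word_index, gesture_type) pairs for words that trigger a rule."""
    gestures = []
    for i, word in enumerate(words):
        for pattern, gesture_type in RULES:
            if pattern.search(word):
                gestures.append((i, gesture_type))
                break  # at most one gesture per word
    return gestures

def schedule_stroke(word_start, word_end, lead=0.1):
    """Place the gesture stroke so it slightly precedes or coincides with the
    affiliated word (times in seconds), reflecting the timing constraint that
    the stroke should not trail the speech it accompanies."""
    return max(0.0, word_start - lead), word_end

words = "put the big box there".split()
print(specify_gestures(words))  # -> [(2, 'iconic'), (4, 'deictic')]
```

A statistical specifier would replace the hand-written `RULES` table with a model trained on annotated speaker data, but the downstream scheduling constraint, aligning the stroke with or just before the affiliated word, applies either way.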


Keywords: Gesture · Character animation · Nonverbal communication · Virtual agents · Embodied conversational agents



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

Department of Computer Science and Program for Cinema and Digital Media, University of California, Davis, Davis, CA, USA
