Language Resources and Evaluation, Volume 41, Issue 3–4, pp. 325–339

An annotation scheme for conversational gestures: how to economically capture timing and form

  • Michael Kipp
  • Michael Neff
  • Irene Albrecht


The empirical investigation of human gesture stands at the center of multiple research disciplines, and various gesture annotation schemes exist, with varying degrees of precision and required annotation effort. We present a gesture annotation scheme for the specific purpose of automatically generating and animating character-specific hand/arm gestures, but with potential general value. We focus on how to capture temporal structure and locational information with relatively little annotation effort. The scheme is evaluated in terms of how accurately it captures the original gestures by re-creating those gestures on an animated character using the annotated data. This paper presents our scheme in detail and compares it to other approaches.


Multimodal corpora, Embodied conversational agents, Gesture generation, Human–computer interaction



The authors would like to thank all reviewers for their kind and helpful remarks. This research was partially funded by the German Ministry of Research and Technology (BMBF) under grant 01 IMB 01A (VirtualHuman). The responsibility for the contents of this paper lies with the authors.



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. DFKI, Embodied Agents Research Group, Saarbrücken, Germany
  2. Department of Computer Science and Program in Technocultural Studies, University of California, Davis, USA
  3. TomTec Imaging Systems GmbH, Unterschleißheim, Germany
