BEAT: the Behavior Expression Animation Toolkit

Part of the Cognitive Technologies book series (COGTECH)


The Behavior Expression Animation Toolkit (BEAT) allows animators to input typed text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized non-verbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems. The non-verbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from extensive research into human conversational behavior. The toolkit is extensible, so that new rules can be quickly added. It is designed to plug into larger systems that may also assign personality profiles, motion characteristics, scene constraints, or the animation styles of particular animators.
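The extensible, rule-based design described above can be illustrated with a minimal sketch. This is not BEAT's actual API or rule set: the class `BehaviorToolkit`, the functions `analyze`, `beat_on_new_words`, and `nod_at_end`, the stopword list, and the two rules themselves are all hypothetical, chosen only to show how typed text might be analyzed and how new rules could be plugged in without changing the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stopword list; a real system would use proper linguistic analysis.
STOPWORDS = {"a", "an", "the", "is", "it", "of", "to", "and"}


@dataclass
class Token:
    index: int    # position of the word in the typed text
    word: str     # lower-cased word with surrounding punctuation stripped
    is_new: bool  # True the first time this word appears in the text


@dataclass
class Behavior:
    kind: str   # illustrative behavior label, e.g. "beat_gesture"
    start: int  # index of the first word the behavior spans
    end: int    # index of the last word the behavior spans


Rule = Callable[[List[Token]], List[Behavior]]


def analyze(text: str) -> List[Token]:
    """Toy stand-in for linguistic analysis: mark first mentions as new."""
    seen = set()
    tokens = []
    for i, raw in enumerate(text.split()):
        word = raw.strip(".,!?;:").lower()
        tokens.append(Token(i, word, word not in seen))
        seen.add(word)
    return tokens


def beat_on_new_words(tokens: List[Token]) -> List[Behavior]:
    """Invented rule: suggest a beat gesture on each new non-stopword."""
    return [
        Behavior("beat_gesture", t.index, t.index)
        for t in tokens
        if t.is_new and t.word not in STOPWORDS
    ]


def nod_at_end(tokens: List[Token]) -> List[Behavior]:
    """Invented rule: suggest a head nod on the final word."""
    if not tokens:
        return []
    last = tokens[-1].index
    return [Behavior("head_nod", last, last)]


class BehaviorToolkit:
    """Hypothetical pipeline: analyze text, then apply every registered rule."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        # Extensibility point: a new rule is just another function.
        self.rules.append(rule)

    def annotate(self, text: str) -> List[Behavior]:
        tokens = analyze(text)
        suggested: List[Behavior] = []
        for rule in self.rules:
            suggested.extend(rule(tokens))
        # Order by word position so output aligns with the speech stream.
        return sorted(suggested, key=lambda b: (b.start, b.kind))


if __name__ == "__main__":
    toolkit = BehaviorToolkit()
    toolkit.add_rule(beat_on_new_words)
    toolkit.add_rule(nod_at_end)
    for behavior in toolkit.annotate("The cat saw the cat."):
        print(behavior)
```

Keeping each rule as an independent function over the analyzed tokens is one way to make rules quick to add; the word indices on each suggested behavior are what a downstream animation system would need in order to synchronize it with synthesized speech.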


Keywords: Hand Gesture; Language Module; Conversational Agent; Pitch Accent; Virtual Actor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  1. MIT Media Laboratory, Cambridge, USA
  2. MIT Media Laboratory, Cambridge, USA
  3. MIT Media Laboratory, Cambridge, USA
