The Influence of Prosody on the Requirements for Gesture-Text Alignment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8108)


Designing an agent capable of multimodal communication requires synchronization of the agent’s performance across its communication channels: text, prosody, gesture, body movement, and facial expressions. The synchronization of gesture and spoken text has significant repercussions for agent design. To explore this issue, we examined people’s sensitivity to misalignments between gesture and spoken text, varying both the gesture type and the prosodic emphasis. This study included ratings of individual clips and ratings of paired clips with different alignments. Subjects were unable to detect alignment errors of up to ±0.6 s when shown a single clip. However, when shown paired clips, gestures occurring after the lexical affiliate were rated less positively. There is also evidence that stronger prosody cues make people more sensitive to misalignment. This suggests that agent designers may be able to “cheat” when it comes to maintaining tight synchronization between audio and gesture without a decrease in agent naturalness, but this cheating may not be optimal.
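The ±0.6 s single-clip tolerance reported above can be read as a simple acceptance window around the lexical affiliate. The sketch below illustrates such a check; the function name, event representation, and threshold constant are illustrative assumptions, not an implementation from the paper.

```python
# Hypothetical sketch of an alignment-tolerance check, assuming gesture and
# lexical-affiliate onsets are given as timestamps in seconds.

TOLERANCE_S = 0.6  # single-clip misalignments within this window went unnoticed


def is_alignment_acceptable(gesture_onset_s: float,
                            affiliate_onset_s: float,
                            tolerance_s: float = TOLERANCE_S) -> bool:
    """Return True if the gesture-speech offset lies within the tolerance window."""
    return abs(gesture_onset_s - affiliate_onset_s) <= tolerance_s


# Example: a gesture stroke starting 0.4 s after its lexical affiliate
print(is_alignment_acceptable(10.4, 10.0))  # within tolerance
print(is_alignment_acceptable(10.8, 10.0))  # outside tolerance
```

Note that the paired-clip results suggest this symmetric window is optimistic: late gestures in particular were rated less positively, so a production system might prefer an asymmetric tolerance biased toward gestures that precede the affiliate.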







Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

University of California, Davis, USA
