Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech

  • Jianfeng Xu
  • Yuki Nagai
  • Shinya Takayama
  • Shigeyuki Sakazawa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9420)


Embodied agents have great potential in the education field, where they promise to maximize learners' learning gains and enjoyment. In many education applications, multimodal representation by embodied agents is a powerful way to obtain these benefits, but it requires accurate synchronization of gesture and speech. To this end, we first investigate the important issues in synchronization through a preliminary case study, which serves as a practical guideline for our algorithm design, and then propose a two-step synchronization method. Our case study reveals that two issues, duration and timing, play an important role in synchronizing gesture with speech. We treat synchronization as a motion synthesis problem rather than as the behavior scheduling problem addressed by conventional methods. In the first step, we employ a motion graph technique with constraints on gesture structure for coarse synchronization; in the second step, we refine the result by shifting and scaling the gesture. Subjective evaluation demonstrates that the proposed method achieves more accurate synchronization with respect to both duration and timing, as well as higher motion quality, than state-of-the-art methods.

Furthermore, we have implemented the proposed synchronization method in an authoring tool for education applications. Experiments conducted at a university demonstrate that our system makes creating attractive animations easier and faster than manual creation of equal quality (requiring only about 10% of the operation time), and that using embodied agents in education applications is effective.
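The abstract describes the second step only as "shifting and scaling the gesture". As a rough illustration of how such a refinement might address the two issues of duration and timing, the following minimal sketch scales a gesture clip to a target duration and then shifts it so that its stroke coincides with a target time in the speech. The `Gesture` type, the `refine_gesture` function, and the choice of a single stroke time as the alignment point are all hypothetical assumptions made for this example, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    """Hypothetical gesture clip described by three times in seconds."""
    start: float   # when the clip begins
    stroke: float  # when the emphasized stroke occurs
    end: float     # when the clip ends


def refine_gesture(g: Gesture, target_duration: float,
                   target_stroke_time: float) -> Gesture:
    """Scale the clip to target_duration, then shift it so that the
    stroke lands on target_stroke_time (assumed alignment rule)."""
    duration = g.end - g.start
    if duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")

    # Scaling addresses the duration issue: stretch or compress uniformly.
    scale = target_duration / duration
    scaled_stroke = g.start + (g.stroke - g.start) * scale

    # Shifting addresses the timing issue: move the stroke onto the target.
    shift = target_stroke_time - scaled_stroke
    new_start = g.start + shift
    return Gesture(new_start, target_stroke_time, new_start + target_duration)


# Example: a 2 s gesture compressed to 1 s with its stroke placed at t = 3 s.
refined = refine_gesture(Gesture(0.0, 1.0, 2.0), 1.0, 3.0)
print(refined)
```

Uniform scaling is the simplest choice here; a real system would probably also limit the scale factor to avoid visibly unnatural motion, which this sketch does not attempt.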


Embodied agents; Education applications; Multimodal synchronization; Gesture; Motion graphs; Dynamic programming



We greatly appreciate all the participants in our experiments, especially Prof. Shirotomo Aizawa and his students at Nagoya University of Arts and Sciences, Japan.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Jianfeng Xu (1)
  • Yuki Nagai (1)
  • Shinya Takayama (1)
  • Shigeyuki Sakazawa (1)

  1. Smart Home and Robot Application Laboratory, KDDI R&D Laboratories, Inc., Fujimino-shi, Japan
