Head X: Customizable Audiovisual Synthesis for a Multi-purpose Virtual Head

  • Martin Luerssen
  • Trent Lewis
  • David Powers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6464)


The development of embodied conversational agents (ECAs) involves a wide range of cutting-edge technologies, extending from multimodal perception to reasoning to synthesis. While each is important to a successful outcome, it is the synthesis that has the most immediate impact on the observer: an ECA's specific appearance and voice can be decisive factors in meeting its social objectives. In light of this, we have developed an extensively customizable system for synthesizing a virtual talking 3D head. Rather than requiring explicit integration into a codebase, our software runs as a service that can be controlled by any external client, which substantially simplifies its deployment into new applications. We have explored the benefits of this approach across several internal research projects and student exercises as part of a university topic on ECAs.
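The abstract does not specify the control protocol, but a head-synthesis service of this kind is typically driven by short text commands sent over a socket. The sketch below is purely illustrative: the `speak` command, wire format, host, and port are assumptions for the sake of the example, not part of Head X's documented API.

```python
import socket


def make_speak_command(text: str) -> bytes:
    # Hypothetical wire format: a one-line "speak" command terminated
    # by a newline. This is NOT the actual Head X protocol.
    return f"speak {text}\n".encode("utf-8")


def send_to_head(command: bytes, host: str = "localhost", port: int = 5000) -> None:
    # Connect to the (assumed) head service and deliver one command.
    # Because the head runs as a standalone service, the client needs
    # no linkage against the synthesis codebase itself.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(command)


if __name__ == "__main__":
    cmd = make_speak_command("Hello, I am a virtual head.")
    print(cmd.decode("utf-8").strip())
```

The design point illustrated here is the one the abstract emphasizes: any client that can open a socket, regardless of language or platform, could drive the head without integrating its rendering or speech code.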


Keywords: Embodied conversational agents · Audiovisual speech synthesis · Software library





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Martin Luerssen¹
  • Trent Lewis¹
  • David Powers¹

  1. Artificial Intelligence Laboratory, Flinders University, Adelaide, Australia