3D Audiovisual Rendering and Real-Time Interactive Control of Expressivity in a Talking Head

  • Jean-Claude Martin
  • Christophe d’Alessandro
  • Christian Jacquemin
  • Brian Katz
  • Aurélien Max
  • Laurent Pointal
  • Albert Rilliard
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4722)


The integration of virtual agents into real-time interactive virtual applications raises several challenges. The rendering of the virtual character's movements in the virtual scene (locomotion, or rotation of its head) and the 3D binaural rendering of its synthetic speech during these movements must be spatially coordinated. Furthermore, the system must adapt the agent's expressive audiovisual signals to the user's ongoing actions in real time. In this paper, we describe a platform designed to address these challenges: (1) modules for real-time synthesis and spatial rendering of synthetic speech, (2) modules for real-time 3D rendering of facial expressions using a GPU-based 3D graphics engine, and (3) the integration of these modules within an experimental platform that uses gesture as an input modality. A new model of phoneme-dependent human speech directivity patterns is included in the speech synthesis system, so that the agent can move through the virtual scene with realistic 3D visual and audio rendering. Future applications of this platform include perceptual studies of multimodal perception and interaction, expressive real-time question-answering systems, and interactive arts.
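To illustrate the idea of phoneme-dependent speech directivity described above, the sketch below computes a simple orientation-dependent gain for a listener at a given angle off the talking head's facing direction. The per-phoneme-class coefficients and the first-order (cardioid-family) pattern are placeholder assumptions for illustration only; the paper's actual model is based on measured human voice directivity patterns.

```python
import math

# Hypothetical directivity coefficients per phoneme class
# (0 = omnidirectional, 1 = fully cardioid). Placeholder values,
# not the measured data used in the paper.
DIRECTIVITY = {
    "vowel": 0.3,      # vowels radiate fairly omnidirectionally
    "fricative": 0.7,  # fricatives concentrate energy toward the front
}

def directivity_gain(phoneme_class: str, angle_deg: float) -> float:
    """Gain applied to the speech signal for a listener at angle_deg
    off the head's facing direction (0 = head faces the listener)."""
    a = DIRECTIVITY[phoneme_class]
    # First-order pattern: (1 - a) + a * cos(angle), clamped at zero.
    return max(0.0, (1.0 - a) + a * math.cos(math.radians(angle_deg)))

# Facing the listener: full gain regardless of phoneme class.
print(directivity_gain("vowel", 0.0))       # ~1.0
# Listener behind the head: fricatives are attenuated more than vowels.
print(directivity_gain("vowel", 180.0))     # ~0.4
print(directivity_gain("fricative", 180.0)) # 0.0
```

In a full binaural pipeline, this per-phoneme gain would be updated as the character turns, then combined with head-related transfer function (HRTF) filtering to place the voice in 3D space.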


Keywords: 3D animation, voice directivity, real-time and interactivity, expressiveness, experimental studies





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France
