Audio Based Real-Time Speech Animation of Embodied Conversational Agents

  • Mario Malcangi
  • Raffaele de Tintis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2915)


A framework for the facial animation of embodied agents, driven by speech analysis in the presence of background noise, is described. Target application areas are entertainment and mobile visual communication. This novel approach derives from the speech signal all the information needed to drive 3-D facial models. Using both digital signal processing and soft computing (fuzzy logic and neural networks) methodologies, a very flexible and low-cost solution for extracting lip and facial-related information has been implemented. The main advantage of the speech-based approach is that it is non-invasive: speech is captured with a microphone, so there is no physical contact with the subject (no magnetic sensors or optical markers). This makes the approach applicable in more settings than other methodologies. First, a speech-based lip driver system was developed to synchronize lip movements to speech; the methodology was then extended to some important facial movements so that a face-synching system could be modeled. The developed system is speaker and language independent, so neural network training operations are also not required.
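The abstract does not spell out the processing chain, so the following is only a rough illustration of the general idea of turning frame-level speech features into a lip parameter. It is not the authors' method. It assumes two standard features (short-time energy and zero-crossing rate) and a simple piecewise-linear fuzzy membership; every function name and threshold here is hypothetical and chosen for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def ramp(x, lo, hi):
    """Piecewise-linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def mouth_opening(frame, energy_lo=1e-4, energy_hi=1e-2,
                  zcr_lo=0.1, zcr_hi=0.4):
    """Hypothetical mouth-opening value in [0, 1] for one frame.

    Assumed rule: the mouth opens when the frame is "loud" AND
    "not noise-like", with fuzzy AND taken as the minimum.
    The thresholds are illustrative, not taken from the paper.
    """
    loud = ramp(short_time_energy(frame), energy_lo, energy_hi)
    noise_like = ramp(zero_crossing_rate(frame), zcr_lo, zcr_hi)
    return min(loud, 1.0 - noise_like)


if __name__ == "__main__":
    rate = 16000  # assumed sampling rate in Hz
    # 40 ms of a 200 Hz tone standing in for a voiced sound
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(640)]
    for frame in frame_signal(tone, frame_len=320, hop=160):
        print(round(mouth_opening(frame), 3))
```

In this sketch a loud, low-frequency frame drives the parameter toward 1, while silence or a rapidly sign-changing (noise-like) frame drives it toward 0. A real lip driver would need a richer feature set and a mapping to several facial parameters.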


Speech Signal, Voice Activity Detector, Speech Frame, Linear Predictive Code, Tongue Position
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Mario Malcangi (1)
  • Raffaele de Tintis (2)
  1. DICo – Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy
  2. DSPengineering, Milano, Italy
