Audio Based Real-Time Speech Animation of Embodied Conversational Agents
A framework for facial animation of embodied agents, driven by speech analysis in the presence of background noise, is described. Target application areas are entertainment and mobile visual communication. This novel approach derives from the speech signal all the information needed to drive 3-D facial models. Using both digital signal processing and soft-computing (fuzzy logic and neural networks) methodologies, a flexible, low-cost solution for extracting lip- and face-related information has been implemented. The main advantage of the speech-based approach is that it is non-invasive: speech is captured by a microphone, with no physical contact with the subject (no magnetic sensors or optical markers), which gives the method broader applicability than sensor- or marker-based alternatives. First, a speech-based lip-driver system was developed to synchronize lip movements to speech; the methodology was then extended to several important facial movements so that a full face-synching system could be modeled. The developed system is speaker- and language-independent, so no additional neural-network training is required.
Keywords: Speech Signal, Voice Activity Detector, Speech Frame, Linear Predictive Code, Tongue Position
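The paper does not reproduce its algorithms here, but the pipeline it describes (frame the speech signal, detect voice activity, map acoustic cues to lip parameters) can be illustrated with a minimal sketch. The feature choices below (short-time energy and zero-crossing rate) are classic voiced/unvoiced cues, and the `mouth_openness` mapping and its thresholds are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Split a mono signal into overlapping frames and compute
    short-time energy and zero-crossing rate -- two classic cues
    used by voice activity detectors and voiced/unvoiced classifiers."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # Fraction of sample pairs whose sign flips (|diff of signs| is 2
        # at each crossing, hence the division by 2).
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return feats

def mouth_openness(energy, zcr, energy_floor=1e-4):
    """Map frame features to a [0, 1] mouth-opening parameter:
    silence -> closed; voiced frames (high energy, low ZCR) -> open;
    unvoiced/fricative frames (high ZCR) -> only slightly open.
    Gains and thresholds here are hypothetical, for illustration."""
    if energy < energy_floor:
        return 0.0
    openness = min(1.0, energy * 50.0)
    if zcr > 0.3:  # high ZCR suggests an unvoiced/fricative frame
        openness *= 0.3
    return openness
```

For example, frames from a 120 Hz tone (a stand-in for a voiced vowel) yield a large opening value, while frames of silence map to a closed mouth. A real system, as the paper notes, would run this per-frame in real time and smooth the parameter stream before driving the 3-D model.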