
Abstract

LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It operates on standard Facial Animation Parameters (FAPs) and speaks with the Italian version of the FESTIVAL TTS system. To achieve an emotive/expressive talking head, LUCIA was built from real human data captured with the ELITE optic-tracking movement analyzer. LUCIA can copy a real human being by reproducing the movements of passive markers positioned on the speaker's face and recorded by the ELITE device, or it can be driven by an emotionally XML-tagged input text, thus realizing true audio/visual emotive/expressive synthesis. Synchronization between visual and audio data is essential to create the matching WAV and FAP files needed for the animation. LUCIA’s voice is based on the ISTC Italian version of the FESTIVAL-MBROLA packages, modified by means of an appropriate APML/VSML tagging language. LUCIA is available in two versions: an open-source framework and a “work in progress” WebGL implementation.
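
The abstract describes a pipeline in which an emotionally tagged input text drives both the FESTIVAL-MBROLA voice (producing a WAV file) and the MPEG-4 facial animation (producing a stream of FAP frames), and it stresses that the two streams must stay synchronized. The minimal Python sketch below illustrates only that alignment check; the tag names in the example text, the file name, the 25 fps frame rate and all helper functions are assumptions made for this illustration and are not taken from the LUCIA implementation.

    # Illustrative sketch only (not taken from the LUCIA sources): it shows the kind of
    # audio/visual alignment the abstract refers to, i.e. checking that the FAP stream
    # covers the duration of the synthesized WAV file.

    import struct
    import wave

    FRAME_RATE_FPS = 25  # assumed animation frame rate
    NUM_FAPS = 68        # number of MPEG-4 Facial Animation Parameters per frame

    # A hypothetical APML/VSML-style emotional input text (element names are illustrative).
    tagged_text = '<apml><emotion type="joy">Che bella giornata!</emotion></apml>'

    def write_dummy_wav(path: str, seconds: float = 2.0, rate: int = 16000) -> None:
        """Write a short silent mono WAV file standing in for the TTS output."""
        with wave.open(path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)  # 16-bit samples
            w.setframerate(rate)
            w.writeframes(struct.pack("<h", 0) * int(seconds * rate))

    def wav_duration_seconds(path: str) -> float:
        """Duration of a PCM WAV file in seconds."""
        with wave.open(path, "rb") as w:
            return w.getnframes() / float(w.getframerate())

    def expected_fap_frames(path: str, fps: int = FRAME_RATE_FPS) -> int:
        """How many FAP frames are needed to cover the whole audio track."""
        return int(round(wav_duration_seconds(path) * fps))

    def streams_aligned(path: str, fap_frames: list, fps: int = FRAME_RATE_FPS) -> bool:
        """True if the FAP stream matches the audio length within one frame."""
        return abs(len(fap_frames) - expected_fap_frames(path, fps)) <= 1

    if __name__ == "__main__":
        wav_file = "utterance.wav"  # hypothetical TTS output for tagged_text
        write_dummy_wav(wav_file)
        frames = [[0] * NUM_FAPS for _ in range(expected_fap_frames(wav_file))]
        print("audio/visual streams aligned:", streams_aligned(wav_file, frames))

In the real system the FAP frames would of course come from the coarticulation model and the recorded marker trajectories rather than from dummy data.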

Keywords

talking head · TTS · facial animation · MPEG-4 · 3D avatar · virtual agent · affective computing · LUCIA · FESTIVAL

Copyright information

© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2012

Authors and Affiliations

  • G. Riccardo Leone 1
  • Giulio Paci 1
  • Piero Cosi 1

  1. Institute of Cognitive Sciences and Technologies – National Research Council, Padova, Italy
