Towards Videorealistic Synthetic Visual Speech

  • Barry Theobald
  • J. Andrew Bangham
  • Silko Kruse
  • Gavin Cawley
  • Iain Matthews
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 704)


In this paper we present preliminary results of work towards a videorealistic visual speech synthesiser. A generative model is used to track the face of a talker uttering a series of training sentences and an inventory of synthesis units is built by representing the trajectory of the model parameters with spline curves. A set of model parameters corresponding to a new utterance is formed by concatenating spline segments corresponding to synthesis units in the inventory and sampling at the original frame rate. The new parameters are applied to the model to create a sequence of images corresponding to the talking face.
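The concatenation and resampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the spline type, so a uniform Catmull-Rom interpolating spline is assumed here, and the inventory contents, unit names, durations, the 25 fps frame rate and the function names `catmull_rom` and `synthesise` are all hypothetical. Only a single scalar model parameter is tracked, whereas a real model would carry a vector of parameters per frame.

```python
import math


def catmull_rom(knots, u):
    """Evaluate a uniform Catmull-Rom spline through `knots` at u in [0, 1].

    Assumed spline type; needs at least two knots.
    """
    n = len(knots) - 1                  # number of spline segments
    x = min(max(u, 0.0), 1.0) * n
    i = min(int(x), n - 1)              # index of the segment containing u
    t = x - i                           # local position inside that segment
    p0 = knots[max(i - 1, 0)]           # neighbours are clamped at the ends
    p1, p2 = knots[i], knots[i + 1]
    p3 = knots[min(i + 2, n)]
    return 0.5 * (2 * p1 + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)


def synthesise(units, inventory, fps=25.0):
    """Concatenate unit splines and sample one parameter value per frame.

    `units` is a list of (unit_name, duration_in_seconds) pairs.
    """
    frames = []
    start = 0.0                         # start time of the current unit (s)
    for name, duration in units:
        knots = inventory[name]
        end = start + duration
        # An integer frame counter avoids accumulating floating-point drift.
        k = math.ceil(start * fps - 1e-9)
        while k / fps < end - 1e-9:
            u = (k / fps - start) / duration   # normalised time in the unit
            frames.append(catmull_rom(knots, u))
            k += 1
        start = end
    return frames


# Hypothetical inventory: knot values of one model parameter per unit.
inventory = {"m": [0.0, -1.0, 0.0], "a": [0.0, 2.0, 1.0]}
frames = synthesise([("m", 0.2), ("a", 0.4)], inventory)
print(len(frames))
```

In this toy inventory the two units happen to meet at the same parameter value, so the join is continuous. Real synthesis units would generally need some form of smoothing or blending at the boundaries, which the sketch does not attempt.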


Shape and appearance models; principal component analysis; visual speech synthesis; facial animation





Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • Barry Theobald (1)
  • J. Andrew Bangham (1)
  • Silko Kruse (1)
  • Gavin Cawley (1)
  • Iain Matthews (2)

  1. University of East Anglia, Norwich, UK
  2. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
