Towards Videorealistic Synthetic Visual Speech

Part of the book series: The Springer International Series in Engineering and Computer Science ((SECS,volume 704))

Abstract

In this paper we present preliminary results of work towards a videorealistic visual speech synthesiser. A generative model is used to track the face of a talker uttering a series of training sentences and an inventory of synthesis units is built by representing the trajectory of the model parameters with spline curves. A set of model parameters corresponding to a new utterance is formed by concatenating spline segments corresponding to synthesis units in the inventory and sampling at the original frame rate. The new parameters are applied to the model to create a sequence of images corresponding to the talking face.
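
The concatenate-and-resample step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which spline family is used, so a cubic Hermite spline with finite-difference tangents is swapped in here. The inventory contents, the unit names `"a"` and `"b"`, the two-element parameter vectors, the 25 fps frame rate, and every function name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical inventory: unit name -> (duration in seconds, knots), where each
# knot is (time within the unit, model parameter vector). Values are made up.
INVENTORY = {
    "a": (0.12, [(0.00, [0.0, 1.0]), (0.06, [0.5, 0.8])]),
    "b": (0.08, [(0.00, [1.0, 0.2]), (0.04, [0.6, 0.0])]),
}


def concatenate(units, inventory):
    """Place the knots of each unit end to end on one shared time axis."""
    knots, offset = [], 0.0
    for name in units:
        duration, unit_knots = inventory[name]
        knots.extend((offset + t, p) for t, p in unit_knots)
        offset += duration
    return knots, offset  # offset is now the total duration


def tangents(knots):
    """Finite-difference slope at each knot (one-sided at the two ends).

    Assumes at least two knots with strictly increasing times.
    """
    n = len(knots)
    slopes = []
    for i in range(n):
        (t0, p0), (t1, p1) = knots[max(i - 1, 0)], knots[min(i + 1, n - 1)]
        slopes.append([(b - a) / (t1 - t0) for a, b in zip(p0, p1)])
    return slopes


def evaluate(knots, slopes, t):
    """Evaluate the cubic Hermite spline through the knots at time t."""
    times = [k[0] for k in knots]
    i = min(max(bisect_right(times, t) - 1, 0), len(knots) - 2)
    (t0, p0), (t1, p1) = knots[i], knots[i + 1]
    h = t1 - t0
    # Clamp so that times outside the knot range hold the nearest end value.
    s = min(max((t - t0) / h, 0.0), 1.0)
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return [
        h00 * a + h10 * h * ma + h01 * b + h11 * h * mb
        for a, ma, b, mb in zip(p0, slopes[i], p1, slopes[i + 1])
    ]


def synthesise(units, inventory, fps=25.0):
    """Return one parameter vector per video frame for a new unit sequence."""
    knots, total = concatenate(units, inventory)
    slopes = tangents(knots)
    n_frames = int(round(total * fps))
    return [evaluate(knots, slopes, k / fps) for k in range(n_frames)]


if __name__ == "__main__":
    for frame in synthesise(["a", "b"], INVENTORY):
        print(frame)
```

Each returned vector would then be applied to the generative model to render one image of the talking face. Because the tangents are computed after concatenation, the curve passes smoothly through the join between units rather than jumping at the boundary, which is one plausible reason for storing trajectories as splines rather than as raw frames.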

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Theobald, B., Bangham, J.A., Kruse, S., Cawley, G., Matthews, I. (2002). Towards Videorealistic Synthetic Visual Speech. In: Winkler, J., Niranjan, M. (eds) Uncertainty in Geometric Computations. The Springer International Series in Engineering and Computer Science, vol 704. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0813-7_15

  • DOI: https://doi.org/10.1007/978-1-4615-0813-7_15

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5252-5

  • Online ISBN: 978-1-4615-0813-7

  • eBook Packages: Springer Book Archive
