Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

  • Salil Deena
  • Aphrodite Galata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5875)


In this work, facial animation is synthesised by modelling the mapping between facial motion and speech with a shared Gaussian process latent variable model. The audio and visual data are processed separately and then coupled to yield a shared latent space. Placing a dynamical model on the latent space allows coarticulation to be modelled. Novel animation is synthesised by first obtaining intermediate latent points from the audio data and then using a Gaussian process mapping to predict the corresponding visual data. In a statistical evaluation against ground-truth data, the generated visual features compare favourably with known methods of speech animation. The generated videos show proper synchronisation with the audio and exhibit correct facial dynamics.
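
The final synthesis step, predicting visual data from latent points with a Gaussian process mapping, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the latent points are already available (here they are random stand-ins rather than points inferred from audio), uses a squared-exponential kernel with fixed, hand-picked hyperparameters, and replaces real visual features with synthetic data. The names `rbf_kernel` and `gp_predict_mean` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gp_predict_mean(X_train, Y_train, X_new, noise=1e-3):
    """GP posterior mean of the visual features at new latent points.

    X_train: (N, q) latent points (assumed given for this sketch).
    Y_train: (N, d) visual feature vectors paired with X_train.
    X_new:   (M, q) latent points at which to predict.
    """
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_new, X_train)
    mu = Y_train.mean(axis=0)  # constant mean, estimated from the data
    return mu + K_star @ np.linalg.solve(K, Y_train - mu)


# Synthetic stand-ins: 50 latent points in 2-D, 4-D "visual" features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
Y_train = np.tanh(X_train @ rng.normal(size=(2, 4)))

X_new = rng.normal(size=(5, 2))  # would come from the audio side in practice
Y_pred = gp_predict_mean(X_train, Y_train, X_new)
print(Y_pred.shape)
```

In the method described above, the latent points would instead be obtained from the audio data through the shared latent space, and the kernel hyperparameters would be learned rather than fixed.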


Keywords: Latent Space; Audio Data; Audio Feature; Active Appearance Model; Facial Animation





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Salil Deena (1)
  • Aphrodite Galata (1)
  1. School of Computer Science, University of Manchester, Manchester, United Kingdom
