Face Animation Based on Large Audiovisual Database

Chapter in Affective Information Processing

Abstract

In this chapter, we present two methods, fused HMM inversion and unit selection, for a speech-driven facial animation system. The system systematically addresses audiovisual data acquisition, expressive trajectory analysis, and audiovisual mapping. Within this framework, we learn the correlation between neutral and expressive facial deformation with a Gaussian mixture model (GMM). A hierarchical structure is proposed to map acoustic parameters to lip facial animation parameters (FAPs), and the synthesized neutral FAP streams are then extended with expressive variations according to the prosody of the input speech. Quantitative evaluation of the experimental results is encouraging, and the synthesized faces show realistic quality.
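
As a rough illustration of the GMM-based audiovisual mapping described above, the sketch below fits a joint Gaussian mixture over paired acoustic and FAP frames and maps a new acoustic frame to FAPs by conditional expectation. This is a generic GMM-regression sketch, not the authors' implementation: scikit-learn and SciPy are assumed, and all function names, dimensions, and component counts are illustrative.

```python
# Minimal sketch of GMM-based audiovisual mapping (illustrative only).
# Assumes per-frame acoustic features (e.g., MFCCs) paired with FAP vectors.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(audio_feats, fap_feats, n_components=8, seed=0):
    """Fit a full-covariance GMM on stacked [audio | FAP] frames."""
    joint = np.hstack([audio_feats, fap_feats])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    return gmm.fit(joint)

def map_audio_to_fap(gmm, x, d_audio):
    """Estimate E[FAP | audio = x] under the joint GMM (conditional expectation)."""
    K = gmm.n_components
    resp = np.empty(K)
    cond_means = []
    for k in range(K):
        mu, cov = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:d_audio], mu[d_audio:]
        S_xx = cov[:d_audio, :d_audio]   # audio-audio covariance block
        S_yx = cov[d_audio:, :d_audio]   # FAP-audio cross-covariance block
        # Component responsibility from the acoustic marginal alone
        # (a production system would work in the log domain for stability).
        resp[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mu_x, S_xx)
        # Per-component conditional mean of the FAP part given the audio frame
        cond_means.append(mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    resp /= resp.sum()
    return np.sum(resp[:, None] * np.asarray(cond_means), axis=0)

# Toy usage with random stand-ins for real MFCC and lip-FAP streams.
rng = np.random.default_rng(0)
audio = rng.normal(size=(500, 13))                     # 13 MFCCs per frame
faps = 0.5 * audio[:, :4] + rng.normal(scale=0.1, size=(500, 4))
gmm = fit_joint_gmm(audio, faps)
print(map_audio_to_fap(gmm, audio[0], d_audio=13))     # predicted 4-dim FAP frame
```

In the chapter's pipeline, the predicted streams would correspond to the neutral lip-FAP synthesis step; the prosody-driven expressive extension would then operate on these streams.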

Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Tao, J., Yin, P., Xin, L. (2009). Face Animation Based on Large Audiovisual Database. In: Tao, J., Tan, T. (eds) Affective Information Processing. Springer, London. https://doi.org/10.1007/978-1-84800-306-4_11

  • DOI: https://doi.org/10.1007/978-1-84800-306-4_11

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-305-7

  • Online ISBN: 978-1-84800-306-4

  • eBook Packages: Computer Science (R0)
