Abstract
In this chapter, we present two methods, fused HMM inversion and unit selection, for speech-driven facial animation. The framework systematically addresses audiovisual data acquisition, expressive trajectory analysis, and audiovisual mapping. Within this framework, we learn the correlation between neutral and expressive facial deformation with a Gaussian Mixture Model (GMM). A hierarchical structure is proposed to map acoustic parameters to lip FAPs (facial animation parameters), and the synthesized neutral FAP streams are then extended with expressive variations according to the prosody of the input speech. Quantitative evaluation of the experimental results is encouraging, and the synthesized face shows realistic quality.
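The chapter's full pipeline is not reproduced on this page, but a minimal sketch of the GMM-based mapping between neutral and expressive facial deformation described in the abstract could look as follows. All names, dimensions, the synthetic training data, and the use of scikit-learn are illustrative assumptions, not the authors' implementation: a joint GMM is fitted over concatenated neutral/expressive feature vectors, and the expressive deformation is then estimated as the conditional expectation given a neutral input.

```python
# Hedged sketch: GMM regression from neutral to expressive deformation.
# Everything here (dimensions, data, library choice) is an assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

D = 10  # assumed dimensionality of the FAP-based deformation vectors

# Hypothetical paired training data: aligned frames of neutral (X) and
# expressive (Y) facial-deformation parameters (synthetic stand-ins here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, D))
Y = 1.5 * X + rng.normal(scale=0.1, size=(500, D))

# Fit a joint GMM over concatenated [neutral, expressive] vectors so the
# component covariances capture the cross-correlation between the two.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, Y]))

def predict_expressive(x):
    """MMSE estimate E[y|x]: a responsibility-weighted sum of the
    per-component conditional means mu_y + Syx Sxx^{-1} (x - mu_x)."""
    log_resp = np.empty(gmm.n_components)
    cond_means = np.empty((gmm.n_components, D))
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :D], gmm.means_[k, D:]
        S = gmm.covariances_[k]
        Sxx, Syx = S[:D, :D], S[D:, :D]
        diff = x - mu_x
        # Responsibility p(k|x) from the marginal over x (log domain;
        # constant terms cancel after normalization).
        _, logdet = np.linalg.slogdet(Sxx)
        log_resp[k] = np.log(gmm.weights_[k]) - 0.5 * (
            logdet + diff @ np.linalg.solve(Sxx, diff))
        cond_means[k] = mu_y + Syx @ np.linalg.solve(Sxx, diff)
    resp = np.exp(log_resp - log_resp.max())
    return (resp / resp.sum()) @ cond_means

y_hat = predict_expressive(X[0])  # expressive deformation for one frame
```

The same conditional-expectation machinery generalizes to any paired feature streams; in the chapter's setting the inputs would come from the audiovisual database rather than the synthetic data used above.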
Copyright information
© 2009 Springer-Verlag London Limited
About this chapter
Cite this chapter
Tao, J., Yin, P., Xin, L. (2009). Face Animation Based on Large Audiovisual Database. In: Tao, J., Tan, T. (eds) Affective Information Processing. Springer, London. https://doi.org/10.1007/978-1-84800-306-4_11
DOI: https://doi.org/10.1007/978-1-84800-306-4_11
Publisher Name: Springer, London
Print ISBN: 978-1-84800-305-7
Online ISBN: 978-1-84800-306-4
eBook Packages: Computer Science, Computer Science (R0)