A video prediction approach for animating single face image

Zhao, Yong; Oveneke, Meshia Cédric; Jiang, Dongmei; Sahli, Hichem

doi:10.1007/s11042-018-6952-y

Published: 15 December 2018

Multimedia Tools and Applications, Volume 78, pages 16389–16410 (2019)

Yong Zhao^1,2 (ORCID: 0000-0003-2644-952X), Meshia Cédric Oveneke^2, Dongmei Jiang^1 & Hichem Sahli^2,3

Abstract

Generating dynamic 2D image-based facial expressions is a challenging task in facial animation. Much research has focused on performance-driven facial animation from given videos or images of a target face, while animating a single face image driven by emotion labels is a less explored problem. In this work, we treat the task of animating a single face image from emotion labels as a conditional video prediction problem, and propose a novel framework that combines factored conditional restricted Boltzmann machines (FCRBM) and a reconstruction contractive auto-encoder (RCAE). A modified RCAE with an associated efficient training strategy is used to extract low-dimensional features and reconstruct face images. The FCRBM is used as the animator to predict the facial expression sequence in the feature space, given discrete emotion labels and a frontal neutral face image as input. Both quantitative and qualitative evaluations on two facial expression databases, and comparison with the state of the art, show the effectiveness of the proposed framework for animating a frontal neutral face image from given emotion labels.
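
The data flow described in the abstract (encode a neutral face, predict a feature sequence conditioned on emotion labels, decode each predicted feature back to an image) can be illustrated with a minimal sketch. It is not the method of the paper: the encoder and decoder below are untrained random linear maps standing in for the RCAE, and the animator is a deterministic label-conditioned autoregressive map standing in for the FCRBM. All names, sizes, and the history length (`IMG_DIM`, `FEAT_DIM`, `N_EMOTIONS`, `ORDER`, `encode`, `decode`, `predict_next`, `animate`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these values come from the article.
IMG_DIM, FEAT_DIM, N_EMOTIONS, ORDER = 64 * 64, 32, 6, 2

# Stand-in for the auto-encoder: untrained random linear encoder and decoder.
W_enc = rng.normal(scale=0.01, size=(FEAT_DIM, IMG_DIM))
W_dec = rng.normal(scale=0.01, size=(IMG_DIM, FEAT_DIM))

# Stand-in for the animator: one autoregressive weight matrix per emotion label.
A = rng.normal(scale=0.1, size=(N_EMOTIONS, FEAT_DIM, ORDER * FEAT_DIM))


def encode(image):
    """Map a flattened face image to a low-dimensional feature vector."""
    return np.tanh(W_enc @ image)


def decode(feature):
    """Map a feature vector back to a flattened image."""
    return W_dec @ feature


def predict_next(history, label):
    """Predict the next feature from the last ORDER features and a label."""
    context = np.concatenate(history[-ORDER:])
    return np.tanh(A[label] @ context)


def animate(neutral_image, labels):
    """Roll the predictor forward in feature space, one frame per label."""
    z0 = encode(neutral_image)
    history = [z0] * ORDER  # seed the history with the neutral-face feature
    frames = []
    for label in labels:
        z = predict_next(history, label)
        history.append(z)
        frames.append(decode(z))
    return np.stack(frames)


neutral = rng.random(IMG_DIM)            # placeholder for a neutral face image
video = animate(neutral, labels=[3] * 10)
print(video.shape)  # (10, 4096)
```

The point of the sketch is the separation of concerns: prediction happens entirely in the low-dimensional feature space, and images are only produced by decoding, so the sequence model never operates on pixels directly.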

References

Alain G, Bengio Y (2014) What regularized auto-encoders learn from the data-generating distribution. J Mach Learn Res 15(1):3563–3593
Anderson R, Stenger B, Wan V, Cipolla R (2013) Expressive visual text-to-speech using active appearance models. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3382–3389
Averbuch-Elor H, Cohen-Or D, Kopf J, Cohen MF (2017) Bringing portraits to life. ACM Trans Graph (TOG) 36(6):196
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Blanz V, Vetter T (1999) A morphable model for the synthesis of 3d faces. In: Proceedings of the 26th annual conference on computer graphics and interactive techniques. ACM Press/Addison-Wesley Publishing Co., pp 187–194
Blanz V, Basso C, Poggio T, Vetter T (2003) Reanimating faces in images and video. In: Computer graphics forum vol 22. Wiley Online Library, pp 641–650
Cao Y, Tien WC, Faloutsos P, Pighin F (2005) Expressive speech-driven facial animation. ACM Trans Graph (TOG) 24(4):1283–1302
Cao C, Wu H, Weng Y, Shao T, Zhou K (2016) Real-time facial animation with image-based dynamic avatars. ACM Trans Graph (TOG) 35(4):126
Cootes TF, Edwards GJ, Taylor CJ (2001) Active appearance models. IEEE Trans Pattern Anal Mach Intell 23(6):681–685
Deng Z, Noh J (2008) Computer facial animation: a survey. In: Data-driven 3D facial animation. Springer, pp 1–28
Ding H, Zhou SK, Chellappa R (2017) Facenet2expnet: regularizing a deep face recognition net for expression recognition. In: 2017 12th IEEE International conference on automatic face & gesture recognition (FG 2017). IEEE, pp 118–126
Ersotelos N, Dong F (2008) Building highly realistic facial modeling and animation: a survey. Vis Comput 24(1):13–30
Fan B, Wang L, Soong FK, Xie L (2015) Photo-real talking head with deep bidirectional lstm. In: 2015 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4884–4888
Garrido P, Zollhöfer M, Casas D, Valgaerts L, Varanasi K, Pérez P, Theobalt C (2016) Reconstruction of personalized 3d face rigs from monocular video. ACM Trans Graph (TOG) 35(3):28
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
Ichim AE, Bouaziz S, Pauly M (2015) Dynamic 3d avatar creation from hand-held video input. ACM Trans Graph (TOG) 34(4):45
Jiang D, Zhao Y, Sahli H, Zhang Y (2014) Speech driven photo realistic facial animation based on an articulatory dbn model and aam features. Multimed Tools Appl 73(1):397–415
Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv:1312.6114
Liu Z, Shan Y, Zhang Z (2001) Expressive expression mapping with ratio images. In: Proceedings of the 28th annual conference on computer graphics and interactive techniques. ACM, pp 271–276
Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I (2010) The extended cohn-kanade dataset (ck+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 94–101
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784
Olszewski K, Li Z, Yang C, Zhou Y, Yu R, Huang Z, Xiang S, Saito S, Kohli P, Li H (2017) Realistic dynamic facial textures from a single image using gans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5429–5438
Oveneke MC, Aliosha-Perez M, Zhao Y, Jiang D, Sahli H (2016) Efficient convolutional auto-encoding via random convexification and frequency-domain minimization. arXiv:1611.09232
Oveneke MC, Zhao Y, Jiang D, Sahli H (2017) Expressive face frontalization and its application to facial expression analysis. Tech. rep., Vrije Universiteit Brussel
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 833–840
Shu Z, Yumer E, Hadap S, Sunkavalli K, Shechtman E, Samaras D (2017) Neural face editing with intrinsic image disentangling. arXiv:1704.04131
Stoiber N, Seguier R, Breton G (2009) Automatic design of a control interface for a synthetic face. In: Proceedings of the 14th international conference on intelligent user interfaces. ACM, pp 207–216
Susskind JM, Anderson AK, Hinton GE, Movellan JR (2008) Generating facial expressions with deep belief nets. INTECH Open Access Publisher
Sutskever I, Hinton GE, Taylor GW (2009) The recurrent temporal restricted boltzmann machine. In: Advances in neural information processing systems, pp 1601–1608
Taylor GW, Hinton GE (2009) Factored conditional restricted Boltzmann machines for modeling motion style. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 1025–1032
Taylor GW, Hinton GE, Roweis ST (2007) Modeling human motion using binary latent variables. Adv Neural Inf Process Syst 19:1345
Thies J, Zollhöfer M, Stamminger M, Theobalt C, Nießner M (2016) Face2face: real-time face capture and reenactment of rgb videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2387–2395
Tulyakov S, Liu MY, Yang X, Kautz J (2018) Mocogan: decomposing motion and content for video generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1526–1535
Villegas R, Yang J, Zou Y, Sohn S, Lin X, Lee H (2017) Learning to generate long-term future via hierarchical prediction. arXiv:1704.05831
Wang Z, Bovik AC (2009) Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process Mag 26(1):98–117
Wang L, Soong FK (2015) Hmm trajectory-guided sample selection for photo-realistic talking head. Multimed Tools Appl 74(22):9849–9869
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Yan X, Yang J, Sohn K, Lee H (2016) Attribute2image: conditional image generation from visual attributes. In: European conference on computer vision. Springer, pp 776–791
Zhao G, Huang X, Taini M, Li SZ, Pietikäinen M (2011) Facial expression recognition from near-infrared videos. Image Vis Comput 29(9):607–619
Zhao Y, Jiang D, Sahli H (2015) 3d emotional facial animation synthesis with factored conditional restricted Boltzmann machines. In: 2015 International conference on affective computing and intelligent interaction (ACII). IEEE, pp 797–803

Acknowledgements

We thank Averbuch-Elor et al. for kindly providing the sequence for comparison. We thank Tao Yang for kindly running the facial expression recognition experiments, and all the students for their participation in the subjective analysis. We would also like to thank the reviewer for their detailed comments and suggestions on the manuscript, which identified important areas that required improvement. This work is supported by the China Scholarship Council (CSC) (grant 201506290085), the Shaanxi Provincial International Science and Technology Collaboration Project (grant 2017KW-ZD-14), the Natural Science Foundation of China (grant 61273265), the VUB Interdisciplinary Research Program through the EMO-App project, and the Agency for Innovation by Science and Technology in Flanders (IWT) – PhD grant nr. 131814.

Author information

Authors and Affiliations

NPU-VUB joint AVSP Research Lab, School of Computer Science, Northwestern Polytechnical University (NPU), 127 West Youyi Road, Xi’an, 710072, People’s Republic of China
Yong Zhao & Dongmei Jiang
VUB-NPU joint AVSP Research Lab, Department of Electronics & Informatics (ETRO), Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050, Brussels, Belgium
Yong Zhao, Meshia Cédric Oveneke & Hichem Sahli
Interuniversity Microelectronics Centre (IMEC), Kapeldreef 75, 3001, Heverlee, Belgium
Hichem Sahli

Corresponding author

Correspondence to Yong Zhao.

Electronic supplementary material

Two supplementary files accompany the article: one AVI file of 26.7 MB and one AVI file of 28.9 MB.

About this article

Cite this article

Zhao, Y., Oveneke, M.C., Jiang, D. et al. A video prediction approach for animating single face image. Multimed Tools Appl 78, 16389–16410 (2019). https://doi.org/10.1007/s11042-018-6952-y

Received: 30 December 2017
Revised: 30 September 2018
Accepted: 23 November 2018
Published: 15 December 2018
Issue Date: 30 June 2019
DOI: https://doi.org/10.1007/s11042-018-6952-y
