GANimation: One-Shot Anatomically Consistent Facial Animation

Pumarola, Albert; Agudo, Antonio; Martinez, Aleix M.; Sanfeliu, Alberto; Moreno-Noguer, Francesc

doi:10.1007/s11263-019-01210-3

Published: 24 August 2019

Volume 128, pages 698–713 (2020)
International Journal of Computer Vision

Albert Pumarola¹,
Antonio Agudo¹,
Aleix M. Martinez²,
Alberto Sanfeliu¹ &
Francesc Moreno-Noguer¹

Abstract

Recent advances in generative adversarial networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN (Choi et al. in CVPR, 2018), which conditions the GAN generation process on images of a specific domain, namely a set of images of people sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content and granularity of the dataset. To address this limitation, we introduce a novel GAN conditioning scheme based on action unit (AU) annotations, which describe in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a weakly supervised strategy to train the model that requires only images annotated with their activated AUs, and we exploit a novel self-learned attention mechanism that makes our network robust to changing backgrounds, lighting conditions and occlusions. Extensive evaluation shows that our approach goes beyond competing conditional generators both in its capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements and in its capacity to deal with images in the wild. The code of this work is publicly available at https://github.com/albertpumarola/GANimation.
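
The two ideas named in the abstract, continuous control over AU magnitudes and an attention mechanism that leaves irrelevant pixels untouched, can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `interpolate_aus` and `blend_with_attention`, the single-channel nested-list image format, and the convention that an attention value of 1 preserves the input pixel are all assumptions made for illustration only.

```python
from typing import List, Sequence

# Toy single-channel image: rows of pixel intensities in [0, 1] (assumed format).
Image = List[List[float]]


def interpolate_aus(source: Sequence[float], target: Sequence[float], alpha: float) -> List[float]:
    """Linearly blend two AU activation vectors.

    alpha = 0 returns the source activations, alpha = 1 the target ones;
    values in between give intermediate expression magnitudes.
    """
    if len(source) != len(target):
        raise ValueError("AU vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(source, target)]


def blend_with_attention(original: Image, color: Image, attention: Image) -> Image:
    """Per-pixel composite of the input image and a generated color image.

    Assumed convention: attention = 1 keeps the original pixel,
    attention = 0 takes the generated pixel.
    """
    return [
        [a * o + (1.0 - a) * c for o, c, a in zip(o_row, c_row, a_row)]
        for o_row, c_row, a_row in zip(original, color, attention)
    ]


# Two hypothetical AUs: go half-way from a neutral face to a target expression.
neutral = [0.0, 0.0]
smile = [1.0, 0.5]
half_smile = interpolate_aus(neutral, smile, 0.5)

# One-row image: keep the first pixel, replace the second with generated content.
original = [[0.25, 0.5]]
generated = [[0.75, 1.0]]
attention = [[1.0, 0.0]]
result = blend_with_attention(original, generated, attention)
```

In a sketch like this, sweeping `alpha` from 0 to 1 would produce a sequence of AU vectors for a smooth animation, while the attention mask limits edits to the face regions that actually move, which is one plausible reading of why such a mechanism helps with backgrounds and occlusions.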

Notes

The dataset was re-annotated using the method of Baltrušaitis et al. (2015) to obtain continuous activation annotations.
We use the face detector from https://github.com/ageitgey/face_recognition.

References

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875
Baltrušaitis, T., Mahmoud, M., & Robinson, P. (2015). Cross-dataset learning and person-specific normalisation for automatic action unit detection. In FG.
Benitez-Quiroz, C. F., Srinivasan, R., & Martinez, A. M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In CVPR.
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on computer graphics and interactive techniques (pp. 187–194). New York: ACM Press.
Choi, Y., Choi, M., Kim, M., Ha, J. W., Kim, S., & Choo, J. (2018). Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR.
Dolhansky, B., & Canton Ferrer, C. (2018). Eye in-painting with exemplar generative adversarial networks. In CVPR.
Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454–E1462.
Ekman, P., & Friesen, W. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto: Consulting Psychologists Press.
Fischler, M. A., & Elschlager, R. A. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1), 67–92.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In NIPS.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of wasserstein GANs. In NIPS.
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In CVPR.
Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In ECCV.
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation. In ICLR.
Kim, T., Cha, M., Kim, H., Lee, J., & Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. In ICML.
Kim, H., Garrido, P., Tewari, A., Xu, W., Thies, J., Nießner, M., et al. (2018). Deep video portraits. ACM Transactions on Graphics, 37, 163.
Kingma, D., & Ba, J. (2015). ADAM: A method for stochastic optimization. In ICLR.
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational bayes. In ICLR.
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D. H., Hawk, S. T., & Van Knippenberg, A. (2010). Presentation and validation of the radboud faces database. Cognition and Emotion, 24(8), 1377–1388.
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016). Autoencoding beyond pixels using a learned similarity metric. In ICML.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In CVPR.
Li, C., & Wand, M. (2016). Precomputed real-time texture synthesis with Markovian generative adversarial networks. In ECCV.
Li, M., Zuo, W., & Zhang, D. (2016). Deep identity-aware transfer of facial attributes. arXiv preprint arXiv:1610.05586
Liu, M. Y., Breuel, T., & Kautz, J. (2017). Unsupervised image-to-image translation networks. In NIPS.
Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. In ICCV.
Mathieu, M., Couprie, C., & LeCun, Y. (2016). Deep multi-scale video prediction beyond mean square error. In ICLR.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
Nagano, K., Seo, J., Xing, J., Wei, L., Li, Z., Saito, S., et al. (2018). paGAN: Real-time avatars using dynamic textures. ACM Transactions on Graphics, 37(6), 258:1–258:12.
Nam, S., Ma, C., Chai, M., Brendel, W., Xu, N., & Joo Kim, S. (2019). End-to-end time-lapse video synthesis from a single outdoor image. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1409–1418).
Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In ICML.
Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., & Efros, A. A. (2016). Context encoders: Feature learning by inpainting. In CVPR.
Perarnau, G., van de Weijer, J., Raducanu, B., & Álvarez, J. M. (2016). Invertible conditional GANs for image editing. arXiv preprint arXiv:1611.06355
Pumarola, A., Agudo, A., Martinez, A. M., Sanfeliu, A., & Moreno-Noguer, F. (2018). Ganimation: Anatomically-aware facial animation from a single image. In ECCV.
Pumarola, A., Agudo, A., Sanfeliu, A., & Moreno-Noguer, F. (2018). Unsupervised person image synthesis in arbitrary poses. In CVPR.
Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative adversarial text to image synthesis. In ICML.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training gans. In Advances in neural information processing systems.
Scherer, K. R. (1982). Emotion as a process: Function, origin and regulation. Social Science Information, 21, 555–570.
Shen, W., & Liu, R. (2017). Learning residual images for face attribute manipulation. In CVPR.
Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. (2017). Learning from simulated and unsupervised images through adversarial training. In CVPR.
Song, Y., Zhu, J., Wang, X., & Qi, H. (2018). Talking face generation by conditional recurrent adversarial network. arXiv preprint arXiv:1804.04786
Susskind, J. M., Hinton, G. E., Movellan, J. R., & Anderson, A. K. (2008). Generating facial expressions with deep belief nets. In Affective computing. IntechOpen.
Suwajanakorn, S., Seitz, S. M., & Kemelmacher-Shlizerman, I. (2017). Synthesizing obama: Learning lip sync from audio. ACM Transactions on Graphics, 36(4), 95.
Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2face: Real-time face capture and reenactment of RGB videos. In CVPR.
Tulyakov, S., Liu, M. Y., Yang, X., & Kautz, J. (2018). Mocogan: Decomposing motion and content for video generation. In CVPR.
Vondrick, C., Pirsiavash, H., & Torralba, A. (2016). Generating videos with scene dynamics. In Advances in neural information processing systems.
Vougioukas, K., Petridis, S., & Pantic, M. (2018). End-to-end speech-driven facial animation with temporal gans. In BMVC.
Wang, X., & Gupta, A. (2016). Generative image modeling using style and structure adversarial networks. In ECCV.
Wang, Z., Liu, D., Yang, J., Han, W., & Huang, T. (2015). Deep networks for image super-resolution with sparse prior. In ICCV.
Yu, H., Garrod, O. G., & Schyns, P. G. (2012). Perception-driven facial expression synthesis. Computers and Graphics, 36(3), 152–162.
Zafeiriou, S., Trigeorgis, G., Chrysos, G., Deng, J., & Shen, J. (2017). The menpo facial landmark localisation challenge: A step towards the solution. In CVPRW.
Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., & Metaxas, D. (2017). Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV.
Zhou, H., Liu, Y., Liu, Z., Luo, P., & Wang, X. (2019). Talking face generation by adversarially disentangled audio–visual representation. In AAAI.
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017a). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.
Zhu, S., Fidler, S., Urtasun, R., Lin, D., & Loy, C. C. (2017b). Be your own prada: Fashion synthesis with structural coherence. In ICCV.

Acknowledgements

This work is partially supported by an Amazon Research Award; by the Spanish Ministry of Economy and Competitiveness under Projects HuMoUR TIN2017-90086-R, ColRobTransp DPI2016-78957 and María de Maeztu Seal of Excellence MDM-2016-0656; by the EU Project AEROARMS ICT-2014-1-644271; and by Grant R01-DC-014498 of the National Institutes of Health. We also thank Nvidia for the hardware donation under the GPU Grant Program.

Author information

Authors and Affiliations

Institut de Robòtica i Informàtica Industrial, CSIC-UPC, 08028, Barcelona, Spain
Albert Pumarola, Antonio Agudo, Alberto Sanfeliu & Francesc Moreno-Noguer
The Ohio State University, Columbus, OH, 43210, USA
Aleix M. Martinez

Corresponding author

Correspondence to Albert Pumarola.

Additional information

Communicated by M. Hebert.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pumarola, A., Agudo, A., Martinez, A.M. et al. GANimation: One-Shot Anatomically Consistent Facial Animation. Int J Comput Vis 128, 698–713 (2020). https://doi.org/10.1007/s11263-019-01210-3

Received: 31 January 2019
Accepted: 25 July 2019
Published: 24 August 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11263-019-01210-3
