EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer

Yang, Chenyu; He, Wanrong; Xu, Yingqing; Gao, Yang

doi:10.1007/978-3-031-19787-1_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13676)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Most existing methods treat makeup transfer as transferring color distributions between facial regions and ignore details such as eye shadow and blush. Moreover, they only achieve controllable transfer within predefined, fixed regions. This paper emphasizes the transfer of makeup details and takes a step towards more flexible control. To this end, we propose the Exquisite and Locally Editable GAN for makeup transfer (EleGANt). It encodes facial attributes into pyramidal feature maps to preserve high-frequency information. It uses attention to extract makeup features from the reference and adapt them to the source face, and we introduce a novel Sow-Attention Module that applies attention within shifted overlapped windows to reduce the computational cost. Moreover, EleGANt is the first to achieve customized local editing within arbitrary areas by applying the corresponding edits to the feature maps. Extensive experiments demonstrate that EleGANt generates realistic makeup faces with exquisite details and achieves state-of-the-art performance. The code is available at https://github.com/Chenyu-Yang-2000/EleGANt.
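To give a rough sense of why restricting attention to windows lowers the cost, the sketch below counts query-key pairs for global attention versus attention inside overlapped windows. It is an illustration only and is not taken from the EleGANt code: the feature-map size (64 x 64), the window size (16), the stride (8), and the function names are all assumed values chosen for the example, and the way per-window outputs are merged is not modeled.

```python
def window_starts(size, window, stride):
    """Offsets of windows of length `window` placed every `stride` along one axis."""
    if window > size:
        raise ValueError("window must not exceed the axis length")
    if (size - window) % stride != 0:
        raise ValueError("windows must tile the axis exactly in this sketch")
    return list(range(0, size - window + 1, stride))


def global_pairs(height, width):
    """Query-key pairs when every position attends to every other position."""
    n = height * width
    return n * n


def windowed_pairs(height, width, window, stride):
    """Query-key pairs when attention is computed independently inside each window.

    With stride < window the windows overlap, so a position may be
    processed by several windows (hypothetical layout, not the paper's).
    """
    rows = window_starts(height, window, stride)
    cols = window_starts(width, window, stride)
    per_window = (window * window) ** 2
    return len(rows) * len(cols) * per_window


if __name__ == "__main__":
    h = w = 64  # assumed feature-map resolution
    full = global_pairs(h, w)
    local = windowed_pairs(h, w, window=16, stride=8)  # assumed window layout
    print(full, local, round(full / local, 2))
```

Under these assumed numbers the windowed variant evaluates 49 windows of 256 positions each, which is about a fifth of the pairs needed for global attention. The saving grows with resolution, because the global count scales quadratically with the number of positions while the windowed count scales linearly for a fixed window size.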

Acknowledgements

This work is supported by the Ministry of Science and Technology of the People's Republic of China, the 2030 Innovation Megaprojects “Program on New Generation Artificial Intelligence” (Grant No. 2021AAA0150000). This work is also supported by a grant from the Guoqiang Institute, Tsinghua University. Thanks to Steve Lin for his pre-reading and constructive suggestions.

Author information

Authors and Affiliations

Tsinghua University, Beijing, China
Chenyu Yang, Wanrong He, Yingqing Xu & Yang Gao
Shanghai Qi Zhi Institute, Shanghai, China
Yang Gao

Corresponding authors

Correspondence to Yingqing Xu or Yang Gao.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Cite this paper

Yang, C., He, W., Xu, Y., Gao, Y. (2022). EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_42

DOI: https://doi.org/10.1007/978-3-031-19787-1_42
Published: 21 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19786-4
Online ISBN: 978-3-031-19787-1
eBook Packages: Computer Science, Computer Science (R0)
