Abstract
When an unpaired domain-to-domain translation model is trained on a target domain with only a few instances, two key problems emerge: noticeable artifacts and severe overfitting. Existing methods require several thousand images for proper training, tend to control generation with only a few key elements of a single style, and introduce noticeable artifacts in the local shapes of the generated faces. When only a handful of images is available, training either overfits or yields samples of poor quality. To address these issues, we propose a portrait style conversion generative adversarial network that translates a single photo portrait into a variety of appearances in a specific style. First, to reduce artifacts, a visual content disentangling module extracts disentangled visual representations that preserve both the similarities and the differences among instances of the source images, while a cross-domain generative module completes the adaptation from source to target. Second, to reduce overfitting, an anchor-based strategy, the adaptive joint patch discriminative module, encourages different levels of realism over different regions of the latent space. Finally, experiments and ablation studies comparing against existing state-of-the-art GAN-based models validate the efficacy of our approach: the proposed method preserves more global characteristics of the source image and significantly improves overall perceptual quality, showing clear superiority over existing state-of-the-art models.
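The anchor-based idea above can be illustrated with a minimal sketch. The names and the Gaussian weighting scheme below are illustrative assumptions, not the paper's actual implementation: latent codes close to a few "anchor" codes (regions backed by real training images) are held to full image-level realism, while codes far from every anchor are only required to be realistic at the patch level.

```python
import math

def realism_weight(z, anchors, sigma=1.0):
    """Soft weight in (0, 1]: 1.0 at an anchor (enforce full image-level
    realism), decaying toward 0 far from all anchors (enforce only
    patch-level realism). `z` and each anchor are equal-length tuples of
    latent coordinates; `sigma` controls how quickly realism is relaxed."""
    # Squared Euclidean distance to the nearest anchor in latent space.
    d2 = min(sum((zi - ai) ** 2 for zi, ai in zip(z, a)) for a in anchors)
    return math.exp(-d2 / (2.0 * sigma ** 2))

def combined_adv_loss(img_loss, patch_loss, z, anchors, sigma=1.0):
    """Blend an image-level and a patch-level adversarial loss according
    to how close `z` lies to the anchor regions."""
    w = realism_weight(z, anchors, sigma)
    return w * img_loss + (1.0 - w) * patch_loss
```

For example, a code sitting exactly on an anchor gets weight 1.0 (pure image-level loss), whereas a code far from every anchor is penalized almost entirely by the patch-level term, which is what discourages the generator from collapsing onto the few target-domain training images.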
Acknowledgements
This work was supported in part by the Natural Science Foundation of Guangdong Province under Grant 2021A1515011888; the Science and Technology Research in Key Areas in Foshan under Grant 2020001006832; the Key-Area Research and Development Program of Guangdong Province under Grants 2018B010109007 and 2019B010153002; the Guangzhou R&D Programme in Key Areas of Science and Technology Projects under Grant 202007040006; the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069; and the Program of Marine Economy Development (Six Marine Industries) Special Foundation of the Department of Natural Resources of Guangdong Province under Grant GDNRC [2020]056.
Lan, J., Ye, F., Ye, Z. et al. Unsupervised style-guided cross-domain adaptation for few-shot stylized face translation. Vis Comput 39, 6167–6181 (2023). https://doi.org/10.1007/s00371-022-02719-4