
Unsupervised style-guided cross-domain adaptation for few-shot stylized face translation

  • Original article
  • Published:
The Visual Computer

Abstract

When a domain-to-domain translation-based generative model is trained on a target domain with only a few instances and no paired samples, two key issues emerge: noticeable artifacts and severe overfitting. Existing methods require several thousand images for proper training, tend to rely on only a few key elements to control the generation of a single style, and introduce noticeable artifacts in the local shapes of the generated faces. When only a handful of images is available, the model typically overfits or produces low-quality results. To address these issues, we propose a portrait style conversion generative adversarial network that translates a portrait photo into a variety of appearances in a specific style. First, to reduce the noticeable artifacts, a visual content disentangled module is proposed to extract disentangled visual representations that preserve both the similarities and the differences among instances of the source images; at the same time, a cross-domain generative module is proposed to perform the cross-domain adaptation from source to target. Then, to reduce overfitting, an anchor-based strategy called the adaptive joint patch discriminative module is presented to encourage different levels of realism over different regions of the latent space. Finally, experiments and ablation studies comparing against existing state-of-the-art GAN-based models were conducted to validate the efficacy of our approach. The proposed method preserves more of the global characteristics of the source image and significantly improves overall perceptual quality. The results show the clear superiority of our method over existing state-of-the-art models.
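The pipeline described above can be summarized as a content encoder that disentangles source content, a style-conditioned generator that performs the cross-domain mapping, and a patch-level discriminator that supplies region-wise realism feedback. The following PyTorch sketch is only an illustration of this general structure, not the authors' implementation; all module names, layer sizes, and the AdaIN-style modulation are assumptions for the sake of a runnable example.

```python
# Hypothetical sketch of the three-module pipeline described in the abstract.
# Not the authors' code: every class name, layer size, and design choice here
# is an assumption made for illustration only.
import torch
import torch.nn as nn


class ContentEncoder(nn.Module):
    """Stand-in for the visual content disentangled module: extracts a content code."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 7, 1, 3), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class StyleEncoder(nn.Module):
    """Maps a few target-style exemplars to a global style vector."""
    def __init__(self, ch=64, style_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch * 2, style_dim),
        )

    def forward(self, y):
        return self.net(y)


class Generator(nn.Module):
    """Stand-in for the cross-domain generative module: decodes content modulated by style."""
    def __init__(self, ch=64, style_dim=128):
        super().__init__()
        self.to_scale_bias = nn.Linear(style_dim, ch * 4 * 2)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 7, 1, 3), nn.Tanh(),
        )

    def forward(self, content, style):
        # AdaIN-like modulation: normalize the content feature, then re-scale and
        # re-shift it with statistics predicted from the style vector.
        scale, bias = self.to_scale_bias(style).chunk(2, dim=1)
        mean = content.mean(dim=(2, 3), keepdim=True)
        std = content.std(dim=(2, 3), keepdim=True) + 1e-6
        modulated = (content - mean) / std
        modulated = modulated * scale[..., None, None] + bias[..., None, None]
        return self.dec(modulated)


class PatchDiscriminator(nn.Module):
    """Simplified patch-level critic: returns a spatial map of per-patch realism scores."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, 1, 4, 1, 1),
        )

    def forward(self, img):
        return self.net(img)


if __name__ == "__main__":
    photo = torch.randn(1, 3, 128, 128)     # source-domain face photo
    exemplar = torch.randn(1, 3, 128, 128)  # one of the few target-style images
    E_c, E_s, G, D = ContentEncoder(), StyleEncoder(), Generator(), PatchDiscriminator()
    stylized = G(E_c(photo), E_s(exemplar))  # translated face in the target style
    realism_map = D(stylized)                # patch-wise adversarial feedback
    print(stylized.shape, realism_map.shape)
```

In the paper's setting, the adaptive joint patch discriminative module would additionally weight these patch-wise scores according to proximity to a small set of anchor images in the latent space, so that regions far from the sparse target samples are only required to look plausible at the patch level rather than fully realistic; the sketch above omits that anchor weighting.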



Acknowledgements

This work was supported in part by the Natural Science Foundation of Guangdong Province under Grant 2021A1515011888; the Science and Technology Research in Key Areas in Foshan under Grant 2020001006832; the Key-Area Research and Development Program of Guangdong Province under Grants 2018B010109007 and 2019B010153002; the Guangzhou R&D Programme in Key Areas of Science and Technology Projects under Grant 202007040006; the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069; and the Program of Marine Economy Development (Six Marine Industries) Special Foundation of the Department of Natural Resources of Guangdong Province under Grant GDNRC [2020]056.

Author information


Corresponding authors

Correspondence to Fenghua Ye or Guoheng Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lan, J., Ye, F., Ye, Z. et al. Unsupervised style-guided cross-domain adaptation for few-shot stylized face translation. Vis Comput 39, 6167–6181 (2023). https://doi.org/10.1007/s00371-022-02719-4


  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-022-02719-4
