Abstract
When an unpaired domain-to-domain translation model is trained on a target domain with only a few instances, two key problems emerge: noticeable artifacts and severe overfitting. Existing methods require several thousand images for proper training, tend to control generation with only a few key elements of a single style, and introduce noticeable artifacts in the local shapes of the generated faces. When only a handful of images is available, training either overfits or yields samples of poor quality. To address these issues, we propose a portrait style conversion generative adversarial network that translates a single photo portrait into a variety of appearances in a specific style. First, to reduce artifacts, a visual content disentangling module extracts disentangled visual representations that preserve both the similarities and the differences among instances of the source images, while a cross-domain generative module completes the adaptation from source to target. Second, to reduce overfitting, an anchor-based strategy, the adaptive joint patch discriminative module, encourages different levels of realism over different regions of the latent space. Finally, experiments and ablation studies comparing against existing state-of-the-art GAN-based models validate the efficacy of our approach: the proposed method preserves more global characteristics of the source image and significantly improves overall perceptual quality, showing clear superiority over existing state-of-the-art models.
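The anchor-based idea above can be illustrated with a minimal sketch. The names and the Gaussian weighting scheme below are illustrative assumptions, not the paper's actual implementation: latent codes close to a few "anchor" codes (regions backed by real training images) are held to full image-level realism, while codes far from every anchor are only required to be realistic at the patch level.

```python
import math

def realism_weight(z, anchors, sigma=1.0):
    """Soft weight in (0, 1]: 1.0 at an anchor (enforce full image-level
    realism), decaying toward 0 far from all anchors (enforce only
    patch-level realism). `z` and each anchor are equal-length tuples of
    latent coordinates; `sigma` controls how quickly realism is relaxed."""
    # Squared Euclidean distance to the nearest anchor in latent space.
    d2 = min(sum((zi - ai) ** 2 for zi, ai in zip(z, a)) for a in anchors)
    return math.exp(-d2 / (2.0 * sigma ** 2))

def combined_adv_loss(img_loss, patch_loss, z, anchors, sigma=1.0):
    """Blend an image-level and a patch-level adversarial loss according
    to how close `z` lies to the anchor regions."""
    w = realism_weight(z, anchors, sigma)
    return w * img_loss + (1.0 - w) * patch_loss
```

For example, a code sitting exactly on an anchor gets weight 1.0 (pure image-level loss), whereas a code far from every anchor is penalized almost entirely by the patch-level term, which is what discourages the generator from collapsing onto the few target-domain training images.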
Acknowledgements
This work was supported in part by the Natural Science Foundation of Guangdong Province under Grant 2021A1515011888; the Science and Technology Research in Key Areas in Foshan under Grant 2020001006832; the Key-Area Research and Development Program of Guangdong Province under Grants 2018B010109007 and 2019B010153002; the Guangzhou R&D Programme in Key Areas of Science and Technology Projects under Grant 202007040006; the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069; and the Program of Marine Economy Development (Six Marine Industries) Special Foundation of the Department of Natural Resources of Guangdong Province under Grant GDNRC [2020]056.
Lan, J., Ye, F., Ye, Z. et al. Unsupervised style-guided cross-domain adaptation for few-shot stylized face translation. Vis Comput 39, 6167–6181 (2023). https://doi.org/10.1007/s00371-022-02719-4