
Learning Portrait Drawing with Unsupervised Parts

Published in: International Journal of Computer Vision

Abstract

Translating a face photo into a portrait drawing takes hours for a skilled artist, which makes automatic generation desirable. Portrait drawing is a difficult image translation task with unique challenges: it requires emphasizing the key features of a face while ignoring many of its details. An image translator therefore needs the capacity to detect facial features and to output images that preserve only the selected content of the photo. In this work, we propose a portrait drawing method that learns from unpaired data with no additional labels. Through unsupervised feature learning, our method generalizes well across domains. Our first contribution is an image translation architecture that combines the high-level image understanding provided by unsupervised parts with the identity-preserving behavior of shallow networks. Our second contribution is a novel asymmetric pose-based cycle consistency loss. The standard cycle consistency loss requires an input image to be reconstructed after being translated to a portrait and back; however, information loss (e.g., colors, background) is expected when going from an RGB image to a portrait. Because the cycle consistency constraint tries to prevent this loss, applying it directly forces the translation network to embed the full information of the RGB image into the portrait, which causes artifacts in the generated drawings. Our proposed loss relaxes this constraint and avoids these artifacts. Finally, we run extensive experiments on both in-domain and out-of-domain images and compare our method with state-of-the-art approaches, showing significant quantitative and qualitative improvements on three datasets.
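To make the relaxed constraint concrete, the sketch below contrasts a standard pixel-level cycle consistency loss with an asymmetric variant that penalizes the backward cycle only in a pose/part feature space. The networks `G` (photo to drawing), `F_net` (drawing to photo), and the extractor `pose_net` are hypothetical stand-ins introduced for illustration; this is a minimal sketch of the idea, not the paper's actual architecture or loss.

```python
# Minimal sketch of an asymmetric, pose-based cycle consistency loss.
# All networks are hypothetical stand-ins for illustration only; the
# paper's generators and unsupervised part detector differ.
import torch
import torch.nn as nn

def tiny_cnn(in_ch, out_ch):
    # Stand-in translator / feature extractor.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, out_ch, 3, padding=1),
    )

G = tiny_cnn(3, 1)         # photo (RGB) -> portrait drawing (grayscale)
F_net = tiny_cnn(1, 3)     # portrait drawing -> photo
pose_net = tiny_cnn(3, 8)  # hypothetical unsupervised part/pose extractor
l1 = nn.L1Loss()

photo = torch.rand(4, 3, 64, 64)      # a batch of face photos
reconstruction = F_net(G(photo))      # photo -> drawing -> photo

# Standard cycle consistency: the reconstruction must match the input
# pixel-for-pixel, so the drawing is forced to encode colors, background,
# and other information a portrait should discard.
loss_cycle_pixel = l1(reconstruction, photo)

# Asymmetric pose-based cycle consistency: only pose/part features of the
# reconstruction must match those of the input, so the drawing is free to
# drop appearance details while keeping facial structure.
loss_cycle_pose = l1(pose_net(reconstruction), pose_net(photo))
```

The key design point is that the backward cycle is evaluated in a feature space that preserves structure but not appearance; in practice such a pose extractor would be obtained without labels (e.g., via unsupervised part discovery), which is an assumption of this sketch rather than a detail taken from the paper.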



Author information


Corresponding author

Correspondence to Aysegul Dundar.

Additional information

Communicated by Oliver Zendel.



About this article


Cite this article

Tasdemir, B., Gudukbay, M.G., Eldenk, D. et al. Learning Portrait Drawing with Unsupervised Parts. Int J Comput Vis 132, 1205–1218 (2024). https://doi.org/10.1007/s11263-023-01927-2
