Disentangling latent space better for few-shot image-to-image translation

Liu, Peng; Wang, Yueyue; Du, Angang; Zhang, Liqiang; Wei, Bin; Gu, Zhaorui; Wang, Xiaodong; Zheng, Haiyong; Li, Juan

doi:10.1007/s13042-022-01552-4

Original Article
Published: 04 May 2022

Volume 14, pages 419–427, (2023)
International Journal of Machine Learning and Cybernetics

Peng Liu (ORCID: orcid.org/0000-0001-6447-3382)¹,
Yueyue Wang¹,
Angang Du²,
Liqiang Zhang²,
Bin Wei⁴,⁵,
Zhaorui Gu²,
Xiaodong Wang³,
Haiyong Zheng² &
Juan Li⁶

Abstract

In unpaired image-to-image translation, the main goal is to learn an underlying mapping between the source and target domains. Previous approaches require large amounts of data from both domains to learn this mapping. However, under a few-shot condition, that is, in few-shot image-to-image translation, only one domain has enough data, and thus the underlying mapping becomes ill-conditioned owing to the limited data as well as the imbalanced distribution of the two domains. We argue that a powerful model with a better disentangled representation of the latent space can better tackle the more challenging few-shot image-to-image translation. Motivated by this, under a partially-shared assumption, we propose a better disentanglement of the content and style latent spaces using a domain-specific style latent classifier and a domain-shared cross-content latent discriminator. Moreover, we design asymmetric weak/strong domain discriminators to achieve better translation performance with the limited data in the few-shot domain. Furthermore, our method can be easily embedded into any latent-space-disentangled image-to-image translation model for the few-shot setting. Both subjective and objective evaluation results show that, compared with other state-of-the-art methods, the images synthesized by our method have higher fidelity while maintaining a certain diversity.
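
The two latent-space constraints named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the linear-plus-sigmoid scorers standing in for neural networks, and the exact loss forms are all assumptions made for illustration. It only shows the general idea that style codes should be classifiable by domain, while content codes should leave a domain discriminator undecided. The asymmetric weak/strong domain discriminators are not covered here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, target):
    """Binary cross-entropy between probability p and a target in [0, 1]."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def score(weights, code):
    """Hypothetical linear scorer: a stand-in for a learned network."""
    return sigmoid(sum(w * z for w, z in zip(weights, code)))

def style_classifier_loss(weights, style_codes, domain_labels):
    """Assumed form: style codes are domain-specific, so a classifier
    should be able to predict the domain label (0 or 1) from them."""
    losses = [bce(score(weights, s), y)
              for s, y in zip(style_codes, domain_labels)]
    return sum(losses) / len(losses)

def content_confusion_loss(weights, content_codes):
    """Assumed encoder-side form: content codes are domain-shared, so the
    discriminator output is pushed toward 0.5 (it cannot tell the domain)."""
    losses = [bce(score(weights, c), 0.5) for c in content_codes]
    return sum(losses) / len(losses)

# Toy usage with made-up 1-D style codes and 2-D content codes.
style_loss = style_classifier_loss([5.0], [[2.0], [-2.0]], [1, 0])
content_loss = content_confusion_loss([0.0, 0.0], [[1.0, 2.0], [-1.0, 0.5]])
print(style_loss, content_loss)
```

In this sketch the confusion loss reaches its minimum, log 2, exactly when the discriminator outputs 0.5 for every content code, which is one common way to express that a representation carries no domain information.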

Acknowledgements

Peng Liu and Yueyue Wang contributed equally to this work. This work was supported by the Natural Science Foundation of Shandong Province under Grant ZR2021MF080, the National Natural Science Foundation of China under Grant numbers 61771440 and 32073029, the key project of Shandong Provincial Natural Science Foundation (ZR202010310016), and the postgraduate education quality improvement project of Shandong Province (SDYJG19134).

Author information

Peng Liu and Yueyue Wang contributed equally to this work.

Authors and Affiliations

Computing Center, Ocean University of China, Qingdao, 266100, China
Peng Liu & Yueyue Wang
Department of Electronic Engineering, Ocean University of China, Qingdao, 266100, China
Angang Du, Liqiang Zhang, Zhaorui Gu & Haiyong Zheng
Department of Computer Science and Technology, Ocean University of China, Qingdao, 266100, China
Xiaodong Wang
The Affiliated Hospital of Qingdao University, Qingdao, 266000, China
Bin Wei
Shandong Key Laboratory of Digital Medicine and Computer Assisted Surgery, Qingdao, 266000, China
Bin Wei
College of Mechanical and Electrical Engineering, Qingdao Agricultural University, Qingdao, 266000, China
Juan Li

Corresponding authors

Correspondence to Zhaorui Gu or Xiaodong Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, P., Wang, Y., Du, A. et al. Disentangling latent space better for few-shot image-to-image translation. Int. J. Mach. Learn. & Cyber. 14, 419–427 (2023). https://doi.org/10.1007/s13042-022-01552-4

Received: 06 August 2021
Accepted: 19 March 2022
Published: 04 May 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s13042-022-01552-4