Abstract
In unpaired image-to-image translation, the main idea is to learn an underlying mapping between the source and target domains. Previous approaches required large amounts of data from both domains to learn this mapping. However, in the few-shot setting, that is, few-shot image-to-image translation, only one domain has enough data, and the underlying mapping becomes ill-conditioned owing to both the limited data and the imbalanced distribution of the two domains. We argue that a powerful model with a better-disentangled latent representation can better tackle the more challenging few-shot image-to-image translation task. Motivated by this, under a partially shared assumption, we propose a better disentanglement of the content and style latent spaces using a domain-specific style latent classifier and a domain-shared cross-content latent discriminator. Moreover, we design asymmetric weak/strong domain discriminators to achieve better translation performance with the limited data available in the few-shot domain. Furthermore, our method can be easily embedded into any image-to-image translation model with a disentangled latent space for the few-shot setting. Both subjective and objective evaluation results show that, compared with other state-of-the-art methods, the images synthesized by our method have higher fidelity while maintaining a certain degree of diversity.
Acknowledgements
Peng Liu and Yueyue Wang contributed equally to this work. This work was supported by the Natural Science Foundation of Shandong Province under Grant ZR2021MF080, the National Natural Science Foundation of China under Grants 61771440 and 32073029, the key project of the Shandong Provincial Natural Science Foundation (ZR202010310016), and the postgraduate education quality improvement project of Shandong Province (SDYJG19134).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, P., Wang, Y., Du, A. et al. Disentangling latent space better for few-shot image-to-image translation. Int. J. Mach. Learn. & Cyber. 14, 419–427 (2023). https://doi.org/10.1007/s13042-022-01552-4
DOI: https://doi.org/10.1007/s13042-022-01552-4