Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks

  • Minjun Li
  • Haozhi Huang
  • Lin Ma
  • Wei Liu
  • Tong Zhang
  • Yugang Jiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


Recent studies on unsupervised image-to-image translation have made remarkable progress by training a pair of generative adversarial networks with a cycle-consistency loss. However, such unsupervised methods may generate inferior results when the image resolution is high or the two image domains differ significantly in appearance, as in the translations between semantic layouts and natural images in the Cityscapes dataset. In this paper, we propose novel Stacked Cycle-Consistent Adversarial Networks (SCANs) that decompose a single translation into multi-stage transformations, which not only boosts translation quality but also enables higher-resolution image-to-image translation in a coarse-to-fine fashion. Moreover, to properly exploit the information from the previous stage, we devise an adaptive fusion block that learns a dynamic integration of the current stage's output with the previous stage's output. Experiments on multiple datasets demonstrate that our proposed approach improves translation quality over previous single-stage unsupervised methods.
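The adaptive fusion block described in the abstract can be sketched as a learned per-pixel convex combination of the two stages' outputs. The sketch below is a simplified, hypothetical illustration: it reduces the alpha-map predictor to a single 1x1 convolution (weights `w`, bias `b`) applied to the channel-wise concatenation of both inputs, whereas the actual model would learn this predictor jointly with the rest of the network.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function, squashing logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_fusion(prev_out, curr_out, w, b):
    """Fuse the previous stage's output with the current stage's refinement.

    A per-pixel weight map `alpha` is predicted from both inputs, and the
    fused result is the convex combination
        fused = prev_out * (1 - alpha) + curr_out * alpha.

    prev_out, curr_out: (H, W, C) arrays (images or feature maps).
    w: (2*C, 1) weights of a hypothetical 1x1 convolution that predicts
       alpha from the concatenation of both inputs.
    b: scalar bias for the alpha predictor.
    """
    concat = np.concatenate([prev_out, curr_out], axis=-1)  # (H, W, 2C)
    alpha = sigmoid(concat @ w + b)                         # (H, W, 1)
    return prev_out * (1.0 - alpha) + curr_out * alpha
```

Because `alpha` is predicted per pixel, the block can keep reliable regions from the coarse previous stage while letting the current stage overwrite regions it refines, rather than committing to a fixed blending weight.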


Image-to-image translation · Unsupervised learning · Generative adversarial network (GAN)



This work was supported by two projects from NSFC (#61622204 and #61572134) and two projects from STCSM (#16JC1420401 and #16QA1400500).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Minjun Li (1, 2)
  • Haozhi Huang (2)
  • Lin Ma (2)
  • Wei Liu (2)
  • Tong Zhang (2)
  • Yugang Jiang (1)
  1. Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China
  2. Tencent AI Lab, Bellevue, USA
