
Discriminative Region Proposal Adversarial Network for High-Quality Image-to-Image Translation

Published in International Journal of Computer Vision

Abstract

Image-to-image translation has made considerable progress since the advent of Generative Adversarial Networks (GANs). However, translation tasks that demand high quality remain very challenging, especially at high resolution and with photo-realism. In this work, we present the Discriminative Region Proposal Adversarial Network (DRPAN) for high-quality image-to-image translation. We decompose the translation procedure into three iterated steps: first, generate an image with correct global structure but some local artifacts (via GAN); second, use our Discriminative Region Proposal network (DRPnet) to propose the most fake region of the generated image; and third, perform “image inpainting” on that region through a reviser to yield a more realistic result, so that the whole system (DRPAN) is gradually optimized to synthesize images while paying more attention to the most artificial local parts. We exploit a patch-based GAN to construct DRPnet, which proposes the discriminative region used to produce masked fake samples. We further propose a reviser that distinguishes real images from masked fakes, providing constructive revisions to the generator for producing realistic details and serving as an auxiliary to the generator for synthesizing high-quality results. In addition, we combine pix2pixHD with DRPAN to synthesize high-resolution results with much finer details. Moreover, we improve CycleGAN with DRPAN to address unpaired image-to-image translation with better semantic alignment. Experiments on a variety of paired and unpaired image-to-image translation tasks validate that our method outperforms the state of the art in synthesizing high-quality translation results, in terms of both human perceptual studies and automatic quantitative measures. Our code is available at https://github.com/godisboy/DRPAN.
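
To make the three iterated steps concrete, the following is a minimal sketch of one training iteration in PyTorch-style code. The module names (generator, discriminator, drpnet, reviser), the mask-and-paste arithmetic, and the least-squares adversarial losses are illustrative assumptions rather than the authors' implementation; the reference code at https://github.com/godisboy/DRPAN is authoritative.

```python
# Minimal sketch of one DRPAN training iteration as described in the abstract.
# All names (generator, discriminator, drpnet, reviser) and the least-squares
# adversarial losses are illustrative assumptions, not the authors' code.
import torch


def drpan_iteration(generator, discriminator, drpnet, reviser,
                    real_a, real_b, opt_g, opt_d, opt_r):
    """One iterated step: synthesize, propose the most fake region, revise."""
    # Step 1: generate an output with correct global structure but local artifacts.
    fake_b = generator(real_a)

    # Step 2: DRPnet turns the patch-based discriminator's score map into a
    # binary mask over the most fake (lowest-scoring) region of fake_b.
    with torch.no_grad():
        region_mask = drpnet(discriminator(fake_b))

    # Step 3: build a masked fake sample by pasting only the proposed fake
    # region onto the real target image, so attention falls on that local part.
    masked_fake_b = real_b * (1 - region_mask) + fake_b * region_mask

    # Discriminator update: real vs. wholly fake (standard patch-based GAN).
    opt_d.zero_grad()
    d_loss = (torch.mean((discriminator(real_b) - 1) ** 2)
              + torch.mean(discriminator(fake_b.detach()) ** 2))
    d_loss.backward()
    opt_d.step()

    # Reviser update: real vs. masked fake, acting as an "image inpainting"
    # critic that judges the proposed region in its real context.
    opt_r.zero_grad()
    r_loss = (torch.mean((reviser(real_b) - 1) ** 2)
              + torch.mean(reviser(masked_fake_b.detach()) ** 2))
    r_loss.backward()
    opt_r.step()

    # Generator update: fool both the discriminator and the reviser
    # (reconstruction terms such as an L1 loss are omitted for brevity).
    opt_g.zero_grad()
    g_loss = (torch.mean((discriminator(fake_b) - 1) ** 2)
              + torch.mean((reviser(masked_fake_b) - 1) ** 2))
    g_loss.backward()
    opt_g.step()
    return fake_b, region_mask
```

In this sketch the reviser supplies the corrective signal on the proposed local region while the patch-based discriminator still constrains the global structure, mirroring the division of labor described above.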


References

  • Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In ICML (pp. 214–223).

  • Arora, S., Ge, R., Liang, Y., Ma, T., & Zhang, Y. (2017). Generalization and equilibrium in generative adversarial nets (GANs). In ICML (pp. 224–232).

  • Baroncini, V., Capodiferro, L., Di Claudio, E. D., & Jacovitti, G. (2009). The polar edge coherence: A quasi blind metric for video quality assessment. In ESPC (pp. 564–568).

  • Borji, A. (2019). Pros and cons of GAN evaluation measures. CVIU, 179, 41–65.

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2015). Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR.

  • Chen, Q., & Koltun, V. (2017). Photographic image synthesis with cascaded refinement networks. In ICCV (pp. 1511–1520).

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In CVPR (pp. 3213–3223).

  • Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE SPM, 35(1), 53–65.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR (pp. 248–255).

  • Deshpande, A., Rock, J., & Forsyth, D. (2015). Learning large-scale automatic image colorization. In ICCV (pp. 567–575).

  • Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image super-resolution using deep convolutional networks. IEEE TPAMI, 38(2), 295–307.

  • Dosovitskiy, A., & Brox, T. (2016). Generating images with perceptual similarity metrics based on deep networks. In NIPS (pp. 658–666).

  • Durugkar, I., Gemp, I., & Mahadevan, S. (2017). Generative multi-adversarial networks. In ICLR (pp. 1–14).

  • Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv:1508.06576.

  • Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In CVPR (pp. 2414–2423).

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In NIPS (pp. 2672–2680).

  • Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of Wasserstein GANs. In NIPS (pp. 5767–5777).

  • Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS (pp. 6626–6637).

  • Hong, Y., Hwang, U., Yoo, J., & Yoon, S. (2019). How generative adversarial networks and their variants work: An overview. ACM CSUR, 52(1), 10:1–10:43.

  • Huang, H., Yu, P. S., & Wang, C. (2018). An introduction to image synthesis with generative adversarial nets. arXiv:1803.04469

  • Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., & Belongie, S. (2017). Stacked generative adversarial networks. In CVPR (pp. 5077–5086).

  • Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In CVPR (pp. 5967–5976).

  • Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In ECCV (pp. 694–711).

  • Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation. In ICLR (pp. 1–26).

  • Kim, J., Kwon Lee, J., & Mu Lee, K. (2016). Accurate image super-resolution using very deep convolutional networks. In CVPR (pp. 1646–1654).

  • Kim, T., Cha, M., Kim, H., Lee, J. K., & Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. In ICML (pp. 1857–1865).

  • Kodali, N., Abernethy, J., Hays, J., & Kira, Z. (2017). On convergence and stability of GANs. arXiv:1705.07215

  • Kurach, K., Lucic, M., Zhai, X., Michalski, M., & Gelly, S. (2018). The GAN landscape: Losses, architectures, regularization, and normalization. arXiv:1807.04720

  • Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., & Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In CVPR (pp. 4681–4690).

  • Li, C., & Wand, M. (2016). Precomputed real-time texture synthesis with markovian generative adversarial networks. In ECCV (pp. 702–716).

  • Li, Y., Liu, S., Yang, J., & Yang, M. H. (2017). Generative face completion. In CVPR (pp. 5892–5900).

  • Lin, G., Milan, A., Shen, C., & Reid, I. (2017). RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In CVPR (pp. 1925–1934).

  • Liu, M. Y., Breuel, T., & Kautz, J. (2017). Unsupervised image-to-image translation networks. In NIPS.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR (pp. 3431–3440).

  • Lucic, M., Kurach, K., Michalski, M., Gelly, S., & Bousquet, O. (2018). Are GANs created equal? A large-scale study. In NeurIPS (pp. 698–707).

  • Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv:1411.1784

  • Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In ICML (pp. 807–814).

  • Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., & Yosinski, J. (2017). Plug & play generative networks: Conditional iterative generation of images in latent space. In CVPR (pp. 4467–4477).

  • Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In ICML (pp. 2642–2651).

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42(3), 145–175.

  • Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., & Efros, A. A. (2016). Context encoders: Feature learning by inpainting. In CVPR (pp. 2536–2544).

  • Qi, G. J. (2017). Loss-sensitive generative adversarial networks on Lipschitz densities. arXiv:1701.06264

  • Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR (pp. 1–16).

  • Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative adversarial text to image synthesis. In ICML (pp. 1060–1069).

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In MICCAI (pp. 234–241).

  • Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In NIPS (pp. 2234–2242).

  • Sheikh, H. R., & Bovik, A. C. (2006). Image information and visual quality. IEEE TIP, 15(2), 430–444.

  • Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR (pp. 1874–1883).

  • Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. (2017). Learning from simulated and unsupervised images through adversarial training. In CVPR (pp. 2107–2116).

  • Tyleček, R., & Šára, R. (2013). Spatial pattern templates for recognition of objects with regular structure. In GCPR (pp. 364–374).

  • Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2016). Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022

  • Wang, C., Xu, C., Wang, C., & Tao, D. (2018a). Perceptual adversarial networks for image-to-image transformation. IEEE TIP, 27(8), 4066–4079.

  • Wang, C., Zheng, H., Yu, Z., Zheng, Z., Gu, Z., & Zheng, B. (2018b). Discriminative region proposal adversarial networks for high-quality image-to-image translation. In Proceedings of the European conference on computer vision (ECCV) (pp. 770–785).

  • Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J., & Catanzaro, B. (2018c). High-resolution image synthesis and semantic manipulation with conditional GANs. In CVPR (pp. 8798–8807).

  • Wang, X., & Gupta, A. (2016). Generative image modeling using style and structure adversarial networks. In ECCV (pp. 318–335).

  • Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE TIP, 13(4), 600–612.

  • Yi, Z., Zhang, H., Tan, P., & Gong, M. (2017). DualGAN: Unsupervised dual learning for image-to-image translation. In ICCV (pp. 2868–2876).

  • Yu, A., & Grauman, K. (2014). Fine-grained visual comparisons with local learning. In CVPR (pp. 192–199).

  • Zhang, H., Sindagi, V., & Patel, V. M. (2017a). Image de-raining using a conditional generative adversarial network. arXiv:1701.05957

  • Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. N. (2017b). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV (pp. 5907–5915).

  • Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. (2018a). StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE TPAMI, 41(8), 1947–1962.

  • Zhang, R., Isola, P., & Efros, A. A. (2016). Colorful image colorization. In ECCV (pp. 649–666).

  • Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018b). The unreasonable effectiveness of deep features as a perceptual metric. In CVPR (pp. 586–595).

  • Zhou, T., Krahenbuhl, P., Aubry, M., Huang, Q., & Efros, A. A. (2016). Learning dense correspondence via 3d-guided cycle consistency. In CVPR (pp. 117–126).

  • Zhu, J. Y., Krähenbühl, P., Shechtman, E., & Efros, A. A. (2016). Generative visual manipulation on the natural image manifold. In ECCV (pp. 597–613).

  • Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV (pp. 2242–2251).

Acknowledgements

The authors would like to thank the pioneering researchers in the fields of GANs and image-to-image translation, and to express their sincere appreciation to the guest editors and anonymous reviewers. This work was supported in part by the National Natural Science Foundation of China under Grants 61771440 and 41776113, in part by the China Scholarship Council under Grant 201806335022, and in part by the Qingdao Municipal Science and Technology Program under Grant 17-1-1-5-jch.

Author information

Corresponding authors

Correspondence to Haiyong Zheng or Zhibin Yu.

Additional information

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, C., Niu, W., Jiang, Y. et al. Discriminative Region Proposal Adversarial Network for High-Quality Image-to-Image Translation. Int J Comput Vis 128, 2366–2385 (2020). https://doi.org/10.1007/s11263-019-01273-2

