Discriminative Region Proposal Adversarial Network for High-Quality Image-to-Image Translation

  • Chao Wang
  • Wenjie Niu
  • Yufeng Jiang
  • Haiyong Zheng
  • Zhibin Yu
  • Zhaorui Gu
  • Bing Zheng
Part of the following topical collections:
  1. Special Issue on Generative Adversarial Networks for Computer Vision


Abstract

Image-to-image translation has made great progress with the emergence of Generative Adversarial Networks (GANs). However, translation tasks that demand high quality, especially at high resolution and with photo-realism, remain very challenging. In this work, we present the Discriminative Region Proposal Adversarial Network (DRPAN) for high-quality image-to-image translation. We decompose the translation procedure into three iterated steps: first, generate an image with correct global structure but some local artifacts (via GAN); second, use our Discriminative Region Proposal network (DRPnet) to propose the most fake region of the generated image; and third, perform "image inpainting" on that region through a reviser to yield a more realistic result, so that the system (DRPAN) is gradually optimized to synthesize images while attending to the most artifact-ridden local parts. We explore a patch-based GAN to construct DRPnet, which proposes the discriminative region used to produce masked fake samples. We further propose a reviser for GANs that distinguishes real images from masked fakes, provides constructive revisions to the generator for producing realistic details, and serves as an auxiliary to the generator for synthesizing high-quality results. In addition, we combine pix2pixHD with DRPAN to synthesize high-resolution results with much finer details, and we improve CycleGAN with DRPAN to address unpaired image-to-image translation with better semantic alignment. Experiments on a variety of paired and unpaired image-to-image translation tasks validate that our method outperforms the state of the art in synthesizing high-quality translation results, in terms of both human perceptual studies and automatic quantitative measures. Our code is available at
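The region-proposal step described above can be sketched in plain Python. This is a hypothetical, simplified illustration (not the paper's code): it assumes a patch-based discriminator has already produced a 2D map of per-patch realness scores, slides a fixed-size window over that map to find the lowest-scoring ("most fake") region, and then pastes only that region of the fake image onto the real one to build a masked fake sample for the reviser. The function names `propose_fake_region` and `mask_fake_sample` are illustrative.

```python
def propose_fake_region(score_map, win):
    """Return (row, col) of the win x win window with the lowest mean
    realness score, i.e. the 'most fake' region of the score map."""
    h, w = len(score_map), len(score_map[0])
    best, best_pos = float("inf"), (0, 0)
    for r in range(h - win + 1):
        for c in range(w - win + 1):
            mean = sum(score_map[r + i][c + j]
                       for i in range(win) for j in range(win)) / (win * win)
            if mean < best:
                best, best_pos = mean, (r, c)
    return best_pos

def mask_fake_sample(real, fake, pos, win):
    """Copy the real image and paste the proposed fake region into it,
    producing a 'masked fake' sample for the reviser to score."""
    masked = [row[:] for row in real]
    r0, c0 = pos
    for i in range(win):
        for j in range(win):
            masked[r0 + i][c0 + j] = fake[r0 + i][c0 + j]
    return masked

# Toy example: a 4x4 score map whose bottom-right patches score lowest.
scores = [[0.9, 0.8, 0.7, 0.9],
          [0.8, 0.9, 0.6, 0.5],
          [0.9, 0.7, 0.2, 0.1],
          [0.8, 0.6, 0.1, 0.0]]
print(propose_fake_region(scores, 2))  # -> (2, 2)
```

In the full system this proposal/masking loop iterates: the reviser's real-vs-masked-fake judgment feeds back into the generator, which refines exactly the region that looked least realistic.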


Keywords: Image-to-image translation · GAN · Pix2pix · Pix2pixHD · CycleGAN · DRPAN



Acknowledgements

The authors would like to thank the pioneer researchers in the GAN and image-to-image translation fields. The authors would also like to express their sincere appreciation to the guest editors and anonymous reviewers. This work was supported in part by the National Natural Science Foundation of China under Grants 61771440 and 41776113, in part by the China Scholarship Council under Grant 201806335022, and in part by the Qingdao Municipal Science and Technology Program under Grant 17-1-1-5-jch.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronic Engineering, Ocean University of China, Qingdao, China
  2. Department of Mathematics, University of Dundee, Dundee, UK
