Model-Based Occlusion Disentanglement for Image-to-Image Translation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12365)


Image-to-image translation is affected by entanglement phenomena, which may occur in case of target data encompassing occlusions such as raindrops, dirt, etc. Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model. The experiments demonstrate our method is able to handle varying types of occlusions and generate highly realistic translations, qualitatively and quantitatively outperforming the state-of-the-art on multiple datasets.


GAN Image-to-image translation Occlusions Raindrop Soil 

Supplementary material

504476_1_En_27_MOESM1_ESM.pdf (19.5 mb)
Supplementary material 1 (pdf 20010 KB)


  1. 1.
  2. 2.
    Alletto, S., Carlin, C., Rigazio, L., Ishii, Y., Tsukizawa, S.: Adherent raindrop removal with self-supervised attention maps and spatio-temporal generative adversarial networks. In: ICCV Workshops (2019)Google Scholar
  3. 3.
    Anoosheh, A., Agustsson, E., Timofte, R., Van Gool, L.: ComboGAN: unrestrained scalability for image domain translation. In: CVPR Workshops (2018)Google Scholar
  4. 4.
    Bi, S., Sunkavalli, K., Perazzi, F., Shechtman, E., Kim, V.G., Ramamoorthi, R.: Deep CG2Real: Synthetic-to-real translation via image disentanglement. In: ICCV (2019)Google Scholar
  5. 5.
    Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR (2020)Google Scholar
  6. 6.
    Cherian, A., Sullivan, A.: Sem-GAN: semantically-consistent image-to-image translation. In: WACV (2019)Google Scholar
  7. 7.
    Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., Choo, J.: StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: CVPR (2018)Google Scholar
  8. 8.
    Cord, A., Aubert, D.: Towards rain detection through use of in-vehicle multipurpose cameras. In: IV (2011)Google Scholar
  9. 9.
    Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)Google Scholar
  10. 10.
    Gu, J., Ramamoorthi, R., Belhumeur, P., Nayar, S.: Removing image artifacts due to dirty camera lenses and thin occluders. In: SIGGRAPH Asia (2009)Google Scholar
  11. 11.
    Halder, S.S., Lalonde, J.F., de Charette, R.: Physics-based rendering for improving robustness to rain. In: ICCV (2019)Google Scholar
  12. 12.
    Halimeh, J.C., Roser, M.: Raindrop detection on car windshields using geometric-photometric environment construction and intensity-based correlation. In: IV (2009)Google Scholar
  13. 13.
    Hao, Z., You, S., Li, Y., Li, K., Lu, F.: Learning from synthetic photorealistic raindrop for single image raindrop removal. In: ICCV Workshops (2019)Google Scholar
  14. 14.
    Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 179–196. Springer, Cham (2018). Scholar
  15. 15.
    Hui, L., Li, X., Chen, J., He, H., Yang, J.: Unsupervised multi-domain image translation with domain-specific encoders/decoders. In: ICPR (2018)Google Scholar
  16. 16.
    Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR (2017)Google Scholar
  17. 17.
    Kim, J., Kim, M., Kang, H., Lee, K.: U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. In: ICLR (2020)Google Scholar
  18. 18.
    Lee, H.Y., et al.: DRIT++: diverse image-to-image translation via disentangled representations. arXiv preprint arXiv:1905.01270 (2019)
  19. 19.
    Li, P., Liang, X., Jia, D., Xing, E.P.: Semantic-aware grad-GAN for virtual-to-real urban scene adaption. BMVC (2018)Google Scholar
  20. 20.
    Liu, M.Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: NeurIPS (2017)Google Scholar
  21. 21.
    Liu, M.Y., et al.: Few-shot unsupervised image-to-image translation. In: ICCV (2019)Google Scholar
  22. 22.
    Ma, S., Fu, J., Wen Chen, C., Mei, T.: DA-GAN: instance-level image translation by deep attention generative adversarial networks. In: CVPR (2018)Google Scholar
  23. 23.
    Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares generative adversarial networks. In: ICCV (2017)Google Scholar
  24. 24.
    Mejjati, Y.A., Richardt, C., Tompkin, J., Cosker, D., Kim, K.I.: Unsupervised attention-guided image-to-image translation. In: NeurIPS (2018)Google Scholar
  25. 25.
    Mo, S., Cho, M., Shin, J.: InstaGAN: instance-aware image-to-image translation. In: ICLR (2019)Google Scholar
  26. 26.
    Pentland, A.P.: A new sense for depth of field. T-PAMI (1987)Google Scholar
  27. 27.
    Pizzati, F., de Charette, R., Zaccaria, M., Cerri, P.: Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation. In: WACV (2020)Google Scholar
  28. 28.
    Porav, H., Bruls, T., Newman, P.: I can see clearly now: image restoration via de-raining. In: ICRA (2019)Google Scholar
  29. 29.
    Qu, Y., Chen, Y., Huang, J., Xie, Y.: Enhanced pix2pix dehazing network. In: CVPR (2019)Google Scholar
  30. 30.
    Ramirez, P.Z., Tonioni, A., Di Stefano, L.: Exploiting semantics in adversarial training for image-level domain adaptation. In: IPAS (2018)Google Scholar
  31. 31.
    Riba, E., Mishkin, D., Ponsa, D., Rublee, E., Bradski, G.: Kornia: an open source differentiable computer vision library for PyTorch. In: WACV (2020)Google Scholar
  32. 32.
    Romero, A., Arbeláez, P., Van Gool, L., Timofte, R.: SMIT: stochastic multi-label image-to-image translation. In: ICCV Workshops (2019)Google Scholar
  33. 33.
    Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)Google Scholar
  34. 34.
    Roser, M., Geiger, A.: Video-based raindrop detection for improved image registration. In: ICCV Workshops (2009)Google Scholar
  35. 35.
    Roser, M., Kurz, J., Geiger, A.: Realistic modeling of water droplets for monocular adherent raindrop recognition using Bezier curves. In: ACCV (2010)Google Scholar
  36. 36.
    Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: NeurIPS (2016)Google Scholar
  37. 37.
    Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: ICCV (2017)Google Scholar
  38. 38.
    Shen, Z., Huang, M., Shi, J., Xue, X., Huang, T.S.: Towards instance-level image-to-image translation. In: CVPR (2019)Google Scholar
  39. 39.
    Singh, K.K., Ojha, U., Lee, Y.J.: FineGAN: unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In: CVPR (2019)Google Scholar
  40. 40.
    Tang, H., Xu, D., Sebe, N., Yan, Y.: Attention-guided generative adversarial networks for unsupervised image-to-image translation. In: International Joint Conference on Neural Networks (IJCNN) (2019)Google Scholar
  41. 41.
    Tang, H., Xu, D., Yan, Y., Corso, J.J., Torr, P.H., Sebe, N.: Multi-channel attention selection GANs for guided image-to-image translation. In: CVPR (2019)Google Scholar
  42. 42.
    Uricar, M., et al.: Let’s get dirty: GAN based data augmentation for soiling and adverse weather classification in autonomous driving. arXiv preprint arXiv:1912.02249 (2019)
  43. 43.
    Xiao, T., Hong, J., Ma, J.: DNA-GAN: learning disentangled representations from multi-attribute images. In: ICLR Workshops (2018)Google Scholar
  44. 44.
    Xiao, T., Hong, J., Ma, J.: ELEGANT: exchanging latent encodings with GAN for transferring multiple face attributes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 172–187. Springer, Cham (2018). Scholar
  45. 45.
    Xie, Y., Franz, E., Chu, M., Thuerey, N.: tempoGAN: a temporally coherent, volumetric GAN for super-resolution fluid flow. In: SIGGRAPH (2018)Google Scholar
  46. 46.
    Yang, X., Xu, Z., Luo, J.: Towards perceptual image dehazing by physics-based disentanglement and adversarial training. In: AAAI (2018)Google Scholar
  47. 47.
    Yang, X., Xie, D., Wang, X.: Crossing-domain generative adversarial networks for unsupervised multi-domain image-to-image translation. In: MM (2018)Google Scholar
  48. 48.
    Yi, Z., Zhang, H., Tan, P., Gong, M.: DualGAN: unsupervised dual learning for image-to-image translation. In: ICCV (2017)Google Scholar
  49. 49.
    Yogamani, S., et al.: WoodScape: a multi-task, multi-camera fisheye dataset for autonomous driving. In: ICCV (2019)Google Scholar
  50. 50.
    You, S., Tan, R.T., Kawakami, R., Mukaigawa, Y., Ikeuchi, K.: Adherent raindrop modeling, detectionand removal in video. T-PAMI (2015)Google Scholar
  51. 51.
    Zhang, J., Huang, Y., Li, Y., Zhao, W., Zhang, L.: Multi-attribute transfer via disentangled representation. In: AAAI (2019)Google Scholar
  52. 52.
    Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)Google Scholar
  53. 53.
    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)Google Scholar
  54. 54.
    Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: CVPR (2017)Google Scholar
  55. 55.
    Zhu, J.Y., et al.: Toward multimodal image-to-image translation. In: NeurIPS (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.InriaParisFrance
  2. 2.VisLabParmaItaly

Personalised recommendations