Abstract
Image-to-image translation is affected by entanglement phenomena, which may occur when the target data contain occlusions such as raindrops or dirt. Our unsupervised, model-based learning disentangles the scene from its occlusions, while an adversarial pipeline regresses the physical parameters of the occlusion model. Experiments demonstrate that our method handles varying types of occlusions and generates highly realistic translations, qualitatively and quantitatively outperforming the state of the art on multiple datasets.
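To make the pipeline concrete, below is a minimal PyTorch sketch of such an adversarial disentanglement loop, assuming a differentiable occlusion model. Every module name (G, D, R, OcclusionRenderer), the toy renderer, and the white-occluder compositing rule are illustrative stand-ins, not the paper's implementation.

```python
# A minimal sketch, NOT the authors' implementation: module shapes, the
# toy renderer, and the white-occluder compositing are all assumptions.
import torch
import torch.nn as nn

SIZE = 64  # toy resolution

class OcclusionRenderer(nn.Module):
    """Hypothetical differentiable occlusion model: maps a few 'physical'
    parameters (e.g. drop size, density) to a per-pixel opacity map."""
    def __init__(self, num_params=4):
        super().__init__()
        self.render = nn.Sequential(nn.Linear(num_params, SIZE * SIZE),
                                    nn.Sigmoid())

    def forward(self, params):
        return self.render(params).view(-1, 1, SIZE, SIZE)

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Sigmoid())  # scene translator (toy)
D = nn.Sequential(nn.Conv2d(3, 1, 4, 2, 1))                     # patch discriminator (toy)
R = nn.Sequential(nn.Flatten(), nn.Linear(3 * SIZE * SIZE, 4))  # parameter regressor (toy)
occ = OcclusionRenderer()

opt_g = torch.optim.Adam([*G.parameters(), *R.parameters(), *occ.parameters()], lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
mse = nn.MSELoss()  # least-squares GAN objective

x = torch.rand(2, 3, SIZE, SIZE)  # source-domain batch (stand-in data)
y = torch.rand(2, 3, SIZE, SIZE)  # occluded target-domain batch

# Generator step: translate the scene occlusion-free, regress occlusion
# parameters from real target images, render and composite the occlusion,
# then try to fool the discriminator with the composited result.
scene = G(x)                        # disentangled, occlusion-free translation
alpha = occ(R(y))                   # occlusion layer from regressed parameters
fake = (1 - alpha) * scene + alpha  # toy compositing with a white occluder
pred = D(fake)
loss_g = mse(pred, torch.ones_like(pred))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Discriminator step: real occluded targets vs. detached composited fakes.
pred_real, pred_fake = D(y), D(fake.detach())
loss_d = mse(pred_real, torch.ones_like(pred_real)) + \
         mse(pred_fake, torch.zeros_like(pred_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```

The point of the sketch is the gradient path: the adversarial signal reaches the scene translator only through the occlusion compositing, so the translator is pushed toward occlusion-free outputs while the regressed physical parameters absorb the occlusions.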
Notes
1. Note that averaging over the dataset assumes images with similar aspect and viewpoint; image-wise guidance could be envisaged, at the cost of less reliable guidance (a small sketch of this averaging follows the notes).
2. Note that WoodScape provides soiling masks, which we do not use.
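As referenced in note 1, here is a minimal sketch of dataset-wise guidance averaging; `guidance_map` is a hypothetical stand-in for whatever per-image guidance is computed, not the paper's actual function.

```python
# Hypothetical sketch of the dataset-wise averaging in note 1; the
# guidance function itself is a stand-in, not the paper's.
import torch

def dataset_guidance(images, guidance_map):
    """Average per-image guidance maps over the whole dataset; only
    sensible when images share similar aspect and viewpoint (note 1)."""
    return torch.stack([guidance_map(im) for im in images]).mean(dim=0)

# Usage with dummy data; image-wise guidance would instead apply
# guidance_map(im) per image, trading reliability for adaptivity.
imgs = [torch.rand(3, 64, 64) for _ in range(8)]
shared = dataset_guidance(imgs, lambda im: im.mean(0, keepdim=True))
```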
About this paper
Cite this paper
Pizzati, F., Cerri, P., de Charette, R. (2020). Model-Based Occlusion Disentanglement for Image-to-Image Translation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12365. Springer, Cham. https://doi.org/10.1007/978-3-030-58565-5_27
Print ISBN: 978-3-030-58564-8
Online ISBN: 978-3-030-58565-5