Abstract
Dealing with the inconsistency between a foreground object and a background image is a challenging task in high-fidelity image composition. State-of-the-art methods strive to harmonize the composed image by adapting the style of the foreground object to the background image, whereas the shadow that the foreground object should cast in the composed image, which is critical to composition realism, is largely neglected. In this paper, we propose an Adversarial Image Composition Net (AIC-Net) that achieves realistic image composition by generating the shadows that foreground objects would project in the composed image. A novel branched generation mechanism disentangles shadow generation from foreground style transfer so that the two tasks can be accomplished simultaneously and optimally. A differentiable spatial transformation module bridges local harmonization and global harmonization to achieve their effective joint optimization. Extensive experiments on pedestrian and car composition tasks show that the proposed AIC-Net achieves superior composition performance both qualitatively and quantitatively.
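The branched, disentangled composition described above can be illustrated with a deliberately simplified sketch. This is not the paper's actual network: the learned style-transfer branch is stood in for by AdaIN-style channel-statistic matching, and the learned shadow branch by a fixed mask shift and darkening; the function names, the offset, and the shadow strength are illustrative assumptions.

```python
import numpy as np

def adain_transfer(fg, bg):
    """Match per-channel mean/std of the foreground to the background
    (AdaIN-style statistic matching, a stand-in for the learned
    style-transfer branch)."""
    fg_mu, fg_sigma = fg.mean(axis=(0, 1)), fg.std(axis=(0, 1)) + 1e-6
    bg_mu, bg_sigma = bg.mean(axis=(0, 1)), bg.std(axis=(0, 1))
    return (fg - fg_mu) / fg_sigma * bg_sigma + bg_mu

def project_shadow(mask, offset=(8, 4), strength=0.4):
    """Crude shadow stand-in: shift the object mask and use it to darken
    the background; AIC-Net learns this projection adversarially."""
    dy, dx = offset
    shadow = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return 1.0 - strength * shadow

def compose(bg, fg, mask):
    """Branched composition: one branch harmonizes the foreground style,
    the other casts a shadow onto the background; then alpha-blend."""
    styled = adain_transfer(fg, bg)               # style branch
    shaded = bg * project_shadow(mask)[..., None]  # shadow branch
    m = mask[..., None]
    return m * styled + (1.0 - m) * shaded
```

The point of the branching is that the two outputs never interfere: the shadow darkens only background pixels, while style matching touches only foreground pixels, mirroring the disentanglement the abstract describes.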
Cite this paper
Zhan, F., Lu, S., Zhang, C., Ma, F., Xie, X. (2021). Adversarial Image Composition with Auxiliary Illumination. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_15