
Adversarial Image Composition with Auxiliary Illumination

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Abstract

Dealing with the inconsistency between a foreground object and a background image is a challenging task in high-fidelity image composition. State-of-the-art methods strive to harmonize the composed image by adapting the style of foreground objects to the background image, but the shadow that a foreground object should cast in the composed image, which is critical to composition realism, is largely neglected. In this paper, we propose an Adversarial Image Composition Net (AIC-Net) that achieves realistic image composition by considering the potential shadows that the foreground object projects in the composed image. A novel branched generation mechanism is proposed that disentangles shadow generation from foreground style transfer so that the two tasks can be optimized simultaneously. A differentiable spatial transformation module is designed that bridges local harmonization and global harmonization, enabling their effective joint optimization. Extensive experiments on pedestrian and car composition tasks show that the proposed AIC-Net achieves superior composition performance both qualitatively and quantitatively.
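To make the two mechanisms in the abstract concrete, the sketch below shows one plausible way to wire them up in PyTorch. It is not the authors' implementation: the layer sizes, the module names (BranchedGenerator, paste_back), the shadow-application formula, and the affine parameterization are all illustrative assumptions. Only the overall structure follows the abstract: a shared encoder with separate shadow and style decoder branches, plus a differentiable affine paste-back that lets a global discriminator's gradients reach the local generator.

```python
# Minimal sketch (not the authors' code) of a branched generator that
# disentangles shadow generation from foreground style transfer, plus a
# differentiable affine paste-back in the spirit of the paper's spatial
# transformation module. All names and shapes here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchedGenerator(nn.Module):
    """Shared encoder with two decoder branches: one predicts a shadow
    map for the local region, the other re-styles the foreground."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.ReLU(inplace=True),
        )
        # Shadow branch: single-channel darkening map in [0, 1].
        self.dec_shadow = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 1, 4, 2, 1), nn.Sigmoid(),
        )
        # Style branch: harmonized RGB foreground in [0, 1].
        self.dec_style = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, local_patch):
        h = self.enc(local_patch)
        return self.dec_shadow(h), self.dec_style(h)

def paste_back(background, local_out, theta):
    """Differentiably warp the harmonized local patch into the background
    with an affine transform (theta: N x 2 x 3, mapping scene coords to
    patch coords), so gradients from a global discriminator reach the
    local generator."""
    grid = F.affine_grid(theta, background.size(), align_corners=False)
    warped = F.grid_sample(local_out, grid, align_corners=False)
    # Warp an all-ones mask the same way to find the patch footprint;
    # zero padding outside [-1, 1] leaves the background untouched there.
    mask = F.grid_sample(torch.ones_like(local_out[:, :1]), grid,
                         align_corners=False)
    return warped * mask + background * (1 - mask)

# Usage sketch: harmonize a 64x64 local crop, then paste it into a
# 256x256 scene. The 0.5 shadow strength is purely illustrative.
G = BranchedGenerator()
shadow, styled = G(torch.rand(1, 3, 64, 64))
local_out = styled * (1 - 0.5 * shadow)  # darken by the shadow map
theta = torch.tensor([[[4.0, 0.0, 0.0],  # scale 4 => patch lands in the
                       [0.0, 4.0, 0.0]]])  # central quarter of the scene
scene = paste_back(torch.rand(1, 3, 256, 256), local_out, theta)
```

In training, a local discriminator would score local_out while a global discriminator scores scene; because paste_back is built from affine_grid and grid_sample, both adversarial losses can back-propagate into the generator jointly, which is the point of bridging local and global harmonization.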



Author information


Correspondence to Shijian Lu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhan, F., Lu, S., Zhang, C., Ma, F., Xie, X. (2021). Adversarial Image Composition with Auxiliary Illumination. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_15


  • DOI: https://doi.org/10.1007/978-3-030-69532-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69531-6

  • Online ISBN: 978-3-030-69532-3

  • eBook Packages: Computer Science (R0)
