
Learning Object Placement via Dual-Path Graph Completion

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Object placement aims to place a foreground object over a background image at a suitable location and with a suitable size. In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM). The background scene is represented by a graph with multiple nodes at different spatial locations and with various receptive fields. The foreground object is encoded as a special node that should be inserted at a reasonable place in this graph. We also design a dual-path framework upon the structure of GCM to fully exploit annotated composite images. Extensive experiments on the OPA dataset show that our method significantly outperforms existing methods in generating plausible object placements without loss of diversity.
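The core mechanism described above, a foreground node attending over a graph of background nodes to find a plausible insertion site, might be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the single-head dot-product attention form, the function name `insert_foreground_node`, and all weight matrices are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def insert_foreground_node(bg_nodes, fg_feat, W_q, W_k, W_v):
    """Attend from the foreground node to all background nodes.

    bg_nodes: (N, d) features of background graph nodes at different
              spatial locations (hypothetical encoding).
    fg_feat:  (d,) feature of the foreground object node.
    Returns an aggregated context vector for the completed graph and
    the attention weights, which can be read as a soft indication of
    where the foreground node plausibly attaches.
    """
    q = fg_feat @ W_q                       # query from foreground node, (d,)
    k = bg_nodes @ W_k                      # keys from background nodes, (N, d)
    v = bg_nodes @ W_v                      # values from background nodes, (N, d)
    scores = k @ q / np.sqrt(q.shape[0])    # scaled dot-product scores, (N,)
    attn = softmax(scores)                  # normalized attachment weights, (N,)
    context = attn @ v                      # aggregated context, (d,)
    return context, attn
```

In a full model the context vector would feed a head that regresses the placement location and size, and the dual-path design would train this module on both positive and negative annotated composites; none of that is reproduced in this sketch.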




Acknowledgements

The work is supported by Shanghai Municipal Science and Technology Key Project (Grant No. 20511100300), Shanghai Municipal Science and Technology Major Project, China (2021SHZDZX0102), and National Science Foundation of China (Grant No. 61902247).

Author information


Corresponding author

Correspondence to Li Niu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5581 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, S., Liu, L., Niu, L., Zhang, L. (2022). Learning Object Placement via Dual-Path Graph Completion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13677. Springer, Cham. https://doi.org/10.1007/978-3-031-19790-1_23


  • DOI: https://doi.org/10.1007/978-3-031-19790-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19789-5

  • Online ISBN: 978-3-031-19790-1

  • eBook Packages: Computer Science; Computer Science (R0)
