Compositing-Aware Image Search

  • Hengshuang Zhao
  • Xiaohui Shen
  • Zhe Lin
  • Kalyan Sunkavalli
  • Brian Price
  • Jiaya Jia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)


We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks. The compatibility of a foreground object and a background scene depends on various aspects such as semantics, surrounding context, geometry, style and color. However, existing image search techniques measure the similarities on only a few aspects, and may return many results that are not suitable for compositing. Moreover, the importance of each factor may vary for different object categories and image content, making it difficult to manually define the matching criteria. In this paper, we propose to learn feature representations for foreground objects and background scenes respectively, where image content and object category information are jointly encoded during training. As a result, the learned features can adaptively encode the most important compatibility factors. We project the features to a common embedding space, so that the compatibility scores can be easily measured using the cosine similarity, enabling very efficient search. We collect an evaluation set consisting of eight object categories commonly used in compositing tasks, on which we demonstrate that our approach significantly outperforms other search techniques.
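
The search step described above, ranking foreground candidates by cosine similarity to a background in a shared embedding space, can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some learned encoders (not shown here); the function names, the toy 3-dimensional vectors, and the `top_k` parameter are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_foregrounds(background_vec, foreground_vecs, top_k=3):
    """Return [(index, score), ...] for the top_k foregrounds, highest score first."""
    scored = [
        (index, cosine_similarity(background_vec, vec))
        for index, vec in enumerate(foreground_vecs)
    ]
    # Python's sort is stable, so ties keep their original candidate order.
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]


# Toy embeddings (hypothetical values, for illustration only).
background = [1.0, 0.0, 0.0]
foregrounds = [
    [2.0, 0.0, 0.0],   # same direction as the background
    [0.0, 3.0, 0.0],   # orthogonal to the background
    [1.0, 1.0, 0.0],   # 45 degrees from the background
    [-1.0, 0.0, 0.0],  # opposite direction
]

ranking = rank_foregrounds(background, foregrounds, top_k=3)
for index, score in ranking:
    print(f"foreground {index}: compatibility {score:.3f}")
```

Because cosine similarity ignores vector length, candidate 0 scores the same as it would at unit length. In a real system the foreground vectors would typically be normalized once offline, so that each query reduces to a batch of dot products.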




Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Hengshuang Zhao (1)
  • Xiaohui Shen (2)
  • Zhe Lin (3)
  • Kalyan Sunkavalli (3)
  • Brian Price (3)
  • Jiaya Jia (1, 4)

  1. The Chinese University of Hong Kong, Shatin, Hong Kong
  2. ByteDance AI Lab, Menlo Park, USA
  3. Adobe Research, San Jose, USA
  4. Tencent Youtu Lab, Shenzhen, China
