SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)

Abstract

Sketch-based image retrieval (SBIR) has been a popular research topic in recent years. Existing works concentrate on mapping the visual information of sketches and images to a semantic space at the object level. In this paper, for the first time, we study the fine-grained scene-level SBIR problem, which aims to retrieve scene images that satisfy the user’s specific requirements from a freehand scene sketch. We propose a graph-embedding-based method that learns a similarity measure between images and scene sketches and effectively models multi-modal information, including the size and appearance of objects as well as their layout. To evaluate our approach, we collect a dataset based on SketchyCOCO and extend it using COCO-Stuff. Comprehensive experiments demonstrate the significant potential of the proposed approach for fine-grained scene-level image retrieval.
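The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of comparing a scene sketch and an image through graph embeddings. It is not the authors' architecture. Every name here (`build_scene_graph`, `gcn_layer`, `graph_embedding`, `cosine_similarity`) is hypothetical, and so are the choices of box area as the size cue, exponential distance decay as the layout cue, a single graph-convolution step, mean pooling, and cosine similarity. Object boxes and appearance features are assumed to be given.

```python
import math

def build_scene_graph(boxes, appearance):
    """Turn a set of objects into a graph (illustrative assumptions only).

    boxes: list of [x_min, y_min, x_max, y_max] in normalized [0, 1] coordinates.
    appearance: list of per-object appearance feature vectors of equal length.
    Returns node features X (appearance plus box area) and a dense adjacency A.
    """
    X, centers = [], []
    for (x0, y0, x1, y1), feat in zip(boxes, appearance):
        area = (x1 - x0) * (y1 - y0)  # size cue
        X.append(list(feat) + [area])
        centers.append(((x0 + x1) / 2, (y0 + y1) / 2))
    # Layout cue: edge weight decays with the distance between box centres.
    # The diagonal is exp(0) = 1, which acts as a self-loop.
    A = [[math.exp(-math.dist(ci, cj)) for cj in centers] for ci in centers]
    return X, A

def gcn_layer(X, A, W):
    """One propagation step, ReLU(D^-1/2 A D^-1/2 X W)."""
    n, d_in, d_out = len(X), len(W), len(W[0])
    deg = [sum(row) for row in A]
    H = []
    for i in range(n):
        # Aggregate neighbour features with symmetric degree normalization.
        agg = [sum(A[i][j] / math.sqrt(deg[i] * deg[j]) * X[j][k] for j in range(n))
               for k in range(d_in)]
        # Linear projection followed by ReLU.
        H.append([max(0.0, sum(agg[k] * W[k][o] for k in range(d_in)))
                  for o in range(d_out)])
    return H

def graph_embedding(boxes, appearance, W):
    """Mean-pool node states into one vector per scene."""
    X, A = build_scene_graph(boxes, appearance)
    H = gcn_layer(X, A, W)
    return [sum(col) / len(H) for col in zip(*H)]

def cosine_similarity(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)

# Toy usage: made-up weights, 2 appearance dims + 1 area dim -> 2 output dims.
W = [[0.5, -0.2], [0.1, 0.3], [0.4, 0.6]]
sketch = graph_embedding([[0.1, 0.5, 0.4, 0.9], [0.6, 0.1, 0.9, 0.3]],
                         [[1.0, 0.0], [0.0, 1.0]], W)
image = graph_embedding([[0.15, 0.5, 0.45, 0.95], [0.55, 0.1, 0.9, 0.35]],
                        [[0.9, 0.1], [0.1, 0.9]], W)
print(cosine_similarity(sketch, image))
```

In a trained system the weights `W` would be learned, for example with a ranking loss over matching and non-matching sketch-image pairs. Here they are fixed by hand purely to make the sketch runnable. Mean pooling makes the embedding independent of the order in which objects are listed.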

Keywords

Sketch, Image retrieval, Graph convolutional network

Notes

Acknowledgements

This work was supported by the National Key Research and Development Plan (2016YFB1001200), Natural Science Foundation of China (61872346, 61725204, 61473276), Natural Science Foundation of Beijing (L182052), and Royal Society-Newton Advanced Fellowship (NA150431).

Supplementary material

Supplementary material 1: 504475_1_En_42_MOESM1_ESM.pdf (PDF, 19204 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. State Key Laboratory of Computer Science and Beijing Key Lab of Human-Computer Interaction, Institute of Software, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. HMI Laboratory, Huawei Technologies, Shenzhen, China
  4. Cardiff University, Cardiff, Wales
  5. Tsinghua University, Beijing, China