Approximate Query Matching for Graph-Based Holistic Image Retrieval

  • Abhijit Suprem
  • Duen Horng Chau
  • Calton Pu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10968)


Image retrieval has transitioned from retrieving images by single-object descriptions to retrieving images by complex natural-language descriptions of the desired image content. We present work on holistic image search that performs exact and approximate image retrieval, returning the images in a database that most closely match the user's description. Our approach handles queries ranging from simple single-object queries (e.g., cake) to more complex descriptions of multiple objects and the prepositional relations between them (e.g., girl eating cake with a fork on a plate), expressed in graph notation. In addition, our approach generalizes to retrieve images that are semantically similar to the query when no exact match is found. We use the scene graph, developed in the Visual Genome dataset, as a formalization of image content: a graph whose nodes are the objects in an image and whose edges are the relations between those objects. We combine this with approximate search techniques for large-scale graphs and a semantic scoring algorithm of our own to holistically retrieve images based on the given search criteria. We also present a method to store scene graphs and metadata in a graph database using Neo4j.
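As a rough illustration of the idea described above, the following minimal sketch represents each scene graph as a list of (subject, relation, object) triples and ranks images by how closely their triples match a query. It is not the scoring algorithm from the paper: the names `term_similarity`, `triple_similarity`, `score_image`, and `retrieve` are hypothetical, the `TOY_SIMILARITY` table is a hand-written stand-in for a learned semantic similarity measure, and single-object queries and Neo4j storage are left out for brevity.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical similarity table standing in for a learned semantic measure
# (for example, cosine similarity of word embeddings). Values are illustrative.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"girl", "woman"}): 0.8,
    frozenset({"cake", "pie"}): 0.7,
    frozenset({"eating", "holding"}): 0.4,
}


def term_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical terms, a table value for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def triple_similarity(q: Triple, t: Triple) -> float:
    """Average the term similarities of subject, relation, and object."""
    return sum(term_similarity(x, y) for x, y in zip(q, t)) / 3.0


def score_image(query: List[Triple], scene_graph: List[Triple]) -> float:
    """For each query triple take its best match in the image, then average."""
    if not query or not scene_graph:
        return 0.0
    best = [max(triple_similarity(q, t) for t in scene_graph) for q in query]
    return sum(best) / len(best)


def retrieve(query: List[Triple],
             database: Dict[str, List[Triple]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank images by score, highest first; an exact match scores 1.0."""
    scored = [(name, score_image(query, graph)) for name, graph in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy database of scene graphs keyed by a made-up image name.
DATABASE: Dict[str, List[Triple]] = {
    "img_exact": [("girl", "eating", "cake"), ("cake", "on", "plate")],
    "img_close": [("woman", "eating", "pie"), ("pie", "on", "plate")],
    "img_far": [("dog", "chasing", "ball")],
}

QUERY: List[Triple] = [("girl", "eating", "cake"), ("cake", "on", "plate")]
RESULTS = retrieve(QUERY, DATABASE)

if __name__ == "__main__":
    for name, score in RESULTS:
        print(f"{name}: {score:.3f}")
```

In this toy setup the image whose graph contains the query exactly ranks first, the semantically similar image ranks second with a score between 0 and 1, and the unrelated image scores 0. A real system would replace the lookup table with a learned similarity measure and the linear scan with an index suited to large graph collections.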


Image retrieval, Graph search, Approximate search, Scene graphs



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science, Georgia Institute of Technology, Atlanta, USA
  2. School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, USA
