The Visual Computer, Volume 33, Issue 6–8, pp 1049–1059

Rank-based voting with inclusion relationship for accurate image search

  • Jaehyeong Cho
  • Jae-Pil Heo
  • Taeyoung Kim
  • Bohyung Han
  • Sung-Eui Yoon (corresponding author)
Original Article


We present a rank-based voting technique that utilizes inclusion relationships for high-quality image search. Since images can have multiple regions of interest, we extract representative object regions using a state-of-the-art region proposal method tailored for our search problem. We then extract CNN features locally from those representative regions and identify the inclusion relationships between those regions. To identify similar images given a query, we propose a novel similarity measure based on representative regions and their inclusion relationships. Our similarity measure gives a high score to a pair of images that contain similar object regions with a similar spatial arrangement. To verify the benefits of our method, we test it on three standard benchmarks and compare it against state-of-the-art image search methods using CNN features. Our experimental results demonstrate the effectiveness and robustness of the proposed algorithm.
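
The abstract names two ingredients, inclusion relationships between regions and rank-based voting, without giving formulas. The following is only a minimal sketch of how each could look in isolation, not the authors' similarity measure. The function names `includes` and `rank_votes`, the 0.9 coverage threshold, the Euclidean distance, and the 1/rank vote weight are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def includes(outer, inner, thresh=0.9):
    """Return True if `outer` covers at least `thresh` of `inner`'s area.

    Boxes are (x1, y1, x2, y2). The threshold is an illustrative assumption.
    """
    ix1, iy1 = max(outer[0], inner[0]), max(outer[1], inner[1])
    ix2, iy2 = min(outer[2], inner[2]), min(outer[3], inner[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return area > 0 and inter / area >= thresh


def rank_votes(query_feats, db_regions, k=3):
    """Accumulate per-image votes from ranked region matches.

    `db_regions` is a list of (image_id, feature) pairs. Each query region
    ranks all database regions by Euclidean distance and gives 1/rank votes
    to the images that own its k nearest regions (an assumed weighting).
    """
    scores = defaultdict(float)
    for q in query_feats:
        ranked = sorted(db_regions, key=lambda r: dist(q, r[1]))
        for rank, (image_id, _) in enumerate(ranked[:k], start=1):
            scores[image_id] += 1.0 / rank
    return dict(scores)
```

In this sketch the two pieces are kept separate on purpose: how the paper combines region votes with the inclusion relationship to reward a similar spatial arrangement is not stated in the abstract, so no combination is attempted here.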


  • Accurate image search
  • Spatial relationship of image regions
  • Image search-based applications



This work was supported in part by MSIP/IITP R0126-16-1108, MI/KEIT 10070171 and DAPA/DITC (UC160003D).



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Jaehyeong Cho
    • 1
  • Jae-Pil Heo
    • 2
  • Taeyoung Kim
    • 1
  • Bohyung Han
    • 3
  • Sung-Eui Yoon
    • 1
    • Corresponding author
  1. KAIST, Daejeon, Republic of Korea
  2. Sungkyunkwan University, Seoul, Republic of Korea
  3. POSTECH, Pohang, Republic of Korea
