Machine Vision and Applications, Volume 30, Issue 2, pp 243–254

Graph-based particular object discovery

  • Oriane Siméoni
  • Ahmet Iscen
  • Giorgos Tolias
  • Yannis Avrithis
  • Ondřej Chum
Special Issue Paper


Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, which are popular due to their memory and search efficiency, are especially prone to corruption by such clutter. Eliminating the impact of clutter on the image descriptor increases the chance of retrieving relevant images and, when query expansion is used, prevents topic drift caused by retrieving the clutter itself. In this work, we propose a novel salient region detection method. It captures, in an unsupervised manner, patterns that are both discriminative and common in the dataset. Saliency is based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. The proposed method exploits recent CNN architectures trained for object retrieval to construct the image representation from the salient regions. We improve particular object retrieval on challenging datasets containing small objects.
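
The idea of scoring regions by graph centrality can be illustrated with a minimal sketch. The abstract does not state which centrality measure or graph construction is used, so everything below is an assumption made for illustration: a symmetric k-nearest-neighbor graph over cosine similarities, Katz centrality as the measure, and the hypothetical names `knn_affinity`, `katz_centrality`, `k` and `damping`. The toy descriptors stand in for regional CNN representations.

```python
import numpy as np


def knn_affinity(X, k=2):
    """Symmetric k-nearest-neighbor affinity matrix from row descriptors X (n x d)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalize each descriptor
    S = np.clip(X @ X.T, 0.0, None)                   # cosine similarity, negatives dropped
    np.fill_diagonal(S, 0.0)                          # no self-loops
    n = S.shape[0]
    k = min(k, n - 1)                                 # cannot have more neighbors than nodes
    nn = np.argsort(-S, axis=1)[:, :k]                # k most similar descriptors per row
    rows = np.repeat(np.arange(n), k)
    A = np.zeros_like(S)
    A[rows, nn.ravel()] = S[rows, nn.ravel()]         # keep only the kNN edges
    return np.maximum(A, A.T)                         # symmetrize


def katz_centrality(A, damping=0.5):
    """Solve (I - alpha * A) g = 1 with alpha = damping / lambda_max(A) (assumed choice)."""
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)[-1]                   # largest eigenvalue of symmetric A
    if lam <= 0:                                      # graph without edges: all scores are 1
        return np.ones(n)
    alpha = damping / lam                             # alpha < 1 / lambda_max keeps the system solvable
    return np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))


# Toy data: four near-duplicate "regions" plus four mutually orthogonal ones.
X = np.zeros((8, 6))
X[:4, 0] = 1.0
X[:4, 1] = [0.0, 0.1, 0.2, -0.1]
X[4:, 2:] = np.eye(4)

saliency = katz_centrality(knn_affinity(X, k=2))
print(np.round(saliency, 3))
```

In this toy setting the four near-duplicate descriptors reinforce each other through the graph and score above 1, while the four orthogonal descriptors have no edges and keep the baseline score of 1. This mirrors the notion of patterns that are common in the dataset receiving higher saliency, without claiming to reproduce the method of the paper.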


Image retrieval · Unsupervised object discovery · Image saliency



This work was supported by the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics.” The Tesla K40 used for this research was donated by the NVIDIA Corporation.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Inria, Univ Rennes, CNRS, IRISA, Rennes, France
  2. VRG, FEE, CTU in Prague, Prague, Czech Republic
