Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images


This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the vector of locally aggregated descriptors (VLAD) and matching techniques such as Hamming embedding. Bridging these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. The representation underpinning this kernel is approximated, yielding large-scale image search that is both precise and scalable, as shown by our experiments on several benchmarks. We show that the same aggregation procedure, originally applied per image, can effectively operate on groups of similar features found across multiple images. This method implicitly performs feature set augmentation while also reducing memory requirements. Finally, the proposed method is shown to be effective for place recognition, outperforming state-of-the-art methods on a large-scale landmark recognition benchmark.
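The combination of per-image aggregation with a selective match kernel can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the function names, the power-law selectivity function with parameters `alpha` and `tau`, the use of normalized residuals to visual-word centroids, and the toy data. It is not the authors' implementation, and it omits the approximated (binarized) representation the abstract mentions.

```python
import math

def selectivity(u, alpha=3.0, tau=0.0):
    # Assumed selectivity function: suppress similarities at or below tau,
    # and down-weight weak matches with a power law above it.
    if u <= tau:
        return 0.0
    return math.copysign(abs(u) ** alpha, u)

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def aggregate(descriptors, words, centroids):
    # Sum the normalized residuals of all descriptors assigned to the same
    # visual word, then L2-normalize each aggregated vector.
    sums = {}
    for d, w in zip(descriptors, words):
        residual = l2_normalize([a - b for a, b in zip(d, centroids[w])])
        acc = sums.setdefault(w, [0.0] * len(residual))
        for i, r in enumerate(residual):
            acc[i] += r
    return {w: l2_normalize(v) for w, v in sums.items()}

def similarity(agg_x, agg_y, alpha=3.0, tau=0.0):
    # Apply the selectivity function once per visual word shared by both
    # images, then normalize so that an image matches itself with score 1.
    def raw(a, b):
        return sum(
            selectivity(sum(p * q for p, q in zip(a[w], b[w])), alpha, tau)
            for w in a.keys() & b.keys()
        )
    norm = math.sqrt(raw(agg_x, agg_x) * raw(agg_y, agg_y))
    return raw(agg_x, agg_y) / norm if norm > 0 else 0.0

# Toy example: two visual words in a 2-D descriptor space (hypothetical data).
centroids = [[0.0, 0.0], [10.0, 10.0]]
agg_x = aggregate([[1.0, 0.0], [0.0, 1.0]], [0, 0], centroids)
agg_y = aggregate([[11.0, 10.0]], [1], centroids)
agg_z = aggregate([[1.0, 0.0]], [0], centroids)
```

In this sketch, aggregating before applying the nonlinearity means that several descriptors falling in the same visual word contribute one vector per word rather than one match each, which is also why the per-word storage shrinks.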



  1. This is in contrast to our previous work (Tolias et al. 2013), where we have combined ASMK\(^\star\) with the geometry-based variant.


  1. Arandjelović, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In CVPR.

  2. Arandjelović, R., & Zisserman, A. (2013). All about VLAD. In CVPR.

  3. Arandjelović, R., & Zisserman, A. (2014). DisLocation: Scalable descriptor distinctiveness for location recognition. In ACCV.

  4. Avrithis, Y., Kalantidis, Y., Tolias, G., & Spyrou, E. (2010). Retrieving landmark and non-landmark images from community photo collections. In ACM Multimedia.

  5. Bo, L., & Sminchisescu, C. (2009). Efficient match kernel between sets of features for visual recognition. In NIPS.

  6. Boureau, Y., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In CVPR.

  7. Charikar, M. (2002). Similarity estimation techniques from rounding algorithms. In ACM Symposium on Theory of Computing.

  8. Chen, D. M., Baatz, G., Koser, K., Tsai, S. S., Vedantham, R., Pylvanainen, T., Roimela, K., Chen, X., Bach, J., Pollefeys, M., Girod, B., & Grzeszczuk, R. (2011). City-scale landmark identification on mobile devices. In CVPR.

  9. Chum, O., Mikulik, A., Perdoch, M., & Matas, J. (2011). Total recall II: Query expansion revisited. In CVPR.

  10. Chum, O., Philbin, J., Sivic, J., Isard, M., & Zisserman, A. (2007). Total recall: Automatic query expansion with a generative feature model for object retrieval. In ICCV.

  11. Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV Workshop Statistical Learning in Computer Vision.

  12. Danfeng, Q., Gammeter, S., Bossard, L., Quack, T., & Gool, L. V. (2011). Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors. In CVPR.

  13. Delhumeau, J., Gosselin, P. H., Jégou, H., & Pérez, P. (2013). Revisiting the VLAD image representation. In ACM Multimedia.

  14. Delvinioti, A., Jégou, H., Amsaleg, L., & Houle, M. E. (2014). Image retrieval with reciprocal and shared nearest neighbors. In VISAPP.

  15. Hays, J., & Efros, A. A. (2008). Im2gps: estimating geographic information from a single image. In CVPR.

  16. Jain, M., Benmokhtar, R., Gros, P., & Jégou, H. (2012). Hamming embedding similarity-based image classification. In ICMR.

  17. Jain, M., Jégou, H., & Gros, P. (2011). Asymmetric Hamming embedding: Taking the best of our bits for large scale image search. In ACM Multimedia.

  18. Jégou, H., Douze, M., & Schmid, C. (2008). Hamming embedding and weak geometric consistency for large scale image search. In ECCV.

  19. Jégou, H., Douze, M., & Schmid, C. (2009). On the burstiness of visual elements. In CVPR.

  20. Jégou, H., Douze, M., & Schmid, C. (2010). Improving bag-of-features for large scale image search. IJCV, 87(3), 316–336.

  21. Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128.

  22. Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In CVPR.

  23. Ji, R., Duan, L., Chen, J., Yao, H., Yuan, J., Rui, Y., & Gao, W. (2012). Location discriminative vocabulary coding for mobile landmark search. IJCV, 1–25.

  24. Johns, E., & Yang, G. Z. (2011). From images to scenes: Compressing an image cluster into a single scene model for place recognition. In ICCV.

  25. Kalantidis, Y., & Avrithis, Y. (2014). Locally optimized product quantization for approximate nearest neighbor search. In CVPR.

  26. Knopp, J., Sivic, J., & Pajdla, T. (2010). Avoiding confusing features in place recognition. In ECCV.

  27. Li, Y., Crandall, D. J., & Huttenlocher, D. P. (2009). Landmark classification in large-scale image collections. In ICCV.

  28. Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2), 91–110.

  29. Mikolajczyk, K., & Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630.

  30. Mikulik, A., Perdoch, M., Chum, O., & Matas, J. (2013). Learning vocabularies over a fine quantization. IJCV, 103(1), 163–175.

  31. Nistér, D., & Stewénius, H. (2006). Scalable recognition with a vocabulary tree. In CVPR (pp. 2161–2168).

  32. Perdoch, M., Chum, O., & Matas, J. (2009). Efficient representation of local geometry for large scale object retrieval. In CVPR.

  33. Perronnin, F., & Dance, C. R. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR.

  34. Perronnin, F., Liu, Y., Sanchez, J., & Poirier, H. (2010). Large-scale image retrieval with compressed Fisher vectors. In CVPR.

  35. Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In ECCV.

  36. Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In CVPR.

  37. Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008). Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR.

  38. Qin, D., Wengert, C., & Van Gool, L. (2013). Query adaptive similarity for large scale object retrieval. In CVPR.

  39. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513–523.

  40. Schindler, G., Brown, M., & Szeliski, R. (2007). City-scale location recognition. In CVPR.

  41. Shen, X., Lin, Z., Brandt, J., Avidan, S., & Wu, Y. (2012). Object retrieval and localization with spatially-constrained similarity measure and k-nn re-ranking. In CVPR.

  42. Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.

  43. Tao, R., Gavves, E., Snoek, C. G., & Smeulders, A. W. (2014). Locality in generic instance search from one example. In CVPR.

  44. Tolias, G., & Avrithis, Y. (2011). Speeded-up, relaxed spatial matching. In ICCV.

  45. Tolias, G., Avrithis, Y., & Jégou, H. (2013). To aggregate or not to aggregate: selective match kernels for image search. In ICCV.

  46. Tolias, G., & Jégou, H. (2014). Visual query expansion with or without geometry: Refining local descriptors by feature aggregation. Pattern Recognition.

  47. Torii, A., Sivic, J., Pajdla, T., & Okutomi, M. (2013). Visual place recognition with repetitive structures. In CVPR.

  48. Torralba, A., Fergus, R., & Weiss, Y. (2008). Small codes and large databases for recognition. In CVPR.

  49. Turcot, P., & Lowe, D. G. (2009). Better matching with fewer features: The selection of useful features in large database recognition problems. In CVPR.

  50. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.

  51. Wu, Z., Ke, Q., Isard, M., & Sun, J. (2009). Bundling features for large scale partial-duplicate web image search. In CVPR (pp. 25–32).

  52. Zhang, S., Yang, M., Cour, T., Yu, K., & Metaxas, D. N. (2012). Query specific fusion for image retrieval. In ECCV.



This work was supported by ERC Grant Viamass No. 336054 and ANR project Fire-ID.

Author information

Correspondence to Giorgos Tolias.

Additional information

Communicated by Riad I. Hammoud, Josef Sivic, Larry S. Davis, Marc Pollefeys.


Cite this article

Tolias, G., Avrithis, Y. & Jégou, H. Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images. Int J Comput Vis 116, 247–261 (2016).



  • Image retrieval
  • Match kernels
  • Feature aggregation
  • Feature augmentation
  • Query expansion
  • Place recognition