International Journal of Computer Vision, Volume 116, Issue 3, pp 247–261

Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images

  • Giorgos Tolias
  • Yannis Avrithis
  • Hervé Jégou


This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the vector of locally aggregated descriptors (VLAD) and matching techniques such as Hamming embedding. Bridging these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. The representation underpinning this kernel is approximated, providing large-scale image search that is both precise and scalable, as shown by our experiments on several benchmarks. We show that the same aggregation procedure, originally applied per image, can effectively operate on groups of similar features found across multiple images. This implicitly performs feature set augmentation while also reducing memory requirements. Finally, the proposed method is shown to be effective for place recognition, outperforming state-of-the-art methods on a large-scale landmark recognition benchmark.
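The idea of combining aggregation with a selective match kernel can be illustrated with a small sketch. It assumes details the abstract does not state: each descriptor is assigned to its nearest centroid in a toy codebook, residuals x − q(x) are summed and ℓ2-normalized per visual word, and every visual word shared by two images contributes a power-law selectivity score that is zero at or below a threshold. The function names, the parameters `alpha` and `tau` with their default values, and the sign binarization with a Hamming-based similarity (one assumed way to approximate the representation) are all hypothetical, chosen for illustration rather than taken from the paper.

```python
import math

# Toy sketch of an aggregated selective match kernel. All names and default
# parameter values are illustrative assumptions, not the paper's code.


def nearest_word(x, codebook):
    """Index of the closest centroid under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))


def l2_normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def aggregate(descriptors, codebook):
    """Sum residuals x - q(x) per visual word, then l2-normalize each sum."""
    sums = {}
    for x in descriptors:
        k = nearest_word(x, codebook)
        acc = sums.setdefault(k, [0.0] * len(x))
        for i, (a, c) in enumerate(zip(x, codebook[k])):
            acc[i] += a - c
    return {k: l2_normalize(v) for k, v in sums.items()}


def binarize(rep):
    """Assumed approximation: keep one sign bit per component."""
    return {k: [1 if a > 0 else 0 for a in v] for k, v in rep.items()}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hamming_similarity(u, v):
    """Map Hamming distance h over B bits to 1 - 2h/B, which lies in [-1, 1]."""
    h = sum(a != b for a, b in zip(u, v))
    return 1.0 - 2.0 * h / len(u)


def selectivity(u, alpha=3.0, tau=0.0):
    """Power-law selective function: similarities <= tau contribute nothing."""
    return math.copysign(abs(u) ** alpha, u) if u > tau else 0.0


def kernel(rep_x, rep_y, sim, alpha=3.0, tau=0.0):
    """Sum selective similarities over visual words present in both images."""
    shared = rep_x.keys() & rep_y.keys()
    return sum(selectivity(sim(rep_x[k], rep_y[k]), alpha, tau) for k in shared)


def normalized_kernel(rep_x, rep_y, sim, alpha=3.0, tau=0.0):
    """Scale by self-similarities so that an image compared with itself scores 1."""
    sxx = kernel(rep_x, rep_x, sim, alpha, tau)
    syy = kernel(rep_y, rep_y, sim, alpha, tau)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return kernel(rep_x, rep_y, sim, alpha, tau) / math.sqrt(sxx * syy)


codebook = [[0.0, 0.0], [10.0, 10.0]]
image_a = aggregate([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]], codebook)
image_b = aggregate([[-1.0, 0.0], [0.0, -1.0]], codebook)
print(normalized_kernel(image_a, image_a, dot))
print(normalized_kernel(image_a, image_b, dot))
print(normalized_kernel(binarize(image_a), binarize(image_a), hamming_similarity))
```

With this toy `codebook`, `image_a` scores 1 against itself (up to floating-point rounding) and 0 against `image_b`, whose aggregated residual in the shared cell points the opposite way, so the selectivity function discards that match. Passing the concatenated descriptors of several images to `aggregate` yields one vector per visual word for the whole group, which is only a loose illustration of aggregating across multiple images; how groups of similar features are actually found is not modeled here.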


Keywords: Image retrieval, Match kernels, Feature aggregation, Feature augmentation, Query expansion, Place recognition



This work was supported by ERC Grant Viamass No. 336054 and ANR project Fire-ID.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Giorgos Tolias (1)
  • Yannis Avrithis (2)
  • Hervé Jégou (1)

  1. INRIA, Rennes, France
  2. NTUA, Athens, Greece
