International Journal of Computer Vision, Volume 87, Issue 3, pp 316–336

Improving Bag-of-Features for Large Scale Image Search

Abstract

This article improves recent methods for large scale image search. We first analyze the bag-of-features approach in the framework of approximate nearest neighbor search. This leads us to derive a more precise representation based on Hamming embedding (HE) and weak geometric consistency constraints (WGC). HE provides binary signatures that refine the matching based on visual words. WGC filters out matching descriptors that are not consistent in terms of angle and scale. HE and WGC are integrated within an inverted file and are efficiently exploited for all images in the dataset. We then introduce a graph-structured quantizer which significantly speeds up the assignment of the descriptors to visual words. A comparison with the state of the art shows the benefit of our approach when high accuracy is needed.
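To make the two ideas concrete, here is a minimal sketch, not the authors' implementation. It assumes each descriptor is reduced to a tuple `(visual_word, signature, angle, log_scale)`, that a match requires the same visual word and a Hamming distance at or below a threshold, and that geometric consistency is scored as the smaller of the two histogram peaks over angle and log-scale differences. The function names, bin counts, and threshold are illustrative assumptions; the brute-force loop stands in for the inverted file.

```python
import math
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary signatures stored as ints."""
    return bin(a ^ b).count("1")


def he_matches(query, database, threshold):
    """Return (angle difference, log-scale difference) for each accepted match.

    A pair matches when both descriptors fall in the same visual word AND
    their binary signatures are within `threshold` bits of each other.
    The nested loop is for clarity only; it is not an inverted file.
    """
    matches = []
    for q_word, q_sig, q_angle, q_scale in query:
        for d_word, d_sig, d_angle, d_scale in database:
            if q_word == d_word and hamming(q_sig, d_sig) <= threshold:
                matches.append((d_angle - q_angle, d_scale - q_scale))
    return matches


def wgc_score(matches, angle_bins=8, scale_bins=8, max_log_scale=3.0):
    """Score an image by how many matches agree on one rotation and one scale change.

    Votes are accumulated in two coarse histograms; the score is the smaller
    of the two peaks (an assumption of this sketch), so matches with
    inconsistent angle or scale differences contribute little.
    """
    if not matches:
        return 0
    angle_hist, scale_hist = Counter(), Counter()
    for d_angle, d_scale in matches:
        wrapped = d_angle % (2 * math.pi)
        a_bin = int(wrapped / (2 * math.pi) * angle_bins) % angle_bins
        clipped = min(max(d_scale, -max_log_scale), max_log_scale)
        s_bin = int((clipped + max_log_scale) / (2 * max_log_scale) * scale_bins)
        angle_hist[a_bin] += 1
        scale_hist[min(s_bin, scale_bins - 1)] += 1
    return min(max(angle_hist.values()), max(scale_hist.values()))
```

In this sketch, the Hamming test prunes descriptors that share a visual word but lie far apart inside its cell, and the histogram vote then rewards only those surviving matches that agree on a single rotation and scale change.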

Experiments performed on three reference datasets and a dataset of one million images show that the binary signature and the weak geometric consistency constraints yield a significant improvement and are efficient. Estimation of the full geometric transformation, i.e., a re-ranking step on a short-list of images, is shown to be complementary to our weak geometric consistency constraints. Our approach is shown to outperform the state of the art on the three datasets.

Keywords

Image retrieval; Nearest neighbor search; Object recognition; Image search

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Hervé Jégou (1)
  • Matthijs Douze (1)
  • Cordelia Schmid (1)

  1. INRIA Grenoble Rhône-Alpes, Saint-Ismier Cedex, France
