Abstract
A novel similarity measure for bag-of-words-type large-scale image retrieval is presented. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method, and is more discriminative than both L2-based soft assignment and Hamming embedding. The novel similarity function achieves a mean average precision superior to any result published in the literature on the standard Oxford 5k, Oxford 105k and Paris datasets/protocols. We study the effect of fine quantization and very large vocabularies (up to 64 million words) and show that the performance of specific object retrieval increases with the size of the vocabulary. This observation contradicts previously published results. We further demonstrate that large vocabularies increase the speed of the tf-idf scoring step.
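The speed claim rests on how tf-idf scoring is usually carried out with an inverted file (Sivic and Zisserman 2003): a query touches only the posting lists of its own visual words, so its cost grows with the length of those lists, and a finer vocabulary makes them shorter. The sketch below is a minimal illustration of standard tf-idf scoring under that assumption, not the learned similarity proposed in this paper. The function names `build_index` and `query`, the random synthetic data, and the simulation of a coarser vocabulary by merging fine word ids (`w // 100`) are hypothetical choices made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def build_index(images):
    """Build an inverted file from a list of images, each a list of visual-word ids."""
    n_images = len(images)
    counts = [Counter(words) for words in images]
    # Document frequency: number of images containing each visual word.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {w: math.log(n_images / df[w]) for w in df}
    # Posting list per word: (image_id, L2-normalised tf-idf weight).
    index = defaultdict(list)
    for image_id, c in enumerate(counts):
        weights = {w: tf * idf[w] for w, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        for w, v in weights.items():
            index[w].append((image_id, v / norm))
    return index, idf


def query(index, idf, words):
    """Score images by cosine similarity of tf-idf vectors; also count postings visited."""
    c = Counter(words)
    weights = {w: tf * idf.get(w, 0.0) for w, tf in c.items()}
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    scores = defaultdict(float)
    visited = 0
    for w, qv in weights.items():
        # Only the posting lists of the query's own words are traversed.
        for image_id, dv in index.get(w, []):
            scores[image_id] += (qv / norm) * dv
            visited += 1
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, visited


# Synthetic database: 200 images, 50 words each, from a 10,000-word vocabulary.
rng = random.Random(0)
db = [[rng.randrange(10_000) for _ in range(50)] for _ in range(200)]
fine_index, fine_idf = build_index(db)

# Simulated coarse vocabulary of 100 words obtained by merging fine word ids.
coarse_db = [[w // 100 for w in img] for img in db]
coarse_index, coarse_idf = build_index(coarse_db)

fine_ranked, fine_visited = query(fine_index, fine_idf, db[0])
coarse_ranked, coarse_visited = query(coarse_index, coarse_idf, coarse_db[0])
print("fine postings visited:", fine_visited)
print("coarse postings visited:", coarse_visited)
```

On this toy data the coarse vocabulary traverses far more postings for the same query image, which illustrates the mechanism behind the speed-up; the measured gains reported in the paper come from real datasets and are not reproduced by this example.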
Notes
We consider and compare only with methods that support queries covering just a (small) part of the test image. Global methods such as GIST (Oliva and Torralba 2006) achieve a much smaller memory footprint, at the cost of supporting whole-image queries only.
In the Holidays dataset presented by Jégou et al. (2008), about 5–10 % of the images are rotated unnaturally for a human observer. Because a rotation-variant feature descriptor was used in our experiments, we report performance on a version of the dataset in which image orientation was corrected according to EXIF or, where the EXIF information is missing and the correct (sky-is-up) orientation is obvious, manually (by 90°, 180° or 270°).
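The EXIF part of this orientation correction can be sketched as follows. This is a minimal, hypothetical helper (`upright`, `rotate_90_cw` and `EXIF_ROTATION_CW` are illustrative names, not taken from the paper's released code) that treats an image as a nested list of pixel rows and handles only the EXIF Orientation values that encode a pure rotation (1, 3, 6 and 8); mirrored orientations and the manual correction of images without EXIF data are not covered. A real pipeline would more likely read the tag and rotate with an imaging library, for example Pillow's `ImageOps.exif_transpose`.

```python
# Clockwise rotation (degrees) needed to display an image upright, keyed by the
# EXIF Orientation tag value. Values 2, 4, 5 and 7 also involve a mirror flip
# and are deliberately left out of this sketch.
EXIF_ROTATION_CW = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_90_cw(pixels):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def upright(pixels, orientation):
    """Return the image rotated so that it is upright according to its EXIF tag."""
    if orientation not in EXIF_ROTATION_CW:
        raise ValueError(f"unsupported EXIF orientation: {orientation}")
    for _ in range(EXIF_ROTATION_CW[orientation] // 90):
        pixels = rotate_90_cw(pixels)
    return pixels


img = [[1, 2, 3],
       [4, 5, 6]]
print(upright(img, 6))  # [[4, 1], [5, 2], [6, 3]]
```

Correcting orientation before feature extraction matters here only because the descriptor is rotation-variant; with a rotation-invariant descriptor the step would be unnecessary.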
References
Agarwal, S., Snavely, N., Simon, I., Seitz, S., & Szeliski, R. (2009). Building Rome in a day. In Proceedings of ICCV, Kyoto.
Avrithis, Y., & Kalantidis, Y. (2012). Approximate Gaussian mixtures for large scale vocabularies. In Proceedings of ECCV, Florence.
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: ACM Press (ISBN: 020139829).
Cech, J., Matas, J., & Perdoch, M. (2008). Efficient sequential correspondence selection by cosegmentation. In Proceedings of CVPR, Anchorage.
Chum, O., & Matas, J. (2010). Large-scale discovery of spatially related images. IEEE PAMI, 32, 371–377.
Chum, O., Perdoch, M., & Matas, J. (2009). Geometric min-hashing: Finding a (thick) needle in a haystack. In Proceedings of CVPR, Miami.
Chum, O., Philbin, J., Sivic, J., Isard, M., & Zisserman, A. (2007). Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proceedings of ICCV, Rio de Janeiro.
Duda, R., Hart, P., & Stork, D. (1995). Pattern classification and scene analysis (2nd ed.). New York: Wiley.
Ferrari, V., Tuytelaars, T., & Van Gool, L. (2004). Simultaneous object recognition and segmentation by image exploration. In Proceedings of ECCV, Prague.
Fraundorfer, F., Stewénius, H., & Nistér, D. (2007). A binning scheme for fast hard drive based image search. In Proceedings of CVPR, Minneapolis.
Godsil, C., & Royle, G. (2001). Algebraic graph theory. New York: Springer.
Hua, G., Brown, M., & Winder, S. (2007). Discriminant embedding for local image descriptors. In Proceedings of ICCV, Rio de Janeiro.
Jégou, H., Douze, M., & Schmid, C. (2008). Hamming embedding and weak geometric consistency for large scale image search. In Proceedings of ECCV, Marseille.
Jégou, H., Douze, M., & Schmid, C. (2009). On the burstiness of visual elements. In Proceedings of CVPR, Miami.
Jégou, H., Douze, M., & Schmid, C. (2010). Improving bag-of-features for large scale image search. IJCV, 87(3), 316–336.
Li, X., Wu, C., Zach, C., Lazebnik, S., & Frahm, J.-M. (2008). Modeling and recognition of landmark image collections using iconic scene graphs. In Proceedings of ECCV, Marseille.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2), 91–110.
Makadia, A. (2010). Feature tracking for wide-baseline image retrieval. Berlin: Springer.
Mikolajczyk, K., & Matas, J. (2007). Improving SIFT for fast tree matching by optimal linear projection. In Proceedings of ICCV, Rio de Janeiro.
Mikulik, A., Perdoch, M., Chum, O., & Matas, J. (2010). Learning a fine vocabulary. In K. Daniilidis, P. Maragos, & N. Paragios (Eds.), Proceedings of ECCV, Lecture Notes in Computer Science (Vol. 6313, pp. 1–14). Heidelberg: Springer.
Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. In Proceedings of VISAPP.
Nistér, D., & Stewénius, H. (2006). Scalable recognition with a vocabulary tree. In Proceedings of CVPR, New York.
Oliva, A., & Torralba, A. (2006). Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research: Visual Perception, 155, 23–36.
Perdoch, M., Chum, O., & Matas, J. (2009). Efficient representation of local geometry for large scale object retrieval. In Proceedings of CVPR, Miami.
Perronnin, F. (2008). Universal and adapted vocabularies for generic visual categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1243–1256.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In Proceedings of CVPR, Minneapolis.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008). Lost in quantization: Improving particular object retrieval in large scale image databases. In Proceedings of CVPR, Anchorage.
Project page (2012). Data, binaries, and source codes released with the paper. http://cmp.felk.cvut.cz/qqmikula/publications/ijcv2012/index.html.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In Proceedings of ICCV, Nice (pp. 1470–1477).
Tavenard, R., Amsaleg, L., & Jégou, H. (2010). Balancing clusters to reduce response time variability in large scale image search. Research Report RR-7387, INRIA.
Cite this article
Mikulik, A., Perdoch, M., Chum, O. et al. Learning Vocabularies over a Fine Quantization. Int J Comput Vis 103, 163–175 (2013). https://doi.org/10.1007/s11263-012-0600-1
DOI: https://doi.org/10.1007/s11263-012-0600-1