International Journal of Computer Vision, Volume 103, Issue 1, pp 163–175

Learning Vocabularies over a Fine Quantization

  • Andrej Mikulik
  • Michal Perdoch
  • Ondřej Chum
  • Jiří Matas


A novel similarity measure for bag-of-words type large-scale image retrieval is presented. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method, and is more discriminative than both L2-based soft assignment and Hamming embedding. It achieves a mean average precision superior to any result published in the literature on the standard Oxford 5k, Oxford 105k and Paris datasets/protocols. We study the effect of a fine quantization and very large vocabularies (up to 64 million words) and show that the performance of specific object retrieval increases with the size of the vocabulary. This observation contradicts previously published results. We further demonstrate that large vocabularies increase the speed of the tf-idf scoring step.
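As a rough illustration of the tf-idf scoring step mentioned in the abstract, the following is a minimal sketch of standard bag-of-words scoring over an inverted file. It is not the learned similarity measure proposed in the paper; the function names `build_index` and `score`, the data layout, and the cosine normalisation are assumptions made for this example. The returned count of visited postings hints at why a finer vocabulary can make scoring faster: each query word touches only the images that contain it, and with more words each list tends to be shorter.

```python
import math
from collections import Counter, defaultdict


def build_index(db_words):
    """db_words: {image_id: [visual word ids]} -> (postings, idf, norms).

    Hypothetical layout chosen for this sketch, not taken from the paper.
    """
    n_images = len(db_words)
    postings = defaultdict(list)  # word id -> [(image_id, term frequency)]
    for image_id, words in db_words.items():
        for word, tf in Counter(words).items():
            postings[word].append((image_id, tf))
    # idf: words that occur in few images are weighted more heavily.
    idf = {w: math.log(n_images / len(p)) for w, p in postings.items()}
    # L2 norm of each image's tf-idf vector, used for cosine normalisation.
    squared = defaultdict(float)
    for w, p in postings.items():
        for image_id, tf in p:
            squared[image_id] += (tf * idf[w]) ** 2
    norms = {i: math.sqrt(s) for i, s in squared.items()}
    return postings, idf, norms


def score(query_words, postings, idf, norms):
    """Return ({image_id: cosine similarity}, number of postings visited)."""
    # Query words absent from the database vocabulary are ignored.
    q = {w: tf * idf[w] for w, tf in Counter(query_words).items() if w in idf}
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    acc = defaultdict(float)
    visited = 0
    for w, q_weight in q.items():
        # Only images that contain word w are touched.
        for image_id, tf in postings[w]:
            acc[image_id] += q_weight * tf * idf[w]
            visited += 1
    scores = {i: s / (q_norm * norms[i]) for i, s in acc.items()
              if q_norm > 0 and norms[i] > 0}
    return scores, visited


# Toy database: three images described by a handful of visual word ids.
db = {"a": [1, 2, 2, 5], "b": [1, 3], "c": [4, 5, 5]}
postings, idf, norms = build_index(db)
scores, visited = score([2, 5], postings, idf, norms)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the work done per query is proportional to `visited`, the total length of the posting lists of the query's words, rather than to the number of database images.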


Keywords: Image retrieval, Vocabulary, Feature track



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Andrej Mikulik (1)
  • Michal Perdoch (1)
  • Ondřej Chum (1)
  • Jiří Matas (1)

  1. CMP, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
