VocMatch: Efficient Multiview Correspondence for Structure from Motion

  • Michal Havlena
  • Konrad Schindler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8691)


Feature matching between pairs of images is a main bottleneck of structure-from-motion computation from large, unordered image sets. We propose an efficient way to establish point correspondences between all pairs of images in a dataset, without having to test each individual pair. The principal message of this paper is that, given a sufficiently large visual vocabulary, feature matching can be cast as image indexing, subject to the additional constraints that index words must be rare in the database and unique in each image. We demonstrate that the proposed matching method, in conjunction with a standard inverted file, is 2-3 orders of magnitude faster than conventional pairwise matching. The proposed vocabulary-based matching has been integrated into a standard SfM pipeline, and delivers results similar to those of the conventional method in much less time.
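The central idea of the abstract, matching by indexing with words that are unique per image and rare in the database, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes every feature has already been quantized to a visual word id. The function name `match_by_vocabulary` and the rarity threshold `max_images_per_word` are hypothetical choices made for this illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def match_by_vocabulary(image_words, max_images_per_word=3):
    """Sketch of vocabulary-based matching via an inverted file.

    image_words: dict mapping image id -> list of visual word ids,
                 one entry per feature (list index = feature index).
    max_images_per_word: hypothetical rarity threshold; words seen in
                 more images than this are ignored.
    Returns a dict mapping (image_a, image_b) -> list of
    (feature_index_a, feature_index_b) tentative correspondences.
    """
    # Build the inverted file: word id -> list of (image id, feature index).
    inverted_file = defaultdict(list)
    for image_id, words in image_words.items():
        counts = Counter(words)
        for feature_idx, word in enumerate(words):
            # Uniqueness constraint: keep a word only if it occurs
            # exactly once in this image.
            if counts[word] == 1:
                inverted_file[word].append((image_id, feature_idx))

    # Read correspondences directly off the postings lists.
    matches = defaultdict(list)
    for postings in inverted_file.values():
        # Rarity constraint: skip words seen in too many images,
        # and words seen in a single image (nothing to match).
        if len(postings) < 2 or len(postings) > max_images_per_word:
            continue
        for (img_a, feat_a), (img_b, feat_b) in combinations(postings, 2):
            matches[(img_a, img_b)].append((feat_a, feat_b))
    return dict(matches)


example = {
    "a": [10, 11, 12, 12],  # word 12 is not unique in image "a"
    "b": [11, 10, 13],
    "c": [10, 14, 12],
}
tentative = match_by_vocabulary(example, max_images_per_word=3)
```

No image pair is ever tested explicitly: the work is proportional to the size of the postings lists rather than to the number of image pairs. In an SfM pipeline the tentative correspondences would still need geometric verification.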


Keywords: Feature matching, Image clustering, Structure from motion

Supplementary material

978-3-319-10578-9_4_MOESM1_ESM.pdf: Electronic Supplementary Material (PDF, 8,385 KB)
978-3-319-10578-9_4_MOESM2_ESM.wrl: Electronic Supplementary Material (WRL, 12,368 KB)
978-3-319-10578-9_4_MOESM3_ESM.wrl: Electronic Supplementary Material (WRL, 26,192 KB)



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Michal Havlena (1)
  • Konrad Schindler (1)
  1. Institute of Geodesy and Photogrammetry, ETH Zürich, Switzerland
