Recovering the 3D Geometry of Heritage Monuments from Image Collections



Several methods have been proposed for large-scale 3D reconstruction from large, unorganized image collections. A large reconstruction problem is typically divided into multiple components, which are reconstructed independently using structure from motion (SFM) and later merged. Incremental SFM methods are the most popular choice for the basic structure recovery of a single component. They are robust and effective but strictly sequential. We present a multistage approach for SFM reconstruction of a single component that breaks the sequential nature of incremental SFM methods. Our approach begins by quickly building a coarse 3D model using only a fraction of the features from the given images. In subsequent stages, the coarse model is enriched by localizing the remaining images and by matching and triangulating the remaining features. The geometric information available in the coarse model makes these stages effective, efficient, and highly parallel. We show that our method produces models of quality similar to those of standard SFM methods while being notably faster and more parallel.
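As a rough illustration of the first stage, the following Python sketch splits one image's features into a small subset for the coarse model and a remainder for the later stages. The abstract only says that "a fraction of features" is used; ranking features by scale, the `Feature` type, the `split_features` name, and the 20% value in the usage lines are all assumptions made for this sketch, not details taken from the text.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A detected keypoint. The fields are illustrative assumptions."""
    x: float
    y: float
    scale: float


def split_features(
    features: List[Feature], fraction: float
) -> Tuple[List[Feature], List[Feature]]:
    """Split one image's features into a coarse subset and the remainder.

    Assumption of this sketch: features are ranked by descending scale and
    the top `fraction` of them feed the coarse reconstruction.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(features, key=lambda f: f.scale, reverse=True)
    # Keep at least one feature for any non-empty image.
    keep = max(1, int(len(ranked) * fraction)) if ranked else 0
    return ranked[:keep], ranked[keep:]


# Usage: ten synthetic features whose scale equals their index.
features = [Feature(float(i), 0.0, float(i)) for i in range(10)]
coarse, remaining = split_features(features, 0.2)
```

Here `coarse` would go to the coarse reconstruction, while `remaining` would be matched and triangulated in later stages once the cameras are localized. Because each image is split independently, this step could be run in parallel across images.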



This work is supported by the Google India PhD Fellowship and the India Digital Heritage Project of the Department of Science and Technology, India. We thank Vanshika Srivastava for her contributions to the project and Chris Sweeney for his crucial help with the use of Theia in our experiments. We also thank the authors of [8] for sharing the details of the Hampi Vitthala Temple dataset they used.


  1. Agarwal S, Mierle K, others (2010) Ceres solver.
  2. Agarwal S, Snavely N, Seitz SM, Szeliski R (2010) Bundle adjustment in the large. In: Proceedings ECCV
  3. Agarwal S, Snavely N, Simon I, Seitz SM, Szeliski R (2009) Building rome in a day. In: Proceedings ICCV
  4. Agrawal A, Raskar R, Chellappa R (2006) What is the range of surface reconstructions from a gradient field? In: Proceedings of the European Conference on Computer Vision
  5. Aharon M, Elad M, Bruckstein A (2006) K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process
  6. Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY (1998) An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J ACM 45(6)
  7. Barron J, Malik J (2012) Color constancy, intrinsic images, and shape estimation. In: Proceedings of the European Conference on Computer Vision
  8. Bhowmick B, Patra S, Chatterjee A, Govindu V, Banerjee S (2014) Divide and conquer: Efficient large-scale structure from motion using graph partitioning. In: Proceedings ACCV, pp 273–287
  9. Brown M, Lowe D (2005) Unsupervised 3d object recognition and reconstruction in unordered datasets. In: 3-D Digital Imaging and Modeling
  10. Byröd M, Åström K (2010) Conjugate Gradient Bundle Adjustment
  11. Cao S, Snavely N (2012) Learning to match images in large-scale collections. In: Proceedings ECCV Workshop
  12. Chatterjee A, Govindu VM (2013) Efficient and robust large-scale rotation averaging. In: 2013 IEEE ICCV
  13. Choudhary S, Narayanan P (2012) Visibility probability structure from SfM datasets and applications. In: Proceedings ECCV
  14. Chum O, Matas J (2010) Large-scale discovery of spatially related images. IEEE Trans Pattern Anal Mach Intell 32(2):371–377
  15. Cohen A, Sattler T, Pollefeys M (2015) Merging the unmatchable: Stitching visually disconnected SfM models. In: Proceedings IEEE ICCV
  16. Cohen A, Zach C, Sinha S, Pollefeys M (2012) Discovering and exploiting 3d symmetries in structure from motion. In: Proceedings CVPR
  17. Crandall D, Owens A, Snavely N, Huttenlocher D (2011) Discrete-continuous optimization for large-scale structure from motion. In: Proceedings IEEE CVPR
  18. Frahm JM, Fite-Georgel P, Gallup D, Johnson T, Raguram R, Wu C, Jen YH, Dunn E, Clipp B, Lazebnik S, Pollefeys M (2010) Building rome on a cloudless day. In: Proceedings ECCV
  19. Gherardi R, Farenzena M, Fusiello A (2010) Improving the efficiency of hierarchical structure-and-motion. In: Proceedings IEEE CVPR
  20. Hartley R, Zisserman A (2003) Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge
  21. Hartmann W, Havlena M, Schindler K (2014) Predicting matchability. Proceedings IEEE CVPR. CVPR ’14. IEEE Comput Society, Washington, DC, USA, pp 9–16
  22. Havlena M, Schindler K (2014) Vocmatch: Efficient multiview correspondence for structure from motion. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds.) Proceedings ECCV 2014
  23. Havlena M, Torii A, Knopp J, Pajdla T (2009) Randomized structure from motion based on atomic 3d models from camera triplets. In: Proceedings IEEE CVPR
  24. Havlena M, Torii A, Pajdla T (2010) Efficient structure from motion by graph optimization. In: Proceedings ECCV 2010
  25. Irschara A, Zach C, Frahm JM, Bischof H (2009) From structure-from-motion point clouds to fast location recognition. In: Proceedings IEEE CVPR
  26. Jian C, Cong L, Jiaxiang W, Hainan C, Hanqing L (2014) Fast and accurate image matching with cascade hashing for 3d reconstruction. In: Proceedings IEEE CVPR
  27. Li Y, Snavely N, Huttenlocher DP (2010) Location recognition using prioritized feature matching. In: Proceedings ECCV
  28. Lou Y, Snavely N, Gehrke J (2012) Matchminer: Efficient spanning structure mining in large image collections. In: Proceedings ECCV
  29. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2)
  30. Moulon P, Monasse P, Marlet R (2013) Global fusion of relative motions for robust, accurate and scalable structure from motion. In: IEEE ICCV
  31. Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans Pattern Anal Mach Intel 36
  32. Olsson C, Enqvist O (2011) Stable structure from motion for unordered image collections. In: Proceedings of the 17th Scandinavian conference on Image analysis, ser. SCIA11, pp 524–535
  33. Panagopoulos A, Hadap S, Samaras D (2012) Reconstructing shape from dictionaries of shading primitives. In: Proceedings of the Asian Conference on Computer Vision
  34. Petschnigg G, Szeliski R, Agrawala M, Cohen M, Hoppe H, Toyama K (2004) Digital photography with flash and no-flash image pairs. In: Proceedings of the ACM SIGGRAPH
  35. Ping-Sing T, Shah M (1994) Shape from shading using linear approximation. Image Vision Comput 12(8):487–498
  36. Raguram R, Wu C, Frahm JM, Lazebnik S (2011) Modeling and recognition of landmark image collections using iconic scene graphs. Intern J Comput Vision 95(3):213–239
  37. Sattler T, Leibe B, Kobbelt L (2011) Fast image-based localization using direct 2d-to-3d matching. In: Proceedings IEEE ICCV
  38. Sattler T, Leibe B, Kobbelt L (2012) Improving image-based localization by active correspondence search. In: Proceedings ECCV
  39. Schönberger JL, Berg AC, Frahm JM (2015) Paige: Pairwise image geometry encoding for improved efficiency in structure-from-motion. In: IEEE CVPR
  40. Shah R, Deshpande A, Narayanan PJ (2014) Multistage sfm: Revisiting incremental structure from motion. In: International Conference on 3D Vision (3DV), vol. 1, pp 417–424
  41. Shah R, Deshpande A, Narayanan PJ (2015) Multistage SFM: A Coarse-to-Fine Approach for 3D Reconstruction. In: CoRR
  42. Shah R, Srivastava V, Narayanan PJ (2015) Geometry-aware feature matching for structure from motion applications. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp 278–285
  43. Sinha S, Steedly D, Szeliski R (2010) A multi-stage linear approach to structure from motion. In: Proceedings ECCV RMLE Workshop
  44. Snavely N, Seitz SM, Szeliski R (2006) Photo tourism: Exploring photo collections in 3d. ACM Trans Graph 25(3)
  45. Snavely N, Seitz SM, Szeliski R (2008) Modeling the world from internet photo collections. Int J Comput Vision 80(2)
  46. Snavely N, Seitz SM, Szeliski R (2008) Skeletal graphs for efficient structure from motion. In: Proceedings IEEE CVPR
  47. Soman J, Kothapalli K, Narayanan PJ (2010) Some GPU algorithms for graph connected components and spanning tree. Parallel Process Lett 20(04)
  48. Sturm PF, Triggs B (1996) A factorization based algorithm for multi-image projective structure and motion. In: Proceedings of the 4th European Conference on Computer Vision, ECCV ’96, pp 709–720
  49. Sweeney C (2015) Theia Multiview Geometry Library: Tutorial & Reference. University of California, Santa Barbara
  50. Szeliski R, Kang SB (1993) Recovering 3d shape and motion from image streams using nonlinear least squares. In: Proceedings CVPR ’93, 1993 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 752–753
  51. Taylor C, Kriegman D, Anandan P (1991) Structure and motion in two dimensions from multiple images: a least squares approach. In: Proceedings of the IEEE Workshop on Visual Motion, pp 242–248
  52. Tomasi C, Kanade T (1992) Shape and motion from image streams under orthography: a factorization method. Intern J Comput Vision 9(2):137–154
  53. Triggs B, McLauchlan P, Hartley R, Fitzgibbon A (2000) Bundle adjustment a modern synthesis. In: Triggs B, Zisserman A, Szeliski R (eds.) Vision Algorithms: Theory and Practice, vol. 1883, pp 298–372
  54. Wilson K, Snavely N (2014) Robust global translations with 1DSfM. In: Proceedings ECCV
  55. Wu C (2007) SiftGPU: A GPU implementation of scale invariant feature transform (SIFT).
  56. Wu C (2013) Towards linear-time incremental structure from motion. In: 3DV Conference
  57. Wu C, Agarwal S, Curless B, Seitz SM (2011) Multicore bundle adjustment. In: Proceedings IEEE CVPR
  58. Zhang R, Tsai P, Cryer J, Shah M (1999) Shape-from-shading: A survey. IEEE Transac Pattern Anal Mach Intel

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. CVIT, IIIT Hyderabad, Hyderabad, India
