Large-Scale Visual Geo-Localization, pp. 255–275
Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment
Abstract
This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings, and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is very difficult because the appearance and scene structure in the 2D depictions can differ greatly from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing errors, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models of different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes: the 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. As in object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, and historical photograph) and structural changes of the scene (e.g., missing scene parts and large occluders). We demonstrate that the proposed approach can automatically identify the correct architectural site and recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.
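The abstract says that each visual element's feature weights are learnt discriminatively, and the keywords point to linear discriminant analysis. A minimal sketch of that idea, assuming each element is trained from a single positive feature vector `x_q` (for instance a descriptor of one patch from a rendered view; the descriptor type is an assumption here) against negative statistics, the mean μ0 and covariance Σ, estimated from generic patches: the weights are w = Σ⁻¹(x_q − μ0) and the bias is b = −½ wᵀ(x_q + μ0), so a patch x is scored as wᵀx + b. All function names and the toy numbers are hypothetical, and this is an illustration of LDA scoring, not the chapter's exact training procedure.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    # Column-wise mean of the negative feature vectors (mu0).
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]], mu: Sequence[float], reg: float) -> Matrix:
    # Sample covariance (normalized by n) plus reg * I so that it is invertible.
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    for i in range(d):
        cov[i][i] += reg
    return cov


def solve(a: Matrix, b: Sequence[float]) -> Vector:
    # Gaussian elimination with partial pivoting: returns x such that a @ x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train_element(x_q: Sequence[float], negatives: Sequence[Sequence[float]],
                  reg: float = 1e-2) -> Tuple[Vector, float]:
    # Hypothetical helper: LDA weights w = Sigma^-1 (x_q - mu0), bias b = -0.5 w.(x_q + mu0).
    mu0 = mean(negatives)
    sigma = covariance(negatives, mu0, reg)
    w = solve(sigma, [q - m for q, m in zip(x_q, mu0)])
    b = -0.5 * dot(w, [q + m for q, m in zip(x_q, mu0)])
    return w, b


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Linear detector response of one visual element on a candidate patch x.
    return dot(w, x) + b


# Toy 3-D "descriptors" (made-up numbers, for illustration only).
negatives = [[0.1, 0.2, 0.0], [0.0, 0.1, 0.3], [0.2, 0.0, 0.1], [0.1, 0.3, 0.2]]
x_q = [0.9, 0.1, 0.4]
w, b = train_element(x_q, negatives)
print(score(w, b, x_q) > 0, score(w, b, mean(negatives)) < 0)  # prints: True True
```

Because Σ and μ0 depend only on the negatives, they can be estimated once and shared across every element, so training a new element costs a single linear solve rather than a separate iterative classifier fit. With this bias the exemplar and the negative mean receive scores of equal magnitude and opposite sign.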
Keywords
Linear Discriminant Analysis · Visual Element · Local Invariant Feature · Putative Correspondence · Local Feature Match
Notes
Acknowledgments
We are grateful to Guillaume Seguin, Alyosha Efros, Guillaume Obozinski and Jean Ponce for their useful feedback, and to Yasutaka Furukawa for providing access to the San Marco 3D model. This work was partly supported by the EIT ICT Labs, ANR project SEMAPOLIS (ANR-13-CORD-0003), and the ERC starting grant LEAP. The work was partly carried out at IMAGINE, a joint research project between École des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). Supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL). The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.