Visual Geo-localization of Non-photographic Depictions via 2D–3D Alignment

Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings, and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is very difficult as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models from different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes. 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, and historical photograph) and structural changes (e.g., missing scene parts and large occluders) of the scene. We demonstrate that the proposed approach can automatically identify the correct architectural site as well as recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.
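
As a rough illustration of what learning per-element feature weights "in a discriminative fashion" can look like, the sketch below uses a linear discriminant analysis (LDA) style rule, w = Σ⁻¹(q − μ), where q is the descriptor of a candidate visual element and μ, Σ are the mean and covariance of descriptors from generic "negative" patches. This is an assumption made for illustration only: the abstract does not state the learning rule, and the function names, the random stand-in descriptors, and the regularization constant are all hypothetical.

```python
import numpy as np

def lda_element_weights(q, negatives, reg=1e-2):
    """Hypothetical LDA-style detector for one visual element.

    q         : (d,) descriptor of the candidate element (e.g. from a rendered view)
    negatives : (n, d) descriptors of generic background patches
    reg       : small ridge term so the covariance is invertible (assumed value)
    Returns (w, b) for the linear score s(x) = w @ x + b.
    """
    mu = negatives.mean(axis=0)
    sigma = np.cov(negatives, rowvar=False) + reg * np.eye(negatives.shape[1])
    # Whitened difference between the element and the average patch.
    w = np.linalg.solve(sigma, q - mu)
    # Bias chosen so the decision boundary lies midway between q and mu.
    b = -0.5 * w @ (q + mu)
    return w, b

def score(x, w, b):
    """Linear detector response for descriptor(s) x."""
    return x @ w + b

# Random stand-in data (not real image descriptors).
rng = np.random.default_rng(0)
negatives = rng.normal(size=(500, 16))
q = rng.normal(size=16) + 2.0

w, b = lda_element_weights(q, negatives)
s_q = score(q, w, b)                       # response on the element itself
s_mu = score(negatives.mean(axis=0), w, b)  # response on the average patch
```

With this choice of bias the element itself scores positive and the average negative patch scores the same magnitude with opposite sign, so a threshold at zero separates them. In a matching setting, such a detector would be evaluated on descriptors extracted from the input depiction.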

Keywords

Linear Discriminant Analysis; Visual Element; Local Invariant Feature; Putative Correspondence; Local Feature Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

We are grateful to Guillaume Seguin, Alyosha Efros, Guillaume Obozinski and Jean Ponce for their useful feedback, and to Yasutaka Furukawa for providing access to the San Marco 3D model. This work was partly supported by the EIT ICT Labs, ANR project SEMAPOLIS (ANR-13-CORD-0003), and the ERC starting grant LEAP. The work was partly carried out at IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. LIGM (UMR CNRS 8049), ENPC/Université Paris-Est, Marne-la-Vallée, France
  2. Adobe Research, Lexington, USA
  3. Inria, WILLOW Project-team, Département d’Informatique de l’Ecole Normale Supérieure, ENS/INRIA/CNRS UMR, Paris, France
