Geo-localization using Volumetric Representations of Overhead Imagery


This paper addresses the problem of determining the location of a ground-level image using geo-referenced overhead imagery. The query image is assumed to be given with no meta-data, and its content is matched against a priori constructed reference representations. The semantic break-down of the query image content is provided through manual labeling; however, all processing of the reference imagery and all matching are fully automated. In this paper, a volumetric representation is proposed to fuse different modalities of overhead imagery and construct a 3D reference world. Attributes of this reference world, such as the orientation of world surfaces, the type of land cover, and the depth order of fronto-parallel surfaces, are indexed and matched to the attributes of the surfaces manually marked on the query image. An exhaustive but highly parallelizable matching scheme is proposed, and its performance is evaluated on a set of query images from a coastal region of the Eastern United States. The performance is compared to a baseline region-reduction algorithm and to a landmark existence matcher that uses a 2D representation of the reference world. The proposed 3D geo-localization framework outperforms the 2D approach for 75% of the query images.




  1.

    Note that creating a semantic break-down of a still image at this level of detail and accuracy in an automated manner is itself a very challenging problem and is beyond the scope of the proposed geo-localization framework. In this work, the best available semantic description and camera calibration of the query image are assumed to be given as inputs to the system.

  2.

    For the experiments in this paper, the camera space at every location contains 21,901 cameras and there are 2.8 million possible camera locations in a search area of 634 \(\hbox {km}^{2}\).
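The scale of the resulting exhaustive search can be checked with simple arithmetic; only the two quoted counts come from the text, and the breakdown of the per-location camera space is not assumed here:

```python
# Back-of-the-envelope count of camera hypotheses implied by the numbers above.
cameras_per_location = 21_901      # size of the camera space at each location
num_locations = 2_800_000          # possible camera locations in 634 km^2
total_hypotheses = cameras_per_location * num_locations
print(f"total camera hypotheses: {total_hypotheses:.2e}")  # ~6.1e10
```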

  3.

    Table 2 shows a full list of labels that appeared in the query images used in the experiments of this paper.

  4.

    The dynamic programming based string matching algorithm has complexity O(NM), where N and M are the numbers of literals in the first and second strings, respectively. In the case of a 3D approach that takes visibility into account, the complexity would be linear (i.e. O(N) if \(\hbox {N}>\hbox {M}\)), since the two strings would be constructed to match one-to-one.
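The O(NM) cost quoted above is that of the standard edit-distance recurrence; a minimal sketch with unit costs (the paper's actual label-matching costs may differ), together with the linear pass that suffices once the strings align one-to-one:

```python
def edit_distance(a, b):
    """Classic O(N*M) dynamic-programming string matcher."""
    n, m = len(a), len(b)
    # dp[i][j] = cost of matching a[:i] to b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[n][m]

def aligned_mismatch_count(a, b):
    """With visibility accounted for, the strings align one-to-one,
    so a single linear pass suffices."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
```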

  5.

    Note that the LIDAR data used in these experiments has 2 m spatial resolution and 30 cm vertical accuracy. A \(1\hbox { m}^{3}\) voxel resolution is chosen to capture the vertical accuracy of the LIDAR data and to create more realistic ground-view renderings.

  6.

    For the experiments in this paper, the maximum depth of an object to be marked on a query image is set to 3000 m, so that one byte per ray is sufficient to store the depth interval index.
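The one-byte-per-ray claim follows from quantizing the 0–3000 m depth range into at most 256 intervals; a sketch assuming (hypothetically) uniform intervals, which may differ from the paper's actual interval scheme:

```python
MAX_DEPTH_M = 3000.0
NUM_INTERVALS = 256  # index fits in a single uint8

def depth_interval_index(depth_m):
    """Map a depth in metres to a one-byte interval index, clamped to range."""
    idx = int(depth_m / MAX_DEPTH_M * NUM_INTERVALS)
    return min(max(idx, 0), NUM_INTERVALS - 1)

print(depth_interval_index(0.0))     # 0
print(depth_interval_index(1500.0))  # 128
print(depth_interval_index(2999.9))  # 255
```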

  7.

    Note that each location can be evaluated independently of the other locations, and thus the method is highly parallelizable.
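The per-location independence noted above can be sketched with a thread pool; `score` here is a hypothetical stand-in for the per-location matcher, not the paper's actual scoring function:

```python
from concurrent.futures import ThreadPoolExecutor

def score(location):
    # Hypothetical stand-in for evaluating the camera space at one location.
    x, y = location
    return (x * 31 + y) % 97

# Each hypothesis location is scored with no shared state, so the map
# over locations parallelizes trivially.
locations = [(x, y) for x in range(50) for y in range(50)]
with ThreadPoolExecutor() as ex:
    scores = list(ex.map(score, locations))
best_score, best_location = max(zip(scores, locations))
```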

  8.

    There are misalignments of up to 5 m between the LIDAR data and the orthographic classification maps. Thus, the thin pier regions are sampled more frequently to ensure that there are hypothesis locations with correct elevations.

  9.

    The more precise the distance estimate of an object, e.g. a building, the better the scoring, as its weight would not be distributed across multiple distance intervals. The quality of the camera calibration is therefore important for this matcher, as it is for the proposed 3D matcher. Since the purpose of this paper is to compare matcher performances, only query images with good calibrations are selected for the experiments. For land types with large ground extent, e.g. a lake, the reference existence histogram also records existence in multiple distance intervals, so the distribution of weights is not an issue for such land types.
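The weight-spreading behavior described above can be illustrated with a toy existence histogram; the bin edges and the uniform split of an object's unit weight over the bins its distance estimate spans are assumptions for illustration, not the paper's exact scheme:

```python
def existence_histogram(objects, bin_edges):
    """objects: list of (label, d_min, d_max) distance-uncertainty ranges.
    Returns a dict mapping (label, bin index) -> accumulated weight."""
    hist = {}
    for label, d_min, d_max in objects:
        # Bins overlapped by the distance-uncertainty range [d_min, d_max].
        bins = [i for i in range(len(bin_edges) - 1)
                if d_min < bin_edges[i + 1] and d_max > bin_edges[i]]
        w = 1.0 / len(bins)  # unit weight split over the overlapped bins
        for i in bins:
            hist[(label, i)] = hist.get((label, i), 0.0) + w
    return hist

edges = [0, 100, 500, 3000]
# A precise estimate keeps the full weight in one bin...
print(existence_histogram([("building", 40, 60)], edges))
# ...while a vague one dilutes it across several bins.
print(existence_histogram([("building", 80, 200)], edges))
```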

  10.

    The ground-truth locations of these images were measured by GPS during capture. The ground-truth camera headings were prepared manually via Google Earth's image overlay tool.

  11.

    Note that the 2D existence matchers do not use the location of an object in the image explicitly, but only the attributes of the object assigned by the user. Thus, the same polygonal mark-ups given by the user for the proposed volumetric matcher are also used to generate the query signature for the 2D existence matcher.

  12.

    For this experiment, the weights of the corresponding attributes are simply set to 0 during matching to disable their contribution.




Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL), Contract FA8650-12-C-7211. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

Author information



Corresponding author

Correspondence to Ozge C. Ozcanli.

Ethics declarations


The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government.

Additional information

Communicated by Marc Pollefeys, Larry S. Davis, Josef Sivic, Riad I. Hammoud.


About this article


Cite this article

Ozcanli, O.C., Dong, Y. & Mundy, J.L. Geo-localization using Volumetric Representations of Overhead Imagery. Int J Comput Vis 116, 226–246 (2016).



Keywords

  • Image search
  • Geo-localization
  • 3D modeling
  • Visibility index
  • OpenStreetMap