Handling Urban Location Recognition as a 2D Homothetic Problem

  • Georges Baatz
  • Kevin Köser
  • David Chen
  • Radek Grzeszczuk
  • Marc Pollefeys
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6316)


We address the problem of large-scale place-of-interest recognition in cell phone images of urban scenes. We go beyond earlier approaches by exploiting 3D building information, which is now often available (e.g. from extruded floor plans), together with massive street-view-like image data for database creation. By exploiting vanishing points in query images, we fully remove 3D rotation from the recognition problem. This reduces the required feature invariance to a purely homothetic problem (scale and translation only), which, as we show, leaves more discriminative power in the feature descriptors than classical SIFT. We rerank visual-word-based document queries using a fast, stratified homothetic verification that is tailored to repetitive patterns such as window grids on facades; in most cases, it boosts the correct document to the top positions whenever that document was in the short list. Because we exploit 3D building information, the approach also outputs the camera pose in real-world coordinates, ready for augmenting the cell phone image with virtual 3D information. In city-scale experiments with different sources of street-view-like image data and a challenging set of cell phone images, the complete system outperforms traditional approaches.
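
Once rotation has been removed, a match between a query keypoint and a database keypoint is related by a homothety, x_d = s · x_q + t, which has only three degrees of freedom (s, t_x, t_y). A single correspondence with positions and scales therefore already fixes s = s_d / s_q and t = x_d − s · x_q, which makes voting a natural fit. The Python sketch below illustrates one plausible stratified verification under that model: vote on the scale ratio first, then on the translation among the scale-consistent matches, and rerank the short list by inlier count. It is not the authors' implementation. The names `Match`, `homothetic_inliers`, and `rerank` and the bin widths `scale_bin` and `trans_bin` are hypothetical, the inputs are assumed to be already rectified using vanishing points, and the paper's specific handling of repetitive facade patterns is not reproduced.

```python
import math
from collections import Counter, namedtuple

# Hypothetical match record: a query keypoint (xq, yq, sq) paired with a
# database keypoint (xd, yd, sd). Positions and scales are assumed to be
# in rotation-free (rectified) coordinates, and scales must be > 0.
Match = namedtuple("Match", "xq yq sq xd yd sd")


def homothetic_inliers(matches, scale_bin=0.1, trans_bin=20.0):
    """Two-stage (stratified) homothetic verification sketch.

    A homothety maps a query point x_q to s * x_q + t. Stage 1 votes for
    the scale ratio s in 1D; stage 2 votes for the translation t in 2D,
    using only the matches that agree with the winning scale.
    Returns (s, (tx, ty), inliers), or (None, None, []) for no matches.
    """
    if not matches:
        return None, None, []

    # Stage 1: 1D histogram over log scale ratios log(sd / sq).
    def scale_key(m):
        return math.floor(math.log(m.sd / m.sq) / scale_bin)

    scale_votes = Counter(scale_key(m) for m in matches)
    best_scale_bin = scale_votes.most_common(1)[0][0]
    scale_group = [m for m in matches if scale_key(m) == best_scale_bin]
    # Refine s as the geometric mean of the ratios in the winning bin.
    s = math.exp(sum(math.log(m.sd / m.sq) for m in scale_group) / len(scale_group))

    # Stage 2: 2D histogram over translations t = x_d - s * x_q.
    def trans_key(m):
        return (math.floor((m.xd - s * m.xq) / trans_bin),
                math.floor((m.yd - s * m.yq) / trans_bin))

    trans_votes = Counter(trans_key(m) for m in scale_group)
    best_cell = trans_votes.most_common(1)[0][0]
    inliers = [m for m in scale_group if trans_key(m) == best_cell]
    # Refine t as the mean translation over the inliers.
    tx = sum(m.xd - s * m.xq for m in inliers) / len(inliers)
    ty = sum(m.yd - s * m.yq for m in inliers) / len(inliers)
    return s, (tx, ty), inliers


def rerank(shortlist):
    """Sort a short list {doc_id: [Match, ...]} by homothetic inlier count."""
    scores = {doc: len(homothetic_inliers(ms)[2]) for doc, ms in shortlist.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Voting on scale before translation splits one 3D search into a 1D and a 2D histogram, and matches with an inconsistent scale ratio are discarded before the translation stage, which keeps the verification cheap enough to run over every document in a short list.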







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Georges Baatz (1)
  • Kevin Köser (1)
  • David Chen (2)
  • Radek Grzeszczuk (3)
  • Marc Pollefeys (1)

  1. Department of Computer Science, ETH Zurich, Switzerland
  2. Department of Electrical Engineering, Stanford University, Stanford, USA
  3. Nokia Research at Palo Alto, USA
