Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs

International Journal of Computer Vision

Abstract

This article presents an approach for modeling landmarks from large-scale, heavily contaminated image collections gathered from the Internet. Our system efficiently combines 2D appearance and 3D geometric constraints to extract scene summaries and construct 3D models. In the first stage of processing, images are clustered using low-dimensional global appearance descriptors, and the clusters are then refined using 3D geometric constraints. Each valid cluster is represented by a single iconic view, and the geometric relationships between iconic views are captured by an iconic scene graph. Using structure-from-motion techniques, the system then registers the iconic images to efficiently produce 3D models of the different aspects of the landmark. To improve coverage of the scene, these 3D models are subsequently extended using additional, non-iconic views. We also demonstrate the use of iconic images for recognition and browsing. Our experiments show that the system can process datasets of up to 46,000 images in less than 20 hours on a single commodity PC equipped with a graphics card, a significant step towards Internet-scale operation.
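The first stages summarized in the abstract (appearance clustering, geometric filtering, choosing one iconic view per cluster, and linking iconics into a scene graph) can be illustrated with a small sketch. This is not the authors' implementation; it is a toy under stated assumptions. Each image is assumed to be already reduced to a short global descriptor vector, clustering is plain k-means with a deterministic farthest-point start, and real two-view geometric verification is replaced by a hypothetical callback `inliers(a, b)` that returns a RANSAC inlier count for an image pair. The rule "the iconic is the member verified by the most other members", the thresholds `min_inliers` and `min_cluster_size`, and all function names are illustrative assumptions. Treating each connected component of the graph as a group to reconstruct separately is likewise a simplification.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
InlierFn = Callable[[int, int], int]  # hypothetical: RANSAC inlier count for an image pair


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means on global descriptors; returns one cluster label per image."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each image goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_iconics(labels: List[int], inliers: InlierFn,
                   min_inliers: int = 18, min_cluster_size: int = 3) -> List[int]:
    """Pick one iconic per appearance cluster that survives geometric checking."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    iconics = []
    for members in clusters.values():
        if len(members) < min_cluster_size:
            continue  # too small to count as a stable aspect of the landmark

        def support(i: int) -> int:
            # Number of cluster members that geometrically verify against image i.
            return sum(1 for j in members if j != i and inliers(i, j) >= min_inliers)

        best = max(members, key=support)
        # Keep the cluster only if its best image is verified by enough members.
        if support(best) >= min_cluster_size - 1:
            iconics.append(best)
    return sorted(iconics)


def build_scene_graph(iconics: List[int], inliers: InlierFn,
                      min_inliers: int = 18) -> Dict[int, Set[int]]:
    """Undirected graph: an edge joins two iconics that pass geometric verification."""
    graph: Dict[int, Set[int]] = {i: set() for i in iconics}
    for pos, a in enumerate(iconics):
        for b in iconics[pos + 1:]:
            if inliers(a, b) >= min_inliers:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def components(graph: Dict[int, Set[int]]) -> List[List[int]]:
    """Connected components of the scene graph (iterative depth-first search)."""
    seen: Set[int] = set()
    comps: List[List[int]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in graph[node] - seen:
                seen.add(nb)
                stack.append(nb)
        comps.append(sorted(comp))
    return comps


# Toy data (invented for illustration): two aspects of a landmark, four images each.
descs = [(0, 0), (0.1, 0), (0, 0.2), (0.2, 0.1),
         (10, 10), (10.1, 10), (9.9, 10.2), (10, 9.8)]
pair_inliers = {frozenset(p): 50 for p in [(0, 1), (0, 2), (1, 2)]}
pair_inliers.update({frozenset(p): 40 for p in [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]})
pair_inliers[frozenset((0, 4))] = 25  # the two aspects overlap geometrically


def toy_inliers(a: int, b: int) -> int:
    return pair_inliers.get(frozenset((a, b)), 2)  # image 3 matches nothing well


labels = kmeans(descs, k=2)
iconics = select_iconics(labels, toy_inliers)
graph = build_scene_graph(iconics, toy_inliers)
print(labels, iconics, components(graph))  # [0, 0, 0, 0, 1, 1, 1, 1] [0, 4] [[0, 4]]
```

In this toy run, image 3 lands in the first appearance cluster but never passes the (simulated) geometric check, so it cannot become the iconic; the two surviving iconics are linked because they share enough inliers, giving a single component. The design keeps the cheap descriptor clustering ahead of the expensive pairwise verification, so geometric checks are only run within clusters and between iconics rather than across all image pairs.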

Author information

Correspondence to Rahul Raguram.

About this article

Cite this article

Raguram, R., Wu, C., Frahm, JM. et al. Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs. Int J Comput Vis 95, 213–239 (2011). https://doi.org/10.1007/s11263-011-0445-z
