Abstract
This article presents an approach for modeling landmarks from large-scale, heavily contaminated image collections gathered from the Internet. Our system efficiently combines 2D appearance and 3D geometric constraints to extract scene summaries and construct 3D models. In the first stage of processing, images are clustered based on low-dimensional global appearance descriptors, and the clusters are refined using 3D geometric constraints. Each valid cluster is represented by a single iconic view, and the geometric relationships between iconic views are captured by an iconic scene graph. Using structure-from-motion techniques, the system then registers the iconic images to efficiently produce 3D models of the different aspects of the landmark. To improve coverage of the scene, these 3D models are subsequently extended using additional, non-iconic views. We also demonstrate the use of iconic images for recognition and browsing. Our experimental results show that the system can process datasets containing up to 46,000 images in less than 20 hours on a single commodity PC equipped with a graphics card, a significant step towards Internet-scale operation.
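The first stage described above (appearance clustering, geometric validation, and selection of one iconic view per cluster) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. Global descriptors are treated as plain float vectors, clustering is Lloyd's k-means with deterministic farthest-first seeding, and the pairwise inlier counts that a RANSAC-based geometric verification step would produce are supplied as a precomputed matrix. The function names `kmeans`, `select_iconics`, and `iconic_scene_graph`, and the thresholds `min_inliers` and `min_size`, are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two global descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means on global descriptors; returns one cluster label per image."""
    centers = farthest_first_centers(points, k)
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_iconics(labels: List[int], inliers: List[List[int]],
                   min_inliers: int = 20, min_size: int = 3) -> List[int]:
    """Per cluster, pick the image verified against the most other members
    (hypothetical stand-in for geometric verification); drop clusters whose
    best image has too little verified support."""
    clusters: Dict[int, List[int]] = {}
    for img, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(img)
    iconics = []
    for members in clusters.values():
        best, best_support = members[0], -1
        for i in members:
            support = sum(1 for j in members if j != i and inliers[i][j] >= min_inliers)
            if support > best_support:
                best, best_support = i, support
        if best_support + 1 >= min_size:  # the iconic plus its verified neighbors
            iconics.append(best)
    return sorted(iconics)


def iconic_scene_graph(iconics: List[int], inliers: List[List[int]],
                       min_inliers: int = 20) -> List[Tuple[int, int]]:
    """Connect two iconic views when enough verified inliers link them."""
    return [(a, b) for n, a in enumerate(iconics) for b in iconics[n + 1:]
            if inliers[a][b] >= min_inliers]
```

Farthest-first seeding is used only to keep the toy example deterministic. A system handling tens of thousands of images would more likely rely on faster, approximate or GPU-accelerated clustering and matching, and the inlier matrix would be filled sparsely rather than for every pair.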
Cite this article
Raguram, R., Wu, C., Frahm, JM. et al. Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs. Int J Comput Vis 95, 213–239 (2011). https://doi.org/10.1007/s11263-011-0445-z