Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets

Philbin, James; Sivic, Josef; Zisserman, Andrew

doi:10.1007/s11263-010-0363-5

Published: 15 June 2010

Volume 95, pages 138–153 (2011)

International Journal of Computer Vision

James Philbin¹, Josef Sivic² & Andrew Zisserman¹

Abstract

Given a large-scale collection of images, our aim is to efficiently associate images that contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. The model explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models that incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked.
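
The grouping and ranking step can be illustrated with a minimal sketch. It assumes that inference has already produced a per-image topic mixture (called `theta` here, a hypothetical name), and it does not reproduce the gLDA inference or its geometric component. Each image is assigned to its dominant topic and ranked within that topic by its mixture weight; the function name and the example values are illustrative only.

```python
from typing import Dict, List, Tuple


def group_and_rank(theta: List[List[float]]) -> Dict[int, List[Tuple[int, float]]]:
    """Group images by dominant topic and rank each group by topic weight.

    theta[i][k] is an assumed, already-inferred weight of topic k in image i.
    """
    groups: Dict[int, List[Tuple[int, float]]] = {}
    for image_id, mixture in enumerate(theta):
        # Dominant topic = index of the largest mixture weight.
        topic = max(range(len(mixture)), key=lambda k: mixture[k])
        groups.setdefault(topic, []).append((image_id, mixture[topic]))
    for members in groups.values():
        # Highest weight first; ties broken by image id for determinism.
        members.sort(key=lambda pair: (-pair[1], pair[0]))
    return groups


# Hypothetical topic mixtures for four images and two topics (not from the paper).
theta = [
    [0.90, 0.10],
    [0.20, 0.80],
    [0.60, 0.40],
    [0.05, 0.95],
]
ranked = group_and_rank(theta)
for topic, members in sorted(ranked.items()):
    print(topic, [image_id for image_id, _ in members])
```

A hard assignment to the dominant topic is a simplification: because images are modelled as mixtures, an image could reasonably appear in the ranked list of every topic in which it has non-negligible weight.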

Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how “hub images” (images representative of an object or landmark) can easily be extracted from our matching graph representation.
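
As a rough illustration of the matching-graph idea, the sketch below builds an undirected graph from a list of image pairs that are assumed to have been matched already, extracts rough image groups as connected components, and picks a hub image in each group. Connected components stand in for the "standard clustering techniques" mentioned above, and choosing the highest-degree node as the hub is an assumption made here for illustration, not necessarily the criterion used in the article. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(n_images: int, matches: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Undirected adjacency sets; an edge means two images were matched."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n_images)}
    for a, b in matches:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Rough image groups: connected components found by breadth-first search."""
    seen: Set[int] = set()
    components: List[List[int]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component: List[int] = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adj[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


def hub_image(adj: Dict[int, Set[int]], component: List[int]) -> int:
    """Assumed hub criterion: highest degree, lowest image id on ties."""
    return max(component, key=lambda i: (len(adj[i]), -i))


# Hypothetical match list over six images (not data from the article).
matches = [(0, 1), (1, 2), (1, 3), (4, 5)]
adj = build_graph(6, matches)
groups = connected_components(adj)
hubs = [hub_image(adj, g) for g in groups]
print(groups, hubs)
```

Connected components are the coarsest possible grouping: a single spurious match merges two otherwise unrelated groups, which is one reason a finer model would be applied to each group afterwards.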

We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets of images taken of the Statue of Liberty (37K images) and Rome (1M+ images).

Author information

Authors and Affiliations

Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK
James Philbin & Andrew Zisserman
INRIA – Willow Project, Laboratoire d’Informatique de l’Ecole Normale Supérieure, (CNRS/ENS/INRIA UMR 8548), Paris, France
Josef Sivic

Corresponding author

Correspondence to James Philbin.

About this article

Cite this article

Philbin, J., Sivic, J. & Zisserman, A. Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets. Int J Comput Vis 95, 138–153 (2011). https://doi.org/10.1007/s11263-010-0363-5

Received: 23 October 2009
Accepted: 27 May 2010
Published: 15 June 2010
Issue Date: November 2011
DOI: https://doi.org/10.1007/s11263-010-0363-5
