
Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets

International Journal of Computer Vision

Abstract

Given a large-scale collection of images, our aim is to efficiently associate images that contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. The model explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models that incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked.
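
As a point of reference, the "mixture" view can be written down for standard LDA, on which gLDA builds. This is only the textbook mixture form, not the gLDA model itself: the geometric terms that tie visual word locations to a topic are not specified in this abstract and are omitted. The symbols below (K topics, per-image topic proportions \theta_d, per-topic word distributions \phi_k) are the usual LDA notation and are assumptions of this sketch.

```latex
% Standard LDA mixture form (illustrative; gLDA's geometric terms are not shown).
% d: image (document), w: visual word, z: latent topic (object or facade), K: number of topics.
\[
  p(w \mid d) \;=\; \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
             \;=\; \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k},
  \qquad
  \theta_d \sim \mathrm{Dirichlet}(\alpha).
\]
```

Under this reading, images that place high weight \theta_{d,k} on the same topic k are the ones likely to contain the same object, which is what allows them to be grouped and ranked.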

Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how “hub images” (images representative of an object or landmark) can easily be extracted from our matching graph representation.
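
The matching graph pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: pairwise matches are assumed to be already available as hypothetical `(image_a, image_b, inlier_count)` tuples, the `min_inliers` threshold is an arbitrary illustrative value, connected components stand in for the "standard clustering techniques", and the hub image is taken to be the highest-degree node in a group, which is one plausible reading rather than a definition given in the abstract.

```python
from collections import defaultdict


def build_matching_graph(matches, min_inliers=20):
    """Build an undirected graph with an edge for each sufficiently matched image pair.

    `matches` is an iterable of (image_a, image_b, inlier_count) tuples
    (a hypothetical input format). Images with no surviving edge do not
    appear in the graph.
    """
    graph = defaultdict(set)
    for a, b, inliers in matches:
        if a != b and inliers >= min_inliers:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return rough image groups as the connected components of the graph."""
    seen = set()
    groups = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


def hub_image(graph, group):
    """Pick the image with the most neighbours (assumed hub criterion)."""
    return max(group, key=lambda image: len(graph[image]))


# Toy example with made-up image names and inlier counts.
matches = [
    ("a", "b", 35), ("b", "c", 50), ("b", "d", 22),
    ("c", "d", 8),   # below threshold, no edge
    ("e", "f", 40),
    ("g", "h", 5),   # below threshold, so g and h are left out
]
graph = build_matching_graph(matches)
groups = connected_components(graph)
hubs = [hub_image(graph, group) for group in groups]
```

Each resulting group is a candidate set of images of one object; a model such as gLDA would then be run within groups to refine them, which is where the cost saving over applying it to the whole dataset comes from.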

We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground-truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets: images of the Statue of Liberty (37K images) and of Rome (1M+ images).



Author information

Corresponding author

Correspondence to James Philbin.

About this article

Cite this article

Philbin, J., Sivic, J. & Zisserman, A. Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets. Int J Comput Vis 95, 138–153 (2011). https://doi.org/10.1007/s11263-010-0363-5


  • DOI: https://doi.org/10.1007/s11263-010-0363-5
