Scene Classification Via pLSA

  • Anna Bosch
  • Andrew Zisserman
  • Xavier Muñoz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3954)


Given a set of images of scenes containing multiple object categories (e.g. grass, roads, buildings), our objective is to discover these objects in each image in an unsupervised manner and to use this object distribution to perform scene classification. We achieve this discovery using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag-of-visual-words representation of each image. Scene classification on the resulting object distribution is then carried out by a k-nearest-neighbour classifier.
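The pipeline described above can be sketched in plain Python. pLSA models the probability of visual word w in image d as P(w|d) = Σ_z P(w|z) P(z|d) and fits both factors by expectation-maximisation (EM) [6]; the topic mixture P(z|d) then serves as the image's feature vector for k-nearest-neighbour classification. This is a minimal sketch under stated assumptions, not the authors' code: images are assumed to be already quantised into visual-word count histograms, a test image is "folded in" by running EM with P(w|z) held fixed, and the Euclidean distance, k = 3, the function names (`plsa`, `fold_in`, `knn_classify`) and the toy histograms are all illustrative choices.

```python
import random
from collections import Counter


def normalise(v):
    """Scale a non-negative vector to sum to 1 (uniform if it sums to 0)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on counts[d][w] = occurrences of visual word w in image d.

    Returns p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation breaks the symmetry between topics.
    p_w_z = [normalise([rng.random() + 0.01 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalise([rng.random() + 0.01 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / total  # expected count n(d,w) P(z|d,w)
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalise(row) for row in acc_w_z]
        p_z_d = [normalise(row) for row in acc_z_d]
    return p_w_z, p_z_d


def fold_in(hist, p_w_z, n_iter=50):
    """Estimate P(z|d) for an unseen image by EM with P(w|z) held fixed."""
    k = len(p_w_z)
    p_z = [1.0 / k] * k
    for _ in range(n_iter):
        acc = [0.0] * k
        for w, n in enumerate(hist):
            if n == 0:
                continue
            post = [p_w_z[z][w] * p_z[z] for z in range(k)]
            total = sum(post)
            if total == 0:
                continue
            for z in range(k):
                acc[z] += n * post[z] / total
        p_z = normalise(acc)
    return p_z


def knn_classify(query, train_vecs, train_labels, k=3):
    """Majority vote over the k training images closest in topic space."""
    # Squared Euclidean distance gives the same ordering as Euclidean.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, v)), label)
        for v, label in zip(train_vecs, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 6-word histograms: "coast" uses words 0-2, "forest" 3-5.
    train = [[5, 3, 2, 0, 0, 0], [4, 4, 2, 0, 0, 0], [6, 2, 2, 0, 0, 0],
             [0, 0, 0, 3, 5, 2], [0, 0, 0, 4, 4, 2], [0, 0, 0, 2, 6, 2]]
    labels = ["coast"] * 3 + ["forest"] * 3
    p_w_z, p_z_d = plsa(train, n_topics=2)
    query = fold_in([5, 4, 1, 0, 0, 0], p_w_z)
    print(knn_classify(query, p_z_d, labels))
```

Holding P(w|z) fixed when folding in a test image is a common way to apply a trained pLSA model to unseen documents. In a real system the histograms would come from quantising local descriptors (e.g. SIFT) against a learnt visual vocabulary, a step omitted here.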

We investigate how classification performance changes with the visual vocabulary and the number of latent topics learnt, and develop a novel vocabulary using colour SIFT descriptors. Classification performance is compared to the supervised approaches of Vogel & Schiele [19] and Oliva & Torralba [11], and to the semi-supervised approach of Fei-Fei & Perona [3], using their own datasets and testing protocols. In all cases, the combination of (unsupervised) pLSA followed by (supervised) nearest-neighbour classification achieves superior results. We show applications of this method to image retrieval with relevance feedback and to scene classification in videos.
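The abstract does not state how relevance feedback is performed. One classic option, and a plausible reason Rocchio [13] appears in the reference list, is to move the query's topic vector towards images the user marks as relevant and away from those marked non-relevant: q' = α·q + β·mean(relevant) − γ·mean(non-relevant). The sketch below assumes that reading. The weights α = 1, β = 0.75, γ = 0.15 are common textbook defaults rather than values from this paper, and the clipping, renormalisation, squared-Euclidean ranking and function names (`rocchio`, `rank`) are illustrative choices.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query update applied to pLSA topic vectors P(z|d).

    q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant),
    with negative components clipped to 0 and the result renormalised
    to sum to 1 so it stays comparable with the database vectors.
    """
    def mean(vecs):
        if not vecs:
            return [0.0] * len(query)
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    rel, non = mean(relevant), mean(non_relevant)
    updated = [max(0.0, alpha * q + beta * r - gamma * n)
               for q, r, n in zip(query, rel, non)]
    s = sum(updated)
    # Fall back to the original query if every component was clipped away.
    return [x / s for x in updated] if s > 0 else list(query)


def rank(query, database):
    """Database indices ordered by squared Euclidean distance to the query."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v))
    return sorted(range(len(database)), key=lambda i: dist(database[i]))


if __name__ == "__main__":
    # Hypothetical two-topic vectors for four database images.
    database = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
    query = [0.5, 0.5]
    # The user marks image 0 as relevant and image 3 as not relevant.
    new_query = rocchio(query, [database[0]], [database[3]])
    print(new_query, rank(new_query, database))
```

Renormalising keeps the updated query on the same probability simplex as the stored P(z|d) vectors, so distances remain comparable across successive feedback rounds.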




  1. Csurka, G., Bray, C., Dance, C., Fan, L.: Visual categorization with bags of keypoints. In: SLCV Workshop, ECCV, pp. 1–22 (2004)
  2. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, San Diego, California (2005)
  3. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: CVPR, Washington, DC, USA, pp. 524–531 (2005)
  4. Goedemé, T., Tuytelaars, T., Vanacker, G., Nuttin, M., Van Gool, L.: Omnidirectional sparse visual path following with occlusion-robust feature tracking. In: OMNIVIS Workshop, ICCV (2005)
  5. Hofmann, T.: Probabilistic latent semantic indexing. In: ACM SIGIR (1998)
  6. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 41, 177–196 (2001)
  7. Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using affine-invariant regions. In: CVPR, vol. 2, pp. 319–324 (2003)
  8. Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV 43, 29–44 (2001)
  9. Lowe, D.: Distinctive image features from scale-invariant keypoints. IJCV 60, 91–110 (2004)
  10. Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. IJCV 60, 63–86 (2004)
  11. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 42, 145–175 (2001)
  12. Quelhas, P., Monay, F., Odobez, J., Gatica-Perez, D., Tuytelaars, T., Van Gool, L.: Modeling scenes with local descriptors and latent aspects. In: ICCV, Beijing, China (2005)
  13. Rocchio, J.: Relevance feedback in information retrieval. In: The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, NJ (1971)
  14. Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV (2003)
  15. Sivic, J., Russell, B., Efros, A., Zisserman, A., Freeman, W.T.: Discovering objects and their locations in images. In: ICCV, Beijing, China (2005)
  16. Szummer, M., Picard, R.W.: Indoor-outdoor image classification. In: ICCV, Bombay, India, pp. 42–50 (1998)
  17. Vailaya, A., Figueiredo, A., Jain, A., Zhang, H.: Image classification for content-based indexing. T-IP 10 (2001)
  18. Varma, M., Zisserman, A.: Texture classification: Are filter banks necessary? In: CVPR, Madison, Wisconsin, vol. 2, pp. 691–698 (2003)
  19. Vogel, J., Schiele, B.: Natural scene retrieval based on a semantic modeling step. In: Enser, P.G.B., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds.) CIVR 2004. LNCS, vol. 3115, pp. 207–215. Springer, Heidelberg (2004)
  20. Zhang, R., Zhang, Z.: Hidden semantic concept discovery in region based image retrieval. In: CVPR, Washington, DC, USA (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Anna Bosch (1)
  • Andrew Zisserman (2)
  • Xavier Muñoz (1)
  1. Computer Vision and Robotics Group, University of Girona, Girona, Spain
  2. Robotics Research Group, University of Oxford, Oxford, UK
