Scene Recognition on the Semantic Manifold

  • Roland Kwitt
  • Nuno Vasconcelos
  • Nikhil Rasiwasia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7575)


A new architecture, denoted spatial pyramid matching on the semantic manifold (SPMSM), is proposed for scene recognition. SPMSM is based on a recent image representation on a semantic probability simplex, which is now augmented with a rough encoding of spatial information. A connection between the semantic simplex and a Riemannian manifold is established, so as to equip the architecture with a similarity measure that respects the manifold structure of the semantic space. It is then argued that the closed-form geodesic distance between two manifold points is a natural measure of similarity between images. This leads to a conditionally positive definite kernel that can be used with any SVM classifier. An approximation of the geodesic distance reveals connections to the well-known Bhattacharyya kernel, and is exploited to derive an explicit feature embedding for this kernel, by simple square-rooting. This enables a low-complexity SVM implementation, using a linear SVM on the embedded features. Several experiments are reported, comparing SPMSM to state-of-the-art recognition methods. SPMSM is shown to achieve the best recognition rates in the literature for two large datasets (MIT Indoor and SUN) and rates equivalent or superior to the state of the art on a number of smaller datasets. In all cases, the resulting SVM also has much smaller dimensionality and requires far fewer support vectors than previous classifiers. This guarantees much smaller complexity and suggests improved generalization beyond the datasets considered.
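The relation between the geodesic distance, the Bhattacharyya kernel, and the square-root embedding can be illustrated with a minimal sketch. It assumes the standard Fisher-information geodesic on the probability simplex, d(p, q) = 2 arccos(Σ_i √(p_i q_i)), which may differ in scaling or detail from the exact form used in the paper. The function names and the toy vectors below are hypothetical and serve only as illustration.

```python
import math

def bhattacharyya_coefficient(p, q):
    # Sum of sqrt(p_i * q_i) over all semantic concepts.
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def geodesic_distance(p, q):
    # Assumed closed form: d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i)).
    # The coefficient is clamped to [0, 1] to guard against rounding error.
    bc = bhattacharyya_coefficient(p, q)
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def sqrt_embedding(p):
    # Explicit feature map: element-wise square root of the probability vector.
    return [math.sqrt(a) for a in p]

def dot(u, v):
    # Plain inner product, i.e. what a linear SVM computes on embedded features.
    return sum(a * b for a, b in zip(u, v))

# Two toy semantic multinomials (hypothetical values, each sums to 1).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

kernel_value = bhattacharyya_coefficient(p, q)
linear_value = dot(sqrt_embedding(p), sqrt_embedding(q))
distance = geodesic_distance(p, q)
```

Under this assumption the geodesic distance is a monotonically decreasing function of the Bhattacharyya coefficient, and the inner product of the square-rooted vectors reproduces that coefficient, which is why a linear SVM on the embedded features can stand in for the kernel SVM.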





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Roland Kwitt (1)
  • Nuno Vasconcelos (2)
  • Nikhil Rasiwasia (3)
  1. Kitware Inc., Carrboro, USA
  2. Department of Electrical and Computer Engineering, UC San Diego, USA
  3. Yahoo! Labs, Bangalore, India
