Spring Lattice Counting Grids: Scene Recognition Using Deformable Positional Constraints

  • Alessandro Perina
  • Nebojsa Jojic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7577)


Building on the Counting Grid (CG) representation [1], the Spring Lattice Counting Grid (SLCG) model uses a grid of feature counts to capture the spatial layout that a variety of images tend to follow. Images are mapped to the counting grid with their features rearranged so as to balance the quality of the mapping against the extent of the rearrangement it requires. In particular, the feature sets originating from different image sectors are mapped to different sub-windows in the counting grid, in a configuration that is close to, but not exactly the same as, the configuration of the source sectors. The distribution over deformations of the sector configuration can be learned using a new spring lattice model, while the rearrangement of features within a sector is unconstrained. As a result, the CG model gains a more appropriate level of invariance to realistic image transformations such as changes in viewpoint, rotation, or scale. We tested SLCG on standard scene recognition datasets and on a dataset collected with a wearable camera that recorded the wearer’s visual input over three weeks. Our algorithm correctly classifies the visited locations more than 80% of the time, outperforming previous approaches to visual location recognition. At this level of performance, a variety of real-world applications of wearable cameras become feasible.
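The trade-off described in the abstract, between how well each sector's features match a sub-window and how far the sector configuration is deformed, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toroidal grid of per-cell feature distributions, a window histogram obtained by averaging the cells in a window, and a quadratic spring penalty on each sector's displacement from a rest offset. The function names, the `stiffness` parameter and the toy grid are all hypothetical.

```python
import math
from itertools import product

def window_histogram(grid, top, left, win):
    """Average the per-cell feature distributions in a win x win window (wraps around the grid)."""
    rows, cols, n_feat = len(grid), len(grid[0]), len(grid[0][0])
    hist = [0.0] * n_feat
    for dr, dc in product(range(win), repeat=2):
        cell = grid[(top + dr) % rows][(left + dc) % cols]
        for z in range(n_feat):
            hist[z] += cell[z]
    return [h / (win * win) for h in hist]

def log_likelihood(counts, hist, eps=1e-12):
    """Log-probability (up to a constant) of a bag of feature counts under a window histogram."""
    return sum(c * math.log(h + eps) for c, h in zip(counts, hist))

def sector_score(grid, sector_counts, positions, rest_offsets, win, stiffness):
    """Sum of sector log-likelihoods minus an assumed quadratic spring penalty.

    positions[i] is the window (top, left) chosen for sector i; rest_offsets[i] is
    the undeformed offset of sector i relative to sector 0.
    """
    anchor = positions[0]
    total = 0.0
    for counts, pos, rest in zip(sector_counts, positions, rest_offsets):
        total += log_likelihood(counts, window_histogram(grid, pos[0], pos[1], win))
        dev_r = (pos[0] - anchor[0]) - rest[0]
        dev_c = (pos[1] - anchor[1]) - rest[1]
        total -= 0.5 * stiffness * (dev_r ** 2 + dev_c ** 2)
    return total

# Toy 4x4 grid with two features: the top half favours feature 0, the bottom half feature 1.
top_cell, bottom_cell = [0.9, 0.1], [0.1, 0.9]
grid = [[top_cell] * 4, [top_cell] * 4, [bottom_cell] * 4, [bottom_cell] * 4]

sector_counts = [[5, 0], [0, 5]]   # upper sector sees feature 0, lower sector feature 1
rest_offsets = [(0, 0), (2, 0)]    # the lower sector sits two rows below the upper one

aligned = sector_score(grid, sector_counts, [(0, 0), (2, 0)], rest_offsets, win=2, stiffness=1.0)
collapsed = sector_score(grid, sector_counts, [(0, 0), (0, 0)], rest_offsets, win=2, stiffness=1.0)
print(aligned > collapsed)
```

In this toy setting the mapping that keeps the sectors in their rest configuration scores higher than the one that collapses both sectors onto the same window, because the collapsed mapping both matches the features worse and pays the spring penalty.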


Keywords: Spring Lattice, Latent Dirichlet Allocation, Visual Stream, Scene Recognition, Indoor Scene


  1. Jojic, N., Perina, A.: Multidimensional counting grids: Inferring word order from disordered bags of words. In: UAI 2011, pp. 547–556 (2011)
  2. Perina, A., Jojic, N.: Image analysis by counting on a grid. In: CVPR 2011, pp. 1985–1992 (2011)
  3. Bosch, A., Zisserman, A., Muñoz, X.: Scene Classification Via pLSA. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part IV. LNCS, vol. 3954, pp. 517–530. Springer, Heidelberg (2006)
  4. Jojic, N., Perina, A., Murino, V.: Structural Epitome: a way to summarize one’s visual experience. In: NIPS 2010, pp. 1027–1035 (2010)
  5. Li, F.-F., Perona, P.: A Bayesian Hierarchical Model for Learning Natural Scene Categories. In: CVPR (2), pp. 524–531 (2005)
  6. Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: Free Energy score space. In: NIPS 2009, pp. 1428–1436 (2009)
  7. Jojic, N., Frey, B.J., Kannan, A.: Epitomic analysis of appearance and shape. In: ICCV 2003, pp. 34–43 (2003)
  8. Oliva, A., Torralba, A.: Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. Jrn. of Computer Vision 42 (2001)
  9. Lazebnik, S., Schmid, C., Ponce, J.: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In: CVPR (2), pp. 2169–2178 (2006)
  10. Lowe, D.: Distinctive Image Features from Scale-Invariant Keypoints. Int. Jrn. of Computer Vision 60 (2004)
  11. Zhu, J., Li, L.-J., Li, F.-F., Xing, E.P.: Large Margin Learning of Upstream Scene Understanding Models. In: NIPS 2010, pp. 2586–2594 (2010)
  12. Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: CVPR 2009, pp. 413–420 (2009)
  13. Torralba, A., Murphy, K.P., Freeman, W.T., Rubin, M.A.: Context-based vision system for place and object recognition. In: ICCV 2003, pp. 273–280 (2003)
  14. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3(5) (2003)
  15. Fergus, R., Perona, P., Zisserman, A.: Object Class Recognition by Unsupervised Scale-Invariant Learning. In: CVPR 2003 (2003)
  16. Sudderth, E., Ihler, A., Isard, T., Freeman, W., Willsky, A.: Non Parametric Belief Propagation. In: CVPR 2003 (2003)
  17. Isard, M.: PAMPAS: Real-Valued Graphical Models for Computer Vision. In: CVPR 2003 (2003)
  18. Sudderth, E., Mandel, M., Freeman, W., Willsky, A.: Visual Hand Tracking Using Nonparametric Belief Propagation. In: CVPR 2004 Workshop on Generative Model Based Vision (2004)
  19. Parizi, S.N., Oberlin, J., Felzenszwalb, P.F.: Reconfigurable models for scene recognition. In: CVPR 2012 (2012)
  20. Pandey, M., Lazebnik, S.: Scene recognition and weakly supervised object localization with deformable part-based models. In: ICCV 2011 (2011)
  21. Krähenbühl, P., Koltun, V.: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In: NIPS 2011 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alessandro Perina (1)
  • Nebojsa Jojic (1)
  1. Microsoft Research, Redmond, USA
