Abstract
Adopting the Counting Grid (CG) representation [1], the Spring Lattice Counting Grid (SLCG) model uses a grid of feature counts to capture the spatial layout that a variety of images tend to follow. Images are mapped to the counting grid with their features rearranged so as to strike a balance between mapping quality and the extent of the necessary rearrangement. In particular, the feature sets originating from different image sectors are mapped to different sub-windows of the counting grid in a configuration that is close to, but not exactly the same as, the configuration of the source sectors. The distribution over deformations of the sector configuration can be learned using a new spring lattice model, while the rearrangement of features within a sector is unconstrained. As a result, the CG model gains a more appropriate level of invariance to realistic image transformations such as viewpoint changes, rotations, and changes of scale. We tested SLCG on standard scene recognition datasets and on a dataset collected with a wearable camera that recorded the wearer’s visual input over three weeks. Our algorithm correctly classifies the visited locations more than 80% of the time, outperforming previous approaches to visual location recognition. At this level of performance, a variety of real-world applications of wearable cameras become feasible.
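To make the trade-off described in the abstract concrete, the following Python sketch scores one candidate mapping of image sectors onto a counting grid. It is an illustration under stated assumptions, not the authors' implementation. It assumes the usual counting-grid formulation, in which each grid cell holds a strictly positive distribution `pi[i][j]` over features and a window's distribution is the average of the cells it covers, with wrap-around at the grid edges. It also assumes that each sector's bag of feature counts is scored by a multinomial log-likelihood, which ignores feature order inside the sector. The spring lattice is replaced by a hypothetical fixed quadratic penalty with a hand-set `stiffness`. In the paper, the deformation distribution is learned instead. All function and parameter names are hypothetical.

```python
import math
from itertools import product


def window_distribution(pi, k, win):
    """Average the per-cell feature distributions pi[i][j] over the win x win
    window whose top-left cell is k, wrapping around the grid edges."""
    rows, cols = len(pi), len(pi[0])
    n_feat = len(pi[0][0])
    h = [0.0] * n_feat
    for di, dj in product(range(win), repeat=2):
        cell = pi[(k[0] + di) % rows][(k[1] + dj) % cols]
        for z in range(n_feat):
            h[z] += cell[z]
    return [v / (win * win) for v in h]


def bag_log_likelihood(counts, h):
    """Multinomial log-likelihood (up to a constant) of a sector's feature
    counts under window distribution h. Feature order inside the sector is
    ignored. Assumes h[z] > 0 wherever counts[z] > 0."""
    return sum(c * math.log(hz) for c, hz in zip(counts, h) if c > 0)


def spring_energy(locations, source_positions, edges, stiffness):
    """Hypothetical quadratic spring penalty. For each pair of neighbouring
    sectors (s, t), this penalises the mismatch between their offset on the
    grid and their offset in the source image. Offsets are not wrapped."""
    energy = 0.0
    for s, t in edges:
        for axis in range(2):
            mapped = locations[s][axis] - locations[t][axis]
            source = source_positions[s][axis] - source_positions[t][axis]
            energy += stiffness * (mapped - source) ** 2
    return energy


def mapping_score(pi, win, sector_counts, locations, source_positions, edges, stiffness):
    """Mapping quality (sum of per-sector log-likelihoods) minus the cost of
    deforming the sector configuration."""
    fit = sum(
        bag_log_likelihood(counts, window_distribution(pi, k, win))
        for counts, k in zip(sector_counts, locations)
    )
    return fit - spring_energy(locations, source_positions, edges, stiffness)
```

Consider a toy 4×4 grid whose left half favours one feature and whose right half favours another. Shifting both sectors together by the same offset leaves the score unchanged, so translation is free. Pulling the sectors closer together than they were in the image lowers the score through both the worse fit and the spring term.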
References
Jojic, N., Perina, A.: Multidimensional counting grids: Inferring word order from disordered bags of words. In: UAI 2011, pp. 547–556 (2011)
Perina, A., Jojic, N.: Image analysis by counting on a grid. In: CVPR 2011, pp. 1985–1992 (2011)
Bosch, A., Zisserman, A., Muñoz, X.: Scene Classification Via pLSA. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part IV. LNCS, vol. 3954, pp. 517–530. Springer, Heidelberg (2006)
Jojic, N., Perina, A., Murino, V.: Structural Epitome: a way to summarize one’s visual experience. In: NIPS 2010, pp. 1027–1035 (2010)
Li, F.-F., Perona, P.: A Bayesian Hierarchical Model for Learning Natural Scene Categories. In: CVPR (2), pp. 524–531 (2005)
Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: Free Energy score space. In: NIPS 2009, pp. 1428–1436 (2009)
Jojic, N., Frey, B.J., Kannan, A.: Epitomic analysis of appearance and shape. In: ICCV 2003, pp. 34–43 (2003)
Oliva, A., Torralba, A.: Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision 42 (2001)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In: CVPR (2), pp. 2169–2178 (2006)
Lowe, D.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60 (2004)
Zhu, J., Li, L.-J., Li, F.-F., Xing, E.P.: Large Margin Learning of Upstream Scene Understanding Models. In: NIPS 2010, pp. 2586–2594 (2010)
Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: CVPR 2009, pp. 413–420 (2009)
Torralba, A., Murphy, K.P., Freeman, W.T., Rubin, M.A.: Context-based vision system for place and object recognition. In: ICCV 2003, pp. 273–280 (2003)
Blei, D., Ng, A., Jordan, M.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3(5) (2003)
Fergus, R., Perona, P., Zisserman, A.: Object Class Recognition by Unsupervised Scale-Invariant Learning. In: CVPR 2003 (2003)
Sudderth, E., Ihler, A., Isard, T., Freeman, W., Willsky, A.: Nonparametric Belief Propagation. In: CVPR 2003 (2003)
Isard, M.: PAMPAS: Real-Valued Graphical Models for Computer Vision. In: CVPR 2003 (2003)
Sudderth, E., Mandel, M., Freeman, W., Willsky, A.: Visual Hand Tracking Using Nonparametric Belief Propagation. In: CVPR 2004 Workshop on Generative Model Based Vision (2004)
Parizi, S.N., Oberlin, J., Felzenszwalb, P.F.: Reconfigurable models for scene recognition. In: CVPR 2012 (2012)
Pandey, M., Lazebnik, S.: Scene recognition and weakly supervised object localization with deformable part-based models. In: ICCV 2011 (2011)
Krähenbühl, P., Koltun, V.: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In: NIPS 2011 (2011)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perina, A., Jojic, N. (2012). Spring Lattice Counting Grids: Scene Recognition Using Deformable Positional Constraints. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33783-3_60
DOI: https://doi.org/10.1007/978-3-642-33783-3_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33782-6
Online ISBN: 978-3-642-33783-3
eBook Packages: Computer Science, Computer Science (R0)