Objects as Attributes for Scene Classification

Li, Li-Jia; Su, Hao; Lim, Yongwhan; Fei-Fei, Li

doi:10.1007/978-3-642-35749-7_5

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6553)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Robust low-level image features have proven to be effective representations for a variety of high-level visual recognition tasks, such as object recognition and scene classification. But as the visual recognition tasks become more challenging, the semantic gap between low-level feature representation and the meaning of the scenes increases. In this paper, we propose to use objects as attributes of scenes for scene classification. We represent images by collecting their responses to a large number of object detectors, or “object filters”. Such representation carries high-level semantic information rather than low-level image feature information, making it more suitable for high-level visual recognition tasks. Using very simple, off-the-shelf classifiers such as SVM, we show that this object-level image representation can be used effectively for high-level visual tasks such as scene classification. Our results are superior to reported state-of-the-art performance on a number of standard datasets.
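The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the response maps are toy values, max pooling over each whole map is an assumed way of summarising a detector's output, and the hinge-loss subgradient trainer is a small stand-in for an off-the-shelf SVM. All function names are hypothetical.

```python
def pool_responses(response_maps):
    """Summarise each detector's response map by its maximum value.

    response_maps: one 2D list of scores per (hypothetical) object detector.
    Max pooling over the whole map is an assumed choice for this sketch.
    """
    return [max(max(row) for row in rmap) for rmap in response_maps]


def train_linear_svm(features, labels, epochs=100, lr=0.1, lam=0.01):
    """Fit a binary linear classifier by subgradient descent on the
    L2-regularised hinge loss (a stand-in for an off-the-shelf SVM).

    labels must be +1 or -1. Returns (weights, bias).
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push this sample towards its own side.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 for a pooled object-response vector x."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Two toy detectors (say "sailboat" and "desk"); each image becomes a
    # 2-dimensional vector of pooled detector responses.
    train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
    train_y = [1, 1, -1, -1]  # +1 = outdoor scene, -1 = indoor scene (toy labels)
    w, b = train_linear_svm(train_x, train_y)
    print(predict(w, b, [0.85, 0.15]), predict(w, b, [0.15, 0.8]))
```

Each dimension of the feature vector corresponds to one object detector, so the learned weights can be read as how strongly the presence of each object argues for or against a scene class.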

Author information

Authors and Affiliations

Computer Science Department, Stanford University, USA
Li-Jia Li, Hao Su, Yongwhan Lim & Li Fei-Fei

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 10 King’s College Road, Toronto, ON M5S 3G4, Canada
Kiriakos N. Kutulakos

About this paper

Cite this paper

Li, LJ., Su, H., Lim, Y., Fei-Fei, L. (2012). Objects as Attributes for Scene Classification. In: Kutulakos, K.N. (eds) Trends and Topics in Computer Vision. ECCV 2010. Lecture Notes in Computer Science, vol 6553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35749-7_5

DOI: https://doi.org/10.1007/978-3-642-35749-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35748-0
Online ISBN: 978-3-642-35749-7
eBook Packages: Computer Science, Computer Science (R0)
