Abstract
Robust low-level image features have proven to be effective representations for a variety of high-level visual recognition tasks, such as object recognition and scene classification. But as visual recognition tasks become more challenging, the semantic gap between low-level feature representations and the meaning of the scenes widens. In this paper, we propose to use objects as attributes of scenes for scene classification. We represent images by collecting their responses to a large number of object detectors, or “object filters”. Such a representation carries high-level semantic information rather than low-level image feature information, making it more suitable for high-level visual recognition tasks. Using very simple, off-the-shelf classifiers such as SVM, we show that this object-level image representation can be used effectively for high-level visual tasks such as scene classification. Our results are superior to reported state-of-the-art performance on a number of standard datasets.
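As a rough illustration of the representation described above, the following Python sketch turns a set of detector response maps into one fixed-length vector. The function names (`pool_responses`, `object_level_representation`), the 2 x 2 grid, and the choice of max-pooling are assumptions made for illustration; the abstract states only that responses to a large number of object detectors are collected. The response maps are random arrays standing in for real detector outputs.

```python
import numpy as np


def pool_responses(response_map, grid=2):
    """Max-pool one detector's 2-D response map (hypothetical pooling scheme).

    Returns 1 + grid*grid values: the maximum over the whole map,
    followed by the maximum inside each cell of a grid x grid layout.
    """
    h, w = response_map.shape
    feats = [response_map.max()]
    # Integer cell boundaries; assumes h >= grid and w >= grid.
    row_edges = np.linspace(0, h, grid + 1).astype(int)
    col_edges = np.linspace(0, w, grid + 1).astype(int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            feats.append(response_map[r0:r1, c0:c1].max())
    return np.array(feats)


def object_level_representation(response_maps, grid=2):
    """Concatenate the pooled responses of every detector into one vector."""
    return np.concatenate([pool_responses(m, grid) for m in response_maps])


# Toy stand-in: three "object filters", each producing an 8 x 8 response map.
rng = np.random.default_rng(0)
toy_maps = [rng.random((8, 8)) for _ in range(3)]
features = object_level_representation(toy_maps)

# A map with a single strong response in the top-left corner.
corner_map = np.zeros((4, 4))
corner_map[0, 0] = 1.0
corner_feats = pool_responses(corner_map)
```

A vector built this way has one block of entries per detector, so it could be passed directly to an off-the-shelf classifier such as a linear SVM.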
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, LJ., Su, H., Lim, Y., Fei-Fei, L. (2012). Objects as Attributes for Scene Classification. In: Kutulakos, K.N. (eds) Trends and Topics in Computer Vision. ECCV 2010. Lecture Notes in Computer Science, vol 6553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35749-7_5
DOI: https://doi.org/10.1007/978-3-642-35749-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35748-0
Online ISBN: 978-3-642-35749-7
eBook Packages: Computer Science, Computer Science (R0)