Objects as Attributes for Scene Classification
Abstract
Robust low-level image features have proven to be effective representations for a variety of high-level visual recognition tasks, such as object recognition and scene classification. But as visual recognition tasks become more challenging, the semantic gap between low-level feature representations and the meaning of scenes widens. In this paper, we propose to use objects as attributes of scenes for scene classification. We represent images by collecting their responses to a large number of object detectors, or “object filters”. Such a representation carries high-level semantic information rather than low-level image feature information, making it more suitable for high-level visual recognition tasks. Using very simple, off-the-shelf classifiers such as SVMs, we show that this object-level image representation can be used effectively for high-level visual tasks such as scene classification. Our results surpass the reported state-of-the-art performance on a number of standard datasets.
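The representation described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each object filter has already been run over an image and returns a 2-D grid of detection scores. The grid size, the max pooling over a two-level spatial pyramid, the filter names `sand` and `car`, the "beach"/"street" classes, and the toy scores are all illustrative assumptions, not the paper's actual detectors, settings, or data. In place of an off-the-shelf SVM package, a small Pegasos-style subgradient solver for the linear hinge-loss SVM keeps the example dependency-free.

```python
from typing import Dict, List, Sequence

# Hypothetical input: one detector's scores over a 2-D grid of window positions.
ResponseMap = List[List[float]]


def pyramid_max_pool(resp: ResponseMap, levels: Sequence[int] = (1, 2)) -> List[float]:
    """Max-pool one response map over an n x n grid for each pyramid level n (assumed scheme)."""
    h, w = len(resp), len(resp[0])
    pooled: List[float] = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                r0, c0 = i * h // n, j * w // n
                # Guarantee each cell covers at least one row and one column.
                r1 = max((i + 1) * h // n, r0 + 1)
                c1 = max((j + 1) * w // n, c0 + 1)
                pooled.append(max(resp[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def object_filter_vector(responses: Dict[str, ResponseMap],
                         levels: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate the pooled responses of every object filter in a fixed (sorted) order."""
    vec: List[float] = []
    for name in sorted(responses):
        vec.extend(pyramid_max_pool(responses[name], levels))
    return vec


def _with_bias(x: Sequence[float]) -> List[float]:
    # Append a constant feature so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 1000) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    data = [(_with_bias(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # fixed cyclic order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the regularizer
            if margin < 1.0:  # hinge loss is active: move toward this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, _with_bias(x)))
    return 1 if score >= 0 else -1


# Toy usage with hypothetical filters and made-up, spatially uniform responses.
def const_map(v: float, size: int = 4) -> ResponseMap:
    return [[v] * size for _ in range(size)]


def toy_image(sand: float, car: float) -> List[float]:
    return object_filter_vector({"sand": const_map(sand), "car": const_map(car)})


X = [toy_image(0.9, 0.1), toy_image(0.8, 0.2), toy_image(0.1, 0.9), toy_image(0.2, 0.8)]
y = [1, 1, -1, -1]  # +1 = "beach", -1 = "street" (hypothetical classes)
w = train_linear_svm(X, y)
print(predict(w, toy_image(0.7, 0.3)), predict(w, toy_image(0.3, 0.7)))
```

Max pooling keeps the strongest response of each detector in each grid cell, so the vector records roughly which objects appear and where. A library SVM (for example scikit-learn's `LinearSVC`) could replace the hand-written solver without changing the representation.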
Keywords
Image Representation; Spatial Pyramid; British National Corpus; Scene Dataset; Scene Class