Object-Centric Spatial Pooling for Image Classification

  • Olga Russakovsky
  • Yuanqing Lin
  • Kai Yu
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7573)


Spatial pyramid matching (SPM) based pooling has been the dominant choice for state-of-the-art image classification systems. In contrast, we propose a novel object-centric spatial pooling (OCP) approach, following the intuition that knowing the location of the object of interest can be useful for image classification. OCP consists of two steps: (1) inferring the location of the objects, and (2) using the location information to pool foreground and background features separately to form the image-level representation. Step (1) is particularly challenging in a typical classification setting, where precise object location annotations are not available during training. To address this challenge, we propose a framework that learns object detectors using only image-level class labels, or so-called weak labels. We validate our approach on the challenging PASCAL07 dataset. Our learned detectors are comparable in accuracy with state-of-the-art weakly supervised detection methods. More importantly, the resulting OCP approach significantly outperforms SPM-based pooling in image classification.
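Step (2) of the abstract, pooling foreground and background features separately, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single inferred bounding box, element-wise max pooling, and local feature codes that are already computed; the names `object_centric_pool`, `max_pool`, and the box convention are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def max_pool(codes: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise max over a set of feature codes; all zeros if the set is empty."""
    if not codes:
        return [0.0] * dim
    return [max(code[d] for code in codes) for d in range(dim)]


def object_centric_pool(
    positions: Sequence[Tuple[float, float]],
    codes: Sequence[Sequence[float]],
    box: Box,
) -> List[float]:
    """Pool codes inside and outside `box` separately, then concatenate.

    Assumption: max pooling and a single box; the source abstract does not
    specify the pooling operator or how boxes are represented.
    """
    if not codes or len(positions) != len(codes):
        raise ValueError("positions and codes must be non-empty and equal in length")
    dim = len(codes[0])
    x_min, y_min, x_max, y_max = box
    foreground, background = [], []
    for (x, y), code in zip(positions, codes):
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        (foreground if inside else background).append(code)
    # Image-level representation: [foreground pooled | background pooled].
    return max_pool(foreground, dim) + max_pool(background, dim)


# Toy example: four local descriptors with 2-dimensional codes.
positions = [(10, 10), (50, 50), (60, 40), (90, 90)]
codes = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.5], [0.3, 0.0]]
box = (40, 30, 70, 60)
representation = object_centric_pool(positions, codes, box)
print(representation)  # [0.7, 0.5, 0.3, 0.9]
```

In this toy case the two descriptors inside the box form the first half of the vector and the two outside form the second half, so a classifier trained on the result can weight object and context evidence differently.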


Keywords: Object Detector, Outer Loop, Foreground Region, Spatial Pyramid Match, Simple Dataset
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Olga Russakovsky (1)
  • Yuanqing Lin (2)
  • Kai Yu (3)
  • Li Fei-Fei (1)

  1. Stanford University, USA
  2. NEC Laboratories America, USA
  3. Baidu Inc., China
