Location-Aware Image Classification

  • Xinggang Wang
  • Xin Yang
  • Wenyu Liu
  • Chen Duan
  • Longin Jan Latecki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9516)


Most popular image classification methods rely on global image representations. Such representations conflict with a basic fact: the position of the object in the image is uncertain. In this paper, we propose a novel location-aware image classification framework to address this problem. In our framework, an image is classified based on a local image representation, and the classifier is learned by iterative multi-instance learning with a latent SVM; that is, we infer the object location with the latent SVM to improve image classification. Our method is very efficient and outperforms the popular spatial pyramid matching (SPM) method and the Region Based Latent SVM (RBLSVM) method [1] on the challenging PASCAL VOC dataset.
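
The alternation described above (train a classifier on the currently selected windows, then re-infer the object window in each positive image) can be sketched on toy data. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the 2-D window descriptors, the helper names (`make_bag`, `train_location_aware`, `classify`), the choice of window 0 as a global descriptor used for initialisation, and the small hand-written SGD solver are all hypothetical stand-ins, since the abstract does not specify the features, the window sampling, or the solver.

```python
import random

def score(w, b, x):
    """Linear score w.x + b of one window descriptor."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def best_window(w, b, bag):
    """Latent step: index of the highest-scoring window of one image."""
    return max(range(len(bag)), key=lambda i: score(w, b, bag[i]))

def train_linear_svm(samples, dim, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Plain SGD on the L2-regularised hinge loss (assumed stand-in solver)."""
    rng = random.Random(seed)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            violated = y * score(w, b, x) < 1.0
            w = [(1.0 - lr * lam) * wi for wi in w]  # weight decay
            if violated:                             # hinge subgradient step
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def train_location_aware(pos_bags, neg_bags, dim, rounds=3):
    """Alternate between SVM training and re-inferring object windows."""
    latent = [0] * len(pos_bags)  # assumption: start from window 0 (global view)
    negatives = [(x, -1) for bag in neg_bags for x in bag]
    for _ in range(rounds):
        positives = [(bag[z], 1) for bag, z in zip(pos_bags, latent)]
        w, b = train_linear_svm(positives + negatives, dim)
        latent = [best_window(w, b, bag) for bag in pos_bags]
    return w, b, latent

def classify(w, b, bag):
    """Image label from the best local window, plus that window's index."""
    z = best_window(w, b, bag)
    return (1 if score(w, b, bag[z]) > 0 else -1), z

def make_bag(object_at=None, n=4):
    """Toy image: window 0 is the mean of n local windows (a global view)."""
    local = [[1.0, 0.0] if i == object_at else [0.0, 1.0] for i in range(n)]
    mean = [sum(col) / n for col in zip(*local)]
    return [mean] + local

# Four positive images with the object in a different window each time,
# and four negative images that contain background windows only.
pos_bags = [make_bag(object_at=i) for i in range(4)]
neg_bags = [make_bag() for _ in range(4)]
w, b, latent = train_location_aware(pos_bags, neg_bags, dim=2)
print(latent)
print(classify(w, b, make_bag(object_at=2)), classify(w, b, make_bag()))
```

In this toy setting the global descriptor of a positive image is diluted by background, which mirrors the motivation in the abstract; the first round still separates the classes weakly, and later rounds should lock onto the object window itself. An image is then labelled by its best-scoring window, so the predicted location comes for free with the class decision.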


Keywords: Image classification · Latent SVM · Spatial pyramid matching



This work was primarily supported by the National Natural Science Foundation of China (NSFC) (No. 61503145). This material is also based upon work supported by the NSF under Grants No. IIS-1302164 and OIA-1027897.


References

  1. Yakhnenko, O., Verbeek, J., Schmid, C.: Region-based image classification with a latent SVM model. Research report RR-7665, INRIA (2011)
  2. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: Proceedings of ICCV, pp. 1470–1477 (2003)
  3. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of CVPR (2006)
  4. Grauman, K., Darrell, T.: The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of ICCV (2005)
  5. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: Proceedings of CVPR (2010)
  6. Song, Z., Chen, Q., Huang, Z., Hua, Y., Yan, S.: Contextualizing object detection and classification. In: Proceedings of CVPR (2011)
  7. Xie, L., Tian, Q., Wang, M., Zhang, B.: Spatial pooling of heterogeneous features for image classification. IEEE Trans. Image Process. 23, 1994–2008 (2014)
  8. Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Zhang, H.J.: Image classification with kernelized spatial-context. IEEE Trans. Multimedia 12, 278–287 (2010)
  9. Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997)
  10. Wang, X., Bai, X., Liu, W., Latecki, L.J.: Feature context for image classification and object detection. In: Proceedings of CVPR (2011)
  11. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)
  12. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24, 509–522 (2002)
  13. Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: Proceedings of Advances in Neural Information Processing Systems (2003)
  14. Hong, R., Wang, M., Gao, Y., Tao, D., Li, X., Wu, X.: Image annotation by multiple-instance learning with discriminative feature mapping and selection. IEEE Trans. Cybern. 44, 669–680 (2014)
  15. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results (2007)
  16. Wang, M., Li, G., Lu, Z., Gao, Y., Chua, T.S.: When Amazon meets Google: product visualization by exploring multiple web sources. ACM Trans. Internet Technol. (TOIT) 12, 12 (2013)
  17. Wang, M., Li, H., Tao, D., Lu, K., Wu, X.: Multimodal graph-based reranking for web image search. IEEE Trans. Image Process. 21, 4649–4661 (2012)
  18. Wang, X., Feng, B., Bai, X., Liu, W., Latecki, L.J.: Bag of contour fragments for robust shape classification. Pattern Recogn. 47, 2116–2125 (2014)
  19. Zhu, J., Wu, T., Zhu, J., Yang, X., Zhang, W.: Learning reconfigurable scene representation by tangram model. In: 2012 IEEE Workshop on Applications of Computer Vision (WACV), pp. 449–456. IEEE (2012)
  20. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2003)
  21. Lee, Y.J., Grauman, K.: Object-graphs for context-aware category discovery. IEEE Trans. Pattern Anal. Mach. Intell. 34, 346–358 (2011)
  22. Yuan, J., Wu, Y.: Spatial random partition for common visual pattern discovery. In: Proceedings of ICCV (2007)
  23. Zhu, L.L., Lin, C.X., Huang, H., Chen, Y., Yuille, A.L.: Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 759–773. Springer, Heidelberg (2008)
  24. Zhu, J., Zou, W., Yang, X., Zhang, R., Zhou, Q., Zhang, W.: Image classification by hierarchical spatial pooling with partial least squares analysis. In: BMVC, pp. 1–11 (2012)
  25. Khan, I., Roth, P.M., Bischof, H.: Learning object detectors from weakly-labeled internet images. In: OAGM Workshop (2010)
  26. Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: Proceedings of CVPR (2010)
  27. Vijayanarasimhan, S., Grauman, K.: Keywords to visual categories: multiple-instance learning for weakly supervised object categorization. In: Proceedings of CVPR (2008)
  28. Pandey, M., Lazebnik, S.: Scene recognition and weakly supervised object localization with deformable part-based models. In: Proceedings of ICCV (2011)
  29. Russakovsky, O., Lin, Y., Yu, K., Fei-Fei, L.: Object-centric spatial pooling for image classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 1–15. Springer, Heidelberg (2012)
  30. Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image classification. In: International Conference on Computer Vision (2009)
  31. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
  32. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of CVPR (2009)
  33. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of CVPR (2005)
  34. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
  35. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
  36. Quack, T., Ferrari, V., Leibe, B., Gool, L.V.: Efficient mining of frequent and distinctive feature configurations. In: International Conference on Computer Vision (ICCV) (2007)
  37. Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T.: SIFT flow: dense correspondence across different scenes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 28–42. Springer, Heidelberg (2008)
  38. Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British Machine Vision Conference (BMVC) (2011)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Xinggang Wang (1)
  • Xin Yang (1)
  • Wenyu Liu (1)
  • Chen Duan (2)
  • Longin Jan Latecki (3)

  1. School of EIC, Huazhong University of Science and Technology, Wuhan, China
  2. Wuhan Second Ship Design and Research Institute, Wuhan, China
  3. Department of CIS, Temple University, Philadelphia, USA
