Bregman pooling: feature-space local pooling for image classification

Regular Paper

Abstract

In this paper, we propose a novel feature-space local pooling method for the commonly adopted image classification architecture. Whereas existing methods partition the feature space based on visual appearance to obtain pooling bins, learning a more accurate partitioning that takes semantics into account boosts performance even with fewer bins. To this end, we partition the feature space over clusters of visual prototypes common to semantically similar images (i.e., images belonging to the same category). The clusters are obtained by Bregman co-clustering applied offline to a subset of the training data. Because they are aware of the semantic context of the input image, our features have higher discriminative power than those pooled from appearance-based partitions. Tests on four datasets (Caltech-101, Caltech-256, 15 Scenes, and 17 Flowers) covering three different classification tasks show that the proposed method outperforms previous feature-space local pooling methods while using lower feature dimensionality. Moreover, when implemented within a spatial pyramid, our method achieves comparable results on three of the datasets used.
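
As a rough illustration of what pooling over feature-space bins means, the following Python sketch max-pools descriptor codes separately within each bin and concatenates the results. It is not the paper's implementation: the function name `local_pool`, the nearest-word assignment rule, the choice of max pooling, and the `word_to_bin` labels (standing in for the output of an offline clustering step such as the Bregman co-clustering described above) are all assumptions made for the example.

```python
import numpy as np


def local_pool(codes, nearest_word, word_to_bin, n_bins):
    """Max-pool codes separately within each feature-space bin.

    codes:        (N, K) array, one code vector per local descriptor
    nearest_word: (N,) index of each descriptor's nearest visual word
    word_to_bin:  (K,) bin label of each visual word (hypothetical; assumed
                  to be produced by an offline clustering step)
    n_bins:       number of pooling bins
    Returns a flat vector of length n_bins * K.
    """
    n_words = codes.shape[1]
    pooled = np.zeros((n_bins, n_words))
    desc_bin = word_to_bin[nearest_word]  # bin of each descriptor
    for b in range(n_bins):
        members = codes[desc_bin == b]
        if len(members):  # empty bins stay all-zero
            pooled[b] = members.max(axis=0)
    return pooled.ravel()


# Toy usage: 3 descriptors, 3 visual words, 2 bins (made-up values).
codes = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.8, 0.0],
                  [0.0, 0.3, 0.7]])
nearest_word = codes.argmax(axis=1)   # assumed assignment rule
word_to_bin = np.array([0, 0, 1])     # assumed word-to-bin labels
feature = local_pool(codes, nearest_word, word_to_bin, n_bins=2)
```

Under these assumptions the image feature has one block of K values per bin, so fewer bins translate directly into a shorter feature vector.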

Keywords

Image classification; Image representation; Feature pooling; Co-clustering; Bregman divergence

Notes

Acknowledgments

This work was partly supported by Grant-in-Aid for Scientific Research (B) 25280036, Japan Society for the Promotion of Science (JSPS).

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Laboratory of Media Dynamics, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
