Hybrid Pooling Fusion in the BoW Pipeline

  • Marc Law
  • Nicolas Thome
  • Matthieu Cord
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7585)


In object and scene recognition, state-of-the-art performance is obtained with Bag of Words (BoW) models, which build mid-level representations from densely sampled local descriptors (e.g. SIFT). Several methods for combining low-level features and setting mid-level parameters have recently been evaluated for image classification.

In this paper, we further investigate the impact of the main parameters in the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors. Low- and high-contrast regions are pooled separately and combined to provide a powerful representation of images. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
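The idea of pooling low- and high-contrast regions separately can be illustrated with a minimal sketch. The abstract does not specify the contrast measure, the coding scheme, the pooling operators, or the normalization, so everything below is an assumption made for illustration: hard-assignment coding, sum pooling, L2 normalization, a scalar contrast value per descriptor, and a fixed threshold. The function names (`encode_hard`, `hybrid_pooling`) and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def encode_hard(descriptors, codebook):
    """Hard-assignment coding: one-hot code of the nearest codeword (assumed scheme)."""
    # Squared Euclidean distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros_like(d2)
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes

def sum_pool(codes):
    """Sum-pool codes into one histogram; an empty group yields a zero vector."""
    return codes.sum(axis=0)

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm, leaving all-zero vectors unchanged."""
    return v / max(np.linalg.norm(v), eps)

def hybrid_pooling(descriptors, contrasts, codebook, threshold):
    """Pool low- and high-contrast descriptors separately, then concatenate.

    `contrasts` holds one scalar per descriptor (for example the gradient
    magnitude of the patch); `threshold` is a hypothetical cut-off.
    """
    codes = encode_hard(descriptors, codebook)
    high = contrasts >= threshold
    h_low = l2_normalize(sum_pool(codes[~high]))
    h_high = l2_normalize(sum_pool(codes[high]))
    # Final signature has 2 * n_codewords dimensions: [low part, high part]
    return np.concatenate([h_low, h_high])

# Toy usage: 4 two-dimensional descriptors and a 2-word codebook
descriptors = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.1], [1.0, 0.9]])
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
contrasts = np.array([0.1, 0.9, 0.2, 0.8])
signature = hybrid_pooling(descriptors, contrasts, codebook, threshold=0.5)
```

Concatenating the two pooled histograms keeps low-contrast statistics from being swamped by high-contrast ones, at the cost of doubling the signature size. Other coding or pooling choices (soft assignment, sparse coding, max pooling) would slot into the same structure.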


Keywords: Local Descriptor, Sparse Code, SIFT Descriptor, Early Fusion, Codebook Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marc Law (1)
  • Nicolas Thome (1)
  • Matthieu Cord (1)
  1. LIP6, UPMC - Sorbonne University, Paris, France
