Hybrid Pooling Fusion in the BoW Pipeline

Law, Marc; Thome, Nicolas; Cord, Matthieu

doi:10.1007/978-3-642-33885-4_36

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7585)

Abstract

In the context of object and scene recognition, state-of-the-art performance is obtained with Bag of Words (BoW) models of mid-level representations computed from densely sampled local descriptors (e.g. SIFT). Several methods for combining low-level features and setting mid-level parameters have recently been evaluated for image classification.

In this paper, we further investigate the impact of the main parameters in the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors. Low- and high-contrast regions are pooled separately and combined to provide a powerful representation of images. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
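The idea of pooling low- and high-contrast regions separately can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: the descriptor's L2 norm is used as a proxy for local contrast, `contrast_threshold` is a hypothetical parameter, coding is plain hard assignment, pooling is a sum, and each half is L2-normalized before concatenation. The function names are hypothetical as well, and plain Python is used only to keep the example self-contained.

```python
import math


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def pool_histogram(descriptors, codebook):
    """Hard-assign each descriptor to a visual word and sum-pool the counts."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    return hist


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def hybrid_pooling(descriptors, codebook, contrast_threshold):
    """Pool low- and high-contrast descriptors separately, then concatenate."""
    low, high = [], []
    for desc in descriptors:
        # Assumption: the descriptor's L2 norm stands in for local contrast.
        contrast = math.sqrt(sum(d * d for d in desc))
        (high if contrast >= contrast_threshold else low).append(desc)
    low_hist = l2_normalize(pool_histogram(low, codebook))
    high_hist = l2_normalize(pool_histogram(high, codebook))
    return low_hist + high_hist


# Toy data: a 3-word codebook and five 2-D "descriptors".
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.05, 0.1], [0.9, 0.1], [0.1, 1.2], [0.0, 0.9]]
signature = hybrid_pooling(descriptors, codebook, contrast_threshold=0.5)
print(signature)
```

The resulting signature is twice the codebook size: the first half summarizes low-contrast descriptors and the second half high-contrast ones, so a linear classifier can weight the two groups independently. In a real pipeline the descriptors would be unnormalized dense SIFT vectors and the pooling would typically be done per spatial cell.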

Author information

Authors and Affiliations

LIP6, UPMC - Sorbonne University, Paris, France
Marc Law, Nicolas Thome & Matthieu Cord

Editor information

Editors and Affiliations

Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica (DIEGM), Università degli Studi di Udine, Via delle Scienze, 208, 33100, Udine, Italy
Andrea Fusiello
IIT Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
Vittorio Murino
Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Modena e Reggio Emilia, Strada Vignolese, 905, 41125, Modena, Italy
Rita Cucchiara

About this paper

Cite this paper

Law, M., Thome, N., Cord, M. (2012). Hybrid Pooling Fusion in the BoW Pipeline. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33885-4_36

DOI: https://doi.org/10.1007/978-3-642-33885-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33884-7
Online ISBN: 978-3-642-33885-4
eBook Packages: Computer Science, Computer Science (R0)
