Abstract
In the context of object and scene recognition, state-of-the-art performance is obtained with Bag of Words (BoW) models of mid-level representations computed from densely sampled local descriptors (e.g., SIFT). Several methods to combine low-level features and to set mid-level parameters have been evaluated recently for image classification.
In this paper, we further investigate the impact of the main parameters in the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme exploiting the specificities of edge-based descriptors. Low- and high-contrast regions are pooled separately and combined to provide a powerful representation of images. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
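The idea of pooling low- and high-contrast regions separately can be illustrated with a minimal sketch. The abstract does not specify the coding step, the pooling operator, or how contrast is measured, so everything below is an assumption made for illustration: hard assignment to the nearest codeword, average pooling, a scalar per-descriptor `contrast` value (for example a gradient-magnitude norm), a fixed `threshold`, and plain concatenation as the combination step. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def assign_hard(descriptors, codebook):
    """Index of the nearest codeword (squared Euclidean distance) per descriptor."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)


def bow_histogram(descriptors, codebook):
    """Average-pooled BoW histogram; all zeros if there are no descriptors."""
    k = codebook.shape[0]
    if descriptors.shape[0] == 0:
        return np.zeros(k)
    words = assign_hard(descriptors, codebook)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()


def hybrid_pooling(descriptors, contrast, codebook, threshold):
    """Pool low- and high-contrast descriptors separately, then concatenate.

    Assumption: `contrast` holds one scalar per descriptor and `threshold`
    splits the two groups; the paper may define both differently.
    """
    high = contrast >= threshold
    h_low = bow_histogram(descriptors[~high], codebook)
    h_high = bow_histogram(descriptors[high], codebook)
    return np.concatenate([h_low, h_high])
```

With a codebook of K words, this sketch yields a 2K-dimensional image signature whose first half describes low-contrast regions and whose second half describes high-contrast regions.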
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Law, M., Thome, N., Cord, M. (2012). Hybrid Pooling Fusion in the BoW Pipeline. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33885-4_36
DOI: https://doi.org/10.1007/978-3-642-33885-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33884-7
Online ISBN: 978-3-642-33885-4
eBook Packages: Computer Science (R0)