Image Representation with Bag-of-Words

Xu, Xiang; Wu, Xingkun; Lin, Feng

doi:10.1007/978-3-319-47629-2_4

Abstract

Image classification, the task of assigning one or more category labels to an image, is an active topic in computer vision and pattern recognition. It has applications in video surveillance, remote sensing, web content analysis, biometrics, and other areas. Many successful models transform low-level descriptors into richer mid-level representations. Extracting mid-level features involves a sequence of interchangeable modules. However, the pipeline always consists of two major parts: Bag-of-Words (BoW) and Spatial Pyramid Matching (SPM). The aim is to embed low-level descriptors in a representative codebook space. First, low-level descriptors are extracted at interest points or on dense grids. Then, a pre-defined codebook is applied to encode each descriptor using a specific coding scheme. The resulting code is normally a vector with binary or continuous elements, depending on the coding scheme, and can be referred to as a mid-level descriptor. Next, the image is divided into increasingly finer spatial subregions. The codes from each subregion are pooled into a histogram by averaging or normalizing. Finally, the image representation is generated by concatenating the histograms from all subregions. In this chapter, we introduce the key techniques employed in the BoW framework, including SPM, namely the coding process and the pooling process.
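
The pipeline outlined in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the chapter's implementation: it assumes hard-assignment (nearest-codeword) coding, average pooling, and a three-level pyramid, and it uses random arrays in place of real descriptors and a learned codebook. The function names `hard_assign` and `spatial_pyramid` are hypothetical.

```python
import numpy as np


def hard_assign(descriptors, codebook):
    """Encode each descriptor as a one-hot vector over its nearest codeword."""
    # Squared Euclidean distances, shape (n_descriptors, n_codewords).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros_like(d2)
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes


def spatial_pyramid(codes, positions, levels=3):
    """Average-pool codes over 1x1, 2x2, 4x4, ... grids and concatenate.

    positions: (n, 2) array of (x, y) coordinates normalized to [0, 1).
    """
    n_words = codes.shape[1]
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell index of every descriptor at this pyramid level.
        ix = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        iy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (ix == cx) & (iy == cy)
                if mask.any():
                    # Average pooling: normalized histogram of codewords.
                    pooled.append(codes[mask].mean(axis=0))
                else:
                    # Empty subregion contributes an all-zero histogram.
                    pooled.append(np.zeros(n_words))
    return np.concatenate(pooled)


# Illustrative data only (assumption): random stand-ins for 128-D
# descriptors, their image positions, and a 16-word codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))
positions = rng.random(size=(200, 2))
codebook = rng.normal(size=(16, 128))

codes = hard_assign(descriptors, codebook)
feature = spatial_pyramid(codes, positions, levels=3)
print(feature.shape)  # (336,)
```

With 16 codewords and three pyramid levels (1 + 4 + 16 = 21 subregions), the concatenated representation has 16 × 21 = 336 dimensions. A coding scheme with continuous elements would replace only `hard_assign`; the pooling and concatenation steps stay the same.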

Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
Xiang Xu & Feng Lin
Zhejiang University, Hangzhou, Zhejiang, China
Xingkun Wu

Corresponding author

Correspondence to Xiang Xu.

About this chapter

Cite this chapter

Xu, X., Wu, X., Lin, F. (2017). Image Representation with Bag-of-Words. In: Cellular Image Classification. Springer, Cham. https://doi.org/10.1007/978-3-319-47629-2_4

DOI: https://doi.org/10.1007/978-3-319-47629-2_4
Published: 19 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47628-5
Online ISBN: 978-3-319-47629-2
eBook Packages: Engineering, Engineering (R0)
