Image Representation with Bag-of-Words

  • Chapter
Cellular Image Classification

Abstract

Image classification, the task of assigning one or more category labels to an image, is an active research topic in computer vision and pattern recognition. It has applications in video surveillance, remote sensing, web content analysis, biometrics, and other areas. Many successful models transform low-level descriptors into richer mid-level representations. Extracting mid-level features involves a sequence of interchangeable modules. However, they always consist of two major parts: Bag-of-Words (BoW) and Spatial Pyramid Matching (SPM). The goal is to embed low-level descriptors in a representative codebook space. First, low-level descriptors are extracted at interest points or on dense grids. Then, a predefined codebook is applied to encode each descriptor using a specific coding scheme. The resulting code is normally a vector with binary or continuous elements, depending on the coding scheme, and can be regarded as a mid-level descriptor. Next, the image is divided into increasingly finer spatial subregions. The codes from each subregion are pooled into a histogram by averaging or normalizing. Finally, the image representation is generated by concatenating the histograms from all subregions. In this chapter, we introduce the key techniques employed in the BoW framework with SPM, namely the coding process and the pooling process.
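
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the chapter's implementation: it assumes hard-assignment coding (each descriptor votes for its nearest codeword), average pooling, a three-level pyramid, and descriptor positions normalized to [0, 1). The function names `hard_assign` and `spm_representation`, the random data, and the codebook size are all hypothetical choices made for illustration.

```python
import numpy as np


def hard_assign(descriptors, codebook):
    """Encode each descriptor as a one-hot vector over its nearest codeword.

    Assumption: hard assignment is only one of several possible coding schemes.
    """
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((descriptors.shape[0], codebook.shape[0]))
    codes[np.arange(descriptors.shape[0]), d2.argmin(axis=1)] = 1.0
    return codes


def spm_representation(descriptors, positions, codebook, levels=3):
    """Concatenate average-pooled codes over a spatial pyramid.

    positions holds (x, y) coordinates normalized to [0, 1).
    Level l splits the image into 2**l x 2**l cells.
    """
    codes = hard_assign(descriptors, codebook)
    k = codebook.shape[0]
    histograms = []
    for level in range(levels):
        cells = 2 ** level
        # Cell index of each descriptor along x and y at this level.
        ix = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        iy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (ix == cx) & (iy == cy)
                if mask.any():
                    # Average pooling: mean of the codes falling in this cell.
                    histograms.append(codes[mask].mean(axis=0))
                else:
                    # Empty cell: contribute an all-zero histogram.
                    histograms.append(np.zeros(k))
    return np.concatenate(histograms)


# Toy usage with random data standing in for real descriptors and a learned codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))   # e.g. 200 SIFT-like descriptors
positions = rng.random((200, 2))            # normalized (x, y) locations
codebook = rng.normal(size=(16, 128))       # 16 codewords (hypothetical size)
feature = spm_representation(descriptors, positions, codebook)
print(feature.shape)  # (336,) = 16 codewords * (1 + 4 + 16) cells
```

Swapping `hard_assign` for a soft or sparse coding function, or replacing the mean with a maximum, changes the coding and pooling steps without touching the rest of the pipeline, which is the sense in which the modules are interchangeable.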

Author information

Corresponding author

Correspondence to Xiang Xu.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Xu, X., Wu, X., Lin, F. (2017). Image Representation with Bag-of-Words. In: Cellular Image Classification. Springer, Cham. https://doi.org/10.1007/978-3-319-47629-2_4

  • DOI: https://doi.org/10.1007/978-3-319-47629-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47628-5

  • Online ISBN: 978-3-319-47629-2

  • eBook Packages: Engineering, Engineering (R0)
