Locality constrained encoding of frequency and spatial information for image classification

Abstract

The bag-of-features (BoF) model provides a way to construct a high-level representation for image classification. Although spatial pyramid matching (SPM) has been incorporated into many of its extensions, these models intrinsically lack a mechanism for exploiting frequency-domain information. In this paper, we propose the locality-constrained encoding of frequency and spatial information (LEFSI) algorithm, in which an image is decomposed into multiple frequency components and each component is further decomposed into subregions using SPM. Scale-invariant feature transform (SIFT) descriptors are first calculated in each subregion and then converted into a global descriptor using a codebook generated on a category-by-category basis together with locality-constrained linear coding (LLC). The image feature is defined as the concatenation of the global descriptors constructed in all subregions. We evaluated this algorithm against several state-of-the-art models on six benchmark datasets. Our results suggest that the proposed LEFSI algorithm can describe images more effectively and provide more accurate image classification.
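The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: a one-level Haar transform stands in for the frequency decomposition (the keywords mention a wavelet transform, but the abstract does not name the wavelet), flattened 4x4 patches stand in for SIFT descriptors, random codebooks (one per frequency component) stand in for the category-by-category codebooks, the coding step follows the approximated LLC solution of Wang et al. (2010), and max pooling is assumed within each SPM cell. All function names are hypothetical.

```python
import numpy as np

def haar_decompose(img):
    """One-level orthonormal 2-D Haar transform (assumed wavelet); img needs even sides."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 2.0      # low-frequency component
    detail_1 = (a - b + c - d) / 2.0    # difference between columns
    detail_2 = (a + b - c - d) / 2.0    # difference between rows
    detail_3 = (a - b - c + d) / 2.0    # diagonal difference
    return [approx, detail_1, detail_2, detail_3]

def patch_descriptors(band, size=4):
    """Stand-in for dense SIFT: flattened non-overlapping patches and normalized centers."""
    h, w = band.shape
    descs, centers = [], []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            descs.append(band[y:y + size, x:x + size].ravel())
            centers.append(((y + size / 2) / h, (x + size / 2) / w))
    return np.array(descs), np.array(centers)

def llc_encode(x, codebook, k=5, reg=1e-4):
    """Approximated LLC: code x with its k nearest codewords, weights summing to 1."""
    dist2 = np.sum((codebook - x) ** 2, axis=1)
    idx = np.argsort(dist2)[:k]
    z = codebook[idx] - x                              # shift neighbours to the origin
    cov = z @ z.T                                      # local covariance (k x k)
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(k) # regularize for stability
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()                                       # enforce the sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w
    return code

def spm_pool(codes, centers, levels=(1, 2)):
    """Max-pool codes inside each cell of an n x n grid for every pyramid level."""
    pooled = []
    for n in levels:
        cell = np.minimum((centers * n).astype(int), n - 1)
        for i in range(n):
            for j in range(n):
                mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                pooled.append(codes[mask].max(axis=0) if mask.any()
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)

def lefsi_like_feature(img, codebooks, levels=(1, 2)):
    """Concatenate the pooled LLC codes of all subregions of all frequency components."""
    parts = []
    for band, codebook in zip(haar_decompose(img), codebooks):
        descs, centers = patch_descriptors(band)
        codes = np.array([llc_encode(d, codebook) for d in descs])
        parts.append(spm_pool(codes, centers, levels))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
img = rng.random((32, 32))                                     # toy grayscale image
codebooks = [rng.standard_normal((16, 16)) for _ in range(4)]  # placeholder codebooks
feat = lefsi_like_feature(img, codebooks)
print(feat.shape)  # 4 components * (1 + 4) cells * 16 codewords = (320,)
```

In this toy setting the final feature length is the number of frequency components times the number of pyramid cells times the codebook size, which shows how adding frequency components enlarges the representation relative to a plain SPM feature. A real experiment would replace the placeholders with an actual wavelet library, a SIFT extractor, and codebooks learned from training descriptors.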

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61471297 and 61771397, in part by the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University, and in part by the Australian Research Council (ARC) Grants.

Author information

Corresponding author

Correspondence to Yong Xia.

About this article

Cite this article

Pan, Y., Xia, Y., Song, Y. et al. Locality constrained encoding of frequency and spatial information for image classification. Multimed Tools Appl 77, 24891–24907 (2018). https://doi.org/10.1007/s11042-018-5712-3

Keywords

  • Image classification
  • Bag-of-features (BoF)
  • Image decomposition
  • Wavelet transform
  • Spatial pyramid matching (SPM)