Multimedia Tools and Applications, Volume 77, Issue 19, pp 24891–24907

Locality constrained encoding of frequency and spatial information for image classification

  • Yongsheng Pan
  • Yong Xia
  • Yang Song
  • Weidong Cai

Abstract

The bag-of-features (BoF) model provides a way to construct a high-level representation for image classification. Although spatial pyramid matching (SPM) has been incorporated into many of its extensions, these models intrinsically lack a mechanism for exploiting frequency-domain information. In this paper, we propose the locality-constrained encoding of frequency and spatial information (LEFSI) algorithm, in which an image is decomposed into multiple frequency components and each component is further divided into subregions using SPM. Scale-invariant feature transform (SIFT) descriptors are first calculated in each subregion and then converted into a global descriptor using locality-constrained linear coding (LLC) with a codebook generated on a category-by-category basis. The image feature is defined as the concatenation of the global descriptors constructed in all subregions. We evaluated this algorithm against several state-of-the-art models on six benchmark datasets. Our results suggest that the proposed LEFSI algorithm describes images more effectively and yields more accurate image classification.
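
As a rough illustration of the pipeline the abstract describes, the Python sketch below chains a frequency decomposition, local descriptors, LLC encoding, spatial pyramid pooling, and concatenation. It is not the authors' implementation, and several parts are assumptions made only to keep the example self-contained: a one-level averaging Haar transform stands in for the wavelet decomposition, flattened 4×4 patches stand in for SIFT descriptors, the codebook is random rather than generated category by category, and max pooling over a 1×1, 2×2, 4×4 pyramid is assumed. All function names and parameter values are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D averaging Haar transform: one approximation and three detail bands."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    avg = (img[0::2, :] + img[1::2, :]) / 2.0    # averages of vertical pixel pairs
    dif = (img[0::2, :] - img[1::2, :]) / 2.0    # differences of vertical pixel pairs
    return (
        (avg[:, 0::2] + avg[:, 1::2]) / 2.0,     # approximation (low/low)
        (avg[:, 0::2] - avg[:, 1::2]) / 2.0,     # detail band
        (dif[:, 0::2] + dif[:, 1::2]) / 2.0,     # detail band
        (dif[:, 0::2] - dif[:, 1::2]) / 2.0,     # detail band
    )

def dense_patches(band, size=4, step=4):
    """Stand-in for SIFT: flattened patches plus their normalized (y, x) centers."""
    h, w = band.shape
    descs, pos = [], []
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            descs.append(band[y:y + size, x:x + size].ravel())
            pos.append(((y + size / 2) / h, (x + size / 2) / w))
    return np.array(descs), np.array(pos)

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximated LLC: reconstruct x from its k nearest codewords, weights sum to 1."""
    idx = np.argsort(((codebook - x) ** 2).sum(axis=1))[:k]
    z = codebook[idx] - x                        # shift neighbors to the descriptor
    c = z @ z.T                                  # local covariance
    c += np.eye(k) * (beta * np.trace(c) + 1e-12)  # regularize for stability
    w = np.linalg.solve(c, np.ones(k))
    code = np.zeros(len(codebook))
    code[idx] = w / w.sum()
    return code

def spm_pool(codes, pos, levels=(1, 2, 4)):
    """Max-pool codes inside every cell of each pyramid level, then concatenate."""
    pooled = []
    for g in levels:
        cell = np.minimum((pos * g).astype(int), g - 1)
        for cy in range(g):
            for cx in range(g):
                mask = (cell[:, 0] == cy) & (cell[:, 1] == cx)
                pooled.append(codes[mask].max(axis=0) if mask.any()
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)

def lefsi_like_feature(img, codebook):
    """Concatenate pooled LLC codes over all frequency bands and subregions."""
    feats = []
    for band in haar_dwt2(img):
        descs, pos = dense_patches(band)
        codes = np.array([llc_encode(d, codebook) for d in descs])
        feats.append(spm_pool(codes, pos))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
image = rng.random((64, 64))                     # synthetic grayscale image
codebook = rng.random((32, 16))                  # 32 hypothetical codewords, 16-D each
feature = lefsi_like_feature(image, codebook)
print(feature.shape)                             # (2688,) = 4 bands x 21 cells x 32 codewords
```

The feature length grows with the number of frequency bands, pyramid cells, and codewords, which is why the sketch keeps all three small. A real system would replace the stand-ins with a proper wavelet transform, SIFT descriptors, and learned codebooks, and would feed the resulting vectors to a classifier.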

Keywords

Image classification · Bag-of-features (BoF) · Image decomposition · Wavelet transform · Spatial pyramid matching (SPM)

Notes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61471297 and 61771397, in part by Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University and in part by the Australian Research Council (ARC) Grants.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Shaanxi Key Laboratory of Speech & Image Information Processing (SAIIP), School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
  2. Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
  3. Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, University of Sydney, Camperdown, Australia
