Multimedia Tools and Applications, Volume 75, Issue 10, pp 5533–5555

Image classification based on improved VLAD

  • Xianzhong Long
  • Hongtao Lu
  • Yong Peng
  • Xianzhong Wang
  • Shaokun Feng

Abstract

Recently, a coding scheme called the vector of locally aggregated descriptors (VLAD) has achieved great success in large-scale image retrieval thanks to its compact and efficient representation. VLAD aggregates each descriptor using only its nearest-neighbor visual word in the dictionary. It offers fast retrieval and high retrieval accuracy even with a small dictionary. In this paper, we propose three improved VLAD variants for image classification. First, as in the bag of words (BoW) model, we count the number of descriptors assigned to each cluster center and add these counts to VLAD. Second, to expand the impact of the residuals, we take squared residuals into account. Third, instead of a single nearest-neighbor visual word, we use the two nearest-neighbor visual words to aggregate each descriptor. Experimental results on the UIUC Sports Event, Corel 10, and 15 Scenes datasets show that the proposed methods outperform some state-of-the-art coding schemes in both classification accuracy and computation speed.
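
The abstract gives the three variants only in words, so the following is a minimal sketch of how they might look, not the authors' implementation. It uses plain Python with no external libraries. The function names (`sq_dist`, `nearest_words`, `vlad`) and the flags `add_counts`, `add_squared`, `k`, and `normalize` are hypothetical. Three details are assumptions, since the abstract does not specify them: counts and squared residuals are concatenated to the code, every assigned word gets equal weight when `k=2`, and the final L2 normalization is optional.

```python
from math import sqrt


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_words(descriptor, dictionary, k=1):
    """Indices of the k visual words closest to the descriptor."""
    order = sorted(range(len(dictionary)),
                   key=lambda i: sq_dist(descriptor, dictionary[i]))
    return order[:k]


def vlad(descriptors, dictionary, k=1, add_counts=False,
         add_squared=False, normalize=True):
    """Aggregate local descriptors into a VLAD-style code.

    k=1 with both flags off is plain VLAD: each descriptor contributes
    its residual to its single nearest visual word.
    Assumptions (not stated in the abstract): the extra terms are
    concatenated, and with k=2 both words are weighted equally.
    """
    n_words, dim = len(dictionary), len(dictionary[0])
    residual_sum = [[0.0] * dim for _ in range(n_words)]
    squared_sum = [[0.0] * dim for _ in range(n_words)]
    counts = [0] * n_words

    for d in descriptors:
        for i in nearest_words(d, dictionary, k):
            counts[i] += 1
            for j in range(dim):
                r = d[j] - dictionary[i][j]   # residual to the visual word
                residual_sum[i][j] += r
                squared_sum[i][j] += r * r

    code = [v for row in residual_sum for v in row]
    if add_squared:                            # variant 2: squared residuals
        code += [v for row in squared_sum for v in row]
    if add_counts:                             # variant 1: BoW-style counts
        code += [float(c) for c in counts]

    if normalize:                              # optional L2 normalization
        norm = sqrt(sum(v * v for v in code))
        if norm > 0:
            code = [v / norm for v in code]
    return code


if __name__ == "__main__":
    # Toy data: two 2-D visual words and three 2-D descriptors.
    words = [[0.0, 0.0], [10.0, 10.0]]
    descs = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
    print(vlad(descs, words, normalize=False))                   # plain VLAD
    print(vlad(descs, words, add_counts=True, normalize=False))  # variant 1
    print(vlad(descs, words, k=2, normalize=False))              # variant 3
```

For a dictionary of K words and descriptors of dimension D, plain VLAD in this sketch has length K·D; appending counts adds K entries, and appending squared residuals adds another K·D.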

Keywords

Image classification · Scale-invariant feature transform · Vector of locally aggregated descriptors · K-means clustering algorithm

Notes

Acknowledgments

This work is sponsored by NUPTSF (Grant No. NY214168), the National Natural Science Foundation of China (Grant Nos. 61300164, 61272247), the Shanghai Science and Technology Committee (Grant No. 13511500200), and the European Union Seventh Framework Programme (Grant No. 247619).

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Xianzhong Long (1, corresponding author)
  • Hongtao Lu (2)
  • Yong Peng (2)
  • Xianzhong Wang (2)
  • Shaokun Feng (2)

  1. School of Computer Science & Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing, China
  2. Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China