Fast Approximate GMM Soft-Assign for Fine-Grained Image Classification with Large Fisher Vectors

  • Josip Krapac
  • Siniša Šegvić
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9358)


We address two drawbacks of image classification with large Fisher vectors. The first is the computational cost of assigning a large number of patch descriptors to a large number of GMM components. We alleviate it with a generally applicable approximate soft-assignment procedure based on a balanced GMM tree. This approximation significantly reduces the computational complexity while only marginally affecting fine-grained classification performance. The second drawback is the very high dimensionality of the image representation, which makes classifier learning and inference computationally expensive and prone to overtraining. We alleviate it by regularizing the classification model with group Lasso. The resulting block-sparse models achieve better fine-grained classification performance, in addition to memory savings and faster prediction. We demonstrate and evaluate our contributions on a standard fine-grained categorization benchmark.
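To make the idea of tree-based approximate soft-assignment concrete, the following is a minimal NumPy sketch, not the procedure from the paper. It assumes a two-level tree in which every leaf Gaussian has one parent node, diagonal covariances, and a simple rule that keeps only the `n_best` highest-scoring nodes per descriptor; how the tree is built and balanced is not shown. All function and parameter names (`tree_soft_assign`, `leaf_parent`, `n_best`, and so on) are hypothetical.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log-density of every row of X under every diagonal Gaussian, shape (N, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                  # (N, K, D)
    maha = np.sum(diff * diff / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)               # (K,)
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det[None, :] + maha)


def softmax_rows(A):
    """Row-wise softmax; entries equal to -inf map to exactly zero."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)


def exact_soft_assign(X, weights, means, variances):
    """Exact posteriors over all K components (the costly baseline)."""
    scores = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return softmax_rows(scores)


def tree_soft_assign(X, node_w, node_mu, node_var,
                     leaf_w, leaf_mu, leaf_var, leaf_parent, n_best=1):
    """Approximate posteriors: score the M tree nodes first, then evaluate
    only the leaf components under the n_best nodes of each descriptor.
    Assumption: every node has at least one leaf (leaf_parent covers 0..M-1)."""
    N, K, M = X.shape[0], leaf_w.shape[0], node_w.shape[0]
    node_score = np.log(node_w)[None, :] + log_gauss_diag(X, node_mu, node_var)
    best = np.argsort(-node_score, axis=1)[:, :n_best]        # (N, n_best)

    scores = np.full((N, K), -np.inf)                         # skipped leaves stay at -inf
    for m in range(M):
        rows = np.where((best == m).any(axis=1))[0]           # descriptors that kept node m
        cols = np.where(leaf_parent == m)[0]                  # leaves under node m
        if rows.size == 0 or cols.size == 0:
            continue
        scores[np.ix_(rows, cols)] = (
            np.log(leaf_w[cols])[None, :]
            + log_gauss_diag(X[rows], leaf_mu[cols], leaf_var[cols])
        )
    return softmax_rows(scores)                               # renormalize over kept leaves
```

Under these assumptions, with M nodes that each hold K/M leaves, a descriptor needs roughly M + n_best * K/M Gaussian evaluations instead of K, and setting `n_best` to M recovers the exact posteriors.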



This work has been fully supported by the Croatian Science Foundation under the project I-2433-2014.



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
