Improving the Fisher Kernel for Large-Scale Image Classification

  • Florent Perronnin
  • Jorge Sánchez
  • Thomas Mensink
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6314)


The Fisher kernel (FK) is a generic framework that combines the benefits of generative and discriminative approaches. In the context of image classification, the FK was shown to extend the popular bag-of-visual-words (BOV) representation by going beyond count statistics. In practice, however, this enriched representation has not yet shown its superiority over the BOV. In the first part, we show that several well-motivated modifications to the original framework boost the accuracy of the FK. On PASCAL VOC 2007, we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on Caltech-256. A major advantage is that these results are obtained using only SIFT descriptors and inexpensive linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images for learning classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images, we show that classifiers learned on Flickr groups perform surprisingly well (although the groups were not created for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
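To make "going beyond count statistics" concrete, here is a minimal NumPy sketch. It assumes, as is usual for Fisher vectors on images, a diagonal-covariance Gaussian mixture model (GMM) as the generative model, with its weights, means and standard deviations already trained (random stand-ins are used below). The soft BOV keeps only how much each Gaussian is occupied (K numbers), whereas the Fisher vector adds first- and second-order deviations from each Gaussian (2KD numbers). The abstract does not list the modifications it refers to; the power normalization sign(z)|z|^alpha followed by L2 normalization applied here is an assumption, chosen because it is commonly associated with "improved" Fisher vectors. The function names `soft_assignments`, `bov_histogram` and `fisher_vector` are hypothetical and illustrative, not taken from the paper.

```python
import numpy as np


def soft_assignments(X, weights, means, sigmas):
    """Posterior gamma[t, i] that descriptor x_t was generated by Gaussian i."""
    # Normalized differences (x_t - mu_i) / sigma_i, shape (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log-density of each descriptor under each diagonal Gaussian, shape (T, K).
    log_dens = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * X.shape[1] * np.log(2.0 * np.pi))
    log_post = np.log(weights)[None, :] + log_dens
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def bov_histogram(X, weights, means, sigmas):
    """Soft BOV: average occupancy of each Gaussian (count statistics only)."""
    return soft_assignments(X, weights, means, sigmas).mean(axis=0)


def fisher_vector(X, weights, means, sigmas, alpha=0.5):
    """Gradient w.r.t. GMM means and std devs, power- then L2-normalized."""
    T = X.shape[0]
    gamma = soft_assignments(X, weights, means, sigmas)               # (T, K)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (T, K, D)
    # First-order statistics: gradient w.r.t. the means.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff) / (T * np.sqrt(weights)[:, None])
    # Second-order statistics: gradient w.r.t. the standard deviations.
    g_sigma = (np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
               / (T * np.sqrt(2.0 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.abs(fv) ** alpha  # power normalization (assumed)
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv    # L2 normalization (assumed)


# Usage with random stand-ins for a trained GMM and one image's descriptors.
rng = np.random.default_rng(0)
K, D = 4, 8
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
X = rng.normal(size=(100, D))
fv = fisher_vector(X, weights, means, sigmas)
print(fv.shape)                                        # (64,)
print(bov_histogram(X, weights, means, sigmas).shape)  # (4,)
```

With K Gaussians and D-dimensional descriptors, the representation grows from K to 2KD values, which is why such vectors pair naturally with linear classifiers. Setting `alpha=1.0` disables the power normalization; the default of 0.5 (a signed square root) is an assumed value for illustration.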


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Florent Perronnin, Jorge Sánchez, Thomas Mensink: Xerox Research Centre Europe (XRCE)
