Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations

  • Gabriela Csurka
  • Florent Perronnin
Part of the Communications in Computer and Information Science book series (CCIS, volume 229)


The Fisher Vector (FV) representation of images can be seen as an extension of the popular bag-of-visual-words (BOV) representation. Both are based on an intermediate representation, the visual vocabulary, built in the low-level feature space. If a probability density function (in our case a Gaussian Mixture Model) is used to model the visual vocabulary, an image can be represented by the gradient of the log-likelihood of its low-level features with respect to the parameters of the model. The Fisher Vector is the concatenation of these partial derivatives and describes in which direction the parameters of the model should be modified to best fit the data. This representation has the advantage of giving similar or even better classification performance than BOV built on supervised visual vocabularies, while being class-independent. This latter property allows its use in both supervised tasks (categorization, semantic image segmentation) and unsupervised tasks (clustering, retrieval). In this paper we show how it has been successfully applied to these problems, achieving state-of-the-art performance.
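As a rough illustration of the computation described above, the following NumPy sketch computes a Fisher Vector for a set of local descriptors under a diagonal-covariance GMM. It assumes the GMM parameters (weights, means, variances) have already been estimated on training descriptors, keeps only the gradients with respect to the means and standard deviations, and scales them with a commonly used closed-form approximation of the Fisher information matrix. The function names `log_gaussian_diag` and `fisher_vector` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def log_gaussian_diag(x, means, variances):
    """Log-density of each row of x under each diagonal Gaussian.

    x: (T, D) descriptors; means, variances: (K, D). Returns (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]                   # (T, K, D)
    log_det = np.sum(np.log(variances), axis=1)                # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    d = x.shape[1]
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det[None, :] + mahal)


def fisher_vector(x, weights, means, variances):
    """Fisher Vector of descriptors x w.r.t. the means and standard
    deviations of a diagonal GMM (mixture-weight gradients are omitted).

    Gradients are averaged over the T descriptors and scaled by a
    closed-form approximation of the Fisher information matrix.
    Returns a vector of length 2 * K * D.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    t = x.shape[0]

    # Soft assignments gamma_t(k): posterior of component k given x_t,
    # computed in the log domain for numerical stability.
    log_p = log_gaussian_diag(x, means, variances) + np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # (T, K)

    # Descriptors whitened per component: (x_t - mu_k) / sigma_k.
    sigma = np.sqrt(variances)                                 # (K, D)
    z = (x[:, None, :] - means[None, :, :]) / sigma[None, :, :]  # (T, K, D)

    # Normalized gradient w.r.t. the means (first-order statistics).
    g_mu = np.einsum("tk,tkd->kd", gamma, z) / (t * np.sqrt(weights)[:, None])
    # Normalized gradient w.r.t. the standard deviations (second-order).
    g_sigma = np.einsum("tk,tkd->kd", gamma, z ** 2 - 1.0) / (
        t * np.sqrt(2.0 * weights)[:, None]
    )
    # The FV is the concatenation of these partial derivatives.
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])


# Usage on random stand-in data (not real image descriptors).
rng = np.random.default_rng(0)
K, D = 4, 8
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
descriptors = rng.normal(size=(100, D))
fv = fisher_vector(descriptors, weights, means, variances)
print(fv.shape)  # (64,)
```

The link to BOV is visible in `gamma`: averaging the soft assignments over descriptors gives a soft BOV histogram of length K, whereas the sketch above keeps first- and second-order statistics per visual word, producing a 2KD-dimensional vector from the same vocabulary.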







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gabriela Csurka (1)
  • Florent Perronnin (1)
  1. Xerox Research Centre Europe, Meylan, France
