
Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations

  • Conference paper

Part of the Communications in Computer and Information Science book series (CCIS, volume 229)

Abstract

The Fisher Vector (FV) representation of images can be seen as an extension of the popular bag-of-visual-words (BOV) representation. Both are based on an intermediate representation, the visual vocabulary, built in the low-level feature space. If a probability density function (in our case a Gaussian Mixture Model) is used to model the visual vocabulary, an image can be represented by the gradient of the log-likelihood of its low-level features with respect to the parameters of the model. The Fisher Vector is the concatenation of these partial derivatives and describes the direction in which the model parameters should be modified to best fit the data. This representation has the advantage of giving classification performance similar to, or even better than, that of BOV representations built with supervised visual vocabularies, while being class-independent. This latter property allows it to be used in both supervised tasks (categorization, semantic image segmentation) and unsupervised tasks (clustering, retrieval). In this paper we show how it has been successfully applied to these problems, achieving state-of-the-art performance.
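The abstract describes the FV only in words, so a small illustration may help. The Python sketch below is a minimal, hypothetical example under stated assumptions, not the authors' implementation: it assumes a diagonal-covariance GMM has already been trained on local descriptors, takes gradients with respect to the means and standard deviations only (the mixture-weight gradient is omitted for brevity), and scales them with the closed-form normalization described in Perronnin et al. [14]. The function names (`posteriors`, `bov_and_fisher`) and the toy parameter values are illustrative inventions. A soft-assignment BOV histogram (average posterior per Gaussian) is computed alongside, to show in what sense the FV extends BOV: BOV keeps only the occupancy of each visual word, whereas the FV also encodes first- and second-order deviations of the descriptors from each Gaussian.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance `var`, evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mu, var)
    )


def posteriors(x, weights, means, variances) -> Vector:
    """Soft assignment gamma(k) of one descriptor x to each Gaussian (visual word)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in logs]
    total = sum(exps)
    return [e / total for e in exps]


def bov_and_fisher(descriptors, weights, means, variances) -> Tuple[Vector, Vector]:
    """Return (soft BOV histogram, Fisher vector) for one image.

    Assumption: a pre-trained diagonal-covariance GMM (weights, means, variances).
    The FV concatenates, per Gaussian k, the gradients w.r.t. the mean and the
    standard deviation, scaled as in [14] (a sketch, not the authors' code).
    Its length is 2 * K * D for K Gaussians and D-dimensional descriptors.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    bov = [0.0] * K
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, variances)
        for k in range(K):
            bov[k] += gamma[k] / T  # 0th-order statistic (soft BOV)
            for d in range(D):
                u = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma[k] * u                # 1st-order statistic
                g_sigma[k][d] += gamma[k] * (u * u - 1.0)  # 2nd-order statistic
    fv: Vector = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in g_mu[k])
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in g_sigma[k])
    return bov, fv


if __name__ == "__main__":
    # Hypothetical toy vocabulary: K = 2 Gaussians in a D = 2 descriptor space.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [4.0, 4.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    descriptors = [[0.1, -0.2], [3.9, 4.2], [4.1, 3.8]]
    bov, fv = bov_and_fisher(descriptors, weights, means, variances)
    print("BOV:", [round(b, 3) for b in bov])  # K values that sum to 1
    print("FV dimension:", len(fv))            # 2 * K * D = 8
```

With K Gaussians and D-dimensional descriptors the BOV histogram has K entries while this FV has 2KD, so it carries more information per visual word at the cost of a higher dimensionality. In [14] the resulting vector is additionally power- and L2-normalized; that step is left out of this sketch.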

Keywords

  • Gaussian Mixture Model
  • Visual Word
  • Image Representation
  • Multiple Kernel Learning
  • Visual Vocabulary

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Chapter length: 15 pages

References

  1. Sivic, J.S., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV, vol. 2 (2003)

  2. Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: ECCV Workshop on Statistical Learning for Computer Vision (2004)

  3. Yang, J., Li, Y., Tian, Y., Duan, L., Gao, W.: Group sensitive multiple kernel learning for object categorization. In: ICCV (2009)

  4. Zhang, J., Marszalek, M., Lazebnik, S., Schmid, C.: Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV 73(2) (2007)

  5. Tahir, M., Kittler, J., Mikolajczyk, K., Yan, F., van de Sande, K., Gevers, T.: Visual category recognition using spectral regression and kernel discriminant analysis. In: ICCV Workshop on Subspace Methods (2009)

  6. Gemert, J.V., Veenman, C., Smeulders, A., Geusebroek, J.: Visual word ambiguity. IEEE PAMI (accepted, 2010)

  7. Everingham, M., Gool, L.V., Williams, C., Winn, J., Zisserman, A.: VOC: The PASCAL Visual Object Classes Challenge, http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  8. Wang, G., Hoiem, D., Forsyth, D.: Learning image similarity from Flickr groups using stochastic intersection kernel machines. In: ICCV (2009)

  9. Maji, S., Berg, A.: Max-margin additive classifiers for detection. In: ICCV (2009)

  10. Perronnin, F., Sánchez, J., Liu, Y.: Large-scale image categorization with explicit data embedding. In: CVPR (2010)

  11. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. In: CVPR (2010)

  12. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)

  13. Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: CVPR (2007)

  14. Perronnin, F., Sánchez, J., Mensink, T.: Improving the Fisher kernel for large-scale image classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 143–156. Springer, Heidelberg (2010)

  15. Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., Yanker, P.: Query by image and video content: The QBIC system. IEEE Computer 28(9), 23–32 (1995)

  16. Chen, Y., Wang, J.Z.: Image categorization by learning and reasoning with regions. JMLR 5 (2004)

  17. Squire, D.M., Müller, W., Müller, H., Rakiller, J., Raki, J.: Content-based query of image databases, inspirations from text retrieval: inverted files, frequency-based weights and relevance feedback. Pattern Recognition Letters 21(13-14), 143–149 (1999)

  18. Perronnin, F., Liu, Y., Sánchez, J., Poirier, H.: Large-scale image retrieval with compressed Fisher vectors. In: CVPR (2010)

  19. Jégou, H., Douze, M., Schmid, C.: Packing bag-of-features. In: ICCV (2009)

  20. Zhang, X., Li, Z., Zhang, L., Ma, W., Shum, H.-Y.: Efficient indexing for large-scale visual search. In: ICCV (2009)

  21. Müller, H., Clough, P., Deselaers, T., Caputo, B. (eds.): ImageCLEF: Experimental Evaluation in Visual Information Retrieval. The Information Retrieval Series. Springer, Heidelberg (2010) ISBN 978-3-642-15180-4

  22. Farquhar, J., Szedmak, S., Meng, H., Shawe-Taylor, J.: Improving “bag-of-keypoints” image categorisation. Technical report, University of Southampton (2005)

  23. Perronnin, F., Dance, C.R., Csurka, G., Bressan, M.: Adapted vocabularies for generic visual categorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 464–475. Springer, Heidelberg (2006)

  24. Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems, vol. 11 (1999)

  25. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)

  26. Everingham, M., Gool, L.V., Williams, C., Winn, J., Zisserman, A.: VOC2008 Results (2008), http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/results/index.shtml

  27. Everingham, M., Gool, L.V., Williams, C., Winn, J., Zisserman, A.: VOC2007 Results (2007), http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/results/index.shtml

  28. Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image classification. In: ICCV (2009)

  29. Nowak, S., Huiskes, M.: New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010. In: [42]

  30. Mensink, T., Csurka, G., Perronnin, F., Sánchez, J., Verbeek, J.: LEAR and XRCE’s participation to visual concept detection task - ImageCLEF 2010. In: [42]

  31. van de Sande, K.E.A., Gevers, T.: The University of Amsterdam’s concept detection system at ImageCLEF 2010. In: [42]

  32. Motohashi, N., Izawa, R., Takagi, T.: Meiji University at the ImageCLEF2010 visual concept detection and annotation task: Working notes. In: [42]

  33. Clinchant, S., Csurka, G., Ah-Pine, J., Jacquet, G., Perronnin, F., Sánchez, J., Minoukadeh, K.: XRCE’s participation in Wikipedia retrieval, medical image modality classification and ad-hoc retrieval tasks of ImageCLEF 2010. In: [42]

  34. ImageCLEF, http://ir.shef.ac.uk/imageclef/

  35. Ah-Pine, J., Clinchant, S., Csurka, G., Perronnin, F., Renders, J.M.: 3.4. In: [21] ISBN 978-3-642-15180-4

  36. Jégou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 304–317. Springer, Heidelberg (2008)

  37. Verbeek, J., Triggs, B.: Scene segmentation with CRFs learned from partially labeled images. In: NIPS (2007)

  38. Li, L.-J., Socher, R., Fei-Fei, L.: Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In: CVPR (2009)

  39. Csurka, G., Perronnin, F.: A simple high performance approach to semantic segmentation. In: BMVC (2008)

  40. Carreira, J., Sminchisescu, C.: Constrained parametric min-cuts for automatic object segmentation. In: IEEE International Conference on Computer Vision and Pattern Recognition (June 2010); description of our PASCAL VOC 2009 segmentation entry

  41. Marchesotti, L., Cifarelli, C., Csurka, G.: A framework for visual saliency detection with applications to image thumbnailing. In: ICCV (2009)

  42. Braschler, M., Harman, D.: CLEF 2010 LABs and Workshops, Notebook Papers, September 22-23, Padua, Italy (2010)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Csurka, G., Perronnin, F. (2011). Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations. In: Richard, P., Braz, J. (eds) Computer Vision, Imaging and Computer Graphics. Theory and Applications. VISIGRAPP 2010. Communications in Computer and Information Science, vol 229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25382-9_2

  • DOI: https://doi.org/10.1007/978-3-642-25382-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25381-2

  • Online ISBN: 978-3-642-25382-9

  • eBook Packages: Computer Science, Computer Science (R0)