Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations

Csurka, Gabriela; Perronnin, Florent

doi:10.1007/978-3-642-25382-9_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 229)

Included in the following conference series:

International Conference on Computer Vision, Imaging and Computer Graphics

Abstract

The Fisher Vector (FV) representation of images can be seen as an extension of the popular bag-of-visual-words (BOV) representation. Both are based on an intermediate representation, the visual vocabulary, built in the low-level feature space. If a probability density function (in our case a Gaussian Mixture Model) is used to model the visual vocabulary, we can compute the gradient of the log-likelihood with respect to the parameters of the model to represent an image. The Fisher Vector is the concatenation of these partial derivatives and describes in which direction the parameters of the model should be modified to best fit the data. This representation has the advantage of giving similar or even better classification performance than BOV obtained with supervised visual vocabularies, while remaining class-independent. This latter property allows it to be used in both supervised (categorization, semantic image segmentation) and unsupervised (clustering, retrieval) tasks. In this paper we show how it has been successfully applied to these problems, achieving state-of-the-art performance.
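
The gradient computation described in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance Gaussian Mixture Model whose parameters are already fitted, and it takes derivatives only with respect to the means and standard deviations (the mixture-weight gradient is left out). The function name `fisher_vector`, its argument names, and the scaling by the square root of the mixture weights (a commonly used approximation of the Fisher information normalization) are assumptions made for this illustration, not necessarily the exact formulation used in the chapter.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """Sketch of a Fisher Vector for one image (hypothetical helper).

    X         : (T, D) local descriptors extracted from the image
    weights   : (K,)   mixture weights of a fitted diagonal GMM
    means     : (K, D) component means
    variances : (K, D) component variances (diagonal covariances)
    Returns a vector of length 2 * K * D.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigma = np.sqrt(np.asarray(variances, dtype=float))
    T = X.shape[0]

    # Standardised differences (x_t - mu_k) / sigma_k, shape (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]

    # log(w_k * N(x_t | mu_k, sigma_k^2)) up to a constant shared by all
    # components; that constant cancels when the posteriors are normalised.
    log_joint = (
        np.log(weights)[None, :]
        - 0.5 * (diff ** 2).sum(axis=2)
        - np.log(sigma).sum(axis=1)[None, :]
    )

    # Posterior (soft assignment) of each descriptor to each Gaussian.
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients of the average log-likelihood w.r.t. means and std devs,
    # scaled by the assumed approximate Fisher information normalisation.
    g_mu = (gamma[:, :, None] * diff).sum(axis=0)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    # The Fisher Vector is the concatenation of the partial derivatives.
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])
```

Each entry of the result indicates in which direction the corresponding mean or standard deviation would have to move to fit the image's descriptors better. An entry near zero means the vocabulary already explains the descriptors well along that dimension.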

Author information

Authors and Affiliations

Xerox Research Centre Europe, 6 ch. de Maupertuis, 38240, Meylan, France
Gabriela Csurka & Florent Perronnin

Editor information

Editors and Affiliations

LISA-ISTIA, Université d’Angers, 62, avenue Notre-Dame du Lac, 49000, Angers, France
Paul Richard
Departamento de Sistemas e Informática, Escola Superior de Tecnologia do IPS, Rua do Vale de Chaves Estefanilha, 2910, Setúbal, Portugal
José Braz

About this paper

Cite this paper

Csurka, G., Perronnin, F. (2011). Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations. In: Richard, P., Braz, J. (eds) Computer Vision, Imaging and Computer Graphics. Theory and Applications. VISIGRAPP 2010. Communications in Computer and Information Science, vol 229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25382-9_2

DOI: https://doi.org/10.1007/978-3-642-25382-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25381-2
Online ISBN: 978-3-642-25382-9
eBook Packages: Computer Science, Computer Science (R0)