Nonparametric Estimation of Fisher Vectors to Aggregate Image Descriptors

  • Hervé Le Borgne
  • Pablo Muñoz Fuentes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6915)

Abstract

We investigate how to represent a natural image so that the visual concepts within it can be recognized. The core of the proposed method is a new approach to aggregating local features, based on a nonparametric estimation of the Fisher vector, which results from the derivation of the gradient of the log-likelihood. For this, we use low-level local descriptors that are learned with independent component analysis and thus provide a statistically independent description of the images. The resulting signature has a very intuitive interpretation, and we also propose an efficient implementation. We show on publicly available datasets that the proposed image signature performs very well.
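
To make the idea of aggregating local features through the gradient of the log-likelihood concrete, here is a minimal sketch. It is not the nonparametric estimator of the paper: as a stand-in it assumes a single diagonal Gaussian density, for which the gradient with respect to the mean has a closed form. The function name `fisher_score_mean` and the parameters `mu` and `sigma` are hypothetical and chosen for illustration only.

```python
def fisher_score_mean(descriptors, mu, sigma):
    """Average gradient of a diagonal-Gaussian log-likelihood w.r.t. the mean.

    Assumption: a parametric Gaussian stands in for the density; the paper
    instead estimates the density nonparametrically.
    descriptors: list of local descriptors, each a list of floats
    mu, sigma:   per-dimension mean and standard deviation
    """
    if not descriptors:
        raise ValueError("need at least one local descriptor")
    dim = len(mu)
    score = [0.0] * dim
    for x in descriptors:
        for d in range(dim):
            # d/dmu_d log N(x_d | mu_d, sigma_d^2) = (x_d - mu_d) / sigma_d^2
            score[d] += (x[d] - mu[d]) / (sigma[d] ** 2)
    # Averaging over descriptors gives a fixed-length image signature
    return [s / len(descriptors) for s in score]


signature = fisher_score_mean([[3.0, 5.0]], mu=[2.0, 3.0], sigma=[1.0, 2.0])
print(signature)  # [1.0, 0.5]
```

Whatever the number of local descriptors, the signature has one entry per model parameter, which is what makes this kind of aggregation usable as an image-level representation.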

Keywords

Independent Component Analysis; Gaussian Mixture Model; Image Retrieval; Visual Word; Nonparametric Estimation


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hervé Le Borgne (1)
  • Pablo Muñoz Fuentes (1)
  1. CEA, LIST, Laboratory of Vision and Content Engineering, Fontenay-aux-Roses, France
