Adapted Vocabularies for Generic Visual Categorization

  • Florent Perronnin
  • Christopher Dance
  • Gabriela Csurka
  • Marco Bressan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3954)


Several state-of-the-art Generic Visual Categorization (GVC) systems are built around a vocabulary of visual terms and characterize each image with a single histogram of visual word counts. We propose a novel and practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and on class vocabularies obtained by adapting the universal vocabulary with class-specific data. An image is characterized by a set of histograms, one per class, where each histogram describes whether the image content is better modeled by the universal vocabulary or by the corresponding class vocabulary. Experiments on three very different databases show that this representation outperforms approaches that characterize an image with a single histogram.
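
The representation described above can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption made for illustration: visual words are taken to be diagonal-covariance Gaussians, the class vocabulary is obtained by a relevance-weighted update of the universal means only, and the per-class histogram is the average soft assignment over the merged (class plus universal) vocabulary. The function names and the `relevance` parameter are hypothetical.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of each row of x under each diagonal Gaussian; shape (T, K)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def posteriors(x, weights, means, variances):
    """Soft assignment of each feature vector to each visual word; rows sum to 1."""
    log_p = np.log(weights) + log_gauss_diag(x, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def adapt_means(x_class, weights, means, variances, relevance=16.0):
    """Assumed adaptation rule: shift universal means toward class data.

    Words that see many class features move almost all the way to the
    class average; words that see few stay near the universal mean.
    """
    gamma = posteriors(x_class, weights, means, variances)       # (T, K)
    n = gamma.sum(axis=0)                                        # soft counts
    class_avg = gamma.T @ x_class / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]
    return alpha * class_avg + (1.0 - alpha) * means

def class_histogram(x_image, weights, uni_means, cls_means, variances):
    """Histogram over the merged vocabulary [class words | universal words]."""
    merged_means = np.vstack([cls_means, uni_means])
    merged_vars = np.vstack([variances, variances])
    merged_w = np.concatenate([weights, weights]) / 2.0
    return posteriors(x_image, merged_w, merged_means, merged_vars).mean(axis=0)
```

Under these assumptions, an image is described by calling `class_histogram` once per class. If the first half of a histogram carries most of the mass, the image content is better explained by that class vocabulary than by the universal one.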


Keywords: Feature Vector; Gaussian Mixture Model; Visual Word; Speaker Recognition; Visual Vocabulary



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Florent Perronnin (1)
  • Christopher Dance (1)
  • Gabriela Csurka (1)
  • Marco Bressan (1)
  1. Xerox Research Centre Europe, Meylan, France
