Generic Visual Categorization Using Weak Geometry

  • Gabriela Csurka
  • Christopher R. Dance
  • Florent Perronnin
  • Jutta Willamowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)


In the first part of this chapter we present the bag-of-keypatches approach to generic visual categorization (GVC), which is inspired by the bag-of-words approach to text categorization. The method identifies the object content of natural images while generalizing across the variations inherent to the object class. To obtain a visual vocabulary that is insensitive to viewpoint and illumination, rotation- or affine-invariant orientation histogram descriptors of image patches are vector quantized. Each image is then represented by a single histogram of visual word occurrences. To classify the images we use one-against-all SVM classifiers and choose the best-ranked category. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We obtained excellent results for both multi-class categorization and object detection.
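
The pipeline described above (quantize patch descriptors against a visual vocabulary, build one occurrence histogram per image, score it with one classifier per category, and keep the best-ranked category) can be sketched as follows. This is only a minimal illustration under stated assumptions: the 2-D toy descriptors, the fixed vocabulary, the function names, and the hand-set linear weights standing in for trained one-against-all SVMs are all hypothetical and are not taken from the chapter.

```python
from math import dist

def word_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and count occurrences."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda k: dist(d, vocabulary[k]))
        counts[nearest] += 1
    return counts

def best_category(histogram, classifiers):
    """One-against-all decision: return the label with the highest linear score."""
    def score(item):
        weights, bias = item[1]
        return sum(w * h for w, h in zip(weights, histogram)) + bias
    return max(classifiers.items(), key=score)[0]

# Hypothetical toy data: three visual words and four patch descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]
histogram = word_histogram(descriptors, vocabulary)

# Hand-set (weights, bias) pairs standing in for trained per-category SVMs.
classifiers = {
    "car": ([1.0, -1.0, 0.5], 0.0),
    "face": ([-0.5, 2.0, -1.0], 0.0),
}
label = best_category(histogram, classifiers)
print(histogram, label)
```

In a real system the vocabulary would be learned by clustering descriptors from training images and the per-category weights would come from SVM training; both steps are omitted here.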

In the second part we improve the categorizer by incorporating geometric information. Based on the scale, orientation or closeness of the keypatches, we can define a large number of simple geometric relationships, each of which can be treated as a simplistic classifier. We select from this multitude of classifiers (several million in our case) and combine them effectively with the original classifier. Results are reported on a new, challenging 10-class dataset.
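
To make the idea of a geometric relationship acting as a simplistic classifier concrete, the sketch below tests whether two given visual words occur close together in an image, then picks the candidate test with the lowest weighted error on a toy training set. The abstract does not state how the classifiers are selected or combined, so the weighted-error criterion (in the style of a single boosting round), the closeness test, and all names and data here are assumptions made purely for illustration.

```python
from itertools import combinations
from math import dist

def has_close_pair(keypatches, word_a, word_b, radius):
    """Weak geometric test: do words a and b occur within `radius` of each other?"""
    for (w1, p1), (w2, p2) in combinations(keypatches, 2):
        if {w1, w2} == {word_a, word_b} and dist(p1, p2) <= radius:
            return True
    return False

def select_weak_classifier(images, labels, candidates, weights):
    """Return (candidate, weighted_error) for the candidate with the lowest error."""
    best = None
    for cand in candidates:
        err = sum(
            w for img, y, w in zip(images, labels, weights)
            if has_close_pair(img, *cand) != y
        )
        if best is None or err < best[1]:
            best = (cand, err)
    return best

# Hypothetical images: lists of (visual word index, (x, y) position).
images = [
    [(0, (10, 10)), (1, (12, 11)), (2, (80, 80))],  # positive
    [(0, (50, 50)), (1, (53, 52))],                 # positive
    [(0, (10, 10)), (1, (90, 90))],                 # negative
    [(2, (5, 5)), (1, (7, 7))],                     # negative
]
labels = [True, True, False, False]
weights = [0.25] * 4

# Each candidate is (word_a, word_b, radius).
candidates = [(0, 1, 5.0), (0, 1, 200.0), (1, 2, 5.0)]
best = select_weak_classifier(images, labels, candidates, weights)
print(best)
```

Combining the selected tests with the histogram-based classifier is not shown; one plausible option would be to add a weighted vote from each selected test to the original classifier score.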


Keywords: Gaussian Mixture Model, Visual Word, Object Detection, Interest Point, Image Patch



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gabriela Csurka (1)
  • Christopher R. Dance (1)
  • Florent Perronnin (1)
  • Jutta Willamowski (1)

  1. Xerox Research Centre Europe, Meylan, France