Learning Compositional Categorization Models

  • Björn Ommer
  • Joachim M. Buhmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3953)


This contribution proposes a compositional approach to visual object categorization of scenes. Compositions are learned from the Caltech 101 database and form intermediate abstractions of images that are semantically situated between low-level representations and the high-level categorization. Salient regions, described by localized feature histograms, are detected as image parts. Compositions are then formed as bags of parts with a locality constraint. After the compositions are spatially bound by means of a shape model, coupled probabilistic kernel classifiers are applied to establish the final image categorization. In contrast to the discriminative training of the categorizer, intermediate compositions are learned in a generative manner, yielding relevant part agglomerations, i.e. groupings that appear frequently in the dataset while also supporting the discrimination between sets of categories. Consequently, compositionality simplifies the learning of a complex categorization model for complete scenes by splitting it up into simpler, sharable compositions. The architecture is evaluated on the highly challenging Caltech 101 database, which exhibits large intra-category variations. Our compositional approach shows competitive retrieval rates of 53.6 ± 0.88% or, with a multi-scale feature set, 57.8 ± 0.79%.
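The idea of a "bag of parts with a locality constraint" can be illustrated with a minimal sketch. It is not the authors' implementation: the `Part` container, the `form_compositions` function, the use of a single quantized `codeword` per part, and the fixed `radius` neighborhood are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter
from math import hypot
from typing import List, NamedTuple


class Part(NamedTuple):
    """A detected salient region (hypothetical container for this sketch)."""
    x: float
    y: float
    codeword: int  # assumed: index of the codebook entry closest to the part's descriptor


def form_compositions(parts: List[Part], radius: float,
                      codebook_size: int) -> List[List[float]]:
    """Build one normalized codeword histogram per part neighborhood."""
    compositions = []
    for center in parts:
        # Locality constraint: only parts within `radius` of the center join the bag.
        neighbors = [p for p in parts
                     if hypot(p.x - center.x, p.y - center.y) <= radius]
        counts = Counter(p.codeword for p in neighbors)
        total = len(neighbors)  # never zero: the center is its own neighbor
        # Bag of parts: spatial order inside the neighborhood is discarded.
        compositions.append([counts.get(k, 0) / total
                             for k in range(codebook_size)])
    return compositions


# Toy example: two nearby parts and one isolated part.
parts = [Part(0.0, 0.0, 0), Part(1.0, 0.0, 1), Part(10.0, 10.0, 1)]
compositions = form_compositions(parts, radius=2.0, codebook_size=2)
print(compositions)
```

In this toy input the two nearby parts share a mixed histogram, while the isolated part yields a histogram concentrated on its own codeword. How such candidate compositions would be filtered for relevance, spatially bound by a shape model, and classified is outside the scope of this sketch.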


Keywords: Training Image · Local Descriptor · Category Label · Salient Region · Retrieval Rate





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Björn Ommer (1)
  • Joachim M. Buhmann (1)
  1. Institute of Computational Science, ETH Zurich, Zurich, Switzerland
