An Implicit Shape Model for Combined Object Categorization and Segmentation

  • Bastian Leibe
  • Ales Leonardis
  • Bernt Schiele
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)


We present a method for object categorization in real-world scenes. Following a common consensus in the field, we do not assume that a figure-ground segmentation is available prior to recognition. However, in contrast to most standard approaches for object class recognition, our approach automatically segments the object as a result of the categorization.

This combination of recognition and segmentation into one process is made possible by our use of an Implicit Shape Model, which integrates both capabilities into a common probabilistic framework. This model can be thought of as a non-parametric approach which can easily handle configurations of large numbers of object parts. In addition to the recognition and segmentation result, it also generates a per-pixel confidence measure specifying the area that supports a hypothesis and how much it can be trusted. We use this confidence to derive a natural extension of the approach to handle multiple objects in a scene and resolve ambiguities between overlapping hypotheses with an MDL-based criterion.
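The abstract does not spell out the MDL-based criterion, so the following is only a minimal sketch of the general idea, under stated assumptions: each hypothesis comes with a per-pixel confidence map, and a hypothesis is accepted only if the confidence on pixels not yet explained by an accepted hypothesis outweighs a per-pixel cost for the area it claims. The function name `select_hypotheses`, the `area_cost` parameter, and the greedy search are illustrative choices, not the authors' formulation.

```python
import numpy as np

def select_hypotheses(conf_maps, area_cost=0.3, min_gain=0.0):
    """Greedily keep hypotheses whose pixel confidence pays for their area.

    conf_maps: list of 2-D arrays with values in [0, 1], one per hypothesis
               (hypothetical input; nonzero pixels are the claimed support).
    area_cost: assumed per-pixel cost of a hypothesis, not a value from the paper.
    Returns the indices of accepted hypotheses in order of acceptance.
    """
    explained = np.zeros(conf_maps[0].shape, dtype=bool)
    remaining = set(range(len(conf_maps)))
    chosen = []
    while remaining:
        best, best_gain = None, min_gain
        for i in sorted(remaining):
            # Only pixels not already claimed by an accepted hypothesis count.
            support = (conf_maps[i] > 0) & ~explained
            gain = float(conf_maps[i][support].sum() - area_cost * support.sum())
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # no remaining hypothesis explains enough new pixels
        chosen.append(best)
        explained |= conf_maps[best] > 0
        remaining.remove(best)
    return chosen
```

In this sketch, a weak hypothesis that lies entirely inside the support of a stronger one gains nothing once the stronger one is accepted, so it is dropped, while a hypothesis covering a separate image region is kept.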

In addition, we present an extensive evaluation of our method on a standard dataset for car detection and compare its performance to existing methods from the literature. Our results show that the proposed method outperforms previously published methods while requiring an order of magnitude fewer training examples. Finally, we present results for articulated objects, which show that the proposed method can categorize and segment unfamiliar objects in different articulations and with widely varying texture patterns, even under significant partial occlusion.


Keywords (added by machine, not by the authors): Object Detection, Segmentation Result, Image Patch, Minimal Description Length, Partial Occlusion





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bastian Leibe (1)
  • Ales Leonardis (2)
  • Bernt Schiele (3)
  1. Computer Vision Lab, ETH Zurich, Switzerland
  2. Faculty of Computer and Information Science, University of Ljubljana, Slovenia
  3. Department of Computer Science, TU Darmstadt, Germany
