Toward Category-Level Object Recognition

Volume 4170 of the series Lecture Notes in Computer Science, pp. 508-524

An Implicit Shape Model for Combined Object Categorization and Segmentation

  • Bastian Leibe, Computer Vision Lab, ETH Zurich
  • Ales Leonardis, Faculty of Computer and Information Science, University of Ljubljana
  • Bernt Schiele, Department of Computer Science, TU Darmstadt


We present a method for object categorization in real-world scenes. Following a common consensus in the field, we do not assume that a figure-ground segmentation is available prior to recognition. However, in contrast to most standard approaches for object class recognition, our approach automatically segments the object as a result of the categorization.

This combination of recognition and segmentation into one process is made possible by our use of an Implicit Shape Model, which integrates both capabilities into a common probabilistic framework. This model can be thought of as a non-parametric approach which can easily handle configurations of large numbers of object parts. In addition to the recognition and segmentation result, it also generates a per-pixel confidence measure specifying the area that supports a hypothesis and how much it can be trusted. We use this confidence to derive a natural extension of the approach to handle multiple objects in a scene and resolve ambiguities between overlapping hypotheses with an MDL-based criterion.
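As a rough illustration of how per-pixel confidence could be used to choose among overlapping hypotheses, the sketch below greedily selects hypotheses whose not-yet-explained confidence exceeds a fixed cost. It is a simplified stand-in in the spirit of an MDL trade-off, not the criterion defined in the paper: the function name `select_hypotheses`, the `model_cost` parameter, and the toy confidence maps are all assumptions made for this example.

```python
import numpy as np


def select_hypotheses(conf_maps, model_cost=1.0):
    """Greedily pick hypotheses by the confidence they add on unexplained pixels.

    conf_maps  : list of equally shaped arrays of per-pixel confidence
                 (hypothetical input; zero means the pixel is not supported).
    model_cost : assumed fixed penalty for accepting one more hypothesis.
    Returns the indices of the accepted hypotheses in selection order.
    """
    conf_maps = [np.asarray(m, dtype=float) for m in conf_maps]
    explained = np.zeros(conf_maps[0].shape, dtype=bool)
    remaining = set(range(len(conf_maps)))
    selected = []

    while remaining:
        # Gain = confidence on pixels no accepted hypothesis covers, minus the cost.
        gains = {i: conf_maps[i][~explained].sum() - model_cost for i in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no remaining hypothesis pays for its own cost
        selected.append(best)
        explained |= conf_maps[best] > 0
        remaining.remove(best)

    return selected


# Toy 1x6 "image" with three hypotheses; the second overlaps the first.
maps = [
    np.array([[0.9, 0.9, 0.9, 0.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.5, 0.5, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0, 0.0, 0.8, 0.8]]),
]
picked = select_hypotheses(maps, model_cost=1.0)
print(picked)
```

In this toy case the weak hypothesis that overlaps the first one is rejected, because the pixels it alone explains do not outweigh the assumed cost of adding it.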

In addition, we present an extensive evaluation of our method on a standard dataset for car detection and compare its performance to existing methods from the literature. Our results show that the proposed method outperforms previously published methods while needing one order of magnitude fewer training examples. Finally, we present results for articulated objects, which show that the proposed method can categorize and segment unfamiliar objects in different articulations and with widely varying texture patterns, even under significant partial occlusion.