Robust Object Detection with Interleaved Categorization and Segmentation

Leibe, Bastian; Leonardis, Aleš; Schiele, Bernt

doi:10.1007/s11263-007-0095-3

Published: 17 November 2007

International Journal of Computer Vision, Volume 77, pages 259–289 (2008)

Abstract

This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance.

The core part of our approach is a highly flexible learned representation for object shape that combines the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is in turn used to improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information about where in the image a hypothesis draws its support is employed in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and to factor out the effects of partial occlusion.
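As a rough illustration of the voting idea behind a probabilistic Generalized Hough Transform, the following minimal sketch lets each image feature cast weighted votes for possible object centers. It is not the authors' implementation. The function names, the data layout, the uniform weighting over stored offsets, and the fixed-size binned accumulator are all assumptions made for this example, and scale, segmentation, and the MDL verification stage are left out.

```python
import numpy as np

def hough_vote(features, codebook_offsets, image_shape, bin_size=10):
    """Accumulate weighted votes for object centers (illustrative sketch).

    features: list of (x, y, matches); matches is a list of
        (codebook_id, match_probability) pairs for that feature.
    codebook_offsets: dict mapping codebook_id to a list of (dx, dy)
        offsets from a feature location to the object center, as they
        might have been recorded on training examples (hypothetical layout).
    image_shape: (height, width) of the test image.
    """
    h, w = image_shape
    acc = np.zeros((h // bin_size + 1, w // bin_size + 1))
    for x, y, matches in features:
        for cid, p_match in matches:
            offsets = codebook_offsets[cid]
            # Assumption: every stored occurrence is equally likely.
            p_offset = 1.0 / len(offsets)
            for dx, dy in offsets:
                cx, cy = x + dx, y + dy
                if 0 <= cx < w and 0 <= cy < h:
                    acc[int(cy) // bin_size, int(cx) // bin_size] += p_match * p_offset
    return acc

def best_hypothesis(acc, bin_size=10):
    """Return (center_x, center_y, score) of the strongest accumulator bin."""
    row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return ((col + 0.5) * bin_size, (row + 0.5) * bin_size, acc[row, col])

# Toy data: three features, two codebook entries.
codebook_offsets = {0: [(10, 0)], 1: [(-10, 0), (30, 30)]}
features = [
    (40, 50, [(0, 1.0)]),
    (60, 50, [(1, 1.0)]),
    (45, 50, [(0, 0.5)]),
]
acc = hough_vote(features, codebook_offsets, image_shape=(100, 100))
center_x, center_y, score = best_hypothesis(acc)
```

In this toy run, the votes that agree on a common center reinforce each other, while the stray vote from the second codebook entry stays isolated. Because each vote records which feature produced it, a fuller system could trace a hypothesis back to its supporting image regions, which is the kind of information the abstract describes using for segmentation and hypothesis verification.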

An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance even with training sets that are between one and two orders of magnitude smaller than those used in comparable systems.

Author information

Authors and Affiliations

Computer Vision Laboratory, ETH Zurich, Zurich, Switzerland
Bastian Leibe
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Aleš Leonardis
Department of Computer Science, TU Darmstadt, Darmstadt, Germany
Bernt Schiele

Corresponding author

Correspondence to Bastian Leibe.

About this article

Cite this article

Leibe, B., Leonardis, A. & Schiele, B. Robust Object Detection with Interleaved Categorization and Segmentation. Int J Comput Vis 77, 259–289 (2008). https://doi.org/10.1007/s11263-007-0095-3

Received: 02 September 2005
Accepted: 18 September 2007
Published: 17 November 2007
Issue Date: May 2008
DOI: https://doi.org/10.1007/s11263-007-0095-3
