Robust Object Detection with Interleaved Categorization and Segmentation

International Journal of Computer Vision

Abstract

This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach treats object categorization and figure-ground segmentation as two interleaved processes that collaborate closely towards a common goal. As shown in our work, the tight coupling between these two processes allows them to benefit from each other and improves their combined performance.

The core of our approach is a highly flexible learned representation for object shape that combines the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then used in turn to improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, information about where in the image a hypothesis draws its support is employed in a Minimum Description Length (MDL) based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and to factor out the effects of partial occlusion.
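The voting idea behind a probabilistic Generalized Hough Transform can be illustrated with a small sketch. This is not the authors' implementation: the function names, the codebook layout (codeword mapped to a list of offsets to the object center), the equal split of each feature's vote among its stored offsets, and the fixed bin size are all assumptions made for illustration. The sketch also records which features supported each hypothesis, which is the kind of information the abstract says is reused for segmentation and verification.

```python
from collections import defaultdict


def hough_vote(features, codebook, bin_size=10):
    """Accumulate votes for object-center positions.

    features: list of (x, y, codeword) tuples found in the image.
    codebook: hypothetical dict mapping a codeword to the (dx, dy) offsets
              to the object center observed for it during training.
    Returns (votes, support): score per bin and contributing feature
    indices per bin.
    """
    votes = defaultdict(float)
    support = defaultdict(list)
    for idx, (x, y, word) in enumerate(features):
        offsets = codebook.get(word, [])
        if not offsets:
            continue  # unmatched features cast no votes
        weight = 1.0 / len(offsets)  # assumed: each feature spreads a total vote of 1
        for dx, dy in offsets:
            cell = (int((x + dx) // bin_size), int((y + dy) // bin_size))
            votes[cell] += weight
            support[cell].append(idx)
    return votes, support


def best_hypothesis(votes, support):
    """Return the strongest bin, its score, and the features backing it."""
    cell = max(votes, key=votes.get)
    return cell, votes[cell], sorted(set(support[cell]))


if __name__ == "__main__":
    # Toy codebook: a "wheel" may lie left or right of the center, a "roof" above it.
    codebook = {"wheel": [(20, -10), (-20, -10)], "roof": [(0, 15)]}
    features = [(30, 60, "wheel"), (70, 60, "wheel"), (50, 35, "roof"),
                (200, 200, "noise")]
    votes, support = hough_vote(features, codebook)
    print(best_hypothesis(votes, support))
```

In this toy setup the two wheels and the roof agree on a single center bin, while the unmatched "noise" feature contributes nothing. A fuller system would replace the fixed bins with a continuous mode search and would back-project the supporting features to obtain a per-pixel segmentation.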

An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance even with training sets that are between one and two orders of magnitude smaller than those used in comparable systems.

Author information

Correspondence to Bastian Leibe.

About this article

Cite this article

Leibe, B., Leonardis, A. & Schiele, B. Robust Object Detection with Interleaved Categorization and Segmentation. Int J Comput Vis 77, 259–289 (2008). https://doi.org/10.1007/s11263-007-0095-3

  • DOI: https://doi.org/10.1007/s11263-007-0095-3
