Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection

Opelt, Andreas; Pinz, Axel; Zisserman, Andrew

doi:10.1007/s11263-008-0139-3

Open access
Published: 13 May 2008

Volume 80, pages 16–44, (2008)

International Journal of Computer Vision

Andreas Opelt¹,
Axel Pinz¹ &
Andrew Zisserman²

Abstract

We present a novel algorithmic approach to object categorization and detection that can learn category-specific detectors, using Boosting, from a visual alphabet of shape and appearance. The alphabet itself is learnt incrementally during this process. The resulting representation consists of a set of category-specific descriptors, in which basic shape features are represented by boundary-fragments and appearance is represented by patches; each descriptor, in combination with centroid vectors for possible object centroids (geometry), forms an alphabet entry. Our experimental results highlight several qualities of this novel representation. First, we demonstrate the power of a purely shape-based representation with excellent categorization and detection results using a Boundary-Fragment-Model (BFM), and investigate the capabilities of such a model to handle changes in scale and viewpoint, as well as intra- and inter-class variability. Second, we show that incremental learning of a BFM for many categories leads to a sub-linear growth of visual alphabet entries by sharing of shape features, while this generalization over categories often also improves categorization performance (compared with learning the categories independently). Finally, combining basic shape and appearance features (boundary-fragments and patches) can further improve results. Certain feature types are preferred by certain categories, and for some categories we achieve the lowest error rates that have been reported so far.
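The notion of an alphabet entry described in the abstract (a descriptor paired with centroid vectors, possibly shared across categories) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `AlphabetEntry`, the function `vote_for_centroids`, the `bin_size` parameter, and the coarse-grid vote accumulation are all hypothetical choices made here for illustration, and the descriptors are placeholders rather than learnt boundary-fragments or patches.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AlphabetEntry:
    """Hypothetical container for one entry of the visual alphabet."""
    kind: str          # "boundary_fragment" (shape) or "patch" (appearance)
    descriptor: tuple  # placeholder for the learnt descriptor
    # category name -> list of (dx, dy) offsets from the feature to the object centroid
    centroid_vectors: dict = field(default_factory=dict)


def vote_for_centroids(matches, category, bin_size=10):
    """Accumulate centroid votes for one category on a coarse grid.

    matches: iterable of (entry, (x, y)) pairs, where (x, y) is the image
             location at which the entry's descriptor was matched.
    Returns a dict mapping a grid cell (bx, by) to its vote count.
    """
    votes = defaultdict(int)
    for entry, (x, y) in matches:
        # An entry only votes for categories it stores centroid vectors for.
        for dx, dy in entry.centroid_vectors.get(category, []):
            cx, cy = x + dx, y + dy
            votes[(cx // bin_size, cy // bin_size)] += 1
    return dict(votes)


# One shape entry shared by two categories, and one appearance entry for a single category.
fragment = AlphabetEntry("boundary_fragment", (0.1, 0.7),
                         {"cow": [(20, 5)], "horse": [(18, 4)]})
patch = AlphabetEntry("patch", (0.3, 0.2), {"cow": [(-10, 0)]})

matches = [(fragment, (30, 40)), (patch, (60, 45))]
cow_votes = vote_for_centroids(matches, "cow")
horse_votes = vote_for_centroids(matches, "horse")
```

In this toy example both entries agree on the same centroid cell for "cow", while the shared fragment also contributes a vote for "horse". Reusing one entry for several categories in this way is the kind of sharing that the abstract credits for the sub-linear growth of the alphabet.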

Author information

Authors and Affiliations

Institute of Electrical Measurement and Measurement Signal Processing, Graz University of Technology, Graz, Austria
Andreas Opelt & Axel Pinz
Department of Engineering Science, University of Oxford, Oxford, UK
Andrew Zisserman

Corresponding author

Correspondence to Axel Pinz.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Opelt, A., Pinz, A. & Zisserman, A. Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection. Int J Comput Vis 80, 16–44 (2008). https://doi.org/10.1007/s11263-008-0139-3

Received: 28 February 2007
Accepted: 04 April 2008
Published: 13 May 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s11263-008-0139-3
