Abstract
We present a novel algorithmic approach to object categorization and detection that uses Boosting to learn category-specific detectors from a visual alphabet of shape and appearance. The alphabet itself is learnt incrementally during this process. The resulting representation consists of a set of category-specific descriptors: basic shape features are represented by boundary fragments, and appearance is represented by patches. Each descriptor, combined with centroid vectors that point to possible object centroids (geometry), forms an alphabet entry. Our experimental results highlight several qualities of this novel representation. First, we demonstrate the power of a purely shape-based representation, with excellent categorization and detection results using a Boundary-Fragment-Model (BFM), and investigate how well such a model handles changes in scale and viewpoint as well as intra- and inter-class variability. Second, we show that incrementally learning a BFM for many categories leads to sub-linear growth in the number of visual alphabet entries through the sharing of shape features, and that this generalization across categories often improves categorization performance at the same time (compared with learning the categories independently). Finally, combining basic shape and appearance features (boundary fragments and patches) can further improve results. Certain categories prefer certain feature types, and for some categories we achieve the lowest error rates reported so far.
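The abstract describes each alphabet entry as a descriptor (a boundary fragment or a patch) paired with centroid vectors that point to possible object centroids. The sketch below illustrates, under stated assumptions, how such entries could cast votes for an object centroid once their descriptors have been matched in an image. It is not the authors' implementation: descriptor matching is assumed to have happened already (the `matches` input), a plain grid accumulator stands in for a proper mode-seeking step such as mean shift, and all names (`AlphabetEntry`, `vote_for_centroids`, `best_hypothesis`, the toy "wheel" and "roof" entries) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AlphabetEntry:
    """Hypothetical alphabet entry: a descriptor label plus centroid vectors."""
    name: str   # illustrative identifier for the descriptor
    kind: str   # "boundary-fragment" or "patch"
    # (dx, dy) offsets from the matched position to object centroids seen in training
    centroid_vectors: list = field(default_factory=list)


def vote_for_centroids(matches, cell=10):
    """Accumulate centroid votes on a coarse grid (simplified Hough-style voting).

    matches: list of (entry, (x, y), weight), where (x, y) is the image position
    at which the entry's descriptor matched and weight is an assumed match
    confidence. Returns a dict mapping grid cells to accumulated vote mass.
    """
    acc = defaultdict(float)
    for entry, (x, y), weight in matches:
        n = len(entry.centroid_vectors)
        for dx, dy in entry.centroid_vectors:
            cx, cy = x + dx, y + dy
            # Spread the match weight evenly over the entry's centroid vectors.
            acc[(int(cx // cell), int(cy // cell))] += weight / n
    return dict(acc)


def best_hypothesis(acc, cell=10):
    """Return (centre of the strongest cell, its score), or None if there are no votes."""
    if not acc:
        return None
    (gx, gy), score = max(acc.items(), key=lambda kv: kv[1])
    return ((gx + 0.5) * cell, (gy + 0.5) * cell), score


if __name__ == "__main__":
    # Toy entries: a wheel arc that could lie left or right of the centroid,
    # and a roof line that lies above it.
    wheel = AlphabetEntry("bf_wheel_arc", "boundary-fragment", [(40, -5), (-40, -5)])
    roof = AlphabetEntry("bf_roof_line", "boundary-fragment", [(0, 30)])
    matches = [(wheel, (60, 105), 1.0), (wheel, (140, 105), 1.0), (roof, (100, 70), 0.8)]
    print(best_hypothesis(vote_for_centroids(matches)))
```

In this toy setup, the two wheel matches and the roof match agree on one grid cell, so that cell wins, while the wheels' inconsistent votes stay scattered. Because an entry stores only a descriptor and its centroid vectors, the same entry could in principle be referenced by detectors for several categories; that is the kind of sharing the abstract credits for the sub-linear growth of the alphabet.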
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Opelt, A., Pinz, A. & Zisserman, A. Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection. Int J Comput Vis 80, 16–44 (2008). https://doi.org/10.1007/s11263-008-0139-3
DOI: https://doi.org/10.1007/s11263-008-0139-3