International Journal of Computer Vision, Volume 80, Issue 1, pp 16–44

Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection

  • Andreas Opelt
  • Axel Pinz
  • Andrew Zisserman
Open Access
Article

Abstract

We present a novel algorithmic approach to object categorization and detection that can learn category specific detectors, using Boosting, from a visual alphabet of shape and appearance. The alphabet itself is learnt incrementally during this process. The resulting representation consists of a set of category-specific descriptors—basic shape features are represented by boundary-fragments, and appearance is represented by patches—where each descriptor in combination with centroid vectors for possible object centroids (geometry) forms an alphabet entry. Our experimental results highlight several qualities of this novel representation. First, we demonstrate the power of purely shape-based representation with excellent categorization and detection results using a Boundary-Fragment-Model (BFM), and investigate the capabilities of such a model to handle changes in scale and viewpoint, as well as intra- and inter-class variability. Second, we show that incremental learning of a BFM for many categories leads to a sub-linear growth of visual alphabet entries by sharing of shape features, while this generalization over categories at the same time often improves categorization performance (over independently learning the categories). Finally, the combination of basic shape and appearance (boundary-fragments and patches) features can further improve results. Certain feature types are preferred by certain categories, and for some categories we achieve the lowest error rates that have been reported so far.
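The notion of an alphabet entry (a descriptor paired with centroid vectors for possible object centroids) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the class name `AlphabetEntry`, the string placeholder used for a descriptor, the function `vote_for_centroids`, and the simple unweighted voting. The abstract does not specify how entries are matched to an image or how their evidence is combined by Boosting.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class AlphabetEntry:
    """Hypothetical alphabet entry: a descriptor plus its centroid vectors."""
    descriptor: str                # stand-in for a boundary fragment or a patch
    centroid_vectors: List[Point]  # offsets from a match location to possible centroids


def vote_for_centroids(matches: List[Tuple[AlphabetEntry, Point]]) -> Counter:
    """Cast one unweighted vote per centroid vector of every matched entry.

    Assumption: each match is given as (entry, image location of the match).
    """
    votes: Counter = Counter()
    for entry, (x, y) in matches:
        for dx, dy in entry.centroid_vectors:
            votes[(x + dx, y + dy)] += 1
    return votes


# Toy data: two entries matched at different locations agree on one centroid.
wheel = AlphabetEntry("wheel-arc fragment", [(10, -5), (-10, -5)])
roof = AlphabetEntry("roof patch", [(0, 20)])
votes = vote_for_centroids([(wheel, (40, 55)), (roof, (50, 30))])
best_centroid, best_count = votes.most_common(1)[0]
```

In this toy example the shape entry and the appearance entry both point at the same location, so that location collects the most votes. A learned detector would presumably weight each entry rather than count votes equally.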

Keywords

Generic object recognition, Object categorization, Category representation, Visual alphabet, Boosting

Copyright information

© The Author(s) 2008

Authors and Affiliations

  1. Institute of Electrical Measurement and Measurement Signal Processing, Graz University of Technology, Graz, Austria
  2. Department of Engineering Science, University of Oxford, Oxford, UK
