TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNIP, volume 3951)


This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs. Our discriminative model exploits novel features, based on textons, which jointly model shape and texture. Unary classification and feature selection are achieved using shared boosting, which gives an efficient classifier that can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating these classifiers in a conditional random field. Efficient training of the model on very large datasets is achieved by exploiting both random feature selection and piecewise training methods.

High classification and segmentation accuracy are demonstrated on three different databases: i) our own 21-object class database of photographs of real objects viewed under general lighting conditions, poses and viewpoints, ii) the 7-class Corel subset and iii) the 7-class Sowerby database used in [1]. The proposed algorithm gives competitive results for highly textured objects (e.g. grass, trees), highly structured objects (e.g. cars, faces, bikes, aeroplanes) and articulated objects (e.g. body, cow).
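To illustrate what incorporating per-pixel classifiers in a conditional random field can look like, here is a minimal sketch in Python. It is not the authors' implementation. The 4-connected grid, the constant Potts penalty `pairwise_weight`, the function names `crf_energy` and `icm_pass`, and the toy costs are all assumptions made for illustration; the abstract does not specify the pairwise term or the inference method. Iterated conditional modes (ICM) is used here only because it is short.

```python
# Sketch only: a grid CRF with per-pixel unary costs and a Potts pairwise
# penalty. The 4-connected grid and the constant penalty are assumptions.

def crf_energy(unary, labels, pairwise_weight=1.0):
    """Return the energy of `labels` on a 4-connected pixel grid.

    unary[y][x][c] is the cost of giving pixel (x, y) class c, for example
    the negative log-probability from a per-pixel classifier.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            energy += unary[y][x][labels[y][x]]
            # Count each neighbouring pair once: right and down neighbours.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += pairwise_weight
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += pairwise_weight
    return energy


def icm_pass(unary, labels, pairwise_weight=1.0):
    """One sweep of iterated conditional modes: greedily relabel each pixel."""
    height, width = len(labels), len(labels[0])
    num_classes = len(unary[0][0])
    new_labels = [row[:] for row in labels]
    for y in range(height):
        for x in range(width):
            # Labels of the in-bounds 4-connected neighbours (already-updated
            # values are used where available).
            neighbours = [
                new_labels[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]

            def local_cost(c):
                disagreements = sum(1 for n in neighbours if n != c)
                return unary[y][x][c] + pairwise_weight * disagreements

            new_labels[y][x] = min(range(num_classes), key=local_cost)
    return new_labels


# Toy 1x3 image with two classes (hypothetical costs). The middle pixel
# weakly prefers class 1 while both of its neighbours strongly prefer class 0.
unary = [[[0.1, 2.0], [0.6, 0.5], [0.1, 2.0]]]
noisy = [[min(range(2), key=lambda c: pixel[c]) for pixel in row] for row in unary]
smoothed = icm_pass(unary, noisy)
print(noisy, crf_energy(unary, noisy))
print(smoothed, crf_energy(unary, smoothed))
```

In this toy case the unary costs alone label the middle pixel differently from its neighbours, and the pairwise term makes the uniform labelling cheaper overall. That is the smoothing effect a random field adds on top of independent per-pixel classification.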


  • Class Label
  • Training Image
  • Object Class
  • Context Modeling
  • Conditional Random Field

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. He, X., Zemel, R.S., Carreira-Perpiñán, M.A.: Multiscale conditional random fields for image labeling. In: Proc. of IEEE CVPR (2004)

  2. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: CVPR 2003, vol. II, pp. 264–271 (2003)

  3. Berg, A.C., Berg, T.L., Malik, J.: Shape matching and object recognition using low distortion correspondences. In: CVPR (2005)

  4. Winn, J., Criminisi, A., Minka, T.: Categorization by learned universal visual dictionary. In: Int. Conf. of Computer Vision (2005)

  5. Kumar, S., Hebert, M.: Discriminative fields for modeling spatial dependencies in natural images. In: NIPS (2004)

  6. Borenstein, E., Sharon, E., Ullman, S.: Combining top-down and bottom-up segmentation. In: Proceedings IEEE workshop on Perceptual Organization in Computer Vision, CVPR (2004)

  7. Winn, J., Jojic, N.: LOCUS: Learning Object Classes with Unsupervised Segmentation. In: Proc. of IEEE ICCV (2005)

  8. Kumar, P., Torr, P., Zisserman, A.: Obj cut. In: Proc. of IEEE CVPR (2005)

  9. Leibe, B., Schiele, B.: Interleaved object categorization and segmentation. In: BMVC 2003, vol. II, pp. 264–271 (2003)

  10. Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)

  11. Tu, Z., Chen, X., Yuille, A.L., Zhu, S.: Image parsing: Unifying segmentation, detection, and recognition. In: CVPR (2003)

  12. Konishi, S., Yuille, A.L.: Statistical cues for domain specific image segmentation with performance analysis. In: CVPR (2000)

  13. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML (2001)

  14. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proc. of IEEE ICCV (2001)

  15. Rother, C., Kolmogorov, V., Blake, A.: Interactive foreground extraction using iterated graph cuts. In: ACM Transactions on Graphics, SIGGRAPH 2004 (2004)

  16. Sutton, C., McCallum, A.: Piecewise training of undirected models. In: 21st Conference on Uncertainty in Artificial Intelligence (2005)

  17. Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV 43, 29–44 (2001)

  18. Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. International Journal of Computer Vision: Special Issue on Texture Analysis and Synthesis 62, 61–81 (2005)

  19. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR 2001, vol. I, pp. 511–518 (2001)

  20. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI 24, 509–522 (2002)

  21. Torralba, A., Murphy, K., Freeman, W.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proc. of IEEE CVPR, pp. 762–769 (2004)

  22. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Technical report, Dept. of Statistics, Stanford University (1998)

  23. Baluja, S., Rowley, H.A.: Boosting sex identification performance, pp. 1508–1513. AAAI Press, Menlo Park (2005)

  24. Kumar, S., Hebert, M.: A hierarchical field framework for unified context-based classification. In: ICCV 2005, vol. II, pp. 1284–1291 (2005)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shotton, J., Winn, J., Rother, C., Criminisi, A. (2006). TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation. In: Leonardis, A., Bischof, H., Pinz, A. (eds) Computer Vision – ECCV 2006. ECCV 2006. Lecture Notes in Computer Science, vol 3951. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33832-1

  • Online ISBN: 978-3-540-33833-8

  • eBook Packages: Computer Science, Computer Science (R0)