TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3951)


This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs. Our discriminative model exploits novel features, based on textons, which jointly model shape and texture. Unary classification and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating these classifiers in a conditional random field. Efficient training of the model on very large datasets is achieved by exploiting both random feature selection and piecewise training methods.
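The abstract does not spell out the form of the texton-based features. As a rough illustration only, the sketch below assumes one plausible reading: each pixel already carries an integer texton index (a "texton map"), and a feature counts how often a given texton occurs inside a rectangle placed at an offset from the pixel of interest, using an integral image so each count takes constant time. The function names (`texton_integral`, `rect_count`, `texton_rect_feature`) and the toy texton map are hypothetical and not taken from the paper.

```python
import numpy as np

def texton_integral(texton_map, t):
    """Integral image of the mask (texton_map == t), zero-padded on the top and left."""
    mask = (texton_map == t).astype(np.int64)
    ii = np.zeros((mask.shape[0] + 1, mask.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_count(ii, top, left, bottom, right):
    """Count mask pixels in rows [top, bottom) and columns [left, right), clipped to the image."""
    h, w = ii.shape[0] - 1, ii.shape[1] - 1
    top, bottom = np.clip([top, bottom], 0, h)
    left, right = np.clip([left, right], 0, w)
    if bottom <= top or right <= left:
        return 0
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

def texton_rect_feature(ii, y, x, dy, dx, rh, rw):
    """Count texton pixels in an rh-by-rw rectangle whose top-left corner is offset (dy, dx) from pixel (y, x)."""
    return rect_count(ii, y + dy, x + dx, y + dy + rh, x + dx + rw)

# Toy 3x3 texton map with three texton indices (assumed data, for illustration).
texton_map = np.array([[0, 1, 1],
                       [0, 1, 0],
                       [2, 1, 1]])
ii_t1 = texton_integral(texton_map, 1)
total_t1 = rect_count(ii_t1, 0, 0, 3, 3)
right_of_origin = texton_rect_feature(ii_t1, 0, 0, 0, 1, 2, 2)
clipped_corner = texton_rect_feature(ii_t1, 0, 0, -1, -1, 2, 2)
```

Because the rectangle is placed relative to the pixel, such a count depends on which textures appear where around that pixel, which is one way a single feature could respond to texture and layout together. In a boosted classifier, each weak learner would typically threshold one such count.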

High classification and segmentation accuracy is demonstrated on three different databases: i) our own 21-object class database of photographs of real objects viewed under general lighting conditions, poses and viewpoints, ii) the 7-class Corel subset and iii) the 7-class Sowerby database used in [1]. The proposed algorithm gives competitive results for highly textured (e.g. grass, trees), highly structured (e.g. cars, faces, bikes, aeroplanes) and articulated objects (e.g. body, cow).


Keywords: Class Label, Training Image, Object Class, Context Modeling, Conditional Random Field


  1. He, X., Zemel, R.S., Carreira-Perpiñán, M.A.: Multiscale conditional random fields for image labeling. In: Proc. of IEEE CVPR (2004)
  2. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: CVPR 2003, vol. II, pp. 264–271 (2003)
  3. Berg, A.C., Berg, T.L., Malik, J.: Shape matching and object recognition using low distortion correspondences. In: CVPR (2005)
  4. Winn, J., Criminisi, A., Minka, T.: Categorization by learned universal visual dictionary. In: Int. Conf. of Computer Vision (2005)
  5. Kumar, S., Hebert, M.: Discriminative fields for modeling spatial dependencies in natural images. In: NIPS (2004)
  6. Borenstein, E., Sharon, E., Ullman, S.: Combining top-down and bottom-up segmentation. In: Proceedings IEEE workshop on Perceptual Organization in Computer Vision, CVPR (2004)
  7. Winn, J., Jojic, N.: LOCUS: Learning Object Classes with Unsupervised Segmentation. In: Proc. of IEEE ICCV (2005)
  8. Kumar, P., Torr, P., Zisserman, A.: Obj cut. In: Proc. of IEEE CVPR (2005)
  9. Leibe, B., Schiele, B.: Interleaved object categorization and segmentation. In: BMVC 2003, vol. II, pp. 264–271 (2003)
  10. Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
  11. Tu, Z., Chen, X., Yuille, A.L., Zhu, S.: Image parsing: Unifying segmentation, detection, and recognition. In: CVPR (2003)
  12. Konishi, S., Yuille, A.L.: Statistical cues for domain specific image segmentation with performance analysis. In: CVPR (2000)
  13. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML (2001)
  14. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proc. of IEEE ICCV (2001)
  15. Rother, C., Kolmogorov, V., Blake, A.: Interactive foreground extraction using iterated graph cuts. In: ACM Transactions on Graphics, SIGGRAPH 2004 (2004)
  16. Sutton, C., McCallum, A.: Piecewise training of undirected models. In: 21st Conference on Uncertainty in Artificial Intelligence (2005)
  17. Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV 43, 29–44 (2001)
  18. Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. International Journal of Computer Vision: Special Issue on Texture Analysis and Synthesis 62, 61–81 (2005)
  19. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR 2001, vol. I, pp. 511–518 (2001)
  20. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI 24, 509–522 (2002)
  21. Torralba, A., Murphy, K., Freeman, W.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proc. of IEEE CVPR, pp. 762–769 (2004)
  22. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Technical report, Dept. of Statistics, Stanford University (1998)
  23. Baluja, S., Rowley, H.A.: Boosting sex identification performance, pp. 1508–1513. AAAI Press, Menlo Park (2005)
  24. Kumar, S., Hebert, M.: A hierarchical field framework for unified context-based classification. In: ICCV 2005, vol. II, pp. 1284–1291 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. Microsoft Research Ltd., Cambridge, UK
  2. Department of Engineering, University of Cambridge, UK
