International Journal of Computer Vision, Volume 80, Issue 3, pp 300–316

Multi-Class Segmentation with Relative Location Prior

  • Stephen Gould
  • Jim Rodgers
  • David Cohen
  • Gal Elidan
  • Daphne Koller


Multi-class image segmentation has made significant advances in recent years through the combination of local and global features. One important type of global feature captures inter-class spatial relationships. For example, identifying “tree” pixels indicates that pixels above and to the sides are more likely to be “sky,” whereas pixels below are more likely to be “grass.” Incorporating such global information across the entire image and between all classes is computationally challenging because it is image-dependent and hence cannot be precomputed.
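The regularity in the tree/sky/grass example can be estimated from labelled training images as a conditional distribution over the class found at a given offset from a pixel of a given class. The following is a minimal sketch under stated assumptions, not the paper's procedure: label images are small grids of class names, and offsets are binned only by vertical direction into three cells (`above`, `level`, `below`), because the abstract does not specify the maps' resolution or smoothing. The names `vertical_bin`, `learn_relative_location_map`, and `TOY` are hypothetical.

```python
from collections import defaultdict


def vertical_bin(dy):
    """Coarse offset bin (an assumption): dy < 0 means the target pixel is above the anchor."""
    if dy < 0:
        return "above"
    if dy > 0:
        return "below"
    return "level"


def learn_relative_location_map(label_images):
    """Estimate P(target class | anchor class, offset bin) by counting pixel pairs.

    label_images: list of 2D grids (lists of rows) of class-name strings.
    Returns rel_map[anchor][bin][target] -> probability.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for labels in label_images:
        pixels = [(r, c, lab) for r, row in enumerate(labels) for c, lab in enumerate(row)]
        # Every ordered pair of distinct pixels: O(n^2) in the pixel count, fine for a toy.
        for r0, c0, anchor in pixels:
            for r1, c1, target in pixels:
                if (r0, c0) == (r1, c1):
                    continue
                counts[anchor][vertical_bin(r1 - r0)][target] += 1.0
    # Normalize each (anchor, bin) histogram into a distribution over target classes.
    rel_map = {}
    for anchor, by_bin in counts.items():
        rel_map[anchor] = {}
        for b, hist in by_bin.items():
            total = sum(hist.values())
            rel_map[anchor][b] = {target: n / total for target, n in hist.items()}
    return rel_map


# Hypothetical labelled image: sky on top, a band of trees, grass below.
TOY = [
    ["sky", "sky", "sky"],
    ["tree", "tree", "tree"],
    ["grass", "grass", "grass"],
]

if __name__ == "__main__":
    rel = learn_relative_location_map([TOY])
    print(rel["tree"]["above"])  # {'sky': 1.0}
    print(rel["tree"]["below"])  # {'grass': 1.0}
```

Counting every ordered pixel pair is quadratic in image size, so a practical version would likely subsample pairs or operate on coarser regions; the three-bin vertical quantization is likewise chosen only to keep the example small.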

In this work, we propose a method for capturing global information from inter-class spatial relationships and encoding it as a local feature. We employ a two-stage classification process to label all image pixels. First, we generate predictions, which are used to compute a local relative location feature from learned relative location maps. In the second stage, we combine this feature with appearance-based features to produce a final segmentation. We compare our results to recently published results on several multi-class image segmentation databases and show that incorporating relative location information allows us to significantly outperform the current state of the art.
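The two-stage process described above can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes that the first stage yields class probabilities for a handful of image regions, each represented only by the row of its centroid; that the learned relative location map is a table indexed by anchor class, a three-cell vertical offset bin, and target class; and that the second stage is a fixed-weight log-linear combination. The names `REL_MAP`, `REGIONS`, `relative_location_feature`, and `second_stage`, and all numbers, are hypothetical.

```python
import math

CLASSES = ["sky", "tree", "grass"]


def vertical_bin(dy):
    """dy < 0 means the target region is above the anchor region."""
    if dy < 0:
        return "above"
    if dy > 0:
        return "below"
    return "level"


# Hypothetical relative location map: REL_MAP[anchor][bin][target] = P(target | anchor, bin).
REL_MAP = {
    "sky": {"above": {"sky": 1.0},
            "level": {"sky": 0.8, "tree": 0.2},
            "below": {"tree": 0.5, "grass": 0.4, "sky": 0.1}},
    "tree": {"above": {"sky": 0.9, "tree": 0.1},
             "level": {"tree": 0.7, "sky": 0.2, "grass": 0.1},
             "below": {"grass": 0.9, "tree": 0.1}},
    "grass": {"above": {"tree": 0.5, "sky": 0.5},
              "level": {"grass": 0.9, "tree": 0.1},
              "below": {"grass": 1.0}},
}

# Hypothetical first-stage output: (centroid row, appearance-based class probabilities).
REGIONS = [
    (0, {"sky": 0.8, "tree": 0.1, "grass": 0.1}),
    (1, {"sky": 0.1, "tree": 0.8, "grass": 0.1}),
    (2, {"sky": 0.1, "tree": 0.45, "grass": 0.45}),  # ambiguous on appearance alone
]


def relative_location_feature(regions, rel_map, classes, eps=1e-6):
    """For each region, every other region votes for its class via the map,
    weighting each vote by the voter's first-stage class probabilities."""
    features = []
    for i, (yi, _) in enumerate(regions):
        votes = {c: eps for c in classes}  # eps keeps the later logs finite
        for j, (yj, probs_j) in enumerate(regions):
            if i == j:
                continue
            b = vertical_bin(yi - yj)  # where region i sits relative to region j
            for anchor, p in probs_j.items():
                for c in classes:
                    votes[c] += p * rel_map[anchor][b].get(c, 0.0)
        total = sum(votes.values())
        features.append({c: v / total for c, v in votes.items()})
    return features


def second_stage(appearance, rel_feature, w_app=1.0, w_rel=1.0):
    """Log-linear combination of the two cues, then a softmax (weights are placeholders)."""
    scores = {c: w_app * math.log(appearance[c]) + w_rel * math.log(rel_feature[c])
              for c in appearance}
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


if __name__ == "__main__":
    feats = relative_location_feature(REGIONS, REL_MAP, CLASSES)
    for (y, appearance), feat in zip(REGIONS, feats):
        final = second_stage(appearance, feat)
        print(y, max(final, key=final.get))
```

With these toy numbers, the bottom region, which appearance alone leaves tied between tree and grass, is resolved to grass because the regions above it vote for grass below them. The map is learned offline, but the feature depends on the other regions' first-stage predictions and so must be computed per image. In a real system the combination weights would presumably be learned rather than fixed at 1.0.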


Keywords: Multi-class image segmentation · Segmentation · Relative location




Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Stephen Gould (1)
  • Jim Rodgers (1)
  • David Cohen (1)
  • Gal Elidan (1)
  • Daphne Koller (1)

  1. Department of Computer Science, Stanford University, Stanford, USA
