Learning to Combine Bottom-Up and Top-Down Segmentation

International Journal of Computer Vision

Abstract

Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner.

In this paper we approach this problem using supervised learning. Given a training set of ground-truth segmentations, we train a fragment-based segmentation algorithm that takes both bottom-up and top-down cues into account simultaneously, in contrast to most existing algorithms, which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRFs) and derive a feature induction algorithm for CRFs, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high-quality segmentations.
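
The abstract does not spell out the model, so the following is only a rough sketch of how a CRF energy might combine the two kinds of cues. It assumes a binary figure/ground labeling on a pixel grid, a contrast-sensitive pairwise term standing in for the bottom-up cue, and a single fragment mask placed at a fixed location standing in for the top-down cue. The function name `crf_energy`, the weights `nu` and `lam`, and the bandwidth `sigma` are hypothetical and are not taken from the paper.

```python
import math


def crf_energy(labels, image, fragment_mask, top_left, nu=1.0, lam=1.0, sigma=0.1):
    """Energy of a binary labeling given a grayscale image (both 2D lists).

    Assumed illustrative form (not the paper's stated model):
      E = nu  * sum over 4-neighbor pairs (r, s) of w_rs * |x_r - x_s|
        + lam * sum over fragment pixels p of |x_p - mask_p|
    with w_rs = exp(-(I_r - I_s)^2 / sigma^2).
    """
    height, width = len(labels), len(labels[0])

    # Bottom-up term: label changes are cheap across strong intensity edges
    # and expensive inside uniform regions.
    pairwise = 0.0
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    diff = image[r][c] - image[rr][cc]
                    weight = math.exp(-(diff * diff) / (sigma * sigma))
                    pairwise += weight * abs(labels[r][c] - labels[rr][cc])

    # Top-down term: disagreement between the labeling and the fragment's
    # figure/ground mask at the fragment's assumed location.
    top, left = top_left
    unary = 0.0
    for i, row in enumerate(fragment_mask):
        for j, mask_value in enumerate(row):
            unary += abs(labels[top + i][left + j] - mask_value)

    return nu * pairwise + lam * unary


# Toy 2x2 image: dark top row, bright bottom row.
image = [[0.0, 0.0], [1.0, 1.0]]
fragment_mask = [[0, 0], [1, 1]]        # hypothetical fragment: bottom row is figure
aligned = [[0, 0], [1, 1]]              # labeling that follows the intensity edge
misaligned = [[0, 1], [0, 1]]           # labeling that cuts through uniform regions

energy_aligned = crf_energy(aligned, image, fragment_mask, (0, 0))
energy_misaligned = crf_energy(misaligned, image, fragment_mask, (0, 0))
```

In this toy setup the labeling that follows the image edge and agrees with the fragment mask gets the lower energy. In a learned model the weights on each term would be fit from training segmentations rather than fixed by hand.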

About this article

Cite this article

Levin, A., Weiss, Y. Learning to Combine Bottom-Up and Top-Down Segmentation. Int J Comput Vis 81, 105–118 (2009). https://doi.org/10.1007/s11263-008-0166-0

  • DOI: https://doi.org/10.1007/s11263-008-0166-0