International Journal of Computer Vision, Volume 82, Issue 3, pp 302–324

Robust Higher Order Potentials for Enforcing Label Consistency

  • Pushmeet Kohli
  • L’ubor Ladický
  • Philip H. S. Torr
Article

Abstract

This paper proposes a novel framework for labelling problems which is able to combine multiple segmentations in a principled manner. Our method is based on higher order conditional random fields and uses potentials defined on sets of pixels (image segments) generated using unsupervised segmentation algorithms. These potentials enforce label consistency in image regions and can be seen as a generalization of the commonly used pairwise contrast sensitive smoothness potentials. The higher order potential functions used in our framework take the form of the Robust P^n model and are more general than the P^n Potts model recently proposed by Kohli et al. We prove that the optimal swap and expansion moves for energy functions composed of these potentials can be computed by solving an st-mincut problem. This enables the use of powerful graph cut based move making algorithms for performing inference in the framework. We test our method on the problem of multi-class object segmentation by augmenting the conventional CRF used for object segmentation with higher order potentials defined on image regions. Experiments on challenging data sets show that integrating higher order potentials improves results both quantitatively and qualitatively, leading to much better definition of object boundaries. We believe that this method can be used to yield similar improvements for many other labelling problems.
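The abstract does not state the functional form of these potentials. As a rough illustration only, the sketch below assumes a simplified truncated-linear cost for a single segment: the cost grows with the number of pixels that disagree with the segment's dominant label and is capped at a maximum. The function names and the parameters `gamma_max` (the cap) and `q` (the truncation point) are hypothetical and chosen for this example; the strict all-or-nothing variant is included for contrast with the robust one.

```python
from collections import Counter
from typing import Hashable, Sequence


def robust_pn_cost(labels: Sequence[Hashable], gamma_max: float, q: int) -> float:
    """Illustrative truncated-linear label-consistency cost for one segment.

    Assumption (not taken from the abstract): the cost rises linearly with
    the number of pixels that do not take the segment's dominant label and
    is capped at gamma_max once that number reaches q.
    """
    if q <= 0:
        raise ValueError("q must be a positive integer")
    if not labels:
        return 0.0
    # Number of pixels carrying the most frequent label in the segment.
    dominant_count = Counter(labels).most_common(1)[0][1]
    n_disagree = len(labels) - dominant_count
    return min(n_disagree * gamma_max / q, gamma_max)


def pn_potts_cost(labels: Sequence[Hashable], gamma_max: float) -> float:
    """Strict variant: zero cost only if every pixel takes the same label."""
    return 0.0 if len(set(labels)) <= 1 else gamma_max


if __name__ == "__main__":
    segment = ["sky"] * 9 + ["tree"]  # one pixel disagrees with the rest
    print(robust_pn_cost(segment, gamma_max=1.0, q=4))
    print(pn_potts_cost(segment, gamma_max=1.0))
```

Under these assumptions, a single mislabelled pixel incurs only a fraction of the maximum cost in the robust variant, whereas the strict variant charges the full cost as soon as any pixel disagrees.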

Keywords

  • Discrete energy minimization
  • Markov and conditional random fields
  • Object recognition and segmentation

References

  1. Alahari, K., Kohli, P., & Torr, P. (2008). Reduce, reuse and recycle: efficiently solving multi-label MRFs. In IEEE conference on computer vision and pattern recognition.
  2. Blake, A., Rother, C., Brown, M., Perez, P., & Torr, P. (2004). Interactive image segmentation using an adaptive GMMRF model. In European conference on computer vision (pp. I: 428–441).
  3. Borenstein, E., & Malik, J. (2006). Shape guided object segmentation. In IEEE conference on computer vision and pattern recognition (pp. 969–976).
  4. Boros, E., & Hammer, P. (2002). Pseudo-boolean optimization. Discrete Applied Mathematics, 123(1–3), 155–225.
  5. Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In International conference on computer vision (pp. I: 105–112).
  6. Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
  7. Bray, M., Kohli, P., & Torr, P. (2006). Posecut: Simultaneous segmentation and 3d pose estimation of humans using dynamic graph-cuts. In European conference on computer vision (pp. 642–655).
  8. Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.
  9. Felzenszwalb, P., & Huttenlocher, D. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
  10. Flach, B. (2002). Strukturelle bilderkennung (Tech. Rep.). Universität Dresden.
  11. Freedman, D., & Drineas, P. (2005). Energy minimization via graph cuts: Settling what is possible. In IEEE conference on computer vision and pattern recognition (pp. 939–946).
  12. Fujishige, S. (1991). Submodular functions and optimization. Amsterdam: North-Holland.
  13. He, X., Zemel, R., & Carreira-Perpiñán, M. (2004). Multiscale conditional random fields for image labeling. In IEEE conference on computer vision and pattern recognition (2) (pp. 695–702).
  14. He, X., Zemel, R., & Ray, D. (2006). Learning and incorporating top-down cues in image segmentation. In European conference on computer vision (pp. 338–351).
  15. Hoiem, D., Efros, A., & Hebert, M. (2005a). Automatic photo pop-up. ACM Transactions on Graphics, 24(3), 577–584.
  16. Hoiem, D., Efros, A., & Hebert, M. (2005b). Geometric context from a single image. In International conference on computer vision (pp. 654–661).
  17. Huang, R., Pavlovic, V., & Metaxas, D. (2004). A graphical model framework for coupling MRFs and deformable models. In IEEE conference on computer vision and pattern recognition (Vol. 11, pp. 739–746).
  18. Ishikawa, H. (2003). Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 1333–1336.
  19. Kohli, P., Kumar, M., & Torr, P. (2007). P^3 and beyond: solving energies with higher order cliques. In IEEE conference on computer vision and pattern recognition.
  20. Kohli, P., Ladicky, L., & Torr, P. (2008). Robust higher order potentials for enforcing label consistency. In CVPR.
  21. Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.
  22. Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 147–159.
  23. Komodakis, N., & Tziritas, G. (2005). A new framework for approximate labeling via graph cuts. In International conference on computer vision (pp. 1018–1025).
  24. Komodakis, N., Tziritas, G., & Paragios, N. (2007). Fast, approximately optimal solutions for single and dynamic MRFs. In CVPR.
  25. Kumar, M., & Torr, P. (2008). Improved moves for truncated convex models. In Proceedings of advances in neural information processing systems.
  26. Kumar, M., Torr, P., & Zisserman, A. (2005). Obj cut. In IEEE conference on computer vision and pattern recognition (1) (pp. 18–25).
  27. Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labelling sequence data. In International conference on machine learning (pp. 282–289).
  28. Lan, X., Roth, S., Huttenlocher, D., & Black, M. (2006). Efficient belief propagation with learned higher-order Markov random fields. In European conference on computer vision (pp. 269–282).
  29. Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.
  30. Lempitsky, V., Rother, C., & Blake, A. (2007). Logcut—efficient graph cut optimization for Markov random fields. In ICCV.
  31. Levin, A., & Weiss, Y. (2006). Learning to combine bottom-up and top-down segmentation. In European conference on computer vision (pp. 581–594).
  32. Lovasz, L. (1983). Submodular functions and convexity. In Mathematical programming: the state of the art (pp. 235–257).
  33. Orlin, J. (2007). A faster strongly polynomial time algorithm for submodular function minimization. In Proceedings of integer programming and combinatorial optimization (pp. 240–251).
  34. Paget, R., & Longstaff, I. (1998). Texture synthesis via a noncausal nonparametric multiscale Markov random field. IEEE Transactions on Image Processing, 7(6), 925–931.
  35. Potetz, B. (2007). Efficient belief propagation for vision using linear constraint nodes. In IEEE conference on computer vision and pattern recognition.
  36. Rabinovich, A., Belongie, S., Lange, T., & Buhmann, J. (2006). Model order selection and cue combination for image segmentation. In IEEE conference on computer vision and pattern recognition (1) (pp. 1130–1137).
  37. Ren, X., & Malik, J. (2003). Learning a classification model for segmentation. In International conference on computer vision (pp. 10–17).
  38. Roth, S., & Black, M. (2005). Fields of experts: A framework for learning image priors. In IEEE conference on computer vision and pattern recognition (pp. 860–867).
  39. Rother, C., Kolmogorov, V., & Blake, A. (2004). Grabcut: interactive foreground extraction using iterated graph cuts. In ACM transactions on graphics (pp. 309–314).
  40. Russell, B., Freeman, W., Efros, A., Sivic, J., & Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In IEEE conference on computer vision and pattern recognition (2) (pp. 1605–1614).
  41. Schlesinger, D., & Flach, B. (2006). Transforming an arbitrary minsum problem into a binary one (Tech. Rep. TUD-FI06-01). Dresden University of Technology, April 2006.
  42. Sharon, E., Brandt, A., & Basri, R. (2001). Segmentation and boundary detection using multiscale intensity measurements. In IEEE conference on computer vision and pattern recognition (1) (pp. 469–476).
  43. Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
  44. Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In European conference on computer vision (pp. 1–15).
  45. Tu, Z., & Zhu, S. (2002). Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 657–673.
  46. Veksler, O. (2007). Graph cut based optimization for MRFs with truncated convex priors. In CVPR.
  47. Wainwright, M., Jaakkola, T., & Willsky, A. (2005). Map estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory, 51(11), 3697–3717.
  48. Wang, J., Bhat, P., Colburn, A., Agrawala, M., & Cohen, M. (2005). Interactive video cutout. ACM Transactions on Graphics, 24(3), 585–594.
  49. Yedidia, J., Freeman, W., & Weiss, Y. (2000). Generalized belief propagation. In NIPS (pp. 689–695).

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Pushmeet Kohli (1)
  • L’ubor Ladický (2)
  • Philip H. S. Torr (2)
  1. Microsoft Research, Cambridge, UK
  2. Oxford Brookes University, Oxford, UK
