International Journal of Computer Vision, Volume 96, Issue 1, pp 1–27

Fast Approximate Energy Minimization with Label Costs

  • Andrew Delong
  • Anton Osokin
  • Hossam N. Isack
  • Yuri Boykov

Abstract

The α-expansion algorithm has had a significant impact in computer vision due to its generality, effectiveness, and speed. It is commonly used to minimize energies that involve unary, pairwise, and specialized higher-order terms. Our main algorithmic contribution is an extension of α-expansion that also optimizes “label costs” with well-characterized optimality bounds. Label costs penalize a solution based on the set of labels that appear in it, for example by simply penalizing the number of labels in the solution.
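To make the role of label costs concrete, the following is a minimal sketch that only evaluates such an energy for a given labeling; it does not implement α-expansion. It assumes the simplest case, a Potts pairwise term and one fixed cost per label that appears in the solution. The function name `labeling_energy` and all numbers are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def labeling_energy(
    labels: Sequence[int],
    unary: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    smooth_weight: float,
    label_cost: Sequence[float],
) -> float:
    """Sum of unary terms, Potts pairwise terms, and per-label costs.

    Illustrative assumption: each label pays label_cost[l] once if it is
    used by at least one variable, regardless of how many variables use it.
    """
    # Unary (data) term: cost of assigning label l to variable p.
    data = sum(unary[p][l] for p, l in enumerate(labels))
    # Pairwise (Potts) term: fixed penalty for each edge whose ends disagree.
    smooth = sum(smooth_weight for p, q in edges if labels[p] != labels[q])
    # Label cost term: depends only on the set of labels that appear.
    used = set(labels)
    cost = sum(label_cost[l] for l in used)
    return data + smooth + cost


# Toy problem: 4 variables on a chain, 3 candidate labels.
unary: List[List[float]] = [[0, 4, 4], [0, 4, 4], [4, 0, 4], [4, 2, 0]]
edges = [(0, 1), (1, 2), (2, 3)]

three_labels = [0, 0, 1, 2]  # fits the data best, uses 3 labels
two_labels = [0, 0, 1, 1]    # slightly worse fit, uses 2 labels

no_cost = [0, 0, 0]
with_cost = [3, 3, 3]

print(labeling_energy(three_labels, unary, edges, 1, no_cost))    # 2
print(labeling_energy(two_labels, unary, edges, 1, no_cost))      # 3
print(labeling_energy(three_labels, unary, edges, 1, with_cost))  # 11
print(labeling_energy(two_labels, unary, edges, 1, with_cost))    # 9
```

Without label costs the three-label solution has lower energy; with a cost of 3 per used label, the two-label solution wins, which is the kind of preference for fewer labels described above.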

Our energy has a natural interpretation as minimizing description length (MDL) and sheds light on classical algorithms like K-means and expectation-maximization (EM). Label costs are useful for multi-model fitting and we demonstrate several such applications: homography detection, motion segmentation, image segmentation, and compression. Our C++ and MATLAB code is publicly available at http://vision.csd.uwo.ca/code/.
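As a rough illustration of the multi-model fitting trade-off, the sketch below selects a subset of candidate models by exhaustive search, balancing fit error against a fixed cost per selected model. This is not the paper's algorithm and it omits the pairwise term; it assumes 1D points, constant-valued candidate models, and squared-error fit. The name `best_model_subset` and the data are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def best_model_subset(
    points: Sequence[float],
    candidates: Sequence[float],
    cost_per_model: float,
) -> Tuple[float, Tuple[float, ...]]:
    """Brute-force search over non-empty subsets of candidate models.

    Energy of a subset = sum over points of the squared distance to the
    closest selected model + cost_per_model * (number of selected models).
    Exponential in len(candidates), so only suitable for tiny examples.
    """
    best: Optional[Tuple[float, Tuple[float, ...]]] = None
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            # Each point is explained by its best-fitting selected model.
            fit = sum(min((x - c) ** 2 for c in subset) for x in points)
            energy = fit + cost_per_model * k
            if best is None or energy < best[0]:
                best = (energy, subset)
    assert best is not None  # candidates must be non-empty
    return best


points = [0, 1, 2, 10, 11, 12]
candidates = [1, 6, 11]

print(best_model_subset(points, candidates, 5))    # (14, (1, 11))
print(best_model_subset(points, candidates, 200))  # (354, (6,))
```

With a small per-model cost, two models are kept, one per cluster of points; with a large cost, a single compromise model becomes cheaper overall. The same tension between fit quality and model count underlies the description-length interpretation mentioned above.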

Keywords

  • Energy minimization
  • Multi-model fitting
  • Metric labeling
  • Graph cuts
  • Minimum description length

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Andrew Delong (1)
  • Anton Osokin (2)
  • Hossam N. Isack (1)
  • Yuri Boykov (1)

  1. Department of Computer Science, University of Western Ontario, London, Canada
  2. Department of Computational Mathematics and Cybernetics, Moscow State University, Moscow, Russia
