Monocular Extraction of 2.1D Sketch Using Constrained Convex Optimization

Abstract

This paper presents an approach to estimating the 2.1D sketch from monocular, low-level visual cues. We use a low-level segmenter to partition the image into regions and then estimate their 2.1D sketch, subject to figure-ground and similarity constraints between neighboring regions. The 2.1D sketch assigns a depth ordering to image regions, which are expected to correspond to objects and surfaces in the scene. The estimation is cast as a constrained convex optimization problem and solved within the optimization transfer framework. The optimization objective takes into account the curvature and convexity of parts of region boundaries, as well as the appearance and spatial layout properties of regions. Our new optimization transfer algorithm admits a closed-form expression of the duality gap, and thus allows explicit computation of the achieved accuracy. The algorithm is efficient, with quadratic complexity in the number of constraints between image regions. Quantitative and qualitative results on challenging, real-world images from the Berkeley segmentation, Geometric Context, and Stanford Make3D datasets demonstrate the high accuracy, efficiency, and robustness of our approach.
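To make the idea of depth ordering under pairwise constraints concrete, the following is a minimal illustrative sketch, not the objective or the optimization transfer algorithm of the paper. It assumes a toy formulation in which each figure-ground pair (front, back) asks for a unit depth margin, enforced softly through a squared-hinge penalty, and each similarity pair pulls two depths together with a quadratic term. The function name `order_regions`, the parameters `rho`, `eps`, `step`, and `iters`, and the penalty form are all hypothetical choices made for illustration; the solver is plain gradient descent.

```python
def order_regions(n, figure_ground, similar,
                  rho=10.0, eps=1e-3, step=0.01, iters=5000):
    """Toy depth ordering by gradient descent on a convex penalized objective.

    Assumed (hypothetical) objective, not taken from the paper:
        eps * sum_i d[i]^2
        + sum over (i, j, w) in similar:        w * (d[i] - d[j])^2
        + sum over (front, back) in figure_ground:
              rho * max(0, 1 - (d[back] - d[front]))^2
    Larger d means farther from the camera.
    """
    d = [0.0] * n
    for _ in range(iters):
        # Gradient of the small ridge term that keeps depths bounded.
        g = [2.0 * eps * x for x in d]
        # Similarity terms pull the two depths toward each other.
        for i, j, w in similar:
            diff = d[i] - d[j]
            g[i] += 2.0 * w * diff
            g[j] -= 2.0 * w * diff
        # Figure-ground terms are active only while the margin is violated.
        for front, back in figure_ground:
            violation = 1.0 - (d[back] - d[front])
            if violation > 0.0:
                g[back] -= 2.0 * rho * violation
                g[front] += 2.0 * rho * violation
        d = [x - step * gx for x, gx in zip(d, g)]
    return d


# Example: region 0 occludes region 1, region 1 occludes region 2,
# and regions 2 and 3 look alike, so they should share a depth.
example_depths = order_regions(4, [(0, 1), (1, 2)], [(2, 3, 1.0)])
print(example_depths)
```

Because the constraints are enforced only through a penalty, the margins in this sketch are met approximately rather than exactly; a constrained solver would be needed to impose them as hard constraints.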

Notes

  1. A region-subregion pair may not correspond to an object and its part (e.g., sky seen through tree branches). In this case, even human vision could not correctly identify their depths without resorting to higher-level semantic cues, which are beyond our scope.

  2. Note that Newton's and other Hessian-based methods would not be better alternatives to gradient descent in our case, due to the large number of unknown variables in \(\boldsymbol{d}\) (on the order of \(10^2\)).

  3. http://cvxr.com/cvx/.

  4. Since we do not have ground truth annotations for the AL relations, we use the computed ones for GT(FG-AL).

Acknowledgments

This work was supported in part by Grants NSF RI 1302700 and DARPA MSEE FA 8650-11-1-7149.

Author information

Corresponding author

Correspondence to Mohamed R. Amer.

Additional information

Communicated by C. Schnörr.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 4918 kb)

About this article

Cite this article

Amer, M.R., Yousefi, S., Raich, R. et al. Monocular Extraction of 2.1D Sketch Using Constrained Convex Optimization. Int J Comput Vis 112, 23–42 (2015). https://doi.org/10.1007/s11263-014-0752-2

Keywords

  • 2.1D sketch
  • Figure-ground assignment
  • Image segmentation
  • Convex quadratic optimization