Monocular Extraction of 2.1D Sketch Using Constrained Convex Optimization

International Journal of Computer Vision

Abstract

This paper presents an approach to estimating the 2.1D sketch from monocular, low-level visual cues. We use a low-level segmenter to partition the image into regions, and then estimate their 2.1D sketch, subject to figure-ground and similarity constraints between neighboring regions. The 2.1D sketch assigns a depth ordering to image regions, which are expected to correspond to objects and surfaces in the scene. This is cast as a constrained convex optimization problem and solved within the optimization transfer framework. The optimization objective takes into account the curvature and convexity of parts of region boundaries, as well as the appearance and spatial layout properties of regions. Our new optimization transfer algorithm admits a closed-form expression of the duality gap, and thus allows explicit computation of the achieved accuracy. The algorithm is efficient, with quadratic complexity in the number of constraints between image regions. Quantitative and qualitative results on challenging, real-world images from the Berkeley segmentation, Geometric Context, and Stanford Make3D datasets demonstrate the high accuracy, efficiency, and robustness of our approach.
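
To make the formulation above concrete, the following is a minimal sketch of depth ordering as a convex problem. It is not the optimization transfer algorithm of the paper: it swaps in a squared-hinge penalty on the figure-ground constraints and plain gradient descent. The function name `estimate_depths`, the unit margin, and the parameters `lam`, `step`, and `iters` are all assumptions made for illustration.

```python
def estimate_depths(n, front_of, similar, lam=10.0, step=0.01, iters=2000):
    """Illustrative depth estimation for n regions (larger value = farther).

    front_of: list of (i, j) pairs, region i occludes region j, so we want
              d[j] - d[i] >= 1 (hypothetical unit margin).
    similar:  list of (i, j, w) triples, regions i and j look alike, so we
              penalize w * (d[i] - d[j])**2.
    The objective is convex: a sum of quadratics and squared hinges.
    """
    d = [0.0] * n
    for _ in range(iters):
        g = [0.0] * n
        # Gradient of the similarity term w * (d[i] - d[j])**2.
        for i, j, w in similar:
            diff = d[i] - d[j]
            g[i] += 2.0 * w * diff
            g[j] -= 2.0 * w * diff
        # Gradient of the penalty lam * max(0, 1 - (d[j] - d[i]))**2.
        for i, j in front_of:
            viol = 1.0 - (d[j] - d[i])
            if viol > 0.0:
                g[i] += 2.0 * lam * viol
                g[j] -= 2.0 * lam * viol
        # Plain gradient step; the step size is assumed small enough
        # for the given lam and number of constraints per region.
        d = [x - step * gx for x, gx in zip(d, g)]
    return d


# Toy example: region 0 occludes region 1, which occludes region 2;
# regions 1 and 2 are weakly similar in appearance.
depths = estimate_depths(3, front_of=[(0, 1), (1, 2)], similar=[(1, 2, 0.1)])
ordering = sorted(range(3), key=lambda r: depths[r])  # nearest region first
```

Because the penalty is soft, the figure-ground constraints are satisfied only approximately, and the similarity weight pulls the depths of similar regions together. A constrained solver would enforce the constraints exactly.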

Notes

  1. A region-subregion pair may not correspond to an object and its part (e.g., sky seen through tree branches). In this case, even human vision could not correctly identify their depths, without resorting to higher-level semantic cues, which are beyond our scope.

  2. Note that Newton’s and other Hessian-based methods would not be better alternatives to gradient descent in our case, due to the large number of unknown variables in \(\varvec{d}\) (on the order of \(10^2\)).

  3. http://cvxr.com/cvx/.

  4. Since we do not have ground truth annotations for the AL relations, we use the computed ones for GT(FG-AL).

Acknowledgments

This work was supported in part by Grants NSF RI 1302700 and DARPA MSEE FA 8650-11-1-7149.

Author information

Corresponding author

Correspondence to Mohamed R. Amer.

Additional information

Communicated by C. Schnörr.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 4918 kb)

About this article

Cite this article

Amer, M.R., Yousefi, S., Raich, R. et al. Monocular Extraction of 2.1D Sketch Using Constrained Convex Optimization. Int J Comput Vis 112, 23–42 (2015). https://doi.org/10.1007/s11263-014-0752-2

  • DOI: https://doi.org/10.1007/s11263-014-0752-2
