Recovering Relative Depth from Low-Level Features Without Explicit T-junction Detection and Interpretation

International Journal of Computer Vision

Abstract

This work presents a novel computational model for estimating relative depth order from a single image. The model is based on low-level local features that quantitatively encode perceptual depth cues such as convexity/concavity, inclusion, and T-junctions, taking information at different scales into account. These multi-scale features are based on a measure of how likely a pixel is to belong simultaneously to different objects (interpreted as connected components of level sets) and, hence, to be occluded in some of them, which provides a hint about local depth order relationships. They are computed directly and efficiently on the discrete image data, without requiring the detection and interpretation of edges or junctions. Their behavior is clarified and illustrated on some simple images. The relative depth order of the image is then recovered by globally integrating these local features with a non-linear diffusion filter of bilateral type. The validity of the proposed features and of the integration approach is demonstrated by experiments on real images and by comparison with state-of-the-art monocular depth estimation techniques.
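To make "objects interpreted as connected components of level sets" concrete, the sketch below labels the 4-connected components of an upper level set {u >= λ} or a lower level set {u <= λ} of a small integer image, and computes a hypothetical local cue: the fraction of a disk of radius r around a pixel that is covered by the level-set component containing that pixel. The cue, the 4-connectivity, and the names `level_set_components` and `disk_fraction` are illustrative assumptions, not the features defined in the paper; the sketch only shows how convexity and concavity can leave a quantitative trace at the pixel level without detecting edges or junctions.

```python
from collections import deque
from typing import List

Image = List[List[int]]


def level_set_components(u: Image, lam: int, upper: bool = True) -> List[List[int]]:
    """Label the 4-connected components of {u >= lam} (upper=True) or {u <= lam}.

    Pixels outside the level set get label -1; components are numbered 0, 1, 2, ...
    """
    h, w = len(u), len(u[0])

    def inside(y: int, x: int) -> bool:
        return u[y][x] >= lam if upper else u[y][x] <= lam

    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(w):
            if labels[i][j] != -1 or not inside(i, j):
                continue
            # Breadth-first flood fill of one connected component.
            labels[i][j] = next_label
            queue = deque([(i, j)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1 and inside(ny, nx):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


def disk_fraction(u: Image, i: int, j: int, r: int, upper: bool = True) -> float:
    """Hypothetical cue: share of the disk B_r((i, j)), clipped to the image, that lies
    in the level-set component (at level u[i][j]) containing pixel (i, j)."""
    labels = level_set_components(u, u[i][j], upper)
    own = labels[i][j]
    covered = total = 0
    for y in range(max(0, i - r), min(len(u), i + r + 1)):
        for x in range(max(0, j - r), min(len(u[0]), j + r + 1)):
            if (y - i) ** 2 + (x - j) ** 2 <= r * r:
                total += 1
                if labels[y][x] == own:
                    covered += 1
    return covered / total


# Toy image: an 11x11 bright square (value 1) on a dark background (value 0).
img = [[1 if 5 <= y <= 15 and 5 <= x <= 15 else 0 for x in range(21)] for y in range(21)]
corner = disk_fraction(img, 5, 5, 3, upper=True)    # pixel at the square's convex corner
outside = disk_fraction(img, 4, 4, 3, upper=False)  # background pixel just outside it
print(f"{corner:.3f} {outside:.3f}")
```

On this toy image, the square covers 11 of the 29 disk pixels at its corner (about 0.38, the convex side), while the background's lower level set covers 25 of 29 just outside the corner (about 0.86, the concave side); on a straight edge the square covers 18 of 29.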

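The global integration step described in the abstract can be sketched as an iterated joint (cross) bilateral filter applied to a map of local depth hints, with range weights taken from the image so that hints propagate within homogeneous regions and stop at edges. This is a stand-in under stated assumptions: the grayscale guide image, the Gaussian spatial and range weights, the window `radius`, `sigma_s`, `sigma_r`, and the fixed iteration count are illustrative choices, not the paper's parameters or its exact diffusion scheme.

```python
import math
from typing import List

Grid = List[List[float]]


def bilateral_diffusion(depth: Grid, guide: Grid, radius: int = 1, sigma_s: float = 1.0,
                        sigma_r: float = 0.1, iterations: int = 60) -> Grid:
    """Iterate a joint bilateral filter: each pass replaces every depth value by a weighted
    mean of its neighbours, with weights that decay with spatial distance and with the
    difference of the guide image, so averaging happens mostly within regions."""
    h, w = len(depth), len(depth[0])
    d = [row[:] for row in depth]
    for _ in range(iterations):
        new = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                num = den = 0.0
                for y in range(max(0, i - radius), min(h, i + radius + 1)):
                    for x in range(max(0, j - radius), min(w, j + radius + 1)):
                        w_s = math.exp(-((y - i) ** 2 + (x - j) ** 2) / (2 * sigma_s ** 2))
                        w_r = math.exp(-((guide[y][x] - guide[i][j]) ** 2) / (2 * sigma_r ** 2))
                        num += w_s * w_r * d[y][x]
                        den += w_s * w_r
                new[i][j] = num / den  # den > 0: the centre pixel always contributes weight 1
        d = new
    return d


# Guide: left half dark (0), right half bright (1). A single positive depth hint sits in
# the left region; diffusion spreads it over that region but not across the edge.
guide = [[0.0 if x < 3 else 1.0 for x in range(6)] for _ in range(6)]
hints = [[0.0] * 6 for _ in range(6)]
hints[0][0] = 1.0
out = bilateral_diffusion(hints, guide)
left = [out[y][x] for y in range(6) for x in range(3)]
right = [out[y][x] for y in range(6) for x in range(3, 6)]
print(min(left) > 0, max(right) < 1e-12)
```

In this toy run the hint becomes nearly uniform over the left region, while the right region stays essentially at zero because the range weight across the edge is about e^(-50).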
Acknowledgments

The authors would like to thank Mariella Dimiccoli, Guillem Palou, Philippe Salembier, and Michael Maire for their time and effort, and for kindly providing the state-of-the-art results shown in this work for comparison. We acknowledge partial support from the MICINN project (reference MTM2009-08171) and from GRC (reference 2009 SGR 773). VC also acknowledges partial support from the “ICREA Acadèmia” prize for excellence in research, funded by the Generalitat de Catalunya.

Author information

Corresponding author

Correspondence to Felipe Calderero.

A Proof of Lemma 1

Since

$$\begin{aligned} g^{\prime }(\beta ) = \frac{(a-b) e^{(a+b)\beta }-ae^{a\beta }+be^{b\beta }}{(e^{b\beta }-1)^2}, \end{aligned}$$

and the denominator is positive for \(\beta > 0\), it suffices to prove that, for every \(\beta > 0\),

$$\begin{aligned} g_1(\beta )= (a-b) e^{(a+b)\beta }-ae^{a\beta }+be^{b\beta }> 0. \end{aligned}$$

Since \(g_1(0)=0\), it suffices to prove that \(g_1\) is strictly increasing on \((0,\infty )\), i.e., that

$$\begin{aligned} g_1^{\prime }(\beta ) = (a^2-b^2) e^{(a+b)\beta } - a^2e^{a\beta }+b^2e^{b\beta } > 0. \end{aligned}$$

Since \(e^{(a+b)\beta } > 0\), dividing \(g_1^{\prime }(\beta )\) by it shows that this is equivalent to proving that

$$\begin{aligned} g_2(\beta ) = (a^2-b^2) - a^2e^{-b\beta }+b^2e^{-a\beta } > 0. \end{aligned}$$

Again, since \(g_2(0) = 0\), it suffices to prove that

$$\begin{aligned} g_2^{\prime }(\beta ) = ab (a e^{-b\beta }-be^{-a\beta }) > 0. \end{aligned}$$

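This holds for every \(\beta > 0\): since \(a > b > 0\), we have \(ab > 0\) and \(e^{-b\beta } > e^{-a\beta }\), hence \(a e^{-b\beta } > b e^{-a\beta }\). Lemma 1 is proved.\(\square \)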
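The chain of reductions in the proof can be spot-checked numerically. The sketch below assumes \(g(\beta ) = (e^{a\beta }-1)/(e^{b\beta }-1)\): the statement of Lemma 1 is not part of this excerpt, but differentiating this \(g\) yields exactly the expression for \(g^{\prime }(\beta )\) above. It compares that expression with a central finite difference of the assumed \(g\) and checks the positivity of \(g_1\), \(g_2\), and \(g_2^{\prime }\) on a grid of \(\beta \in (0, 3]\) for the illustrative choices \((a, b) = (2, 1)\) and \((3, 0.5)\). A sampled check illustrates the argument; it does not replace it.

```python
import math


def g(beta: float, a: float, b: float) -> float:
    # ASSUMED form of g: its derivative is exactly the g'(beta) used in the proof.
    return math.expm1(a * beta) / math.expm1(b * beta)


def g1(beta: float, a: float, b: float) -> float:
    return (a - b) * math.exp((a + b) * beta) - a * math.exp(a * beta) + b * math.exp(b * beta)


def g_prime(beta: float, a: float, b: float) -> float:
    # g'(beta) as written in the proof: g1(beta) / (e^{b beta} - 1)^2.
    return g1(beta, a, b) / math.expm1(b * beta) ** 2


def g2(beta: float, a: float, b: float) -> float:
    # g1'(beta) divided by e^{(a + b) beta}.
    return (a * a - b * b) - a * a * math.exp(-b * beta) + b * b * math.exp(-a * beta)


def g2_prime(beta: float, a: float, b: float) -> float:
    return a * b * (a * math.exp(-b * beta) - b * math.exp(-a * beta))


def check_lemma(a: float, b: float, betas, h: float = 1e-6):
    """Return (max relative error of g' against a central difference of g,
    whether g1, g2 and g2' are all positive at the sampled betas)."""
    max_rel_err, all_positive = 0.0, True
    for beta in betas:
        fd = (g(beta + h, a, b) - g(beta - h, a, b)) / (2 * h)
        exact = g_prime(beta, a, b)
        max_rel_err = max(max_rel_err, abs(fd - exact) / abs(exact))
        positive = g1(beta, a, b) > 0 and g2(beta, a, b) > 0 and g2_prime(beta, a, b) > 0
        all_positive = all_positive and positive
    return max_rel_err, all_positive


betas = [0.1 * k for k in range(1, 31)]  # beta in (0, 3]
print(check_lemma(2.0, 1.0, betas), check_lemma(3.0, 0.5, betas))
```

For \(a = 2b\) the assumed \(g\) simplifies to \(e^{b\beta } + 1\), so \(g^{\prime }(\beta ) = b e^{b\beta }\), which gives an independent check of the formula for \(g^{\prime }\).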

Cite this article

Calderero, F., Caselles, V. Recovering Relative Depth from Low-Level Features Without Explicit T-junction Detection and Interpretation. Int J Comput Vis 104, 38–68 (2013). https://doi.org/10.1007/s11263-013-0613-4
