International Journal of Computer Vision, Volume 91, Issue 3, pp 328–346

Recovering Occlusion Boundaries from an Image



Occlusion reasoning is a fundamental problem in computer vision. In this paper, we propose an algorithm to recover the occlusion boundaries and depth ordering of free-standing structures in the scene. Rather than viewing the problem as one of pure image processing, our approach employs cues from an estimated surface layout and applies Gestalt grouping principles using a conditional random field (CRF) model. We propose a hierarchical segmentation process, based on agglomerative merging, that re-estimates boundary strength as the segmentation progresses. Our experiments on the Geometric Context dataset validate our choices for features, our iterative refinement of classifiers, and our CRF model. In experiments on the Berkeley Segmentation Dataset, PASCAL VOC 2008, and LabelMe, we also show that the trained algorithm generalizes to other datasets and can be used as an object boundary predictor with figure/ground labels.
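The agglomerative merging idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the region representation (a mean value, a size, and a member set), the `boundary_strength` function (a plain difference of means standing in for a learned boundary classifier), and the stopping `threshold` are all hypothetical choices made for illustration. The surface layout cues and the CRF model are omitted entirely. The only point shown is that boundary strengths are recomputed from the current regions after every merge.

```python
def boundary_strength(a, b):
    """Hypothetical stand-in for a learned boundary classifier:
    the absolute difference between the two region means."""
    return abs(a["mean"] - b["mean"])


def merge(a, b):
    """Merge two regions, pooling sizes, size-weighted means, and members."""
    size = a["size"] + b["size"]
    mean = (a["mean"] * a["size"] + b["mean"] * b["size"]) / size
    return {"mean": mean, "size": size, "members": a["members"] | b["members"]}


def agglomerate(regions, adjacency, threshold):
    """Repeatedly merge the adjacent pair with the weakest boundary.

    regions:   dict mapping region id -> {"mean", "size", "members"}
    adjacency: set of frozenset({id_a, id_b}) pairs of neighbouring regions
    threshold: merging stops once the weakest boundary reaches this value
    """
    regions = dict(regions)
    adjacency = set(adjacency)
    while adjacency:
        # Re-estimate every boundary strength from the *current* regions.
        strengths = {
            pair: boundary_strength(*(regions[i] for i in pair))
            for pair in adjacency
        }
        # Pick the weakest boundary; ties are broken by region ids.
        pair, weakest = min(
            strengths.items(), key=lambda kv: (kv[1], sorted(kv[0]))
        )
        if weakest >= threshold:
            break
        keep, drop = sorted(pair)
        regions[keep] = merge(regions[keep], regions.pop(drop))
        # Rewire neighbours of the dropped region to the merged region.
        rewired = set()
        for p in adjacency:
            q = frozenset(keep if i == drop else i for i in p)
            if len(q) == 2:  # discard the boundary that was just merged away
                rewired.add(q)
        adjacency = rewired
    return regions


# Toy example: four regions in a chain 0-1-2-3 with two similar pairs.
initial = {
    0: {"mean": 10.0, "size": 1, "members": {0}},
    1: {"mean": 12.0, "size": 1, "members": {1}},
    2: {"mean": 50.0, "size": 1, "members": {2}},
    3: {"mean": 52.0, "size": 1, "members": {3}},
}
chain = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
result = agglomerate(initial, chain, threshold=10.0)
```

In this toy run the two similar pairs are merged and the strong boundary between them survives, leaving two regions. Recomputing strengths inside the loop is the design choice that matters: a boundary's strength depends on the regions on either side of it, so it changes as those regions grow.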


Keywords: Image segmentation, Occlusion boundaries, Figure/ground labeling, Image interpretation, Scene understanding, 3D reconstruction, Depth from image, Edge detection



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
  2. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
