Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning

Stein, Andrew N.; Hebert, Martial

doi:10.1007/s11263-008-0203-z

Published: 03 February 2009

International Journal of Computer Vision, volume 82, pages 325–357 (2009)

Andrew N. Stein¹ &
Martial Hebert¹

Abstract

The boundaries of objects in an image are often considered a nuisance to be “handled” due to the occlusion they exhibit. Since most, if not all, computer vision techniques aggregate information spatially within a scene, information spanning these boundaries, and therefore from different physical surfaces, is invariably and erroneously considered together. In addition, these boundaries convey important perceptual information about 3D scene structure and shape. Consequently, their identification can benefit many different computer vision pursuits, from low-level processing techniques to high-level reasoning tasks.

While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In this paper, we focus on the exploitation of subtle relative-motion cues present at occlusion boundaries. When combined with more standard appearance information, we demonstrate these cues’ utility in detecting occlusion boundaries locally. We also present a novel, mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries.

Author information

Authors and Affiliations

The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
Andrew N. Stein & Martial Hebert

Corresponding author

Correspondence to Andrew N. Stein.

About this article

Cite this article

Stein, A.N., Hebert, M. Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning. Int J Comput Vis 82, 325–357 (2009). https://doi.org/10.1007/s11263-008-0203-z

Received: 09 June 2008
Accepted: 23 December 2008
Published: 03 February 2009
Issue Date: May 2009
DOI: https://doi.org/10.1007/s11263-008-0203-z
