Optimal Image and Video Closure by Superpixel Grouping

Levinshtein, Alex; Sminchisescu, Cristian; Dickinson, Sven

doi:10.1007/s11263-012-0527-6

Published: 04 May 2012

Volume 100, pages 99–119, (2012)

International Journal of Computer Vision

Alex Levinshtein¹,
Cristian Sminchisescu² &
Sven Dickinson¹

Abstract

Detecting independent objects in images and videos is an important perceptual grouping problem. One common perceptual grouping cue that can facilitate this objective is the cue of contour closure, reflecting the spatial coherence of objects in the world and their projections as closed boundaries separating figure from background. Detecting contour closure in images consists of finding a cycle of disconnected contour fragments that separates an object from its background. Searching the entire space of possible groupings is intractable, and previous approaches have adopted powerful perceptual grouping heuristics, such as proximity and co-curvilinearity, to constrain the search. We introduce a new formulation of the problem, by transforming the problem of finding cycles of contour fragments to finding subsets of superpixels whose collective boundary has strong edge support (few gaps) in the image. Our cost function, a ratio of a boundary gap measure to area, promotes spatially coherent sets of superpixels. Moreover, its properties support a global optimization procedure based on parametric maxflow. Extending closure detection to videos, we introduce the concept of spatiotemporal closure. Analogous to image closure, we formulate our spatiotemporal closure cost over a graph of spatiotemporal superpixels. Our cost function is a ratio of motion and appearance discontinuity measures on the boundary of the selection to an internal homogeneity measure of the selected spatiotemporal volume. The resulting approach automatically recovers coherent components in images and videos, corresponding to objects, object parts, and objects with surrounding context, providing a good set of multiscale hypotheses for high-level scene analysis. We evaluate both our image and video closure frameworks by comparing them to other closure detection approaches, and find that they yield improved performance.
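
As an illustration of the ratio cost described in the abstract, the following minimal Python sketch minimizes a gap-to-area ratio over subsets of superpixels using Dinkelbach's iteration (Dinkelbach 1967). It is not the authors' implementation: the inner minimization is done by brute-force enumeration in place of parametric maxflow, so it only works on toy inputs. The decomposition of the gap into per-superpixel terms (`unary_gap`) and shared-boundary terms (`shared_gap`), as well as all function names and numbers, are assumptions made for the example.

```python
from itertools import combinations

# Toy data (hypothetical): four equal-area superpixels.
# Superpixels 0 and 1 together form a region whose outer boundary
# is well supported by image edges; 2 and 3 are background.
AREAS = [10.0, 10.0, 10.0, 10.0]
UNARY_GAP = [5.0, 5.0, 8.0, 8.0]           # unsupported boundary length per superpixel
SHARED_GAP = {(0, 1): 4.0, (1, 2): 1.0, (2, 3): 4.0}  # gap on shared boundaries (i < j)


def gap(subset, unary_gap, shared_gap):
    """Assumed gap model: sum of unary gaps minus twice the gap on
    boundaries that become interior when both neighbours are selected."""
    total = sum(unary_gap[i] for i in subset)
    for i, j in combinations(sorted(subset), 2):
        total -= 2.0 * shared_gap.get((i, j), 0.0)
    return total


def area(subset, areas):
    return sum(areas[i] for i in subset)


def nonempty_subsets(n):
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def dinkelbach_min_ratio(areas, unary_gap, shared_gap, tol=1e-9, max_iter=100):
    """Minimize gap(X) / area(X) over nonempty subsets X.

    Each step minimizes gap(X) - lam * area(X); here by enumeration,
    which stands in for the parametric maxflow solver named in the abstract.
    """
    subsets = list(nonempty_subsets(len(areas)))
    current = frozenset(range(len(areas)))      # start from the full selection
    lam = gap(current, unary_gap, shared_gap) / area(current, areas)
    for _ in range(max_iter):
        def parametric(s):
            return gap(s, unary_gap, shared_gap) - lam * area(s, areas)
        best = min(subsets, key=parametric)
        if parametric(best) >= -tol:            # no subset beats the current ratio
            break
        current = best
        lam = gap(current, unary_gap, shared_gap) / area(current, areas)
    return current, lam


if __name__ == "__main__":
    selection, ratio = dinkelbach_min_ratio(AREAS, UNARY_GAP, SHARED_GAP)
    print(sorted(selection), ratio)
```

On this toy input the iteration settles on the pair of superpixels whose shared boundary carries most of their gap, because merging them removes that gap from the numerator while doubling the area in the denominator.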

Notes

The contour image takes the form of a globalPb image (Maire et al. 2008).
While spatial coherence is promoted, it is not guaranteed. Since minimizing Eq. (3) can occasionally result in disconnected sets of superpixels, we further guarantee connectedness by selecting the largest-area connected component of X.
See the Approach Overview section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a graphical overview of the method.
See the Superpixel Extraction section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a better visualization of superpixel extraction and comparison with the original Turbopixels approach.
Matlab code for our method is available at http://www.cs.toronto.edu/~babalex/closure_code.tgz.
SC can be tuned (see Fig. 9) to perform better for K=1, at a small cost in performance for higher values of K.
For WSD, there are three ground truth segmentations per image. If we instead choose the closest of the three ground truth segmentations per image (as opposed to taking the average), our score on WSD improves to 88.76 %.
Supplementary material (http://www.cs.toronto.edu/~babalex/closure_supplementary.tgz) contains the results of our algorithm for all the images in both datasets.
See the Results at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a video visualization of the results.
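
The connectedness step mentioned in the notes above (keeping the largest-area connected component of the selected superpixels) can be sketched as follows. This is a minimal illustration, not the authors' Matlab code; the adjacency-dictionary representation and the names `largest_area_component`, `adjacency`, and `areas` are assumptions made for the example.

```python
from collections import deque


def largest_area_component(selected, adjacency, areas):
    """Return the connected component of `selected` with the largest total area.

    selected  : iterable of selected superpixel ids
    adjacency : dict mapping a superpixel id to its neighbouring ids
    areas     : dict mapping a superpixel id to its area
    """
    selected = set(selected)
    seen = set()
    best, best_area = set(), 0.0
    for start in sorted(selected):
        if start in seen:
            continue
        # Breadth-first search restricted to selected superpixels.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency.get(u, ()):
                if v in selected and v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        component_area = sum(areas[i] for i in component)
        if component_area > best_area:
            best, best_area = component, component_area
    return best


if __name__ == "__main__":
    # Hypothetical chain of five superpixels: 0 - 1 - 2 - 3 - 4.
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    areas = {0: 5.0, 1: 5.0, 2: 7.0, 3: 20.0, 4: 1.0}
    print(largest_area_component({0, 1, 3}, adjacency, areas))
```

Note that the component is chosen by total area rather than by the number of superpixels, so a single large superpixel can win over a group of several small ones.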

References

Alpert, S., Galun, M., Basri, R., & Brandt, A. (2007). Image segmentation by probabilistic bottom-up aggregation and cue integration. In IEEE international conference on computer vision and pattern recognition.
Bascle, B., & Deriche, R. (1995). Region tracking through image sequences. In ICCV (p. 302).
Black, M., & Jepson, A. (1998). Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, 26(1), 63–84.
Borenstein, E., & Ullman, S. (2002). Class-specific, top-down segmentation. In European conference on computer vision (pp. 109–124).
Boykov, Y. Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images.
Brady, M., & Asada, H. (1984). Smoothed local symmetries and their implementation. The International Journal of Robotics Research, 3(3), 36–61.
Carreira, J., & Sminchisescu, C. (2010). Constrained parametric min-cuts for automatic object segmentation. In IEEE international conference on computer vision and pattern recognition.
Cham, T. J., & Cipolla, R. (1996). Geometric saliency of curve correspondences and grouping of symmetric contours. In European conference on computer vision (pp. 385–398).
Chung, D., MacLean, W., & Dickinson, S. (2006). Integrating region and boundary information for spatially coherent object tracking. Image and Vision Computing, 24(7), 680–692.
Comaniciu, D., Ramesh, V., & Meer, P. (2003). Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5), 564–577.
Cremers, D., & Soatto, S. (2005). Motion competition: A variational approach to piecewise parametric motion segmentation. International Journal of Computer Vision, 62, 3.
DeMenthon, D. (2002). Spatio-temporal segmentation of video by hierarchical mean shift analysis. In SMVP.
Deng, Y., & Manjunath, B. (2001). Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 800–810.
Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13, 492–498.
Elder, J., & Zucker, S. (1994). A measure of closure. Vision Research, 34, 3361–3369.
Elder, J. H., & Zucker, S. W. (1996). Computing contour closure. In European conference on computer vision (pp. 399–412).
Endres, I., & Hoiem, D. (2010). Category independent object proposals. In ECCV (pp. 575–588).
Estrada, F. J., & Jepson, A. D. (2004). Perceptual grouping for contour extraction. In IEEE international conference on pattern recognition (pp. 32–35).
Estrada, F. J., & Jepson, A. D. (2006). Robust boundary detection with adaptive grouping. In Computer vision and pattern recognition workshop (p. 184).
Fowlkes, C., Belongie, S., & Malik, J. (2001). Efficient spatiotemporal grouping using the Nyström method. In CVPR (pp. 231–238).
Gelgon, M., & Bouthemy, P. (2000). A region-level motion-based graph representation and labeling for tracking a spatial image partition. Pattern Recognition, 33(4), 725–740.
Greenspan, H., Goldberger, J., & Mayer, A. (2004). Probabilistic space-time video modeling via piecewise gmm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 384–396.
Huang, Y., Liu, Q., & Metaxas, D. (2009). Video object segmentation by hypergraph cut. In IEEE international conference on computer vision and pattern recognition (pp. 1738–1745).
Isard, M., & Blake, A. (1998). Condensation—conditional density propagation for visual tracking. International Journal of Computer Vision, 29, 5–28.
Jacobs, D. (1996). Robust and efficient detection of salient convex groups. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1), 23–37.
Jepson, A. D., Fleet, D. J., & Black, M. J. (2002). A layered motion representation with occlusion and compact spatial support. In ECCV (pp. 692–706).
Jermyn, I., & Ishikawa, H. (2001). Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1075–1088.
Jojic, N., & Frey, B. J. (2001). Learning flexible sprites in video layers. In CVPR (Vol. 1, p. 199).
Kolmogorov, V., Boykov, Y., & Rother, C. (2007). Applications of parametric maxflow in computer vision. In IEEE international conference on computer vision (pp. 1–8).
Lempitsky, V., Kohli, P., Rother, C., & Sharp, T. (2009). Image segmentation with a bounding box prior.
Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2009a). Multiscale symmetric part detection and grouping (pp. 2162–2169).
Levinshtein, A., Stere, A., Kutulakos, K. N., Fleet, D. J., Dickinson, S. J., & Siddiqi, K. (2009b). Turbopixels: Fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2290–2297.
Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2010a). Optimal contour closure by superpixel grouping. In ECCV (pp. 480–493).
Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2010b). Spatiotemporal closure. In ACCV (pp. 369–382).
Li, F., Carreira, J., & Sminchisescu, C. (2010). Object recognition as ranking holistic figure-ground hypotheses. In CVPR.
Lowe, D. G. (1985). Perceptual organization and visual recognition. Norwell: Kluwer Academic.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of imaging understanding workshop (pp. 674–679).
Maire, M., Arbelaez, P., Fowlkes, C., & Malik, J. (2008). Using contours to detect and localize junctions in natural images. In IEEE international conference on computer vision and pattern recognition.
Martin, D. R., Fowlkes, C. C., & Malik, J. (2004). Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 530–549.
Megret, R., & DeMenthon, D. (2002). A survey of spatio-temporal grouping techniques (Tech. rep.). University of Maryland, College Park.
Mori, G., Ren, X., Efros, A. A., & Malik, J. (2004). Recovering human body configurations: Combining segmentation and recognition. In IEEE international conference on computer vision and pattern recognition (pp. 326–333).
Moscheni, F., Bhattacharjee, S., & Kunt, M. (1998). Spatiotemporal segmentation based on region merging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(9), 897–915.
Paragios, N., & Deriche, R. (2000). Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3), 266–280.
Patras, I., Lagendijk, R. L., & Hendriks, E. A. (2001). Video segmentation by map labeling of watershed segments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 326–332.
Pritch, Y., Rav-Acha, A., & Peleg, S. (2008). Nonchronological video synopsis and indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1971–1984.
Ren, X., Fowlkes, C. C., & Malik, J. (2005a). Cue integration in figure/ground labeling. In Advances in neural information processing systems.
Ren, X., Fowlkes, C. C., & Malik, J. (2005b). Scale-invariant contour completion using conditional random fields. In IEEE international conference on computer vision (Vol. 2, pp. 1214–1221).
Rother, C., Kolmogorov, V., & Blake, A. (2004). “grabcut”: interactive foreground extraction using iterated graph cuts. SIGGRAPH, 23(3), 309–314.
Saint-Marc, P., Rom, H., & Medioni, G. (1993). B-spline contour representation and symmetry detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11), 1191–1197.
Shi, J., & Malik, J. (1998). Motion segmentation and tracking using normalized cuts. In ICCV (p. 1154).
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Stahl, J., & Wang, S. (2007). Edge grouping combining boundary and region information. IEEE Transactions on Image Processing, 16(10), 2590–2606.
Stahl, J., & Wang, S. (2008). Globally optimal grouping for symmetric closed boundaries by combining boundary and region information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(3), 395–411.
Stein, A., Hoiem, D., & Hebert, M. (2007). Learning to find object boundaries using motion cues. In IEEE international conference on computer vision (pp. 1–8).
Wang, D. (1998). Unsupervised video segmentation based on watersheds and temporal tracking. IEEE Transactions on Circuits and Systems for Video Technology, 8(5), 539–546.
Wang, J., & Adelson, E. (1994). Representing moving images with layers. IEEE Transactions on Image Processing, 3(5), 625–638.
Wang, S., Kubota, T., Siskind, J., & Wang, J. (2005). Salient closed boundary extraction with ratio contour. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4), 546–561.
Weiss, Y. (1997). Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR (p. 520).
Weiss, Y., & Adelson, E. H. (1996). A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In CVPR (p. 321).
Welch, G., & Bishop, G. (1995). An introduction to the Kalman filter (Tech. rep.).
Wertheimer, M. (1938). Laws of organization in perceptual forms. In W. Ellis (ed.), Source book of gestalt psychology. New York: Harcourt, Brace.
Williams, L. R., & Hanson, A. R. (1996). Perceptual completion of occluded surfaces. Computer Vision and Image Understanding, 64(1), 1–20.
Williams, L. R., & Jacobs, D. W. (1995). Stochastic completion fields: a neural model of illusory contour shape and salience. In IEEE international conference on computer vision (p. 408).
Ylä-Jääski, A., & Ade, F. (1996). Grouping symmetrical structures for object segmentation and description. Computer Vision and Image Understanding, 63(3), 399–417.
Zhu, Q., Song, G., & Shi, J. (2007). Untangling cycles for contour grouping. In IEEE international conference on computer vision.

Acknowledgements

We thank Allan Jepson for discussion about closure cost functions and optimization procedures, and Yuri Boykov and Vladimir Kolmogorov for providing their parametric maxflow implementation. This work was supported in part by the European Commission under a Marie Curie Excellence Grant MCEXT-025481 (Cristian Sminchisescu), CNCSIS-UEFISCU under project number PN II-RU-RC-2/2009 (Cristian Sminchisescu), NSERC (Alex Levinshtein, Sven Dickinson), MITACS (Alex Levinshtein), and DARPA (Sven Dickinson).

Author information

Authors and Affiliations

University of Toronto, 6 King’s College Rd., Pratt Building, Toronto, ON, Canada, M5S 3G4
Alex Levinshtein & Sven Dickinson
Institut für Numerische Simulation, University of Bonn, Wegelerstr. 4 (Flachbau), 53115, Bonn, Germany
Cristian Sminchisescu

Corresponding author

Correspondence to Alex Levinshtein.

About this article

Cite this article

Levinshtein, A., Sminchisescu, C. & Dickinson, S. Optimal Image and Video Closure by Superpixel Grouping. Int J Comput Vis 100, 99–119 (2012). https://doi.org/10.1007/s11263-012-0527-6

Received: 01 May 2011
Accepted: 15 April 2012
Published: 04 May 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11263-012-0527-6
