Optimal Image and Video Closure by Superpixel Grouping

International Journal of Computer Vision

Abstract

Detecting independent objects in images and videos is an important perceptual grouping problem. One common perceptual grouping cue that can facilitate this objective is contour closure, reflecting the spatial coherence of objects in the world and their projections as closed boundaries separating figure from background. Detecting contour closure in images consists of finding a cycle of disconnected contour fragments that separates an object from its background. Searching the entire space of possible groupings is intractable, and previous approaches have adopted powerful perceptual grouping heuristics, such as proximity and co-curvilinearity, to constrain the search. We introduce a new formulation of the problem by transforming the problem of finding cycles of contour fragments into one of finding subsets of superpixels whose collective boundary has strong edge support (few gaps) in the image. Our cost function, a ratio of a boundary gap measure to area, promotes spatially coherent sets of superpixels. Moreover, its properties support a global optimization procedure based on parametric maxflow. Extending closure detection to videos, we introduce the concept of spatiotemporal closure. Analogous to image closure, we formulate our spatiotemporal closure cost over a graph of spatiotemporal superpixels. Our cost function is a ratio of motion and appearance discontinuity measures on the boundary of the selection to an internal homogeneity measure of the selected spatiotemporal volume. The resulting approach automatically recovers coherent components in images and videos, corresponding to objects, object parts, and objects with surrounding context, providing a good set of multiscale hypotheses for high-level scene analysis. We evaluate both our image and video closure frameworks by comparing them to other closure detection approaches, and find that they yield improved performance.
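The image closure cost described above is a ratio of two set functions over superpixels: a boundary gap measure over area. As a hedged illustration of how such a ratio can be minimized globally, the sketch below uses Dinkelbach's iteration (Dinkelbach 1967, in the reference list): repeatedly minimize gap(X) − λ·area(X), set λ to the ratio of the minimizer, and stop when the minimum reaches zero. The paper's optimization is based on parametric maxflow; here, purely as a stand-in, the inner step enumerates all proper non-empty subsets of a toy graph, so it only scales to a handful of superpixels. The gap weights, areas, and function names (`boundary_gap`, `dinkelbach_min_ratio`) are hypothetical and do not reproduce the paper's actual gap measure.

```python
from itertools import combinations


def boundary_gap(selection, edges):
    """Hypothetical gap measure: sum the gap weights of superpixel-adjacency
    edges with exactly one endpoint inside the selection."""
    return sum(w for i, j, w in edges if (i in selection) != (j in selection))


def area(selection, areas):
    """Total area of the selected superpixels."""
    return sum(areas[i] for i in selection)


def candidate_sets(n):
    """All proper, non-empty subsets of n superpixels (brute force, toy sizes only)."""
    for k in range(1, n):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def dinkelbach_min_ratio(areas, edges, tol=1e-12, max_iter=100):
    """Minimise boundary_gap(X) / area(X) with Dinkelbach's iteration.

    The inner problem min_X gap(X) - lam * area(X) is solved here by
    enumeration; this is an assumption for illustration, not the paper's
    parametric-maxflow solver.
    """
    candidates = list(candidate_sets(len(areas)))
    best = candidates[0]
    lam = boundary_gap(best, edges) / area(best, areas)
    for _ in range(max_iter):
        x = min(candidates,
                key=lambda s: boundary_gap(s, edges) - lam * area(s, areas))
        value = boundary_gap(x, edges) - lam * area(x, areas)
        if value >= -tol:  # no set beats the current ratio: lam is optimal
            return best, lam
        best = x
        lam = boundary_gap(best, edges) / area(best, areas)
    return best, lam


if __name__ == "__main__":
    # Toy 4-superpixel cycle 0-1-2-3-0; low weights mean strong edge support.
    areas = [4.0, 4.0, 1.0, 1.0]
    edges = [(0, 1, 2.0), (1, 2, 0.5), (2, 3, 2.0), (3, 0, 0.5)]
    best, ratio = dinkelbach_min_ratio(areas, edges)
    print(sorted(best), ratio)  # [0, 1] 0.125
```

In this toy graph the two large superpixels {0, 1} are separated from the rest only by low-gap edges, so they form the minimum-ratio selection with cost 1.0 / 8.0 = 0.125. Dividing by area is what keeps the minimizer from collapsing to a tiny region: a small set with the same boundary gap scores worse.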

Notes

  1. The contour image takes the form of a globalPb image (Maire et al. 2008).

  2. While spatial coherence is promoted, it is not guaranteed. Since minimizing Eq. (3) can occasionally result in disconnected sets of superpixels, we further guarantee connectedness by selecting the largest-area connected component of X.

  3. See the Approach Overview section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a graphical overview of the method.

  4. See the Superpixel Extraction section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a better visualization of superpixel extraction and comparison with the original Turbopixels approach.

  5. Matlab code for our method is available at http://www.cs.toronto.edu/~babalex/closure_code.tgz.

  6. SC can be tuned (see Fig. 9) to perform better for K=1 at a small expense of performance for higher values of K.

  7. For WSD, there are three ground truth segmentations per image. If we instead choose the closest of the three ground truth segmentations per image (as opposed to taking the average), our score on WSD improves to 88.76%.

  8. Supplementary material (http://www.cs.toronto.edu/~babalex/closure_supplementary.tgz) contains the results of our algorithm for all the images in both datasets.

  9. See the Results at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a video visualization of the results.
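The post-processing step in Note 2, keeping only the largest-area connected component of the selected superpixel set X, can be sketched as a breadth-first search over the superpixel adjacency graph. This is a minimal Python sketch, not the authors' Matlab code; the function name `largest_area_component` and the dict-of-neighbour-lists adjacency representation are assumptions for illustration.

```python
from collections import deque


def largest_area_component(selection, adjacency, areas):
    """Return the connected component of `selection` with the largest total area.

    selection: iterable of selected superpixel ids (the set X).
    adjacency: hypothetical dict mapping each superpixel id to its neighbours
               (assumed symmetric).
    areas:     indexable map from superpixel id to its area.
    Ties are broken in favour of whichever component is found first.
    """
    remaining = set(selection)
    best, best_area = set(), float("-inf")
    while remaining:
        # Grow one component by BFS, only through selected superpixels.
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            for nbr in adjacency.get(node, ()):
                if nbr in remaining:
                    remaining.remove(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        comp_area = sum(areas[i] for i in component)
        if comp_area > best_area:
            best, best_area = component, comp_area
    return best


if __name__ == "__main__":
    adjacency = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
    areas = [1, 2, 5, 1, 1]
    # Superpixel 2 is not selected, so X = {0, 1, 3, 4} splits into two parts.
    print(largest_area_component({0, 1, 3, 4}, adjacency, areas))  # {0, 1}
```

Restricting the search to selected superpixels means an unselected superpixel (here, 2) correctly breaks connectivity even when it touches both parts of X.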

References

  • Alpert, S., Galun, M., Basri, R., & Brandt, A. (2007). Image segmentation by probabilistic bottom-up aggregation and cue integration. In IEEE international conference on computer vision and pattern recognition.

  • Bascle, B., & Deriche, R. (1995). Region tracking through image sequences. In ICCV (p. 302).

  • Black, M., & Jepson, A. (1998). Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, 26(1), 63–84.

  • Borenstein, E., & Ullman, S. (2002). Class-specific, top-down segmentation. In European conference on computer vision (pp. 109–124).

  • Boykov, Y. Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images.

  • Brady, M., & Asada, H. (1984). Smoothed local symmetries and their implementation. The International Journal of Robotics Research, 3(3), 36–61.

  • Carreira, J., & Sminchisescu, C. (2010). Constrained parametric min-cuts for automatic object segmentation. In IEEE international conference on computer vision and pattern recognition.

  • Cham, T. J., & Cipolla, R. (1996). Geometric saliency of curve correspondences and grouping of symmetric contours. In European conference on computer vision (pp. 385–398).

  • Chung, D., MacLean, W., & Dickinson, S. (2006). Integrating region and boundary information for spatially coherent object tracking. Image and Vision Computing, 24(7), 680–692.

  • Comaniciu, D., Ramesh, V., & Meer, P. (2003). Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5), 564–577.

  • Cremers, D., & Soatto, S. (2005). Motion competition: A variational approach to piecewise parametric motion segmentation. International Journal of Computer Vision, 62(3).

  • DeMenthon, D. (2002). Spatio-temporal segmentation of video by hierarchical mean shift analysis. In SMVP.

  • Deng, Y., & Manjunath, B. (2001). Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 800–810.

  • Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13, 492–498.

  • Elder, J., & Zucker, S. (1994). A measure of closure. Vision Research, 34, 3361–3369.

  • Elder, J. H., & Zucker, S. W. (1996). Computing contour closure. In European conference on computer vision (pp. 399–412).

  • Endres, I., & Hoiem, D. (2010). Category independent object proposals. In ECCV (pp. 575–588).

  • Estrada, F. J., & Jepson, A. D. (2004). Perceptual grouping for contour extraction. In IEEE international conference on pattern recognition (pp. 32–35).

  • Estrada, F. J., & Jepson, A. D. (2006). Robust boundary detection with adaptive grouping. In Computer vision and pattern recognition workshop (p. 184).

  • Fowlkes, C., Belongie, S., & Malik, J. (2001). Efficient spatiotemporal grouping using the Nyström method. In CVPR (pp. 231–238).

  • Gelgon, M., & Bouthemy, P. (2000). A region-level motion-based graph representation and labeling for tracking a spatial image partition. Pattern Recognition, 33(4), 725–740.

  • Greenspan, H., Goldberger, J., & Mayer, A. (2004). Probabilistic space-time video modeling via piecewise GMM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 384–396.

  • Huang, Y., Liu, Q., & Metaxas, D. (2009). Video object segmentation by hypergraph cut. In IEEE international conference on computer vision and pattern recognition (pp. 1738–1745).

  • Isard, M., & Blake, A. (1998). Condensation—conditional density propagation for visual tracking. International Journal of Computer Vision, 29, 5–28.

  • Jacobs, D. (1996). Robust and efficient detection of salient convex groups. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1), 23–37.

  • Jepson, A. D., Fleet, D. J., & Black, M. J. (2002). A layered motion representation with occlusion and compact spatial support. In ECCV (pp. 692–706).

  • Jermyn, I., & Ishikawa, H. (2001). Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1075–1088.

  • Jojic, N., & Frey, B. J. (2001). Learning flexible sprites in video layers. In CVPR (Vol. 1, p. 199).

  • Kolmogorov, V., Boykov, Y., & Rother, C. (2007). Applications of parametric maxflow in computer vision. In IEEE international conference on computer vision (pp. 1–8).

  • Lempitsky, V., Kohli, P., Rother, C., & Sharp, T. (2009). Image segmentation with a bounding box prior.

  • Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2009a). Multiscale symmetric part detection and grouping (pp. 2162–2169).

  • Levinshtein, A., Stere, A., Kutulakos, K. N., Fleet, D. J., Dickinson, S. J., & Siddiqi, K. (2009b). Turbopixels: Fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2290–2297.

  • Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2010a). Optimal contour closure by superpixel grouping. In ECCV (pp. 480–493).

  • Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2010b). Spatiotemporal closure. In ACCV (pp. 369–382).

  • Li, F., Carreira, J., & Sminchisescu, C. (2010). Object recognition as ranking holistic figure-ground hypotheses. In CVPR.

  • Lowe, D. G. (1985). Perceptual organization and visual recognition. Norwell: Kluwer Academic.

  • Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of imaging understanding workshop (pp. 674–679).

  • Maire, M., Arbelaez, P., Fowlkes, C., & Malik, J. (2008). Using contours to detect and localize junctions in natural images. In IEEE international conference on computer vision and pattern recognition.

  • Martin, D. R., Fowlkes, C. C., & Malik, J. (2004). Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 530–549.

  • Megret, R., & DeMenthon, D. (2002). A survey of spatio-temporal grouping techniques (Tech. rep.). University of Maryland, College Park.

  • Mori, G., Ren, X., Efros, A. A., & Malik, J. (2004). Recovering human body configurations: Combining segmentation and recognition. In IEEE international conference on computer vision and pattern recognition (pp. 326–333).

  • Moscheni, F., Bhattacharjee, S., & Kunt, M. (1998). Spatiotemporal segmentation based on region merging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(9), 897–915.

  • Paragios, N., & Deriche, R. (2000). Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3), 266–280.

  • Patras, I., Lagendijk, R. L., & Hendriks, E. A. (2001). Video segmentation by MAP labeling of watershed segments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 326–332.

  • Pritch, Y., Rav-Acha, A., & Peleg, S. (2008). Nonchronological video synopsis and indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1971–1984.

  • Ren, X., Fowlkes, C. C., & Malik, J. (2005a). Cue integration in figure/ground labeling. In Advances in neural information processing systems.

  • Ren, X., Fowlkes, C. C., & Malik, J. (2005b). Scale-invariant contour completion using conditional random fields. In IEEE international conference on computer vision (Vol. 2, pp. 1214–1221).

  • Rother, C., Kolmogorov, V., & Blake, A. (2004). “GrabCut”: Interactive foreground extraction using iterated graph cuts. SIGGRAPH, 23(3), 309–314.

  • Saint-Marc, P., Rom, H., & Medioni, G. (1993). B-spline contour representation and symmetry detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11), 1191–1197.

  • Shi, J., & Malik, J. (1998). Motion segmentation and tracking using normalized cuts. In ICCV (p. 1154).

  • Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

  • Stahl, J., & Wang, S. (2007). Edge grouping combining boundary and region information. IEEE Transactions on Image Processing, 16(10), 2590–2606.

  • Stahl, J., & Wang, S. (2008). Globally optimal grouping for symmetric closed boundaries by combining boundary and region information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(3), 395–411.

  • Stein, A., Hoiem, D., & Hebert, M. (2007). Learning to find object boundaries using motion cues. In IEEE international conference on computer vision (pp. 1–8).

  • Wang, D. (1998). Unsupervised video segmentation based on watersheds and temporal tracking. IEEE Transactions on Circuits and Systems for Video Technology, 8(5), 539–546.

  • Wang, J., & Adelson, E. (1994). Representing moving images with layers. IEEE Transactions on Image Processing, 3(5), 625–638.

  • Wang, S., Kubota, T., Siskind, J., & Wang, J. (2005). Salient closed boundary extraction with ratio contour. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4), 546–561.

  • Weiss, Y. (1997). Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR (p. 520).

  • Weiss, Y., & Adelson, E. H. (1996). A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In CVPR (p. 321).

  • Welch, G., & Bishop, G. (1995). An introduction to the Kalman filter (Tech. rep.).

  • Wertheimer, M. (1938). Laws of organization in perceptual forms. In W. Ellis (ed.), Source book of gestalt psychology. New York: Harcourt, Brace.

  • Williams, L. R., & Hanson, A. R. (1996). Perceptual completion of occluded surfaces. Computer Vision and Image Understanding, 64(1), 1–20.

  • Williams, L. R., & Jacobs, D. W. (1995). Stochastic completion fields: a neural model of illusory contour shape and salience. In IEEE international conference on computer vision (p. 408).

  • Ylä-Jääski, A., & Ade, F. (1996). Grouping symmetrical structures for object segmentation and description. Computer Vision and Image Understanding, 63(3), 399–417.

  • Zhu, Q., Song, G., & Shi, J. (2007). Untangling cycles for contour grouping. In IEEE international conference on computer vision.

Acknowledgements

We thank Allan Jepson for discussions about closure cost functions and optimization procedures, and Yuri Boykov and Vladimir Kolmogorov for providing their parametric maxflow implementation. This work was supported in part by the European Commission under a Marie Curie Excellence Grant MCEXT-025481 (Cristian Sminchisescu), CNCSIS-UEFISCU under project number PN II-RU-RC-2/2009 (Cristian Sminchisescu), NSERC (Alex Levinshtein, Sven Dickinson), MITACS (Alex Levinshtein), and DARPA (Sven Dickinson).

Author information

Correspondence to Alex Levinshtein.

About this article

Cite this article

Levinshtein, A., Sminchisescu, C. & Dickinson, S. Optimal Image and Video Closure by Superpixel Grouping. Int J Comput Vis 100, 99–119 (2012). https://doi.org/10.1007/s11263-012-0527-6

  • DOI: https://doi.org/10.1007/s11263-012-0527-6
