Abstract
Detecting independent objects in images and videos is an important perceptual grouping problem. One common perceptual grouping cue that can facilitate this objective is contour closure, which reflects the spatial coherence of objects in the world and their projections as closed boundaries separating figure from background. Detecting contour closure in images consists of finding a cycle of disconnected contour fragments that separates an object from its background. Searching the entire space of possible groupings is intractable, and previous approaches have adopted powerful perceptual grouping heuristics, such as proximity and co-curvilinearity, to constrain the search. We introduce a new formulation that transforms the problem of finding cycles of contour fragments into one of finding subsets of superpixels whose collective boundary has strong edge support (few gaps) in the image. Our cost function, a ratio of a boundary gap measure to area, promotes spatially coherent sets of superpixels. Moreover, its properties support a global optimization procedure based on parametric maxflow. Extending closure detection to videos, we introduce the concept of spatiotemporal closure. Analogous to image closure, we formulate our spatiotemporal closure cost over a graph of spatiotemporal superpixels. This cost is a ratio of motion and appearance discontinuity measures on the boundary of the selection to an internal homogeneity measure of the selected spatiotemporal volume. The resulting approach automatically recovers coherent components in images and videos, corresponding to objects, object parts, and objects with surrounding context, providing a good set of multiscale hypotheses for high-level scene analysis. We evaluate both our image and video closure frameworks against other closure detection approaches and find that they yield improved performance.
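The ratio cost described above can be illustrated with a small sketch. One standard way to minimize a ratio N(X)/D(X) with a positive denominator is Dinkelbach's iteration (Dinkelbach 1967): repeatedly solve min over X of N(X) - λ D(X), then set λ to the ratio of the minimizer, stopping once no selection achieves a negative value. The example below assumes a hypothetical toy graph of four superpixels with made-up areas and gap-weighted boundary lengths, where index -1 stands for the image frame. The parametric subproblem is solved here by brute-force enumeration, which only stands in for a parametric maxflow solver and scales to a handful of superpixels; it is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy instance: 4 superpixels with areas, and boundary segments
# weighted by (gap measure x length). Index -1 is the image frame, so a
# selection that touches the frame also pays for that part of its boundary.
AREAS = [4.0, 4.0, 1.0, 1.0]
EDGES = {  # (i, j): gap-weighted boundary length between superpixels i and j
    (0, 1): 5.0,   # interior boundary with little edge support (large gap)
    (0, 2): 0.5,
    (1, 2): 0.5,
    (2, 3): 3.0,
    (0, -1): 0.5,
    (1, -1): 0.5,
    (2, -1): 4.0,
    (3, -1): 4.0,
}


def gap(selection):
    """Sum of gap weights on boundary segments with exactly one side selected."""
    return sum(w for (i, j), w in EDGES.items()
               if (i in selection) != (j in selection))


def area(selection):
    """Total area of the selected superpixels."""
    return sum(AREAS[i] for i in selection)


def nonempty_subsets(n):
    """Enumerate every nonempty subset of range(n) as a frozenset."""
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def dinkelbach(n, tol=1e-12):
    """Minimize gap(X) / area(X) over nonempty X by Dinkelbach iteration.

    Each step solves min_X gap(X) - lam * area(X) by brute force, standing in
    for the parametric maxflow step; lam strictly decreases until convergence.
    """
    current = frozenset(range(n))          # start from the full image
    lam = gap(current) / area(current)
    while True:
        best = min(nonempty_subsets(n), key=lambda X: gap(X) - lam * area(X))
        if gap(best) - lam * area(best) >= -tol:
            return current, lam            # no subset beats the current ratio
        current = best
        lam = gap(best) / area(best)


if __name__ == "__main__":
    X, ratio = dinkelbach(len(AREAS))
    print(sorted(X), ratio)  # [0, 1] 0.25
```

In this toy instance the pair {0, 1} wins because its shared interior boundary (the large-gap edge) is not counted, while its outer boundary has strong edge support. Because the gap term is a cut function and the area term is additive, each subproblem can in principle be posed as an s-t minimum cut, which is what makes a maxflow-based solver usable in place of the enumeration above.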
Notes
The contour image takes the form of a globalPb image (Maire et al. 2008).
While spatial coherence is promoted, it is not guaranteed. Since minimizing Eq. (3) can occasionally result in disconnected sets of superpixels, we further guarantee connectedness by selecting the largest-area connected component of X.
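The connectedness step in this note can be sketched as a breadth-first search over the superpixel adjacency graph, restricted to the selected set X, that keeps the component with the largest total area. The function name, the adjacency-dict representation, and the toy data are assumptions for illustration, not the authors' code.

```python
from collections import deque


def largest_area_component(selection, adjacency, areas):
    """Return the connected component of `selection` with the largest total area.

    selection: set of selected superpixel ids (the minimizer X)
    adjacency: dict mapping a superpixel id to the ids of its spatial neighbours
    areas:     mapping (dict or list) from superpixel id to its area in pixels
    """
    unvisited = set(selection)
    best, best_area = set(), -1.0
    while unvisited:
        seed = unvisited.pop()
        component, queue = {seed}, deque([seed])
        while queue:                      # breadth-first search restricted to X
            u = queue.popleft()
            for v in adjacency.get(u, ()):
                if v in unvisited:        # only selected, not-yet-seen neighbours
                    unvisited.remove(v)
                    component.add(v)
                    queue.append(v)
        comp_area = sum(areas[i] for i in component)
        if comp_area > best_area:
            best, best_area = component, comp_area
    return best


if __name__ == "__main__":
    # Hypothetical chain of superpixels 0-1-2-3-4; superpixel 2 is not selected,
    # so X = {0, 1, 3, 4} splits into {0, 1} (area 10) and {3, 4} (area 4).
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    areas = {0: 5, 1: 5, 2: 3, 3: 2, 4: 2}
    print(sorted(largest_area_component({0, 1, 3, 4}, adjacency, areas)))  # [0, 1]
```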
See the Approach Overview section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a graphical overview of the method.
See the Superpixel Extraction section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a better visualization of superpixel extraction and comparison with the original Turbopixels approach.
Matlab code for our method is available at http://www.cs.toronto.edu/~babalex/closure_code.tgz.
SC can be tuned (see Fig. 9) to perform better for K = 1, at a small cost in performance for larger values of K.
For WSD, there are three ground truth segmentations per image. If we instead use the closest of the three ground truth segmentations for each image (rather than averaging over all three), our score on WSD improves to 88.76%.
Supplementary material (http://www.cs.toronto.edu/~babalex/closure_supplementary.tgz) contains the results of our algorithm for all the images in both datasets.
See the Results section at http://www.cs.toronto.edu/~babalex/SpatiotemporalClosure/supplementary_material.html for a video visualization of the results.
References
Alpert, S., Galun, M., Basri, R., & Brandt, A. (2007). Image segmentation by probabilistic bottom-up aggregation and cue integration. In IEEE international conference on computer vision and pattern recognition.
Bascle, B., & Deriche, R. (1995). Region tracking through image sequences. In IEEE international conference on computer vision (p. 302).
Black, M., & Jepson, A. (1998). Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, 26(1), 63–84.
Borenstein, E., & Ullman, S. (2002). Class-specific, top-down segmentation. In European conference on computer vision (pp. 109–124).
Boykov, Y. Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images.
Brady, M., & Asada, H. (1984). Smoothed local symmetries and their implementation. The International Journal of Robotics Research, 3(3), 36–61.
Carreira, J., & Sminchisescu, C. (2010). Constrained parametric min-cuts for automatic object segmentation. In IEEE international conference on computer vision and pattern recognition.
Cham, T. J., & Cipolla, R. (1996). Geometric saliency of curve correspondences and grouping of symmetric contours. In European conference on computer vision (pp. 385–398).
Chung, D., MacLean, W., & Dickinson, S. (2006). Integrating region and boundary information for spatially coherent object tracking. Image and Vision Computing, 24(7), 680–692.
Comaniciu, D., Ramesh, V., & Meer, P. (2003). Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5), 564–577.
Cremers, D., & Soatto, S. (2005). Motion competition: A variational approach to piecewise parametric motion segmentation. International Journal of Computer Vision, 62(3).
DeMenthon, D. (2002). Spatio-temporal segmentation of video by hierarchical mean shift analysis. In SMVP.
Deng, Y., & Manjunath, B. (2001). Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 800–810.
Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13, 492–498.
Elder, J., & Zucker, S. (1994). A measure of closure. Vision Research, 34, 3361–3369.
Elder, J. H., & Zucker, S. W. (1996). Computing contour closure. In European conference on computer vision (pp. 399–412).
Endres, I., & Hoiem, D. (2010). Category independent object proposals. In European conference on computer vision (pp. 575–588).
Estrada, F. J., & Jepson, A. D. (2004). Perceptual grouping for contour extraction. In IEEE international conference on pattern recognition (pp. 32–35).
Estrada, F. J., & Jepson, A. D. (2006). Robust boundary detection with adaptive grouping. In Computer vision and pattern recognition workshop (p. 184).
Fowlkes, C., Belongie, S., & Malik, J. (2001). Efficient spatiotemporal grouping using the Nyström method. In IEEE international conference on computer vision and pattern recognition (pp. 231–238).
Gelgon, M., & Bouthemy, P. (2000). A region-level motion-based graph representation and labeling for tracking a spatial image partition. Pattern Recognition, 33(4), 725–740.
Greenspan, H., Goldberger, J., & Mayer, A. (2004). Probabilistic space-time video modeling via piecewise GMM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 384–396.
Huang, Y., Liu, Q., & Metaxas, D. (2009). Video object segmentation by hypergraph cut. In IEEE international conference on computer vision and pattern recognition (pp. 1738–1745).
Isard, M., & Blake, A. (1998). Condensation—conditional density propagation for visual tracking. International Journal of Computer Vision, 29, 5–28.
Jacobs, D. (1996). Robust and efficient detection of salient convex groups. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1), 23–37.
Jepson, A. D., Fleet, D. J., & Black, M. J. (2002). A layered motion representation with occlusion and compact spatial support. In European conference on computer vision (pp. 692–706).
Jermyn, I., & Ishikawa, H. (2001). Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1075–1088.
Jojic, N., & Frey, B. J. (2001). Learning flexible sprites in video layers. In IEEE international conference on computer vision and pattern recognition (Vol. 1, p. 199).
Kolmogorov, V., Boykov, Y., & Rother, C. (2007). Applications of parametric maxflow in computer vision. In IEEE international conference on computer vision (pp. 1–8).
Lempitsky, V., Kohli, P., Rother, C., & Sharp, T. (2009). Image segmentation with a bounding box prior.
Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2009a). Multiscale symmetric part detection and grouping (pp. 2162–2169).
Levinshtein, A., Stere, A., Kutulakos, K. N., Fleet, D. J., Dickinson, S. J., & Siddiqi, K. (2009b). Turbopixels: Fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2290–2297.
Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2010a). Optimal contour closure by superpixel grouping. In European conference on computer vision (pp. 480–493).
Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2010b). Spatiotemporal closure. In Asian conference on computer vision (pp. 369–382).
Li, F., Carreira, J., & Sminchisescu, C. (2010). Object recognition as ranking holistic figure-ground hypotheses. In IEEE international conference on computer vision and pattern recognition.
Lowe, D. G. (1985). Perceptual organization and visual recognition. Norwell: Kluwer Academic.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of imaging understanding workshop (pp. 674–679).
Maire, M., Arbelaez, P., Fowlkes, C., & Malik, J. (2008). Using contours to detect and localize junctions in natural images. In IEEE international conference on computer vision and pattern recognition.
Martin, D. R., Fowlkes, C. C., & Malik, J. (2004). Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 530–549.
Megret, R., & DeMenthon, D. (2002). A survey of spatio-temporal grouping techniques (Tech. rep.). University of Maryland, College Park.
Mori, G., Ren, X., Efros, A. A., & Malik, J. (2004). Recovering human body configurations: Combining segmentation and recognition. In IEEE international conference on computer vision and pattern recognition (pp. 326–333).
Moscheni, F., Bhattacharjee, S., & Kunt, M. (1998). Spatiotemporal segmentation based on region merging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(9), 897–915.
Paragios, N., & Deriche, R. (2000). Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3), 266–280.
Patras, I., Lagendijk, R. L., & Hendriks, E. A. (2001). Video segmentation by MAP labeling of watershed segments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 326–332.
Pritch, Y., Rav-Acha, A., & Peleg, S. (2008). Nonchronological video synopsis and indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1971–1984.
Ren, X., Fowlkes, C. C., & Malik, J. (2005a). Cue integration in figure/ground labeling. In Advances in neural information processing systems.
Ren, X., Fowlkes, C. C., & Malik, J. (2005b). Scale-invariant contour completion using conditional random fields. In IEEE international conference on computer vision (Vol. 2, pp. 1214–1221).
Rother, C., Kolmogorov, V., & Blake, A. (2004). “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (SIGGRAPH), 23(3), 309–314.
Saint-Marc, P., Rom, H., & Medioni, G. (1993). B-spline contour representation and symmetry detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11), 1191–1197.
Shi, J., & Malik, J. (1998). Motion segmentation and tracking using normalized cuts. In IEEE international conference on computer vision (p. 1154).
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Stahl, J., & Wang, S. (2007). Edge grouping combining boundary and region information. IEEE Transactions on Image Processing, 16(10), 2590–2606.
Stahl, J., & Wang, S. (2008). Globally optimal grouping for symmetric closed boundaries by combining boundary and region information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(3), 395–411.
Stein, A., Hoiem, D., & Hebert, M. (2007). Learning to find object boundaries using motion cues. In IEEE international conference on computer vision (pp. 1–8).
Wang, D. (1998). Unsupervised video segmentation based on watersheds and temporal tracking. IEEE Transactions on Circuits and Systems for Video Technology, 8(5), 539–546.
Wang, J., & Adelson, E. (1994). Representing moving images with layers. IEEE Transactions on Image Processing, 3(5), 625–638.
Wang, S., Kubota, T., Siskind, J., & Wang, J. (2005). Salient closed boundary extraction with ratio contour. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4), 546–561.
Weiss, Y. (1997). Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In IEEE international conference on computer vision and pattern recognition (p. 520).
Weiss, Y., & Adelson, E. H. (1996). A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In IEEE international conference on computer vision and pattern recognition (p. 321).
Welch, G., & Bishop, G. (1995). An introduction to the Kalman filter (Tech. rep.).
Wertheimer, M. (1938). Laws of organization in perceptual forms. In W. Ellis (ed.), Source book of gestalt psychology. New York: Harcourt, Brace.
Williams, L. R., & Hanson, A. R. (1996). Perceptual completion of occluded surfaces. Computer Vision and Image Understanding, 64(1), 1–20.
Williams, L. R., & Jacobs, D. W. (1995). Stochastic completion fields: a neural model of illusory contour shape and salience. In IEEE international conference on computer vision (p. 408).
Ylä-Jääski, A., & Ade, F. (1996). Grouping symmetrical structures for object segmentation and description. Computer Vision and Image Understanding, 63(3), 399–417.
Zhu, Q., Song, G., & Shi, J. (2007). Untangling cycles for contour grouping. In IEEE international conference on computer vision.
Acknowledgements
We thank Allan Jepson for discussion about closure cost functions and optimization procedures, and Yuri Boykov and Vladimir Kolmogorov for providing their parametric maxflow implementation. This work was supported in part by the European Commission under a Marie Curie Excellence Grant MCEXT-025481 (Cristian Sminchisescu), CNCSIS-UEFISCU under project number PN II-RU-RC-2/2009 (Cristian Sminchisescu), NSERC (Alex Levinshtein, Sven Dickinson), MITACS (Alex Levinshtein), and DARPA (Sven Dickinson).
Cite this article
Levinshtein, A., Sminchisescu, C. & Dickinson, S. Optimal Image and Video Closure by Superpixel Grouping. Int J Comput Vis 100, 99–119 (2012). https://doi.org/10.1007/s11263-012-0527-6