LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

Xu, Chenliang; Corso, Jason J.

doi:10.1007/s11263-016-0906-5


Published: 25 April 2016

Volume 119, pages 272–290, (2016)

International Journal of Computer Vision

Chenliang Xu¹ &
Jason J. Corso¹


Abstract

Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a single comparative study on supervoxel segmentation. To that end, we study seven supervoxel algorithms, including both off-line and streaming methods, in the context of what we consider to be a good supervoxel: namely, spatiotemporal uniformity, object/region boundary detection, region compression and parsimony. For the evaluation we propose a comprehensive suite of seven quality metrics to measure these desirable supervoxel characteristics. In addition, we evaluate the methods in a supervoxel classification task as a proxy for subsequent high-level uses of the supervoxels in video analysis. We use six existing benchmark video datasets with a variety of content-types and dense human annotations. Our findings have led us to conclusive evidence that the hierarchical graph-based (GBH), segmentation by weighted aggregation (SWA) and temporal superpixels (TSP) methods are the top-performers among the seven methods. They all perform well in terms of segmentation accuracy, but vary in regard to the other desiderata: GBH captures object boundaries best; SWA has the best potential for region compression; and TSP achieves the best undersegmentation error.
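The abstract names undersegmentation error as one of the quality metrics but does not define it. As a minimal sketch, assuming the commonly used 3D definition (for each ground-truth segment, the total volume of all supervoxels that overlap it, minus the segment's own volume, divided by the segment's volume, then averaged over segments), it could be computed as follows. The function name and the flattened-label input format are illustrative assumptions, not part of LIBSVX.

```python
from collections import Counter, defaultdict


def undersegmentation_error_3d(supervoxel_labels, ground_truth_labels):
    """Mean 3D undersegmentation error over ground-truth segments.

    Both arguments are flattened per-voxel label sequences of equal length,
    listed in the same voxel order (hypothetical input format).
    """
    sv = list(supervoxel_labels)
    gt = list(ground_truth_labels)
    if len(sv) != len(gt) or not gt:
        raise ValueError("label sequences must be non-empty and equally long")

    # Volume (voxel count) of every supervoxel and ground-truth segment.
    sv_volume = Counter(sv)
    gt_volume = Counter(gt)

    # For each ground-truth segment, the set of supervoxels overlapping it.
    touching = defaultdict(set)
    for s, g in zip(sv, gt):
        touching[g].add(s)

    # Per-segment error: supervoxel volume that "bleeds" outside the segment,
    # relative to the segment's own volume.
    errors = [
        (sum(sv_volume[s] for s in touching[g]) - gt_volume[g]) / gt_volume[g]
        for g in gt_volume
    ]
    return sum(errors) / len(errors)
```

Under this definition a segmentation whose supervoxels never cross a ground-truth boundary scores 0, and the score grows as supervoxels spill across object boundaries.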


Notes

http://people.csail.mit.edu/sparis/.
http://people.csail.mit.edu/jchang7/code.php.
We manually exclude the corrupted frames, and organize the dataset into short clips with roughly 100 frames-per-clip. The organized short clips can be downloaded from our website.
http://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html.
http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

References

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. & Susstrunk, S. (2012). Slic superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916.
Baker, S., Scharstein, D., Lewis, J., Roth, S., Black, M. J., & Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1), 1–31.
Brendel, W. & Todorovic, S. (2009) Video object segmentation by tracking regions. In IEEE International Conference on Computer Vision.
Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R. (2008). Segmentation and recognition using structure from motion point clouds. In European Conference on Computer Vision.
Budvytis, I., Badrinarayanan, V. & Cipolla, R. (2011). Semi-supervised video segmentation using tree structured graphical models. In IEEE Conference on Computer Vision and Pattern Recognition.
Chang, J., Wei, D. & Fisher III, J. W. (2013). A video representation using temporal superpixels. In IEEE Conference on Computer Vision and Pattern Recognition.
Chen, A. Y. C. & Corso, J. J. (2010). Propagating multi-class pixel labels throughout video frames. In Proceedings of Western New York Image Processing Workshop.
Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.
Corso, J. J., Sharon, E., Dube, S., El-Saden, S., Sinha, U., & Yuille, A. (2008). Efficient multilevel brain tumor segmentation with integrated bayesian model classification. IEEE Transactions on Medical Imaging, 27(5), 629–640.
de Souza, K. J. F., de Albuquerque Araújo, A., et al (2014). Graph-based hierarchical video segmentation based on a simple dissimilarity measure. Pattern Recognition Letters.
DeMenthon, D. & Megret, R. (2002). Spatio-temporal segmentation of video by hierarchical mean shift analysis. In Statistical Methods in Video Processing Workshop.
Drucker, F. & MacCormick, J. (2009). Fast superpixels for video analysis. In IEEE Workshop on Motion and Video Computing.
Erdem, Ç. E., Sankur, B., & Tekalp, A. M. (2004). Performance measures for video object segmentation and tracking. IEEE Transactions on Image Processing, 13(7), 937–951.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Fowlkes, C., Belongie, S. & Malik, J. (2001) Efficient spatiotemporal grouping using the nystrom method. In IEEE Conference on Computer Vision and Pattern Recognition.
Fowlkes, C., Belongie, S., Chung, F., & Malik, J. (2004). Spectral grouping using the nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 214–225.
Fukunaga, K., & Hostetler, L. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1), 32–40.
Galasso, F., Cipolla, R. & Schiele, B. (2012). Video segmentation with superpixels. In Asian Conference on Computer Vision.
Galasso, F., Nagaraja, N. S., Cardenas, T. J., Brox, T. & Schiele, B. (2013). A unified video segmentation benchmark: Annotation, metrics and analysis. In IEEE International Conference on Computer Vision.
Gould, S., Fulton, R. & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In IEEE International Conference on Computer Vision.
Greenspan, H., Goldberger, J., & Mayer, A. (2004). Probabilistic space-time video modeling via piecewise gmm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 384–396.
Grundmann M., Kwatra V., Han M. & Essa I. (2010). Efficient hierarchical graph-based video segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
Hanbury, A. (2008). How do superpixels affect image segmentation? In Progress in Pattern Recognition, Image Analysis and Applications (pp. 178–186). Berlin: Springer.
He, X., Zemel, R. S. & Ray, D. (2006). Learning and incorporating top-down cues in image segmentation. In European Conference on Computer Vision.
Hoiem, D., Efros, A. A., & Hebert, M. (2005). Automatic photo pop-up. ACM Transactions on Graphics, 24, 577–584.
Khan, S. & Shah, M. (2001). Object based segmentation of video using color, motion and spatial information. In IEEE Conference on Computer Vision and Pattern Recognition.
Kläser, A., Marszałek, M. & Schmid, C. (2008). A spatio-temporal descriptor based on 3d-gradients. In British Machine Vision Conference.
Laptev, I. (2005). On space-time interest points. International Journal of Computer Vision, 64(2), 107–123.
Lee, J. & Choi, S. (2014). Incremental tree-based inference with dependent normalized random measures. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (pp. 558–566).
Lee, Y. J., Kim, J., Grauman, K. (2011). Key-segments for video object segmentation. In IEEE International Conference on Computer Vision.
Levinshtein, A., Stere, A., Kutulakos, K. N., Fleet, D. J., Dickinson, S. J., & Siddiqi, K. (2009). Turbopixels: Fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2290–2297.
Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J. M. (2013). Video segmentation by tracking many figure-ground segments. In IEEE International Conference on Computer Vision.
Liu, C., Freeman, W. T., Adelson, E. H., Weiss, Y. (2008a). Human-assisted motion annotation. In IEEE Conference on Computer Vision and Pattern Recognition.
Liu, M.Y., Tuzel, O., Ramalingam, S., Chellappa, R. (2011). Entropy rate superpixel segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
Liu, S., Dong, G., Yan, C. H. & Ong, S. H. (2008b). Video segmentation: Propagation, validation and aggregation of a preceding graph. In IEEE Conference on Computer Vision and Pattern Recognition.
Megret, R., & DeMenthon, D. (2002). A survey of spatio-temporal grouping techniques. UMD: Technical report.
Moore, A.P., Prince, S., Warrell, J., Mohammed, U. & Jones, G. (2008). Superpixel lattices. In IEEE Conference on Computer Vision and Pattern Recognition.
Mori, G., Ren, X., Efros, A. A. & Malik, J. (2004). Recovering human body configurations: Combining segmentation and recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
Palou G. & Salembier, P. (2013) Hierarchical video representation with trajectory binary partition tree. In IEEE Conference on Computer Vision and Pattern Recognition.
Paris S. (2008) Edge-preserving smoothing and mean-shift segmentation of video streams. In European Conference on Computer Vision.
Paris S. & Durand, F. (2007). A topological approach to hierarchical segmentation using mean shift. In IEEE Conference on Computer Vision and Pattern Recognition.
Patel, N. V., & Sethi, I. K. (1997). Video shot detection and characterization for video databases. Pattern Recognition, 30(4), 583–592.
Ren, X. & Malik, J. (2003) Learning a classification model for segmentation. In IEEE International Conference on Computer Vision.
Reso, M., Jachalsky, J., Rosenhahn, B. & Ostermann, J. (2013). Temporally consistent superpixels. In IEEE International Conference on Computer Vision.
Schmid, C., & Mohr, R. (1997). Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), 530–535.
Sharon, E., Brandt, A. & Basri, R. (2000). Fast multiscale image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
Sharon, E., Galun, M., Sharon, D., Basri, R., & Brandt, A. (2006). Hierarchy and adaptivity in segmenting visual scenes. Nature, 442(7104), 810–813.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2009). Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81(1), 2–23.
Sundberg, P., Brox, T., Maire, M., Arbeláez, P., Malik, J. (2011). Occlusion boundary detection and figure/ground assignment from optical flow. In IEEE Conference on Computer Vision and Pattern Recognition.
Tighe, J. & Lazebnik, S. (2010). Superparsing: Scalable nonparametric image parsing with superpixels. International Journal of Computer Vision.
Tripathi, S., Hwang, Y., Belongie, S. & Nguyen, T. (2014). Improving streaming video segmentation with early and mid-level visual processing. In IEEE Winter Conference on Applications of Computer Vision.
Tsai, D., Flagg, M. & Rehg, J. M. (2010) Motion coherent tracking with multi-label mrf optimization. In British Machine Vision Conference.
Van den Bergh, M., Roig, G., Boix, X., Manen, S. & Van Gool, L. (2013). Online video seeds for temporal window objectness. In IEEE International Conference on Computer Vision.
Vazquez-Reina, A., Avidan, S., Pfister, H. & Miller, E. (2010). Multiple hypothesis video segmentation from superpixel flows. In European Conference on Computer Vision.
Veksler, O., Boykov, Y., Mehrani, P. (2010) Superpixels and supervoxels in an energy optimization framework. In European Conference on Computer Vision.
Vincent, L., & Soille, P. (1991). Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6), 583–598.
Wang, H., Kläser, A., Schmid, C., Liu, C. L. (2013). Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision.
Wang, J., Thiesson, B., Xu, Y., & Cohen, M. (2004). Image and video segmentation by anisotropic kernel mean shift. In European Conference on Computer Vision.
Xu, C. & Corso, J. J. (2012). Evaluation of super-voxel methods for early video processing. In IEEE Conference on Computer Vision and Pattern Recognition.
Xu, C., Xiong, C. & Corso, J. J. (2012). Streaming hierarchical video segmentation. In European Conference on Computer Vision.
Xu, C., Whitt, S. & Corso, J. J. (2013). Flattening supervoxel hierarchies by the uniform entropy slice. In IEEE International Conference on Computer Vision.
Zeng, G., Wang, P., Wang, J., Gan, R. & Zha, H. (2011). Structure-sensitive superpixels via geodesic distance. In IEEE International Conference on Computer Vision.


Acknowledgments

This work was partially supported by the National Science Foundation CAREER grant (IIS-0845282), the Army Research Office (W911NF-11-1-0090) and the DARPA Mind’s Eye program (W911NF-10-2-0062). We are grateful to the authors of the code and datasets that we have relied upon in this study, and we are grateful to the reviewers’ comments, which have greatly improved this paper.

Author information

Authors and Affiliations

Electrical Engineering and Computer Science, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI, 48109-2122, USA
Chenliang Xu & Jason J. Corso


Corresponding author

Correspondence to Chenliang Xu.

Additional information

Communicated by Ivan Laptev, Josef Sivic, Deva Ramanan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 480 KB)

Cite this article

Xu, C., Corso, J.J. LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing. Int J Comput Vis 119, 272–290 (2016). https://doi.org/10.1007/s11263-016-0906-5


Received: 13 June 2014
Accepted: 04 April 2016
Published: 25 April 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11263-016-0906-5
