LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

International Journal of Computer Vision

Abstract

Supervoxel segmentation has strong potential to be incorporated into early video analysis, much as superpixel segmentation has been in image analysis. However, there are many plausible supervoxel methods and little understanding of when and where each is most appropriate. Indeed, we are not aware of a single comparative study on supervoxel segmentation. To that end, we study seven supervoxel algorithms, including both off-line and streaming methods, in the context of what we consider to be a good supervoxel: namely, spatiotemporal uniformity, object/region boundary detection, region compression and parsimony. For the evaluation, we propose a comprehensive suite of seven quality metrics to measure these desirable supervoxel characteristics. In addition, we evaluate the methods in a supervoxel classification task as a proxy for subsequent high-level uses of the supervoxels in video analysis. We use six existing benchmark video datasets with a variety of content types and dense human annotations. Our findings provide conclusive evidence that the hierarchical graph-based (GBH), segmentation by weighted aggregation (SWA) and temporal superpixels (TSP) methods are the top performers among the seven methods. They all perform well in terms of segmentation accuracy, but vary with regard to the other desiderata: GBH captures object boundaries best; SWA has the best potential for region compression; and TSP achieves the best undersegmentation error.
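
To make one of the metrics named in the abstract concrete, the following is a minimal sketch of a 3D undersegmentation error in one commonly used form: for each ground-truth segment, sum the volumes of all supervoxels that touch it, subtract the segment's own volume, and normalize by that volume. This formulation is an assumption made for illustration; this page does not give the paper's exact definition. The function name and the flat, per-voxel label layout are likewise hypothetical and are not part of LIBSVX.

```python
from collections import Counter


def undersegmentation_error_3d(supervoxels, ground_truth):
    """Return {ground-truth label: undersegmentation error}.

    Both arguments are equal-length flat sequences of integer labels,
    one entry per voxel (e.g. a T x H x W video volume flattened in a
    fixed order). This layout is an illustrative assumption.
    """
    if len(supervoxels) != len(ground_truth):
        raise ValueError("label volumes must have the same number of voxels")

    sv_volume = Counter(supervoxels)    # voxels per supervoxel
    gt_volume = Counter(ground_truth)   # voxels per ground-truth segment

    # For each ground-truth segment, collect the supervoxels that overlap it.
    touching = {}
    for s, g in zip(supervoxels, ground_truth):
        touching.setdefault(g, set()).add(s)

    # Volume that "leaks" outside the segment, relative to the segment volume.
    return {
        g: (sum(sv_volume[s] for s in svs) - gt_volume[g]) / gt_volume[g]
        for g, svs in touching.items()
    }


# Toy volume of 8 voxels: supervoxel 0 straddles both ground-truth segments.
sv = [0, 0, 0, 1, 1, 1, 2, 2]
gt = [0, 0, 1, 1, 1, 1, 1, 1]
errors = undersegmentation_error_3d(sv, gt)
```

Under this assumed definition, a value of 0 means every supervoxel touching the segment lies entirely inside it, and larger values indicate supervoxels that spill across the segment's boundary.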

Notes

  1. http://people.csail.mit.edu/sparis/.

  2. http://people.csail.mit.edu/jchang7/code.php.

  3. We manually exclude the corrupted frames and organize the dataset into short clips of roughly 100 frames each. The organized short clips can be downloaded from our website.

  4. http://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html.

  5. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

Acknowledgments

This work was partially supported by the National Science Foundation CAREER grant (IIS-0845282), the Army Research Office (W911NF-11-1-0090) and the DARPA Mind’s Eye program (W911NF-10-2-0062). We are grateful to the authors of the code and datasets that we have relied upon in this study, and to the reviewers for their comments, which have greatly improved this paper.

Author information

Corresponding author

Correspondence to Chenliang Xu.

Additional information

Communicated by Ivan Laptev, Josef Sivic, Deva Ramanan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 480 KB)

About this article

Cite this article

Xu, C., Corso, J.J. LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing. Int J Comput Vis 119, 272–290 (2016). https://doi.org/10.1007/s11263-016-0906-5

  • DOI: https://doi.org/10.1007/s11263-016-0906-5
