Towards Segmenting Consumer Stereo Videos: Benchmark, Baselines and Ensembles

  • Wei-Chen Chiu
  • Fabio Galasso
  • Mario Fritz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10115)


Are we ready to segment consumer stereo videos? The amount of such data is rapidly increasing, and it carries rich appearance, motion and depth cues. However, the segmentation of such data is still largely unexplored. First, we therefore propose a new benchmark: videos, annotations and metrics to measure progress on this emerging challenge. Second, we evaluate several state-of-the-art segmentation methods and propose a novel ensemble method based on recent spectral theory, which combines existing image and video segmentation techniques in an efficient scheme. Finally, we propose and integrate into this model a novel regressor, learnt to optimize the stereo segmentation performance directly via a differentiable proxy. The regressor makes our segmentation ensemble adaptive to each stereo video and outperforms the segmentations of the ensemble as well as a recent RGB-D segmentation technique.


Optical Flow, Segmentation Algorithm, Spectral Cluster, Motion Segmentation, Video Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supplementary material

Supplementary material 1 (mp4 22993 KB)

Supplementary material 2 (pdf 773 KB)



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
  2. OSRAM Corporate Technology, Munich, Germany
