Abstract
Video object co-segmentation refers to the problem of simultaneously segmenting a common category of objects from multiple videos. Most existing video co-segmentation methods assume that all frames from all videos contain the target objects. Unfortunately, this assumption is rarely true in practice, particularly for large video sets, and existing methods perform poorly when the assumption is violated. Hence, any practical video object co-segmentation algorithm needs to identify the relevant frames containing the target object from all videos, and then co-segment the object only from these relevant frames. We present a spatiotemporal energy minimization formulation for simultaneous video object discovery and co-segmentation across multiple videos. Our formulation incorporates a spatiotemporal auto-context model, which is combined with appearance modeling for superpixel labeling. The superpixel-level labels are propagated to the frame level through a multiple instance boosting algorithm with spatial reasoning (Spatial-MILBoosting), based on which frames containing the video object are identified. Our method only needs to be bootstrapped with the frame-level labels for a few video frames (e.g., usually 1 to 3) to indicate if they contain the target objects or not. Experiments on three datasets validate the efficacy of our proposed method, which compares favorably with the state-of-the-art.
Chapter PDF
Similar content being viewed by others
Keywords
References
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Susstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. TPAMI 34(11), 2274–2282 (2012)
Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: CVPR, pp. 73–80 (2010)
Avidan, S.: SpatialBoost: Adding spatial reasoning to adaboost. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. Part IV. LNCS, vol. 3954, pp. 386–396. Springer, Heidelberg (2006)
Bai, X., Wang, J., Simons, D., Sapiro, G.: Video SnapCut: robust video object cutout using localized classifiers. ACM Trans. on Graphics 28, 70 (2009)
Batra, D., Kowdle, A., Parikh, D., Luo, J., Chen, T.: iCoseg: Interactive co-segmentation with intelligent scribble guidance. In: CVPR, pp. 3169–3176 (2010)
Boykov, Y., Funka-Lea, G.: Graph cuts and efficient ND image segmentation. IJCV 70(2), 109–131 (2006)
Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. TPAMI 26(9), 1124–1137 (2004)
Brox, T., Malik, J.: Object segmentation by long term analysis of point trajectories. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 282–295. Springer, Heidelberg (2010)
Chen, D.J., Chen, H.T., Chang, L.W.: Video object cosegmentation. In: ACM Multimedia, pp. 805–808 (2012)
Chiu, W.C., Fritz, M.: Multi-class video co-segmentation with a generative multi-video model. In: CVPR, pp. 321–328 (2013)
Dai, J., Wu, Y.N., Zhou, J., Zhu, S.C.: Cosegmentation and cosketch by unsupervised learning. In: ICCV (2013)
Grundmann, M., Kwatra, V., Han, M., Essa, I.: Efficient hierarchical graph-based video segmentation. In: CVPR, pp. 2141–2148 (2010)
Guo, J., Li, Z., Cheong, L.F., Zhou, S.Z.: Video co-segmentation for meaningful action extraction. In: ICCV (2013)
Harel, J., Koch, C., Perona, P., et al.: Graph-based visual saliency. In: NIPS, pp. 545–552 (2006)
Joulin, A., Bach, F., Ponce, J.: Discriminative clustering for image co-segmentation. In: CVPR, pp. 1943–1950 (2010)
Lee, Y.J., Kim, J., Grauman, K.: Key-segments for video object segmentation. In: ICCV, pp. 1995–2002 (2011)
Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J.M.: Video segmentation by tracking many figure-ground segments. In: ICCV (2013)
Liu, D., Chen, T.: A topic-motion model for unsupervised video object discovery. In: CVPR, pp. 1–8 (2007)
Liu, D., Hua, G., Chen, T.: A hierarchical visual model for video object summarization. TPAMI 32(12), 2178–2190 (2010)
Ma, T., Latecki, L.J.: Maximum weight cliques with mutex constraints for video object segmentation. In: CVPR, pp. 670–677 (2012)
Ochs, P., Brox, T.: Object segmentation in video: a hierarchical variational approach for turning point trajectories into dense regions. In: ICCV, pp. 1583–1590 (2011)
Ochs, P., Brox, T.: Higher order motion models and spectral clustering. In: CVPR, pp. 614–621 (2012)
Papazoglou, A., Ferrari, V.: Fast object segmentation in unconstrained video. In: ICCV (2013)
Prest, A., Leistner, C., Civera, J., Schmid, C., Ferrari, V.: Learning object class detectors from weakly annotated video. In: CVPR, pp. 3282–3289 (2012)
Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: CVPR, pp. 1939–1946 (2013)
Rubinstein, M., Liu, C., Freeman, W.T.: Annotation propagation in large image databases via dense image correspondence. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 85–99. Springer, Heidelberg (2012)
Rubio, J.C., Serrat, J., López, A.: Video co-segmentation. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part II. LNCS, vol. 7725, pp. 13–24. Springer, Heidelberg (2013)
Tang, K., Sukthankar, R., Yagnik, J., Fei-Fei, L.: Discriminative segment annotation in weakly labeled video. In: CVPR, pp. 2483–2490 (2013)
Tiburzi, F., Escudero, M., Bescós, J., Martínez, J.M.: A ground truth for motion-based video-object segmentation. In: ICIP, pp. 17–20 (2008)
Tsai, D., Flagg, M., Rehg, J.: Motion coherent tracking with multi-label MRF optimization. In: BMVC (2010)
Tu, Z.: Auto-context and its application to high-level vision tasks. In: CVPR, pp. 1–8 (2008)
Tuytelaars, T., Lampert, C.H., Blaschko, M.B., Buntine, W.: Unsupervised object discovery: A comparison. IJCV 88(2), 284–302 (2010)
Vicente, S., Kolmogorov, V., Rother, C.: Cosegmentation revisited: Models and optimization. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 465–479. Springer, Heidelberg (2010)
Vicente, S., Rother, C., Kolmogorov, V.: Object cosegmentation. In: CVPR, pp. 2217–2224 (2011)
Viola, P., Platt, J.C., Zhang, C.: Multiple instance boosting for object detection. In: NIPS, pp. 1417–1424 (2005)
Wang, L., Xue, J., Zheng, N., Hua, G.: Automatic salient object extraction with contextual cue. In: ICCV, pp. 105–112 (2011)
Wang, L., Xue, J., Zheng, N., Hua, G.: Concurrent segmentation of categorized objects from an image collection. In: ICPR, pp. 3309–3312 (2012)
Wang, L., Hua, G., Xue, J., Gao, Z., Zheng, N.: Joint segmentation and recognition of categorized objects from noisy web image collection. TIP (2014)
Xu, L., Jia, J., Matsushita, Y.: Motion detail preserving optical flow estimation. TPAMI 34(9), 1744–1757 (2012)
Xue, J., Wang, L., Zheng, N., Hua, G.: Automatic salient object extraction with contextual cue and its applications to recognition and alpha matting. PR 46(11), 2874–2889 (2013)
Zhang, D., Javed, O., Shah, M.: Video object segmentation through spatially accurate and temporally dense extraction of primary object regions. In: CVPR, pp. 628–635 (2013)
Zhao, G., Yuan, J., Hua, G.: Topical video object discovery from key frames by modeling word co-occurrence prior. In: CVPR, pp. 1602–1609 (2013)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
1 Electronic Supplementary Material
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, L., Hua, G., Sukthankar, R., Xue, J., Zheng, N. (2014). Video Object Discovery and Co-segmentation with Extremely Weak Supervision. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_42
Download citation
DOI: https://doi.org/10.1007/978-3-319-10593-2_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2
eBook Packages: Computer ScienceComputer Science (R0)