Unseen Object Segmentation in Videos via Transferable Representations

  • Yi-Wen ChenEmail author
  • Yi-Hsuan Tsai
  • Chu-Ya Yang
  • Yen-Yu Lin
  • Ming-Hsuan Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)


In order to learn object segmentation models in videos, conventional methods require a large amount of pixel-wise ground truth annotations. However, collecting such supervised data is time-consuming and labor-intensive. In this paper, we exploit existing annotations in source images and transfer such visual information to segment videos with unseen object categories. Without using any annotations in the target video, we propose a method to jointly mine useful segments and learn feature representations that better adapt to the target frames. The entire process is decomposed into two tasks: (1) solving a submodular function for selecting object-like segments, and (2) learning a CNN model with a transferable module for adapting seen categories in the source domain to the unseen target video. We present an iterative update scheme between two tasks to self-learn the final solution for object segmentation. Experimental results on numerous benchmark datasets show that the proposed method performs favorably against the state-of-the-art algorithms.



This work is supported in part by Ministry of Science and Technology under grants MOST 105-2221-E-001-030-MY2 and MOST 107-2628-E-001-005-MY3.

Supplementary material

484519_1_En_38_MOESM1_ESM.avi (32.1 mb)
Supplementary material 1 (avi 32903 KB)
484519_1_En_38_MOESM2_ESM.pdf (27.4 mb)
Supplementary material 2 (pdf 28017 KB)


  1. 1.
    Caelles, S., Maninis, K.K., Pont-Tuset, J., Leal-Taixé, L., Cremers, D., Gool, L.V.: One-shot video object segmentation. In: CVPR (2017)Google Scholar
  2. 2.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv:1606.00915 (2016)
  3. 3.
    Cheng, J., Tsai, Y.H., Wang, S., Yang, M.H.: Segflow: joint learning for video object segmentation and optical flow. In: ICCV (2017)Google Scholar
  4. 4.
    Everingham, M., Gool, L.J.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The Pascal visual object classes (VOC) challenge. IJCV 88(2), 303–338 (2010)CrossRefGoogle Scholar
  5. 5.
    Faktor, A., Irani, M.: Video segmentation by non-local consensus voting. In: BMVC (2014)Google Scholar
  6. 6.
    Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: ICML (2015)Google Scholar
  7. 7.
    Gopalan, R., Li, R., Chellappa, R.: Domain adaptation for object recognition: an unsupervised approach. In: ICCV (2011)Google Scholar
  8. 8.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  9. 9.
    Hoffman, J., et al.: LSDA: large scale detection through adaptation. In: NIPS (2014)Google Scholar
  10. 10.
    Hong, S., Oh, J., Lee, H., Han, B.: Learning transferrable knowledge for semantic segmentation with deep convolutional neural network. In: CVPR (2016)Google Scholar
  11. 11.
    Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. arXiv:1711.10370 (2017)
  12. 12.
    Jain, S., Xiong, B., Grauman, K.: FusionSeg: learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. In: CVPR (2017)Google Scholar
  13. 13.
    Jain, S.D., Grauman, K.: Supervoxel-consistent foreground propagation in video. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 656–671. Springer, Cham (2014). Scholar
  14. 14.
    Khoreva, A., Perazzi, F., Benenson, R., Schiele, B., Sorkine-Hornung, A.: Learning video object segmentation from static images. In: CVPR (2017)Google Scholar
  15. 15.
    Koh, Y.J., Kim, C.S.: Primary object segmentation in videos based on region augmentation and reduction. In: CVPR (2017)Google Scholar
  16. 16.
    Lazic, N., Givoni, I., Frey, B., Aarabi, P.: Floss: facility location for subspace segmentation. In: ICCV (2009)Google Scholar
  17. 17.
    Lee, Y.J., Kim, J., Grauman, K.: Key-segments for video object segmentation. In: ICCV (2011)Google Scholar
  18. 18.
    Lim, J.J., Salakhutdinov, R., Torralba, A.: Transfer learning by borrowing examples for multiclass object detection. In: NIPS (2011)Google Scholar
  19. 19.
    Liu, C.: Beyond pixels: exploring new representations and applications for motion analysis. Ph.D. thesis, MIT (2009)Google Scholar
  20. 20.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)Google Scholar
  21. 21.
    Märki, N., Perazzi, F., Wang, O., Sorkine-Hornung, A.: Bilateral space video segmentation. In: CVPR (2016)Google Scholar
  22. 22.
    Papazoglou, A., Ferrari, V.: Fast object segmentation in unconstrained video. In: ICCV (2013)Google Scholar
  23. 23.
    Patricia, N., Caputo, B.: Learning to learn, from transfer learning to domain adaptation: a unifying perspective. In: CVPR (2014)Google Scholar
  24. 24.
    Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)Google Scholar
  25. 25.
    Perazzi, F., Pont-Tuset, J., McWilliams, B., Gool, L.V., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR (2016)Google Scholar
  26. 26.
    Perazzi, F., Wang, O., Gross, M., Sorkine-Hornung, A.: Fully connected object proposals for video segmentation. In: CVPR (2015)Google Scholar
  27. 27.
    Ochs, P., Brox, T.: Object segmentation in video: a hierarchical variational approach for turning point trajectories into dense regions. In: ICCV (2011)Google Scholar
  28. 28.
    Prest, A., Leistner, C., Civera, J., Schmid, C., Ferrari, V.: Learning object class detectors from weakly annotated video. In: CVPR (2012)Google Scholar
  29. 29.
    Rochan, M., Wang, Y.: Weakly supervised localization of novel objects using appearance transfer. In: CVPR (2015)Google Scholar
  30. 30.
    Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 213–226. Springer, Heidelberg (2010). Scholar
  31. 31.
    Saleh, F.S., Aliakbarian, M.S., Salzmann, M., Petersson, L., Alvarez, J.M.: Bringing background into the foreground: making all classes equal in weakly-supervised video semantic segmentation. In: ICCV (2017)Google Scholar
  32. 32.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556, 1187–1200 (2014)Google Scholar
  33. 33.
    Strand, R., Ciesielski, K.C., Malmberg, F., Saha, P.K.: The minimum barrier distance. CVIU 117(4), 429–437 (2013)zbMATHGoogle Scholar
  34. 34.
    Tang, K., Sukthankar, R., Yagnik, J., Fei-Fei, L.: Discriminative segment annotation in weakly labeled video. In: CVPR (2013)Google Scholar
  35. 35.
    Taylor, B., Karasev, V., Soatto, S.: Causal video object segmentation from persistence of occlusions. In: CVPR (2015)Google Scholar
  36. 36.
    Tokmakov, P., Alahari, K., Schmid, C.: Learning motion patterns in videos. In: CVPR (2017)Google Scholar
  37. 37.
    Tokmakov, P., Alahari, K., Schmid, C.: Learning video object segmentation with visual memory. In: ICCV (2017)Google Scholar
  38. 38.
    Tommasi, T., Orabona, F., Caputo, B.: Learning categories from few examples with multi model knowledge transfer. PAMI 36, 928–941 (2014)CrossRefGoogle Scholar
  39. 39.
    Tsai, Y.H., Yang, M.H., Black, M.J.: Video segmentation via object flow. In: CVPR (2016)Google Scholar
  40. 40.
    Tsai, Y.-H., Zhong, G., Yang, M.-H.: Semantic co-segmentation in videos. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 760–775. Springer, Cham (2016). Scholar
  41. 41.
    Yan, Y., Xu, C., Cai, D., Corso, J.J.: Weakly supervised actor-action segmentation via robust multi-task ranking. In: CVPR (2017)Google Scholar
  42. 42.
    Zhang, D., Yang, L., Meng, D., Xu, D., Han, J.: SPFTN: a self-paced fine-tuning network for segmenting objects in weakly labelled videos. In: CVPR (2017)Google Scholar
  43. 43.
    Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., Mech, R.: Minimum barrier salient object detection at 80 fps. In: ICCV (2015)Google Scholar
  44. 44.
    Zhang, Y., Chen, X., Li, J., Wang, C., Xia, C.: Semantic object segmentation via detection in weakly labeled video. In: CVPR (2015)Google Scholar
  45. 45.
    Zhong, G., Tsai, Y.-H., Yang, M.-H.: Weakly-supervised video scene co-parsing. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10111, pp. 20–36. Springer, Cham (2017). Scholar
  46. 46.
    Zhu, F., Jiang, Z., Shao, L.: Submodular object recognition. In: CVPR (2014)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yi-Wen Chen
    • 1
    • 2
    Email author
  • Yi-Hsuan Tsai
    • 3
  • Chu-Ya Yang
    • 1
  • Yen-Yu Lin
    • 1
  • Ming-Hsuan Yang
    • 4
    • 5
  1. 1.Academia SinicaTaipeiTaiwan
  2. 2.National Taiwan UniversityTaipeiTaiwan
  3. 3.NEC Laboratories AmericaPrincetonUSA
  4. 4.University of California, MercedMercedUSA
  5. 5.Google CloudSunnyvaleUSA

Personalised recommendations