Semi-supervised Domain Adaptation for Weakly Labeled Semantic Video Object Segmentation

Wang, Huiling; Raiko, Tapani; Lensu, Lasse; Wang, Tinghuai; Karhunen, Juha

doi:10.1007/978-3-319-54181-5_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10111)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Deep convolutional neural networks (CNNs) have been immensely successful in many high-level computer vision tasks given large labelled datasets. However, for video semantic object segmentation, a domain where labels are scarce, effectively exploiting the representation power of CNNs with limited training data remains a challenge. Simply borrowing an existing pre-trained CNN image recognition model for the video segmentation task can severely hurt performance. We propose a semi-supervised approach to adapting a CNN image recognition model trained on labelled image data to the target domain, exploiting both the semantic evidence learned by the CNN and the intrinsic structure of video data. By explicitly modelling and compensating for the domain shift from the source domain to the target domain, the proposed approach underpins a semantic object segmentation method that is robust to changes in appearance, shape and occlusion in natural videos. We present extensive experiments on challenging datasets that demonstrate the superior performance of our approach compared with state-of-the-art methods.
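The abstract does not spell out the adaptation algorithm, so the following is not the authors' method. It is a minimal sketch, under stated assumptions, of one generic semi-supervised way to combine per-segment CNN evidence with the intrinsic structure of a video: label spreading in the style of Zhou et al. (2004), "Learning with local and global consistency". We assume the video has already been split into segments (for example, superpixels across frames), that `W` is a hypothetical symmetric affinity matrix between segments (large weights for segments linked by appearance or motion), and that `Y` holds hypothetical CNN foreground/background scores per segment that may be wrong because of domain shift. The helper names `label_spreading` and `foreground_mask`, the weight `alpha` and the iteration count are illustrative choices only.

```python
import math


def label_spreading(W, Y, alpha=0.9, iters=200):
    """Hypothetical helper: smooth per-segment class scores Y over the affinity graph W.

    Iterates F <- alpha * S @ F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2}
    (Zhou et al., 2004). For 0 < alpha < 1 this converges to the closed form
    F* = (1 - alpha) * (I - alpha * S)^{-1} @ Y.
    """
    n = len(W)
    # Symmetric normalisation of the affinity matrix; isolated segments get zero rows.
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

    k = len(Y[0])
    F = [row[:] for row in Y]  # start from the (possibly domain-shifted) CNN scores
    for _ in range(iters):
        # Each segment mixes its neighbours' current scores with its own initial evidence.
        F = [
            [alpha * sum(S[i][m] * F[m][c] for m in range(n)) + (1 - alpha) * Y[i][c]
             for c in range(k)]
            for i in range(n)
        ]
    return F


def foreground_mask(F):
    """Column 0 = foreground score, column 1 = background score (an assumption of this sketch)."""
    return [row[0] > row[1] for row in F]


if __name__ == "__main__":
    # Toy video graph: segments 0-1 and 2-3 are strongly linked (e.g. by optical flow),
    # with only a weak link between segments 1 and 2.
    W = [[0, 1, 0, 0],
         [1, 0, 0.1, 0],
         [0, 0.1, 0, 1],
         [0, 0, 1, 0]]
    # Hypothetical CNN [foreground, background] scores; segments 1 and 2 are misjudged.
    Y = [[0.9, 0.1], [0.45, 0.55], [0.55, 0.45], [0.1, 0.9]]
    print(foreground_mask(Y))                      # [True, False, True, False]
    print(foreground_mask(label_spreading(W, Y)))  # [True, True, False, False]
```

In this toy example the smoothing pulls the uncertain segment 1 towards its strongly linked, confidently foreground neighbour, and segment 2 towards background. A larger `alpha` trusts the video graph more and the CNN scores less; the fixed-point iteration stands in for the matrix inverse only to keep the sketch free of dependencies.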

Author information

Authors and Affiliations

Aalto University, Espoo, Finland
Huiling Wang, Tapani Raiko & Juha Karhunen
Lappeenranta University of Technology, Lappeenranta, Finland
Lasse Lensu
Nokia Technologies, Tampere, Finland
Tinghuai Wang

Corresponding author

Correspondence to Huiling Wang.

Editor information

Editors and Affiliations

National Tsing Hua University, Hsinchu, Taiwan
Shang-Hong Lai
Graz University of Technology, Graz, Austria
Vincent Lepetit
Drexel University, Philadelphia, Pennsylvania, USA
Ko Nishino
The University of Tokyo, Tokyo, Japan
Yoichi Sato

About this paper

Cite this paper

Wang, H., Raiko, T., Lensu, L., Wang, T., Karhunen, J. (2017). Semi-supervised Domain Adaptation for Weakly Labeled Semantic Video Object Segmentation. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10111. Springer, Cham. https://doi.org/10.1007/978-3-319-54181-5_11

DOI: https://doi.org/10.1007/978-3-319-54181-5_11
Published: 10 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54180-8
Online ISBN: 978-3-319-54181-5
eBook Packages: Computer Science; Computer Science (R0)