Multimedia Tools and Applications, Volume 78, Issue 23, pp 33617–33631

Effective online refinement for video object segmentation

  • Gongyang Li
  • Zhi Liu
  • Xiaofei Zhou

Abstract

In this paper, we propose a novel framework that exploits motion cues and an online fine-tuning strategy for semi-supervised video object segmentation. First, to filter out irrelevant background regions in the initial segmentation results, which are generated by an existing semi-supervised segmentation model, a motion-based background suppression method is applied to obtain purified segmentation results. Second, a set of key frames with high-quality segmentation results is selected from the purified results according to several segmentation-quality metrics. Finally, the selected key frames are combined with the manually annotated first frame to efficiently retrain the segmentation model online and obtain more accurate segmentation results. Experimental results on two challenging datasets demonstrate that the proposed framework achieves state-of-the-art performance.
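
The three stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the suppression rule or the quality metrics, so the function names, the flow-magnitude threshold (`ratio`), and the use of IoU between initial and purified masks as a quality score are all assumptions made for illustration only.

```python
import numpy as np


def suppress_background(mask, flow, ratio=0.5):
    """Keep mask pixels whose optical-flow magnitude is high.

    Hypothetical rule: a pixel survives if its flow magnitude is at least
    `ratio` times the frame maximum. With no motion at all, the threshold
    is 0 and the mask passes through unchanged.
    """
    magnitude = np.linalg.norm(flow, axis=-1)  # shape (H, W)
    peak = magnitude.max()
    threshold = ratio * peak if peak > 0 else 0.0
    return mask * (magnitude >= threshold)


def quality_score(initial, purified):
    """Stand-in quality metric: IoU between initial and purified masks."""
    a, b = initial > 0.5, purified > 0.5
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def select_key_frames(initial_masks, purified_masks, top_k=2):
    """Return the indices of the top_k highest-scoring frames, in order."""
    scores = [quality_score(i, p) for i, p in zip(initial_masks, purified_masks)]
    best = np.argsort(scores)[::-1][:top_k]
    return sorted(int(i) for i in best)


def build_finetune_set(key_frames, first_frame=0):
    """Combine the annotated first frame with the selected key frames."""
    return sorted(set([first_frame]) | set(key_frames))
```

In this sketch, the frame indices returned by `build_finetune_set` would identify the images and masks used to retrain the segmentation model online; the retraining step itself depends on the underlying model and is omitted.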

Keywords

Video object segmentation; Motion cue; Online fine-tuning

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61771301.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China
  2. School of Communication and Information Engineering, Shanghai University, Shanghai, China
  3. Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China
