Skip to main content

Adaptive Online Learning for Video Object Segmentation

  • Conference paper
  • First Online:
Intelligence Science and Big Data Engineering. Visual Data Engineering (IScIDE 2019)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11935))

  • 1427 Accesses

Abstract

In this work, we address the problem of video object segmentation (VOS), namely segmenting specific objects throughout a video sequence when given only an annotated first frame. Previous VOS methods based on deep neural networks often solves this problem by fine-tuning the segmentation model in the first frame of the test video sequence, which is time-consuming and can not be well adapted to the current target video. In this paper, we proposed the adaptive online learning for video object segmentation (AOL-VOS), which adaptively optimizes the network parameters and hyperparameters of segmentation model for better predicting the segmentation results. Specifically, we first pre-train the segmentation model with the static video frames and then learn the effective adaptation strategy on the training set by optimizing both network parameters and hyperparameters. In the testing process, we learn how to online adapt the learned segmentation model to the specific testing video sequence and the corresponding future video frames, where the confidence patterns is employed to constrain/guide the implementation of adaptive learning process by fusing both object appearance and motion cue information. Comprehensive evaluations on Davis 16 and SegTrack V2 datasets well demonstrate the significant superiority of our proposed AOL-VOS over other state-of-the-arts for video object segmentation task.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 69.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 89.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Venetianer, P.L., et al.: Video surveillance system employing video primitives. Google Patents, US Patent 9,892,606 (2018)

    Google Scholar 

  2. Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masiz, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. 36, 47 (2017)

    Article  Google Scholar 

  3. Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36, 18–32 (2014)

    Article  Google Scholar 

  4. Zhang, Z., Fidler, S., Urtasun, R.: Instance-level segmentation for autonomous driving with deep densely connected MRFs. In: CVPR, pp. 669–677 (2016)

    Google Scholar 

  5. Teichmann, M., Weber, M., Zoellner, M., Cipolla, R., Urtasun, R.: MultiNet: real-time joint semantic reasoning for autonomous driving. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1013–1020 (2018)

    Google Scholar 

  6. Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: CVPR, pp. 1647–1655 (2017)

    Google Scholar 

  7. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)

    Google Scholar 

  8. Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: ICLR, pp. 730–738 (2017)

    Google Scholar 

  9. Hu, Y.-T., Huang, J.-B., Schwing, A.G.: Unsupervised video object segmentation using motion saliency-guided spatio-temporal propagation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 813–830. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_48

    Chapter  Google Scholar 

  10. Li, S., Seybold, B., Vorobyov, A., Fathi, A., Huang, Q., Jay Kuo, C.-C.: Instance embedding transfer to unsupervised video object segmentation. In: CVPR, pp. 6526–6535 (2018)

    Google Scholar 

  11. Li, S., Seybold, B., Vorobyov, A., Lei, X., Kuo, C.-C.J.: Unsupervised video object segmentation with motion-based bilateral networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 215–231. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_13

    Chapter  Google Scholar 

  12. Croitoru, I., Bogolin, S.-V., Leordeanu, M.: Unsupervised learning from video to detect foreground objects in single images. In: ICCV, pp. 4345–4353 (2017)

    Google Scholar 

  13. Tsai, Y.-H., Yang, M.-H., Black, M.J.: Video segmentation via object flow. In: CVPR, pp. 3899–3908 (2016)

    Google Scholar 

  14. Xiao, H., Feng, J., Lin, G., Liu, Y., Zhang, M.: MoNet: deep motion exploitation for video object segmentation. In: CVPR, pp. 1140–1148 (2018)

    Google Scholar 

  15. Hu, P., Wang, G., Kong, X., Kuen, J., Tan, Y.-P.: Motion-guided cascaded refinement network for video object segmentation. In: CVPR, pp. 1400–1409 (2018)

    Google Scholar 

  16. Chen, Y., Pont-Tuset, J., Montes, A., Van Gool, L.: Blazingly fast video object segmentation with pixel-wise metric learning. In: CVPR, pp. 1189–1198 (2018)

    Google Scholar 

  17. Cheng, J., Tsai, Y.-H., Wang, S., Yang, M.-H.: SegFlow: joint learning for video object segmentation and optical flow. In: ICCV, pp. 686–695 (2017)

    Google Scholar 

  18. Xu, M., Fan, C., Wang, Y., Ryoo, M.S., Crandall, D.J.: Joint person segmentation and identification in synchronized first- and third-person videos. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 656–672. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_39

    Chapter  Google Scholar 

  19. Li, X., Loy, C.C.: Video object segmentation with joint re-identification and attention-aware mask propagation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 93–110. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_6

    Chapter  Google Scholar 

  20. Li, X., Loy, C.C.: Video object segmentation with re-identification. In: The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops (2017)

    Google Scholar 

  21. Wug Oh, S., Lee, J.-Y., Sunkavalli, K., Joo Kim, S.: Fast video object segmentation by reference-guided mask propagation. In: CVPR, pp. 7376–7385 (2018)

    Google Scholar 

  22. Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for video object segmentation. In: BMVC, pp. 656–672 (2017)

    Google Scholar 

  23. Khoreva, A., Benenson, R., Ilg, E., Brox, T., Schiele, B.: Lucid data dreaming for object tracking. arXiv preprint arXiv:1703.09554 (2017)

  24. Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J.M.: Video segmentation by tracking many figure-ground segments. In: ICCV, pp. 2192–2199 (2013)

    Google Scholar 

  25. Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR, pp. 724–732 (2016)

    Google Scholar 

  26. Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking Atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)

  27. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with Atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611 (2018)

  28. Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for the 2017 DAVIS challenge on video object segmentation. In: The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops (2017)

    Google Scholar 

  29. Caelles, S., Maninis, K.-K., Pont-Tuset, J., Leal-Taixé, L., Cremers, D., Van Gool, L.: One-shot video object segmentation. In: CVPR, pp. 5320–5329 (2017)

    Google Scholar 

  30. Maninis, K.-K., et al.: Video object segmentation without temporal information. IEEE Trans. Pattern Anal. Mach. Intell. 41(6), 1515–1530 (2018)

    Article  Google Scholar 

  31. Perazzi, F., Khoreva, A., Benenson, R., Schiele, B., Sorkine-Hornung, A.: Learning video object segmentation from static images. In: CVPR, pp. 3491–3500 (2017)

    Google Scholar 

  32. Hu, Y.-T., Huang, J.-B., Schwing, A.: MaskRNN: instance level video object segmentation. In: NIPS, pp. 325–334 (2017)

    Google Scholar 

  33. Ci, H., Wang, C., Wang, Y.: Video object segmentation by learning location-sensitive embeddings. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 524–539. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_31

    Chapter  Google Scholar 

  34. Xiao, H., Kang, B., Liu, Y., Zhang, M., Feng, J.: Online meta adaptation for fast video object segmentation. IEEE Trans. Pattern Anal. Mach. Intell.(2019)

    Google Scholar 

Download references

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61972204, 61906094, U1713208), Tencent AI Lab Rhino-Bird Focused Research Program (No. JR201922), the Fundamental Research Funds for the Central Universities (Nos. 30918011321 and 30919011232).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chunyan Xu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Wei, L., Xu, C., Zhang, T. (2019). Adaptive Online Learning for Video Object Segmentation. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Visual Data Engineering. IScIDE 2019. Lecture Notes in Computer Science(), vol 11935. Springer, Cham. https://doi.org/10.1007/978-3-030-36189-1_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-36189-1_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36188-4

  • Online ISBN: 978-3-030-36189-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics