Adaptive Online Learning for Video Object Segmentation

Wei, Li; Xu, Chunyan; Zhang, Tong

doi:10.1007/978-3-030-36189-1_2

Li Wei¹³,
Chunyan Xu¹³ &
Tong Zhang¹³

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11935))

Included in the following conference series:

International Conference on Intelligent Science and Big Data Engineering

1427 Accesses

Abstract

In this work, we address the problem of video object segmentation (VOS), namely segmenting specific objects throughout a video sequence when given only an annotated first frame. Previous VOS methods based on deep neural networks often solves this problem by fine-tuning the segmentation model in the first frame of the test video sequence, which is time-consuming and can not be well adapted to the current target video. In this paper, we proposed the adaptive online learning for video object segmentation (AOL-VOS), which adaptively optimizes the network parameters and hyperparameters of segmentation model for better predicting the segmentation results. Specifically, we first pre-train the segmentation model with the static video frames and then learn the effective adaptation strategy on the training set by optimizing both network parameters and hyperparameters. In the testing process, we learn how to online adapt the learned segmentation model to the specific testing video sequence and the corresponding future video frames, where the confidence patterns is employed to constrain/guide the implementation of adaptive learning process by fusing both object appearance and motion cue information. Comprehensive evaluations on Davis 16 and SegTrack V2 datasets well demonstrate the significant superiority of our proposed AOL-VOS over other state-of-the-arts for video object segmentation task.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 69.99; Price excludes VAT (USA)

Softcover Book: USD 89.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Venetianer, P.L., et al.: Video surveillance system employing video primitives. Google Patents, US Patent 9,892,606 (2018)
Google Scholar
Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masiz, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. 36, 47 (2017)
Article Google Scholar
Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36, 18–32 (2014)
Article Google Scholar
Zhang, Z., Fidler, S., Urtasun, R.: Instance-level segmentation for autonomous driving with deep densely connected MRFs. In: CVPR, pp. 669–677 (2016)
Google Scholar
Teichmann, M., Weber, M., Zoellner, M., Cipolla, R., Urtasun, R.: MultiNet: real-time joint semantic reasoning for autonomous driving. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1013–1020 (2018)
Google Scholar
Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: CVPR, pp. 1647–1655 (2017)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
Google Scholar
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: ICLR, pp. 730–738 (2017)
Google Scholar
Hu, Y.-T., Huang, J.-B., Schwing, A.G.: Unsupervised video object segmentation using motion saliency-guided spatio-temporal propagation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 813–830. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_48
Chapter Google Scholar
Li, S., Seybold, B., Vorobyov, A., Fathi, A., Huang, Q., Jay Kuo, C.-C.: Instance embedding transfer to unsupervised video object segmentation. In: CVPR, pp. 6526–6535 (2018)
Google Scholar
Li, S., Seybold, B., Vorobyov, A., Lei, X., Kuo, C.-C.J.: Unsupervised video object segmentation with motion-based bilateral networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 215–231. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_13
Chapter Google Scholar
Croitoru, I., Bogolin, S.-V., Leordeanu, M.: Unsupervised learning from video to detect foreground objects in single images. In: ICCV, pp. 4345–4353 (2017)
Google Scholar
Tsai, Y.-H., Yang, M.-H., Black, M.J.: Video segmentation via object flow. In: CVPR, pp. 3899–3908 (2016)
Google Scholar
Xiao, H., Feng, J., Lin, G., Liu, Y., Zhang, M.: MoNet: deep motion exploitation for video object segmentation. In: CVPR, pp. 1140–1148 (2018)
Google Scholar
Hu, P., Wang, G., Kong, X., Kuen, J., Tan, Y.-P.: Motion-guided cascaded refinement network for video object segmentation. In: CVPR, pp. 1400–1409 (2018)
Google Scholar
Chen, Y., Pont-Tuset, J., Montes, A., Van Gool, L.: Blazingly fast video object segmentation with pixel-wise metric learning. In: CVPR, pp. 1189–1198 (2018)
Google Scholar
Cheng, J., Tsai, Y.-H., Wang, S., Yang, M.-H.: SegFlow: joint learning for video object segmentation and optical flow. In: ICCV, pp. 686–695 (2017)
Google Scholar
Xu, M., Fan, C., Wang, Y., Ryoo, M.S., Crandall, D.J.: Joint person segmentation and identification in synchronized first- and third-person videos. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 656–672. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_39
Chapter Google Scholar
Li, X., Loy, C.C.: Video object segmentation with joint re-identification and attention-aware mask propagation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 93–110. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_6
Chapter Google Scholar
Li, X., Loy, C.C.: Video object segmentation with re-identification. In: The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops (2017)
Google Scholar
Wug Oh, S., Lee, J.-Y., Sunkavalli, K., Joo Kim, S.: Fast video object segmentation by reference-guided mask propagation. In: CVPR, pp. 7376–7385 (2018)
Google Scholar
Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for video object segmentation. In: BMVC, pp. 656–672 (2017)
Google Scholar
Khoreva, A., Benenson, R., Ilg, E., Brox, T., Schiele, B.: Lucid data dreaming for object tracking. arXiv preprint arXiv:1703.09554 (2017)
Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J.M.: Video segmentation by tracking many figure-ground segments. In: ICCV, pp. 2192–2199 (2013)
Google Scholar
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR, pp. 724–732 (2016)
Google Scholar
Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking Atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with Atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611 (2018)
Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for the 2017 DAVIS challenge on video object segmentation. In: The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops (2017)
Google Scholar
Caelles, S., Maninis, K.-K., Pont-Tuset, J., Leal-Taixé, L., Cremers, D., Van Gool, L.: One-shot video object segmentation. In: CVPR, pp. 5320–5329 (2017)
Google Scholar
Maninis, K.-K., et al.: Video object segmentation without temporal information. IEEE Trans. Pattern Anal. Mach. Intell. 41(6), 1515–1530 (2018)
Article Google Scholar
Perazzi, F., Khoreva, A., Benenson, R., Schiele, B., Sorkine-Hornung, A.: Learning video object segmentation from static images. In: CVPR, pp. 3491–3500 (2017)
Google Scholar
Hu, Y.-T., Huang, J.-B., Schwing, A.: MaskRNN: instance level video object segmentation. In: NIPS, pp. 325–334 (2017)
Google Scholar
Ci, H., Wang, C., Wang, Y.: Video object segmentation by learning location-sensitive embeddings. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 524–539. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_31
Chapter Google Scholar
Xiao, H., Kang, B., Liu, Y., Zhang, M., Feng, J.: Online meta adaptation for fast video object segmentation. IEEE Trans. Pattern Anal. Mach. Intell.(2019)
Google Scholar

Download references

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61972204, 61906094, U1713208), Tencent AI Lab Rhino-Bird Focused Research Program (No. JR201922), the Fundamental Research Funds for the Central Universities (Nos. 30918011321 and 30919011232).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Li Wei, Chunyan Xu & Tong Zhang

Authors

Li Wei
View author publications
You can also search for this author in PubMed Google Scholar
Chunyan Xu
View author publications
You can also search for this author in PubMed Google Scholar
Tong Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chunyan Xu .

Editor information

Editors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Zhen Cui
Nanjing University of Science and Technology, Nanjing, China
Jinshan Pan
Nanjing University of Science and Technology, Nanjing, China
Shanshan Zhang
Nanjing University of Science and Technology, Nanjing, China
Liang Xiao
Nanjing University of Science and Technology, Nanjing, China
Jian Yang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wei, L., Xu, C., Zhang, T. (2019). Adaptive Online Learning for Video Object Segmentation. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Visual Data Engineering. IScIDE 2019. Lecture Notes in Computer Science(), vol 11935. Springer, Cham. https://doi.org/10.1007/978-3-030-36189-1_2

Download citation

DOI: https://doi.org/10.1007/978-3-030-36189-1_2
Published: 29 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36188-4
Online ISBN: 978-3-030-36189-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics