Adaptive Guidance and Attention-Refined Network for Fast Video Object Segmentation

Li, Yaqian; Li, Moran; Xiao, Cunjun; Li, Haibin

doi:10.1007/s11063-023-11257-6

Published: 15 July 2023

Volume 55, pages 7211–7225 (2023)

Neural Processing Letters

Yaqian Li¹,² (ORCID: orcid.org/0000-0003-1032-9910), Moran Li¹, Cunjun Xiao¹ & Haibin Li¹

Abstract

Most video object segmentation networks struggle to balance accuracy and speed, so they fail to meet the requirements of practical applications. In this paper, we propose a lightweight online-trained video object segmentation network. Specifically, to force the network to focus on the potential object, we propose a new way to guide the encoder module with a classification score map, and we integrate cross-dimension attention into the refinement segmentation module. Meanwhile, to reduce the negative influence of unreliable samples, we use two indexes to adaptively choose templates for the memory module. Experiments on three popular benchmarks show that our approach achieves a good trade-off between accuracy and speed.
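
The adaptive template selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which two indexes the authors use, so everything named here is an assumption made for illustration: the class `TemplateMemory`, the two indexes (the peak of a score map and its "sharpness", taken as peak minus mean), the thresholds, and the oldest-first eviction rule. It shows only the general idea of gating memory updates on two reliability measures, not the paper's method.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TemplateMemory:
    """Fixed-size template store that admits only frames judged reliable.

    Both indexes and both thresholds are hypothetical stand-ins; the abstract
    does not specify which two indexes the authors use.
    """
    capacity: int = 3
    min_peak: float = 0.6       # assumed threshold on the score-map peak
    min_sharpness: float = 0.3  # assumed threshold on (peak - mean)
    templates: deque = field(default_factory=deque)

    def reliability(self, score_map):
        """Return (peak, sharpness) for a 2-D score map given as nested lists."""
        values = [v for row in score_map for v in row]
        peak = max(values)
        sharpness = peak - sum(values) / len(values)
        return peak, sharpness

    def update(self, frame_id, score_map):
        """Store frame_id only if both indexes pass; return the decision."""
        peak, sharpness = self.reliability(score_map)
        if peak < self.min_peak or sharpness < self.min_sharpness:
            return False  # unreliable sample: leave the memory unchanged
        if len(self.templates) == self.capacity:
            self.templates.popleft()  # evict the oldest template
        self.templates.append(frame_id)
        return True
```

Under these assumptions, a frame whose score map has a single strong peak is stored, while a frame with a weak peak or a flat, ambiguous map is skipped, so unreliable samples never reach the memory.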

Acknowledgements

This study was supported by the National Natural Science Foundation of China under the project "Research on occlusion perception, repair and reliability evaluation method for occlusion face recognition" (No. 62106214), and by the Provincial Key Laboratory Performance Subsidy Project (No. 22567612 H).

Author information

Authors and Affiliations

School of Electrical Engineering, Yanshan University, Qinhuangdao, 066004, China
Yaqian Li, Moran Li, Cunjun Xiao & Haibin Li
Key Lab of Industrial Computer Control Engineering of Hebei Province, Qinhuangdao, 066004, China
Yaqian Li

Corresponding author

Correspondence to Moran Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Y., Li, M., Xiao, C. et al. Adaptive Guidance and Attention-Refined Network for Fast Video Object Segmentation. Neural Process Lett 55, 7211–7225 (2023). https://doi.org/10.1007/s11063-023-11257-6

Accepted: 13 March 2023
Published: 15 July 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11063-023-11257-6
