Abstract
At present, a large number of video object segmentation algorithms only use a small amount of frame information to guide the segmentation of the current frame, but fail to fully exploit the information of the historical frames, which makes the network model difficult for the network model to adapt to complex environmental changes, causing the phenomenon of object drift; at the same time, the mask refinement method is also rough, resulting in blurred edges of the generated mask. To solve this problem, this paper proposes a video object segmentation algorithms based on temporal frame context information fusion and feature enhancement. First, in order to make full use of historical frame information, this paper proposes a temporal frame residual fusion module to adaptively fuse historical frame information. Second, a spatial cascade mask refinement module is established to enhance the spatial information of the shallow features of the backbone network and refine the edge information of the fusion features. The experimental results show that our algorithm achieves the performance (J&F) of 87.4% and 76.6% on DAVIS2016 and DAVIS2017 respectively and the segmentation speed (FPS) also meets the real-time requirements, reaching 26FPS on DAVIS2016 validation set. Contrast to many mainstream algorithms in recent years, it has obvious advantages in performance.
Similar content being viewed by others
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Caelles S, Maninis K-K, Pont-Tuset J, Leal-Taixé L, Cremers D, Gool LV (2017) One-shot video object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 221–230
Voigtlaender P, Leibe B (2017) Online adaptation of convolutional neural networks for the 2017 davis challenge on video object segmentation. In: The 2017 DAVIS challenge on video object segmentation-CVPR workshops, vol 5
Perazzi F, Khoreva A, Benenson R, Schiele B, Sorkine-Hornung A (2017) Learning video object segmentation from static images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2663–2672
Chen Y, Pont-Tuset J, Montes A, Gool LV (2018) Blazingly fast video object segmentation with pixel-wise metric learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1189–1198
Hu Y-T, Huang J-B, Schwing AG (2018) Videomatch: Matching based video object segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 54–70
Cheng J, Tsai Y-H, Hung W-C, Wang S, Yang M-H (2018) Fast and accurate online video object segmentation via tracking parts. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7415–7424
Li X, Loy CC (2018) Video object segmentation with joint re-identification and attention-aware mask propagation. In: Proceedings of the European conference on computer vision (ECCV), pp 90–105
Wang Q, Zhang L, Bertinetto L, Hu W, Torr PHS (2019) Fast online object tracking and segmentation: A unifying approach. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1328–1338
Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE Computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, pp 539–546
Zeng X, Liao R, Li G u, Xiong Y, Fidler S, Urtasun R (2019) Dmm-net: Differentiable mask-matching network for video object segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3929–3938
Huang W, Gu J, Ma X, Li Y (2020) End-to-end multitask siamese network with residual hierarchical attention for real-time object tracking. Appl Intell 50(6):1908–1921
Yang L, Wang Y, Xiong X, Yang J, Katsaggelos AK (2018) Efficient video object segmentation via network modulation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6499–6507
Oh SW, Lee J-Y, Sunkavalli K, Kim SJ (2018) Fast video object segmentation by reference-guided mask propagation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7376–7385
Wang Z, Xu J, Li L, Zhu F, Shao L (2019) Ranet: Ranking attention network for fast video object segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3978–3987
Johnander J, Danelljan M, Brissman E, Khan FS, Felsberg M (2019) A generative appearance model for end-to-end video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8953–8962
Wang H, Liu W, Xing W (2022) A temporal attention based appearance model for video object segmentation. Appl Intell 52(2):2290–2300
Yin Y, De X u, Wang X, Zhang L (2021) Directional deep embedding and appearance learning for fast video object segmentation. IEEE Transactions on Neural Networks and Learning Systems
Voigtlaender P, Chai Y, Schroff F, Adam H, Leibe B, Chen L-C (2019) Feelvos: Fast end-to-end embedding learning for video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9481– 9490
Fu L, Yu Z, Sun X, Huang J, Wang D, Yu D (2021) Video object segmentation based on motion-aware roi prediction and adaptive reference updating. Expert Syst Appl 167:114153
Oh SW, Lee J-Y, Xu N, Kim SJ (2019) Video object segmentation using space-time memory networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9226–9235
Li Y, Shen Z, Shan Y (2020) Fast video object segmentation using the global context module. In: European conference on computer vision. Springer, pp 735–750
Seong H, Hyun J, Kim E (2020) Kernelized memory network for video object segmentation. In: European conference on computer vision. Springer, pp 629–645
Singh KK, Lee JY (2017) Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In: 2017 IEEE International conference on computer vision (ICCV). IEEE, pp 3544–3553
Lu X, Wang W, Shen J, Crandall D, Luo J (2020) Zero-shot video object segmentation with co-attention siamese networks. IEEE Transactions on Pattern Analysis and Machine Intelligence
Lu X, Wang W, Danelljan M, Zhou T, Shen J, Gool LV (2020) Video object segmentation with episodic graph memory networks. In: European conference on computer vision. Springer, pp 661–679
Lu X, Wangm W, Shen J, Crandall D, Van Gool L (2021) Segmenting objects from relational visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence
Zhang Y, Wu Z, Peng H, Lin S (2020) A transductive approach for video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6949–6958
Zhang L, Gonzalez-Garcia A, Van De Weijer J, Danelljan M, Khan FS (2019) Learning the model update for siamese trackers. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4010–4019
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115(3):211–252
Robinson A, Lawin FJ, Danelljan M, Khan FS, Felsberg M (2020) Learning fast and robust target models for video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7406–7415
Khoreva A, Benenson R, Ilg E, Brox T, Schiele B (2017) Lucid data dreaming for object tracking. In: The DAVIS challenge on video object segmentation
Bao L, Wu B, Liu W (2018) Cnn in mrf Video object segmentation via inference in a cnn-based higher-order spatio-temporal mrf. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5977–5986
Xu K, Wen L, Li G, Bo L, Huang Q (2019) Spatiotemporal cnn for video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1379–1388
Xi C, Li Z, Ye Y, Yu G, Shen J, Qi D (2020) State-aware tracker for real-time video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9384–9393
Park H, Yoo J, Jeong S, Venkatesh G, Kwak N (2021) Learning dynamic network using a reuse gate function in semi-supervised video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8405–8414
Yang S, Lu Z, Qi J, Lu H, Wang S, Zhang X (2021) Learning motion-appearance co-attention for zero-shot video object segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1564–1573
Ji G-P, Fu K, Wu Z, Fan D-P, Shen J, Shao L (2021) Full-duplex strategy for video object segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4922–4933
Sun M, Xiao J, Lim EG, Xie Y, Feng J (2020) Adaptive roi generation for video object segmentation using reinforcement learning. Pattern Recogn 106:107465
Maninis K-K, Caelles S, Chen Y, Pont-Tuset J, Leal-Taixé L, Cremers D, Gool LV (2018) Video object segmentation without temporal information. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(6):1515–1530
Lin H, Qi X, Jia J (2019) Agss-vos: Attention guided single-shot video object segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3949– 3957
Voigtlaender P, Luiten J, Torr PHS, Leibe B (2020) R-cnn: Siam Visual tracking by re-detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6578–6588
Huang X, Xu J, Tai Y-W, Tang C-K (2020) Fast video object segmentation with temporal aggregation network and dynamic template matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8879– 8889
Ge W, Lu X, Shen J (2021) Video object segmentation using global and instance embedding learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16836–16845
Duarte K, Rawat YS, Shah M (2019) Capsulevos: Semi-supervised video object segmentation using capsule routing. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8480–8489
Ventura C, Bellver M, Girbau A, Salvador A, Marques F, Nieto XG-I (2019) Rvos: End-to-end recurrent network for video object segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5277–5286
Acknowledgements
This work is supported by the National Natural Science Foundation of China under grant no. 62072370.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Hou, Z., Li, F., Wang, S. et al. Video object segmentation based on temporal frame context information fusion and feature enhancement. Appl Intell 53, 6496–6510 (2023). https://doi.org/10.1007/s10489-022-03693-z
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-03693-z