Video object segmentation based on temporal frame context information fusion and feature enhancement

Applied Intelligence

Abstract

At present, many video object segmentation algorithms use only a small amount of frame information to guide the segmentation of the current frame and fail to fully exploit the information in historical frames. This makes it difficult for the network model to adapt to complex environmental changes and causes object drift. At the same time, the mask refinement method is coarse, resulting in blurred edges in the generated mask. To address these problems, this paper proposes a video object segmentation algorithm based on temporal frame context information fusion and feature enhancement. First, to make full use of historical frame information, a temporal frame residual fusion module is proposed to adaptively fuse historical frame information. Second, a spatial cascade mask refinement module is established to enhance the spatial information of the shallow features of the backbone network and to refine the edge information of the fused features. The experimental results show that our algorithm achieves J&F scores of 87.4% and 76.6% on DAVIS2016 and DAVIS2017, respectively, and that its segmentation speed meets real-time requirements, reaching 26 FPS on the DAVIS2016 validation set. Compared with many mainstream algorithms of recent years, it has clear advantages in performance.
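The abstract reports J&F without defining it. On the DAVIS benchmarks, J is the region similarity (the Jaccard index, or intersection over union, between the predicted and ground-truth masks), F is the contour accuracy (a boundary F-measure), and J&F is their mean. The sketch below illustrates only the J term and the final averaging; the function names and the toy masks are hypothetical, the F value passed in is a made-up number, and none of this is taken from the authors' evaluation code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Jaccard index J = |pred AND gt| / |pred OR gt| for two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean: float, f_mean: float) -> float:
    """J&F is the arithmetic mean of the J score and the F score."""
    return (j_mean + f_mean) / 2.0


# Toy 2x3 masks (hypothetical): 2 overlapping pixels, 4 pixels in the union.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]

j = region_similarity(pred, gt)
# The F value here is an illustrative placeholder, not a computed contour score.
score = j_and_f(j, 0.7)
```

In a full evaluation, J and F are computed per frame and per object and then averaged over the dataset; the contour term F additionally requires extracting and matching mask boundaries, which is omitted here.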

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under grant no. 62072370.

Author information

Corresponding author

Correspondence to Fucheng Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hou, Z., Li, F., Wang, S. et al. Video object segmentation based on temporal frame context information fusion and feature enhancement. Appl Intell 53, 6496–6510 (2023). https://doi.org/10.1007/s10489-022-03693-z

  • DOI: https://doi.org/10.1007/s10489-022-03693-z
