Abstract
Previous video matting methods suffer from two problems: they require additional auxiliary information, and they lack temporal consistency. To address these problems, we propose STMI-Net, a novel video matting framework based on temporal-spatial information mining and aggregation. The framework requires no auxiliary information and adopts a dual-decoder structure: one decoder is a recurrent network that fully exploits the temporal information across video frames to ensure temporal coherence in the results, while the other is a convolutional network that deeply restores frame-by-frame spatial features to achieve spatial continuity. By aggregating these two streams of information at the global level, our model achieves 0.0066 MSE on the VideoMatte240K dataset, surpassing the RVM baseline by 13%, and 0.0047 MSE on the PPM-100 portrait matting dataset, surpassing the MG baseline by 26.5%. We also conduct an ablation study to demonstrate the respective contributions of the temporal decoder and the spatial decoder.
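For context on the MSE figures quoted above, alpha-matte MSE is the per-pixel squared error between the predicted and ground-truth alpha values, averaged over the frame. A minimal sketch in plain Python (the function name and toy values are illustrative, not taken from the paper):

```python
def matte_mse(pred, gt):
    """Mean squared error between a predicted alpha matte and ground truth.

    pred, gt: flat sequences of per-pixel alpha values in [0, 1].
    """
    if len(pred) != len(gt):
        raise ValueError("mattes must have the same number of pixels")
    # Average of squared per-pixel differences.
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)

# Toy example with four pixels: a perfect prediction gives MSE 0.0.
print(matte_mse([0.0, 0.5, 1.0, 0.2], [0.0, 0.5, 1.0, 0.2]))  # -> 0.0
```

In practice the metric is computed over full-resolution mattes (and often averaged across all frames of a clip), but the per-pixel formula is the same.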
Data availability
The data supporting the findings of this study are available from the cited references (cited in the introduction of each dataset, with the permission of the relevant authors), but restrictions apply: these data were used under license for the current study and are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of the authors of the relevant references.
Funding
This work is supported by the Youth Innovation Talent Support Program of Harbin University of Commerce (No. 2020CX39).
Ethics declarations
Consent to participate
All authors have been involved in this work.
Consent for publication
All authors approved the manuscript and agreed to its submission.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, Z., Yao, G. Temporal-spatial information mining and aggregation for video matting. Multimed Tools Appl 83, 29221–29237 (2024). https://doi.org/10.1007/s11042-023-16747-2