Abstract
Video matting has made significant progress in the trimap-based setting. However, researchers are increasingly interested in auxiliary-free matting because it is more practical in real-world applications. We propose a new efficient semantic-guidance high-resolution video matting network for the human body. We use a convolutional network as the backbone while also employing a transformer in the encoder to exploit semantic features, without making the network overly bloated. In addition, a channel-wise attention mechanism is introduced in the decoder to improve the representation of semantic features. Compared with current state-of-the-art methods, the proposed method achieves better results while maintaining prediction speed and efficiency, enabling real-time auxiliary-free matting for high-resolution (4K or HD) video.
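The abstract mentions a channel-wise attention mechanism in the decoder but does not specify its form. As a point of reference, a minimal NumPy sketch of a common squeeze-and-excitation-style channel attention block is given below; this is an assumption about the general shape of such a mechanism, not the authors' exact design, and the weight matrices `w1`/`w2` are hypothetical parameters:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """SE-style channel attention over a feature map.

    feat: array of shape (C, H, W)
    w1:   (C // r, C) squeeze weights (r = reduction ratio)
    w2:   (C, C // r) excitation weights
    """
    # Squeeze: global average pooling over spatial dimensions -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate
    h = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # per-channel weights in (0, 1)
    # Re-weight each channel of the input feature map
    return feat * s[:, None, None]
```

The sigmoid gate lets the decoder emphasize channels carrying strong semantic cues (e.g. body boundaries) and suppress less informative ones, at negligible computational cost relative to the convolutional layers.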
Supported by National Natural Science Foundation of China (61807002).
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yu, Y., Li, D., Yang, Y. (2024). Efficient Semantic-Guidance High-Resolution Video Matting. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14495. Springer, Cham. https://doi.org/10.1007/978-3-031-50069-5_13
Print ISBN: 978-3-031-50068-8
Online ISBN: 978-3-031-50069-5