One-Trimap Video Matting

Conference paper in Computer Vision – ECCV 2022 (ECCV 2022).

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13689).

Abstract

Recent studies have made great progress in video matting by extending the success of trimap-based image matting to the video domain. In this paper, we push this task toward a more practical setting and propose the One-Trimap Video Matting network (OTVM), which performs video matting robustly using only one user-annotated trimap. The key to OTVM is the joint modeling of trimap propagation and alpha prediction. Starting from baseline trimap propagation and alpha prediction networks, OTVM combines the two networks with an alpha-trimap refinement module to facilitate information flow. We also present an end-to-end training strategy to take full advantage of the joint model. Our joint modeling greatly improves the temporal stability of trimap propagation compared to previous decoupled methods. We evaluate our model on two recent video matting benchmarks, Deep Video Matting and VideoMatting108, and outperform the state of the art by significant margins (MSE improvements of 56.4% and 56.7%, respectively). The source code and model are available online: https://github.com/Hongje/OTVM.
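
As a rough illustration of the pipeline the abstract describes, the sketch below traces one plausible inference loop: propagate the trimap to the current frame, predict alpha, then jointly refine alpha and trimap before moving to the next frame. This is a minimal sketch, not the authors' implementation (which is available at the GitHub link above); the module names trimap_net, alpha_net, and refine_net are hypothetical placeholders inferred from the abstract.

    # Illustrative sketch only; all module names are hypothetical placeholders.
    # The authors' actual implementation: https://github.com/Hongje/OTVM
    import torch
    import torch.nn as nn

    class OTVMSketch(nn.Module):
        """One-trimap video matting: joint trimap propagation and alpha prediction."""

        def __init__(self, trimap_net: nn.Module, alpha_net: nn.Module,
                     refine_net: nn.Module):
            super().__init__()
            self.trimap_net = trimap_net   # propagates the trimap frame to frame
            self.alpha_net = alpha_net     # predicts alpha from frame + trimap
            self.refine_net = refine_net   # joint alpha-trimap refinement module

        def forward(self, frames: torch.Tensor, first_trimap: torch.Tensor):
            # frames: (T, 3, H, W); first_trimap: (3, H, W), one-hot FG/BG/unknown
            trimap, alphas = first_trimap, []
            for t in range(frames.shape[0]):
                frame = frames[t]
                if t > 0:
                    # Propagate the previous *refined* trimap forward, so the
                    # refinement step below also stabilizes propagation over time.
                    trimap = self.trimap_net(frame, trimap)
                # Predict alpha conditioned on the current trimap.
                alpha = self.alpha_net(frame, trimap)
                # Jointly refine alpha and trimap so errors in either output are
                # corrected before they accumulate across frames.
                alpha, trimap = self.refine_net(frame, alpha, trimap)
                alphas.append(alpha)
            return torch.stack(alphas)     # (T, 1, H, W) alpha mattes

In a decoupled pipeline, trimap propagation would never see the alpha prediction; the joint refinement step in this loop is what lets errors in either output be corrected before the next frame, which is the paper's stated motivation for joint modeling.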

H. Seong—This work was done during an internship at Adobe Research.


Acknowledgements

This research was supported in part by the Yonsei Signature Research Cluster Program of 2022 (2022-22-0002) and in part by the KIST Institutional Program (Project No. 2E31051-21-204).

Author information


Correspondence to Joon-Young Lee.


Electronic supplementary material

Supplementary material 1 (PDF 4978 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Seong, H., Oh, S.W., Price, B., Kim, E., Lee, J.Y. (2022). One-Trimap Video Matting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13689. Springer, Cham. https://doi.org/10.1007/978-3-031-19818-2_25


  • DOI: https://doi.org/10.1007/978-3-031-19818-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19817-5

  • Online ISBN: 978-3-031-19818-2

  • eBook Packages: Computer Science, Computer Science (R0)
