One-Trimap Video Matting

Conference paper in Computer Vision – ECCV 2022 (ECCV 2022).

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13689).

Abstract

Recent studies have made great progress in video matting by extending the success of trimap-based image matting to the video domain. In this paper, we push this task toward a more practical setting and propose the One-Trimap Video Matting network (OTVM), which performs video matting robustly using only one user-annotated trimap. The key to OTVM is the joint modeling of trimap propagation and alpha prediction. Starting from baseline trimap propagation and alpha prediction networks, OTVM combines the two networks with an alpha-trimap refinement module to facilitate information flow. We also present an end-to-end training strategy to take full advantage of the joint model. Our joint modeling greatly improves the temporal stability of trimap propagation compared to previous decoupled methods. We evaluate our model on two recent video matting benchmarks, Deep Video Matting and VideoMatting108, and outperform the state of the art by significant margins (MSE improvements of 56.4% and 56.7%, respectively). The source code and model are available online: https://github.com/Hongje/OTVM.
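
As a rough illustration of the pipeline the abstract describes, the sketch below traces one plausible inference loop: propagate the trimap to the current frame, predict alpha, then jointly refine alpha and trimap before moving to the next frame. This is a minimal sketch, not the authors' implementation (which is available at the GitHub link above); the module names trimap_net, alpha_net, and refine_net are hypothetical placeholders inferred from the abstract.

    # Illustrative sketch only; all module names are hypothetical placeholders.
    # The authors' actual implementation: https://github.com/Hongje/OTVM
    import torch
    import torch.nn as nn

    class OTVMSketch(nn.Module):
        """One-trimap video matting: joint trimap propagation and alpha prediction."""

        def __init__(self, trimap_net: nn.Module, alpha_net: nn.Module,
                     refine_net: nn.Module):
            super().__init__()
            self.trimap_net = trimap_net   # propagates the trimap frame to frame
            self.alpha_net = alpha_net     # predicts alpha from frame + trimap
            self.refine_net = refine_net   # joint alpha-trimap refinement module

        def forward(self, frames: torch.Tensor, first_trimap: torch.Tensor):
            # frames: (T, 3, H, W); first_trimap: (3, H, W), one-hot FG/BG/unknown
            trimap, alphas = first_trimap, []
            for t in range(frames.shape[0]):
                frame = frames[t]
                if t > 0:
                    # Propagate the previous *refined* trimap forward, so the
                    # refinement step below also stabilizes propagation over time.
                    trimap = self.trimap_net(frame, trimap)
                # Predict alpha conditioned on the current trimap.
                alpha = self.alpha_net(frame, trimap)
                # Jointly refine alpha and trimap so errors in either output are
                # corrected before they accumulate across frames.
                alpha, trimap = self.refine_net(frame, alpha, trimap)
                alphas.append(alpha)
            return torch.stack(alphas)     # (T, 1, H, W) alpha mattes

In a decoupled pipeline, trimap propagation would never see the alpha prediction; the joint refinement step in this loop is what lets errors in either output be corrected before the next frame, which is the paper's stated motivation for joint modeling.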

H. Seong—This work was done during an internship at Adobe Research.


Acknowledgements

This research was supported in part by the Yonsei Signature Research Cluster Program of 2022 (2022-22-0002) and in part by the KIST Institutional Program (Project No. 2E31051-21-204).

Author information


Correspondence to Joon-Young Lee.


Electronic supplementary material

Supplementary material 1 (PDF 4978 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Seong, H., Oh, S.W., Price, B., Kim, E., Lee, J.Y. (2022). One-Trimap Video Matting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13689. Springer, Cham. https://doi.org/10.1007/978-3-031-19818-2_25


  • DOI: https://doi.org/10.1007/978-3-031-19818-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19817-5

  • Online ISBN: 978-3-031-19818-2

  • eBook Packages: Computer Science, Computer Science (R0)
