Bridging Composite and Real: Towards End-to-End Deep Image Matting

International Journal of Computer Vision

Abstract

Extracting accurate foregrounds from natural images benefits many downstream applications such as film production and augmented reality. However, the furry characteristics and varied appearances of foregrounds such as animals and portraits challenge existing matting methods, which usually require extra user inputs such as a trimap or scribbles. To resolve these problems, we study the distinct roles of semantics and details for image matting and decompose the task into two parallel sub-tasks: high-level semantic segmentation and low-level details matting. Specifically, we propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders to learn both tasks in a collaborative manner for end-to-end natural image matting. Besides, due to the limited availability of natural images for the matting task, previous methods typically adopt composite images for training and evaluation, which results in limited generalization ability on real-world images. In this paper, we systematically investigate the domain gap between composite images and real-world images by conducting comprehensive analyses of various discrepancies between the foreground and background images. We find that a carefully designed composition route, RSSN, that aims to reduce these discrepancies can lead to a better model with remarkable generalization ability. Furthermore, we provide a benchmark containing 2,000 high-resolution real-world animal images and 10,000 portrait images along with their manually labeled alpha mattes to serve as a test bed for evaluating matting models' generalization ability on real-world images. Comprehensive empirical studies demonstrate that GFM outperforms state-of-the-art methods and effectively reduces the generalization error. The code and the datasets will be released at https://github.com/JizhiziLi/GFM.
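
For readers unfamiliar with how composite training images are typically produced, the following minimal sketch applies the standard compositing equation I = αF + (1 − α)B, where F is the foreground, B the background, and α the alpha matte. It is an illustration only and is not the RSSN composition route proposed in the paper; the function name `composite` and the toy arrays are assumptions made for this example.

```python
import numpy as np


def composite(foreground, background, alpha):
    """Blend a foreground over a background: I = alpha * F + (1 - alpha) * B.

    Hypothetical helper for illustration. Expects HxWx3 images and an HxW
    alpha matte with values in [0, 1].
    """
    fg = np.asarray(foreground, dtype=np.float64)
    bg = np.asarray(background, dtype=np.float64)
    a = np.asarray(alpha, dtype=np.float64)
    if fg.shape != bg.shape or a.shape != fg.shape[:2]:
        raise ValueError("expected HxWx3 images and an HxW alpha matte")
    a = a[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * fg + (1.0 - a) * bg


# Toy example: a white foreground over a black background.
fg = np.full((2, 2, 3), 1.0)
bg = np.zeros((2, 2, 3))
alpha = np.array([[1.0, 0.5], [0.25, 0.0]])
image = composite(fg, bg, alpha)
```

Pixels with α = 1 take the foreground colour, pixels with α = 0 take the background colour, and fractional values (as around fur or hair) mix the two.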

Notes

  1. The source code, datasets, models, and a video demo will be made publicly available at https://github.com/JizhiziLi/GFM.

  2. https://unsplash.com/ and https://www.pexels.com/

Author information

Corresponding author

Correspondence to Dacheng Tao.

Additional information

Communicated by Stephen Lin.

This work was supported by Australian Research Council Projects FL-170100117, IH-180100002, IC-190100031.

Cite this article

Li, J., Zhang, J., Maybank, S.J. et al. Bridging Composite and Real: Towards End-to-End Deep Image Matting. Int J Comput Vis 130, 246–266 (2022). https://doi.org/10.1007/s11263-021-01541-0

  • DOI: https://doi.org/10.1007/s11263-021-01541-0
