Bridging Composite and Real: Towards End-to-End Deep Image Matting

Li, Jizhizi; Zhang, Jing; Maybank, Stephen J.; Tao, Dacheng

doi:10.1007/s11263-021-01541-0

Published: 04 January 2022

Volume 130, pages 246–266, (2022)

International Journal of Computer Vision

Abstract

Extracting accurate foregrounds from natural images benefits many downstream applications such as film production and augmented reality. However, the furry characteristics and varied appearance of the foregrounds, e.g., animals and portraits, challenge existing matting methods, which usually require extra user inputs such as a trimap or scribbles. To resolve these problems, we study the distinct roles of semantics and details in image matting and decompose the task into two parallel sub-tasks: high-level semantic segmentation and low-level details matting. Specifically, we propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders to learn both tasks collaboratively for end-to-end natural image matting. Moreover, because few natural images are available for the matting task, previous methods typically adopt composite images for training and evaluation, which results in limited generalization ability on real-world images. In this paper, we systematically investigate the domain gap between composite images and real-world images by conducting comprehensive analyses of various discrepancies between the foreground and background images. We find that a carefully designed composition route, RSSN, which aims to reduce these discrepancies, can lead to a better model with remarkable generalization ability. Furthermore, we provide a benchmark containing 2,000 high-resolution real-world animal images and 10,000 portrait images, along with their manually labeled alpha mattes, to serve as a test bed for evaluating a matting model's generalization ability on real-world images. Comprehensive empirical studies demonstrate that GFM outperforms state-of-the-art methods and effectively reduces the generalization error. The code and the datasets will be released at https://github.com/JizhiziLi/GFM.
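To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. The `composite` function applies the standard alpha compositing equation, I = alpha * F + (1 - alpha) * B, which is how composite training images are generally produced in the matting literature. The `merge_glance_focus` function is purely an assumption for illustration: it supposes that the semantic decoder outputs a three-class map (background, transition, foreground) and that the details decoder is trusted only inside the transition area. The function names, class encoding, and array shapes are all hypothetical.

```python
import numpy as np


def composite(fg, bg, alpha):
    """Blend a foreground over a background with the standard matting equation.

    I = alpha * F + (1 - alpha) * B, with alpha in [0, 1].
    fg, bg: float arrays of shape (H, W, 3); alpha: float array of shape (H, W).
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    return a * fg + (1.0 - a) * bg


def merge_glance_focus(glance, focus):
    """Hypothetical merge of the two decoder outputs (an assumption, not the paper's code).

    glance: int array (H, W) with 0 = background, 1 = transition, 2 = foreground.
    focus:  float array (H, W) of alpha estimates, used only in the transition area.
    """
    alpha = np.where(glance == 2, 1.0, 0.0)     # definite foreground or background
    return np.where(glance == 1, focus, alpha)  # detail alpha inside the transition


fg = np.full((2, 2, 3), 1.0)  # white foreground
bg = np.zeros((2, 2, 3))      # black background
glance = np.array([[2, 1], [1, 0]])
focus = np.array([[0.3, 0.6], [0.25, 0.9]])

alpha = merge_glance_focus(glance, focus)
image = composite(fg, bg, alpha)
```

In this toy example the focus values at the definite foreground and background pixels are ignored, so only the two transition pixels take their alpha from the details branch.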

Notes

The source code, datasets, models, and a video demo will be made publicly available at https://github.com/JizhiziLi/GFM.
https://unsplash.com/ and https://www.pexels.com/

Author information

J. Li and J. Zhang are co-first authors and contributed equally to this work.

Authors and Affiliations

School of Computer Science, Faculty of Engineering, The University of Sydney, Darlington, NSW, 2008, Australia
Jizhizi Li, Jing Zhang & Dacheng Tao
Department of Computer Science and Information System, Birkbeck College, University of London, London, UK
Stephen J. Maybank

Corresponding author

Correspondence to Dacheng Tao.

Additional information

Communicated by Stephen Lin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by Australian Research Council Projects FL-170100117, IH-180100002, IC-190100031.

About this article

Cite this article

Li, J., Zhang, J., Maybank, S.J. et al. Bridging Composite and Real: Towards End-to-End Deep Image Matting. Int J Comput Vis 130, 246–266 (2022). https://doi.org/10.1007/s11263-021-01541-0

Received: 16 May 2021
Accepted: 22 October 2021
Published: 04 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11263-021-01541-0
