Abstract
Extracting accurate foregrounds from natural images benefits many downstream applications, such as film production and augmented reality. However, the furry characteristics and varied appearance of foregrounds such as animals and portraits challenge existing matting methods, which usually require extra user inputs such as a trimap or scribbles. To resolve these problems, we study the distinct roles of semantics and details in image matting and decompose the task into two parallel sub-tasks: high-level semantic segmentation and low-level details matting. Specifically, we propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders to learn both tasks collaboratively for end-to-end natural image matting. In addition, because few natural images are available for the matting task, previous methods typically adopt composite images for training and evaluation, which results in limited generalization ability on real-world images. In this paper, we systematically investigate the domain gap between composite and real-world images by conducting comprehensive analyses of various discrepancies between the foreground and background images. We find that a carefully designed composition route, RSSN, which aims to reduce these discrepancies, leads to a better model with remarkable generalization ability. Furthermore, we provide a benchmark containing 2,000 high-resolution real-world animal images and 10,000 portrait images, along with their manually labeled alpha mattes, to serve as a test bed for evaluating matting models' generalization ability on real-world images. Comprehensive empirical studies demonstrate that GFM outperforms state-of-the-art methods and effectively reduces the generalization error. The code and the datasets will be released at https://github.com/JizhiziLi/GFM.
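To make the two ideas in the abstract concrete, the following is a minimal sketch rather than the released implementation. The first function applies the standard compositing equation I = αF + (1 − α)B that is used to build composite training images. The second shows one plausible way the outputs of a semantic ("glance") branch and a details ("focus") branch could be combined into a single alpha matte. The function names, the three-class label encoding, and the merging rule are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

# Hypothetical label encoding for the semantic ("glance") output.
BACKGROUND, TRANSITION, FOREGROUND = 0, 1, 2


def composite(alpha: List[float], fg: List[float], bg: List[float]) -> List[float]:
    """Blend single-channel pixels with I = alpha * F + (1 - alpha) * B."""
    if not (len(alpha) == len(fg) == len(bg)):
        raise ValueError("alpha, fg and bg must have the same length")
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]


def merge_glance_focus(glance: List[int], focus: List[float]) -> List[float]:
    """Combine a coarse 3-class map with per-pixel detail alphas.

    Assumed rule: trust the detail branch only where the semantic
    branch predicts a transition pixel; elsewhere use a hard 0 or 1.
    """
    if len(glance) != len(focus):
        raise ValueError("glance and focus must have the same length")
    merged = []
    for label, detail in zip(glance, focus):
        if label == TRANSITION:
            merged.append(detail)
        elif label == FOREGROUND:
            merged.append(1.0)
        else:
            merged.append(0.0)
    return merged


# Three pixels: pure background, half-transparent, pure foreground.
image = composite([0.0, 0.5, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])

# The detail prediction is kept only for the middle (transition) pixel.
alpha = merge_glance_focus([BACKGROUND, TRANSITION, FOREGROUND], [0.9, 0.3, 0.2])
```

Flat lists of floats stand in for images here to keep the sketch dependency-free; a real pipeline would operate on multi-channel arrays.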
Notes
The source code, datasets, models, and a video demo will be made publicly available at https://github.com/JizhiziLi/GFM.
Additional information
Communicated by Stephen Lin.
This work was supported by Australian Research Council Projects FL-170100117, IH-180100002, IC-190100031.
About this article
Cite this article
Li, J., Zhang, J., Maybank, S.J. et al. Bridging Composite and Real: Towards End-to-End Deep Image Matting. Int J Comput Vis 130, 246–266 (2022). https://doi.org/10.1007/s11263-021-01541-0
DOI: https://doi.org/10.1007/s11263-021-01541-0