Abstract
We propose a novel end-to-end deep learning framework, the Joint Matting Network (JMNet), to automatically generate alpha mattes for human images. We exploit the intrinsic structure of the human body as it appears in images by introducing a pose estimation module, which provides both global structural guidance and a local attention focus for the matting task. Our network model comprises a pose network, a trimap network, a matting network, and a shared encoder that extracts features for all three networks. We also append a trimap refinement module and use a gradient loss to produce a sharper alpha matte. Extensive experiments show that our method outperforms state-of-the-art human matting techniques, and that the shared encoder both improves performance and lowers memory costs. Our model can process real images downloaded from the Internet for use in composition applications.
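The abstract mentions a gradient loss for sharper mattes but does not give its formulation here. As a rough illustration only, the sketch below assumes one common form: the mean absolute difference between the forward-difference gradients of the predicted and ground-truth alpha mattes. The function names (`finite_diff_gradients`, `mean_abs_diff`, `gradient_loss`) and the toy mattes are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Matte = List[List[float]]  # alpha values in [0, 1], indexed as [row][col]


def finite_diff_gradients(alpha: Matte) -> Tuple[Matte, Matte]:
    """Return forward-difference gradients (dx, dy) of a 2D alpha matte."""
    h, w = len(alpha), len(alpha[0])
    dx = [[alpha[y][x + 1] - alpha[y][x] for x in range(w - 1)] for y in range(h)]
    dy = [[alpha[y + 1][x] - alpha[y][x] for x in range(w)] for y in range(h - 1)]
    return dx, dy


def mean_abs_diff(a: Matte, b: Matte) -> float:
    """Mean absolute element-wise difference between two equally sized grids."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count if count else 0.0


def gradient_loss(pred: Matte, target: Matte) -> float:
    """Assumed form: L1 distance between horizontal and vertical gradients."""
    pdx, pdy = finite_diff_gradients(pred)
    tdx, tdy = finite_diff_gradients(target)
    return mean_abs_diff(pdx, tdx) + mean_abs_diff(pdy, tdy)


# Toy example: a hard edge versus a smoothed version of the same edge.
sharp = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
blurry = [[0.0, 0.25, 0.75, 1.0], [0.0, 0.25, 0.75, 1.0]]

loss_same = gradient_loss(sharp, sharp)
loss_blur = gradient_loss(blurry, sharp)
```

Under this assumed form, a prediction that smears an edge across several pixels is penalized relative to one that reproduces the edge exactly, which is the intuition behind using a gradient term to encourage sharper mattes. In a real training setup the same computation would be written with a tensor library so that it is differentiable.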
Acknowledgements
The authors would like to thank all the reviewers. We gratefully acknowledge the support of Jian-Cheng Liu, who helped prepare and preprocess the dataset. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61561146393 and 61521002). Fang-Lue Zhang was supported by a Victoria Early-Career Research Excellence Award.
Author information
Xian Wu is currently a Ph.D. student at Tsinghua University. He received his B.S. degree from Tsinghua University in 2015. His research interests include image synthesis and editing, and deep learning in computer graphics.
Xiao-Nan Fang is currently a Ph.D. student at Tsinghua University. He received his B.S. degree from Tsinghua University in 2018. His research interests include image and video processing, and computer graphics.
Tao Chen received his B.S. degree in fundamental science and Ph.D. degree in computer science from Tsinghua University, China, in 2005 and 2011, respectively. He is currently the Deputy General Manager of the AI Center at Visual China Group and the VP of Machine Learning, 500PX Inc. His research interests include multimedia, computer graphics, and computer vision.
Fang-Lue Zhang is a lecturer at Victoria University of Wellington. He received his doctoral degree from Tsinghua University in 2015 and his bachelor's degree from Zhejiang University in 2009. His research interests include image and video editing, computer vision, and computer graphics. He is a member of ACM and IEEE.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Wu, X., Fang, XN., Chen, T. et al. JMNet: A joint matting network for automatic human matting. Comp. Visual Media 6, 215–224 (2020). https://doi.org/10.1007/s41095-020-0168-6