JMNet: A joint matting network for automatic human matting

Wu, Xian; Fang, Xiao-Nan; Chen, Tao; Zhang, Fang-Lue

doi:10.1007/s41095-020-0168-6

Research Article
Open access
Published: 14 April 2020

Volume 6, pages 215–224, (2020)
Computational Visual Media


Abstract

We propose a novel end-to-end deep learning framework, the Joint Matting Network (JMNet), to automatically generate alpha mattes for human images. We utilize the intrinsic structures of the human body as seen in images by introducing a pose estimation module, which can provide both global structural guidance and a local attention focus for the matting task. Our network model includes a pose network, a trimap network, a matting network, and a shared encoder that extracts features for these three networks. We also append a trimap refinement module and use a gradient loss to produce a sharper alpha matte. Extensive experiments show that our method outperforms state-of-the-art human matting techniques, and that the shared encoder leads to better performance and lower memory costs. Our model can process real images downloaded from the Internet for use in composition applications.
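The abstract names a gradient loss and composition as the downstream use, but gives no formulas. As a rough illustration only, the sketch below shows one common way such a loss can be written (the mean absolute difference between the forward-difference gradients of a predicted and a reference matte) together with the standard compositing equation I = αF + (1 − α)B. The function names, the use of forward differences, and the L1 form are assumptions made for this example; they are not taken from the paper, whose exact loss may differ.

```python
from typing import List, Tuple

# A single-channel image or matte, indexed [row][col]. Alpha values lie in [0, 1].
Grid = List[List[float]]


def forward_gradients(alpha: Grid) -> Tuple[Grid, Grid]:
    """Forward differences along x (columns) and y (rows). Hypothetical helper."""
    h, w = len(alpha), len(alpha[0])
    gx = [[alpha[r][c + 1] - alpha[r][c] for c in range(w - 1)] for r in range(h)]
    gy = [[alpha[r + 1][c] - alpha[r][c] for c in range(w)] for r in range(h - 1)]
    return gx, gy


def gradient_loss(pred: Grid, target: Grid) -> float:
    """Mean absolute difference between the gradients of two mattes.

    Assumed L1 form for illustration; not the paper's stated formula.
    """
    pgx, pgy = forward_gradients(pred)
    tgx, tgy = forward_gradients(target)
    diffs = [abs(p - t)
             for p_rows, t_rows in ((pgx, tgx), (pgy, tgy))
             for p_row, t_row in zip(p_rows, t_rows)
             for p, t in zip(p_row, t_row)]
    # A 1x1 matte has no gradients; treat its loss as zero.
    return sum(diffs) / len(diffs) if diffs else 0.0


def composite(alpha: Grid, fg: Grid, bg: Grid) -> Grid:
    """Standard alpha compositing per pixel: I = alpha * F + (1 - alpha) * B."""
    return [[a * f + (1.0 - a) * b for a, f, b in zip(a_row, f_row, b_row)]
            for a_row, f_row, b_row in zip(alpha, fg, bg)]


if __name__ == "__main__":
    sharp = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
    blurry = [[0.0, 0.25, 0.75, 1.0], [0.0, 0.25, 0.75, 1.0]]
    # A blurred edge is penalized relative to the sharp reference.
    print(gradient_loss(blurry, sharp))
    print(composite([[0.0, 0.5, 1.0]], [[1.0, 1.0, 1.0]], [[0.0, 0.0, 0.0]]))
```

A loss of this kind penalizes a prediction whose edges are softer than the reference even when the per-pixel alpha values are close, which is why gradient terms are often associated with sharper mattes. In practice the same computation would be expressed on tensors inside a deep learning framework rather than on nested lists.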

Acknowledgements

The authors would like to thank all the reviewers. We gratefully acknowledge the support of Jian-Cheng Liu, who helped prepare and preprocess the dataset. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61561146393 and 61521002). Fang-Lue Zhang was supported by a Victoria Early-Career Research Excellence Award.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Xian Wu & Xiao-Nan Fang
AI Center at Visual China Group, Burlingame, CA, 94010, USA
Tao Chen
School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Fang-Lue Zhang

Corresponding author

Correspondence to Fang-Lue Zhang.

Additional information

Xian Wu is currently a Ph.D. student at Tsinghua University. He received his B.S. degree from Tsinghua University in 2015. His research interests include image synthesis and editing, and deep learning in computer graphics.

Xiao-Nan Fang is currently a Ph.D. student at Tsinghua University. He received his B.S. degree from Tsinghua University in 2018. His research interests include image and video processing, and computer graphics.

Tao Chen received his B.S. degree in fundamental science and his Ph.D. degree in computer science from Tsinghua University, China, in 2005 and 2011, respectively. He is currently the Deputy General Manager of the AI Center at Visual China Group and the VP of Machine Learning, 500PX Inc. His research interests include multimedia, computer graphics, and computer vision.

Fang-Lue Zhang is a lecturer at Victoria University of Wellington. He received his doctoral degree from Tsinghua University in 2015 and his bachelor's degree from Zhejiang University in 2009. His research interests include image and video editing, computer vision, and computer graphics. He is a member of ACM and IEEE.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

About this article

Cite this article

Wu, X., Fang, XN., Chen, T. et al. JMNet: A joint matting network for automatic human matting. Comp. Visual Media 6, 215–224 (2020). https://doi.org/10.1007/s41095-020-0168-6

Received: 13 January 2020
Accepted: 19 February 2020
Published: 14 April 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s41095-020-0168-6
