Transferring pose and augmenting background for deep human-image parsing and its applications

Kikuchi, Takazumi; Endo, Yuki; Kanamori, Yoshihiro; Hashimoto, Taisuke; Mitani, Jun

doi:10.1007/s41095-017-0098-0

Research Article
Open access
Published: 30 January 2018

Computational Visual Media, Volume 4, pages 43–54 (2018)

Abstract

Parsing of human images is a fundamental task for determining semantic parts such as the face, arms, and legs, as well as a hat or a dress. Recent deep-learning-based methods have achieved significant improvements, but collecting training datasets with pixel-wise annotations is labor-intensive. In this paper, we propose two solutions to cope with limited datasets. Firstly, to handle various poses, we incorporate a pose estimation network into an end-to-end human-image parsing network, in order to transfer common features across the domains. The pose estimation network can be trained using rich datasets and can feed valuable features to the human-image parsing network. Secondly, to handle complicated backgrounds, we increase the variation in image backgrounds automatically by replacing the original backgrounds of human images with others obtained from large-scale scenery image datasets. Individually, each solution is versatile and beneficial to human-image parsing, while their combination yields further improvement. We demonstrate the effectiveness of our approach through comparisons and various applications such as garment recoloring, garment texture transfer, and visualization for fashion analysis.
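The background-augmentation step described above can be illustrated with a short NumPy sketch. It is a minimal illustration under our own assumptions, not the authors' implementation. We assume the following:

- Parsing label 0 denotes background, so every non-zero label is treated as part of the person.
- The foreground matte is the hard mask from the annotation. A soft matte, for example one produced by a natural-image-matting method, can be passed as `alpha` instead.
- Scenery images are resized by nearest-neighbour sampling.

The names `BACKGROUND_LABEL`, `resize_nearest`, `replace_background`, and `augment` are hypothetical.

```python
import numpy as np

BACKGROUND_LABEL = 0  # hypothetical: assumes label 0 marks background pixels


def resize_nearest(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize via integer index arrays (works for 2-D or H x W x C)."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows[:, None], cols[None, :]]


def replace_background(image, label_map, scenery, alpha=None):
    """Composite the person in `image` over a resized `scenery` image.

    image:     H x W x 3 uint8 human image
    label_map: H x W integer parsing annotation
    scenery:   h x w x 3 uint8 scenery image of any size
    alpha:     optional H x W float matte in [0, 1]; defaults to the hard label mask
    """
    h, w = label_map.shape
    if alpha is None:
        # Assumption: every non-background label belongs to the person.
        alpha = (label_map != BACKGROUND_LABEL).astype(np.float32)
    bg = resize_nearest(scenery, h, w).astype(np.float32)
    fg = image.astype(np.float32)
    a = alpha[..., None]  # broadcast the matte over the colour channels
    out = a * fg + (1.0 - a) * bg
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def augment(image, label_map, sceneries, n, rng=None):
    """Yield n (image, label_map) pairs, each with a randomly chosen new background.

    The label map is reused unchanged: swapping the background moves no person pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n):
        scenery = sceneries[rng.integers(len(sceneries))]
        yield replace_background(image, label_map, scenery), label_map.copy()
```

Because only background pixels change, each synthesized image can reuse the original pixel-wise annotation. This is what lets a small annotated set be expanded at no extra labeling cost.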

Author information

Authors and Affiliations

University of Tsukuba, 1-1-1 Tennohdai, Tsukuba City, Ibaraki, Japan
Takazumi Kikuchi, Yuki Endo, Yoshihiro Kanamori, Taisuke Hashimoto & Jun Mitani

Corresponding authors

Correspondence to Yuki Endo or Yoshihiro Kanamori.

Additional information

This article is published with open access at Springerlink.com

Takazumi Kikuchi received his B.S. degree from the University of Tsukuba, Japan, in 2016. He is studying computer graphics and image processing in the master’s program in computer science at the University of Tsukuba.

Yuki Endo received his B.S., M.S., and Ph.D. degrees in engineering from the University of Tsukuba, Japan, in 2010, 2012, and 2017, respectively. In 2016, he started working at the University of Tsukuba, where his present post is assistant professor in the Graduate School of Systems and Information Engineering. His research interests center on computer graphics and include image processing and machine learning.

Yoshihiro Kanamori received his B.S., M.S., and Ph.D. degrees in computer science from the University of Tokyo, Japan, in 2003, 2005, and 2009, respectively. He is an associate professor at the University of Tsukuba. From 2014 to 2016, he was a visiting researcher at ETH Zurich, funded by the Postdoctoral Fellowship for Research Abroad of the Japan Society for the Promotion of Science (JSPS). His research interests center on computer graphics, especially rendering techniques. He studies image editing techniques for reproducing real-world phenomena as well as techniques for assisting the creation of illustrations and animations.

Taisuke Hashimoto received his B.S. degree from the University of Tsukuba, Japan, in 2017. He is studying computer graphics and image processing in the master’s program in computer science at the University of Tsukuba.

Jun Mitani received his Ph.D. degree in engineering from the University of Tokyo in 2004. He has been a professor at the University of Tsukuba since April 2015. His research interests center on computer graphics, in particular geometric modeling techniques and their application to curved origami as well as interactive design interfaces.

Rights and permissions

Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

About this article

Cite this article

Kikuchi, T., Endo, Y., Kanamori, Y. et al. Transferring pose and augmenting background for deep human-image parsing and its applications. Comp. Visual Media 4, 43–54 (2018). https://doi.org/10.1007/s41095-017-0098-0

Received: 08 September 2017
Accepted: 08 November 2017
Published: 30 January 2018
Issue Date: March 2018
DOI: https://doi.org/10.1007/s41095-017-0098-0
