Single-Image 3D Human Pose and Shape Estimation Enhanced by Clothed 3D Human Reconstruction

Liu, Leyuan; Gao, Yunqi; Sun, Jianchi; Chen, Jingying

doi:10.1007/978-981-99-9109-9_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1998)

Included in the following conference series:

International Symposium on Artificial Intelligence and Robotics

Abstract

3D human pose and shape estimation and clothed 3D human reconstruction are two active research topics in computer vision. 3D human pose and shape estimation aims to estimate the 3D pose and body shape of the “naked” human under the clothes, while clothed 3D human reconstruction aims to reconstruct the surface of a human wearing clothes. The two topics are closely related, but they are usually studied separately. In this paper, we use reconstructed clothed 3D human models to improve the accuracy of 3D human pose and body shape estimation. Our method consists of two main components: a 3D body mesh recovery module and a clothed 3D human reconstruction module. In the 3D body mesh recovery module, a graph convolutional network (GCN) first recovers an intermediate 3D body mesh from the input image, and a regressor then estimates the 3D body pose and shape parameters. In the clothed 3D human reconstruction module, two clothed human surface models are reconstructed, one guided by the recovered 3D body mesh and the other by the ground-truth 3D body mesh. During training, losses defined by the residuals among the two reconstructed clothed human models and the ground truth are back-propagated into the 3D body mesh recovery module to improve it. Quantitative and qualitative experimental results on THuman2.0 and LSP show that our method outperforms current state-of-the-art 3D human pose and shape estimation methods.
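The training signal described in the abstract can be illustrated with a small sketch. The abstract does not give the loss formula, so everything below is an assumption made for illustration: surfaces are represented as occupancy values sampled at shared query points, each residual is a mean squared difference, and the three pairwise residuals are summed with hypothetical weights `w_pred`, `w_gt_body`, and `w_consistency`. The function names are likewise hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_squared_residual(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally long lists of values."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if len(a) == 0:
        raise ValueError("inputs must be non-empty")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_feedback_loss(
    occ_pred_body: Sequence[float],  # clothed model guided by the recovered body mesh
    occ_gt_body: Sequence[float],    # clothed model guided by the ground-truth body mesh
    occ_gt: Sequence[float],         # ground-truth clothed surface
    w_pred: float = 1.0,             # hypothetical weight, not from the paper
    w_gt_body: float = 1.0,          # hypothetical weight, not from the paper
    w_consistency: float = 1.0,      # hypothetical weight, not from the paper
) -> float:
    """Weighted sum of the three pairwise residuals (an assumed form of the loss)."""
    return (
        w_pred * mean_squared_residual(occ_pred_body, occ_gt)
        + w_gt_body * mean_squared_residual(occ_gt_body, occ_gt)
        + w_consistency * mean_squared_residual(occ_pred_body, occ_gt_body)
    )


if __name__ == "__main__":
    # Toy occupancy samples at four query points (1.0 = inside, 0.0 = outside).
    occ_gt = [1.0, 0.0, 1.0, 0.0]
    occ_gt_body = [1.0, 0.0, 1.0, 0.0]
    occ_pred_body = [1.0, 0.0, 0.0, 0.0]
    print(reconstruction_feedback_loss(occ_pred_body, occ_gt_body, occ_gt))
```

In a real training setup this scalar would be computed with a differentiable framework so that its gradient can flow back into the body mesh recovery module; the plain-Python version only shows how the residuals might be combined.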

This work was supported by the National Natural Science Foundation of China under grant No. 62077026 and the Fundamental Research Funds for the Central Universities under grant No. CCNU22QN012.

Author information

Authors and Affiliations

National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China
Leyuan Liu, Yunqi Gao, Jianchi Sun & Jingying Chen
National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan, China
Leyuan Liu & Jingying Chen

Corresponding author

Correspondence to Jingying Chen.

Editor information

Editors and Affiliations

Kyushu Institute of Technology, Fukuoka, Japan
Huimin Lu
Southeast University, Nanjing, China
Jintong Cai

About this paper

Cite this paper

Liu, L., Gao, Y., Sun, J., Chen, J. (2024). Single-Image 3D Human Pose and Shape Estimation Enhanced by Clothed 3D Human Reconstruction. In: Lu, H., Cai, J. (eds) Artificial Intelligence and Robotics. ISAIR 2023. Communications in Computer and Information Science, vol 1998. Springer, Singapore. https://doi.org/10.1007/978-981-99-9109-9_4

DOI: https://doi.org/10.1007/978-981-99-9109-9_4
Published: 04 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9108-2
Online ISBN: 978-981-99-9109-9
eBook Packages: Computer Science, Computer Science (R0)
