Single-Image 3D Human Pose and Shape Estimation Enhanced by Clothed 3D Human Reconstruction

  • Conference paper
Artificial Intelligence and Robotics (ISAIR 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1998)

Abstract

3D human pose and shape estimation and clothed 3D human reconstruction are two active research topics in computer vision. 3D human pose and shape estimation aims to estimate the 3D pose and body shape of the “naked” human under the clothes, while clothed 3D human reconstruction aims to reconstruct the surface of a human wearing clothes. The two topics are closely related, but they are usually studied separately. In this paper, we use reconstructed clothed 3D human models to improve the accuracy of 3D human pose and body shape estimation. Our method consists of two main components: a 3D body mesh recovery module and a clothed 3D human reconstruction module. In the 3D body mesh recovery module, a graph convolutional network (GCN) first recovers an intermediate 3D body mesh from the input image, and a regressor then estimates the 3D body pose and shape parameters. In the clothed human reconstruction module, two clothed human surface models are reconstructed, one guided by the recovered 3D body mesh and the other guided by the ground-truth 3D body mesh. During training, losses defined by the residuals among the two reconstructed clothed human models and the ground truth are back-propagated into the 3D body mesh recovery module to improve it. Quantitative and qualitative experiments on THuman2.0 and LSP show that our method outperforms current state-of-the-art 3D human pose and shape estimation methods.
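
The abstract does not give the form of the training losses, so the following is only a minimal sketch of how residuals among the two reconstructed clothed models and the ground truth might be combined. It assumes each clothed surface is represented by predicted values (for example, occupancy) at a shared set of query points, and it uses a mean squared residual. The function names, the weights `w_pred`, `w_gt`, `w_cross`, and the three-term combination are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_squared_residual(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally long prediction vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def clothed_guidance_loss(
    pred_guided: Sequence[float],
    gt_guided: Sequence[float],
    gt_surface: Sequence[float],
    w_pred: float = 1.0,
    w_gt: float = 1.0,
    w_cross: float = 1.0,
) -> float:
    """Hypothetical weighted sum of the three pairwise residuals.

    pred_guided: clothed reconstruction guided by the recovered body mesh.
    gt_guided:   clothed reconstruction guided by the ground-truth body mesh.
    gt_surface:  ground-truth clothed surface values at the same query points.
    """
    return (
        w_pred * mean_squared_residual(pred_guided, gt_surface)
        + w_gt * mean_squared_residual(gt_guided, gt_surface)
        + w_cross * mean_squared_residual(pred_guided, gt_guided)
    )


# Toy occupancy values at four shared query points (illustrative numbers only).
pred_guided = [0.9, 0.2, 0.6, 0.1]
gt_guided = [1.0, 0.0, 0.9, 0.0]
gt_surface = [1.0, 0.0, 1.0, 0.0]
loss = clothed_guidance_loss(pred_guided, gt_guided, gt_surface)
```

Under these assumptions, only the terms that involve `pred_guided` depend on the recovered body mesh, so in an automatic-differentiation framework those are the terms that would carry gradient back into the body mesh recovery module.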

This work was supported by the National Natural Science Foundation of China under grant No. 62077026 and the Fundamental Research Funds for the Central Universities under grant No. CCNU22QN012.

Author information

Corresponding author

Correspondence to Jingying Chen.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, L., Gao, Y., Sun, J., Chen, J. (2024). Single-Image 3D Human Pose and Shape Estimation Enhanced by Clothed 3D Human Reconstruction. In: Lu, H., Cai, J. (eds) Artificial Intelligence and Robotics. ISAIR 2023. Communications in Computer and Information Science, vol 1998. Springer, Singapore. https://doi.org/10.1007/978-981-99-9109-9_4

  • DOI: https://doi.org/10.1007/978-981-99-9109-9_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9108-2

  • Online ISBN: 978-981-99-9109-9

  • eBook Packages: Computer Science, Computer Science (R0)
