Skip to main content
Log in

Topology-preserved human reconstruction with details

  • Original article
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

Due to the high diversity and complexity of body shapes, it is challenging to directly estimate the human geometry from a single image with the various clothing styles. Most of the model-based approaches are limited to predict the shape and pose of a minimally clothed body with over-smoothing surface. While capturing the fine detailed geometries, the model-free methods are lack of the fixed mesh topology. To address these issues, we propose a novel topology-preserved human reconstruction approach by bridging the gap between model-based and model-free human reconstruction. We present an end-to-end neural network that simultaneously predicts the pixel-aligned implicit surface and an explicit mesh model built by graph convolutional neural network. Experiments on DeepHuman and our collected dataset showed that our approach is effective. The code will be made publicly available at https://github.com/l1346792580123/sdfgcn.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. ACM Trans. Graph. 24(3), 408–416 (2005)

    Article  Google Scholar 

  2. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 248:1-248:16 (2015)

    Article  Google Scholar 

  3. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 7122–7131 (2018)

  4. Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In: International Conference on Computer Vision ICCV, pp. 2252–2261 (2019)

  5. Neophytou, A., Hilton, A.: A layered model of human body and garment deformation. In: International Conference on 3DV, pp. 171–178 (2014)

  6. Lähner, Z., Cremers, D., Tung, T.: Deepwrinkles: Accurate and realistic clothing modeling. In: European Conference on Computer Vision ECCV, vol. 11208, pp. 698–715 (2018)

  7. Yang, J., Franco, J., Hétroy-Wheeler, F., Wuhrer, S.: Analyzing clothing layer deformation statistics of 3d human motions. In: European Conference on Computer Vision ECCV, vol. 11211, pp. 245–261 (2018)

  8. Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A.A., Tzionas, D., Black, M.J.: Expressive body capture: 3d hands, face, and body from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 10,975–10,985 (2019)

  9. Saito, S., Simon, T., Saragih, J.M., Joo, H.: Pifuhd: Multi-level pixel-aligned implicit function for high-resolution 3d human digitization. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 81–90 (2020)

  10. Varol, G., Ceylan, D., Russell, B.C., Yang, J., Yumer, E., Laptev, I., Schmid, C.: Bodynet: Volumetric inference of 3d human body shapes. In: European Conference on Computer Vision ECCV, vol. 11211, pp. 20–38 (2018)

  11. Zheng, Z., Yu, T., Wei, Y., Dai, Q., Liu, Y.: Deephuman: 3d human reconstruction from a single image. In: International Conference on Computer Vision ICCV, pp. 7738–7748. IEEE (2019)

  12. Saito, S., Huang, Z., Natsume, R., Morishima, S., Li, H., Kanazawa, A.: Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: International Conference on Computer Vision ICCV, pp. 2304–2314 (2019)

  13. Chen, Y., Tian, Y., He, M.: Monocular human pose estimation: a survey of deep learning-based methods. Comput. Vis. Image Underst. 192, 102,897 (2020)

    Article  Google Scholar 

  14. Desmarais, Y., Mottet, D., Slangen, P., Montesinos, P.: A review of 3d human pose estimation algorithms for markerless motion capture. Comput. Vis. Image Underst. 212, 103,275 (2021)

    Article  Google Scholar 

  15. Wang, J., Tan, S., Zhen, X., Xu, S., Zheng, F., He, Z., Shao, L.: Deep 3d human pose estimation: a review. Comput. Vis. Image Underst. 210, 103,225 (2021)

    Article  Google Scholar 

  16. Joo, H., Simon, T., Sheikh, Y.: Total capture: A 3d deformation model for tracking faces, hands, and bodies. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 8320–8329 (2018)

  17. Ranjan, A., Bolkart, T., Sanyal, S., Black, M.J.: Generating 3d faces using convolutional mesh autoencoders. In: European Conference on Computer Vision ECCV, vol. 11207, pp. 725–741 (2018)

  18. Choi, H., Moon, G., Lee, K.M.: Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In: European Conference on Computer Vision ECCV, vol. 12352, pp. 769–787 (2020)

  19. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P.V., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3d human pose and shape from a single image. In: European Conference on Computer Vision ECCV, vol. 9909, pp. 561–578 (2016)

  20. Xiang, D., Joo, H., Sheikh, Y.: Monocular total capture: Posing face, body, and hands in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 10965–10974 (2019)

  21. Sun, W., Wang, L., Ma, S., Ma, Q.: Estimating 3d body mesh without smpl annotations via alternating successive convex approximation. Comput. Vis. Image Underst. 224, 103539 (2022)

    Article  Google Scholar 

  22. Alldieck, T., Magnor, M.A., Xu, W., Theobalt, C., Pons-Moll, G.: Detailed human avatars from monocular video. In: International Conference on 3DV, pp. 98–109 (2018)

  23. Alldieck, T., Magnor, M.A., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 1175–1186 (2019)

  24. Zhu, H., Zuo, X., Wang, S., Cao, X., Yang, R.: Detailed human shape estimation from a single image by hierarchical mesh deformation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4491–4500 (2019)

  25. Pons-Moll, G., Pujades, S., Hu, S., Black, M.J.: Clothcap: seamless 4d clothing capture and retargeting. ACM Trans. Graph. 36(4), 73:1-73:15 (2017)

    Article  Google Scholar 

  26. Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G., Tang, S., Black, M.J.: Learning to dress 3d people in generative clothing. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 6468–6477 (2020)

  27. Corona, E., Pumarola, A., Alenyà, G., Pons-Moll, G., Moreno-Noguer, F.: Smplicit: Topology-aware generative model for clothed people. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 11875–11885 (2021)

  28. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining implicit function learning and parametric models for 3d human reconstruction. In: European Conference on Computer Vision ECCV, vol. 12347, pp. 311–329 (2020)

  29. Saito, S., Yang, J., Ma, Q., Black, M.J.: Scanimate: Weakly supervised learning of skinned clothed avatar networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 2886–2897 (2021)

  30. Ma, Q., Saito, S., Yang, J., Tang, S., Black, M.J.: SCALE: modeling clothed humans with a surface codec of articulated local elements. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, pp. 16082–16093 (2021)

  31. Gabeur, V., Franco, J., Martin, X., Schmid, C., Rogez, G.: Moulding humans: Non-parametric 3d human shape estimation from single images. In: International Conference on Computer Vision ICCV, pp. 2232–2241 (2019)

  32. Tang, S., Tan, F., Cheng, K., Li, Z., Zhu, S., Tan, P.: A neural network for detailed human depth estimation from a single image. In: International Conference on Computer Vision ICCV, pp. 7749–7758 (2019)

  33. Natsume, R., Saito, S., Huang, Z., Chen, W., Ma, C., Li, H., Morishima, S.: Siclope: Silhouette-based clothed people. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4480–4490 (2019)

  34. Wang, T., Liu, M., Zhu, J., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 8798–8807 (2018)

  35. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision ECCV, vol. 9912, pp. 483–499 (2016)

  36. Maas, A., Hannun, A., Ng, A.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the International Conference on Machine Learning vol. 30, p. 3 (2013)

  37. Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: Archive of motion capture as surface shapes. In: International Conference on Computer Vision ICCV, pp. 5442–5451 (2019)

  38. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3d surface construction algorithm. In: SIGGRAPH, pp. 163–169. ACM (1987)

  39. renderpeople: https://www.renderpeople.com

  40. Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: European Conference on Computer Vision ECCV, vol. 8693, pp. 740–755 (2014)

  41. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human 3.6m: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2014)

    Article  Google Scholar 

  42. Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4501–4510 (2019)

  43. Zhu, H., Zuo, X., Wang, S., Cao, X., Yang, R.: Detailed human shape estimation from a single image by hierarchical mesh deformation. In: IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), pp. 4491–4500 (2019)

Download references

Funding

This work is supported by the National Natural Science Foundation of China under Grants (61831015).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jianke Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 12227 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lin, L., Zhu, J. Topology-preserved human reconstruction with details. Vis Comput 39, 3609–3619 (2023). https://doi.org/10.1007/s00371-023-02957-0

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-023-02957-0

Keywords

Navigation