STAR: Sparse Trained Articulated Human Body Regressor

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

The SMPL body model is widely used for the estimation, synthesis, and analysis of 3D human pose and shape. Although SMPL is popular, we show that it has several limitations, and we introduce STAR, which is quantitatively and qualitatively superior to SMPL. First, SMPL has a huge number of parameters resulting from its use of global blend shapes. These dense pose-corrective offsets relate every vertex on the mesh to all the joints in the kinematic tree, capturing spurious long-range correlations. To address this, we define per-joint pose correctives and learn the subset of mesh vertices that are influenced by each joint movement. This sparse formulation results in more realistic deformations and significantly reduces the number of model parameters, to 20% of those in SMPL. When trained on the same data as SMPL, STAR generalizes better despite having far fewer parameters. Second, SMPL factors pose-dependent deformations from body shape while, in reality, people with different shapes deform differently. Consequently, we learn shape-dependent pose-corrective blend shapes that depend on both body pose and BMI. Third, we show that the shape space of SMPL is not rich enough to capture the variation in the human population. We address this by training STAR with an additional 10,000 scans of male and female subjects, and show that this results in better model generalization. STAR is compact, generalizes better to new bodies, and is a drop-in replacement for SMPL. STAR is publicly available for research purposes at http://star.is.tue.mpg.de.
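
As a rough illustration of the contrast drawn in the abstract, the toy NumPy sketch below compares a dense, global corrective (every vertex tied to every pose feature) with a sparse, per-joint corrective gated by per-vertex activation weights. It is not the released STAR code: the function names, array shapes, hand-set activation masks, and the idea of appending a BMI-like scalar to each joint's feature vector are all assumptions made for illustration.

```python
import numpy as np

def dense_correctives(pose_feat, blend):
    """Global blend shapes: each vertex offset depends on all pose features.

    pose_feat: (F,) pose feature vector
    blend:     (N, 3, F) dense blend-shape basis
    returns:   (N, 3) per-vertex offsets
    """
    return blend @ pose_feat

def sparse_correctives(joint_feats, joint_blend, activation):
    """Per-joint correctives, gated by non-negative per-vertex weights.

    joint_feats: (K, D) one feature vector per joint (hypothetical layout)
    joint_blend: (K, N, 3, D) one corrective basis per joint
    activation:  (K, N) weights; zero means the joint leaves that vertex alone
    returns:     (N, 3) per-vertex offsets
    """
    per_joint = np.einsum("kncd,kd->knc", joint_blend, joint_feats)
    return np.einsum("kn,knc->nc", activation, per_joint)

rng = np.random.default_rng(0)
N, K, D = 6, 2, 5  # toy sizes; D = 4 pose dims + 1 BMI-like scalar (assumption)

joint_feats = rng.normal(size=(K, D))
joint_blend = rng.normal(size=(K, N, 3, D))

# Hand-set masks standing in for learned activations.
activation = np.zeros((K, N))
activation[0, :3] = 1.0   # joint 0 influences vertices 0..2
activation[1, 3:5] = 1.0  # joint 1 influences vertices 3..4; vertex 5 is untouched

offsets = sparse_correctives(joint_feats, joint_blend, activation)

# Moving only joint 1 must leave the vertices owned by joint 0 unchanged.
moved = joint_feats.copy()
moved[1] += 1.0
offsets_moved = sparse_correctives(moved, joint_blend, activation)

# Basis entries that can ever contribute, versus the full dense count.
active_params = int(activation.astype(bool).sum()) * 3 * D
dense_params = K * N * 3 * D
```

In this toy setup only the basis entries under a non-zero activation ever contribute, which is the mechanism by which a sparse formulation can shrink the effective parameter count; the actual ratio reported for STAR comes from the paper, not from this sketch.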

Notes

Acknowledgments

The authors thank N. Mahmood for insightful discussions and feedback, and the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting A. A. A. Osman. The authors would like to thank Joachim Tesch, Muhammed Kocabas, Nikos Athanasiou, Nikos Kolotouros and Vassilis Choutas for their support and fruitful discussions.

Disclosure: In the last five years, MJB has received research gift funds from Intel, Nvidia, Facebook, and Amazon. He is a co-founder and investor in Meshcapade GmbH, which commercializes 3D body shape technology. While MJB is a part-time employee of Amazon, his research was performed solely at, and funded solely by, MPI.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Max Planck Institute for Intelligent Systems, Tübingen, Germany
