Hierarchical Kinematic Human Mesh Recovery

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


We consider the problem of estimating a parametric model of the 3D human mesh from a single image. While there has been substantial recent progress in this area with direct regression of model parameters, these methods exploit the kinematic structure of the human body only implicitly, leading to sub-optimal use of the model prior. In this work, we address this gap by proposing a new technique for regressing the parameters of a human parametric model that is explicitly informed by the model's known hierarchical structure, including its joint interdependencies. This results in a strongly prior-informed design of the regressor architecture and an associated hierarchical optimization that is flexible enough to be used in conjunction with the current standard frameworks for 3D human mesh recovery. We demonstrate these aspects through extensive experiments on standard benchmark datasets, showing that our proposed design outperforms several existing and popular methods and establishes new state-of-the-art results. Because it considers joint interdependencies, our method can infer joints even under data corruption, which we demonstrate with experiments under varying degrees of occlusion.
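
To make the notion of a kinematic hierarchy and joint interdependency concrete, here is a minimal sketch of planar forward kinematics over a small joint tree. It is an illustration only and not the regressor proposed in the paper: the five-joint tree, the `PARENTS` and `OFFSETS` tables, and the `forward_kinematics` function are all hypothetical names and values chosen for this example.

```python
import math

# Hypothetical 5-joint planar tree: entry j is the parent of joint j (-1 marks the root).
# Parents are listed before their children, so one forward pass is enough.
PARENTS = [-1, 0, 1, 2, 1]
# Hypothetical rest-pose offset of each joint from its parent (arbitrary units).
OFFSETS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]


def forward_kinematics(angles, parents=PARENTS, offsets=OFFSETS):
    """Return global (x, y) joint positions for per-joint local angles in radians."""
    global_angle = [0.0] * len(parents)
    positions = [(0.0, 0.0)] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:
            # The root has no parent: it sits at its own offset.
            global_angle[j] = angles[j]
            positions[j] = offsets[j]
            continue
        # The offset of joint j is rotated by the accumulated rotation of its
        # ancestors, so every ancestor's angle influences where joint j ends up.
        c, s = math.cos(global_angle[p]), math.sin(global_angle[p])
        ox, oy = offsets[j]
        px, py = positions[p]
        positions[j] = (px + c * ox - s * oy, py + s * ox + c * oy)
        global_angle[j] = global_angle[p] + angles[j]
    return positions


rest = forward_kinematics([0.0] * 5)
bent = forward_kinematics([0.0, 0.0, math.pi / 2, 0.0, 0.0])
```

In this sketch, bending joint 2 moves its child (joint 3) but leaves joint 4 in place, because joint 4 hangs off a different branch of the tree. This dependence of each joint on its ancestors is the kind of structure that a hierarchy-aware regressor can exploit explicitly.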

Supplementary material

Supplementary material 1 (pdf 3067 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. United Imaging Intelligence, Cambridge, USA
  2. George Mason University, Fairfax, USA
