SDM3d: shape decomposition of multiple geometric priors for 3D pose estimation

  • Original Article
Neural Computing and Applications

Abstract

Recovering 3D human pose from a single image with 2D joints is a challenging task in computer vision. The sparse representation (SR) model has been successfully adopted in 3D pose estimation. However, because the available 3D training data are often collected in constrained environments (i.e., indoors) with limited diversity of subjects and actions, most SR-based approaches generalize poorly to real-world scenarios, which may contain more complex cases. To alleviate this issue, this paper proposes SDM3d, a novel shape decomposition using multiple geometric priors for 3D pose estimation. SDM3d separates a 3D pose into a global structure and body deformations, each encoded explicitly via different prior constraints. Furthermore, a joint learning strategy is designed to learn two over-complete dictionaries from training data so as to capture more geometric prior information. SDM3d is evaluated on four well-recognized benchmarks: Human3.6M, HumanEva-I, CMU MoCap, and MPII. The experimental results demonstrate the effectiveness of SDM3d.
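
The general idea of a sparse representation with two dictionaries can be illustrated with a minimal sketch. This is not the SDM3d objective or its joint dictionary learning, which this page does not specify. It assumes the two dictionaries are already given, the camera is a known 2x3 orthographic projection, and the codes are found with plain ISTA (proximal gradient with an l1 penalty). All names (`lift_pose`, `D_global`, `D_deform`, `lam_g`, `lam_d`) are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the (possibly per-coordinate weighted) l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||A c - y||^2 + sum(lam * |c|) by proximal gradient."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c

def lift_pose(w2d, R, D_global, D_deform, lam_g=0.01, lam_d=0.1):
    """Estimate a 3D pose from 2D joints (illustrative assumption, not SDM3d).

    w2d      : (2, J) observed 2D joints
    R        : (2, 3) known orthographic camera (assumed given)
    D_global : (Kg, 3, J) atoms for the global structure (assumed given)
    D_deform : (Kd, 3, J) atoms for body deformations (assumed given)
    """
    atoms = np.concatenate([D_global, D_deform], axis=0)        # (K, 3, J)
    # Each column of A is one atom projected to 2D and flattened.
    A = np.stack([(R @ B).ravel() for B in atoms], axis=1)      # (2J, K)
    # Separate sparsity weights stand in for "different prior constraints".
    lam = np.concatenate([np.full(len(D_global), lam_g),
                          np.full(len(D_deform), lam_d)])
    c = ista(A, w2d.ravel(), lam)
    S = np.tensordot(c, atoms, axes=1)                          # (3, J)
    return S, c
```

Given 2D joints `w2d` and a camera `R`, `lift_pose(w2d, R, D_global, D_deform)` returns the reconstructed 3D pose together with the concatenated coefficient vector. Using a heavier penalty on the deformation codes is only one possible way to treat the two components differently.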

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 61671397).

Author information

Corresponding author

Correspondence to Yunqi Lei.

Ethics declarations

Conflict of interest

No conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, M., Yu, Z., Li, C. et al. SDM3d: shape decomposition of multiple geometric priors for 3D pose estimation. Neural Comput & Applic 33, 2165–2181 (2021). https://doi.org/10.1007/s00521-020-05086-0

  • DOI: https://doi.org/10.1007/s00521-020-05086-0
