
Multi-task neural network with physical constraint for real-time multi-person 3D pose estimation from monocular camera

Multimedia Tools and Applications

Abstract

3D human pose estimation has many important applications in human-computer interaction and human action recognition. Simultaneously achieving real-time speed, support for a varying number of people, and high accuracy from a single RGB image is a challenging problem. To this end, this paper proposes a multi-task, multi-level neural network with a physical constraint. The network estimates 3D human poses from a single RGB image in an end-to-end manner and achieves both high accuracy and high speed. Experimental results show that the proposed system runs at 21 fps on an RTX 2080 GPU with an accuracy loss of only 33 mm compared with conventional methods. The mechanism of the network is also analyzed through network visualization. This work demonstrates that 3D human poses can be estimated from a single monocular RGB camera in real time.
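The abstract reports accuracy in millimetres and speed in frames per second but does not name its accuracy metric. 3D pose accuracy in millimetres is commonly reported as mean per-joint position error (MPJPE), the average Euclidean distance between predicted and ground-truth joints. Assuming that metric, the minimal sketch below shows how such a figure, and the per-frame time budget implied by 21 fps, could be computed. The function names `mpjpe` and `frame_time_ms`, the 17-joint skeleton, and the toy data are hypothetical and not taken from the paper; matching each predicted person to a ground-truth person is assumed to have been done beforehand.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error (assumed metric, not confirmed by the paper).

    pred, gt: arrays of shape (num_people, num_joints, 3) in millimetres,
    already matched person-to-person (the matching step is not shown).
    Returns the mean Euclidean distance over all joints of all people.
    """
    if pred.shape != gt.shape or pred.shape[-1] != 3:
        raise ValueError("expected matching (..., num_joints, 3) arrays")
    # Per-joint Euclidean distance, then average over people and joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def frame_time_ms(fps: float) -> float:
    """Per-frame processing budget in milliseconds implied by a frame rate."""
    return 1000.0 / fps


# Hypothetical toy data: 2 people, 17 joints each (skeleton size is an assumption).
gt = np.zeros((2, 17, 3))
pred = gt.copy()
pred[..., 0] += 30.0  # shift every joint 30 mm along x
pred[..., 1] += 40.0  # and 40 mm along y, giving a 50 mm error per joint

print(mpjpe(pred, gt))              # 50.0
print(round(frame_time_ms(21), 1))  # 47.6
```

At 21 fps, each frame must be processed in roughly 47.6 ms. Under the MPJPE assumption, an accuracy loss of 33 mm would mean the method's mean per-joint error is 33 mm higher than that of the methods it is compared against.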



Acknowledgements

This work was jointly supported by the Waseda University Grant for Special Research Projects under grants 2020C-657 and 2020R-040, the National Natural Science Foundation of China under grant 62001110, the Natural Science Foundation of Jiangsu Province under grant BK20200353, the Guangdong Basic and Applied Basic Research Foundation under grant 2020A1515110145, the Shenzhen Science and Technology Program under grant RCBS20200714114858072, the 111 Project under grant B17040, and the Fundamental Research Funds for the Central Universities under grant 2242021R10115.

Author information

Corresponding author

Correspondence to Songlin Du.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Luo, D., Du, S. & Ikenaga, T. Multi-task neural network with physical constraint for real-time multi-person 3D pose estimation from monocular camera. Multimed Tools Appl 80, 27223–27244 (2021). https://doi.org/10.1007/s11042-021-10982-1



  • DOI: https://doi.org/10.1007/s11042-021-10982-1
