A comprehensive survey on human pose estimation approaches

  • Regular Paper
  • Multimedia Systems

Abstract

Human pose estimation (HPE) is a significant problem that the computer vision community has studied in recent decades. It is a vital step toward understanding people in videos and still images. In simple terms, a human pose estimation model takes an image or video as input and estimates the positions of a person’s skeletal joints in either 2D or 3D space. Several studies on human pose estimation can be found in the literature; however, each centers on a specific category, for instance model-based methodologies or human motion analysis. Later, various deep learning (DL) algorithms emerged to overcome the difficulties of the earlier approaches. This study presents an exhaustive review of HPE, including milestone work and recent advancements. The survey discusses the different two-dimensional (2D) and three-dimensional (3D) human pose estimation techniques, along with their classical and deep learning approaches, which provide solutions to various computer vision problems. Moreover, the paper considers the different deep learning models used in pose estimation and analyzes 2D and 3D datasets. Some of the evaluation metrics used for human pose estimation are also discussed. By determining the orientation of individuals, HPE opens the way to several real-life applications, some of which are discussed in this study.


Funding

Not applicable.

Author information

Corresponding author

Correspondence to Shradha Dubey.

Ethics declarations

Conflict of interest

No conflict of interest, financial or otherwise.

Consent for publication

Not applicable.

Additional information

Communicated by R. Huang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dubey, S., Dixit, M. A comprehensive survey on human pose estimation approaches. Multimedia Systems 29, 167–195 (2023). https://doi.org/10.1007/s00530-022-00980-0

  • DOI: https://doi.org/10.1007/s00530-022-00980-0
