Multiple human 3D pose estimation from multiview images

Abstract

Multiple human 3D pose estimation is a challenging task, mainly because of large variations in the scale and pose of humans, fast motions, multiple persons in the scene, and an arbitrary number of visible body parts due to occlusion or truncation. Some of these ambiguities can be resolved by using multiview images, because more evidence of body parts is available across multiple views. In this work, a novel method for multiple human 3D pose estimation using evidence from multiview images is proposed. The proposed method utilizes a fully connected pairwise conditional random field that contains two types of pairwise terms. The first pairwise term encodes the spatial dependencies among human body joints based on an articulated human body configuration. The second pairwise term is based on the output of a 2D deep part detector. Approximate inference is then performed using the loopy belief propagation algorithm. The proposed method is evaluated on the Campus, Shelf, Utrecht Multi-Person Motion benchmark, Human3.6M, KTH Football II, and MPII Cooking datasets. Experimental results indicate that the proposed method achieves substantial improvements over existing state-of-the-art methods in terms of the probability of correct pose and the mean per joint position error performance measures.
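
To make the inference step concrete, the following is a minimal sketch of sum-product loopy belief propagation on a small fully connected pairwise model. It is an illustration only and not the implementation used in the paper: the function name `loopy_bp`, the joint names, the four candidate states per joint, and the random potential tables are all assumptions introduced here, and the two pairwise terms described in the abstract are replaced by generic placeholder tables.

```python
import numpy as np


def loopy_bp(unary, pairwise, n_iters=50, tol=1e-6):
    """Sum-product loopy belief propagation on a pairwise model (sketch).

    unary:    dict node -> (K_i,) array of positive potentials
    pairwise: dict (i, j) -> (K_i, K_j) array of positive potentials,
              with exactly one entry per undirected edge
    Returns a dict node -> (K_i,) array of approximate marginals.
    """
    # Store both directions of every edge so lookups are uniform.
    pot = {}
    for (i, j), table in pairwise.items():
        pot[(i, j)] = np.asarray(table, dtype=float)
        pot[(j, i)] = pot[(i, j)].T
    nbrs = {i: [] for i in unary}
    for (i, j) in pot:
        nbrs[i].append(j)

    # Initialise every directed message m_{i->j}(x_j) to uniform.
    msgs = {(i, j): np.full(len(unary[j]), 1.0 / len(unary[j]))
            for (i, j) in pot}

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in pot:
            # Unary at i times all incoming messages except the one from j.
            prod = np.asarray(unary[i], dtype=float).copy()
            for k in nbrs[i]:
                if k != j:
                    prod *= msgs[(k, i)]
            m = pot[(i, j)].T @ prod      # sum over the states of node i
            new_msgs[(i, j)] = m / m.sum()
        delta = max(np.abs(new_msgs[e] - msgs[e]).max() for e in pot)
        msgs = new_msgs
        if delta < tol:                   # messages stopped changing
            break

    # Belief at each node: unary times all incoming messages, normalised.
    beliefs = {}
    for i in unary:
        b = np.asarray(unary[i], dtype=float).copy()
        for k in nbrs[i]:
            b *= msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs


# Toy usage: three hypothetical joints, four candidate states each,
# every pair of joints connected (a fully connected graph).
rng = np.random.default_rng(0)
joints = ["head", "neck", "hip"]
unary = {j: rng.random(4) + 0.1 for j in joints}
edges = [("head", "neck"), ("head", "hip"), ("neck", "hip")]
pairwise = {e: rng.random((4, 4)) + 0.1 for e in edges}
beliefs = loopy_bp(unary, pairwise)
```

On a graph with cycles, such as the fully connected one above, the returned beliefs are approximations of the true marginals and convergence is not guaranteed, which is why the sketch caps the number of iterations. On a tree-structured graph the same procedure yields exact marginals.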

Notes

  1. http://ipl.ce.sharif.edu/3D_pose.html

Author information

Corresponding author

Correspondence to Shohreh Kasaei.

About this article

Cite this article

Ershadi-Nasab, S., Noury, E., Kasaei, S. et al. Multiple human 3D pose estimation from multiview images. Multimed Tools Appl 77, 15573–15601 (2018). https://doi.org/10.1007/s11042-017-5133-8

Keywords

  • Human pose estimation
  • Multiview images
  • Multiple human
  • Fully connected model
  • Graphical model