
Recovering Accurate 3D Human Pose in the Wild Using IMUs and a Moving Camera

  • Timo von Marcard
  • Roberto Henschel
  • Michael J. Black
  • Bodo Rosenhahn
  • Gerard Pons-Moll
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

In this work, we propose a method that combines a single hand-held camera and a set of Inertial Measurement Units (IMUs) attached to the body limbs to estimate accurate 3D poses in the wild. This poses many new challenges: the moving camera, heading drift, cluttered background, occlusions and many people visible in the video. We associate 2D pose detections in each image to the corresponding IMU-equipped persons by solving a novel graph-based optimization problem that forces 3D-to-2D coherency within a frame and across long-range frames. Given the associations, we jointly optimize the pose of a statistical body model, the camera pose and heading drift using a continuous optimization framework. We validated our method on the TotalCapture dataset, which provides video and IMU data synchronized with ground truth. We obtain an accuracy of 26 mm, which makes the method accurate enough to serve as a benchmark for image-based 3D pose estimation in the wild. Using our method, we recorded 3D Poses in the Wild (3DPW), a new dataset consisting of more than 51,000 frames with accurate 3D pose in challenging sequences, including walking in the city, going upstairs, having coffee or taking the bus. We make the reconstructed 3D poses, video, IMU data and 3D models available for research purposes at http://virtualhumans.mpi-inf.mpg.de/3DPW.
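
As a rough illustration of the association idea described in the abstract, the sketch below matches IMU-equipped persons to 2D pose detections in a single frame by minimizing reprojection error. It is not the authors' graph-based formulation: it ignores the across-frame coherency terms, assumes a known pinhole camera and 3D joints already expressed in the camera frame, and uses a brute-force search that is only practical for a handful of people. All function and variable names are hypothetical.

```python
from itertools import permutations
from math import hypot


def project(point3d, focal, center):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to pixel coordinates."""
    x, y, z = point3d
    return (focal * x / z + center[0], focal * y / z + center[1])


def reprojection_cost(joints3d, detection2d, focal, center):
    """Mean pixel distance between projected 3D joints and detected 2D joints."""
    dists = []
    for p3d, (du, dv) in zip(joints3d, detection2d):
        u, v = project(p3d, focal, center)
        dists.append(hypot(u - du, v - dv))
    return sum(dists) / len(dists)


def associate(persons3d, detections2d, focal, center):
    """Return a list whose entry i is the detection index assigned to person i.

    Assumption: there are at least as many detections as persons.
    Every one-to-one assignment is tried and the cheapest one is kept.
    """
    costs = [
        [reprojection_cost(p, d, focal, center) for d in detections2d]
        for p in persons3d
    ]
    best = min(
        permutations(range(len(detections2d)), len(persons3d)),
        key=lambda perm: sum(costs[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


# Toy frame: two tracked persons with two joints each (metres, camera frame).
persons3d = [
    [(0.0, 0.0, 2.0), (0.0, 1.0, 2.0)],
    [(1.0, 0.0, 4.0), (1.0, 1.0, 4.0)],
]
# Three 2D detections (pixels); the last one is an untracked bystander.
detections2d = [
    [(752.0, 498.0), (749.0, 751.0)],
    [(501.0, 499.0), (500.0, 1002.0)],
    [(100.0, 100.0), (100.0, 200.0)],
]
assignment = associate(persons3d, detections2d, focal=1000.0, center=(500.0, 500.0))
print(assignment)  # -> [1, 0]
```

The bystander detection is left unassigned because only as many detections as there are tracked persons are selected. A single-frame matching like this can fail when people overlap in the image, which is one reason a formulation that also enforces consistency across frames, as the abstract describes, is attractive.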

Keywords

Human pose, Video, IMUs, Sensor fusion, 2D to 3D, People tracking, 3D pose dataset

Notes

Acknowledgements

We thank Jorge Márquez, Senya Polikovsky, Matvey Safroshkin and Andrea Keller for the technical support.

Supplementary material

Supplementary material 1: 474197_1_En_37_MOESM1_ESM.pdf (PDF, 2395 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Timo von Marcard (1), corresponding author
  • Roberto Henschel (1)
  • Michael J. Black (2)
  • Bodo Rosenhahn (1)
  • Gerard Pons-Moll (3)

  1. Leibniz Universität Hannover, Hanover, Germany
  2. MPI for Intelligent Systems, Tübingen, Germany
  3. MPI for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
