HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs

  • Zerong Zheng
  • Tao Yu
  • Hao Li
  • Kaiwen Guo
  • Qionghai Dai
  • Lu Fang
  • Yebin Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


We propose a lightweight yet highly robust method for real-time human performance capture based on a single depth camera and sparse inertial measurement units (IMUs). Our method combines non-rigid surface tracking and volumetric fusion to simultaneously reconstruct challenging motions, detailed geometries and the inner human body of a clothed subject. The proposed hybrid motion tracking algorithm and efficient per-frame sensor calibration technique enable non-rigid surface reconstruction for fast motions and challenging poses with severe occlusions. Significant fusion artifacts are reduced using a new confidence measurement for our adaptive TSDF-based fusion. These contributions are mutually beneficial in our reconstruction system, enabling practical human performance capture that is real-time, robust, low-cost and easy to deploy. Experiments show that extremely challenging performances and loop closure problems can be handled successfully.


Keywords

Performance capture, Real-time, Single-view, IMU



This work is supported by the National Key Foundation for Exploring Scientific Instrument of China No. 2013YQ140517; the National NSF of China grant No. 61522111, No. 61531014, No. 61233005, No. 61722209 and No. 61331015. Hao Li was supported by the ONR YIP grant N00014-17-S-FO14, the CONIX Research Center, an SRC program sponsored by DARPA, the U.S. ARL under contract number W911NF-14-D-0005, Adobe, and Sony.

Supplementary material

Supplementary material 1 (mp4 63528 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zerong Zheng (1)
  • Tao Yu (1, 2)
  • Hao Li (3)
  • Kaiwen Guo (4)
  • Qionghai Dai (1)
  • Lu Fang (5)
  • Yebin Liu (1, corresponding author)

  1. Tsinghua University, Beijing, China
  2. Beihang University, Beijing, China
  3. University of Southern California, Los Angeles, USA
  4. Google Inc., Mountain View, USA
  5. Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen, China
