Human Performance Capture Using Multiple Handheld Kinects

  • Yebin Liu
  • Genzhi Ye
  • Yangang Wang
  • Qionghai Dai
  • Christian Theobalt
Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Capturing the real performances of human actors has been an important topic in computer graphics and computer vision over the last few decades. The reconstructed 3D performance can be used for character animation and free-viewpoint video. While most available performance capture approaches rely on a 3D video studio with tens of RGB cameras, this chapter presents a method for marker-less performance capture of single or multiple human characters using only three handheld Kinects. Compared with RGB camera approaches, the proposed method simplifies data acquisition: it needs far fewer cameras, and the cameras can be carried by hand during capture. The method reconstructs human skeletal poses, deforming surface geometry, and camera poses for every time step of the depth video. It succeeds on general uncontrolled indoor scenes with potentially dynamic backgrounds, even when reconstructing multiple closely interacting characters.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yebin Liu
    • 1
  • Genzhi Ye
    • 1
  • Yangang Wang
    • 1
  • Qionghai Dai
    • 1
  • Christian Theobalt
    • 1
  1. Tsinghua University, Beijing, China
