Abstract
Capturing the real performances of human actors has been an important topic in computer graphics and computer vision over the last few decades. The reconstructed 3D performance can be used for character animation and free-viewpoint video. While most available performance capture approaches rely on a 3D video studio with tens of RGB cameras, this chapter presents a method for marker-less performance capture of single or multiple human characters using only three handheld Kinects. Compared with RGB camera approaches, the proposed method makes data acquisition more convenient: it needs far fewer cameras, and the cameras can be carried by hand during capture. The method reconstructs human skeletal poses, deforming surface geometry, and camera poses for every time step of the depth video. It succeeds on general uncontrolled indoor scenes with potentially dynamic backgrounds, and it can even reconstruct multiple closely interacting characters.
Notes
- 1. [2014] IEEE. Reprinted, with permission, from [Genzhi Ye, Yebin Liu, Yue Deng, Nils Hasler, Xiangyang Ji, Qionghai Dai, Christian Theobalt, Free-viewpoint Video of Human Actors using Multiple Handheld Kinects, IEEE Trans. Cybernetics, 43(5), pp 1370–1382, 2013].
- 2. The accompanying video is available at: www.media.au.tsinghua.edu.cn/kinectfvv.mp4.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Liu, Y., Ye, G., Wang, Y., Dai, Q., Theobalt, C. (2014). Human Performance Capture Using Multiple Handheld Kinects. In: Shao, L., Han, J., Kohli, P., Zhang, Z. (eds) Computer Vision and Machine Learning with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-08651-4_5
DOI: https://doi.org/10.1007/978-3-319-08651-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08650-7
Online ISBN: 978-3-319-08651-4
eBook Packages: Computer Science (R0)