Human Performance Capture Using Multiple Handheld Kinects

  • Chapter
Computer Vision and Machine Learning with RGB-D Sensors

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Capturing real performances of human actors has been an important topic in computer graphics and computer vision over the last few decades. The reconstructed 3D performance can be used for character animation and free-viewpoint video. While most available performance capture approaches rely on a 3D video studio with tens of RGB cameras, this chapter presents a method for marker-less performance capture of single or multiple human characters using only three handheld Kinects. Compared with RGB camera approaches, the proposed method simplifies data acquisition: it requires far fewer cameras, and the cameras can be carried by hand during capture. The method reconstructs human skeletal poses, deforming surface geometry, and camera poses for every time step of the depth video. It succeeds in general uncontrolled indoor scenes with potentially dynamic backgrounds, even when reconstructing multiple closely interacting characters.

Notes

  1. [2014] IEEE. Reprinted, with permission, from [Genzhi Ye, Yebin Liu, Yue Deng, Nils Hasler, Xiangyang Ji, Qionghai Dai, Christian Theobalt, Free-viewpoint Video of Human Actors using Multiple Handheld Kinects, IEEE Trans. Cybernetics, 43(5), pp 1370–1382, 2013].

  2. The accompanying video is available at: www.media.au.tsinghua.edu.cn/kinectfvv.mp4.

Corresponding author

Correspondence to Yebin Liu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Liu, Y., Ye, G., Wang, Y., Dai, Q., Theobalt, C. (2014). Human Performance Capture Using Multiple Handheld Kinects. In: Shao, L., Han, J., Kohli, P., Zhang, Z. (eds) Computer Vision and Machine Learning with RGB-D Sensors. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-08651-4_5

  • DOI: https://doi.org/10.1007/978-3-319-08651-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08650-7

  • Online ISBN: 978-3-319-08651-4

  • eBook Packages: Computer Science, Computer Science (R0)
