Machine Vision and Applications, Volume 28, Issue 8, pp 839–858

Real-time 3D motion capture by monocular vision and virtual rendering

  • David Antonio Gómez Jáuregui
  • Patrick Horain
Original Paper


Networked 3D virtual environments allow multiple users to interact over the Internet by means of avatars and to get some feeling of virtual telepresence. However, controlling an avatar may be tedious. Motion capture systems based on 3D sensors have reached the consumer market, but webcams remain more widespread and cheaper. This work aims at animating a user's avatar by real-time motion capture using a personal computer and a plain webcam. Following a classical model-based approach, we register a 3D articulated upper-body model onto video sequences and propose a number of heuristics to accelerate particle filtering while robustly tracking user motion. Describing the body pose by the 3D positions of the wrists rather than by joint angles allows depth ambiguities to be handled efficiently in probabilistic tracking. We demonstrate experimentally that our real-time monocular 3D body tracking is robust, even in the case of partial occlusions and motion in the depth direction.
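The abstract combines two ideas: particle filtering, and a pose state expressed as 3D wrist positions instead of joint angles. The following is a minimal sketch of how these fit together, not the authors' method (which registers a rendered 3D upper-body model against video frames). It is a generic sampling-importance-resampling (SIR) filter over a single wrist position, assuming a pinhole camera, a known shoulder position, a point observation of the wrist in the image, and a reach constraint on the arm. It illustrates the depth ambiguity: one image point only constrains the wrist to a viewing ray, and the reach test discards particles along that ray that the arm cannot reach. All constants and function names (`project`, `predict`, `weight`, `resample`, `step`) are hypothetical.

```python
import numpy as np

# Hypothetical parameters for illustration only; none are taken from the paper.
FOCAL = 500.0                          # pinhole focal length in pixels
ARM_REACH = 0.6                        # upper arm + forearm length in metres
SHOULDER = np.array([0.2, 0.0, 2.0])   # shoulder position in the camera frame

def project(points):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixel coordinates."""
    return FOCAL * points[:, :2] / points[:, 2:3]

def predict(particles, rng, sigma=0.03):
    """Random-walk motion model applied directly to the 3D wrist position."""
    return particles + rng.normal(scale=sigma, size=particles.shape)

def weight(particles, observed_uv, sigma_px=5.0):
    """Gaussian likelihood of the projected wrist; zero weight if out of reach."""
    err2 = np.sum((project(particles) - observed_uv) ** 2, axis=1)
    w = np.exp(-0.5 * err2 / sigma_px ** 2)
    reachable = np.linalg.norm(particles - SHOULDER, axis=1) <= ARM_REACH
    w = w * reachable
    total = w.sum()
    if total == 0.0:  # degenerate case: fall back to uniform weights
        return np.full(len(particles), 1.0 / len(particles))
    return w / total

def resample(particles, weights, rng):
    """Systematic resampling: one random offset, n evenly spaced pointers."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    idx = np.minimum(idx, n - 1)  # guard against round-off in the cumulative sum
    return particles[idx]

def step(particles, observed_uv, rng):
    """One filter iteration; returns resampled particles and the weighted mean."""
    particles = predict(particles, rng)
    w = weight(particles, observed_uv)
    estimate = w @ particles  # weighted mean, computed before resampling
    return resample(particles, w, rng), estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_wrist = np.array([0.4, 0.3, 1.8])
    obs = project(true_wrist[None, :])[0]
    particles = SHOULDER + rng.uniform(-0.5, 0.5, size=(2000, 3))
    for _ in range(30):
        particles, est = step(particles, obs, rng)
    print("estimated wrist:", est)
```

Because this toy likelihood only sees the projected wrist, the estimated depth is bounded mainly by the reach test; a full tracker would rely on further image cues (silhouette, edges, the rest of the body model) to resolve it.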


Keywords: 3D motion capture; Monocular vision; 3D/2D registration; Particle filtering; Real-time computer vision



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Institut Mines-Télécom, Télécom SudParis, Evry Cedex, France
  2. ESTIA, Bidart, France
