Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation

  • Leonid Sigal
  • Michael Isard
  • Horst Haussecker
  • Michael J. Black

Abstract

We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely connected body parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pairwise statistical distributions that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using Particle Message Passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variation of particle filtering that can be applied over a general graphical model with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from “bottom-up” visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present quantitative evaluation using the HumanEva dataset.
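
To make the idea of passing messages between loosely connected parts concrete, the following is a minimal sketch of a single importance-weighted particle message pass between two parts. It is not the authors' PaMPas implementation: the 1D states, the part names "torso" and "thigh", the Gaussian potentials, and every function name are assumptions chosen for illustration, and the sketch omits loops in the graph, resampling, temporal edges, and 3D pose.

```python
import math
import random


def gaussian(x, mean, std):
    """Density of a 1D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def particle_message(src_particles, src_weights, dst_particles, offset, std):
    """Evaluate the message from a source part at each destination particle.

    m(x_t) is approximated by sum_i w_i * psi(x_s_i, x_t), where psi is a
    Gaussian pairwise potential that prefers x_t close to x_s + offset.
    """
    return [
        sum(w * gaussian(xt, xs + offset, std)
            for xs, w in zip(src_particles, src_weights))
        for xt in dst_particles
    ]


def estimate_two_part_pose(seed=0, n=400):
    """Return weighted-mean positions of two hypothetical parts in 1D."""
    rng = random.Random(seed)
    # Particles for both parts are drawn from the same uniform proposal,
    # so the proposal density is constant and cancels after normalization.
    torso = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    thigh = [rng.uniform(-3.0, 3.0) for _ in range(n)]

    # Assumed image likelihoods: a sharp torso cue near 0.0 (as if a
    # detector had fired) and a broad, weaker thigh cue centred at 1.4.
    torso_w = normalize([gaussian(x, 0.0, 0.2) for x in torso])
    thigh_lik = [gaussian(x, 1.4, 1.0) for x in thigh]

    # One message pass torso -> thigh through an assumed kinematic
    # potential (thigh about 1.0 away from the torso), followed by the
    # thigh belief: local likelihood times incoming message.
    msg = particle_message(torso, torso_w, thigh, offset=1.0, std=0.3)
    thigh_w = normalize([lik * m for lik, m in zip(thigh_lik, msg)])

    torso_mean = sum(w * x for w, x in zip(torso_w, torso))
    thigh_mean = sum(w * x for w, x in zip(thigh_w, thigh))
    return torso_mean, thigh_mean


if __name__ == "__main__":
    print(estimate_two_part_pose())
```

Under these assumptions, the confident torso cue combined with the kinematic potential should pull the thigh estimate away from its own weak likelihood peak and toward the position the joint constraint prefers, which is the sense in which a reliable bottom-up cue on one part can inform its neighbours.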

Keywords

  • Articulated pose estimation
  • Articulated tracking
  • Human pose estimation
  • Human motion tracking
  • Non-parametric belief propagation

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Leonid Sigal (1)
  • Michael Isard (2)
  • Horst Haussecker (3)
  • Michael J. Black (4)

  1. Disney Research, Pittsburgh, USA
  2. Microsoft Research Silicon Valley, Mountain View, USA
  3. Intel Labs, Santa Clara, USA
  4. Max Planck Institute for Intelligent Systems, Tübingen, Germany
