HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion

  • Leonid Sigal
  • Alexandru O. Balan
  • Michael J. Black

Abstract

While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture synchronized video and ground-truth 3D motion. The resulting HumanEva datasets contain multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 40,000 frames of synchronized motion capture and multi-view video (resulting in over one quarter million image frames in total) were collected at 60 Hz with an additional 37,000 time instants of pure motion capture data. A standard set of error measures is defined for evaluating both 2D and 3D pose estimation and tracking algorithms. We also describe a baseline algorithm for 3D articulated tracking that uses a relatively standard Bayesian framework with optimization in the form of Sequential Importance Resampling and Annealed Particle Filtering. In the context of this baseline algorithm we explore a variety of likelihood functions, prior models of human motion and the effects of algorithm parameters. Our experiments suggest that image observation models and motion priors play important roles in performance, and that in a multi-view laboratory environment, where initialization is available, Bayesian filtering tends to perform well. The datasets and the software are made available to the research community. This infrastructure will support the development of new articulated motion and pose estimation algorithms, will provide a baseline for the evaluation and comparison of new methods, and will help establish the current state of the art in human pose estimation and tracking.
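As a rough illustration of the Sequential Importance Resampling idea named above, the following minimal sketch runs a particle filter on a toy one-dimensional state. It is not the baseline algorithm from the article: the random-walk motion prior, the Gaussian likelihood, and the names `sir_step`, `posterior_mean`, `motion_std` and `obs_std` are assumptions made for illustration, and annealing is not shown. The helper `mean_marker_error` likewise assumes one plausible form of a 3D error measure, the average Euclidean distance between corresponding points; the abstract does not spell out the measures actually used.

```python
import math
import random


def sir_step(particles, weights, observation, rng,
             motion_std=0.5, obs_std=1.0):
    """One Sequential Importance Resampling step for a toy 1D state.

    The motion prior and likelihood here are illustrative assumptions.
    """
    n = len(particles)
    # 1. Resample: draw n particles in proportion to their current weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Predict: diffuse each particle with a Gaussian random-walk prior.
    predicted = [p + rng.gauss(0.0, motion_std) for p in resampled]
    # 3. Weight: score each particle with a Gaussian observation likelihood.
    new_weights = [
        math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
        for p in predicted
    ]
    total = sum(new_weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    return predicted, new_weights


def posterior_mean(particles, weights):
    """Weighted mean of the particle set (weights are assumed normalized)."""
    return sum(p * w for p, w in zip(particles, weights))


def mean_marker_error(estimated, ground_truth):
    """Average Euclidean distance between corresponding 3D points."""
    dists = [math.dist(a, b) for a, b in zip(estimated, ground_truth)]
    return sum(dists) / len(dists)
```

In a real tracker the state would be a high-dimensional pose vector and the likelihood would compare a projected body model against image evidence; the resample, predict, weight loop keeps the same shape.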

Keywords

  • Articulated pose estimation
  • Articulated tracking
  • Motion capture
  • Human tracking
  • Datasets and evaluation

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Leonid Sigal (1)
  • Alexandru O. Balan (2)
  • Michael J. Black (2)
  1. Dept. of Computer Science, University of Toronto, Toronto, Canada
  2. Dept. of Computer Science, Brown University, Providence, USA
