International Journal of Computer Vision, Volume 96, Issue 1, pp 103–124

Multi-view 3D Human Pose Estimation in Complex Environment

  • M. Hofmann
  • D. M. Gavrila
Open Access
Article

Abstract

We introduce a framework for unconstrained 3D human upper body pose estimation from multiple camera views in a complex environment. Its main novelty lies in the integration of three components: single-frame pose recovery, temporal integration and model texture adaptation. Single-frame pose recovery consists of a hypothesis generation stage, in which candidate 3D poses are generated based on probabilistic hierarchical shape matching in each camera view. In the subsequent hypothesis verification stage, the candidate 3D poses are re-projected into the other camera views and ranked according to a multi-view likelihood measure. Temporal integration consists of computing K-best trajectories, combining a motion model and observations in a Viterbi-style maximum-likelihood approach. Poses that lie on the best trajectories are used to generate and adapt a texture model, which in turn enriches the shape likelihood measure used for pose recovery. The multiple trajectory hypotheses are used to generate pose predictions, augmenting the 3D pose candidates generated at the next time step.

We demonstrate that our approach outperforms the state-of-the-art in experiments with large and challenging real-world data from an outdoor setting.
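The temporal integration step described in the abstract (K-best trajectories from a motion model and observations, Viterbi-style) can be illustrated with a small sketch. This is not the authors' implementation: the function name `k_best_trajectories`, the data layout, and the Gaussian motion model with parameter `sigma` are all assumptions made for illustration. It keeps, for every pose candidate in the current frame, the K highest-scoring partial trajectories ending there, and extends them frame by frame.

```python
import heapq


def k_best_trajectories(candidates, obs_loglik, k=2, sigma=1.0):
    """Return up to k (score, path) pairs, best first.

    candidates[t]    -- list of pose vectors (tuples of floats) at frame t
    obs_loglik[t][i] -- observation log-likelihood of candidates[t][i]
    A path is a list holding one candidate index per frame.
    """

    def trans(p, q):
        # Assumed motion model: zero-mean Gaussian on the pose change
        # (log-density up to an additive constant).
        return -sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2)

    # beams[i]: up to k best (score, path) hypotheses ending in candidate i.
    beams = [[(obs_loglik[0][i], [i])] for i in range(len(candidates[0]))]

    for t in range(1, len(candidates)):
        new_beams = []
        for j, q in enumerate(candidates[t]):
            # Extend every surviving hypothesis from the previous frame to j.
            extended = [
                (s + trans(candidates[t - 1][i], q) + obs_loglik[t][j], path + [j])
                for i, hyps in enumerate(beams)
                for s, path in hyps
            ]
            new_beams.append(heapq.nlargest(k, extended, key=lambda h: h[0]))
        beams = new_beams

    final = [h for hyps in beams for h in hyps]
    return heapq.nlargest(k, final, key=lambda h: h[0])


if __name__ == "__main__":
    # Toy 1-D "poses": two candidates per frame, three frames.
    cands = [[(0.0,), (10.0,)]] * 3
    obs = [[0.0, -0.5]] * 3
    for score, path in k_best_trajectories(cands, obs, k=2):
        print(score, path)
```

Keeping K hypotheses per candidate rather than one is what distinguishes this from plain Viterbi decoding; in the framework described above, the surviving trajectories would also supply pose predictions for the next time step, which this sketch does not model.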

Keywords

Human motion capture, Articulated pose recovery, Human computer interaction, Surveillance

Copyright information

© The Author(s) 2011

Open Access: This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. Intelligent Autonomous Systems Group, Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
