International Journal of Computer Vision

, Volume 79, Issue 3, pp 285–298 | Cite as

Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts

  • Pushmeet Kohli
  • Jonathan Rihan
  • Matthieu Bray
  • Philip H. S. Torr


This paper presents a novel algorithm for performing integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other state of the art methods which focus on either segmentation or pose estimation individually, our approach tackles these two tasks together. Our method works by optimizing a cost function based on a Conditional Random Field (CRF). This has the advantage that all information in the image (edges, background and foreground appearances), as well as the prior information on the shape and pose of the subject can be combined and used in a Bayesian framework. Optimizing such a cost function would have been computationally infeasible. However, our recent research in dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Although we target the human pose inference problem in the paper, our method is completely generic and can be used to segment and infer the pose of any rigid, deformable or articulated object.


Pose estimation Segmentation Energy minimization 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. Agarwal, A., & Triggs, B. (2004). 3D human pose from silhouettes by relevance vector regression. In: CVPR (Vol. II, pp. 882–888). Google Scholar
  2. Agarwal, A., & Triggs, B. (2006). Recovering 3D human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell., 28. Google Scholar
  3. Blake, A., Rother, C., Brown, M., Pérez, P., & Torr, P. (2004). Interactive image segmentation using an adaptive gmmrf model. In: ECCV (Vol. I, pp. 428–441). Google Scholar
  4. Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: ICCV (Vol. I, pp. 105–112). Google Scholar
  5. Bray, M., Kohli, P., & Torr, P. H. S. (2006). Posecut: Simultaneous segmentation and 3D pose estimation of humans using dynamic graph-cuts. In: ECCV (Vol. 2, pp. 642–655). Google Scholar
  6. Cremers, D., Osher, S., & Soatto, S. (2006). Kernel density estimation and intrinsic alignment for shape priors in level set segmentation. International Journal of Computer Vision, 69, 335–351. CrossRefGoogle Scholar
  7. Deutscher, J., Davison, A., & Reid, I. (2001). Automatic partitioning of high dimensional search spaces associated with articulated body motion capture. In: CVPR (Vol. 2, pp. 669–676). Google Scholar
  8. Ek, C., Laurence, N., & Torr, P. (2007). Gaussian process latent variable models for human pose estimation. In 4th joint workshop on multimodal interaction and related machine learning algorithms. Google Scholar
  9. Felzenszwalb, P. F., & Huttenlocher, D. P. (2000). Efficient matching of pictorial structures. In: CVPR. Google Scholar
  10. Felzenszwalb, P., & Huttenlocher, D. (2004). Distance transforms of sampled functions (Technical Report TR2004-1963). Cornell University. Google Scholar
  11. Freedman, D., & Zhang, T. (2005). Interactive graph cut based segmentation with shape priors. In: CVPR (Vol. I, pp. 755–762). Google Scholar
  12. Gavrila, D., & Davis, L. (1996). 3D model-based tracking of humans in action: a multi-view approach. In: CVPR (pp. 73–80). Google Scholar
  13. Huang, R., Pavlovic, V., & Metaxas, D. (2004). A graphical model framework for coupling mrfs and deformable models. In: CVPR (Vol. II, pp. 739–746). Google Scholar
  14. Kehl, R., Bray, M., & Van Gool, L. (2005). Full body tracking from multiple views using stochastic sampling. In: CVPR (Vol. II, pp. 129–136). Google Scholar
  15. Kohli, P., & Torr, P. (2005). Efficiently solving dynamic Markov random fields using graph cuts. In: ICCV. Google Scholar
  16. Kolmogorov, V., & Zabih, R. (2002). What energy functions can be minimized via graph cuts? In: ECCV (Vol. III). Google Scholar
  17. Kolmogorov, V., Criminisi, A., Blake, A., Cross, G., & Rother, C. (2005). Bi-layer segmentation of binocular stereo video. In: CVPR (Vol. 2, pp. 407–414). Google Scholar
  18. Kumar, M., Torr, P., & Zisserman, A. (2005). Obj cut. In: CVPR (Vol. I, pp. 18–25). Google Scholar
  19. Lafferty, J. D., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML (pp. 282–289). Google Scholar
  20. Lan, X., & Huttenlocher, D. P. (2005). Beyond trees: common-factor models for 2D human pose recovery. In: ICCV (pp. 470–477). Google Scholar
  21. Leventon, M. E., Grimson, W. E. L., & Faugeras, O. D. (2000). Statistical shape influence in geodesic active contours. In: CVPR (pp. 1316–1323). Google Scholar
  22. Mori, G., Ren, X., Efros, A. A., & Malik, J. (2004). Recovering human body configurations: Combining segmentation and recognition. In: CVPR (Vol. 2, pp. 326–333). Google Scholar
  23. Press, W., Flannery, B., Teukolsky, S., & Vetterling, W. (1988). Numerical recipes in C. Cambridge: Cambridge University Press. MATHGoogle Scholar
  24. Ramanan, D. (2007). Using segmentation to verify object hypotheses. In: CVPR. Google Scholar
  25. Ramanan, D., & Forsyth, D. A. (2003). Finding and tracking people from the bottom up. In: CVPR (Vol. 2, pp. 467–474). Google Scholar
  26. Rihan, J., Kohli, P., & Torr, P. H. S. (2006). Objcut for face detection. In: ICVGIP (pp. 576–584). Google Scholar
  27. Shakhnarovich, G., Viola, P., & Darrell, T. (2003). Fast pose estimation with parameter-sensitive hashing. In: ICCV (pp. 750–757). Google Scholar
  28. Sidenbladh, H., Black, M. J., & Fleet, D. J. (2000a). Stochastic tracking of 3D human figures using 2D image motion. In: ECCV (Vol. 2, pp. 702–718). Google Scholar
  29. Sidenbladh, H., Black, M. J., & Fleet, D. J. (2000b). Stochastic tracking of 3D human figures using 2D image motion. In: ECCV (pp. 702–718). Google Scholar
  30. Sminchisescu, C., & Jepson, A. D. (2004). Generative modeling for continuous non-linearly embedded visual inference. In: ICML. Google Scholar
  31. Sminchisescu, C., & Triggs, B. (2001). Covariance scaled sampling for monocular 3D body tracking. In: CVPR (pp. 447–454). Google Scholar
  32. Stauffer, C., & Grimson, W. (1999). Adaptive background mixture models for real-time tracking. In: CVPR (pp. 246–252). Google Scholar
  33. Stenger, B., Thayananthan, A., Torr, P., & Cipolla, R. (2003). Filtering using a tree-based estimator. In: ICCV (pp. 1063–1070). Google Scholar
  34. Sun, Y., Kohli, P., Bray, M., & Torr, P. H. S. (2006). Using strong shape priors for stereo. In: ICVGIP (pp. 882–893). Google Scholar
  35. Urtasun, R., Fleet, D. J., Hertzmann, A., & Fua, P. (2005). Priors for people tracking from small training sets. In: ICCV (pp. 403–410). Google Scholar
  36. Viola, P. A., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57, 137–154. CrossRefGoogle Scholar
  37. Zhao, L., & Davis, L. S. (2005). Closely coupled object detection and segmentation. In: ICCV (pp. 454–461). Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Pushmeet Kohli
    • 1
  • Jonathan Rihan
    • 1
  • Matthieu Bray
    • 1
  • Philip H. S. Torr
    • 1
  1. 1.Department of ComputingOxford Brookes UniversityOxfordUK

Personalised recommendations