Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts

International Journal of Computer Vision

Abstract

This paper presents a novel algorithm for integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other state-of-the-art methods, which focus on either segmentation or pose estimation individually, our approach tackles the two tasks together. Our method works by optimizing a cost function based on a Conditional Random Field (CRF). This has the advantage that all information in the image (edges, background and foreground appearances), as well as the prior information on the shape and pose of the subject, can be combined and used in a Bayesian framework. Optimizing such a cost function would previously have been computationally infeasible; however, our recent research in dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Although we target the human pose inference problem in the paper, our method is completely generic and can be used to segment and infer the pose of any rigid, deformable or articulated object.
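To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy 1-D "image", a hypothetical pose parameter (the centre of a fixed-width template), made-up integer weights, and an exhaustive search over poses. For each candidate pose, a binary labelling energy (appearance term, pose-dependent shape prior, and a pairwise smoothness term) is minimised exactly with an s-t minimum cut, and the pose with the lowest minimum energy is kept. The sketch recomputes every cut from scratch with a plain Edmonds-Karp max-flow; dynamic graph cuts would instead reuse computation between successive pose hypotheses, which is not shown here.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (cut value, nodes on the source side)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: the reachable nodes form the source side of the cut.
            return flow, set(parent)
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def segment_given_pose(pixels, centre, half_width=1, fg_mean=10, bg_mean=0,
                       prior_weight=50, smooth_weight=20):
    """Minimise the labelling energy for one fixed pose (all weights are assumptions)."""
    cap = {"s": {}, "t": {}}
    for i, value in enumerate(pixels):
        inside = abs(i - centre) <= half_width  # toy shape prior for this pose
        fg_cost = (value - fg_mean) ** 2 + (0 if inside else prior_weight)
        bg_cost = (value - bg_mean) ** 2 + (prior_weight if inside else 0)
        cap["s"][i] = bg_cost                  # cut if pixel i is labelled background
        cap.setdefault(i, {})["t"] = fg_cost   # cut if pixel i is labelled foreground
        if i + 1 < len(pixels):                # penalty for neighbours that disagree
            cap[i][i + 1] = smooth_weight
            cap.setdefault(i + 1, {})[i] = smooth_weight
    energy, source_side = min_cut(cap, "s", "t")
    labels = [1 if i in source_side else 0 for i in range(len(pixels))]
    return energy, labels


def estimate_pose(pixels, candidate_centres):
    """Return (energy, centre, labels) for the pose with the lowest minimum energy."""
    best = None
    for centre in candidate_centres:
        energy, labels = segment_given_pose(pixels, centre)
        if best is None or energy < best[0]:
            best = (energy, centre, labels)
    return best
```

For example, `estimate_pose([1, 2, 9, 8, 9, 2, 1, 1], range(8))` returns `(57, 3, [0, 0, 1, 1, 1, 0, 0, 0])`: the template centred on pixel 3 explains the bright region best, and the segmentation is obtained as a by-product of evaluating that pose.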

Author information

Corresponding author

Correspondence to Pushmeet Kohli.

About this article

Cite this article

Kohli, P., Rihan, J., Bray, M. et al. Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts. Int J Comput Vis 79, 285–298 (2008). https://doi.org/10.1007/s11263-007-0120-6

  • DOI: https://doi.org/10.1007/s11263-007-0120-6
