Cascaded Models for Articulated Pose Estimation

  • Benjamin Sapp
  • Alexander Toshev
  • Ben Taskar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6312)


We address the problem of articulated human pose estimation by learning a coarse-to-fine cascade of pictorial structure models. While the fine-level state space of poses of individual parts is too large to permit the use of rich appearance models, most possibilities can be ruled out by efficient structured models at a coarser scale. We propose to learn a sequence of structured models at different pose resolutions, where coarse models filter the pose space for the next level via their max-marginals. The cascade is trained to prune as much as possible while preserving true poses for the final-level pictorial structure model. The final level uses much more expensive segmentation, contour, and shape features for the remaining filtered set of candidates. We evaluate our framework on the challenging Buffy and PASCAL human pose datasets, improving on the state of the art.
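The filtering step described above can be illustrated with a minimal sketch. It assumes a chain-structured model (a special case of the tree-structured pictorial structure) and a pruning threshold that interpolates between the best score and the mean max-marginal, in the style of structured prediction cascades. The abstract does not specify the threshold rule, so `alpha`, the function names, and the random scores are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def chain_max_marginals(unary, pairwise):
    """Max-marginals of a chain model via max-sum forward/backward passes.

    unary[i][s]       : score of part i taking state s
    pairwise[i][s, t] : score of part i in state s and part i+1 in state t
    Returns m, where m[i][s] is the best total score over all full
    assignments that put part i in state s.
    """
    unary = [np.asarray(u, dtype=float) for u in unary]
    pairwise = [np.asarray(p, dtype=float) for p in pairwise]
    n = len(unary)
    fwd = [None] * n  # best prefix score ending in state s (includes unary[i])
    bwd = [None] * n  # best suffix score after part i (excludes unary[i])
    fwd[0] = unary[0]
    for i in range(1, n):
        fwd[i] = unary[i] + np.max(fwd[i - 1][:, None] + pairwise[i - 1], axis=0)
    bwd[n - 1] = np.zeros_like(unary[n - 1])
    for i in range(n - 2, -1, -1):
        bwd[i] = np.max(pairwise[i] + (unary[i + 1] + bwd[i + 1])[None, :], axis=1)
    return [fwd[i] + bwd[i] for i in range(n)]

def prune_states(max_marginals, alpha):
    """Keep states whose max-marginal reaches an assumed threshold:
    alpha * (best score) + (1 - alpha) * (mean max-marginal), alpha in [0, 1].
    The highest-scoring assignment always survives this rule."""
    best = max_marginals[0].max()
    mean = np.mean(np.concatenate(max_marginals))
    threshold = alpha * best + (1.0 - alpha) * mean
    return [m >= threshold for m in max_marginals]

# Toy usage: 4 parts with 6 candidate states each and random scores.
rng = np.random.default_rng(0)
toy_unary = [rng.normal(size=6) for _ in range(4)]
toy_pairwise = [rng.normal(size=(6, 6)) for _ in range(3)]
toy_keep = prune_states(chain_max_marginals(toy_unary, toy_pairwise), alpha=0.5)
print([int(k.sum()) for k in toy_keep])  # number of surviving states per part
```

In a cascade, the surviving states of each part would be refined to a finer pose resolution and rescored by the next model, so the expensive features are only computed on the candidates that remain.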


Keywords (machine-generated, not author-supplied): State Space, Part Axis, Pictorial Structure, Rich Feature, Pairwise Term




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Benjamin Sapp (1)
  • Alexander Toshev (1)
  • Ben Taskar (1)
  1. University of Pennsylvania, Philadelphia, USA
