Pose Machines: Articulated Pose Estimation via Inference Machines

  • Varun Ramakrishna
  • Daniel Munoz
  • Martial Hebert
  • James Andrew Bagnell
  • Yaser Sheikh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


State-of-the-art approaches for articulated human pose estimation are rooted in parts-based graphical models. These models are often restricted to tree-structured representations and simple parametric potentials in order to enable tractable inference. However, these simple dependencies fail to capture all the interactions between body parts. While models with more complex interactions can be defined, learning their parameters remains challenging because inference in them is intractable or only approximate. In this paper, instead of performing inference on a learned graphical model, we build upon the inference machine framework and present a method for articulated human pose estimation. Our approach incorporates rich spatial interactions among multiple parts and information across parts of different scales. Additionally, the modular framework of our approach enables both ease of implementation, without specialized optimization solvers, and efficient inference. We analyze our approach on two challenging datasets with large pose variation and outperform the state of the art on these benchmarks.
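The inference-machine idea in the abstract replaces inference on a graphical model with a sequence of predictors. Later stages in that sequence consume the previous stage's beliefs about all parts. The sketch below illustrates this under simplifying assumptions. Locations lie on a 1-D grid, context is read at fixed offsets by a hypothetical `context_features` helper, and hand-written functions stand in for learned predictors. All names are illustrative; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical types for this sketch: beliefs[p][z] is the score of part p at
# location z on a 1-D grid (a real system would use 2-D image coordinates).
Beliefs = List[List[float]]
Predictor = Callable[[List[float]], float]


def context_features(beliefs: Beliefs, z: int, offsets: Sequence[int]) -> List[float]:
    """Read every part's belief at fixed offsets around z (0.0 outside the grid)."""
    n = len(beliefs[0])
    feats: List[float] = []
    for part_map in beliefs:
        for d in offsets:
            j = z + d
            feats.append(part_map[j] if 0 <= j < n else 0.0)
    return feats


def run_inference_machine(
    image_feats: List[List[float]],
    stages: List[List[Predictor]],
    offsets: Sequence[int],
) -> Beliefs:
    """Apply the stages in sequence; stages[t][p] scores part p at stage t.

    Stage 0 sees only appearance features; every later stage also sees
    context features computed from the previous stage's beliefs for all parts.
    """
    n = len(image_feats)
    num_parts = len(stages[0])
    beliefs = [[stages[0][p](list(image_feats[z])) for z in range(n)]
               for p in range(num_parts)]
    for stage in stages[1:]:
        # The comprehension is evaluated in full before rebinding `beliefs`,
        # so every part at this stage reads the previous stage's beliefs.
        beliefs = [[stage[p](list(image_feats[z]) + context_features(beliefs, z, offsets))
                    for z in range(n)]
                   for p in range(num_parts)]
    return beliefs


# Toy usage with hand-written stand-ins for learned predictors:
# part 0 is visible at z=1; part 1 has no appearance cue and is
# inferred at stage 2 from part 0's belief one cell to its left.
offsets = (-1, 0, 1)
image_feats = [[0.0], [1.0], [0.0], [0.0], [0.0]]
stage1 = [lambda f: f[0], lambda f: 0.0]
# Stage-2 feature layout: [appearance, part0@z-1, part0@z, part0@z+1, part1@z-1, ...]
stage2 = [lambda f: f[0], lambda f: f[1]]
result = run_inference_machine(image_feats, [stage1, stage2], offsets)
print(result)  # [[0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
```

In an actual inference machine, each stage would be a trained predictor, typically fit on the outputs produced by the previous stage. Here the stages are fixed functions so that the flow of context from one part to another is easy to trace.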





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Varun Ramakrishna, Daniel Munoz, Martial Hebert, James Andrew Bagnell, Yaser Sheikh
    The Robotics Institute, Carnegie Mellon University, USA
