Human Pose Estimation with Fields of Parts

  • Martin Kiefel
  • Peter Vincent Gehler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)


This paper proposes a new formulation of the human pose estimation problem. We present the Fields of Parts model, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images.

The Fields of Parts model is inspired by the idea of Pictorial Structures, it models local appearance and joint spatial configuration of the human body. However the underlying graph structure is entirely different. The idea is simple: we model the presence and absence of a body part at every possible position, orientation, and scale in an image with a binary random variable. This results into a vast number of random variables, however, we show that approximate inference in this model is efficient. Moreover we can encode the very same appearance and spatial structure as in Pictorial Structures models.

This approach allows us to combine ideas from segmentation and pose estimation into a single model. The Fields of Parts model can use evidence from the background, include local color information, and it is connected more densely than a kinematic chain structure. On the challenging Leeds Sports Poses dataset we improve over the Pictorial Structures counterpart by 6.0% in terms of Average Precision of Keypoints.


Human Pose Estimation Efficient Inference 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Supplementary material

978-3-319-10602-1_22_MOESM1_ESM.pdf (704 kb)
Electronic Supplementary Material (PDF 704 KB)


  1. 1.
    Adams, A., Baek, J., Davis, M.A.: Fast high-dimensional filtering using the permutohedral lattice. Comput. Graph. Forum 29(2), 753–762 (2010)CrossRefGoogle Scholar
  2. 2.
    Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: People detection and articulated pose estimation. In: CVPR (2009)Google Scholar
  3. 3.
    Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV (2009)Google Scholar
  4. 4.
    Bray, M., Kohli, P., Torr, P.: poseCut: Simultaneous segmentation and 3D pose estimation of humans using dynamic graph-cuts. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 642–655. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  5. 5.
    Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)Google Scholar
  6. 6.
    Dantone, M., Gall, J., Leistner, C., Gool., L.V.: Human pose estimation using body parts dependent joint regressors. In: CVPR (2013)Google Scholar
  7. 7.
    Domke, J.: Parameter learning with truncated message-passing. In: CVPR (2011)Google Scholar
  8. 8.
    Domke, J.: Learning graphical model parameters with approximate marginal inference. PAMI (2013)Google Scholar
  9. 9.
    Eichner, M., Ferrari, V.: Better appearance models for pictorial structures. In: BMVC (2009)Google Scholar
  10. 10.
    Eichner, M., Ferrari, V.: Appearance sharing for collective human pose estimation. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 138–151. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  11. 11.
    Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. IJCV (2005)Google Scholar
  12. 12.
    Ferrari, V., Marin, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR (2008)Google Scholar
  13. 13.
    Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. (1973)Google Scholar
  14. 14.
    Gkioxari, G., Arbelaez, P., Bourdev, L., Malik, J.: Articulated pose estimation using discriminative armlet classifiers. In: CVPR (2013)Google Scholar
  15. 15.
    Jain, A., Tompson, J., Andriluka, M., Taylor, G.W., Bregler, C.: Learning human pose estimation features with convolutional networks. arXiv (2013)Google Scholar
  16. 16.
    Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J.: Towards understanding action recognition. In: ICCV (2013)Google Scholar
  17. 17.
    Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC (2010)Google Scholar
  18. 18.
    Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: NIPS (2011)Google Scholar
  19. 19.
    Krähenbühl, P., Koltun, V.: Parameter learning and convergent inference for dense random fields. In: ICML (2013)Google Scholar
  20. 20.
    Ladicky, L., Torr, P.H.S., Zisserman, A.: Human pose estimation using a joint pixel-wise and part-wise formulation. In: CVPR (2013)Google Scholar
  21. 21.
    Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., Kohli, P.: Decision tree fields. In: ICCV (2011)Google Scholar
  22. 22.
    Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Poselet conditioned pictorial structures. In: CVPR (2013)Google Scholar
  23. 23.
    Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Strong appearance and expressive spatial models for human pose estimation. In: ICCV (2013)Google Scholar
  24. 24.
    Ramanan, D.: Learning to parse images of articulated objects. In: NIPS (2006)Google Scholar
  25. 25.
    Sapp, B., Jordan, C., Taskar, B.: Adaptive pose priors for pictorial structures. In: CVPR (2010)Google Scholar
  26. 26.
    Sapp, B., Toshev, A., Taskar, B.: Cascaded models for articulated pose estimation. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 406–420. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  27. 27.
    Sapp, B., Weiss, D., Taskar, B.: Parsing human motion with stretchable models. In: CVPR (2011)Google Scholar
  28. 28.
    Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from a single depth image. In: CVPR (2011)Google Scholar
  29. 29.
    Sun, M., Telaprolu, M., Lee, H., Savarese, S.: An efficient branch-and-bound algorithm for optimal human pose estimation. In: CVPR (2012)Google Scholar
  30. 30.
    Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. JMLR 6, 1453–1484 (2005),
  31. 31.
    Vineet, V., Sheasby, G., Warrell, J., Torr, P.H.S.: PoseField: An efficient mean-field based method for joint estimation of human pose, segmentation, and depth. In: Heyden, A., Kahl, F., Olsson, C., Oskarsson, M., Tai, X.-C. (eds.) EMMCVPR 2013. LNCS, vol. 8081, pp. 180–194. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  32. 32.
    Wang, H., Koller, D.: Multi-level inference by relaxed dual decomposition for human pose segmentation. In: CVPR, pp. 2433–2440 (2011)Google Scholar
  33. 33.
    Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR (2011)Google Scholar
  34. 34.
    Yang, Y., Ramanan, D.: Articulated human detection with flexible mixtures of parts. PAMI 35 (2013)Google Scholar
  35. 35.
    Zeiler, M.: Adadelta: An adaptive learning rate method (December 2012)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Martin Kiefel
    • 1
  • Peter Vincent Gehler
    • 1
  1. 1.Max Planck Institute for Intelligent SystemsTübingenGermany

Personalised recommendations