Multi-person Pose Estimation with Local Joint-to-Person Associations

  • Umar Iqbal
  • Juergen Gall
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9914)


Despite the recent success of neural networks for human pose estimation, current approaches are limited to pose estimation of a single person and cannot handle humans in groups or crowds. In this work, we propose a method that estimates the poses of multiple persons in an image in which a person can be occluded by another person or might be truncated. To this end, we consider multi-person pose estimation as a joint-to-person association problem. We construct a fully connected graph from a set of detected joint candidates in an image and resolve the joint-to-person association and outlier detection using integer linear programming. Since solving joint-to-person association jointly for all persons in an image is an NP-hard problem and even approximations are expensive, we solve the problem locally for each person. On the challenging MPII Human Pose Dataset for multiple persons, our approach achieves the accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.
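The local association idea can be illustrated with a toy example. The sketch below is not the formulation used in the paper: it replaces the integer linear program with exhaustive search over a handful of candidates, and the function name `associate_joints`, the `unary_threshold` parameter, and the distance-based pairwise penalty are all hypothetical choices made for illustration. It only shows the general shape of the problem: for one person, pick at most one candidate per joint type, and leave a joint type unassigned when every candidate looks like an outlier.

```python
from itertools import product
from math import hypot


def associate_joints(candidates, expected_dist, unary_threshold=0.2):
    """Toy joint-to-person association for a single person.

    candidates:    dict joint_type -> list of (x, y, detection_score)
    expected_dist: dict (type_a, type_b) -> assumed typical distance in pixels
                   (a hypothetical stand-in for a learned pairwise term)
    Returns a dict joint_type -> index of the selected candidate, or None
    when the joint is treated as occluded, truncated, or an outlier.
    """
    types = sorted(candidates)
    # Every joint type may select one of its candidates or none at all.
    options = [[None] + list(range(len(candidates[t]))) for t in types]

    best_score, best_choice = float("-inf"), None
    for choice in product(*options):
        selected = dict(zip(types, choice))
        score = 0.0
        # Unary term: reward confident detections, penalise weak ones.
        for t, idx in selected.items():
            if idx is not None:
                score += candidates[t][idx][2] - unary_threshold
        # Pairwise term: penalise deviation from the assumed joint distance.
        for (a, b), dist in expected_dist.items():
            ia, ib = selected[a], selected[b]
            if ia is None or ib is None:
                continue
            xa, ya, _ = candidates[a][ia]
            xb, yb, _ = candidates[b][ib]
            score -= abs(hypot(xa - xb, ya - yb) - dist) / dist
        if score > best_score:
            best_score, best_choice = score, selected
    return best_choice


# Two head candidates (the second belongs to a neighbouring person),
# one neck candidate, and one far-away, low-confidence shoulder detection.
candidates = {
    "head": [(50, 20, 0.9), (150, 22, 0.8)],
    "neck": [(52, 40, 0.9)],
    "shoulder": [(200, 200, 0.3)],
}
expected_dist = {("head", "neck"): 20.0, ("neck", "shoulder"): 20.0}
result = associate_joints(candidates, expected_dist)
print(result)
```

In this toy input the nearby head and neck are kept together, the neighbour's head is rejected, and the distant shoulder is left unassigned. Exhaustive search grows exponentially with the number of joint types, so it is only usable for tiny inputs like this one; a real system would hand the same kind of objective to an integer linear programming solver.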



The work was partially supported by the ERC Starting Grant ARCA (677650).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Computer Vision Group, University of Bonn, Bonn, Germany
