DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model

  • Eldar Insafutdinov
  • Leonid Pishchulin
  • Bjoern Andres
  • Mykhaylo Andriluka
  • Bernt Schiele
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)

Abstract

The goal of this paper is to advance the state of the art of articulated pose estimation in scenes with multiple people. To that end, we contribute on three fronts. We propose (1) improved body part detectors that generate effective bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms that make it possible to assemble the proposals into a variable number of consistent body part configurations; and (3) an incremental optimization strategy that explores the search space more efficiently, leading to both better performance and a significant speed-up. Evaluation is done on two single-person and two multi-person pose estimation benchmarks. The proposed approach significantly outperforms the best known multi-person pose estimation results while demonstrating competitive performance on single-person pose estimation (models and code are available at http://pose.mpi-inf.mpg.de).
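The idea behind contributions (2) and (3), assembling part proposals into an unknown number of people and doing so incrementally, can be illustrated with a toy Python sketch. It is not the authors' solver. It assumes the pairwise probabilities that two proposals belong to the same person are already given (in the paper they come from the image-conditioned pairwise terms), and it swaps in a simple greedy agglomerative heuristic for correlation clustering. Part classes are added in stages to loosely mimic an incremental strategy. The stage grouping, the function names `greedy_merge` and `incremental_assembly`, and all numbers are hypothetical, and the sketch ignores detection labeling and the suppression of spurious proposals.

```python
import math
from itertools import combinations

# Hypothetical illustration of assembling part proposals into people.
# This is a greedy correlation-clustering heuristic, NOT the DeeperCut solver.


def logit(p, eps=1e-6):
    """Log-odds of p; positive when 'same person' is more likely than not."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def greedy_merge(clusters, prob):
    """Repeatedly merge the two clusters with the largest positive total log-odds.

    clusters: list of sets of proposal ids (the starting partition).
    prob: dict frozenset({i, j}) -> assumed P(i and j belong to the same person).
    Pairs missing from `prob` contribute nothing (treated as p = 0.5).
    """
    clusters = [set(c) for c in clusters]
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            gain = sum(
                logit(prob[frozenset((i, j))])
                for i in clusters[a]
                for j in clusters[b]
                if frozenset((i, j)) in prob
            )
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves the objective
            return [frozenset(c) for c in clusters]
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] |= clusters[b]
        del clusters[b]


def incremental_assembly(proposals, prob, stages):
    """Add part classes stage by stage (assumed staging, for illustration).

    proposals: dict id -> part class name.
    stages: list of lists of part class names.
    Clusters from earlier stages are never split, but may still merge with
    new proposals or with each other.
    """
    clusters = []
    for parts in stages:
        new = [{pid} for pid, part in proposals.items() if part in parts]
        clusters = greedy_merge(clusters + new, prob)
    return clusters


# Toy input: two people, each with a head, a shoulder and an elbow proposal.
proposals = {0: "head", 1: "shoulder", 2: "head", 3: "shoulder",
             4: "elbow", 5: "elbow"}
p = {(0, 1): 0.9, (2, 3): 0.9, (0, 2): 0.05, (0, 3): 0.1, (1, 2): 0.1,
     (1, 3): 0.2, (1, 4): 0.8, (3, 5): 0.85, (1, 5): 0.1, (3, 4): 0.15,
     (4, 5): 0.05}
prob = {frozenset(k): v for k, v in p.items()}

people = incremental_assembly(proposals, prob,
                              [["head", "shoulder"], ["elbow"]])
print(sorted(sorted(c) for c in people))  # [[0, 1, 4], [2, 3, 5]]
```

The number of people is never fixed in advance: it emerges from which merges have a positive total log-odds, which is how a clustering formulation can yield a variable number of body part configurations. Solving the first stage before adding elbows also shrinks the set of candidate merges at each step, a rough analogue of why an incremental strategy can save time.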

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Eldar Insafutdinov (1)
  • Leonid Pishchulin (1)
  • Bjoern Andres (1)
  • Mykhaylo Andriluka (1, 2)
  • Bernt Schiele (1)

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Stanford University, Stanford, USA

Personalised recommendations