DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model

  • Eldar Insafutdinov
  • Leonid Pishchulin
  • Bjoern Andres
  • Mykhaylo Andriluka
  • Bernt Schiele
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)

Abstract

The goal of this paper is to advance the state of the art in articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part detectors that generate effective bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms that allow the proposals to be assembled into a variable number of consistent body part configurations; and (3) an incremental optimization strategy that explores the search space more efficiently, leading both to better performance and to significant speed-ups. Evaluation is done on two single-person and two multi-person pose estimation benchmarks. The proposed approach significantly outperforms the best known multi-person pose estimation results while demonstrating competitive performance on the task of single-person pose estimation. Models and code are available at http://pose.mpi-inf.mpg.de.

Keywords

  • Body Part
  • Integer Linear Programming
  • Area Under Curve
  • Part Detector
  • Conv4 Bank

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Eldar Insafutdinov (1)
  • Leonid Pishchulin (1)
  • Bjoern Andres (1)
  • Mykhaylo Andriluka (1, 2)
  • Bernt Schiele (1)

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Stanford University, Stanford, USA
