Human Pose Estimation Using Deep Consensus Voting

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9906)

Abstract

In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach in which each location in the image votes for the position of each keypoint using a convolutional neural network. The voting scheme allows us to use information from the whole image, rather than rely on a sparse set of keypoint locations. Using dense, multi-target votes not only produces good keypoint predictions but also enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting. This differs from most previous methods, where joint probabilities are learned from relative keypoint locations and are independent of the image. Finally, we combine the keypoint votes and joint probabilities to identify the optimal pose configuration. We demonstrate competitive performance on the MPII Human Pose and Leeds Sports Pose datasets.
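The voting idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how votes are encoded, so the offset-plus-weight representation, the function names `aggregate_votes` and `argmax_2d`, and the toy numbers are all assumptions made for illustration. The sketch shows only the generic step of accumulating votes from many image locations into a score map for one keypoint and reading off the consensus location.

```python
def aggregate_votes(height, width, votes):
    """Accumulate dense offset votes into a score map for one keypoint.

    `votes` maps a voting location (y, x) to a list of (dy, dx, weight)
    tuples, each meaning "the keypoint lies at (y + dy, x + dx)" with the
    given weight. This encoding is an assumption, not the paper's format.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for (y, x), offsets in votes.items():
        for dy, dx, weight in offsets:
            ty, tx = y + dy, x + dx
            # Votes that point outside the image are ignored.
            if 0 <= ty < height and 0 <= tx < width:
                heatmap[ty][tx] += weight
    return heatmap


def argmax_2d(heatmap):
    """Return the (y, x) position with the highest accumulated score."""
    best = max(
        ((score, y, x)
         for y, row in enumerate(heatmap)
         for x, score in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy example: three locations vote; most of the weight agrees on (2, 3).
votes = {
    (0, 0): [(2, 3, 0.6), (1, 1, 0.4)],
    (4, 4): [(-2, -1, 0.7)],
    (2, 0): [(0, 3, 0.5), (2, 2, 0.5)],
}
heatmap = aggregate_votes(5, 5, votes)
keypoint = argmax_2d(heatmap)
```

In this toy setup the consensus location is where votes from distant parts of the image agree, which is the intuition behind using the whole image rather than a sparse set of detections. The image-dependent joint probabilities and the final pose optimization mentioned in the abstract are not covered by this sketch.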

Supplementary material

Supplementary material 1: 419974_1_En_16_MOESM1_ESM.pdf (PDF, 2933 KB)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Weizmann Institute of Science, Rehovot, Israel
