Test-Time Adaptation for 3D Human Pose Estimation

  • Sikandar Amin
  • Philipp Müller
  • Andreas Bulling
  • Mykhaylo Andriluka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8753)

Abstract

In this paper we consider the task of articulated 3D human pose estimation in challenging scenes with dynamic background and multiple people. Initial progress on this task has been achieved by building on discriminatively trained part-based models that deliver a set of 2D body pose candidates, which are subsequently refined by reasoning in 3D [1, 4, 5]. The performance of such methods is limited by the performance of the underlying 2D pose estimation approaches. Here we explore a way to boost 2D pose estimation performance using the output of the 3D pose reconstruction process, thus closing the loop in the pose estimation pipeline. We build our approach around a component that is able to identify true positive pose estimation hypotheses with high confidence. We then either retrain the 2D pose estimation models using these highly confident hypotheses as additional training examples, or use similarity to these hypotheses as a cue for 2D pose estimation. We consider a number of features for assessing the confidence of the pose estimation results; the strongest feature in our comparison is ensemble agreement on the 3D pose output. We evaluate our approach on two publicly available datasets, improving over the state of the art in each case.
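The abstract names ensemble agreement on the 3D pose output as the strongest confidence feature. The sketch below illustrates one simple way such an agreement score could be computed and thresholded to pick highly confident hypotheses. It is an illustration under stated assumptions, not the authors' method: it assumes each ensemble member outputs a 3D pose as a list of (x, y, z) joint positions with the same joint ordering, measures disagreement as the mean pairwise per-joint Euclidean distance, and uses a hypothetical threshold `max_disagreement`. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Assumed representation: a joint is an (x, y, z) position (e.g. in mm),
# and a 3D pose is a fixed-length sequence of joints in a shared order.
Joint = Tuple[float, float, float]
Pose3D = Sequence[Joint]


def mean_joint_distance(a: Pose3D, b: Pose3D) -> float:
    """Average Euclidean distance between corresponding joints of two poses."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of joints")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def ensemble_disagreement(poses: Sequence[Pose3D]) -> float:
    """Mean pairwise joint distance across the 3D poses of an ensemble.

    Low values mean the ensemble members agree, so this serves as an
    inverse confidence score for the hypothesis (an assumed scoring rule).
    """
    pairs = list(itertools.combinations(poses, 2))
    if not pairs:
        raise ValueError("need at least two ensemble members")
    return sum(mean_joint_distance(a, b) for a, b in pairs) / len(pairs)


def select_confident(frames: Sequence[Sequence[Pose3D]],
                     max_disagreement: float) -> List[int]:
    """Indices of frames whose ensemble disagreement is below the threshold.

    In a test-time adaptation loop, these frames are the candidates one
    might add as extra 2D training examples or use as similarity exemplars.
    """
    return [i for i, poses in enumerate(frames)
            if ensemble_disagreement(poses) < max_disagreement]


if __name__ == "__main__":
    # Frame 0: three members nearly agree. Frame 1: one member is far off.
    frame0 = [[(0, 0, 0), (0, 0, 100)], [(2, 0, 0), (0, 0, 102)],
              [(0, 1, 0), (0, 0, 99)]]
    frame1 = [[(0, 0, 0), (0, 0, 100)], [(300, 0, 0), (300, 0, 100)],
              [(0, 0, 0), (0, 0, 100)]]
    print(select_confident([frame0, frame1], max_disagreement=20.0))  # [0]
```

Averaging over all member pairs keeps the score independent of which member is treated as the reference; a single outlying member (frame 1 above) raises the score sharply and the frame is rejected.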

References

  1. Amin, S., Andriluka, M., Rohrbach, M., Schiele, B.: Multi-view pictorial structures for 3D human pose estimation. In: BMVC (2013)
  2. Andriluka, M., Roth, S., Schiele, B.: Monocular 3D pose estimation and tracking by detection. In: CVPR (2010)
  3. Andriluka, M., Roth, S., Schiele, B.: Discriminative appearance models for pictorial structures. IJCV 99(3), 259–280 (2012)
  4. Belagiannis, V., Amin, S., Andriluka, M., Schiele, B., Navab, N., Ilic, S.: 3D pictorial structures for multiple human pose estimation. In: CVPR (2014)
  5. Burenius, M., Sullivan, J., Carlsson, S.: 3D pictorial structures for multiple view articulated pose estimation. In: CVPR (2013)
  6. Eichner, M., Ferrari, V.: Appearance sharing for collective human pose estimation. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 138–151. Springer, Heidelberg (2013)
  7. El Hayek, A., Stoll, C., Hasler, N., Kim, K.I., Seidel, H.P., Theobalt, C.: Spatio-temporal motion tracking with unsynchronized cameras. In: CVPR (2012)
  8. Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. IJCV 61(1), 55–79 (2005)
  9. Ferrari, V., Marin, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR (2008)
  10. Fischler, M., Elschlager, R.: The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)
  11. Jammalamadaka, N., Zisserman, A., Eichner, M., Ferrari, V., Jawahar, C.V.: Has my algorithm succeeded? An evaluator for human pose estimators. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 114–128. Springer, Heidelberg (2012)
  12. Kazemi, V., Burenius, M., Azizpour, H., Sullivan, J.: Multi-view body part recognition with random forests. In: BMVC (2013)
  13. Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., Bajcsy, R.: Berkeley MHAD: a comprehensive multimodal human action database. In: WACV (2013)
  14. Ramanan, D., Forsyth, D.A., Zisserman, A.: Strike a pose: tracking people by finding stylized poses. In: CVPR (2005)
  15. Rohrbach, M., Amin, S., Andriluka, M., Schiele, B.: A database for fine grained activity detection of cooking activities. In: CVPR (2012)
  16. Sapp, B., Weiss, D., Taskar, B.: Parsing human motion with stretchable models. In: CVPR (2011)
  17. Sigal, L., Balan, A., Black, M.J.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. IJCV 87(1–2), 4–27 (2010)
  18. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR (2011)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sikandar Amin (1, 2)
  • Philipp Müller (2)
  • Andreas Bulling (2)
  • Mykhaylo Andriluka (2, 3)

  1. Technische Universität München, Munich, Germany
  2. Max Planck Institute for Informatics, Saarbrücken, Germany
  3. Stanford University, Stanford, USA