2D Action Recognition Serves 3D Human Pose Estimation

  • Juergen Gall
  • Angela Yao
  • Luc Van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6313)


3D human pose estimation in multi-view settings benefits from embeddings of human actions in low-dimensional manifolds, but the complexity of the embeddings increases with the number of actions. Creating separate, action-specific manifolds seems to be a more practical solution. Using multiple manifolds for pose estimation, however, requires a joint optimization over the set of manifolds and the human pose embedded in the manifolds. To solve this problem, we propose a particle-based optimization algorithm that can efficiently estimate human pose even in challenging in-house scenarios. In addition, the algorithm can directly integrate the results of a 2D action recognition system as a prior distribution for the optimization. In our experiments, we demonstrate that the optimization handles an 84D search space and already provides competitive results on HumanEva with as few as 25 particles.
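The idea of a joint search over action-specific manifolds, seeded by an action prior, can be illustrated with a generic annealed particle scheme. The sketch below is a toy under stated assumptions and is not the authors' algorithm: each particle is a pair (action label, 1D latent coordinate), the prior over actions stands in for the output of a 2D action recognition system, and `toy_energy`, `toy_prior`, `optimize`, and all parameter values are hypothetical names and settings chosen only for illustration.

```python
import math
import random

# Hypothetical stand-in for an image-based cost: each action has its own
# 1D "manifold" with a different optimum; lower energy is better.
CENTERS = {"walk": 1.0, "jog": -2.0}
OFFSETS = {"walk": 0.0, "jog": 0.5}

def toy_energy(action, z):
    return (z - CENTERS[action]) ** 2 + OFFSETS[action]

# Hypothetical action prior, e.g. class scores from an action classifier.
toy_prior = {"walk": 0.7, "jog": 0.3}

def optimize(energy, prior, n_particles=25, n_iters=15,
             sigma=0.5, beta=1.0, seed=0):
    """Toy annealed particle optimization over (action, latent) pairs."""
    rng = random.Random(seed)
    actions = list(prior)
    # Initialise: draw action labels from the prior, latent coordinates uniformly.
    labels = rng.choices(actions, weights=[prior[a] for a in actions],
                         k=n_particles)
    particles = [(a, rng.uniform(-3.0, 3.0)) for a in labels]
    best = min(particles, key=lambda p: energy(*p))
    for _ in range(n_iters):
        # Weight particles by a Boltzmann factor of their energy.
        energies = [energy(a, z) for a, z in particles]
        e_min = min(energies)  # shift for numerical stability
        weights = [math.exp(-beta * (e - e_min)) for e in energies]
        # Resample: low-energy particles (and their manifolds) survive.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Diffuse within each particle's own manifold.
        particles = [(a, z + rng.gauss(0.0, sigma)) for a, z in particles]
        candidate = min(particles, key=lambda p: energy(*p))
        if energy(*candidate) < energy(*best):
            best = candidate
        # Anneal: sharpen the weighting and shrink the diffusion.
        beta *= 1.3
        sigma *= 0.8
    return best

action, z = optimize(toy_energy, toy_prior)
```

In this sketch the action label of a particle never changes directly; the choice among manifolds happens through resampling, so an action with zero prior probability is never explored. That is a design simplification of the toy, not a claim about the paper's method.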


Action Class, Action Recognition, Human Action Recognition, Visual Hull, Action Recognition System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supplementary material

978-3-642-15558-1_31_MOESM1_ESM.avi (13.9 MB)
Electronic Supplementary Material (14,210 KB)


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Juergen Gall (1)
  • Angela Yao (1)
  • Luc Van Gool (1, 2)
  1. Computer Vision Laboratory, ETH Zurich, Switzerland
  2. KU Leuven, Belgium
