Probabilistic Discriminative Dimensionality Reduction for Pose-Based Action Recognition

  • Valsamis Ntouskos
  • Panagiotis Papadakis
  • Fiora Pirri
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 318)


We examine the problem of classifying action sequences given only a small set of examples for each type of action. Based on the presumption that human motion resides in a low-dimensional space, we introduce a probabilistic dimensionality reduction model that recovers the structure of a low-dimensional manifold on which all the involved actions reside. By requiring that sequences of the same action lie apart from sequences of other actions, and by performing the classification on this manifold, we achieve higher classification rates than other commonly used techniques. The main contribution is a new model, based on the back-constrained GP-LVM, which can be used for the efficient classification of sequences. We compare our method with classification based on the Dynamic Time Warping distance and with the V-GPDS model, adapted for classification. Results are provided for sequences taken from two publicly available datasets, which highlight different aspects of the method.
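
To make the Dynamic Time Warping baseline mentioned above concrete, the following is a minimal sketch of nearest-neighbour classification under the DTW distance. It is not the authors' implementation: it uses the textbook DTW recurrence with a Euclidean frame-to-frame cost, and the function names (`dtw_distance`, `classify_1nn`), the toy "wave" and "ramp" sequences, and the one-dimensional pose representation are all assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of shape (T, D).

    Assumption: Euclidean distance between frames as the local cost.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Local cost between every frame of `a` and every frame of `b`.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] = minimal accumulated cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in `a` only
                acc[i, j - 1],      # advance in `b` only
                acc[i - 1, j - 1],  # advance in both
            )
    return acc[n, m]


def classify_1nn(query, train_seqs, train_labels):
    """Return the label of the training sequence closest to `query` under DTW."""
    dists = [dtw_distance(query, seq) for seq in train_seqs]
    return train_labels[int(np.argmin(dists))]


# Hypothetical toy data: one example per "action", each a (T, 1) sequence.
wave = np.sin(np.linspace(0.0, 2.0 * np.pi, 20))[:, None]
ramp = np.linspace(0.0, 1.0, 20)[:, None]
# The query is the same wave performed more slowly (30 frames instead of 20).
slow_wave = np.sin(np.linspace(0.0, 2.0 * np.pi, 30))[:, None]

predicted = classify_1nn(slow_wave, [wave, ramp], ["wave", "ramp"])
print(predicted)
```

Because DTW aligns frames non-linearly in time, the slower query can still be matched to its training example despite the different length. The quadratic cost per pair of sequences is one practical reason to consider classifying in a learned low-dimensional space instead.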


Action recognition, Dimensionality reduction, Manifold learning, Time series models, Motion capture



This paper describes research done under the EU-FP7 ICT 247870 NIFTi project.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Valsamis Ntouskos
    • 1
  • Panagiotis Papadakis
    • 1
  • Fiora Pirri
    • 1
  1. ALCOR Laboratory, Sapienza University of Rome, Rome, Italy
