Machine Vision and Applications

, Volume 28, Issue 1–2, pp 185–200 | Cite as

Online human moves recognition through discriminative key poses and speed-aware action graphs

  • Thales VieiraEmail author
  • Romain Faugeroux
  • Dimas Martínez
  • Thomas Lewiner
Original Paper


Recognizing user-defined moves serves a large number of applications including sport monitoring, virtual reality or natural user interfaces (NUI). However, many of the efficient human move recognition methods are still limited to specific situations, such as straightforward NUI gestures or everyday human actions. In particular, most methods depend on a prior segmentation of recordings to both train and recognize moves. This segmentation step is generally performed manually or based on heuristics such as neutral poses or short pauses, limiting the range of applications. Besides, speed is generally not considered as a criterion to distinguish moves. We present an approach composed of a simplified move training phase that requires minimal user intervention, together with a novel online method to robustly recognize moves online from unsegmented data without requiring any transitional pauses or neutral poses, and additionally considering human move speed. Trained gestures are automatically segmented in real time by a curvature-based method that detects small pauses during a training session. A set of most discriminant key poses between different moves is also extracted in real time, optimizing the number of key poses. All together, this semi-supervised learning approach only requires continuous move performances from the user with small pauses. Key pose transitions and moves execution speeds are used as input to a novel human move recognition algorithm that recognizes unsegmented moves online, achieving high robustness and very low latency in our experiments, while also effective in distinguishing moves that differ only in speed.


Gesture recognition Action recognition Machine learning Pose identification Depth sensors Natural user interface 



The authors would like to thank CNPq, FAPEAL, FAPERJ and École Polytechnique for partially financing this research.

Supplementary material

Supplementary material 1 (mov 23113 KB)


  1. 1.
    Althloothi, S., Mahoor, M.H., Zhang, X., Voyles, R.M.: Human activity recognition using multi-features and multiple kernel learning. Pattern Recognit. 47(5), 1800–1812 (2014)CrossRefGoogle Scholar
  2. 2.
    Bloom, V., Makris, D., Argyriou, V.: Clustered spatio-temporal manifolds for online action recognition. In: Pattern Recognition (ICPR), IEEE 2014 22nd International Conference on, pp. 3963–3968 (2014)Google Scholar
  3. 3.
    Bobick, A., Davis, J.: The recognition of human movement using temporal templates. TPAMI 23 (2001)Google Scholar
  4. 4.
    Cao, L., Liu, Z., Huang, T.: Cross-dataset action detection. In: CVPR, pp. 1998–2005 (2010)Google Scholar
  5. 5.
    Cardinaux, F., Bhowmik, D., Abhayaratne, C., Hawley, M.S.: Video based technology for ambient assisted living: A review of the literature. J. Ambient Intell. Smart Environ. 3(3), 253–269 (2011)Google Scholar
  6. 6.
    Chaaraoui, A.A., Flórez-Revuelta, F.: Continuous Human Action Recognition in Ambient Assisted Living Scenarios. Springer International Publishing, Cham (2015)CrossRefGoogle Scholar
  7. 7.
    Chen, D.Y., Shih, S.W., Liao, H.Y.M.: Human action recognition using 2-d spatio-temporal templates. In: 2007 IEEE International Conference on Multimedia and Expo, pp. 667–670. IEEE (2007)Google Scholar
  8. 8.
    Chen, W., Guo, G.: Triviews: A general framework to use 3d depth data effectively for action recognition. J. Vis. Commun. Image Represent. 26, 182–191 (2015). doi: 10.1016/j.jvcir.2014.11.008 MathSciNetCrossRefGoogle Scholar
  9. 9.
    Desmond, M., Collet, P., Marsh, P., O’Shaughnessy, M.: Gestures: Their origins and distribution. Cape (1979)Google Scholar
  10. 10.
    Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Del Bimbo, A.: Space-time pose representation for 3d human action recognition. In: ICIAP, pp. 456–464 (2013)Google Scholar
  11. 11.
    Do Carmo, M.: Differential geometry of curves and surfaces. Pearson, London (1976)zbMATHGoogle Scholar
  12. 12.
    Ellis, C., Masood, S.Z., Tappen, M.F., La Viola Joseph, J.J., Sukthankar, R.: Exploring the trade-off between accuracy and observational latency in action recognition. J. Vis. Commun. Image Represent. 101(3), 420–436 (2013)Google Scholar
  13. 13.
    Faugeroux, R., Vieira, T., Martinez, D., Lewiner, T.: Simplified training for gesture recognition. In: Sibgrapi, pp. 133–140 (2014)Google Scholar
  14. 14.
    Forbes, K., Fiu, E.: An efficient search algorithm for motion data using weighted PCA. In: SCA, pp. 67–76 (2005)Google Scholar
  15. 15.
    Fothergill, S., Mentis, H., Kohli, P., Nowozin, S.: Instructing people for training gestural interactive systems. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’12, pp. 1737–1746. ACM, New York, NY, USA (2012)Google Scholar
  16. 16.
    Gong, D., Medioni, G., Zhu, S., Zhao, X.: Kernelized temporal cut for online temporal segmentation and recognition. In: Proceedings of the 12th European Conference on Computer Vision - Volume Part III. ECCV’12, pp. 229–243. Springer, Berlin, Heidelberg (2012)Google Scholar
  17. 17.
    Kovashka, A., Grauman, K.: Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In: CVPR, pp. 2046–2053 (2010)Google Scholar
  18. 18.
    Lan, R., Sun, H.: Automated human motion segmentation via motion regularities. J. Vis. Commun. Image Represent. 31, 35–53 (2015)Google Scholar
  19. 19.
    Laptev, I., Lindeberg, T.: Space-time interest points. In: ICCV 1, 432–439 (2003)Google Scholar
  20. 20.
    LaViola, J.: 3d gestural interaction: The state of the field. ISRN Artificial Intelligence p. 514641 (2013)Google Scholar
  21. 21.
    Lewiner, T., Gomes, J., Lopes, H., Craizer, M.: Curvature and torsion estimators based on parametric curve fitting. Comput. Gr. 29(5), 641–655 (2005)CrossRefGoogle Scholar
  22. 22.
    Li, W., Zhang, Z., Liu, Z.: Expandable data-driven graphical modeling of human actions based on salient postures. Circuits Syst. Video Technol. 18(11), 1499–1510 (2008)CrossRefGoogle Scholar
  23. 23.
    Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3D points. In: CVPR W. Human Communicative Behavior Analysis, pp. 9–14 (2010)Google Scholar
  24. 24.
    Liu, Z.: MSR action recognition datasets and codes. (2014)
  25. 25.
    Lo Presti, L., La Cascia, M.: 3d skeleton-based human action classification. Comput. Gr. 53, 130–147 (2016)Google Scholar
  26. 26.
    Lv, F., Nevatia, R.: Single view human action recognition using key pose matching and Viterbi path searching. In: CVPR, pp. 1–8 (2007)Google Scholar
  27. 27.
    Miranda, L., Vieira, T., Martinez, D., Lewiner, T., Vieira, A.W., Campos, M.F.M.: Keypose and gesture database. (2014)
  28. 28.
    Miranda, L., Vieira, T., Martinez, D., Lewiner, T., Vieira, A.W., Campos, M.F.M.: Online gesture recognition from pose kernel learning and decision forests. Comput. Gr. 39, 65–73 (2014)Google Scholar
  29. 29.
    Müller, M., Baak, A., Seidel, H.P.: Efficient and robust annotation of motion capture data. In: SCA, pp. 17–26 (2009)Google Scholar
  30. 30.
    Müller, M., Röder, T.: Motion templates for automatic classification and retrieval of motion capture data. In: SCA, pp. 137–146 (2006)Google Scholar
  31. 31.
    Niebles, J.C., Chen, C.W., Fei-Fei, L.: Modeling temporal structure of decomposable motion segments for activity classification. In: ECCV, pp. 392–405 (2010)Google Scholar
  32. 32.
    Nowozin, S., Shotton, J.: Action points: A representation for low-latency online human action recognition. Tech. Rep. MSR-TR-2012-68, Microsoft Research Cambridge (2012)Google Scholar
  33. 33.
    Padilla-López, J.R., Chaaraoui, A.A., Flórez-Revuelta, F.: A discussion on the validation tests employed to compare human action recognition methods using the MSR action3d dataset. CoRR (2015)Google Scholar
  34. 34.
    Poppe, R.: A survey on vision-based human action recognition. Comput. Gr. 28(6), 976–990 (2010)Google Scholar
  35. 35.
    Raptis, M., Kirovski, D., Hoppe, H.: Real-time classification of dance gestures from skeleton animation. In: SCA, pp. 147–156 (2011)Google Scholar
  36. 36.
    Shotton, J., Fitzgibbon, A.W., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR, pp. 1297–1304 (2011)Google Scholar
  37. 37.
    Slama, R., Wannous, H., Daoudi, M.: Grassmannian representation of motion depth for 3d human gesture and action recognition. In: ICPR, pp. 3499–3504 (2014)Google Scholar
  38. 38.
    Slama, R., Wannous, H., Daoudi, M., Srivastava, A.: Accurate 3d action recognition using learning on the Grassmann manifold. Comput. Gr. 48(2), 556–567 (2015)Google Scholar
  39. 39.
    Sun, J., Wu, X., Yan, S., Cheong, L., Chua, T., Li, J.: Hierarchical spatio-temporal context modeling for action recognition. In: CVPR, pp. 2004–2011 (2009)Google Scholar
  40. 40.
    Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3d skeletons as points in a lie group. In: CVPR, pp. 588–595 (2014)Google Scholar
  41. 41.
    Vieira, A.W., Lewiner, T., Schwartz, W., Campos, M.F.M.: Distance matrices as invariant features for classifying mocap data. In: ICPR, pp. 2934–2937 (2012)Google Scholar
  42. 42.
    Vieira, A.W., Nascimento, E.R., Oliveira, G.L., Liu, Z., Campos, M.F.: On the improvement of human action recognition from depth map sequences using Space-Time Occupancy Patterns. Comput. Gr. 36(15), 221–227 (2014)Google Scholar
  43. 43.
    Vieira, A.W., Nascimento, E.R., Oliveira, G.L., Liu, Z., Campos, M.M.: STOP: Space-time occupancy patterns for 3d action recognition from depth map sequences. In: CIARP, pp. 252–259 (2012)Google Scholar
  44. 44.
    Vieira, T., Faugeroux, R., Martínez, D., Lewiner, T.: Speed-aware gesture database. (2016)
  45. 45.
    Vitaladevuni, S.N., Kellokumpu, V., Davis, L.S.: Action recognition using ballistic dynamics. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)Google Scholar
  46. 46.
    Wang, J., Liu, Z., Chorowski, J., Chen, Z., Wu, Y.: Robust 3d action recognition with random occupancy patterns. In: ECCV, pp. 872–885 (2012)Google Scholar
  47. 47.
    Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: CVPR, pp. 1290–1297 (2012)Google Scholar
  48. 48.
    Weinland, D., Boyer, E.: Action recognition using exemplar-based embedding. In: CVPR, pp. 1–7 (2005)Google Scholar
  49. 49.
    Xia, L., Chen, C.C., Aggarwal, J.: View invariant human action recognition using histograms of 3d joints. In: CVPR W. on Human Activity Understanding from 3D Data, pp. 20–27 (2012)Google Scholar
  50. 50.
    Yang, X., Tian, Y.: Eigenjoints-based action recognition using naïve-Bayes-nearest-neighbor. In: CVPR W. On Human Activity Understanding from 3D Data, pp. 14–19 (2012)Google Scholar
  51. 51.
    Yang, X., Zhang, C., Tian, Y.: Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Multimedia, pp. 1057–1060 (2012)Google Scholar
  52. 52.
    Ye, J., Li, K., Qi, G.J., Hua, K.A.: Temporal order-preserving dynamic quantization for human action recognition from multimodal sensor streams. In: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ICMR ’15, pp. 99–106 (2015)Google Scholar
  53. 53.
    Yu, G., Liu, Z., Yuan, J.: Discriminative orderlet mining for real-time recognition of human-object interaction. In: ACCV, Lecture Notes in Computer Science, pp. 50–65 (2014)Google Scholar
  54. 54.
    Zhao, X., Li, X., Pang, C., Zhu, X., Sheng, Q.Z.: Online human gesture recognition from motion data streams. In: Proceedings of the 21st ACM International Conference on Multimedia. MM ’13, pp. 23–32. ACM, New York, NY, USA (2013)Google Scholar
  55. 55.
    Zhou, F., De la Torre, F., Hodgins, J.K.: Hierarchical aligned cluster analysis for temporal clustering of human motion. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 582–596 (2013)CrossRefGoogle Scholar
  56. 56.
    Zhu, Y., Chen, W., Guo, G.: Evaluating spatiotemporal interest point features for depth-based action recognition. Image Vis. Comput. 32(8), 453–464 (2014)CrossRefGoogle Scholar
  57. 57.
    Zhu, Y., Chen, W., Guo, G.: Fusing multiple features for depth-based action recognition. Image Vis. Comput. 6(2), 18:1–18:20 (2015)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. 1.Institute of MathematicsFederal University of AlagoasMaceióBrazil
  2. 2.École PolytechniqueParisFrance
  3. 3.Department of MathematicsFederal University of AmazonasManausBrazil

Personalised recommendations