Learning Actionlet Ensemble for 3D Human Action Recognition

  • Chapter
Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variation, and complicated temporal structures. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only enables rather powerful human motion capture, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this chapter, we propose to characterize human actions with a novel actionlet ensemble model, where each actionlet represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with Kinect devices, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach achieves performance superior to state-of-the-art algorithms.
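The abstract's core idea, an ensemble whose base units are "actionlets" defined over subsets of skeleton joints, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the names (`Actionlet`, `actionlet_score`, `ensemble_score`), the min-pooling used to stand in for the conjunctive response of an actionlet's joints, and the fixed weights are all assumptions for illustration.

```python
# Hedged sketch of an actionlet ensemble: an actionlet is a subset of
# skeleton joints, and the ensemble score is a weighted sum of
# per-actionlet responses. All names and the min-pooling conjunction
# are illustrative assumptions, not the authors' actual method.
from dataclasses import dataclass


@dataclass(frozen=True)
class Actionlet:
    joints: frozenset  # indices of the participating joints


def actionlet_score(actionlet, joint_scores):
    """Conjunctive response: the actionlet fires only as strongly as its
    weakest participating joint (min-pooling stand-in)."""
    return min(joint_scores[j] for j in actionlet.joints)


def ensemble_score(actionlets, weights, joint_scores):
    """Weighted combination of actionlet responses."""
    return sum(w * actionlet_score(a, joint_scores)
               for a, w in zip(actionlets, weights))


# Toy example: per-joint confidence scores for 5 joints, two actionlets.
joint_scores = [0.9, 0.2, 0.8, 0.7, 0.1]
actionlets = [Actionlet(frozenset({0, 2})), Actionlet(frozenset({2, 3}))]
weights = [0.6, 0.4]
print(ensemble_score(actionlets, weights, joint_scores))
# 0.6 * min(0.9, 0.8) + 0.4 * min(0.8, 0.7) ≈ 0.76
```

Because each actionlet depends only on its own joint subset, noisy or occluded joints outside that subset do not affect its response, which is one way to read the abstract's robustness claim.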


Notes

  1. This dataset will be released to the public.


Author information

Corresponding author

Correspondence to Jiang Wang.

Copyright information

© 2014 The Author(s)

About this chapter

Cite this chapter

Wang, J., Liu, Z., Wu, Y. (2014). Learning Actionlet Ensemble for 3D Human Action Recognition. In: Human Action Recognition with Depth Cameras. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-04561-0_2

  • DOI: https://doi.org/10.1007/978-3-319-04561-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04560-3

  • Online ISBN: 978-3-319-04561-0

  • eBook Packages: Computer Science (R0)
