Multimedia Tools and Applications, Volume 78, Issue 5, pp 6329–6353

Effective human action recognition using global and local offsets of skeleton joints

  • Bin Sun
  • Dehui Kong (corresponding author)
  • Shaofan Wang
  • Lichun Wang
  • Yuping Wang
  • Baocai Yin


Human action recognition based on 3D skeleton joints is an important yet challenging task. While much research has been devoted to 3D action recognition, existing methods mainly suffer from two problems: complex model representation and low implementation efficiency. To tackle these problems, we propose an effective and efficient framework for 3D action recognition using a global-and-local histogram representation model. Our method consists of a global-and-local featuring phase, a saturation-based histogram representation phase, and a classification phase. The global-and-local featuring phase captures the global and local features of each action sequence using the joint displacements between the current frame and the first frame, and between pairwise fixed-skip frames, respectively. The saturation-based histogram representation phase computes the histogram representation of each joint, accounting for the motion independence of joints and the saturation of each histogram bin. The classification phase measures the histogram-to-class distance of each joint. In addition, we introduce a novel action dataset, the BJUT Kinect dataset, which consists of multi-period motion clips with intra-class variations. We compare our method with many state-of-the-art methods on the BJUT Kinect, UCF Kinect, Florence 3D action, MSR-Action3D, and NTU RGB+D datasets. The results show that our method achieves both higher accuracy and higher efficiency for 3D action recognition.
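As a rough illustration of the pipeline described above, the sketch below computes global offsets (each frame relative to the first), local offsets (between fixed-skip frame pairs), builds a per-joint saturated histogram, and scores classes with a per-joint nearest-neighbour distance in the spirit of Naive-Bayes-Nearest-Neighbor. The function names, the magnitude-only binning, and the saturation cap value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def offset_features(seq, skip=5):
    """Global and local joint offsets for a skeleton sequence.

    seq: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Global offsets: displacement of each joint w.r.t. the first frame.
    Local offsets: displacement between frames t and t + skip.
    """
    global_off = seq - seq[0]             # shape (T, J, 3)
    local_off = seq[skip:] - seq[:-skip]  # shape (T - skip, J, 3)
    return global_off, local_off

def saturated_histogram(offsets, bins=8, cap=0.25):
    """Per-joint histogram of offset magnitudes, with every bin
    saturated (clipped) at fraction `cap` of the total mass so that
    no single bin dominates the representation."""
    T, J, _ = offsets.shape
    hists = []
    for j in range(J):
        mag = np.linalg.norm(offsets[:, j], axis=1)
        h, _ = np.histogram(mag, bins=bins, range=(0.0, mag.max() + 1e-8))
        h = h.astype(float) / max(h.sum(), 1)  # normalise to a distribution
        h = np.minimum(h, cap)                 # saturation step
        hists.append(h)
    return np.stack(hists)                     # shape (J, bins)

def nbnn_classify(query_hists, class_hists):
    """Assign the class whose training histograms are nearest, summing
    per-joint nearest-neighbour distances (NBNN-style, treating joints
    as independent). class_hists: dict label -> list of (J, bins) arrays."""
    best_label, best_dist = None, np.inf
    for label, refs in class_hists.items():
        refs = np.stack(refs)  # (N, J, bins)
        # distance of each joint to its nearest sample of this class
        d = np.linalg.norm(refs - query_hists[None], axis=2).min(axis=0).sum()
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

A query sequence would be featured once, histogrammed per joint, and compared joint-by-joint against each class's stored histograms; the summed per-joint distances replace any explicit temporal model, which is one plausible reading of the efficiency claim above.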


Keywords: Action recognition · Skeleton joints · Offsets · Histogram representation · Naive-Bayes-Nearest-Neighbor



This work was supported by the National Natural Science Foundation of China (No. 61772048, 61632006), the Beijing Natural Science Foundation (No. 4162009), and the Beijing Municipal Science and Technology Project (No. Z161100001116072, Z171100004417023).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Bin Sun (1)
  • Dehui Kong (1, corresponding author)
  • Shaofan Wang (1)
  • Lichun Wang (1)
  • Yuping Wang (1)
  • Baocai Yin (2)

  1. Beijing Advanced Innovation Center for Future Internet Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, BJUT Faculty of Information Technology, Beijing University of Technology, Beijing, China
  2. College of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
