
Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features


This paper presents a new framework for human action recognition from depth sequences. An effective depth feature representation is developed based on the fusion of 2D and 3D auto-correlation of gradients features. Specifically, depth motion maps (DMMs) are first employed to transform a depth sequence into three images that capture shape and motion cues. A feature extraction method utilizing the spatial and orientational auto-correlations of local image gradients is then applied to the DMMs. Space-time auto-correlation of gradients features are also extracted directly from the depth sequences as complementary features, compensating for the temporal information lost during DMM generation. The two sets of features are fed into two extreme learning machine classifiers to generate probability outputs. A weighted fusion strategy assigns different weights to the classifier probability outputs associated with the different features, providing more flexibility in the final decision making. The proposed method is evaluated on two depth action datasets (MSR Action 3D and MSR Gesture 3D) and achieves state-of-the-art recognition performance (94.87 % on MSR Action 3D and 98.50 % on MSR Gesture 3D).
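The two pipeline steps named in the abstract — accumulating a depth motion map from a projected depth sequence, and fusing weighted classifier probability outputs — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the thresholded accumulation variant, and the toy inputs are assumptions for clarity.

```python
import numpy as np

def depth_motion_map(frames, threshold=10.0):
    """Illustrative DMM: accumulate the absolute differences of
    consecutive projected depth maps, keeping only differences that
    exceed a noise threshold (the thresholding variant is an assumption)."""
    dmm = np.zeros_like(frames[0], dtype=np.float64)
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        dmm += np.where(diff > threshold, diff, 0.0)  # suppress small changes
    return dmm

def weighted_fusion(prob_outputs, weights):
    """Combine per-classifier class-probability vectors with scalar
    weights and return the index of the winning class."""
    fused = sum(w * p for w, p in zip(weights, prob_outputs))
    return int(np.argmax(fused))

# Toy usage: two classifiers disagree; the equally weighted fusion
# favors the class with the larger combined evidence.
probs_a = np.array([0.6, 0.4])  # e.g. DMM-based classifier output
probs_b = np.array([0.2, 0.8])  # e.g. space-time feature classifier output
label = weighted_fusion([probs_a, probs_b], [0.5, 0.5])
```

In the paper's setting the weights would be chosen per feature channel rather than fixed at 0.5, which is what gives the fusion its flexibility in the final decision.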






This work was supported in part by the Natural Science Foundation of China under Grants 61272052 and 61473086, and by the Program for New Century Excellent Talents in University of the Ministry of Education of China. The authors thank PAPD and CICAEET for their support. Baochang Zhang is the corresponding author.

Author information



Corresponding authors

Correspondence to Baochang Zhang or Zhenjie Hou.


About this article


Cite this article

Chen, C., Zhang, B., Hou, Z. et al. Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features. Multimed Tools Appl 76, 4651–4669 (2017).

Keywords


  • Action recognition
  • Depth data
  • Depth motion maps
  • Gradient local auto-correlations
  • Space-time auto-correlation of gradients
  • Extreme learning machine
  • Weighted fusion