Multimedia Tools and Applications, Volume 76, Issue 3, pp 4651–4669

Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features



This paper presents a new framework for human action recognition from depth sequences. An effective depth feature representation is developed by fusing 2D and 3D auto-correlation of gradients features. Specifically, depth motion maps (DMMs) are first employed to transform a depth sequence into three images that capture shape and motion cues. A feature extraction method based on spatial and orientational auto-correlations of local image gradients is then applied to the DMMs. Space-time auto-correlation of gradients features are also extracted directly from the depth sequences as complementary features, compensating for the temporal information lost during DMM generation. The two feature sets are fed to two extreme learning machine classifiers, one per feature set, which generate probability outputs. A weighted fusion strategy assigns different weights to the classifier probability outputs associated with the different features, giving more flexibility in the final decision. The proposed method is evaluated on two depth action datasets (MSR Action 3D and MSR Gesture 3D) and achieves state-of-the-art recognition performance (94.87% on MSR Action 3D and 98.50% on MSR Gesture 3D).
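The weighted fusion step described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not state the exact fusion rule, so a convex combination of the two probability vectors is assumed here, and the names `weighted_fusion`, `predict`, `weight_a`, and the example probability values are all hypothetical.

```python
def weighted_fusion(probs_a, probs_b, weight_a=0.5):
    """Fuse two per-class probability vectors.

    Assumption: fusion is a convex combination,
    fused = weight_a * probs_a + (1 - weight_a) * probs_b.
    The abstract only says the classifier outputs receive different weights.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")

    weight_b = 1.0 - weight_a
    fused = [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]

    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero; cannot normalize")
    # Renormalize so the fused scores again sum to one.
    return [p / total for p in fused]


def predict(fused_probs):
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused_probs)), key=fused_probs.__getitem__)


# Hypothetical probability outputs of the two classifiers for three classes:
# one trained on the DMM-based features, one on the space-time features.
p_dmm = [0.6, 0.3, 0.1]
p_spacetime = [0.2, 0.7, 0.1]

label_dmm_heavy = predict(weighted_fusion(p_dmm, p_spacetime, weight_a=0.7))
label_st_heavy = predict(weighted_fusion(p_dmm, p_spacetime, weight_a=0.3))
```

In this toy case the two classifiers disagree, so the predicted class depends on which classifier is given the larger weight. In practice the weight would be chosen on held-out data.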


Keywords: Action recognition; Depth data; Depth motion maps; Gradient local auto-correlations; Space-time auto-correlation of gradients; Extreme learning machine; Weighted fusion



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Electrical Engineering, University of Texas at Dallas, Richardson, USA
  2. School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
  3. School of Information Science & Engineering, Changzhou University, Changzhou, China
  4. School of Computer Science, China University of Geosciences, Wuhan, China
  5. Engineering Laboratory on Intelligent Perception for Internet of Things (ELIP), Peking University, Shenzhen Graduate School, Shenzhen, China
