Discriminative Feature Learning for Action Recognition Using a Stacked Denoising Autoencoder

  • Ruoxin Sang
  • Peiquan Jin
  • Shouhong Wan
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 297)


In this paper, we propose a novel method to recognize human actions based on the depth information acquired by depth-based cameras. Representations of depth maps are learned and reconstructed using a stacked denoising autoencoder. By adding the category constraint, the learned features are more discriminative and able to capture the small but significant differences between actions. Greedy layer-wise training strategy is used to train the deep neural network. Then we use temporal pyramid matching on the feature representation to generate temporal representation. Finally a linear SVM is trained to classify each sequence into actions. Our method is evaluated on MSR Action3D dataset and show superiority over other popular methods. Experimental results also indicate the great power of our model to restore highly noisy input data.


Action Recognition Feature Learning Stacked Denoising Autoencoders 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Yang, X., Zhang, C., Tian, Y.: Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 1057–1060. ACM (2012)Google Scholar
  2. 2.
    Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: Computer Vision and Pattern Recognition, CVPR (2012)Google Scholar
  3. 3.
    Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., Moore, R.: Real-time human pose recognition in parts from single depth images. Communications of the ACM 56(1), 116–124 (2013)CrossRefGoogle Scholar
  4. 4.
    Xia, L., Aggarwal, J.: Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera (2013)Google Scholar
  5. 5.
    Oreifej, O., Liu, Z., Redmond, W.: Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. In: Computer Vision and Pattern Recognition, CVPR (2013)Google Scholar
  6. 6.
    Luo, J., Wang, W., Qi, H.: Group sparsity and geometry constrained dictionary learning for action recognition from depth maps (2013)Google Scholar
  7. 7.
    Xia, L., Chen, C.C., Aggarwal, J.: View invariant human action recognition using histograms of 3d joints. In: Computer Vision and Pattern Recognition Workshops, CVPRW (2012)Google Scholar
  8. 8.
    Yang, X., Tian, Y.: Eigenjoints-based action recognition using naive-bayes-nearest-neighbor. In: Computer Vision and Pattern Recognition Workshops, CVPRW (2012)Google Scholar
  9. 9.
    Laptev, I.: On space-time interest points. International Journal of Computer Vision 64(2-3), 107–123 (2005)CrossRefGoogle Scholar
  10. 10.
    Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance 2005, pp. 65–72. IEEE (2005)Google Scholar
  11. 11.
    Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Computer Vision and Pattern Recognition, CVPR (2008)Google Scholar
  12. 12.
    Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3d points. In: Computer Vision and Pattern Recognition Workshops, CVPRW (2010)Google Scholar
  13. 13.
    Scholkopft, B., Mullert, K.R.: Fisher discriminant analysis with kernels. Neural Networks for Signal Processing IXGoogle Scholar
  14. 14.
    Rabiner, L.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)CrossRefGoogle Scholar
  15. 15.
    Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)CrossRefMATHMathSciNetGoogle Scholar
  16. 16.
    Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Computation 18(7), 1527–1554 (2006)CrossRefMATHMathSciNetGoogle Scholar
  17. 17.
    Bengio, Y.: Learning deep architectures for ai. Foundations and Trends® in Machine Learning 2(1), 1–127 (2009)CrossRefMATHMathSciNetGoogle Scholar
  18. 18.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)CrossRefGoogle Scholar
  19. 19.
    Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives (2013)Google Scholar
  20. 20.
    Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research 9999, 3371–3408 (2010)MathSciNetGoogle Scholar
  21. 21.
    Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: Computer Vision and Pattern Recognition, CVPR (2006)Google Scholar
  22. 22.
    Chang, C.C., Lin, C.J.: Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3), 27 (2011)Google Scholar
  23. 23.
    Martens, J., Sutskever, I.: Learning recurrent neural networks with hessian-free optimization. In: Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pp. 1033–1040 (2011)Google Scholar
  24. 24.
    Müller, M., Röder, T.: Motion templates for automatic classification and retrieval of motion capture data. In: Proceedings of the 2006 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 137–146. Eurographics Association (2006)Google Scholar
  25. 25.
    Wang, J., Liu, Z., Chorowski, J., Chen, Z., Wu, Y.: Robust 3d action recognition with random occupancy patterns. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 872–885. Springer, Heidelberg (2012)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. 1.School of Computer Science and Technology, Key Laboratory of Electromagnetic Space Information, Chinese Academy of SciencesUniversity of Science and Technology of ChinaHefeiChina

Personalised recommendations