Continuous Motion Recognition in Depth Camera Based on Recurrent Neural Networks and Grid-based Average Depth

  • Tao Rong
  • Rui Yang
  • Ruoyu Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)


Inspired by the success of recurrent neural networks (RNNs) in other fields, we propose applying an RNN to recognize human motion from depth data. An RNN can model the depth sequence directly along the time axis and learn temporal information more naturally. To represent the skeleton and depth information in video, we use Orderlet features and the Grid-based Average Depth (GbAD) feature proposed in this paper. Finally, we evaluate our models on the MSR 3D Online Action Dataset in comparison with state-of-the-art methods. Experimental results show that the proposed models outperform the others.
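The abstract names GbAD and the RNN without defining them, so the following is only a rough sketch of what the name and description suggest, not the authors' implementation. It assumes that GbAD divides each depth frame into a regular grid and averages the valid depth values in each cell, and that the per-frame features are fed to a simple Elman-style recurrent layer that emits class probabilities for every frame. The function names, the grid size, the treatment of zero as missing depth, and the weight shapes are all hypothetical.

```python
import numpy as np


def grid_average_depth(frame, rows, cols):
    """Average the depth values inside each cell of a rows x cols grid.

    Assumption: zero-valued pixels mark missing depth and are ignored.
    """
    frame = np.asarray(frame, dtype=float)
    h, w = frame.shape
    # Cell boundaries; cell sizes differ by at most one pixel when
    # h or w is not divisible by the grid size.
    row_edges = np.linspace(0, h, rows + 1).astype(int)
    col_edges = np.linspace(0, w, cols + 1).astype(int)
    feature = np.zeros(rows * cols)
    for i in range(rows):
        for j in range(cols):
            cell = frame[row_edges[i]:row_edges[i + 1],
                         col_edges[j]:col_edges[j + 1]]
            valid = cell[cell > 0]
            if valid.size:
                feature[i * cols + j] = valid.mean()
    return feature


def elman_forward(features, w_in, w_rec, w_out, b_h, b_out):
    """Run a simple (Elman-style) recurrent layer over a feature sequence.

    Returns one row of class probabilities per frame, so a label can be
    read off at every time step of a continuous stream.
    """
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for x in features:
        # The hidden state carries information from earlier frames.
        hidden = np.tanh(w_in @ x + w_rec @ hidden + b_h)
        logits = w_out @ hidden + b_out
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        outputs.append(exp / exp.sum())
    return np.array(outputs)
```

With trained weights, calling `grid_average_depth` on each frame and passing the stacked features to `elman_forward` would give a per-frame label stream. The weights here would have to come from training, which is outside the scope of this sketch.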



This work was supported in part by the National Natural Science Foundation of China under Grant No. 61672273.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Department of Computer Science and Technology, Nanjing University, Nanjing, China
