
3D Skeletal Gesture Recognition via Sparse Coding of Time-Warping Invariant Riemannian Trajectories

  • Xin Liu
  • Guoying Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11295)

Abstract

3D skeleton-based human representation for gesture recognition has attracted increasing attention owing to its invariance to camera viewpoint and environment dynamics. Existing methods typically use absolute joint coordinates to represent human motion features. However, gestures are independent of the performer’s location, and the features should be invariant to the performer’s body size. Moreover, temporal dynamics can significantly distort the distance metric when comparing and identifying gestures. In this paper, we represent each skeleton as a point in the product space of the special orthogonal group SO(3), which explicitly models the 3D geometric relationships between body parts. A gesture skeletal sequence can then be characterized as a trajectory on a Riemannian manifold. Next, we generalize the transported square-root vector field to obtain a re-parametrization-invariant metric on the product space of SO(3), so that trajectories can be compared in a time-warping-invariant manner. Furthermore, we present a sparse coding of skeletal trajectories that explicitly associates labeling information with each dictionary atom to enforce the discriminative power of the dictionary. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three challenging gesture recognition benchmarks.
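To make the representation concrete, the sketch below illustrates the first step described in the abstract: encoding a skeleton frame as relative rotations between body parts, i.e. a point in a product of SO(3), and comparing two such points with the product geodesic distance. This is a minimal illustration, not the authors' implementation; the bone list, the all-pairs scheme, and the frame-by-frame distance are simplifying assumptions, and the paper's re-parametrization-invariant (TSRVF-based) trajectory metric is not reproduced here.

```python
# Minimal sketch (assumed interface, not the authors' implementation):
# one skeleton frame -> relative rotations between body parts, i.e. a point
# in SO(3) x ... x SO(3), plus a product geodesic distance between frames.
import numpy as np


def rotation_between(u, v):
    """Rotation in SO(3) mapping unit vector u onto unit vector v (Rodrigues)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    w = np.cross(u, v)                      # rotation axis (unnormalized)
    c, s = float(np.dot(u, v)), float(np.linalg.norm(w))
    if s < 1e-8:
        if c > 0.0:
            return np.eye(3)                # parallel bones: identity rotation
        # antiparallel bones: rotate by pi about any axis orthogonal to u
        a = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        n = np.cross(u, a)
        n /= np.linalg.norm(n)
        K = np.array([[0, -n[2], n[1]], [n[2], 0, -n[0]], [-n[1], n[0], 0]])
        return np.eye(3) + 2.0 * (K @ K)    # Rodrigues' formula with theta = pi
    K = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]]) / s
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)


def skeleton_to_so3(joints, bones):
    """Encode one frame (joints: N x 3 array, bones: list of joint-index pairs)
    as the relative rotations between every ordered pair of bones."""
    vecs = [joints[j] - joints[i] for i, j in bones]
    return [rotation_between(vecs[a], vecs[b])
            for a in range(len(vecs)) for b in range(len(vecs)) if a != b]


def frame_distance(Rs, Qs):
    """Product geodesic distance on SO(3)^m: sum of per-part rotation angles."""
    total = 0.0
    for R, Q in zip(Rs, Qs):
        cos_theta = np.clip((np.trace(R.T @ Q) - 1.0) / 2.0, -1.0, 1.0)
        total += np.arccos(cos_theta)
    return total
```

A full pipeline, as the abstract describes, would stack these per-frame points into a trajectory on the Riemannian manifold, compare trajectories with the re-parametrization-invariant metric rather than frame by frame, and learn a label-consistent sparse dictionary over the resulting representation.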

Keywords

Gesture recognition · Manifold · Sparse coding

Notes

Acknowledgments

This work is supported by the Academy of Finland, the Tekes Fidipro Program, Infotech, Tekniikan Edistämissäätiö, and the Nokia Foundation.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
