A compact and recursive Riemannian motion descriptor for untrimmed activity recognition

  • Original Research Paper
  • Published in: Journal of Real-Time Image Processing

Abstract

A very low-dimensional frame-level motion descriptor is proposed herein, with the capability to represent incomplete dynamics and thus to allow online action prediction. At each frame, a set of local trajectory kinematic cues is spatially pooled into a covariance matrix. The resulting frame-level covariance matrices lie on a Riemannian manifold that describes motion patterns. A set of statistical measures is computed over this manifold to characterize the sequence dynamics, either globally or instantaneously from a motion history. Two variants of the Riemannian treatment are proposed: (1) tangent-space projections with respect to recursively updated statistics, and (2) a mapping of each covariance matrix onto a linear space, taking the identity matrix as the reference point. The proposed approach was evaluated on two tasks: (1) action classification on complete video sequences, and (2) online action recognition, in which the activity is predicted at each frame. The method was evaluated on two public datasets, KTH and UT-interaction. For action classification, it achieved average accuracies of 92.27% and 81.67% on KTH and UT-interaction, respectively. In the partial-recognition task, the proposed method matched the whole-sequence classification rate using only 40% of each KTH sequence and 70% of each UT-interaction sequence. The code of this work is available at [code].
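To make the descriptor's second variant concrete, below is a minimal Python sketch of frame-level covariance pooling followed by a mapping onto the linear (tangent) space at the identity via the matrix logarithm. The feature layout, helper names, and regularization constant are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import logm

def frame_covariance(features):
    """Spatially pool per-point kinematic cues from one frame into an
    SPD covariance matrix. `features` is an (n_points, d) array, e.g.
    velocity and acceleration components of local trajectories."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(features) - 1)
    # A small ridge keeps the matrix strictly positive definite,
    # as required for the matrix logarithm below.
    return cov + 1e-6 * np.eye(features.shape[1])

def log_euclidean_vector(cov):
    """Map an SPD matrix to the linear space at the identity using the
    matrix logarithm, then keep the upper triangle as a compact
    d*(d+1)/2-dimensional Euclidean descriptor."""
    log_cov = np.asarray(logm(cov))
    iu = np.triu_indices(cov.shape[0])
    return log_cov[iu].real

# Hypothetical usage: 50 trajectory points with 5 kinematic cues each,
# yielding a 5*(5+1)/2 = 15-dimensional per-frame descriptor.
rng = np.random.default_rng(0)
cues = rng.normal(size=(50, 5))
descriptor = log_euclidean_vector(frame_covariance(cues))
```

Per the abstract, a sequence is then characterized by statistics of these per-frame descriptors, accumulated either over the whole video or over a sliding motion history; the latter is what permits prediction from partially observed sequences.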

Acknowledgements

This research is partially funded by the RTRA Digiteo project MAPOCA. The authors also acknowledge the Vicerrectoría de Investigación y Extensión (VIE) of the Universidad Industrial de Santander for supporting this research, registered as the project "Cuantificación de patrones locomotores para el diagnóstico y seguimiento remoto en zonas de difícil acceso" (quantification of locomotor patterns for remote diagnosis and monitoring in hard-to-reach areas), SIVIE code 2697.

Author information

Corresponding author: Fabio Martínez Carrillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Martínez Carrillo, F., Gouiffès, M., Garzón Villamizar, G. et al. A compact and recursive Riemannian motion descriptor for untrimmed activity recognition. J Real-Time Image Proc 18, 1867–1880 (2021). https://doi.org/10.1007/s11554-020-01057-9
