Abstract
A very low-dimensional frame-level motion descriptor is proposed herein, with the capability to represent incomplete dynamics and thus allow online action prediction. At each frame, a set of local trajectory kinematic cues is spatially pooled using a covariance matrix. The set of frame-level covariance matrices forms a Riemannian manifold that describes motion patterns. A set of statistical measures is computed over this manifold to characterize the sequence dynamics, either globally or instantaneously from a motion history. Regarding the Riemannian metrics, two versions are proposed: (1) tangent projections with respect to recursively updated statistics, and (2) mapping each covariance matrix onto a linear (tangent) space using the identity matrix as reference. The proposed approach was evaluated on two tasks: (1) action classification on complete video sequences, and (2) online action recognition, in which the activity is predicted at each frame. The method was evaluated on two public datasets: KTH and UT-interaction. For action classification, it achieved average accuracies of 92.27% and 81.67% on KTH and UT-interaction, respectively. In the partial recognition task, the method reached a classification rate similar to that obtained on whole sequences using only the first 40% and 70% of the KTH and UT-interaction sequences, respectively. The code of this work is available at [code].
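The two core operations described above can be sketched briefly: pooling per-frame kinematic cues into a covariance matrix, and mapping that symmetric positive-definite (SPD) matrix onto a linear space via the matrix logarithm with the identity as reference (option 2 in the abstract). The following is a minimal illustration, not the authors' implementation; the feature dimension, ridge term, and random test data are assumptions.

```python
import numpy as np

def frame_covariance(features):
    """Spatially pool local kinematic cues of one frame into a covariance matrix.

    features: (n_points, d) array, one row per local trajectory cue.
    A small ridge keeps the matrix strictly positive definite.
    """
    return np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])

def log_map_identity(cov):
    """Map an SPD matrix to the tangent (linear) space at the identity.

    For SPD matrices this is the matrix logarithm, computed via
    eigendecomposition: log(C) = V diag(log(w)) V^T.
    """
    w, V = np.linalg.eigh(cov)
    return V @ np.diag(np.log(w)) @ V.T

# Toy example with 200 local cues of dimension 5 (hypothetical values).
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 5))
C = frame_covariance(feats)
L = log_map_identity(C)   # symmetric matrix; lives in a vector space
```

Because the log-mapped matrices lie in a vector space, frame-level descriptors can be averaged or compared with Euclidean tools, which is what makes recursive per-frame statistics tractable.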
Acknowledgements
This research was partially funded by the RTRA Digiteo project MAPOCA. The authors also thank the Vicerrectoría de Investigación y Extensión (VIE) of the Universidad Industrial de Santander for supporting the project "Cuantificación de patrones locomotores para el diagnóstico y seguimiento remoto en zonas de difícil acceso" (SIVIE code 2697).
Cite this article
Martı́nez Carrillo, F., Gouiffès, M., Garzón Villamizar, G. et al. A compact and recursive Riemannian motion descriptor for untrimmed activity recognition. J Real-Time Image Proc 18, 1867–1880 (2021). https://doi.org/10.1007/s11554-020-01057-9