Abstract
In this paper, we address the problem of recognizing human actions with low observational latency, which is vital for applications such as virtual reality and interactive entertainment. Our goal is to achieve competitive action recognition performance from very short video clips. The task is inherently challenging because a short clip carries very limited information. To this end, we first develop a feature extraction method that exploits both motion (local flow) and appearance (local shape) features, so that the lack of information is effectively mitigated. We then propose an action representation named the Part Movement Model (PMM), which explicitly captures the spatial-temporal structure of human actions and divides each action into discriminative part movements. As a result, actions are better represented and competitive performance is achieved even though only short clips are used. Finally, we experimentally verify the effectiveness of the proposed method on three benchmark datasets. The results show that short clips of 6–7 frames (0.2–0.3 seconds of video) are enough to achieve recognition performance comparable to that of high-latency baselines.
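To make the latency figures concrete, the following minimal sketch converts a clip length in frames into observational latency in seconds, and shows one simple way that a motion descriptor and an appearance descriptor could be combined. The 25 fps frame rate, the function names, and the fusion by normalized concatenation are all illustrative assumptions; the abstract does not state the frame rate or how the two feature types are combined.

```python
import math


def clip_latency_seconds(num_frames, fps):
    """Seconds of video that must be observed before a decision is made."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def l2_normalize(vec):
    """Scale a descriptor to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_descriptors(flow_desc, shape_desc):
    """Hypothetical fusion: normalize each descriptor separately, then concatenate.

    Normalizing first keeps either cue from dominating purely because of its scale.
    """
    return l2_normalize(flow_desc) + l2_normalize(shape_desc)


if __name__ == "__main__":
    assumed_fps = 25.0  # assumption: not stated in the abstract
    for n in (6, 7):
        print(n, "frames:", round(clip_latency_seconds(n, assumed_fps), 2), "s")
```

Under the assumed 25 fps, clips of 6 and 7 frames correspond to 0.24 s and 0.28 s, which falls inside the 0.2–0.3 second range quoted in the abstract.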
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China under Grants 61673362 and 61233003, and by the Fundamental Research Funds for the Central Universities.
Cite this article
Liu, Z., Wang, Z. Action recognition with low observational latency via part movement model. Multimed Tools Appl 76, 26675–26693 (2017). https://doi.org/10.1007/s11042-016-4193-5
DOI: https://doi.org/10.1007/s11042-016-4193-5