
Action recognition with low observational latency via part movement model

Published in: Multimedia Tools and Applications

Abstract

In this paper, we address the problem of recognizing human actions with low observational latency, which is vital for applications such as virtual reality and interactive entertainment. Our goal is to achieve competitive action recognition performance from very short video clips. This task is inherently challenging because only very limited information is available. To this end, we first develop a feature extraction method that exploits both motion (local flow) and appearance (local shape) features, so that the insufficiency of information can be effectively mitigated. We then propose an action representation named the Part Movement Model (PMM), which explicitly captures the spatial-temporal structure of human actions and decomposes them into discriminative part movements. As a result, actions are better represented and competitive performance is achieved even though only short clips are used. Finally, we experimentally verify the effectiveness of the proposed method on three benchmark datasets. The results show that short clips of 6–7 frames (0.2–0.3 s of video) are sufficient to achieve recognition performance comparable to high-latency baselines.
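The pipeline the abstract describes (extract motion and appearance features from a short clip, then score candidate actions by the responses of their part movements) can be sketched as follows. This is an illustrative toy under invented assumptions, not the authors' PMM implementation: the feature extractors, part templates, and all numbers are hypothetical stand-ins for demonstration.

```python
# Toy sketch of short-clip action recognition, NOT the paper's method:
# (1) build a feature vector combining motion (frame differences, a stand-in
#     for local flow) and appearance (last frame, a stand-in for local shape),
# (2) score each action as the sum of its best-matching part-template
#     responses, mimicking how a part-based model accumulates evidence,
# (3) predict the action with the highest total score.
from typing import Dict, List, Sequence

def clip_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate motion features (frame-to-frame differences) with
    appearance features (values of the last frame)."""
    motion = [b - a
              for prev, cur in zip(frames, frames[1:])
              for a, b in zip(prev, cur)]
    appearance = list(frames[-1])
    return motion + appearance

def score_action(features: List[float],
                 part_templates: List[List[float]]) -> float:
    """Sum over parts of the best sliding-window dot product between the
    feature vector and each part template."""
    return sum(
        max(sum(f * w for f, w in zip(features[i:i + len(t)], t))
            for i in range(len(features) - len(t) + 1))
        for t in part_templates
    )

def recognize(frames: Sequence[Sequence[float]],
              models: Dict[str, List[List[float]]]) -> str:
    """Return the action label whose part templates score highest."""
    feats = clip_features(frames)
    return max(models, key=lambda a: score_action(feats, models[a]))
```

For example, with a "move" model whose template rewards large frame differences and a "hold" model that penalizes them, a three-frame translating clip is labeled "move" while a static clip is labeled "hold". The real PMM operates on far richer local-flow and local-shape descriptors and learned part structures.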


[Figs. 1–10: not available in this preview]



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61673362 and 61233003, and by the Fundamental Research Funds for the Central Universities.

Author information


Corresponding author

Correspondence to Zilei Wang.


About this article


Cite this article

Liu, Z., Wang, Z. Action recognition with low observational latency via part movement model. Multimed Tools Appl 76, 26675–26693 (2017). https://doi.org/10.1007/s11042-016-4193-5

