
Sparse coding-based space-time video representation for action recognition

Published in Multimedia Tools and Applications

Abstract

Methods based on feature descriptors around local interest points are now widely used in action recognition. Feature points are detected using a number of measures, such as saliency, periodicity, and motion activity. Each of these measures is usually intensity-based and provides a trade-off between density and informativeness. In this paper, we address the problem of action recognition by representing image sequences as a sparse collection of patch-level space-time events that are salient in both the space and time domains. Our method uses a multi-scale volumetric representation of video and adaptively selects the space-time scale at which the saliency of a patch is most significant. The input image sequences are first partitioned into non-overlapping patches. Each patch is then represented by a vector of coefficients that linearly reconstruct it from a learned dictionary of basis patches. The space-time saliency of a patch is measured by Shannon self-information, so that a patch's saliency is determined by the variation of information in the contents of its spatiotemporal neighborhood. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
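To make the pipeline described in the abstract concrete, the sketch below illustrates its two patch-level steps: sparse-coding each vectorized space-time patch over a learned dictionary of basis patches, and scoring each patch by Shannon self-information, s(p) = -log P(codes of p | its spatiotemporal neighborhood). This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's dictionary-learning utilities, an independent-Gaussian model of the neighborhood coefficient statistics, and hypothetical function names and parameters (n_atoms, alpha, neighbor_idx).

```python
# Minimal sketch (not the authors' code): sparse coding of space-time patches
# and a self-information saliency score estimated from neighborhood statistics.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode


def learn_patch_dictionary(patches, n_atoms=256, alpha=1.0):
    """Learn a dictionary of basis patches.

    patches : (n_patches, patch_dim) array of vectorized space-time patches.
    Returns the dictionary as an (n_atoms, patch_dim) array.
    """
    dico = DictionaryLearning(n_components=n_atoms, alpha=alpha, max_iter=200)
    dico.fit(patches)
    return dico.components_


def encode_patches(patches, dictionary, alpha=1.0):
    """Represent each patch by sparse coefficients that linearly
    reconstruct it from the dictionary (L1-regularized coding)."""
    return sparse_encode(patches, dictionary, algorithm="lasso_lars", alpha=alpha)


def self_information_saliency(codes, neighbor_idx, eps=1e-8):
    """Score each patch by Shannon self-information, -log p(codes | neighborhood).

    codes        : (n_patches, n_atoms) sparse coefficient matrix.
    neighbor_idx : neighbor_idx[i] lists the patches in patch i's
                   spatiotemporal neighborhood.
    The neighborhood distribution is modeled here, purely for illustration,
    as independent Gaussians over the dictionary coefficients.
    """
    saliency = np.empty(len(codes))
    for i, nbrs in enumerate(neighbor_idx):
        mu = codes[nbrs].mean(axis=0)
        var = codes[nbrs].var(axis=0) + eps
        log_p = -0.5 * (((codes[i] - mu) ** 2) / var + np.log(2.0 * np.pi * var))
        saliency[i] = -log_p.sum()  # low-probability patches are more salient
    return saliency
```

In a full pipeline, neighbor_idx would enumerate, for each patch, the patches in its space-time neighborhood at the scale selected by the multi-scale saliency criterion, and the highest-scoring patches would form the sparse set of space-time events used for recognition.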





Acknowledgments

This research was partly supported by the National 973 Program of China (2013CB329401) and the NSFC, China (Nos. 61273258 and 61105001). The Shanghai Key Lab of Modern Optical System provided valuable help with the experimental materials.

Author information

Corresponding author

Correspondence to Yinghua Fu.

About this article

Cite this article

Fu, Y., Zhang, T. & Wang, W. Sparse coding-based space-time video representation for action recognition. Multimed Tools Appl 76, 12645–12658 (2017). https://doi.org/10.1007/s11042-016-3630-9
