Sparse coding-based space-time video representation for action recognition

Fu, Yinghua; Zhang, Tao; Wang, Wenjin

doi:10.1007/s11042-016-3630-9

Published: 25 June 2016

Volume 76, pages 12645–12658, (2017)

Multimedia Tools and Applications

Yinghua Fu¹,²,
Tao Zhang¹ &
Wenjin Wang²

Abstract

Methods based on feature descriptors computed around local interest points are now widely used in action recognition. Feature points are detected using a number of measures, such as saliency, periodicity, and motion activity. Each of these measures is usually intensity-based and offers a trade-off between density and informativeness. In this paper, we address the problem of action recognition by representing image sequences as a sparse collection of patch-level space-time events that are salient in both the space and time domains. Our method uses a multi-scale volumetric representation of video and adaptively selects the space-time scale at which the saliency of a patch is most significant. The input image sequences are first partitioned into non-overlapping patches. Each patch is then represented by a vector of coefficients that linearly reconstructs the patch from a learned dictionary of basis patches. The space-time saliency of each patch is measured by Shannon’s self-information entropy, where a patch’s saliency is determined by the variation of information in the contents of its spatiotemporal neighborhood. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
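The three steps named in the abstract (non-overlapping patches, sparse coefficients over a dictionary, self-information as saliency) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the patch size, the random unit-norm dictionary (standing in for a learned one), the ISTA solver, and the histogram density estimate taken over the whole clip rather than a local spatiotemporal neighborhood are all assumptions made for illustration, and the multi-scale selection is omitted.

```python
import numpy as np


def partition_patches(video, pt, ph, pw):
    """Split a (T, H, W) video into non-overlapping pt x ph x pw patches.

    Returns (patches, grid): patches has shape (n_patches, pt*ph*pw) and
    grid is the number of patches along (time, height, width). Borders
    that do not fill a whole patch are dropped.
    """
    T, H, W = video.shape
    nt, nh, nw = T // pt, H // ph, W // pw
    v = video[: nt * pt, : nh * ph, : nw * pw]
    v = v.reshape(nt, pt, nh, ph, nw, pw).transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(nt * nh * nw, pt * ph * pw), (nt, nh, nw)


def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Approximate argmin_A 0.5*||X - A D||_F^2 + lam*||A||_1 with ISTA.

    X: (n, d) patches as rows; D: (k, d) dictionary atoms as rows.
    ISTA is an assumed solver choice, not taken from the paper.
    """
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = A - ((A @ D - X) @ D.T) / L          # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A


def self_information(A, n_bins=16):
    """Per-patch saliency: sum over coefficients of -log p_k(a_k).

    Each p_k is a histogram estimate over all patches in the clip
    (a simplification of a local neighborhood estimate).
    """
    n, k = A.shape
    saliency = np.zeros(n)
    for j in range(k):
        counts, edges = np.histogram(A[:, j], bins=n_bins)
        # Map each value back to its histogram bin (last bin is closed).
        idx = np.digitize(A[:, j], edges[1:-1])
        saliency += -np.log(counts[idx] / n)
    return saliency


# Toy clip: faint noise everywhere plus one bright space-time event.
rng = np.random.default_rng(0)
video = 0.01 * rng.standard_normal((8, 16, 16))
video[4:6, 8:12, 4:8] += 1.0  # event fills exactly one 2x4x4 patch

patches, grid = partition_patches(video, pt=2, ph=4, pw=4)

# Random unit-norm atoms as a placeholder for a learned dictionary.
D = rng.standard_normal((32, patches.shape[1]))
D /= np.linalg.norm(D, axis=1, keepdims=True)

codes = sparse_codes(patches, D)
saliency = self_information(codes)
event_index = int(np.ravel_multi_index((2, 2, 1), grid))
print("most salient patch:", int(np.argmax(saliency)), "event patch:", event_index)
```

In this toy setup the background patches are too faint to pass the soft threshold, so their codes are all zero and only the event patch produces rare coefficient values. A real pipeline would learn the dictionary from training patches and estimate the coefficient densities locally.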

Acknowledgments

This research is partly supported by the National 973 Program of China (2013CB329401) and NSFC, China (Nos. 61273258, 61105001). The Shanghai Key Lab of Modern Optical System provided valuable help by supplying the experimental material.

Author information

Authors and Affiliations

Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Yinghua Fu & Tao Zhang
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
Yinghua Fu & Wenjin Wang

Corresponding author

Correspondence to Yinghua Fu.

About this article

Cite this article

Fu, Y., Zhang, T. & Wang, W. Sparse coding-based space-time video representation for action recognition. Multimed Tools Appl 76, 12645–12658 (2017). https://doi.org/10.1007/s11042-016-3630-9

Received: 03 November 2015
Revised: 21 March 2016
Accepted: 19 May 2016
Published: 25 June 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11042-016-3630-9
