Abstract
We introduce a novel spatio-temporal feature learning approach for action recognition. First, we automatically detect and track the actor and map the action track to a cuboid. We then split the cuboid into block sequences and represent each block sequence as a data vector by concatenating the shape features of its blocks. For each action category, a two-layer network learns the distribution of the data vectors. The first layer consists of multiple Restricted Boltzmann Machines (RBMs), each trained on the data vectors that share the same spatial location. The output of the second-layer RBM is the learned spatio-temporal feature. Finally, we train a Support Vector Machine classifier for each class to recognize the actions. Experiments on challenging datasets confirm the effectiveness of our approach.
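The two-layer architecture described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it trains binary RBMs with one-step contrastive divergence (CD-1), one first-layer RBM per spatial block location, then feeds the concatenated hidden activations into a second-layer RBM whose hidden probabilities serve as the learned feature. All layer sizes, learning rates, and the random toy data are placeholder assumptions.

```python
import numpy as np

class RBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr
        self.rng = rng

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 approximation to the log-likelihood gradient.
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)   # reconstruction error

# Toy stand-in for block-sequence data vectors: 4 spatial locations,
# 64 samples each, 32-dimensional binary vectors (placeholder sizes).
rng = np.random.default_rng(1)
blocks = [rng.integers(0, 2, size=(64, 32)).astype(float) for _ in range(4)]

# First layer: one RBM per spatial location.
layer1 = [RBM(32, 16, seed=i) for i in range(4)]
for rbm, data in zip(layer1, blocks):
    for _ in range(20):
        rbm.cd1_step(data)

# Second layer: an RBM over the concatenated first-layer activations.
concat = np.hstack([rbm.hidden_probs(d) for rbm, d in zip(layer1, blocks)])
layer2 = RBM(concat.shape[1], 24, seed=9)
for _ in range(20):
    layer2.cd1_step(concat)

feature = layer2.hidden_probs(concat)   # learned spatio-temporal features
```

In the paper's pipeline, these per-category features would then be passed to a per-class Support Vector Machine; here the feature matrix is simply the second-layer hidden probabilities, one row per sample.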
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61375038).
Cite this article
Pei, L., Ye, M., Zhao, X. et al. Learning spatio-temporal features for action recognition from the side of the video. SIViP 10, 199–206 (2016). https://doi.org/10.1007/s11760-014-0726-4