Abstract
This paper addresses human action recognition in video sequences. A method based on optical flow estimation is presented, in which the critical points of the flow field are extracted. Multi-scale trajectories are generated from these points and characterized in the frequency domain. A sequence is then described by fusing this frequency information with motion orientation and shape information. The method has been tested on video datasets, achieving recognition rates among the highest in the state of the art. In contrast to recent dense sampling strategies, the proposed method requires only the critical points of the motion flow field, permitting a lower computational cost and a better sequence description. A cross-dataset generalization experiment illustrates the robustness of the method to recognition dataset biases. Results, comparisons and prospects on complex action recognition datasets are finally discussed.
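To make the first two stages of the pipeline concrete, the sketch below is a minimal toy illustration, not the authors' actual implementation: it locates the critical points of a synthetic flow field (here simply the pixels where the flow magnitude vanishes) and builds a frequency-domain descriptor of a point trajectory from the magnitudes of its low-order Fourier coefficients. The function names, the vanishing-magnitude criterion, and the threshold `eps` are all hypothetical choices for illustration.

```python
import numpy as np

def critical_points(u, v, eps=1e-3):
    """Return (row, col) indices where the flow magnitude vanishes.
    A simple stand-in for critical-point extraction on a flow field."""
    mag = np.sqrt(u**2 + v**2)
    return np.argwhere(mag < eps)

def frequency_descriptor(trajectory, n_coeffs=8):
    """Describe a 2D trajectory (T x 2 array) by the magnitudes of its
    first Fourier coefficients, one spectrum per coordinate axis."""
    spectrum = np.abs(np.fft.rfft(trajectory, axis=0))
    return spectrum[:n_coeffs].ravel()

# Toy flow field: a single vortex whose centre is a critical point.
h, w = 32, 32
y, x = np.mgrid[0:h, 0:w].astype(float)
u = -(y - h / 2)            # rotational flow around (16, 16)
v = x - w / 2
pts = critical_points(u, v)  # -> [[16, 16]]

# Toy trajectory: a point oscillating horizontally with period 16.
t = np.arange(64)
traj = np.stack([np.cos(2 * np.pi * t / 16),
                 np.zeros_like(t, dtype=float)], axis=1)
desc = frequency_descriptor(traj)  # energy concentrated at bin 4 of the x-axis
```

In this toy setup the vortex centre is recovered as the only critical point, and the oscillation shows up as a single dominant Fourier coefficient, which is the kind of frequency signature the descriptor is meant to capture.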
Cite this article
Beaudry, C., Péteri, R. & Mascarilla, L. An efficient and sparse approach for large scale human action recognition in videos. Machine Vision and Applications 27, 529–543 (2016). https://doi.org/10.1007/s00138-016-0760-z