
An efficient and sparse approach for large scale human action recognition in videos

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

This paper focuses on human action recognition in video sequences. A method based on optical flow estimation is presented, in which the critical points of the flow field are extracted. Multi-scale trajectories are generated from these points and characterized in the frequency domain. Finally, a sequence is described by fusing this frequency information with motion orientation and shape information. The method has been tested on video datasets, with recognition rates among the highest in the state of the art. Contrary to recent dense sampling strategies, the proposed method requires only the critical points of the motion flow field, allowing both a lower computational cost and a better sequence description. A cross-dataset generalization experiment illustrates the robustness of the method to dataset biases. Results, comparisons and prospects on complex action recognition datasets are finally discussed.
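The pipeline outlined in the abstract (estimate the optical flow, extract its critical points, track trajectories from them, and describe each trajectory in the frequency domain) can be illustrated on a toy example. The sketch below is not the authors' implementation: the synthetic rotational flow field and all function names (`critical_points`, `track`, `frequency_descriptor`) are illustrative assumptions.

```python
# Illustrative sketch of the abstract's pipeline:
# flow field -> critical points -> trajectories -> Fourier descriptor.
import numpy as np

def critical_points(flow, thresh=1e-3):
    """Return (row, col) positions where the flow magnitude is (near) zero."""
    mag = np.linalg.norm(flow, axis=-1)
    return np.argwhere(mag < thresh)

def track(point, flows):
    """Follow a point through a list of flow fields (nearest-pixel lookup)."""
    traj = [np.asarray(point, dtype=float)]
    for flow in flows:
        r, c = np.clip(np.round(traj[-1]).astype(int),
                       0, np.array(flow.shape[:2]) - 1)
        traj.append(traj[-1] + flow[r, c])  # displace by the local flow vector
    return np.array(traj)

def frequency_descriptor(traj, n_coeffs=4):
    """Magnitudes of the first Fourier coefficients of the complex trajectory."""
    z = traj[:, 0] + 1j * traj[:, 1]
    return np.abs(np.fft.fft(z - z.mean()))[:n_coeffs]

# Toy flow: a rotation about the image centre, whose only critical point
# (zero-velocity location) is the centre itself.
h = w = 9
ys, xs = np.mgrid[0:h, 0:w].astype(float)
flow = np.dstack([-(xs - w // 2), ys - h // 2])   # (row, col) displacements
cps = critical_points(flow)                        # -> [[4, 4]]
traj = track((0.0, 0.0), [flow] * 8)               # 9 positions (start + 8 steps)
desc = frequency_descriptor(traj)                  # translation-invariant descriptor
```

Subtracting the trajectory mean before the FFT makes the descriptor invariant to where the motion happens in the frame; only the shape of the trajectory over time is encoded, which is the kind of frequency-domain characterization the abstract describes.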


Notes

  1. See also: http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/.


Corresponding author

Correspondence to Renaud Péteri.


Cite this article

Beaudry, C., Péteri, R. & Mascarilla, L. An efficient and sparse approach for large scale human action recognition in videos. Machine Vision and Applications 27, 529–543 (2016). https://doi.org/10.1007/s00138-016-0760-z
