
Human action recognition from simple feature pooling

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

Human action recognition (HAR) from images is an important and challenging task for many current applications. In this context, building discriminative action descriptors from simple features is a relevant problem. In this paper we show that very good descriptors can be built from simple filter outputs when multilevel architectures and non-linear transformations are used. We propose a new multiscale descriptor for HAR based on a Pyramid of Accumulated Histograms of Optical Flow. We also show that, in this case, space–time gradients provide sufficient information for the recognition task. Our descriptor is evaluated on three standard databases of human actions: KTH, Weizmann and IXMAS. It compares very favourably with the results that other algorithms currently report on these three databases. In particular, on the KTH database our descriptor is on par with the state of the art, with an average correct recognition rate of 96 %.
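The abstract's core idea, simple filter outputs pooled at several spatial scales and passed through a non-linearity, can be illustrated with a minimal sketch. The code below is not the authors' descriptor: the function names are hypothetical, and the number of orientation bins, the number of pyramid levels, the magnitude weighting, the L1 normalisation and the square-root non-linearity are all illustrative assumptions. The flow fields (`us`, `vs`) are assumed to come from an external optical-flow estimator and to contain at least one frame.

```python
import math
from typing import List, Sequence

# A 2-D field indexed as field[row][col]; assumed, not taken from the paper.
Field = Sequence[Sequence[float]]


def orientation_bin(u: float, v: float, n_bins: int) -> int:
    """Map a flow vector to one of n_bins equal angular sectors over [0, 2*pi)."""
    angle = math.atan2(v, u) % (2 * math.pi)
    # min() guards against float round-off that could yield exactly 2*pi.
    return min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)


def accumulated_pyramid_hof(us: List[Field], vs: List[Field],
                            n_bins: int = 8, levels: int = 2) -> List[float]:
    """Hypothetical pyramid of accumulated histograms of optical flow.

    us[t][y][x], vs[t][y][x]: horizontal/vertical flow of frame t.
    Pyramid level l splits the frame into 2**l x 2**l cells; every cell
    gets a magnitude-weighted orientation histogram that is accumulated
    (summed) over all frames. Histograms of all levels are concatenated.
    """
    height, width = len(us[0]), len(us[0][0])
    descriptor: List[float] = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [[0.0] * n_bins for _ in range(cells * cells)]
        for u_frame, v_frame in zip(us, vs):       # accumulate over time
            for y in range(height):
                cy = y * cells // height            # cell row of pixel y
                for x in range(width):
                    cx = x * cells // width         # cell column of pixel x
                    u, v = u_frame[y][x], v_frame[y][x]
                    mag = math.hypot(u, v)
                    if mag > 0.0:
                        hist[cy * cells + cx][orientation_bin(u, v, n_bins)] += mag
        for h in hist:
            descriptor.extend(h)
    total = sum(descriptor)
    if total > 0.0:
        descriptor = [d / total for d in descriptor]  # L1 normalisation
    return [math.sqrt(d) for d in descriptor]         # square-root non-linearity
```

A usage example with two 4 x 4 frames of uniform rightward motion:

```python
us = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
vs = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
d = accumulated_pyramid_hof(us, vs)
print(len(d))  # 168 = (1 + 4 + 16) cells x 8 bins
```

The same pooling could be applied to space–time gradient fields by passing them in place of `us` and `vs`; that substitution is likewise only an illustration of the abstract's claim, not a reproduction of the paper's pipeline.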


Notes

  1. There are 93 videos instead of 90 because Lena, one of the Weizmann actors, performs the actions run, skip and walk twice.

  2. Intel Core i3 M350@2.27GHz, 4 GB RAM, Matlab 2009b on a single CPU core.


Acknowledgments

This work was supported by Project CSD2007-00018 (MIPRCV) of the Spanish Ministry of Science and Technology.

Author information

Correspondence to Manuel J. Marín-Jiménez.


About this article

Cite this article

Marín-Jiménez, M.J., Pérez de la Blanca, N. & Mendoza, M.Á. Human action recognition from simple feature pooling. Pattern Anal Applic 17, 17–36 (2014). https://doi.org/10.1007/s10044-012-0292-8


  • DOI: https://doi.org/10.1007/s10044-012-0292-8
