Are Current Monocular Computer Vision Systems for Human Action Recognition Suitable for Visual Surveillance Applications?

  • Jean-Christophe Nebel
  • Michał Lewandowski
  • Jérôme Thévenon
  • Francisco Martínez
  • Sergio Velastin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6939)


Since video recording devices have become ubiquitous, the automated analysis of human activity from a single uncalibrated video has become an essential area of research in visual surveillance. Despite variability in terms of human appearance and motion styles, in the last couple of years, a few computer vision systems have reported very encouraging results. Would these methods be already suitable for visual surveillance applications? Alas, few of them have been evaluated in the two most challenging scenarios for an action recognition system: view independence and human interactions. Here, first a review of monocular human action recognition methods that could be suitable for visual surveillance is presented. Then, the most promising frameworks, i.e. methods based on advanced dimensionality reduction, bag of words and random forest, are described and evaluated on IXMAS and UT-Interaction datasets. Finally, suitability of these systems for visual surveillance applications is discussed.


Random Forest Action Recognition Dynamic Time Warping Human Action Recognition Visual Hull 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Blackburn, J., Ribeiro, E.: Human motion recognition using isomap and dynamic time warping. In: Elgammal, A., Rosenhahn, B., Klette, R. (eds.) Human Motion 2007. LNCS, vol. 4814, pp. 285–298. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  2. 2.
    Breiman, L.: Random Forests. Machine Learning 45(1), 5–32 (2001)CrossRefzbMATHGoogle Scholar
  3. 3.
    Chin, T., Wang, L., Schindler, K., Suter, D.: Extrapolating learned manifolds for human activity recognition. In: ICIP 2007 (2007)Google Scholar
  4. 4.
    Cheng, Z., Qin, L., Huang, Q., Jiang, S., Tian, Q.: Group Activity Recognition by Gaussian Processes Estimation. In: ICPR 2010 (2010)Google Scholar
  5. 5.
    Csurka, G., Bray, C., Dance, C., Fan, L.: Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision at ECCV 2004, pp. 1–22 (2004)Google Scholar
  6. 6.
    Fang, C.-H., Chen, J.-C., Tseng, C.-C., Lien, J.-J.J.: Human action recognition using spatio-temporal classification. In: Zha, H., Taniguchi, R.-i., Maybank, S. (eds.) ACCV 2009. LNCS, vol. 5995, pp. 98–109. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  7. 7.
    Gilbert, A., Illingworth, J., Bowden, R.: Fast Realistic Multi-Action Recognition using Mined Dense Spatio-temporal Features. In: ICCV 2009 (2009)Google Scholar
  8. 8.
    Gorelick, L., Galun, M., Sharon, E., Basri, R., Brandt, A.: Shape representation and classification using the poisson equation. PAMI 28(12), 1991–2005 (2006)CrossRefGoogle Scholar
  9. 9.
    Hu, Y., Cao, L., Lv, F., Yan, S., Gong, Y., Huang, T.S.: Action Detection in Complex Scenes with Spatial and Temporal Ambiguities. In: ICCV 2009 (2009)Google Scholar
  10. 10.
    Jia, K., Yeung, D.: Human action recognition using local spatio-temporal discriminant embedding. In: CVPR 2008 (2008)Google Scholar
  11. 11.
    Joachims, T.: Text categorization with support vector machines: Learning with many relevant features In: ECML 1998 (1998)Google Scholar
  12. 12.
    Junejo, I.N., Dexter, E., Laptev, I., Pérez, P.: Cross-view action recognition from temporal self-similarities. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 293–306. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  13. 13.
    Kaaniche, M.B., Bremond, F.: Gesture Recognition by Learning Local Motion Signatures. In: CVPR 2010 (2010)Google Scholar
  14. 14.
    Kovashka, A., Grauman, K.: Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition. In: CVPR 2010 (2010)Google Scholar
  15. 15.
  16. 16.
    Laptev, I.: On Space-Time Interest Points. International Journal of Computer Vision 64(2/3), 107–123 (2005)CrossRefGoogle Scholar
  17. 17.
    Laptev, I., Perez, P.: Retrieving Actions in Movies. In: ICCV 2007 (2007)Google Scholar
  18. 18.
    Lewandowski, M., Makris, D., Nebel, J.-C.: View and style-independent action manifolds for human activity recognition. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 547–560. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  19. 19.
    Lewandowski, M., Martinez, J., Makris, D., Nebel, J.-C.: Temporal Extension of Laplacian Eigenmaps for Unsupervised Dimensionality Reduction of Time Series. In: ICPR 2010 (2010)Google Scholar
  20. 20.
    Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: CVPR 2008 (2008)Google Scholar
  21. 21.
    Natarajan, P., Singh, V.K., Nevatia, R.: Learning 3D Action Models from a few 2D videos for View Invariant Action Recognition. In: CVPR 2010 (2010)Google Scholar
  22. 22.
    Niebles, J.C., Chen, C.-W., Fei-Fei, L.: Modeling temporal structure of decomposable motion segments for activity classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 392–405. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  23. 23.
    Orrite, C., Martinez, F., Herrero, E., Ragheb, H., Velastin, S.A.: Independent viewpoint silhouette-based human action modeling and recognition. In: MLVMA 2008 (2008)Google Scholar
  24. 24.
    Qu, H., Wang, L., Leckie, C.: Action Recognition Using Space-Time Shape Difference Images. In: ICPR 2010 (2010)Google Scholar
  25. 25.
    Rabiner, L., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Inc., Englewood Cliffs (1993)zbMATHGoogle Scholar
  26. 26.
    Richard, S., Kyle, P.: Viewpoint manifolds for action recognition. EURASIP Journal on Image and Video Processing (2009)Google Scholar
  27. 27.
    Ryoo, M.S., Aggarwal, J.K.: Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities. In: ICCV 2009 (2009)Google Scholar
  28. 28.
    Satkin, S., Hebert, M.: Modeling the temporal extent of actions. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 536–548. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  29. 29.
    Thi, T.H., Zhang, J.: Human Action Recognition and Localization in Video using Structured Learning of Local Space-Time Features. In: AVSS 2010 (2010)Google Scholar
  30. 30.
    Turaga, P., Veeraraghavan, A., Chellappa, R.: Statistical analysis on stiefel and grassmann manifolds with applications in computer vision. In: CVPR 2008, pp. 1–8 (2008)Google Scholar
  31. 31.
    Wang, L., Suter, D.: Visual learning and recognition of sequential data manifolds with applications to human movement analysis. Computer Vision and Image Understanding 110(2), 153–172 (2008)CrossRefGoogle Scholar
  32. 32.
    Waltisberg, D., Yao, A., Gall, J., Van Gool, L.: Variations of a hough-voting action recognition system. In: Ünay, D., Çataltepe, Z., Aksoy, S. (eds.) ICPR 2010. LNCS, vol. 6388, pp. 306–312. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  33. 33.
    Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding 104(2-3), 249–257 (2006)CrossRefGoogle Scholar
  34. 34.
    Weinland, D., Boyer, E., Ronfard, R.: Action recognition from arbitrary views using 3d exemplars. In: ICCV 2007 (2007)Google Scholar
  35. 35.
    Weinland, D., Özuysal, M., Fua, P.: Making Action Recognition Robust to Occlusions and Viewpoint Changes. In: ECCV 2010 (2010)Google Scholar
  36. 36.
  37. 37.
    Yan, P., Khan, S., Shah, M.: Learning 4D action feature models for arbitrary view action recognition. In: CVPR 2008 (2008)Google Scholar
  38. 38.
    Yao, A., Gall, J., Van Gool, L.: A Hough Transform-Based Voting Framework for Action Recognition. In: CVPR 2010 (2010)Google Scholar
  39. 39.
    Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. NIPS 14, 585–591 (2001)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jean-Christophe Nebel
    • 1
  • Michał Lewandowski
    • 1
  • Jérôme Thévenon
    • 1
  • Francisco Martínez
    • 1
  • Sergio Velastin
    • 1
  1. 1.Digital Imaging Research CentreKingston UniversityLondonUK

Personalised recommendations