Are Current Monocular Computer Vision Systems for Human Action Recognition Suitable for Visual Surveillance Applications?

Nebel, Jean-Christophe; Lewandowski, Michał; Thévenon, Jérôme; Martínez, Francisco; Velastin, Sergio

doi:10.1007/978-3-642-24031-7_29

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6939)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Since video recording devices have become ubiquitous, the automated analysis of human activity from a single uncalibrated video has become an essential area of research in visual surveillance. Despite variability in human appearance and motion styles, a few computer vision systems have reported very encouraging results in the last couple of years. Are these methods already suitable for visual surveillance applications? Alas, few of them have been evaluated in the two most challenging scenarios for an action recognition system: view independence and human interactions. Here, a review of monocular human action recognition methods that could be suitable for visual surveillance is presented first. Then the most promising frameworks, i.e. methods based on advanced dimensionality reduction, bag of words and random forests, are described and evaluated on the IXMAS and UT-Interaction datasets. Finally, the suitability of these systems for visual surveillance applications is discussed.

Author information

Authors and Affiliations

Digital Imaging Research Centre, Kingston University, London, Kingston-Upon-Thames, KT1 2EE, UK
Jean-Christophe Nebel, Michał Lewandowski, Jérôme Thévenon, Francisco Martínez & Sergio Velastin

Editor information

Editors and Affiliations

University of Nevada, 89557, Reno, NV, USA
George Bebis
NASA Ames Research Center, 94035, Moffett Field, CA, USA
Richard Boyle
Lawrence Berkeley National Laboratory, 94720, Berkeley, CA, USA
Bahram Parvin
Desert Research Institute, 89512, Reno, NV, USA
Darko Koracin
Department of Computer Science and Engineering, University of South Carolina, 29208, Columbia, SC, USA
Song Wang
HRL Laboratories, 3011 Malibu Canyon Road, 90265-4797, Malibu, CA, USA
Kim Kyungnam
Purdue University, West Lafayette, 47907-2021, IN, USA
Bedrich Benes
Sandia National Laboratory, 87185, Albuquerque, NM, USA
Kenneth Moreland
University of Louisiana at Lafayette, 70504, LA, USA
Christoph Borst
Adobe Systems Incorporated, San Francisco, CA, USA
Stephen DiVerdi
Polytechnic Institute of NYU, 11201, Brooklyn, NY, USA
Chiang Yi-Jen
Lawrence Livermore National Laboratory, 94551-0808, Livermore, CA, USA
Jiang Ming

About this paper

Cite this paper

Nebel, JC., Lewandowski, M., Thévenon, J., Martínez, F., Velastin, S. (2011). Are Current Monocular Computer Vision Systems for Human Action Recognition Suitable for Visual Surveillance Applications? In: Bebis, G., et al. Advances in Visual Computing. ISVC 2011. Lecture Notes in Computer Science, vol 6939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24031-7_29

DOI: https://doi.org/10.1007/978-3-642-24031-7_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24030-0
Online ISBN: 978-3-642-24031-7
eBook Packages: Computer Science, Computer Science (R0)
