Artificial Intelligence Review, Volume 37, Issue 4, pp 301–311

Survey on classifying human actions through visual sensors

  • Michael S. Del Rose
  • Christian C. Wagner
Article

Abstract

The ability to predict the intentions of people based solely on their visual actions is a skill performed only by humans and animals. It requires segmenting the items in the field of view, tracking moving objects, identifying the importance of each object, determining the current role of each important object individually and in collaboration with other objects, relating these objects to a predefined scenario, assessing the selected scenario against the information retrieved, and finally adjusting the scenario to better fit the data. All of this is accomplished with great accuracy within a few seconds. Current computer algorithms have not reached this level of complexity under the accuracy and time constraints that humans and animals meet, but several research efforts are working towards it by developing new algorithms that solve parts of the problem. This survey paper lists several of these efforts, which rely mainly on image processing and on the classification of a limited number of actions. It divides the activities into several groups and ends with a discussion of future needs.

Keywords

Visual human action classification; Artificial intelligence; Hidden Markov Model; Grammars

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. US Army Tank Automotive Research, Development, and Engineering Center (TARDEC), Warren, USA
  2. Oakland University, Rochester Hills, USA
