Computer Vision

2014 Edition
Editors: Katsushi Ikeuchi

Activity Recognition

  • Wanqing Li
  • Zicheng Liu
  • Zhengyou Zhang
Reference work entry



Activity recognition refers to the process of identifying the types of movement performed by humans over a certain period of time. It is also known as action recognition when the period of time is relatively short.


The classic study on visual analysis of biological motion using moving light display (MLD) [1] has inspired tremendous interest among computer vision researchers in the problem of recognizing human motion from visual information. Commonly used devices for capturing human movement include human motion capture (MOCAP) systems with or without markers, multiple video camera systems, and single video camera systems. A MOCAP device usually works in a controlled environment to capture the three-dimensional (3D) joint locations or angles of human bodies; multiple camera systems provide a way to reconstruct 3D body models from multiple viewpoint images. Both MOCAP and multiple camera systems...



  1. Johansson G (1973) Visual perception of biological motion and a model for its analysis. Percept Psychophys 14(2):201–211
  2. Bobick A, Davis J (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267
  3. Yilmaz A, Shah M (2008) A differential geometric approach to representing the human actions. Comput Vision Image Underst 109(3):335–351
  4. Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space-time shapes. IEEE Trans Pattern Anal Mach Intell 29(12):2247–2253
  5. Mokhber A, Achard C, Milgram M (2008) Recognition of human behavior by space-time silhouette characterization. Pattern Recogn Lett 29(1):81–89
  6. Laptev I, Lindeberg T (2003) Space-time interest points. In: International conference on computer vision, Nice, pp 432–439
  7. Dollar P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse spatio-temporal features. In: 2nd joint IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, Beijing, pp 65–72
  8. Wong SF, Cipolla R (2007) Extracting spatiotemporal interest points using global information. In: International conference on computer vision, Rio de Janeiro, pp 1–8
  9. Niebles JC, Wang H, Fei-Fei L (2008) Unsupervised learning of human action categories using spatial-temporal words. Int J Comput Vision 79(3):299–318
  10. Liu J, Luo J, Shah M (2009) Recognizing realistic actions from videos “in the wild”. In: International conference on computer vision and pattern recognition (CVPR), Miami, pp 1–8
  11. Yu G, Goussies NA, Yuan J, Liu Z (2011) Fast action detection via discriminative random forest voting and top-K subvolume search. IEEE Trans Multimedia 13:507–517
  12. Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden Markov model. In: International conference on computer vision and pattern recognition (CVPR), Champaign, pp 379–385
  13. Galata A, Johnson N, Hogg D (2001) Learning variable-length Markov models of behaviour. Comput Vision Image Underst 81:398–413
  14. Oliver N, Garg A, Horvitz E (2004) Layered representations for learning and inferring office activity from multiple sensory channels. Comput Vision Image Underst 96:163–180
  15. Li W, Zhang Z, Liu Z (2008) Expandable data-driven graphical modeling of human actions based on salient postures. IEEE Trans Circuits Syst Video Technol 18(11):1499–1510
  16. Sminchisescu C, Kanaujia A, Metaxas D (2006) Conditional models for contextual human motion recognition. Comput Vision Image Underst 104:210–220
  17. Wang Y, Mori G (2011) Hidden part models for human action recognition: probabilistic versus max margin. IEEE Trans Pattern Anal Mach Intell 33(7):1310–1323
  18. Wang L, Suter D (2007) Learning and matching of dynamic shape manifolds for human action recognition. IEEE Trans Image Process 16:1646–1661

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Wanqing Li (1)
  • Zicheng Liu (2)
  • Zhengyou Zhang (3)
  1. University of Wollongong, Wollongong, NSW, Australia
  2. Microsoft Research, Microsoft Corporation, Redmond, USA
  3. Microsoft Research, Redmond, USA