Multimedia Systems, Volume 10, Issue 2, pp 164–179

A hierarchical Bayesian network for event recognition of human actions and interactions

  • Sangho Park
  • J. K. Aggarwal
Special Issue on Video Surveillance


Recognizing human interactions is a challenging task because the interacting persons have multiple body parts that frequently occlude one another. This paper presents a method for the recognition of two-person interactions using a hierarchical Bayesian network (BN). The poses of simultaneously tracked body parts are estimated at the low level of the BN, and the overall body pose is estimated at the high level of the BN. The evolution of the poses of the multiple body parts is processed by a dynamic Bayesian network (DBN). The recognition of two-person interactions is expressed in terms of semantic verbal descriptions at multiple levels: individual body-part motions at the low level, single-person actions at the middle level, and two-person interactions at the high level. Example sequences of interacting persons illustrate the success of the proposed framework.
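The general idea of combining per-frame body-part evidence into an overall pose estimate and tracking that estimate over time can be illustrated with a toy forward filter. This is a minimal sketch under stated assumptions, not the authors' network: the pose states, body-part observation symbols, all probability values, and the names `POSES`, `LIKELIHOOD`, `frame_likelihood`, and `forward_filter` are hypothetical and chosen only for illustration. Body-part observations are assumed conditionally independent given the overall pose.

```python
# Toy sketch only: every state, symbol, and probability below is hypothetical
# and is NOT taken from the paper.

POSES = ("standing", "reaching")  # hypothetical overall body poses

# Assumed prior over the overall pose in the first frame.
PRIOR = {"standing": 0.5, "reaching": 0.5}

# Assumed temporal model P(pose_t | pose_{t-1}).
TRANSITION = {
    "standing": {"standing": 0.8, "reaching": 0.2},
    "reaching": {"standing": 0.3, "reaching": 0.7},
}

# Assumed low-level model P(body-part observation | overall pose).
LIKELIHOOD = {
    "arm": {
        "standing": {"down": 0.9, "raised": 0.1},
        "reaching": {"down": 0.2, "raised": 0.8},
    },
    "torso": {
        "standing": {"upright": 0.8, "leaning": 0.2},
        "reaching": {"upright": 0.4, "leaning": 0.6},
    },
}


def normalize(dist):
    """Scale a dict of non-negative weights so the values sum to 1."""
    total = sum(dist.values())
    return {key: value / total for key, value in dist.items()}


def frame_likelihood(observation):
    """Return P(all body-part observations in one frame | pose) per pose."""
    result = {}
    for pose in POSES:
        p = 1.0
        for part, value in observation.items():
            p *= LIKELIHOOD[part][pose][value]
        result[pose] = p
    return result


def forward_filter(observations):
    """Return the filtered posterior over poses after each frame."""
    belief = dict(PRIOR)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: propagate the belief through the transition model.
            belief = {
                pose: sum(belief[prev] * TRANSITION[prev][pose] for prev in POSES)
                for pose in POSES
            }
        # Update step: weight by the evidence from all body parts, then normalize.
        like = frame_likelihood(obs)
        belief = normalize({pose: belief[pose] * like[pose] for pose in POSES})
        history.append(belief)
    return history


if __name__ == "__main__":
    frames = [
        {"arm": "down", "torso": "upright"},
        {"arm": "raised", "torso": "leaning"},
    ]
    for t, posterior in enumerate(forward_filter(frames)):
        print(t, posterior)
```

With these assumed numbers, the posterior favors "standing" after the first frame and shifts toward "reaching" once the raised arm and leaning torso are observed. A hierarchical network of the kind the abstract describes would use richer structure than this single hidden node.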


Surveillance, Event recognition, Human interaction, Motion, Bayesian network





Copyright information

© Springer-Verlag Berlin/Heidelberg 2004

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, USA
