Multi-person interaction and activity analysis: a synergistic track- and body-level analysis framework

  • Special Issue
  • Machine Vision and Applications

Abstract

This paper presents a synergistic track- and body-level analysis framework for multi-person interaction and activity analysis in video surveillance. The proposed two-level framework covers human activities in both wide and narrow fields of view using distributed camera sensors. The track-level analysis handles the gross activity patterns of multiple tracks in various wide-area surveillance situations. The body-level analysis focuses on the detailed activity patterns of individuals, in isolation or in groups. A ‘spatio-temporal personal space’ is introduced to model various patterns of grouping behavior between persons. ‘Adaptive context switching’ is proposed to mediate between the track-level and body-level analyses depending on the interpersonal configuration and imaging fidelity. Our approach is based on a hierarchy of action concepts: static pose, dynamic gesture, body-part action, single-person activity, and group interaction. An event ontology with a human activity hierarchy combines the multi-level analysis results to form a semantically meaningful event description. Experimental results with real-world data show the effectiveness of the proposed framework.
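The abstract does not spell out the switching rule, so the following is only a minimal sketch of how a switch between track-level and body-level analysis might look. It assumes that "interpersonal configuration" can be approximated by ground-plane distance against a personal-space radius and that "imaging fidelity" can be approximated by a person's apparent height in pixels. The names `Track`, `within_personal_space`, and `select_analysis_level`, and both threshold values, are hypothetical and are not taken from the paper. The temporal aspect of the personal space is left out for brevity.

```python
from dataclasses import dataclass
from math import hypot

# Hypothetical thresholds; the abstract gives no numeric values.
PERSONAL_SPACE_RADIUS_M = 1.2  # assumed radius of a person's personal space
MIN_BODY_HEIGHT_PX = 80        # assumed minimum apparent height for body-level analysis


@dataclass
class Track:
    track_id: int
    x_m: float      # ground-plane position, metres
    y_m: float
    height_px: int  # apparent height of the person in the image


def within_personal_space(a: Track, b: Track,
                          radius_m: float = PERSONAL_SPACE_RADIUS_M) -> bool:
    """True when two tracks are no farther apart than the assumed radius."""
    return hypot(a.x_m - b.x_m, a.y_m - b.y_m) <= radius_m


def select_analysis_level(track: Track, others: list[Track]) -> str:
    """Return 'body' or 'track' for one person (an assumed rule, not the paper's).

    Body-level analysis is chosen only when the person is imaged with enough
    detail AND at least one other person is inside the personal space;
    otherwise the cheaper track-level analysis is kept.
    """
    enough_detail = track.height_px >= MIN_BODY_HEIGHT_PX
    has_neighbour = any(
        within_personal_space(track, other)
        for other in others
        if other.track_id != track.track_id
    )
    return "body" if enough_detail and has_neighbour else "track"


if __name__ == "__main__":
    a = Track(1, 0.0, 0.0, 120)
    b = Track(2, 0.5, 0.5, 120)
    c = Track(3, 10.0, 10.0, 40)
    for t in (a, b, c):
        print(t.track_id, select_analysis_level(t, [a, b, c]))
```

Under these assumptions, two nearby people seen at high resolution are handed to body-level analysis, while a distant or low-resolution person stays at track level. A fuller version would also require the proximity to persist over a number of frames before switching.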

Author information

Corresponding author

Correspondence to Mohan M. Trivedi.

About this article

Cite this article

Park, S., Trivedi, M.M. Multi-person interaction and activity analysis: a synergistic track- and body-level analysis framework. Machine Vision and Applications 18, 151–166 (2007). https://doi.org/10.1007/s00138-006-0055-x

  • DOI: https://doi.org/10.1007/s00138-006-0055-x
