International Journal of Computer Vision, Volume 93, Issue 2, pp 183–200

Stochastic Representation and Recognition of High-Level Group Activities

Abstract

This paper describes a stochastic methodology for recognizing various types of high-level group activities. Our system maintains a probabilistic representation of each group activity, describing how the individual activities of the group members must be organized temporally, spatially, and logically. To recognize each represented group activity, the system searches for the set of group members that has the maximum posterior probability of satisfying its representation. We have designed a hierarchical recognition algorithm that uses Markov chain Monte Carlo (MCMC)-based sampling of the probability distribution to detect group activities and find the acting groups simultaneously. The system has been tested on complex activities such as ‘a group of thieves stealing an object from another group’ and ‘a group assaulting a person’, using videos downloaded from YouTube as well as videos that we recorded ourselves. Experimental results show that our system recognizes a wide range of group activities more reliably and accurately than previous approaches.
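
The search for a maximum-posterior set of group members can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a plain Metropolis sampler over subsets of candidates, and the names `mcmc_group_search`, `toy_log_score`, and the one-dimensional `positions` are hypothetical, introduced only for illustration. The toy score stands in for the posterior of a group satisfying an activity representation.

```python
import math
import random


def mcmc_group_search(candidates, log_score, steps=2000, seed=0):
    """Metropolis sampling over subsets of `candidates`.

    Returns the highest-scoring subset visited and its log-score.
    `log_score` is an assumed, user-supplied unnormalized log-posterior.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_lp = log_score(current)
    best, best_lp = current, current_lp
    for _ in range(steps):
        # Symmetric proposal: toggle the membership of one random candidate.
        person = rng.choice(candidates)
        proposal = current ^ {person}
        proposal_lp = log_score(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best, best_lp


# Hypothetical toy data: 1-D positions of five tracked people.
positions = {"a": 0.0, "b": 1.0, "c": 1.5, "d": 9.0, "e": 10.0}


def toy_log_score(group):
    # Assumed score: members near position 1.0 add to the score,
    # members far from it subtract from the score.
    return sum(2.0 - abs(positions[p] - 1.0) for p in group)


best_group, best_log_score = mcmc_group_search(list(positions), toy_log_score)
print(sorted(best_group), best_log_score)
```

In this sketch the sampler tracks the best subset seen so far, so it selects the group and scores the activity in one pass; a real system would replace the toy score with a learned spatio-temporal model.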

Keywords

Human activity recognition, Group activity recognition, Description-based event detection, Stochastic grammar

Supplementary material

(AVI 4.848 KB)

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Robot Research Department, Electronics and Telecommunications Research Institute, Daejeon, Korea
  2. Computer and Vision Research Center, The University of Texas at Austin, Austin, USA
