International Journal of Computer Vision, Volume 98, Issue 1, pp 83–102

Explaining Activities as Consistent Groups of Events

A Bayesian Framework Using Attribute Multiset Grammars
  • Dima Damen
  • David Hogg


We propose a method for disambiguating uncertain detections of events by seeking global explanations for activities. Given a noisy visual input, and exploiting knowledge of the activity and its constraints, one can provide a consistent set of events that explains all the detections. The paper presents a complete framework that starts with a general way to formalise the set of global explanations for a given activity using attribute multiset grammars (AMGs). An AMG combines the event hierarchy with the features needed for recognition and with algebraic constraints that define the allowable combinations of events and features. Parsing a set of detections with such a grammar finds a consistent set of events that satisfies the activity's constraints. Each parse tree has a posterior probability in the Bayesian sense. To find the best parse tree, the grammar and a finite set of detections are mapped into a Bayesian network. The set of possible labellings of the Bayesian network corresponds to the set of all parse trees for the given set of detections. We compare greedy search, multiple-hypotheses trees, reversible jump MCMC, and integer programming for finding the Maximum a Posteriori (MAP) solution over the space of explanations. The framework is tested on two applications: activity at a bicycle rack and activity around a building entrance.
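
The idea of searching for a MAP explanation under consistency constraints can be illustrated with a small toy problem. The sketch below is not the paper's attribute multiset grammar or Bayesian network; it is a minimal illustration, assuming a made-up bicycle-rack scenario in which each pick-up detection is either linked to one earlier drop-off or left unlinked, and each drop-off may be used at most once. All identifiers, timestamps and probabilities (`drops`, `picks`, `link_prob`, `UNLINKED_PROB`) are hypothetical. It contrasts exhaustive MAP search with a greedy strategy.

```python
import math
from itertools import product

# Hypothetical detections (id -> time); the values are invented for illustration.
drops = {"d1": 1, "d2": 2}
picks = {"p1": 3, "p2": 4}

# Hypothetical probability that a pick-up and a drop-off involve the same bicycle.
link_prob = {
    ("d1", "p1"): 0.60, ("d1", "p2"): 0.50,
    ("d2", "p1"): 0.55, ("d2", "p2"): 0.10,
}
UNLINKED_PROB = 0.2  # assumed probability that a pick-up has no matching drop-off


def is_consistent(assignment):
    """Each drop-off is used at most once and must precede its pick-up."""
    used = [d for d in assignment.values() if d is not None]
    if len(used) != len(set(used)):
        return False
    return all(d is None or drops[d] < picks[p] for p, d in assignment.items())


def log_score(assignment):
    """Unnormalised log posterior of an explanation (sum of log factors)."""
    return sum(
        math.log(UNLINKED_PROB if d is None else link_prob[(d, p)])
        for p, d in assignment.items()
    )


def map_explanation():
    """Exhaustive search over all consistent explanations (toy sizes only)."""
    pick_ids = sorted(picks)
    options = list(drops) + [None]
    best = None
    for choice in product(options, repeat=len(pick_ids)):
        candidate = dict(zip(pick_ids, choice))
        if not is_consistent(candidate):
            continue
        if best is None or log_score(candidate) > log_score(best):
            best = candidate
    return best


def greedy_explanation():
    """Process pick-ups in time order, taking the locally best unused drop-off."""
    assignment, used = {}, set()
    for p in sorted(picks, key=picks.get):
        candidates = [
            (link_prob[(d, p)], d)
            for d in drops
            if d not in used and drops[d] < picks[p]
        ]
        best_prob, best_drop = max(candidates, default=(0.0, None))
        if best_drop is None or best_prob < UNLINKED_PROB:
            assignment[p] = None
        else:
            assignment[p] = best_drop
            used.add(best_drop)
    return assignment


if __name__ == "__main__":
    print("MAP:   ", map_explanation())
    print("Greedy:", greedy_explanation())
```

With these invented numbers the greedy pass commits `p1` to `d1` and is then forced to leave `p2` unlinked, whereas the exhaustive search finds the jointly better explanation that links `p1` to `d2` and `p2` to `d1`. Exhaustive enumeration is only feasible for toy sizes, which is one reason to compare approximate and exact search strategies on larger problems.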


Keywords: Activity analysis, Event recognition, Global explanations





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Computer Science, University of Bristol, Bristol, UK
  2. School of Computing, University of Leeds, Leeds, UK
