We propose a method for disambiguating uncertain detections of events by seeking global explanations for activities. Given noisy visual input, and exploiting knowledge of the activity and its constraints, one can obtain a consistent set of events that explains all the detections. The paper presents a complete framework that starts with a general way to formalise the set of global explanations for a given activity using attribute multiset grammars (AMG). An AMG combines the event hierarchy with the features needed for recognition and with algebraic constraints that define the allowable combinations of events and features. Parsing a set of detections with such a grammar finds a consistent set of events that satisfies the activity's constraints. Each parse tree is assigned a posterior probability in the Bayesian sense. To find the best parse tree, the grammar and a finite set of detections are mapped into a Bayesian network, whose set of possible labellings corresponds to the set of all parse trees for those detections. We compare greedy search, multiple hypothesis trees, reversible jump MCMC, and integer programming for finding the Maximum a Posteriori (MAP) solution over the space of explanations. The framework is tested on two applications: activity at a bicycle rack and activity around a building entrance.
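As a minimal sketch of why a globally consistent explanation can differ from accepting detections one at a time, and why the choice of MAP search method matters, consider a single bicycle-rack slot: a bicycle can only be dropped when the slot is empty and picked up when it is full, so accepted events must alternate drop, pick, drop, and so on. The event types, the `Detection` class, the alternation constraint, and the independent per-detection score (log p for an accepted detection, log(1 - p) for one explained as a false alarm) are all illustrative assumptions, not the paper's AMG or Bayesian network. Exhaustive enumeration stands in for the MAP search methods the paper compares, and `greedy` is a simple add-if-consistent heuristic, not necessarily the paper's greedy algorithm.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    name: str
    kind: str    # hypothetical event type: "drop" or "pick"
    time: float
    p: float     # assumed probability that the detection is a real event, 0 < p < 1


def is_consistent(accepted):
    """Single rack slot (assumption): accepted events, in time order,
    must alternate drop, pick, drop, ... starting with a drop."""
    expected = "drop"
    for d in sorted(accepted, key=lambda d: d.time):
        if d.kind != expected:
            return False
        expected = "pick" if expected == "drop" else "drop"
    return True


def log_posterior(accepted, detections):
    """Independent-detection score: accepted detections contribute log p,
    rejected ones (explained as false alarms) contribute log(1 - p)."""
    return sum(math.log(d.p) if d in accepted else math.log(1.0 - d.p)
               for d in detections)


def exhaustive_map(detections):
    """Enumerate every subset and keep the best consistent one (exact MAP)."""
    best, best_score = frozenset(), -math.inf
    for r in range(len(detections) + 1):
        for subset in itertools.combinations(detections, r):
            s = frozenset(subset)
            if is_consistent(s):
                score = log_posterior(s, detections)
                if score > best_score:
                    best, best_score = s, score
    return best, best_score


def greedy(detections):
    """Accept detections in descending confidence while each addition
    keeps the explanation consistent and improves the score (p > 0.5)."""
    accepted = frozenset()
    for d in sorted(detections, key=lambda d: d.p, reverse=True):
        candidate = accepted | {d}
        if d.p > 0.5 and is_consistent(candidate):
            accepted = candidate
    return accepted, log_posterior(accepted, detections)


if __name__ == "__main__":
    dets = [
        Detection("d0", "drop", 1.0, 0.6),
        Detection("d1", "drop", 3.0, 0.9),
        Detection("d2", "pick", 2.0, 0.7),
    ]
    g, gs = greedy(dets)
    e, es = exhaustive_map(dets)
    print(sorted(d.name for d in g), round(math.exp(gs), 3))  # ['d1'] 0.108
    print(sorted(d.name for d in e), round(math.exp(es), 3))  # ['d0', 'd1', 'd2'] 0.378
```

In this toy case the greedy search commits to the most confident drop (`d1`) first, after which neither remaining detection can be added on its own, whereas the consistent explanation drop, pick, drop accepts all three detections with a higher posterior. Exhaustive enumeration is only feasible for a handful of detections, which is why approximate or structured search methods such as those compared in the paper are needed for realistic inputs.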
Cite this article
Damen, D., Hogg, D. Explaining Activities as Consistent Groups of Events. Int J Comput Vis 98, 83–102 (2012). https://doi.org/10.1007/s11263-011-0497-0
Keywords:
- Activity analysis
- Event recognition
- Global explanations