Explaining Activities as Consistent Groups of Events

A Bayesian Framework Using Attribute Multiset Grammars

Abstract

We propose a method for disambiguating uncertain event detections by seeking global explanations of activities. Given noisy visual input, knowledge of the activity and its constraints can be exploited to produce a consistent set of events that explains all the detections. The paper presents a complete framework, beginning with a general way to formalise the set of global explanations for a given activity using attribute multiset grammars (AMGs). An AMG combines the event hierarchy with the features needed for recognition and with algebraic constraints that define the allowable combinations of events and features. Parsing a set of detections with such a grammar yields a consistent set of events that satisfies the activity’s constraints. Each parse tree has a posterior probability in the Bayesian sense. To find the best parse tree, the grammar and a finite set of detections are mapped into a Bayesian network whose possible labellings correspond to the set of all parse trees for those detections. We compare greedy search, multiple-hypotheses trees, reversible jump MCMC, and integer programming for finding the maximum a posteriori (MAP) solution over the space of explanations. The framework is tested on two applications: activity at a bicycle rack and activity around a building entrance.
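To make the idea of a MAP search over consistent explanations concrete, here is a minimal Python sketch. It is not the paper's AMG or Bayesian network. It is a toy loosely inspired by the bicycle-rack setting, in which each "pick" detection is either linked to an earlier "drop" detection or explained as noise. All identifiers, times, and probabilities (`drops`, `picks`, `link_prob`, `P_NOISE`) are hypothetical values chosen for illustration.

```python
import math
from itertools import product

# Hypothetical detections: id -> time of the event.
drops = {"d1": 1, "d2": 3}
picks = {"p1": 4, "p2": 5}

# Hypothetical probabilities that a pick retrieves the bicycle left by a drop.
link_prob = {
    ("p1", "d1"): 0.6, ("p1", "d2"): 0.5,
    ("p2", "d1"): 0.55, ("p2", "d2"): 0.1,
}
P_NOISE = 0.05  # assumed probability of explaining a pick as a false detection


def is_consistent(expl):
    """Constraints: each drop is linked at most once, and precedes its pick."""
    used = [d for d in expl.values() if d is not None]
    if len(used) != len(set(used)):
        return False
    return all(d is None or drops[d] < picks[p] for p, d in expl.items())


def log_score(expl):
    """Log of the (unnormalised) probability of an explanation."""
    return sum(
        math.log(P_NOISE if d is None else link_prob[(p, d)])
        for p, d in expl.items()
    )


def exhaustive_map():
    """Enumerate every labelling and keep the best consistent one."""
    options = list(drops) + [None]
    best = None
    for choice in product(options, repeat=len(picks)):
        expl = dict(zip(picks, choice))
        if is_consistent(expl) and (best is None or log_score(expl) > log_score(best)):
            best = expl
    return best


def greedy():
    """Process picks in time order, taking the locally best free drop."""
    expl = {}
    for p in sorted(picks, key=picks.get):
        candidates = [None] + [
            d for d in drops if d not in expl.values() and drops[d] < picks[p]
        ]
        expl[p] = max(candidates, key=lambda d: P_NOISE if d is None else link_prob[(p, d)])
    return expl


if __name__ == "__main__":
    print("greedy:    ", greedy(), log_score(greedy()))
    print("exhaustive:", exhaustive_map(), log_score(exhaustive_map()))
```

With these made-up numbers, the greedy pass links `p1` to `d1` first and is then forced to link `p2` to the poorly matching `d2`, whereas the exhaustive search finds the globally better pairing of `p1` with `d2` and `p2` with `d1`. Exhaustive enumeration grows exponentially with the number of detections, which is one reason to consider search strategies such as those compared in the abstract.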

Author information

Corresponding author

Correspondence to Dima Damen.

About this article

Cite this article

Damen, D., Hogg, D. Explaining Activities as Consistent Groups of Events. Int J Comput Vis 98, 83–102 (2012). https://doi.org/10.1007/s11263-011-0497-0

Keywords

  • Activity analysis
  • Event recognition
  • Global explanations