Multi-thread Parsing for Recognizing Complex Events in Videos

  • Zhang Zhang
  • Kaiqi Huang
  • Tieniu Tan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5304)


This paper presents a probabilistic grammar approach to the recognition of complex events in videos. First, a rule induction algorithm learns the event rules from the original motion features. Then, a multi-thread parsing (MTP) algorithm recognizes complex events whose sub-events are related by parallel temporal relations, whereas commonly used parsers can only handle sequential relations. Additionally, a Viterbi-like error recovery strategy is embedded in the parsing process to correct large time-scale errors, such as insertion and deletion errors. Extensive experiments are performed on indoor gymnastic exercises and outdoor traffic events. The experimental results indicate that the MTP algorithm can effectively recognize complex events, owing to its strongly discriminative representation and its error recovery strategy.
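The error recovery idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's MTP algorithm or its recovery strategy: it is a generic Viterbi-style dynamic program that aligns an observed sequence of primitive events with the sequence a rule expects, charging a penalty for each insertion (a spurious detection) or deletion (a missed detection). The function name `align_with_recovery`, the unit costs, and the symbol names are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical helper, not taken from the paper: aligns what a rule expects
# with what a detector observed, tolerating insertion and deletion errors.
def align_with_recovery(
    expected: List[str],
    observed: List[str],
    ins_cost: float = 1.0,   # assumed penalty for a spurious observed symbol
    del_cost: float = 1.0,   # assumed penalty for a missed expected symbol
) -> Tuple[float, List[Tuple[str, str]]]:
    n, m = len(expected), len(observed)
    inf = float("inf")
    # cost[i][j]: cheapest way to explain observed[:j] with expected[:i]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[str, str, int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # Match: both sequences advance at no cost.
            if i > 0 and j > 0 and expected[i - 1] == observed[j - 1]:
                c = cost[i - 1][j - 1]
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = ("match", expected[i - 1], i - 1, j - 1)
            # Insertion error: an observed symbol the rule did not expect.
            if j > 0:
                c = cost[i][j - 1] + ins_cost
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = ("insertion", observed[j - 1], i, j - 1)
            # Deletion error: an expected symbol that was never observed.
            if i > 0:
                c = cost[i - 1][j] + del_cost
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = ("deletion", expected[i - 1], i - 1, j)
    # Trace the best path back to the origin to list the operations.
    ops: List[Tuple[str, str]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        op, sym, pi, pj = back[i][j]
        ops.append((op, sym))
        i, j = pi, pj
    ops.reverse()
    return cost[n][m], ops


if __name__ == "__main__":
    total, ops = align_with_recovery(["a", "b", "c"], ["a", "x", "b"])
    print(total, ops)
```

In this toy run the observed stream contains one spurious symbol and misses the final expected one, so the best alignment uses one insertion and one deletion. A probabilistic parser would typically replace the fixed costs with negative log-probabilities of the corresponding error hypotheses.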



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Zhang Zhang
    • 1
  • Kaiqi Huang
    • 1
  • Tieniu Tan
    • 1
  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Beijing, China
