Abstract
This paper presents a probabilistic grammar approach to the recognition of complex events in videos. Firstly, based on the original motion features, a rule induction algorithm is adopted to learn the event rules. Then, a multi-thread parsing (MTP) algorithm is adopted to recognize the complex events involving parallel temporal relation in sub-events, whereas the commonly used parser can only handle the sequential relation. Additionally, a Viterbi-like error recovery strategy is embedded in the parsing process to correct the large time scale errors, such as insertion and deletion errors. Extensive experiments including indoor gymnastic exercises and outdoor traffic events are performed. As supported by experimental results, the MTP algorithm can effectively recognize the complex events due to the strong discriminative representation and the error recovery strategy.
Chapter PDF
Similar content being viewed by others
References
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised Learning of Human Action Categories Using Spatial-TemporalWords. In: Proc. Conf. BMVC (2006)
Laptev, I., Lindeberg, T.: Space-time interest points. In: Proc. Int. Conf. on Computer Vision (ICCV) (2003)
Laxton, B., Lim, J., Kriegman, D.: Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2007)
Shi, Y., Huang, Y., Minnen, D., Bobick, A., Essa, I.: Propagation networks for recognition of partially ordered sequential action. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2004)
Nguyen, N.T., Phung, D.Q., Venkatesh, S., Bui, H.: Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2005)
Xiang, T., Gong, S.: Beyond Tracking: Modelling Activity and Understanding Behaviour. International Journal of Computer Vision (IJCV) 67(1) (2006)
Min, J., Kasturi, R.: Activity Recognition Based on Multiple Motion Trajectories. In: Proc. Int. Conf. on Pattern Recognition (ICPR), pp. 199–202 (2004)
Minnen, D., Essa, I., Starner, T.: Expectation Grammars: Leveraging High-Level Expectations for Activity Recognition. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 626–632 (2003)
Moore, D., Essa, I.: Recognizing Multitasked Activities from Video Using Stochastic Context-Free Grammar. In: Proc. Conf. AAAI (2002)
Ivanov, Y.A., Bobick, A.F.: Recognition of visual activities and interactions by stochastic parsing. IEEE TRANS. PAMI 22(8), 852–872 (2000)
Ryoo, M.S., Aggarwal, J.K.: Recognition of Composite Human Activities through Context-Free Grammar Based Representation. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2006)
Yamamoto, M., Mitomi, H., Fujiwara, F., Sato, T.: Bayesian Classification of Task-Oriented Actions Based on Stochastic Context-Free Grammar. In: Proc. Int. Conf. on Automatic Face and Gesture Recognition (FGR) (2006)
Wang, X., Tieu, K., Grimson, E.: Learning Semantic Scene Models by Trajectory Analysis. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 110–123. Springer, Heidelberg (2006)
Zhang, Z., Huang, K.Q., Tan, T.N., Wang, L.S.: Trajectory Series Analysis based Event Rule Induction for Visual Surveillance. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2007)
Hakeem, A., Shah, M.: Learning, detection and representation of multi-agent events in videos. Artif. Intell. 171(8-9), 586–605 (2007)
Nevatia, R., Zhao, T., Hongeng, S.: Hierarchical Language-based Representation of Events in Video Streams. In: Proc. CVPRW on Event Mining (2003)
Allen, J.F., Ferguson, F.: Actions and Events in Interval Temporal Logical. J. Logic and Computation 4(5), 531–579 (1994)
Johnston, M.: Unification-based Multimodal Parsing. In: Proc. Conf. on COLING-ACL, pp. 624–630 (1998)
Amengual, J.C., Vidal, E.: Efficient Error-Correcting Viterbi Parsing. IEEE TRANS PAMI 20(10), 1109–1116 (1998)
Stolcke, A.: An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics 21(2), 165–201 (1995)
Snoek, C.G.M., Worring, M.: Multimedia event-based video indexing using time intervals. IEEE TRANS Multimedia 7(4), 638–647 (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Z., Huang, K., Tan, T. (2008). Multi-thread Parsing for Recognizing Complex Events in Videos. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88690-7_55
Download citation
DOI: https://doi.org/10.1007/978-3-540-88690-7_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88689-1
Online ISBN: 978-3-540-88690-7
eBook Packages: Computer ScienceComputer Science (R0)