Abstract
A novel method based on an interval temporal syntactic model is proposed to recognize human activities in video streams. The method consists of two parts: feature extraction and activity recognition. A trajectory shape descriptor, speeded-up robust features (SURF) and histograms of optical flow (HOF) are combined to represent human activities, providing more exhaustive information about their shape, structure and motion. In the recognition stage, a probabilistic latent semantic analysis (PLSA) model first recognizes simple activities. Then, an interval temporal syntactic model, which combines a syntactic model with interval algebra to model the temporal dependencies of activities explicitly, is introduced to recognize complex activities that involve temporal relationships. Experimental results on public databases show the effectiveness of the proposed method for complex activity recognition in comparison with other state-of-the-art methods.
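The interval algebra referred to above is Allen's interval algebra, which distinguishes thirteen possible temporal relations between two activity intervals (before, meets, overlaps, during, starts, finishes, their inverses, and equals). A minimal sketch of how such a relation can be classified from interval endpoints is shown below; `allen_relation` is a hypothetical helper, not the paper's implementation, and intervals are assumed to be `(start, end)` pairs with `start < end`.

```python
def allen_relation(a, b):
    """Return the Allen interval-algebra relation of interval a w.r.t. b.

    Each interval is a (start, end) tuple with start < end. Exactly one
    of the thirteen Allen relations holds for any pair of intervals.
    """
    a1, a2 = a
    b1, b2 = b
    # Disjoint or touching intervals.
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    # Intervals sharing an endpoint.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    # One interval strictly inside the other.
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: partial overlap (a1 < b1 < a2 < b2 or its inverse).
    return "overlaps" if a1 < b1 else "overlapped-by"


# Example: "pick up phone" overlapping "walk" in a video timeline.
print(allen_relation((0, 4), (2, 6)))  # → overlaps
print(allen_relation((2, 4), (1, 6)))  # → during
```

In the paper's setting, such pairwise relations between detected activity intervals feed the syntactic model, allowing complex activities to be parsed from the temporal arrangement of their component actions.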
Foundation item: Project (50808025) supported by the National Natural Science Foundation of China; Project (20090162110057) supported by the Doctoral Fund of Ministry of Education, China
Xia, Lm., Han, F. & Wang, J. Complex human activities recognition using interval temporal syntactic model. J. Cent. South Univ. 23, 2578–2586 (2016). https://doi.org/10.1007/s11771-016-3319-2