SSPR/SPR 2012: Structural, Syntactic, and Statistical Pattern Recognition, pp. 474–482
Human Action Recognition in Video by Fusion of Structural and Spatio-temporal Features
Abstract
The problem of human action recognition has received increasing attention in recent years because of its importance in many applications. Local representations, and in particular STIP (space-time interest point) descriptors, have become increasingly popular for action recognition. However, the main limitation of these approaches is that they do not capture the spatial relationships within the subject performing the action. This paper proposes a novel method based on the fusion of the global spatial relationships provided by graph embedding with the local spatio-temporal information of STIP descriptors. Experiments on an action recognition dataset show that combining the structural information with the spatio-temporal features can significantly improve recognition accuracy.
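The abstract does not detail how the two feature types are combined. A minimal sketch of one common option, feature-level (early) fusion, is shown below under assumptions that are not taken from the paper: each graph is embedded as its vector of dissimilarities to a set of prototype graphs, STIP descriptors are quantised into a bag-of-words histogram against a codebook, and the two vectors are L2-normalised and concatenated. The function names `edge_set_distance`, `prototype_embedding`, `bag_of_stip`, and `fuse_features` are hypothetical, and `edge_set_distance` is only a toy stand-in for a real graph edit distance.

```python
import numpy as np


def edge_set_distance(g1, g2):
    # Hypothetical toy dissimilarity: graphs are frozensets of edges over a
    # shared, fixed node labelling, so a unit-cost edit distance reduces to
    # the number of edges that must be inserted or deleted.
    return len(g1 ^ g2)


def prototype_embedding(graph, prototypes, dissimilarity):
    # Map a graph to the vector of its dissimilarities to n prototype graphs.
    return np.array([dissimilarity(graph, p) for p in prototypes], dtype=float)


def bag_of_stip(descriptors, codebook):
    # descriptors: (m, d) array of STIP descriptors; codebook: (k, d) visual words.
    # Assign each descriptor to its nearest codeword by squared Euclidean distance.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    # Histogram of codeword occurrences (length k).
    return np.bincount(words, minlength=len(codebook)).astype(float)


def _l2_normalise(v):
    # Scale a vector to unit L2 norm; leave an all-zero vector unchanged.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def fuse_features(structural, spatio_temporal):
    # Early fusion: normalise each block so neither dominates by scale,
    # then concatenate into a single feature vector.
    return np.concatenate([_l2_normalise(structural), _l2_normalise(spatio_temporal)])
```

For example, a graph with edges `{(0, 1), (1, 2), (2, 3)}` embedded against prototypes `{(0, 1)}` and `{(0, 1), (1, 2)}` yields the structural vector `[2.0, 1.0]`, which `fuse_features` then concatenates with the STIP histogram. Normalising each block separately is one simple way to keep the structural and spatio-temporal parts on a comparable scale; weighted concatenation or late fusion of per-feature classifier scores are alternatives. The keywords list Markov models, which suggests the fused vectors are modelled over time; that stage is not sketched here.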
Keywords
Graph, Graph embedding, Human action recognition, STIP, Markov models