Abstract
Various visual tasks such as the recognition of human actions, gestures, facial expressions, and classification of dynamic textures require modeling and the representation of spatio-temporal information. In this paper, we propose representing space-time patterns using directional spatio-temporal oriented gradients. In the proposed approach, a 3D video patch is represented by a histogram of oriented gradients over nine symmetric spatio-temporal planes. Video comparison is achieved through a positive definite similarity kernel that is learnt by multiple kernel learning. A rich spatio-temporal descriptor with a simple trade-off between discriminatory power and invariance properties is thereby obtained. To evaluate the proposed approach, we consider three challenging visual recognition tasks, namely the classification of dynamic textures, human gestures and human actions. Our evaluations indicate that the proposed approach attains significant classification improvements in recognition accuracy in comparison to state-of-the-art methods such as LBP-TOP, 3D-SIFT, HOG3D, tensor canonical correlation analysis, and dynamical fractal analysis.
Chapter PDF
Similar content being viewed by others
Keywords
- Action Recognition
- Multiple Kernel Learn
- Dynamic Texture
- Multiple Instance Learning
- Spatial Pyramid Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Wang, H., Ullah, M., Klaser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition (2009)
de Campos, T., Barnard, M., Mikolajczyk, K., Kittler, J., Yan, F., Christmas, W., Windridge, D.: An evaluation of bags-of-words and spatio-temporal shapes for action recognition. In: WACV, pp. 344–351 (2011)
Scovanner, P., Ali, S., Shah, M.: A 3-dimensional sift descriptor and its application to action recognition. In: International Conference on Multimedia, pp. 357–360 (2007)
Kläser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3d-gradients. In: BMVC, pp. 995–1004 (2008)
Zhao, G., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. PAMI 29(6), 915–928 (2007)
Willems, G., Tuytelaars, T., Van Gool, L.: An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 650–663. Springer, Heidelberg (2008)
Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR, pp. 1–8 (2008)
Schindler, K., Van Gool, L.: Action snippets: How many frames does human action recognition require? In: CVPR, pp. 1–8 (2008)
Jhuang, H., Serre, T., Wolf, L., Poggio, T.: A biologically inspired system for action recognition. In: ICCV, pp. 1–8 (2007)
Mattivi, R., Shao, L.: Human Action Recognition Using LBP-TOP as Sparse Spatio-Temporal Feature Descriptor. In: Jiang, X., Petkov, N. (eds.) CAIP 2009. LNCS, vol. 5702, pp. 740–747. Springer, Heidelberg (2009)
Chen, J., Shan, S., He, C., Zhao, G., Pietikainen, M., Chen, X., Gao, W.: Wld: A robust local image descriptor. PAMI 32(9), 1705–1720 (2010)
Ojansivu, V., Heikkilä, J.: Blur Insensitive Texture Classification Using Local Phase Quantization. In: Elmoataz, A., Lezoray, O., Nouboud, F., Mammass, D. (eds.) ICISP 2008. LNCS, vol. 5099, pp. 236–243. Springer, Heidelberg (2008)
Päivärinta, J., Rahtu, E., Heikkilä, J.: Volume Local Phase Quantization for Blur-Insensitive Dynamic Texture Classification. In: Heyden, A., Kahl, F. (eds.) SCIA 2011. LNCS, vol. 6688, pp. 360–369. Springer, Heidelberg (2011)
Laptev, I.: On space-time interest points. IJCV 64(2), 107–123 (2005)
Fei-Fei, L., Perona, P.: A bayesian hierarchical model for learning natural scene categories. In: CVPR, vol. 2, pp. 524–531 (2005)
Lowe, D.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR, vol. 2, pp. 2169–2178 (2006)
Choi, J., Jeon, W., Lee, S.: Spatio-temporal pyramid matching for sports videos. In: ACM Int. Conf. on Multimedia Information Retrieval, pp. 291–297 (2008)
Chen, Y., Garcia, E.K., Gupta, M.R., Rahimi, A., Cazzanti, L.: Similarity-based classification: Concepts and algorithms. JMLR 10, 747–776 (2009)
Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y.: SimpleMKL. JMLR 9, 2491–2521 (2008)
Kim, T., Cipolla, R.: Canonical correlation analysis of video volume tensors for action categorization and detection. PAMI 31(8), 1415–1428 (2009)
Xu, Y., Quan, Y., Ling, H., Ji, H.: Dynamic texture classification using dynamic fractal analysis. In: ICCV (2011)
Ali, S., Shah, M.: Human action recognition in videos using kinematic features and multiple instance learning. PAMI 32(2), 288–303 (2010)
Doretto, G., Chiuso, A., Wu, Y., Soatto, S.: Dynamic textures. IJCV 51(2), 91–109 (2003)
Ghanem, B., Ahuja, N.: Maximum Margin Distance Learning for Dynamic Texture Recognition. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 223–236. Springer, Heidelberg (2010)
Ravichandran, A., Chaudhry, R., Vidal, R.: View-invariant dynamic texture recognition using a bag of dynamical systems. In: CVPR, pp. 1651–1657 (2009)
Derpanis, K., Wildes, R.: Dynamic texture recognition based on distributions of spacetime oriented structure. In: CVPR, pp. 191–198 (2010)
Péteri, R., Fazekas, S., Huiskes, M.: Dyntex: A comprehensive database of dynamic textures. Pattern Recognition Letters 31(12), 1627–1632 (2010)
Kim, T., Kittler, J., Cipolla, R.: Discriminative learning and recognition of image set classes using canonical correlations. PAMI 29(6), 1005–1018 (2007)
Lui, Y., Beveridge, J., Kirby, M.: Action classification on product manifolds. In: CVPR, pp. 833–839 (2010)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: A local svm approach. In: ICPR, vol. 3, pp. 32–36 (2004)
Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. PAMI 29(12), 2247–2253 (2007)
Le, Q., Zou, W., Yeung, S., Ng, A.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: CVPR, pp. 3361–3368 (2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Norouznezhad, E., Harandi, M.T., Bigdeli, A., Baktash, M., Postula, A., Lovell, B.C. (2012). Directional Space-Time Oriented Gradients for 3D Visual Pattern Analysis. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33712-3_53
Download citation
DOI: https://doi.org/10.1007/978-3-642-33712-3_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33711-6
Online ISBN: 978-3-642-33712-3
eBook Packages: Computer ScienceComputer Science (R0)