Directional Space-Time Oriented Gradients for 3D Visual Pattern Analysis

  • Ehsan Norouznezhad
  • Mehrtash T. Harandi
  • Abbas Bigdeli
  • Mahsa Baktash
  • Adam Postula
  • Brian C. Lovell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7574)


Various visual tasks, such as the recognition of human actions, gestures and facial expressions, and the classification of dynamic textures, require modeling and representing spatio-temporal information. In this paper, we propose representing space-time patterns using directional spatio-temporal oriented gradients. In the proposed approach, a 3D video patch is represented by a histogram of oriented gradients over nine symmetric spatio-temporal planes. Videos are compared through a positive definite similarity kernel that is learnt by multiple kernel learning. This yields a rich spatio-temporal descriptor with a simple trade-off between discriminatory power and invariance. To evaluate the proposed approach, we consider three challenging visual recognition tasks, namely the classification of dynamic textures, human gestures and human actions. Our evaluations indicate that the proposed approach attains significant improvements in recognition accuracy over state-of-the-art methods such as LBP-TOP, 3D-SIFT, HOG3D, tensor canonical correlation analysis, and dynamic fractal analysis.
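The descriptor idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions: the nine planes are taken to be the nine mirror-symmetry planes of a cube (three axis-aligned, six diagonal) in (x, y, t) space; each voxel's 3D gradient is projected onto each plane and binned by in-plane orientation, weighted by magnitude; and the nine per-plane histograms are compared with histogram-intersection kernels combined by a weight vector. In the paper the kernel weights are learnt by multiple kernel learning; here they are simply passed in. The names `PLANE_NORMALS`, `directional_hog` and `combined_kernel` are hypothetical.

```python
import numpy as np

# Assumption: the nine planes are the nine symmetry planes of a cube,
# given by unit normals in (x, y, t) coordinates.
_S = 1.0 / np.sqrt(2.0)
PLANE_NORMALS = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],   # the three orthogonal planes (yt, xt, xy)
    [_S, _S, 0], [_S, -_S, 0],         # diagonal planes mixing x and y
    [_S, 0, _S], [_S, 0, -_S],         # diagonal planes mixing x and t
    [0, _S, _S], [0, _S, -_S],         # diagonal planes mixing y and t
])


def _plane_basis(n):
    """Return two orthonormal vectors spanning the plane with unit normal n."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, helper)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)  # already unit length because n and u are orthonormal
    return u, v


def directional_hog(patch, n_bins=8):
    """Hypothetical directional HOG for a (T, H, W) video patch.

    Returns a (9, n_bins) array: one magnitude-weighted, L2-normalised
    orientation histogram per plane.
    """
    gt, gy, gx = np.gradient(patch.astype(float))  # derivatives along t, y, x
    g = np.stack([gx.ravel(), gy.ravel(), gt.ravel()], axis=1)  # (N, 3) in (x, y, t)
    hists = []
    for n in PLANE_NORMALS:
        u, v = _plane_basis(n)
        a, b = g @ u, g @ v  # in-plane coordinates of each projected gradient
        mag = np.hypot(a, b)
        ang = np.mod(np.arctan2(b, a), 2 * np.pi)
        h, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
        hists.append(h / (np.linalg.norm(h) + 1e-12))
    return np.stack(hists)


def combined_kernel(h1, h2, weights):
    """Weighted sum of per-plane histogram-intersection kernels.

    Histogram intersection is an assumed base kernel; `weights` stands in
    for the MKL-learnt combination coefficients.
    """
    per_plane = np.minimum(h1, h2).sum(axis=1)  # one similarity per plane
    return float(np.dot(weights, per_plane))
```

For example, `combined_kernel(directional_hog(a), directional_hog(b), np.full(9, 1 / 9))` compares two patches with uniform plane weights. A learnt weight vector would let the classifier emphasise the planes that are most discriminative for a given task, which is one way to read the trade-off between discriminatory power and invariance described above.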





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ehsan Norouznezhad (1, 2)
  • Mehrtash T. Harandi (1, 2)
  • Abbas Bigdeli (1, 2)
  • Mahsa Baktash (1, 2)
  • Adam Postula (1, 2)
  • Brian C. Lovell (1, 2)
  1. NICTA, St. Lucia, Australia
  2. School of ITEE, The University of Queensland, Australia
