Multimedia Systems, Volume 22, Issue 3, pp 343–353

Tucker decomposition-based tensor learning for human action recognition

  • Jianguang Zhang
  • Yahong Han
  • Jianmin Jiang
Regular Paper


Spatial information is an important cue for human action recognition. Unlike a vector representation, a tensor representation preserves the spatial structure of a human action in a still image. This paper proposes a robust human action recognition algorithm based on tensor representation and Tucker decomposition. In this method, a still image containing a human action is represented by a tensor descriptor (Histograms of Oriented Gradients), which preserves the spatial information of the action. Based on this representation, the unknown parameter tensor is first decomposed according to the Tucker decomposition, and the resulting optimization problem is then solved by alternating optimization: at each iteration, the tensor descriptor is projected along one order, and the parameter along the corresponding order is estimated by solving a ridge regression problem. The estimated parameter tensor is more discriminative because it effectively uses the spatial information along each order. Experiments are conducted on action images from three publicly available databases. The experimental results show that our method outperforms the compared methods.
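The alternating scheme described above can be illustrated with a minimal sketch. It assumes a third-order descriptor (cell rows × cell columns × orientation bins), a squared loss with binary ±1 labels, and an L2 penalty on each factor and on the core. The function names, ranks, penalty weight, and synthetic data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ridge(F, y, lam):
    """Closed-form ridge regression: argmin_w ||F w - y||^2 + lam * ||w||^2."""
    d = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(d), F.T @ y)

def objective(X, y, G, U, lam):
    """Squared loss plus L2 penalties on the core G and the factors U (assumed form)."""
    W = np.einsum('pqr,ap,bq,cr->abc', G, *U)      # Tucker-structured parameter tensor
    resid = np.einsum('iabc,abc->i', X, W) - y     # <W, X_i> - y_i for every sample
    reg = np.sum(G ** 2) + sum(np.sum(u ** 2) for u in U)
    return resid @ resid + lam * reg

def fit_tucker_ridge(X, y, ranks=(2, 2, 2), lam=1.0, n_iter=10, seed=0):
    """Alternate over the three factor matrices and the core.

    With all other blocks fixed, <W, X_i> is linear in the free block,
    so each update reduces to an ordinary ridge regression.
    """
    rng = np.random.default_rng(seed)
    n, dims = X.shape[0], X.shape[1:]
    U = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]
    G = rng.standard_normal(ranks)
    history = [objective(X, y, G, U, lam)]
    for _ in range(n_iter):
        # Order 1: contract each descriptor with G, U[1], U[2]; U[0] stays free.
        F = np.einsum('iabc,pqr,bq,cr->iap', X, G, U[1], U[2])
        U[0] = ridge(F.reshape(n, -1), y, lam).reshape(U[0].shape)
        # Order 2.
        F = np.einsum('iabc,pqr,ap,cr->ibq', X, G, U[0], U[2])
        U[1] = ridge(F.reshape(n, -1), y, lam).reshape(U[1].shape)
        # Order 3.
        F = np.einsum('iabc,pqr,ap,bq->icr', X, G, U[0], U[1])
        U[2] = ridge(F.reshape(n, -1), y, lam).reshape(U[2].shape)
        # Core tensor: project each descriptor onto all three factor subspaces.
        F = np.einsum('iabc,ap,bq,cr->ipqr', X, U[0], U[1], U[2])
        G = ridge(F.reshape(n, -1), y, lam).reshape(ranks)
        history.append(objective(X, y, G, U, lam))
    return G, U, history

# Synthetic stand-in for HOG tensors: 40 samples, 6 x 6 cells, 9 orientation bins.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 6, 6, 9))
W_true = rng.standard_normal((6, 6, 9))
y = np.sign(np.einsum('iabc,abc->i', X, W_true))   # binary labels in {-1, +1}

G, U, history = fit_tucker_ridge(X, y)
W = np.einsum('pqr,ap,bq,cr->abc', G, *U)
scores = np.einsum('iabc,abc->i', X, W)            # sign(scores) gives the predicted class
print("training accuracy:", np.mean(np.sign(scores) == y))
```

Because each block update minimizes the penalized objective exactly with the other blocks held fixed, the values in `history` should not increase from one sweep to the next. Using `einsum` here avoids explicit mode unfoldings; an implementation based on unfolded matrices and Kronecker products would be equivalent.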


Keywords: Tucker decomposition, Histograms of oriented gradients, Action recognition



This work was partly supported by the National Program on the Key Basic Research Project (under Grant 2013CB329301), NSFC (under Grants 61202166 and 61472276), the Major Project of the National Social Science Fund (under Grant 14ZDB153), and the Doctoral Fund of the Ministry of Education of China (under Grant 20120032120042).


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. School of Computer Science and Technology, Tianjin University, Tianjin, China
  2. Department of Mathematics and Computer Science, Hengshui University, Hengshui, China
  3. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
  4. School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
