Multimedia Tools and Applications

Volume 76, Issue 6, pp 7575–7594

Video-based human action and hand gesture recognition by fusing factored matrices of dual tensors

Abstract

In this paper, we present a novel approach for human action and gesture recognition using dual-complementary tensors. In particular, the proposed method constructs a compact yet discriminative representation by normalizing the input video volume into dual tensors. One tensor is obtained from the raw video volume data and the other from histogram of oriented gradients (HOG) features. Each tensor is converted to factored matrices, and the similarity between factored matrices is evaluated using canonical correlation analysis (CCA). We further propose an information fusion method to combine the resulting similarity scores. The proposed fusion strategy can effectively enhance discriminability between different action categories and lead to better recognition accuracy. We have conducted several experiments on two publicly available databases (UCF Sports and Cambridge-Gesture). The results show that the proposed method achieves recognition accuracy comparable to that of state-of-the-art methods.
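The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not give the details, so everything here is an assumption: factored matrices are taken to be the leading left singular vectors of each mode unfolding, CCA similarity is taken to be the mean cosine of the principal angles between those subspaces, and fusion is a plain weighted sum of the two scores. The function names, the `rank` and `weight` parameters, and the gradient-magnitude volume used in place of real HOG features are all hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def factor_matrices(tensor, rank):
    """One orthonormal basis per mode (assumed form of the factored matrices)."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors
    return factors


def canonical_correlations(a, b):
    """Cosines of the principal angles between two orthonormal bases."""
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against round-off slightly above 1


def tensor_similarity(x, y, rank=3):
    """Mean canonical correlation, averaged over all tensor modes."""
    fx, fy = factor_matrices(x, rank), factor_matrices(y, rank)
    per_mode = [canonical_correlations(a, b).mean() for a, b in zip(fx, fy)]
    return float(np.mean(per_mode))


def gradient_magnitude(video):
    """Crude stand-in for a HOG-based tensor (assumption, NOT real HOG)."""
    gy, gx = np.gradient(video, axis=(0, 1))  # spatial gradients per frame
    return np.hypot(gx, gy)


def fuse(score_raw, score_feat, weight=0.5):
    """Weighted-sum fusion of the two similarity scores (assumed rule)."""
    return weight * score_raw + (1.0 - weight) * score_feat


def dual_similarity(video_a, video_b, rank=3, weight=0.5):
    """Fused similarity from the raw tensor and the feature tensor."""
    s_raw = tensor_similarity(video_a, video_b, rank)
    s_feat = tensor_similarity(
        gradient_magnitude(video_a), gradient_magnitude(video_b), rank
    )
    return fuse(s_raw, s_feat, weight)


# Two random clips of identical size (height x width x frames).
rng = np.random.default_rng(0)
clip_a = rng.random((16, 16, 8))
clip_b = rng.random((16, 16, 8))
print("fused similarity:", dual_similarity(clip_a, clip_b))
```

Both clips must share the same shape, which is consistent with the normalization step mentioned above. A nearest-neighbor classifier over such fused scores would be one possible way to assign an action label, though the abstract does not state which classifier is used.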

Keywords

Action recognition, Gesture recognition, Canonical correlation analysis, Tensor, Fusing factored matrices

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Republic of China
