Multimedia Tools and Applications, Volume 76, Issue 8, pp 10635–10652

A spatial-temporal iterative tensor decomposition technique for action and gesture recognition

  • Yuting Su
  • Haiyi Wang
  • Peiguang Jing
  • Chuanzhong Xu

Abstract

Classification of video sequences is an important task with many applications in video search and action recognition. Whereas some traditional approaches transform the original video sequences into visual feature vectors, tensor-based methods classify video sequences using a natural representation of the original data. One obvious limitation of tensor-based methods, however, is that the input video sequences usually must be preprocessed to a unified temporal length. In this paper, we propose a technique for classifying video sequences of unequal length, namely Spatial-Temporal Iterative Tensor Decomposition (S-TITD), which encodes them to a uniform length. The proposed framework contains two primary steps. We first represent each original video sequence as a third-order tensor and perform Tucker-2 decomposition to obtain a reduced-dimension core tensor. We then encode the third mode of the core tensor to a uniform length by adaptively selecting the most informative slices. Notably, these two steps are embedded in a dynamic learning framework so that the method can update its results over time. We conduct a series of experiments on three public datasets for gesture and action recognition, and the experimental results show that the proposed S-TITD approach achieves better performance than state-of-the-art algorithms.
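The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' S-TITD implementation: the Tucker-2 step here is a single-pass truncated HOSVD rather than an iterative procedure, and the slice-selection rule (keep the temporal slices with the largest Frobenius norm) is an assumption, since the abstract does not state the criterion used. The function names `tucker2_hosvd` and `select_slices`, the height x width x time layout, and the rank parameters are all hypothetical.

```python
import numpy as np


def tucker2_hosvd(video, rank_h, rank_w):
    """Single-pass Tucker-2: compress the two spatial modes, keep the time mode.

    video: array of shape (H, W, T), assumed layout height x width x time.
    Returns the core (rank_h, rank_w, T) and factors U1 (H, rank_h), U2 (W, rank_w).
    """
    H, W, T = video.shape
    # Mode-1 unfolding: one row per image row.
    unfold_h = video.reshape(H, W * T)
    # Mode-2 unfolding: one row per image column.
    unfold_w = np.moveaxis(video, 1, 0).reshape(W, H * T)
    # Leading left singular vectors give orthonormal spatial bases.
    U1 = np.linalg.svd(unfold_h, full_matrices=False)[0][:, :rank_h]
    U2 = np.linalg.svd(unfold_w, full_matrices=False)[0][:, :rank_w]
    # Core = video x_1 U1^T x_2 U2^T; the temporal mode is left untouched.
    core = np.einsum("hwt,hr,ws->rst", video, U1, U2)
    return core, U1, U2


def select_slices(core, length):
    """Keep `length` temporal slices, chosen by Frobenius norm (an assumed
    stand-in for "most informative"), preserving their original time order."""
    n_frames = core.shape[2]
    if n_frames < length:
        raise ValueError("sequence is shorter than the target length")
    energy = np.linalg.norm(core, axis=(0, 1))     # one norm per time slice
    keep = np.sort(np.argsort(energy)[-length:])   # top slices, time-ordered
    return core[:, :, keep]
```

Under these assumptions, two clips of shape (16, 12, 30) and (16, 12, 45) decomposed with `rank_h=4, rank_w=3` and passed to `select_slices(core, 10)` both yield tensors of shape (4, 3, 10), which is the uniform-length property the abstract describes. Note that each clip gets its own spatial bases here; a shared basis across clips would need the kind of joint or iterative estimation the sketch leaves out.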

Keywords

Gesture recognition; Tensor decomposition; Spatial-temporal iterative; Video sequences

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Yuting Su (1)
  • Haiyi Wang (1)
  • Peiguang Jing (1)
  • Chuanzhong Xu (1)

  1. School of Electronic Information Engineering, Tianjin University, Tianjin, China
