Abstract
Human action recognition using 3D skeletal data has become popular topic with the emergence of the cost-effective depth sensors, such as Microsoft Kinect. However, noisy joint position and speed variation between actors make action recognition from 3D joint positions difficult. To address these problems, this paper proposes a novel framework, called Enhanced Sequence Matching (ESM), to align and compare action sequences. Inspired by DNA sequence alignment method used in bioinformatics, we model the new scoring function to measure the similarity between two action sequences with noise. We construct action sequence from a set of elementary Moving Poses (eMP) built from affinity propagation. By using affinity propagation, eMP set is built automatically, in other words, it determines the number of eMPs itself. The proposed framework outperforms the state-of-the-art on UTKinect action dataset and MSRC-12 gesture dataset and achieves comparable performance to the state-of-the-art on MSR action 3D dataset. Moreover, experimental results show that our method is very intuitive and robust to noise and temporal variation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR, p. 3 (2011)
Mount, D.W.: Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, New york (2004)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Vingron, M., Waterman, M.S.: Sequence alignment and penalty choice: review of concepts, case studies and implications. J. Mol. Biol. 235, 1–12 (1994)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315, 972–976 (2007)
Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1290–1297 (2012)
Xia, L., Chen, C.C., Aggarwal, J.: View invariant human action recognition using histograms of 3d joints. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 20–27 (2012)
Zanfir, M., Leordeanu, M., Sminchisescu, C.: The moving pose: an efficient 3d kinematics descriptor for low-latency action recognition and detection. In: The IEEE International Conference on Computer Vision (ICCV) (2013)
Wang, C., Wang, Y., Yuille, A.: An approach to pose-based action recognition. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 915–922 (2013)
Luo, J., Wang, W., Qi, H.: Group sparsity and geometry constrained dictionary learning for action recognition from depth maps. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 1809–1816 (2013)
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Chapelle, O., Vapnik, V., Bousquet, O., Mukherjee, S.: Choosing multiple parameters for support vector machines. Mach. Learn. 46, 131–159 (2002)
Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional models for contextual human motion recognition. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 2, pp. 1808–1815 (2005)
Morency, L., Quattoni, A., Darrell, T.: Latent-dynamic discriminative models for continuous gesture recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2007, pp. 1–8 (2007)
Lv, F., Nevatia, R.: Recognition and segmentation of 3-d human action using HMM and multi-class adaboost. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part IV. LNCS, vol. 3954, pp. 359–372. Springer, Heidelberg (2006)
Li, K., Hu, J., Fu, Y.: Modeling complex temporal composition of actionlets for activity prediction. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 286–299. Springer, Heidelberg (2012)
Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3d points. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–14 (2010)
Quattoni, A., Wang, S., Morency, L., Collins, M., Darrell, T.: Hidden conditional random fields. IEEE Trans. Pattern Anal. Mach. Intell. 29, 1848–1852 (2007)
Han, L., Wu, X., Liang, W., Hou, G., Jia, Y.: Discriminative human action recognition in the learned hierarchical manifold space. Image Vis. Comput. 28, 836–849 (2010). Best of Automatic Face and Gesture Recognition 2008
Wang, Y., Mori, G.: Hidden part models for human action recognition: probabilistic versus max margin. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1310–1323 (2011)
Veeraraghavan, A., Roy-chowdhury, A.K.: The function space of an activity. In: Proceedings of Computer Vision Pattern Recognition, pp. 959–968 (2006)
Müller, M., Röder, T.: Motion templates for automatic classification and retrieval of motion capture data. In: Proceedings of the 2006 ACM SIGGRAPH Eurographics Symposium on Computer Animation, SCA 2006, Aire-la-Ville, Switzerland, Switzerland, pp. 137–146. Eurographics Association (2006)
Yao, B.Z., Zhu, S.C.: Learning deformable action templates from cluttered videos. In: ICCV, pp. 1507–1514. IEEE (2009)
Wang, J., Wu, Y.: Learning maximum margin temporal warping for action recognition. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 2688–2695 (2013)
Fothergill, S., Mentis, H., Kohli, P., Nowozin, S.: Instructing people for training gestural interactive systems. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2012, pp. 1737–1746. ACM, New York (2012)
Zhu, Y., Chen, W., Guo, G.: Fusing spatiotemporal features and joints for 3d action recognition. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 486–491 (2013)
Lehrmann, A.M., Gehler, P.V., Nowozin, S.: Efficient non-linear markov models for human motion. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, USA. IEEE (2014)
Acknowledgement
This work was partly supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MEST) (No. 2011-00166669) and Samsung Electronics Co., Ltd.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jung, HJ., Hong, KS. (2015). Enhanced Sequence Matching for Action Recognition from 3D Skeletal Data. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-16814-2_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer ScienceComputer Science (R0)