Multi-activity 3D human motion recognition and tracking in composite motion model with synthesized transition bridges

Yu, Jialin; Sun, Jifeng; Liu, Shengqing; Luo, Shasha

doi:10.1007/s11042-017-4847-y

Published: 04 June 2017

Multimedia Tools and Applications, volume 77, pages 12023–12055 (2018)

Jialin Yu¹ (ORCID: orcid.org/0000-0001-8286-7203), Jifeng Sun¹, Shengqing Liu¹ & Shasha Luo¹

Abstract

Recognizing and tracking multiple activities are both extremely challenging machine vision tasks because of the diverse motion types involved and the high-dimensional (HD) state space. To overcome these difficulties, a novel generative model called the composite motion model (CMM) is proposed. This model contains a set of independent, low-dimensional (LD), activity-specific manifold models that effectively constrain the state search space for 3D human motion recognition and tracking. Modeling activity-specific movements separately not only allows each manifold model to be optimized for its own movement alone, but also improves the scalability of the models. For accurate tracking with the CMM, a particle filter (PF) method is employed, so that the particles can be distributed over all manifold models at each time step. In addition, an efficient activity switching strategy is proposed to govern the particle distribution across the LD manifolds. To diffuse the particles among manifold models and respond quickly to sudden changes in activity, a set of visually reasonable and kinematically realistic transition bridges is synthesized by exploiting the properties of the LD latent space and the HD observation space, which makes inter-activity motions appear more natural and realistic. Finally, the pose hypothesis that best explains the visual observation is selected and then used to recognize the currently observed activity. Extensive experiments, with qualitative and quantitative analyses, verify the effectiveness and robustness of the proposed CMM in multi-activity 3D human motion recognition and tracking.
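
The general idea of running one particle filter over several activity-specific models can be illustrated with a small toy sketch. It is not the authors' implementation: the two activities, their one-dimensional latent "manifolds", the scalar observation functions, the uniform random switching rule, and all parameter values are assumptions made purely for illustration. In particular, the sketch does not model shape-context observations, learned manifold models, or the synthesized transition bridges described in the abstract.

```python
import math
import random

# Hypothetical toy activities: each has its own 1-D latent dynamics ("step")
# and its own mapping from a latent point x to a scalar observation z.
ACTIVITIES = {
    "walk": {"step": 0.10, "observe": lambda x: math.sin(x)},
    "jog": {"step": 0.25, "observe": lambda x: 5.0 + math.sin(x)},
}


def drift(particle, switch_prob, rng):
    """Advance a particle on its manifold; sometimes switch activity (assumed rule)."""
    c, x = particle
    if rng.random() < switch_prob:
        c = rng.choice([a for a in ACTIVITIES if a != c])
    return c, x + ACTIVITIES[c]["step"]


def diffuse(particle, sigma, rng):
    """Add Gaussian noise to the latent position of a particle."""
    c, x = particle
    return c, x + rng.gauss(0.0, sigma)


def likelihood(particle, z, obs_sigma):
    """Unnormalized Gaussian likelihood of observation z given the particle."""
    c, x = particle
    err = z - ACTIVITIES[c]["observe"](x)
    return math.exp(-0.5 * (err / obs_sigma) ** 2)


def pf_step(particles, z, rng, switch_prob=0.1, sigma=0.05, obs_sigma=0.2):
    """One filter step: drift, diffuse, weight, pick best hypothesis, resample."""
    moved = [diffuse(drift(p, switch_prob, rng), sigma, rng) for p in particles]
    weights = [likelihood(p, z, obs_sigma) for p in moved]
    total = sum(weights)
    if total == 0.0:
        # Fall back to uniform weights if every likelihood underflowed.
        weights = [1.0 / len(moved)] * len(moved)
    else:
        weights = [w / total for w in weights]
    best = max(range(len(moved)), key=weights.__getitem__)
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, moved[best]


def track(observations, n=200, seed=0):
    """Return the activity label of the best-weighted particle for each frame."""
    rng = random.Random(seed)
    names = list(ACTIVITIES)
    particles = [(rng.choice(names), rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]
    labels = []
    for z in observations:
        particles, (activity, _latent) = pf_step(particles, z, rng)
        labels.append(activity)
    return labels


if __name__ == "__main__":
    # Synthetic sequence: 15 "walk" frames followed by 15 "jog" frames.
    walk = [math.sin(0.10 * t) for t in range(15)]
    jog = [5.0 + math.sin(1.5 + 0.25 * t) for t in range(15)]
    print(track(walk + jog))
```

Each particle carries an activity label together with a latent position, so recognition falls out of tracking: the label of the best-weighted hypothesis is reported as the current activity. The random switch in `drift` is only a stand-in that keeps some particles on every model so the filter can react when the observed activity changes.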

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant: 61202292), and in part by Guangdong Province National Science Foundation of China (Grant: 9151064101000037). The authors thank Sigal L, Balan AO, and Ionescu C for making their databases (i.e., the HumanEva and Human3.6M databases) publicly available for free.

Author information

Authors and Affiliations

School of Electronic and Information Engineering, South China University of Technology, Guangzhou, 510640, China
Jialin Yu, Jifeng Sun, Shengqing Liu & Shasha Luo

Corresponding author

Correspondence to Jialin Yu.

Appendix: Definition of common notations

Notation	Definition
\( \mathcal{X} \)	LD latent variable space
\( \mathcal{Y} \)	HD pose observation space
\( \mathbf{X} \)	LD pose sequence
\( \mathbf{Y} \)	HD pose observation sequence
\( x \)	LD latent point that corresponds to a 3D pose in LD space \( \mathcal{X} \)
\( y \)	HD pose data in HD space \( \mathcal{Y} \)
\( z \)	Visual observation (i.e., shape contexts in our paper)
\( c \)	Activity class
\( N \)	Number of activities (or number of separate models)
\( f_N(\cdot) \)	A nonlinear mapping from LD latent space \( \mathcal{X} \) to HD observation space \( \mathcal{Y} \)
\( f_D(\cdot) \)	A temporal dynamical mapping from the latent point \( x_{t-1} \) to another point \( x_t \)
\( f_{x \to z} \)	Mapping function to visual observation space
\( f_{x \to y} \)	Mapping function to pose observation space
\( t \)	Frame index
\( S \)	Sample poses
\( \mathbf{s}^{dri} \)	Particles after drift
\( \mathbf{s}^{dis} \)	Particles after diffusion
\( n \)	Number of particles
\( \alpha \)	Particle weight

About this article

Cite this article

Yu, J., Sun, J., Liu, S. et al. Multi-activity 3D human motion recognition and tracking in composite motion model with synthesized transition bridges. Multimed Tools Appl 77, 12023–12055 (2018). https://doi.org/10.1007/s11042-017-4847-y

Received: 12 December 2016
Revised: 27 April 2017
Accepted: 17 May 2017
Published: 04 June 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s11042-017-4847-y
