
Human Activity Understanding

  • Chapter
  • In: Human Centric Visual Analysis with Deep Learning

Abstract

Understanding human activity remains challenging even with recently developed 3D/depth sensors. To address this problem, this chapter investigates a novel deep structured model that adaptively decomposes an activity into temporal parts using convolutional neural networks (CNNs). The proposed model advances traditional deep learning approaches in two aspects. First, a latent temporal structure is introduced into the deep model, accounting for the large temporal variations across diverse human activities. In particular, latent variables decompose the input activity into a number of temporally segmented subactivities, which are fed into the corresponding parts (i.e., subnetworks) of the deep architecture. Second, a radius-margin bound is incorporated as a regularization term into the deep model, which effectively improves the generalization performance for classification. (Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer, International Journal of Computer Vision [1] © 2019.)
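As a rough illustration of the two ideas in the abstract — latent temporal segmentation and a radius-margin regularizer — the following sketch uses linear scores over pooled per-frame features in place of the CNN subnetworks. This is not the authors' implementation; the function names, the exhaustive segmentation search, and the centroid-based approximation of the enclosing-ball radius are all illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def segment_features(frames, boundaries):
    """Average-pool per-frame features within each latent temporal segment.

    frames: (T, D) array of per-frame features (stand-ins for subnetwork
    activations); boundaries: sorted interior cut points defining segments.
    Returns an (M, D) array, one pooled vector per segment.
    """
    cuts = [0] + list(boundaries) + [len(frames)]
    return np.stack([frames[a:b].mean(axis=0)
                     for a, b in zip(cuts[:-1], cuts[1:])])

def best_segmentation(frames, w, n_segments):
    """Latent inference: choose the segmentation whose concatenated pooled
    features score highest under a linear classifier w (exhaustive search
    over cut points, feasible only for short sequences)."""
    T = len(frames)
    best, best_score = None, -np.inf
    for bounds in combinations(range(1, T), n_segments - 1):
        f = segment_features(frames, bounds).ravel()
        score = w @ f
        if score > best_score:
            best, best_score = bounds, score
    return best, best_score

def radius_margin_penalty(w, feats):
    """Radius-margin style regularizer ~ R^2 * ||w||^2, with the radius R
    of the feature-enclosing ball approximated by the maximum distance of
    any feature vector to the feature centroid."""
    center = feats.mean(axis=0)
    R2 = np.max(np.sum((feats - center) ** 2, axis=1))
    return R2 * np.sum(w ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((6, 3))     # 6 frames, 3-dim features
    w = rng.standard_normal(3 * 3)           # 3 segments x 3 dims
    bounds, score = best_segmentation(frames, w, n_segments=3)
    print(bounds, radius_margin_penalty(w, frames))
```

Minimizing a loss plus this penalty (rather than the plain `||w||^2` of a standard SVM) is what lets the radius-margin bound shape both the classifier and, in the chapter's full model, the learned feature space itself.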


Notes

  1. http://vision.sysu.edu.cn/projects/3d-activity/.

References

  1. L. Lin, K. Wang, W. Zuo, M. Wang, J. Luo, L. Zhang, A deep structured model with radius-margin bound for 3D human activity recognition. Int. J. Comput. Vis. 118(2), 256–273 (2016)
  2. L. Xia, C. Chen, J.K. Aggarwal, View-invariant human action recognition using histograms of 3D joints, in CVPRW, pp. 20–27 (2012)
  3. O. Oreifej, Z. Liu, HON4D: histogram of oriented 4D normals for activity recognition from depth sequences, in CVPR, pp. 716–723 (2013)
  4. L. Xia, J.K. Aggarwal, Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera, in CVPR, pp. 2834–2841 (2013)
  5. J. Wang, Z. Liu, Y. Wu, J. Yuan, Mining actionlet ensemble for action recognition with depth cameras, in CVPR, pp. 1290–1297 (2012)
  6. Y. Wang, G. Mori, Hidden part models for human action recognition: probabilistic vs. max-margin. IEEE Trans. Pattern Anal. Mach. Intell. 33(7), 1310–1323 (2011)
  7. J.M. Chaquet, E.J. Carmona, A. Fernandez-Caballero, A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. 117(6), 633–659 (2013)
  8. Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L.D. Jackel, Handwritten digit recognition with a back-propagation network, in Adv. Neural Inf. Process. Syst. (1990)
  9. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  10. P. Wu, S. Hoi, H. Xia, P. Zhao, D. Wang, C. Miao, Online multimodal deep similarity learning with application to image retrieval, in ACM Multimedia, pp. 153–162 (2013)
  11. P. Luo, X. Wang, X. Tang, Pedestrian parsing via deep decompositional neural network, in ICCV, pp. 2648–2655 (2013)
  12. K. Wang, X. Wang, L. Lin, 3D human activity recognition with reconfigurable convolutional neural networks, in ACM MM (2014)
  13. S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
  14. S. Zhu, D. Mumford, A stochastic grammar of images. Found. Trends Comput. Graph. Vis. 2(4), 259–362 (2007)
  15. P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
  16. M.R. Amer, S. Todorovic, Sum-product networks for modeling activities with stochastic structure, in CVPR, pp. 1314–1321 (2012)
  17. L. Lin, X. Wang, W. Yang, J.H. Lai, Discriminatively trained and-or graph models for object shape detection. IEEE Trans. Pattern Anal. Mach. Intell. 37(5), 959–972 (2015)
  18. M. Pei, Y. Jia, S. Zhu, Parsing video events with goal inference and intent prediction, in ICCV, pp. 487–494 (2011)
  19. A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, in CVPR (2014)
  20. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Adv. Neural Inf. Process. Syst., pp. 1097–1105 (2012)
  21. H.S. Koppula, R. Gupta, A. Saxena, Learning human activities and object affordances from RGB-D videos. Int. J. Robot. Res. 32(8), 951–970 (2013)
  22. F.J. Huang, Y. LeCun, Large-scale learning with SVM and convolutional nets for generic object categorization, in CVPR, pp. 284–291 (2006)
  23. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in CVPR (2014)
  24. V. Vapnik, Statistical Learning Theory (Wiley, New York, 1998)
  25. O. Chapelle, V. Vapnik, O. Bousquet, S. Mukherjee, Choosing multiple parameters for support vector machines. Mach. Learn. 46(1–3), 131–159 (2002)
  26. H. Do, A. Kalousis, Convex formulations of radius-margin based support vector machines, in ICML (2013)
  27. H. Do, A. Kalousis, M. Hilario, Feature weighting using margin and radius based error bound optimization in SVMs, in Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 5781 (Springer, Berlin Heidelberg, 2009), pp. 315–329
  28. P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, Pedestrian detection with unsupervised multi-stage feature learning, in CVPR (2013)
  29. K. Yun, J. Honorio, D. Chattopadhyay, T.L. Berg, D. Samaras, Two-person interaction detection using body-pose features and multiple instance learning, in CVPRW (2012)


Author information

Correspondence to Liang Lin.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Lin, L., Zhang, D., Luo, P., Zuo, W. (2020). Human Activity Understanding. In: Human Centric Visual Analysis with Deep Learning. Springer, Singapore. https://doi.org/10.1007/978-981-13-2387-4_10


  • DOI: https://doi.org/10.1007/978-981-13-2387-4_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2386-7

  • Online ISBN: 978-981-13-2387-4

  • eBook Packages: Computer Science (R0)
