
Human Activity Understanding

  • Chapter
  • In: Human Centric Visual Analysis with Deep Learning

Abstract

Understanding human activity remains challenging even with recently developed 3D/depth sensors. To address this problem, this chapter investigates a novel deep structured model that adaptively decomposes an activity into temporal parts using convolutional neural networks (CNNs). The proposed model advances traditional deep learning approaches in two aspects. First, a latent temporal structure is introduced into the deep model, accounting for the large temporal variations across diverse human activities. In particular, latent variables decompose the input activity into a number of temporally segmented subactivities, which are fed into the corresponding parts (i.e., subnetworks) of the deep architecture. Second, a radius-margin bound is incorporated as a regularization term into the deep model, which effectively improves the generalization performance for classification. (Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer, International Journal of Computer Vision [1] © 2019.)
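As a rough illustration of the two ideas in the abstract — latent temporal segmentation and a radius-margin regularizer — the following sketch uses linear scores over pooled per-frame features in place of the CNN subnetworks. This is not the authors' implementation; the function names, the exhaustive segmentation search, and the centroid-based approximation of the enclosing-ball radius are all illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def segment_features(frames, boundaries):
    """Average-pool per-frame features within each latent temporal segment.

    frames: (T, D) array of per-frame features (stand-ins for subnetwork
    activations); boundaries: sorted interior cut points defining segments.
    Returns an (M, D) array, one pooled vector per segment.
    """
    cuts = [0] + list(boundaries) + [len(frames)]
    return np.stack([frames[a:b].mean(axis=0)
                     for a, b in zip(cuts[:-1], cuts[1:])])

def best_segmentation(frames, w, n_segments):
    """Latent inference: choose the segmentation whose concatenated pooled
    features score highest under a linear classifier w (exhaustive search
    over cut points, feasible only for short sequences)."""
    T = len(frames)
    best, best_score = None, -np.inf
    for bounds in combinations(range(1, T), n_segments - 1):
        f = segment_features(frames, bounds).ravel()
        score = w @ f
        if score > best_score:
            best, best_score = bounds, score
    return best, best_score

def radius_margin_penalty(w, feats):
    """Radius-margin style regularizer ~ R^2 * ||w||^2, with the radius R
    of the feature-enclosing ball approximated by the maximum distance of
    any feature vector to the feature centroid."""
    center = feats.mean(axis=0)
    R2 = np.max(np.sum((feats - center) ** 2, axis=1))
    return R2 * np.sum(w ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((6, 3))     # 6 frames, 3-dim features
    w = rng.standard_normal(3 * 3)           # 3 segments x 3 dims
    bounds, score = best_segmentation(frames, w, n_segments=3)
    print(bounds, radius_margin_penalty(w, frames))
```

Minimizing a loss plus this penalty (rather than the plain `||w||^2` of a standard SVM) is what lets the radius-margin bound shape both the classifier and, in the chapter's full model, the learned feature space itself.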


Notes

  1. http://vision.sysu.edu.cn/projects/3d-activity/.

References

  1. L. Lin, K. Wang, W. Zuo, M. Wang, J. Luo, L. Zhang, A deep structured model with radius-margin bound for 3D human activity recognition. Int. J. Comput. Vis. 118(2), 256–273 (2016)
  2. L. Xia, C. Chen, J.K. Aggarwal, View-invariant human action recognition using histograms of 3D joints, in CVPRW, pp. 20–27 (2012)
  3. O. Oreifej, Z. Liu, HON4D: histogram of oriented 4D normals for activity recognition from depth sequences, in CVPR, pp. 716–723 (2013)
  4. L. Xia, J.K. Aggarwal, Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera, in CVPR, pp. 2834–2841 (2013)
  5. J. Wang, Z. Liu, Y. Wu, J. Yuan, Mining actionlet ensemble for action recognition with depth cameras, in CVPR, pp. 1290–1297 (2012)
  6. Y. Wang, G. Mori, Hidden part models for human action recognition: probabilistic vs. max-margin. IEEE Trans. Pattern Anal. Mach. Intell. 33(7), 1310–1323 (2011)
  7. J.M. Chaquet, E.J. Carmona, A. Fernandez-Caballero, A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. 117(6), 633–659 (2013)
  8. Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L.D. Jackel, Handwritten digit recognition with a back-propagation network, in Adv. Neural Inf. Process. Syst. (1990)
  9. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  10. P. Wu, S. Hoi, H. Xia, P. Zhao, D. Wang, C. Miao, Online multimodal deep similarity learning with application to image retrieval, in ACM Multimedia, pp. 153–162 (2013)
  11. P. Luo, X. Wang, X. Tang, Pedestrian parsing via deep decompositional neural network, in ICCV, pp. 2648–2655 (2013)
  12. K. Wang, X. Wang, L. Lin, 3D human activity recognition with reconfigurable convolutional neural networks, in ACM MM (2014)
  13. S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
  14. S. Zhu, D. Mumford, A stochastic grammar of images. Found. Trends Comput. Graph. Vis. 2(4), 259–362 (2007)
  15. P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
  16. M.R. Amer, S. Todorovic, Sum-product networks for modeling activities with stochastic structure, in CVPR, pp. 1314–1321 (2012)
  17. L. Lin, X. Wang, W. Yang, J.H. Lai, Discriminatively trained and-or graph models for object shape detection. IEEE Trans. Pattern Anal. Mach. Intell. 37(5), 959–972 (2015)
  18. M. Pei, Y. Jia, S. Zhu, Parsing video events with goal inference and intent prediction, in ICCV, pp. 487–494 (2011)
  19. A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, in CVPR (2014)
  20. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Adv. Neural Inf. Process. Syst., pp. 1097–1105 (2012)
  21. H.S. Koppula, R. Gupta, A. Saxena, Learning human activities and object affordances from RGB-D videos. Int. J. Robot. Res. 32(8), 951–970 (2013)
  22. F.J. Huang, Y. LeCun, Large-scale learning with SVM and convolutional nets for generic object categorization, in CVPR, pp. 284–291 (2006)
  23. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in CVPR (2014)
  24. V. Vapnik, Statistical Learning Theory (Wiley, New York, 1998)
  25. O. Chapelle, V. Vapnik, O. Bousquet, S. Mukherjee, Choosing multiple parameters for support vector machines. Mach. Learn. 46(1–3), 131–159 (2002)
  26. H. Do, A. Kalousis, Convex formulations of radius-margin based support vector machines, in ICML (2013)
  27. H. Do, A. Kalousis, M. Hilario, Feature weighting using margin and radius based error bound optimization in SVMs, in Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 5781 (Springer, Berlin Heidelberg, 2009), pp. 315–329
  28. P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, Pedestrian detection with unsupervised multi-stage feature learning, in CVPR (2013)
  29. K. Yun, J. Honorio, D. Chattopadhyay, T.L. Berg, D. Samaras, Two-person interaction detection using body-pose features and multiple instance learning, in CVPRW (2012)


Author information

Correspondence to Liang Lin.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Lin, L., Zhang, D., Luo, P., Zuo, W. (2020). Human Activity Understanding. In: Human Centric Visual Analysis with Deep Learning. Springer, Singapore. https://doi.org/10.1007/978-981-13-2387-4_10


  • DOI: https://doi.org/10.1007/978-981-13-2387-4_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2386-7

  • Online ISBN: 978-981-13-2387-4

  • eBook Packages: Computer Science (R0)
