Skip to main content

Vision-Based Body Activity Recognition

  • Chapter
  • First Online:
Vision-Based Human Activity Recognition

Part of the book series: SpringerBriefs in Intelligent Systems ((BRIEFSINSY))

  • 311 Accesses

Abstract

The chapter systematically sorts out the human body-related activity recognition tasks based on the visual sensors, including body pose estimation, action recognition, and body reconstruction. The corresponding state-of-the-art approaches are summarized and introduced. As an important interaction cue, human body activity has broad application scenarios. Therefore, it has attracted considerable attention to improving the accuracy, efficiency, and robustness of the recognition and address the challenges of its diversity and complexity. Basically, there are two kinds of approaches for it: model-based and data-driven. It can clearly see that the data-driven implicit modelling will potentially unify these two approaches, which is a promising direction.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Maurer U, Smailagic A, Siewiorek DP, Deisher M (2008) Activity recognition and monitoring using multiple sensors on different body positions. In: International workshop on wearable and implantable body sensor networks (BSN’06). IEEE, p 4

    Google Scholar 

  2. Pang Y, Yuan Y, Li X, Pan J (2011) Efficient hog human detection. Signal Process 91(4):773–781

    Article  Google Scholar 

  3. Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5693–5703

    Google Scholar 

  4. Amin S, Andriluka M, Rohrbach M, Schiele B (2013) Multi-view pictorial structures for 3d human pose estimation. In: BMVC, vol 1

    Google Scholar 

  5. Hofmann M, Gavrila DM (2012) Multi-view 3d human pose estimation in complex environment. Int J Comput Vision 96(1):103–124

    Article  MathSciNet  Google Scholar 

  6. Rafi U, Gall J, Leibe B (2015) A semantic occlusion model for human pose estimation from a single depth image. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 67–74

    Google Scholar 

  7. Yub Jung H, Lee S, Seok Heo Y, Dong Yun, I (2015) Random tree walk toward instantaneous 3d human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2467–2474

    Google Scholar 

  8. Tekin B, Rozantsev A, Lepetit V, Fua P (2016) Direct prediction of 3d body poses from motion compensated sequences. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 991–1000

    Google Scholar 

  9. Zhou X, Zhu M, Leonardos S, Derpanis KG, Daniilidis K (2016) Sparseness meets deepness: 3d human pose estimation from monocular video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, p. 4966–4975

    Google Scholar 

  10. Ramakrishna V, Kanade T, Sheikh Y (2012) Reconstructing 3d human pose from 2d image landmarks. In: European conference on computer vision. Springer, pp 573–586

    Google Scholar 

  11. Wang C, Wang Y, Lin Z, Yuille AL, Gao W (2014) Robust estimation of 3d human poses from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2361–2368

    Google Scholar 

  12. Akhter I, Black MJ (2015) Pose-conditioned joint angle limits for 3d human pose reconstruction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1446–1455

    Google Scholar 

  13. Simo-Serra E, Ramisa A, Alenyà G, Torras C, Moreno-Noguer F (2012) Single image 3d human pose estimation from noisy observations. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 2673–2680

    Google Scholar 

  14. Wei XK, Chai J (2009) Modeling 3d human poses from uncalibrated monocular images. In: 2009 IEEE 12th International conference on computer vision. IEEE, pp 1873–1880

    Google Scholar 

  15. Chen C-H, Ramanan D (2017) 3d human pose estimation = 2d pose estimation+ matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7035–7043

    Google Scholar 

  16. Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ (2015) Smpl: A skinned multi-person linear model. ACM Trans Graph (TOG) 34(6):1–16

    Article  Google Scholar 

  17. Bogo F, Kanazawa A, Lassner C, Gehler P, Romero J, Black MJ (2016) Keep it SMPL: automatic estimation of 3d human pose and shape from a single image. In: European conference on computer vision. Springer, pp 561–578

    Google Scholar 

  18. Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler PV, Schiele B (2016) Deepcut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4929–4937

    Google Scholar 

  19. Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) Deepercut: a deeper, stronger, and faster multi-person pose estimation model. In: European conference on computer vision, Springer, pp 34–50

    Google Scholar 

  20. Johnson S, Everingham M (2010) Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC, vol 2, p 5

    Google Scholar 

  21. Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299

    Google Scholar 

  22. Sapp B, Taskar B (2013) MODEC: Multimodal decomposable models for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3674–3681

    Google Scholar 

  23. Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3686–3693

    Google Scholar 

  24. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision. Springer, pp 740–755

    Google Scholar 

  25. Andriluka M, Iqbal U, Insafutdinov E, Pishchulin L, Milan A, Gall J, Schiele B (2018) PoseTrack: a benchmark for human pose estimation and tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5167–5176

    Google Scholar 

  26. Ionescu C, Papava D, Olaru V, Sminchisescu C (2014) Human3.6m: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339

    Google Scholar 

  27. Sigal L, Balan AO, Black MJ (2010) Humaneva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int J Comput Vision 87(1):4–27

    Article  Google Scholar 

  28. Joo H, Simon T, Cikara M, Sheikh Y (2019) Towards social artificial intelligence: nonverbal social signal prediction in a triadic interaction. In: CVPR

    Google Scholar 

  29. Fabbri M Lanzi F, Calderara S, Palazzi A, Vezzani R, Cucchiara R (2018) Learning to detect and track visible and occluded body joints in a virtual world. In: European conference on computer vision (ECCV)

    Google Scholar 

  30. Mehta D, Rhodin H, Casas D, Fua P, Sotnychenko O, Xu W, Theobalt C (20) Monocular 3d human pose estimation in the wild using improved cnn supervision. In: 2017 Fifth international conference on 3D vision (3DV). IEEE.https://doi.org/10.1109/3dv.2017.00064, http://gvv.mpi-inf.mpg.de/3dhpdataset

  31. Varol G, Romero J, Martin X, Mahmood N, Black MJ, Laptev I, Schmid C (2017) Learning from synthetic humans. In: CVPR

    Google Scholar 

  32. Lassner C, Romero J, Kiefel M, Bogo F, Black MJ, Gehler PV (2017) Unite the people: closing the loop between 3d and 2d human representations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6050–6059

    Google Scholar 

  33. Riza Alp Guler IK Neverova N (2018) DensePose: Dense human pose estimation in the wild

    Google Scholar 

  34. Kong Y, Fu, Y (2018) Human action recognition and prediction: a survey. arXiv:1806.11230

  35. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Advances in neural information processing systems, vol 27

    Google Scholar 

  36. Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Gool LV (2016) Temporal segment networks: Towards good practices for deep action recognition. In: European conference on computer vision. Springer, pp 20–36

    Google Scholar 

  37. Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6299–6308

    Google Scholar 

  38. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489–4497

    Google Scholar 

  39. Tran, D, Wang, H, Torresani, L, Ray, J, LeCun, Y, Paluri, M (2018) A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  40. Qiu Z, Yao T, Mei T (2017) Learning spatio-temporal representation with pseudo-3d residual networks. In: Proceedings of the IEEE international conference on computer vision (ICCV)

    Google Scholar 

  41. Zolfaghari M, Singh K, Brox T (2018) Eco: efficient convolutional network for online video understanding. In: Proceedings of the European conference on computer vision (ECCV), pp 695–712

    Google Scholar 

  42. Crasto N, Weinzaepfel P, Alahari K, Schmid C (2019) MARS: Motion-augmented RGB stream for action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7882–7891

    Google Scholar 

  43. Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634

    Google Scholar 

  44. Weng J, Weng C, Yuan J (2017) Spatio-temporal Naive-Bayes nearest-neighbor (ST-NBNN) for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  45. Weng J, Liu M, Jiang X, Yuan J (2018) Deformable pose traversal convolution for 3d action and gesture recognition. In: Proceedings of the European conference on computer vision (ECCV)

    Google Scholar 

  46. Shi X, Chen Z, Wang H, Yeung D-Y, Wong W-K, Woo W-C (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in neural information processing systems, vol 28

    Google Scholar 

  47. Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-second AAAI conference on artificial intelligence

    Google Scholar 

  48. Soomro K, Zamir AR, Shah M (2012) Ucf101: a dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402

  49. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: 2011 International conference on computer vision. IEEE, pp 2556–2563

    Google Scholar 

  50. Smaira L, Carreira J, Noland E, Clancy E, Wu A, Zisserman A (2020) A short note on the kinetics-700-2020 human action dataset. arXiv:2010.10864

  51. Monfort M, Andonian A, Zhou B, Ramakrishnan K, Bargal SA, Yan T, Brown L, Fan Q, Gutfruend D, Vondrick C et al (2019) Moments in time dataset: one million videos for event understanding. IEEE Trans Pattern Anal Mach Intell 1–8

    Google Scholar 

  52. Goyal R, Ebrahimi Kahou S, Michalski V, Materzynska J, Westphal S, Kim H, Haenel V, Fruend I, Yianilos P, Mueller-Freitag M, et al (2017) The “something something” video database for learning and evaluating visual common sense. In: Proceedings of the IEEE international conference on computer vision, pp 5842–5850

    Google Scholar 

  53. Chen L, Peng S, Zhou X (2021) Towards efficient and photorealistic 3d human reconstruction: a brief survey. Vis Inform 5(4):11–19

    Article  Google Scholar 

  54. Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2020) NeRf: representing scenes as neural radiance fields for view synthesis. In: European conference on computer vision. Springer, pp 405–421

    Google Scholar 

  55. Kanazawa A, Black MJ, Jacobs DW, Malik J (2018) End-to-end recovery of human shape and pose. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  56. Tung H-Y, Tung H-W, Yumer E, Fragkiadaki K (2017) Self-supervised learning of motion capture. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc

    Google Scholar 

  57. Tung H-YF, Harley AW, Seto W, Fragkiadaki K (2017) Adversarial inverse graphics networks: learning 2d-to-3d lifting and image-to-image translation from unpaired supervision. In: 2017 IEEE international conference on computer vision (ICCV), pp 4364–4372

    Google Scholar 

  58. Varol G, Ceylan D, Russell B, Yang J, Yumer E, Laptev I, Schmid C (2018) BodyNet: volumetric inference of 3d human body shapes. In: Proceedings of the European conference on computer vision (ECCV)

    Google Scholar 

  59. Omran M Lassner C, Pons-Moll G, Gehler P, Schiele B (2018) Neural body fitting: unifying deep learning and model based human pose and shape estimation. In: 2018 international conference on 3D vision (3DV), pp 484–494

    Google Scholar 

  60. Guler RA, Kokkinos I (2019) HoloPose: Holistic 3d human reconstruction in-the-wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  61. Guo K, Lincoln P, Davidson P, Busch J, Yu X, Whalen M, Harvey G, Orts-Escolano S, Pandey R, Dourgarian J et al (2019) The relightables: volumetric performance capture of humans with realistic relighting. ACM Trans Graph (ToG) 38(6):1–19

    Google Scholar 

  62. Newcombe RA, Fox D, Seitz SM (2015) DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  63. Yu T, Zheng Z, Guo K, Zhao J, Dai Q, Li H, Pons-Moll G, Liu Y (2018) Doublefusion: real-time capture of human performances with inner body shapes from a single depth sensor. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

    Google Scholar 

  64. Zheng Z, Yu T, Wei Y, Dai Q, Liu Y (2019) DeepHuman: 3d human reconstruction from a single image. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)

    Google Scholar 

  65. Saito S, Huang Z, Natsume R, Morishima S, Kanazawa A, Li H (2019) PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)

    Google Scholar 

  66. Peng S, Zhang Y, Xu Y, Wang Q, Shuai Q, Bao H, Zhou X (2021) Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9054–9063

    Google Scholar 

  67. Peng S, Dong J, Wang Q, Zhang S, Shuai Q, Bao H, Zhou X (2021) Animatable neural radiance fields for human body modeling. arXiv eprints, 2105

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhongxu Hu .

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Hu, Z., Lv, C. (2022). Vision-Based Body Activity Recognition. In: Vision-Based Human Activity Recognition. SpringerBriefs in Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-2290-9_4

Download citation

  • DOI: https://doi.org/10.1007/978-981-19-2290-9_4

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-2289-3

  • Online ISBN: 978-981-19-2290-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics