Abstract
This chapter systematically reviews the recognition tasks related to human body activity that rely on visual sensors: body pose estimation, action recognition, and body reconstruction. It summarizes and introduces the corresponding state-of-the-art approaches. Human body activity is an important interaction cue and has broad application scenarios. As a result, considerable attention has gone to improving the accuracy, efficiency, and robustness of its recognition and to addressing the challenges posed by its diversity and complexity. Broadly, existing approaches fall into two categories: model-based and data-driven. Data-driven implicit modelling has the potential to unify these two categories, which makes it a promising direction.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hu, Z., Lv, C. (2022). Vision-Based Body Activity Recognition. In: Vision-Based Human Activity Recognition. SpringerBriefs in Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-2290-9_4
DOI: https://doi.org/10.1007/978-981-19-2290-9_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2289-3
Online ISBN: 978-981-19-2290-9
eBook Packages: Computer Science, Computer Science (R0)