A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications

Pareek, Preksha; Thakkar, Ankit

doi:10.1007/s10462-020-09904-8

Published: 25 September 2020

Artificial Intelligence Review, Volume 54, pages 2259–2322 (2021)

Abstract

Human Action Recognition (HAR) involves monitoring human activity in areas such as medicine, education, entertainment, visual surveillance, video retrieval, and abnormal activity identification. With camera usage increasing, there is demand for automated systems that classify such activities using computationally intelligent techniques such as Machine Learning (ML) and Deep Learning (DL). This survey discusses ML and DL techniques for HAR covering the years 2011–2019. It describes the characteristics of public datasets used for HAR and surveys action recognition techniques along with HAR applications, namely content-based video summarization, human–computer interaction, education, healthcare, video surveillance, abnormal activity detection, sports, and entertainment. The advantages and disadvantages of action representation, dimensionality reduction, and action analysis methods are also provided, and challenges and future directions for HAR are discussed.

Abbreviations

ABC:: Artificial Bee Colony
ADI:: Average Depth Image
ADL:: Activities of Daily Living
AGC:: Adaptive Graph Convolution
AGCN:: Adaptive Graph Convolutional Network
ANN:: Artificial Neural Network
ARA:: Average Recognition Accuracy
ASAGA:: Adaptive Simulated Annealing Genetic Algorithm
BN:: Batch Normalization
BoVW:: Bag of Visual Words
BPTT:: Back-Propagation-Through-Time
CAE:: Convolution Autoencoder
CHMM:: Coupled Hidden Markov Model
CNN:: Convolution Neural Network
CS:: Cross-Subject
CV:: Cross-View
DBN:: Deep Belief Network
DDI:: Depth Difference Image
DDS:: Depth Differential Silhouettes
DE:: Differential Evolution
DL:: Deep Learning
DMM:: Depth Motion Map
DNN:: Deep Neural Network
DRNN:: Differential Recurrent Neural Network
DT:: Decision Tree
DTW:: Dynamic Time Warping
ELM:: Extreme Learning Machine
FCN:: Fully Convolutional Network
FTP:: Fourier Temporal Pyramid
GA:: Genetic Algorithm
GAN:: Generative Adversarial Network
GDI:: Geodesic Distance Iso
GLCM:: Grey Level Co-occurrence Matrix
GRU:: Gated Recurrent Unit
HAR:: Human Action Recognition
HCI:: Human–Computer Interface
HMM:: Hidden Markov Model
HOF:: Histogram of Optical Flow
HOG:: Histogram of Oriented Gradient
HoMB:: Histogram of Motion Boundary
HoVW:: Histogram of Visual Word
IEF:: Iterative Error Feedback
JDM:: Joint Distance Map
KDA:: Kernel Discriminant Analysis
KELM:: Kernel Extreme Learning Machine
kNN:: k-Nearest Neighbor
KPCA:: Kernel PCA
LBP:: Local Binary Pattern
LBPH:: LBP Histogram
LDA:: Linear Discriminant Analysis
LHMM:: Layered Hidden Markov Model
LOAO:: Leave One Actor Out
LOSO:: Leave One Sequence Out
LSTM:: Long Short-Term Memory
MAP:: Mean Average Precision
MEI:: Motion Energy Image
MHI:: Motion History Image
MiCT:: Mixed Convolution Neural Network
ML:: Machine Learning
MSE:: Mean Squared Error
NBNN:: Naïve Bayes Nearest Neighbor
PCA:: Principal Component Analysis
PCOG:: Pyramid Correlogram of Oriented Gradients
PoF2I:: Pose Feature to Image
PSO:: Particle Swarm Optimization
PSO-WC:: PSO-Weight Class
PSO-WV:: PSO-Weight Views
RBD:: Reduced Basis Decomposition
RBF:: Radial Basis Function
RBM:: Restricted Boltzmann Machine
RF:: Random Forest
RNN:: Recurrent Neural Network
ROI:: Region of Interest
RVM:: Relevance Vector Machine
SDEG:: Spatial Distribution of Edge Gradients
SDK:: Software Development Kit
sDTD:: sequential Deep Trajectory Descriptor
SIFT:: Scale Invariant Feature Transform
SPD:: Symmetric Positive Definite
SSM:: Self-Similarity Matrix
STIP:: Space–Time Interest Point
STM:: Spatio-Temporal Matrix
SVM:: Support Vector Machine
TDD:: Two-stream Deep Convolution Descriptor
TpDD:: Trajectory-pooled Deep-Convolutional Descriptor
TS-GCN:: Two-Stream Graph Convolutional Network
TSN:: Temporal Segment Network
WLNBNN:: Weighted Local NBNN
ZSAR:: Zero-Shot Action Recognition

References

Abdul-Azim HA, Hemayed EE (2015) Human action recognition using trajectory-based representation. Egypt Inform J 16(2):187–198
Google Scholar
Aggarwal JK, Ryoo MS (2011) Human activity analysis: a survey. ACM Comput Surv (CSUR) 43(3):16
Google Scholar
Ahsan U, Sun C, Essa I (2018) Discrimnet: Semi-supervised action recognition from videos using generative adversarial networks. ArXiv preprint arXiv:1801.07230
Akilan T, Wu QJ, Safaei A, Jiang W (2017) A late fusion approach for harnessing multi-CNN model high-level features. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 566–571
Al Machot F, Elkobaisi MR, Kyamakya K (2020) Zero-shot human activity recognition using non-visual sensors. Sensors 20(3):825
Google Scholar
Amraee S, Vafaei A, Jamshidi K, Adibi P (2018) Abnormal event detection in crowded scenes using one-class SVM. Signal Image Video Process 12:1115–1123
Google Scholar
Angelini F, Fu Z, Long Y, Shao L, Naqvi SM (2019) 2D pose-based real-time human action recognition with occlusion-handling. IEEE Trans Multimedia 22(6):1433–1446
Google Scholar
Ar I, Akgul YS (2013) Action recognition using random forest prediction with combined pose-based and motion-based features. In: 2013 8th international conference on electrical and electronics engineering (ELECO). IEEE, pp 315–319
Arifoglu D, Bouchachia A (2017) Activity recognition and abnormal behaviour detection with recurrent neural networks. Procedia Comput Sci 110:86–93
Google Scholar
Arunraj M, Srinivasan A, Juliet AV (2018) Online action recognition from RGB-D cameras based on reduced basis decomposition. J Real-Time Image Process 17:341–356
Google Scholar
Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2011) Sequential deep learning for human action recognition. In: International workshop on human behavior understanding. Springer, pp 29–39
Baldi P (2012) Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, pp 37–49
Berkeley (2014) Multimodal human action dataset. Last Accessed 11 Dec 2019
Bhaumik H, Bhattacharyya S, Nath MD, Chakraborty S (2015) Real-time storyboard generation in videos using a probability distribution based threshold. In: 2015 fifth international conference on communication systems and network technologies (CSNT). IEEE, pp 425–431
Bhoomika Rathod SB, Pandya D, Patel R (2017) A survey on human activity analysis techniques. Int J Future Revolut Comput Sci Commun Eng 3:462–471
Google Scholar
Blank M, Gorelick L, Shechtman E, Irani M, Basri R (2005) Actions as space–time shapes. In: Tenth IEEE international conference on computer vision (ICCV’05) Volume 1, vol 2. IEEE, pp 1395–1402
Bobick AF, Davis JW (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267
Google Scholar
Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8
Boulgouris NV, Chi ZX (2007) Gait recognition using radon transform and linear discriminant analysis. IEEE Trans Image Process 16(3):731–740
MathSciNet Google Scholar
Boulgouris NV, Hatzinakos D, Plataniotis KN (2005) Gait recognition: a challenging signal processing technology for biometric identification. IEEE Signal Process Mag 22(6):78–90
Google Scholar
Brand M, Oliver N, Pentland A (1997) Coupled hidden Markov models for complex action recognition. In: Proceedings of the computer vision and pattern recognition, 1997. IEEE, pp 994–999
Bulat A, Tzimiropoulos G (2016) Human pose estimation via convolutional part heatmap regression. In: European conference on computer vision. Springer, pp 717–732
Cao J, Lin Z, Huang G-B (2012) Self-adaptive evolutionary extreme learning machine. Neural Process Lett 36(3):285–305
Google Scholar
Carreira J, Agrawal P, Fragkiadaki K, Malik J (2016) Human pose estimation with iterative error feedback. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4733–4742
Castro D, Hickson S, Sangkloy P, Mittal B, Dai S, Hays J, Essa I (2018) Let’s dance: learning from online dance videos. ArXiv preprint arXiv:1801.07388
CGCV-Laboratory (2017) Dongguk activities and actions database. Last Accessed 11 Dec 2019
Chaaraoui AA, Flórez-Revuelta F (2014a) A low-dimensional radial silhouette-based feature for fast human action recognition fusing multiple views. International scholarly research notices, vol 2014
Chaaraoui AA, Flórez-Revuelta F (2014b) Optimizing human action recognition based on a cooperative coevolutionary algorithm. Eng Appl Artif Intell 31:116–125
Google Scholar
Chakraborty B, Holte MB, Moeslund TB, Gonzalez J, Roca FX (2011) A selective spatio-temporal interest point detector for human action recognition in complex scenes. In: 2011 IEEE international conference on computer vision (ICCV). IEEE, pp 1776–1783
Chaquet JM, Carmona EJ, Fernández-Caballero A (2013) A survey of video datasets for human action and activity recognition. Comput Vis Image Underst 117(6):633–659
Google Scholar
Chen Y (2015) Reduced basis decomposition: a certified and fast lossy data compression algorithm. Comput Math Appl 70(10):2566–2574
MathSciNet MATH Google Scholar
Chen X, Yuille AL (2014) Articulated pose estimation by a graphical model with image dependent pairwise relations. In: Advances in neural information processing systems, pp 1736–1744
Chen C, Jafari R, Kehtarnavaz N (2015a) Improving human action recognition using fusion of depth camera and inertial sensors. IEEE Trans Hum Mach Syst 45(1):51–61
Google Scholar
Chen C, Jafari R, Kehtarnavaz N (2015b) Action recognition from depth sequences using depth motion maps-based local binary patterns. In: 2015 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1092–1099
Chen C, Liu M, Zhang B, Han J, Jiang J, Liu H (2016) 3D action recognition using multi-temporal depth motion maps and fisher vector. In: IJCAI, pp 3331–3337
Chen C, Liu M, Liu H, Zhang B, Han J, Kehtarnavaz N (2017) Multi-temporal depth motion maps-based local binary patterns for 3-D human action recognition. IEEE Access 5:22590–22604
Google Scholar
Chintalapati S, Raghunadh M (2013) Automated attendance management system based on face recognition algorithms. In: 2013 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–5
Computer-Vision-Lab (2012) SBU Kinect interaction dataset. Last Accessed 11 Dec 2019
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
MATH Google Scholar
Craley J, Murray TS, Mendat DR, Andreou AG (2017) Action recognition using micro-Doppler signatures and a recurrent neural network. In: 2017 51st annual conference on information sciences and systems (CISS). IEEE, pp 1–5
CRCV (2010) UCF Sports Action dataset. Last Accessed 11 Dec 2019
CRCV (2012) UCF50 dataset. Last Accessed 11 Dec 2019
CRCV (2013) UCF101 dataset. Last Accessed 1 Feb 2020
CRCV (2020) UMN video dataset. Last Accessed 1 Feb 2020
Cutler R, Davis LS (2000) Robust real-time periodic motion detection, analysis, and applications. IEEE Trans Pattern Anal Mach Intell 22(8):781–796
Google Scholar
Czuszynski K, Ruminski J, Wtorek J (2017) Pose classification in the gesture recognition using the linear optical sensor. In: 2017 10th international conference on human system interactions (HSI). IEEE, pp 18–24
Dai C, Liu X, Lai J, Li P, Chao H-C (2019) Human behavior deep recognition architecture for smart city applications in the 5G environment. IEEE Netw 33(5):206–211
Google Scholar
Dalal N, Triggs B, Schmid C (2006) Human detection using oriented histograms of flow and appearance. In: European conference on computer vision. Springer, pp 428–441
Das S, Koperski M, Bremond F, Francesca G (2018) Deep-temporal lstm for daily living action recognition. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
Das S, Chaudhary A, Bremond F, Thonnat M (2019a) Where to focus on for human action recognition? In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 71–80
Das S, Dai R, Koperski M, Minciullo L, Garattoni L, Bremond F, Francesca G (2019b) Toyota smarthome: real-world activities of daily living. In: Proceedings of the IEEE international conference on computer vision, pp 833–842
De-La-Hoz-Franco E, Ariza-Colpas P, Quero JM, Espinilla M (2018) Sensor-based datasets for human activity recognition: a systematic review of literature. IEEE Access 6:59192–59210
Google Scholar
D’Orazio T, Marani R, Renó V, Cicirelli G (2016) Recent trends in gesture recognition: how depth data has improved classical approaches. Image Vis Comput 52:56–72
Google Scholar
Duque D, Santos H, Cortez P (2007) Prediction of abnormal behaviors for intelligent video surveillance systems. In: IEEE symposium on computational intelligence and data mining, 2007. CIDM 2007. IEEE, pp 362–367
Everts I, Van Gemert JC, Gevers T (2014) Evaluation of color spatio-temporal interest points for human action recognition. IEEE Trans Image Process 23(4):1569–1580
MathSciNet MATH Google Scholar
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1933–1941
Feng Y, Yuan Y, Lu X (2017) Learning deep event models for crowd anomaly detection. Neurocomputing 219:548–556
Google Scholar
Fisher PR (2012) CAVIAR dataset. Last Accessed 1 Feb 2020
Foggia P, Percannella G, Saggese A, Vento M (2013) Recognizing human actions by a bag of visual words. In: 2013 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 2910–2915
Foggia P, Saggese A, Strisciuglio N, Vento M (2014) Exploiting the deep learning paradigm for recognizing human actions. In: 2014 11th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 93–98
Gan L, Chen F (2013) Human action recognition using APJ3D and random forests. JSW 8(9):2238–2245
Google Scholar
Gao J, Zhang T, Xu C (2019) I know the relationships: zero-shot action recognition via two-stream graph convolutional networks and knowledge graphs. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8303–8311
Gavrila DM (1999) The visual analysis of human movement: a survey. Comput Vis Image Underst 73(1):82–98
MATH Google Scholar
Gkalelis N, Kim H, Hilton A, Nikolaidis N, Pitas I (2009) The i3DPost multi-view and 3D human action/interaction database. In: 2009 conference for visual media production. IEEE, pp 159–168
Gowda SN (2017) Human activity recognition using combinatorial deep belief networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 1–6
Guo G, Lai A (2014) A survey on still image based human action recognition. Pattern Recogn 47(10):3343–3361
Google Scholar
Gupta JP, Singh N, Dixit P, Semwal VB, Dubey SR (2013) Human activity recognition using gait pattern. Int J Comput Vis Image Process (IJCVIP) 3(3):31–53
Google Scholar
Haria A, Subramanian A, Asokkumar N, Poddar S, Nayak JS (2017) Hand gesture recognition for human computer interaction. Procedia Comput Sci 115:367–374
Google Scholar
Hassan MM, Uddin MZ, Mohamed A, Almogren A (2018) A robust human activity recognition system using smartphone sensors and deep learning. Future Gener Comput Syst 81:307–313
Google Scholar
Herath S, Harandi M, Porikli F (2017) Going deeper into action recognition: a survey. Image Vis Comput 60:4–21
Google Scholar
Huang G-B, Zhu Q-Y, Siew C-K (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE international joint conference on neural networks, 2004, vol 2. IEEE, pp 985–990
Huang Z, Wan C, Probst T, Van Gool L (2017) Deep learning on lie groups for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6099–6108
Huang Y, Lai S-H, Tai S-H (2018) Human action recognition based on temporal pose CNN and multi-dimensional fusion. In: Proceedings of the European conference on computer vision (ECCV)
Huynh-The T, Hua-Cam H, Kim D-S (2019) Encoding pose features to images with data augmentation for 3D action recognition. IEEE Trans Industr Inform 16:3100–3111
Google Scholar
Ijjina EP, Chalavadi KM (2016) Human action recoxgnition using genetic algorithms and convolutional neural networks. Pattern Recogn 59:199–212
Google Scholar
INRIA (2016) IXMAS dataset. Last Accessed 1 Feb 2020
Iosifidis A, Tefas A, Pitas I (2014) Regularized extreme learning machine for multi-view semi-supervised action recognition. Neurocomputing 145:250–262
Google Scholar
Jalal A (2017) IM-daily depth activity dataset. Last Accessed 1 Feb 2020
Jalal A, Kim Y (2014) Dense depth maps-based human pose tracking and recognition in dynamic scenes using ridge data. In: 2014 11th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 119–124
Jalal A, Uddin MZ, Kim T-S (2012) Depth video-based human activity recognition system using translation and scaling invariant features for life logging at smart home. IEEE Trans Consum Electron 58:3
Google Scholar
Jalal A, Kim Y-H, Kim Y-J, Kamal S, Kim D (2017) Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308
Google Scholar
Jhuang H (2013) HMDB dataset. Last Accesed 11 Dec 2019
Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Google Scholar
Jian M, Zhang S, Wu L, Zhang S, Wang X, He Y (2019) Deep key frame extraction for sport training. Neurocomputing 328:147–156
Google Scholar
Jiang Z, Lin Z, Davis L (2012) Recognizing human actions by learning and matching shape-motion prototype trees. IEEE Trans Pattern Anal Mach Intell 34(3):533–547
Google Scholar
Kalaivani P, Vimala D (2015) Human action recognition using background subtraction method. Int Res J Eng Technol (IRJET) 2(3):1032–1035
Google Scholar
Kang SB, Szeliski R (2004) Extracting view-dependent depth maps from a collection of images. Int J Comput Vis 58(2):139–163
Google Scholar
Karpathy A (2014) Sports-1M dataset. Last Accessed 11 Dec 2019
Kastaniotis D, Theodorakopoulos I, Theoharatos C, Economou G, Fotopoulos S (2015) A framework for gait-based recognition using Kinect. Pattern Recogn Lett 68:327–335
Google Scholar
Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev P, et al (2017) The kinetics human action video dataset. ArXiv preprint arXiv:1705.06950
Ke Y, Sukthankar R, Hebert M (2007) Event detection in crowded videos. In: 2007 IEEE 11th international conference on computer vision. IEEE, pp 1–8
Khan ZA, Sohn W (2011) Abnormal human activity recognition system based on R-transform and kernel discriminant technique for elderly home care. IEEE Trans Consum Electron 57:4
Google Scholar
Kim SH, Park R-H (2002) An efficient algorithm for video sequence matching using the modified hausdorff distance and the directed divergence. IEEE Trans Circuits Syst Video Technol 12(7):592–596
Google Scholar
Kim TS, Reiter A (2017) Interpretable 3D human action analysis with temporal convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 1623–1631
Kim H, Lee S, Kim Y, Lee S, Lee D, Ju J, Myung H (2016) Weighted joint-based human behavior recognition algorithm using only depth information for low-cost intelligent video-surveillance system. Expert Syst Appl 45:131–141
Google Scholar
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kumar K, Kishore P, Kumar DA, Kumar EK (2018) Indian classical dance action identification using adaboost multiclass classifier on multifeature fusion. In: 2018 conference on signal processing and communication engineering systems (SPACES). IEEE, pp 167–170
Laptev I (2005) On space–time interest points. Int J Comput Vis 64(2–3):107–123
Google Scholar
Laptev I (2012) Hollywood2 dataset. Last Accessed 11 Dec 2019
Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8
Lee LH, Wan CH, Yong TF, Kok HM (2010) A review of nearest neighbor-support vector machines hybrid classification models. J Appl Sci 10:1841–1858
Google Scholar
Lee H-Y, Huang J-B, Singh M, Yang M-H (2017) Unsupervised representation learning by sorting sequences. In: Proceedings of the IEEE international conference on computer vision, pp 667–676
Li W (2017a) MSR daily activity 3D dataset. Last Accessed 11 Dec 2019
Li W (2017b) MSR-action3D dataset. Last Accessed 1 Feb 2020
Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: 2010 IEEE computer society conference on computer vision and pattern recognition-workshops. IEEE, pp 9–14
Li C, Hou Y, Wang P, Li W (2017a) Joint distance maps based action recognition with convolutional neural networks. IEEE Signal Process Lett 24(5):624–628
Google Scholar
Li C, Wang P, Wang S, Hou Y, Li W (2017b) Skeleton-based action recognition using LSTM and CNN. In: 2017 IEEE international conference on multimedia and expo workshops (ICMEW). IEEE, pp 585–590
Li M, Chen S, Chen X, Zhang Y, Wang Y, Tian Q (2019) Actional-structural graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3595–3603
Lim JH, Teh EY, Geh MH, Lim CH (2017) Automated classroom monitoring with connected visioning system. In: Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC), 2017. IEEE, pp 386–393
Liu DZ (2016) MSR action dataset. Last Accessed 1 Feb 2020
Liu J, Luo J, Shah M (2009) Recognizing realistic actions from videos in the wild. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 1996–2003
Liu L, Shao L, Zhen X, Li X (2013) Learning discriminative key poses for action recognition. IEEE Trans Cybern 43(6):1860–1870
Google Scholar
Liu L, Shao L, Li X, Lu K (2016) Learning spatio-temporal representations for action recognition: a genetic programming approach. IEEE Trans Cybern 46(1):158–170
Google Scholar
Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE (2017a) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26
Google Scholar
Liu M, Liu H, Chen C (2017b) Enhanced skeleton visualization for view invariant human action recognition. Pattern Recogn 68:346–362
Google Scholar
Lu K, Chen J, Little JJ, He H (2018) Lightweight convolutional neural networks for player detection and classification. Comput Vis Image Underst 172:77–87
Google Scholar
Mabrouk AB, Zagrouba E (2018) Abnormal behavior recognition for intelligent video surveillance systems: a review. Expert Syst Appl 91:480–491
Google Scholar
M. C. Laboratory (2012) DHA video dataset. Last Accessed 1 Feb 2020
Miao Y, Song J (2014) Abnormal event detection based on SVM in video surveillance. In: 2014 IEEE workshop on advanced research and technology in industry applications (WARTIA). IEEE, pp 1379–1383
MICC (2012) Florence 3D actions dataset. Last Accessed 11 Dec 2019
Mika S, Schölkopf B, Smola AJ, Müller K-R, Scholz M, Rätsch G (1999) Kernel PCA and de-noising in feature spaces. In: Advances in neural information processing systems, pp 536–542
Mishra A, Verma VK, Reddy MSK, Arulkumar S, Rai P, Mittal A (2018) A generative approach to zero-shot and few-shot action recognition. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 372–380
MIVIA-Lab (2017) MIVIA Dataset. Last Accessed 11 Dec 2019
Moya Rueda F, Grzeszick R, Fink G, Feldhorst S, ten Hompel M (2018) Convolutional neural networks for human activity recognition using body-worn sensors. In: Informatics, vol 5. Multidisciplinary Digital Publishing Institute, p 26
Murray TS, Mendat DR, Pouliquen PO, Andreou AG (2015) The Johns Hopkins University multimodal dataset for human action recognition. In: Radar sensor technology XIX; and active and passive signatures VI, vol 9461. International Society for Optics and Photonics, p 94611U
NADA (2004) KTH dataset. Last Accessed 1 Feb 2020
Nazir S, Yousaf MH, Velastin SA (2018) Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition. Comput Electr Eng 72:660–669
Google Scholar
Neha TK (2020) A review on PSO-SVM based performance measurement on different datasets. Int J Res Appl Sci Eng Technol 8:444–448
Google Scholar
Nizam Y, Mohd MNH, Jamil MMA (2017) Human fall detection from depth images using position and velocity of subject. Procedia Comput Sci 105:131–137
Google Scholar
Norouzi M, Mikolov T, Bengio S, Singer Y, Shlens J, Frome A, Corrado GS, Dean J (2013) Zero-shot learning by convex combination of semantic embeddings. ArXiv preprint arXiv:1312.5650
Nunes UM, Faria DR, Peixoto P (2017) A human activity recognition framework using max-min features and key poses with differential evolution random forests classifier. Pattern Recogn Lett 99:21–31
Google Scholar
Nweke HF, Teh YW, Mujtaba G, Al-Garadi MA (2019) Data fusion and multiple classifier systems for human activity detection and health monitoring: review and open research directions. Inf Fusion 46:147–170
Google Scholar
Ohlberger M, Rave S (2015) Reduced basis methods: success, limitations and future challenges. ArXiv preprint arXiv:1511.02021
Oikonomopoulos A, Patras I, Pantic M (2005) Spatiotemporal salient points for visual recognition of human actions. IEEE Trans Syst Man Cybern Part B Cybern 36(3):710–719
Google Scholar
Oliver N, Horvitz E, Garg A (2002) Layered representations for human activity recognition. In: Proceedings of the 4th IEEE international conference on multimodal interfaces. IEEE Computer Society, p 3
Oreifej O, Liu Z (2013) HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 716–723
Pagliari D, Pinto L (2015) Calibration of Kinect for xbox one and comparison between the two generations of microsoft sensors. Sensors 15(11):27569–27589
Google Scholar
Panahi L, Ghods V (2018) Human fall detection using machine vision techniques on RGB-D images. Biomed Signal Process Control 44:146–153
Google Scholar
Patel CI, Garg S, Zaveri T, Banerjee A, Patel R (2018) Human action recognition using fusion of features for unconstrained video sequences. Comput Electr Eng 70:284–301
Google Scholar
Paul M, Haque SM, Chakraborty S (2013) Human detection in surveillance videos and its applications: a review. EURASIP J Adv Signal Process 2013(1):176
Google Scholar
Peng X, Zou C, Qiao Y, Peng Q (2014) Action recognition with stacked fisher vectors. In: European conference on computer vision. Springer, pp 581–595
Pham HH, Salmane H, Khoudour L, Crouzil A, Velastin SA, Zegers P (2020) A unified deep framework for joint 3D pose estimation and action recognition from a single RGB camera. Sensors 20(7):1825
Google Scholar
Popoola OP, Wang K (2012) Video-based abnormal human behavior recognition: a review. IEEE Trans Syst Man Cybern Part C Appl Rev 42(6):865–878
Google Scholar
Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28(6):976–990
Google Scholar
Prasnthi Mandha SVR, Lavanya Devi G (2017) A random forest based classification model for human activity recognition. Int J Adv Sci Technol Eng Manag Sci 3:294–300
Google Scholar
Presti LL, La Cascia M (2016) 3D skeleton-based human action classification: a survey. Pattern Recogn 53:130–147
Google Scholar
Qi M, Wang Y, Qin J, Li A, Luo J, Van Gool L (2019) stagNet: an attentive semantic RNN for group activity and individual action recognition. IEEE Trans Circuits Syst Video Technol 30:549–565
Google Scholar
Qian H, Mao Y, Xiang W, Wang Z (2010) Recognition of human activities using svm multi-class classifier. Pattern Recogn Lett 31(2):100–111
Google Scholar
Qin Y, Mo L, Xie B (2017) Feature fusion for human action recognition based on classical descriptors and 3D convolutional networks. In: 2017 eleventh international conference on sensing technology (ICST). IEEE, pp 1–5
Rapid-Rich-Object-Search Lab (2016) NTU RGB+D action recognition dataset. Last Accessed 11 Dec 2019
Razzak MI, Naz S, Zaib A (2018) Deep learning for medical image processing: overview, challenges and the future. In: Classification in BioApps. Springer, pp 323–350
Rensink RA (2000) The dynamic representation of scenes. Vis Cognit 7(1–3):17–42
Google Scholar
Robot-Learning-Lab (2017) Cornell activity dataset (CAD-60). Last Accessed 11 Dec 2019
Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104
Google Scholar
Ronao CA, Cho S-B (2016) Human activity recognition with smartphone sensors using deep learning neural networks. Expert Syst Appl 59:235–244
Google Scholar
Roy Y, Banville H, Albuquerque I, Gramfort A, Falk TH, Faubert J (2019) Deep learning-based electroencephalography analysis: a systematic review. J Neural Eng 16(5):051001
Google Scholar
Saini O, Sharma S (2018) A review on dimension reduction techniques in data mining. Comput Eng Intell Syst 9:7–14
Google Scholar
Shao L, Ji L, Liu Y, Zhang J (2012) Human action segmentation and recognition via motion and shape analysis. Pattern Recogn Lett 33(4):438–445
Google Scholar
Sharma RP, Verma GK (2015) Human computer interaction using hand gesture. Procedia Comput Sci 54:721–727
Google Scholar
Sharma S, Kiros R, Salakhutdinov R (2015) Action recognition using visual attention. ArXiv preprint arXiv:1511.04119
Shereena V, David JM (2014) Content based image retrieval: classification using neural networks. Int J Multimedia Appl 6(5):31
Google Scholar
Shi Y, Tian Y, Wang Y, Huang T (2017) Sequential deep trajectory descriptor for action recognition with three-stream cnn. IEEE Trans Multimedia 19(7):1510–1520
Google Scholar
Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 12026–12035
Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: CVPR 2011. IEEE, pp 1297–1304
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1227–1236
Singh D, Mohan CK (2017) Graph formulation of video activities for abnormal activity recognition. Pattern Recogn 65:265–272
Singh S, Velastin SA, Ragheb H (2010) MuHAVi: a multicamera human action video dataset for the evaluation of action recognition methods. In: Seventh IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 48–55
Song Y, Demirdjian D, Davis R (2011) NATOPS aircraft handling signals database. Last Accessed 11 Dec 2019
Statistical Visual Computing Lab (2014) UCSD anomaly detection dataset. Last Accessed 11 Dec 2019
Sze V, Chen Y-H, Yang T-J, Emer JS (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295–2329
Taha A, Zayed HH, Khalifa M, El-Horbaty E-S (2014) Human action recognition based on MSVM and depth images. Int J Comput Sci Issues (IJCSI) 11(4):42
Thakkar A, Lohiya R (2020) Attack classification using feature selection techniques: a comparative study. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-020-02167-9
Thi TH, Zhang J, Cheng L, Wang L, Satoh S (2010) Human action recognition and localization in video using structured learning of local space–time features. In: 2010 seventh IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 204–211
Thomas G, Gade R, Moeslund TB, Carr P, Hilton A (2017) Computer vision for sports: current applications and research topics. Comput Vis Image Underst 159:3–18
Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660
Turaga P, Chellappa R, Subrahmanian VS, Udrea O (2008) Machine recognition of human activities: a survey. IEEE Trans Circuits Syst Video Technol 18(11):1473–1488
Ullah A, Muhammad K, Haq IU, Baik SW (2019) Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Future Gener Comput Syst 96:386–397
University of Minnesota (2010) Unusual crowd activity dataset. Last Accessed 11 Dec 2019
Varadarajan J, Odobez J-M (2009) Topic models for scene analysis and abnormality detection. In: 2009 IEEE 12th international conference on computer vision workshops (ICCV workshops). IEEE, pp 1338–1345
Veeriah V, Zhuang N, Qi G-J (2015) Differential recurrent neural networks for action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 4041–4049
Vezzani R, Baltieri D, Cucchiara R (2010) HMM based action recognition with projection histogram features. In: International conference on pattern recognition. Springer, pp 286–293
Vishwakarma DK, Kapoor R (2015) Hybrid classifier based human activity recognition using the silhouette and cells. Expert Syst Appl 42(20):6957–6965
Vishwakarma DK, Kapoor R, Dhiman A (2016) A proposed unified framework for the recognition of human activity by exploiting the characteristics of action dynamics. Robot Auton Syst 77:25–38
Vrigkas M, Nikou C, Kakadiaris IA (2015) A review of human activity recognition methods. Front Robot AI 2:28
Wang Y, Mori G (2009) Human action recognition by semilatent topic models. IEEE Trans Pattern Anal Mach Intell 31(10):1762–1774
Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp 3551–3558
Wang H, Kläser A, Schmid C, Liu C-L (2011) Action recognition by dense trajectories. In: CVPR 2011. IEEE, pp 3169–3176
Wang L, Qiao Y, Tang X (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4305–4314
Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2016) Temporal segment networks: towards good practices for deep action recognition. In: European conference on computer vision. Springer, pp 20–36
Wang P, Cao Y, Shen C, Liu L, Shen HT (2017) Temporal pyramid pooling-based convolutional neural network for action recognition. IEEE Trans Circuits Syst Video Technol 27(12):2613–2622
Wang J, Chen Y, Hao S, Peng X, Hu L (2018) Deep learning for sensor-based activity recognition: a survey. Pattern Recogn Lett 119:3–11
Wang W, Zheng VW, Yu H, Miao C (2019) A survey of zero-shot learning: settings, methods, and applications. ACM Trans Intell Syst Technol (TIST) 10(2):1–37
Wanqing Li XN (2014) Northwestern-UCLA multiview action 3D dataset. Last Accessed 11 Dec 2019
Weimer D, Scholz-Reiter B, Shpitalni M (2016) Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Ann Manuf Technol 65(1):417–420
Xia L (2016) UT Kinect-action 3D dataset. Last Accessed 11 Dec 2019
Xia L, Chen C-C, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 20–27
Xu D, Xiao X, Wang X, Wang J (2016) Human action recognition based on Kinect and PSO-SVM by representing 3D skeletons as points in Lie group. In: 2016 international conference on audio, language and image processing (ICALIP). IEEE, pp 568–573
Xu L, Yang W, Cao Y, Li Q (2017) Human activity recognition based on random forests. In: 2017 13th international conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE, pp 548–553
YACVID (2014) MuHAVi dataset. Last Accessed 11 Dec 2019
Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden Markov model. In: Proceedings CVPR’92 of the 1992 IEEE computer society conference on computer vision and pattern recognition, 1992. IEEE, pp 379–385
Yang Y, Ramanan D (2012) Articulated human detection with flexible mixtures of parts. IEEE Trans Pattern Anal Mach Intell 35(12):2878–2890
Yang X, Tian Y (2014) Effective 3D action recognition using EigenJoints. J Vis Commun Image Represent 25(1):2–11
Yang X, Zhang C, Tian Y (2012) Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM international conference on Multimedia. ACM, pp 1057–1060
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-second AAAI conference on artificial intelligence
Yang Y, Hou C, Lang Y, Guan D, Huang D, Xu J (2019) Open-set human activity recognition based on micro-Doppler signatures. Pattern Recogn 85:60–69
Yao A, Gall J, Fanelli G, Van Gool L (2011) Does human action recognition benefit from pose estimation? In: BMVC 2011-proceedings of the British machine vision conference 2011
You D, Hamsici OC, Martinez AM (2010) Kernel optimization in discriminant analysis. IEEE Trans Pattern Anal Mach Intell 33(3):631–638
You I, Choo K-KR, Ho C-L et al (2018) A smartphone-based wearable sensors for monitoring real-time physiological data. Comput Electr Eng 65:376–392
Yu M, Yu Y, Rhuma A, Naqvi SM, Wang L, Chambers JA et al (2013) An online one class support vector machine-based person-specific fall detection system for monitoring an elderly individual in a room environment. IEEE J Biomed Health Inform 17(6):1002–1014
Zellers R, Choi Y (2017) Zero-shot activity recognition with verb attribute induction. ArXiv preprint arXiv:1707.09468
Zhang Z (2012) Microsoft Kinect sensor and its effect. IEEE Multimedia 19(2):4–10
Zhang X, Yao L, Wang X, Monaghan J, Mcalpine D, Zhang Y (2019a) A survey on deep learning based brain computer interface: recent advances and new frontiers. ArXiv preprint arXiv:1905.04149
Zhang X, Yao L, Wang X, Zhang W, Zhang S, Liu Y (2019b) Know your mind: adaptive cognitive activity recognition with reinforced CNN. In: 2019 IEEE international conference on data mining (ICDM). IEEE, pp 896–905
Zhou X, Zhu M, Pavlakos G, Leonardos S, Derpanis KG, Daniilidis K (2018a) MonoCap: monocular human motion capture using a CNN coupled with a geometric prior. IEEE Trans Pattern Anal Mach Intell 41(4):901–914
Zhou Y, Sun X, Zha Z-J, Zeng W (2018b) MiCT: mixed 3D/2D convolutional tube for human action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 449–458
Zhu Y, Chen W, Guo G (2014) Evaluating spatiotemporal interest point features for depth-based action recognition. Image Vis Comput 32(8):453–464
Zhu F, Shao L, Xie J, Fang Y (2016a) From handcrafted to learned representations for human action recognition: a survey. Image Vis Comput 55:42–52
Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X et al (2016b) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: AAAI, vol 2, p 8

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, Gujarat, 382 481, India
Preksha Pareek & Ankit Thakkar

Authors

Preksha Pareek
Ankit Thakkar

Corresponding author

Correspondence to Preksha Pareek.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pareek, P., Thakkar, A. A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications. Artif Intell Rev 54, 2259–2322 (2021). https://doi.org/10.1007/s10462-020-09904-8

Published: 25 September 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10462-020-09904-8
