Vision-based human action recognition (HAR) has been an active research topic over the past decade owing to popular applications such as visual surveillance and robotics. Accurate action recognition requires a variety of local and global cues, known as features, which change as human movement varies. However, because several human actions differ only slightly, their features overlap, which degrades recognition performance. In this article, we design a new 26-layered Convolutional Neural Network (CNN) architecture for accurate recognition of complex actions. Features are extracted from the global average pooling layer and a fully connected (FC) layer and are fused by a proposed high-entropy-based approach. Further, we propose a feature selection method named Poisson distribution along with Univariate Measures (PDaUM). Some of the fused CNN features are irrelevant and some are redundant, which leads to incorrect predictions among complex human actions. Therefore, the proposed PDaUM-based approach selects only the strongest features, which are then passed to an Extreme Learning Machine (ELM) and a Softmax classifier for final recognition. Four datasets are used for the experimental analysis: HMDB51 (51 classes), UCF Sports (10 classes), KTH (6 classes), and Weizmann (10 classes). On these datasets, the ELM classifier performs better than the Softmax classifier. The achieved accuracies on these datasets are 81.4%, 99.2%, 98.3%, and 98.7%, respectively. Comparison with existing techniques shows that the proposed architecture performs better in terms of both accuracy and testing time.
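The abstract names an Extreme Learning Machine as the final classifier but gives no configuration. As a rough illustration of why an ELM can be cheap to train and test, the following is a minimal NumPy sketch of a standard regularized single-hidden-layer ELM: input weights and biases are drawn at random and never trained, and only the output weights are solved in closed form by ridge regression. The class name `ExtremeLearningMachine`, the sigmoid activation, the hidden-layer size, the ridge term `reg`, and the `1/sqrt(d)` weight scaling are all assumptions for illustration, not the authors' settings; `X` simply stands in for a matrix of selected fused CNN features.

```python
import numpy as np


class ExtremeLearningMachine:
    """Sketch of a regularized ELM classifier (hypothetical, not the paper's code).

    Hidden-layer weights are random and fixed; only the output weights
    beta are learned, via one closed-form ridge-regression solve.
    """

    def __init__(self, n_hidden=256, reg=1e-2, seed=0):
        self.n_hidden = n_hidden  # number of hidden neurons (assumed value)
        self.reg = reg            # ridge term added to H^T H (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output H = sigmoid(X W + b)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T with shape (n_samples, n_classes)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        n_features = X.shape[1]
        # Random input weights, scaled by 1/sqrt(d) (an assumption) so the
        # pre-activations stay out of the sigmoid's saturated region
        self.W = self.rng.standard_normal((n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Closed-form output weights: beta = (H^T H + reg * I)^-1 H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        # Class scores are H beta; predict the class with the largest score
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Usage would look like `ExtremeLearningMachine(n_hidden=100, reg=1.0).fit(X_train, y_train).predict(X_test)`. Because training reduces to a single linear solve of size `n_hidden`, and prediction to one matrix product followed by an argmax, this style of classifier is generally fast; whether that accounts for the testing-time advantage reported above is not something this sketch can establish.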
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1F1A1058715).
Cite this article
Khan, M.A., Zhang, YD., Khan, S.A. et al. A resource conscious human action recognition framework using 26-layered deep convolutional neural network. Multimed Tools Appl (2020). https://doi.org/10.1007/s11042-020-09408-1
Keywords
- Action recognition
- CNN architecture
- Features fusion
- Features selection