A resource conscious human action recognition framework using 26-layered deep convolutional neural network


Vision-based human action recognition (HAR) is a hot topic of research from the decade due to a few popular applications such as visual surveillance and robotics. For correct action recognition, various local and global points are requires known as features. These features modified during the variation in human movement. But due to a bit change in several human actions, the features of these actions are mixed that degrade the recognition performance. In this article, we design a new 26-layered Convolutional Neural Network (CNN) architecture for accurate complex action recognition. The features are extracted from the global average pooling layer and fully connected (FC) layer, and fused by a proposed high entropy-based approach. Further, we propose a feature selection method name Poisson distribution along with Univariate Measures (PDaUM). Few of fused CNN features are irrelevant, and few of them are redundant that makes the incorrect prediction among complex human actions. Therefore, the proposed PDaUM based approach selects only the strongest features that later passed to the Extreme Learning Machine (ELM) and Softmax for final recognition. Four datasets are using for experimental analysis - HMDB51 (51 classes), UCF Sports (10 classes), KTH (6 classes), and Weizmann (10 classes). On these datasets, the ELM classifier gives an improved performance as compared to a Softmax classifier. The achieved accuracy on each dataset is 81.4%, 99.2%, 98.3%, and 98.7%, respectively. Comparison with existing techniques, it is shown that the proposed architecture gives better performance in terms of accuracy and testing time.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18


  1. 1.

    Arshad H, Khan MA, Sharif M, Yasmin M, Javed MY (2019) Multi-level features fusion and selection for human gait recognition: an optimized framework of Bayesian model and binomial distribution. Int J Mach Learn Cybern 10:3601–3618

    Article  Google Scholar 

  2. 2.

    S Asghari-Esfeden, M Sznaier, O Camps (2020) Dynamic Motion Representation for Human Action Recognition. IEEE Winter Conf Appl Comput Vis 557–566

  3. 3.

    Aurangzeb K, Haider I, Khan MA, Saba T, Javed K, Iqbal T, Rehman A, Ali H, Sarfraz MS (2019) Human behavior analysis based on multi-types features fusion and Von Nauman entropy based features reduction. J Med Imaging Health Inform 9:662–669

    Article  Google Scholar 

  4. 4.

    Blank M, Gorelick L, Shechtman E, Irani M, Basri R (2005) Actions as space-time shapes. Tenth IEEE Int Conf Comput Vis (ICCV'05) 1:1395–1402

    Article  Google Scholar 

  5. 5.

    S Chen, Y Shen, Y Yan, D Wang, S Zhu (2020) Cholesky decomposition based metric learning for video-based human action recognition, IEEE Access

  6. 6.

    Dai C, Liu X, Lai J (2020) Human action recognition using two-stream attention based LSTM networks. Appl Soft Comput 86:105820

    Article  Google Scholar 

  7. 7.

    Gu Y, Ye X, Sheng W, Ou Y, Li Y (2020) Multiple stream deep learning model for human action recognition. Image Vis Comput 93:103818

    Article  Google Scholar 

  8. 8.

    S Hiriyannaiah, B Akanksh, A Koushik, G Siddesh, K Srinivasa (2020) Deep Learning for Multimedia Data in IoT. Multimed Big Data Comput IoT Appl, ed: Springer, 101–129

  9. 9.

    Huang G-B, Zhou H, Ding X, Zhang R (2011) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst, Man, Cybernet, Part B (Cybernetics) 42:513–529

    Article  Google Scholar 

  10. 10.

    Hussain N, Khan MA, Sharif M, Khan SA, Albesher AA, Saba T et al (2020) A deep neural network and classical features based scheme for objects recognition: an application for machine inspection. Multimed Tools Appl. https://doi.org/10.1007/s11042-020-08852-3

  11. 11.

    Huynh-The T, Hua C-H, Ngo T-T, Kim D-S (2020) Image representation of pose-transition feature for 3D skeleton-based action recognition. Inf Sci 513:112–126

    Article  Google Scholar 

  12. 12.

    Khan M, Akram T, Sharif M, Muhammad N, Javed M, Naqvi S (2019) An improved strategy for human action recognition; experiencing a cascaded design. IET Image Process

  13. 13.

    Khan MA, Akram T, Sharif M, Javed MY, Muhammad N, Yasmin M (2019) An implementation of optimized framework for action classification using multilayers neural network on selected fused features. Pattern Anal Applic 22:1377–1397

    MathSciNet  Article  Google Scholar 

  14. 14.

    Khan MA, Javed K, Khan SA, Saba T, Habib U, Khan JA et al (2020) Human action recognition using fusion of multiview and deep features: an application to video surveillance. Multimed Tools Appl:1–27

  15. 15.

    S Kulkarni, S Jadhav, D Adhikari (2020) A Survey on Human Group Activity Recognition by Analysing Person Action from Video Sequences Using Machine Learning Techniques. Optim Mach Learn Appl, ed: Springer, 141–153

  16. 16.

    X Long, C Gan, G De Melo, J Wu, X Liu, S Wen (2018) Attention clusters: Purely attention based local feature integration for video classification," in Proc IEEE Conf Comput Vis Patt Recog 7834–7843

  17. 17.

    P-E Martin, J Benois-Pineau, R Péteri, J Morlier (2020) Fine grained sport action recognition with twin spatio-temporal convolutional neural networks: application to table tennis. Multimed Tools Appl 1–19

  18. 18.

    Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. J Big Data 2:1

    Article  Google Scholar 

  19. 19.

    Nazir S, Yousaf MH, Nebel J-C, Velastin SA (2018) A bag of expression framework for improved human action recognition. Pattern Recogn Lett 103:39–45

    Article  Google Scholar 

  20. 20.

    Ouyang X, Xu S, Zhang C, Zhou P, Yang Y, Liu G, Li X (2019) A 3D-CNN and LSTM based multi-task learning architecture for action recognition. IEEE Access 7:40757–40770

    Article  Google Scholar 

  21. 21.

    T Ozcan, A Basturk (2020) Human action recognition with deep learning and structural optimization using a hybrid heuristic algorithm. Clust Comput 1–14

  22. 22.

    MD Rodriguez, J Ahmed, M Shah (2008) Action mach a spatio-temporal maximum average correlation height filter for action recognition. 2008 IEEE Conf Comput Vis Patt Recog 1–8

  23. 23.

    Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. Proc 17th Int Conf Patt Recog, 2004 ICPR 2004:32–36

    Article  Google Scholar 

  24. 24.

    Sharif M, Khan MA, Akram T, Javed MY, Saba T, Rehman A (2017) A framework of human detection and action recognition based on uniform segmentation and combination of Euclidean distance and joint entropy-based features selection. EURASIP J Image Video Proc 2017:89

    Article  Google Scholar 

  25. 25.

    Sharif A, Khan MA, Javed K, Gulfam H, Iqbal T, Saba T et al (2019) Intelligent human action recognition: a framework of optimal features selection based on Euclidean distance and strong correlation. J Control Eng Appl Inform 21:3–11

    Google Scholar 

  26. 26.

    Sharif M, Attique M, Tahir MZ, Yasmim M, Saba T, Tanik UJ (2020) A Machine Learning Method with Threshold Based Parallel Feature Fusion and Feature Selection for Automated Gait Recognition. J Organ End User Comput (JOEUC) 32:67–92

    Article  Google Scholar 

  27. 27.

    Sharif M, Akram T, Raza M, Saba T, Rehman A (2020) Hand-crafted and deep convolutional neural network features fusion and selection strategy: an application to intelligent human action recognition. Appl Soft Comput 87:105986

    Article  Google Scholar 

  28. 28.

    Sharif M, Khan MA, Zahid F, Shah JH, Akram T (2020) Human action recognition: a framework of statistical weighted segmentation and rank correlation-based selection. Pattern Anal Applic 23:281–294

    Article  Google Scholar 

  29. 29.

    Siddiqui S, Khan MA, Bashir K, Sharif M, Azam F, Javed MY (2018) Human action recognition: a construction of codebook by discriminative features selection approach. Int J Appl Patt Recog 5:206–228

    Article  Google Scholar 

  30. 30.

    K Simonyan, A Zisserman (2014) Two-stream convolutional networks for action recognition in videos. Adv Neural Inf Proces Syst, 568–576

  31. 31.

    K Soomro, AR Zamir, M Shah (2012) UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402

  32. 32.

    Stoian A, Ferecatu M, Benois-Pineau J, Crucianu M (2015) Fast action localization in large-scale video archives. IEEE Trans Circ Syst Video Technol 26:1917–1930

    Article  Google Scholar 

  33. 33.

    L Sun, K Jia, D-Y Yeung, BE Shi (2015) Human action recognition using factorized spatio-temporal convolutional networks. Proc IEEE Int Conf Comput Vis 4597–4605

  34. 34.

    Tu NA, Huynh-The T, Khan KU, Lee Y-K (2018) ML-HDP: a hierarchical Bayesian nonparametric model for recognizing human actions in video. IEEE Trans Circ Syst Video Technol 29:800–814

    Article  Google Scholar 

  35. 35.

    Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2017) Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 6:1155–1166

    Article  Google Scholar 

  36. 36.

    Varol G, Laptev I, Schmid C (2017) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40:1510–1517

    Article  Google Scholar 

  37. 37.

    Vishwakarma DK (2020) A two-fold transformation model for human action recognition using decisive pose. Cogn Syst Res 61:1–13

    Article  Google Scholar 

  38. 38.

    L Wang, Y Qiao, X Tang (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. Proceedings of the IEEE conference on computer vision and pattern recognition 4305–4314

  39. 39.

    L Wang, Y Xiong, Z Wang, Y Qiao, D Lin, X Tang, et al. (2016) Temporal segment networks: Towards good practices for deep action recognition. Eur Conf Comput Vis 20–36

  40. 40.

    J Wang, X Peng, Y Qiao (2020) Cascade multi-head attention networks for action recognition. Comput Vis Image Understanding 102898

  41. 41.

    Xiong Q, Zhang J, Wang P, Liu D, Gao RX (2020) Transferable two-stream convolutional neural network for human action recognition. J Manuf Syst

  42. 42.

    Yi Y, Li A, Zhou X (2020) Human action recognition based on action relevance weighted encoding. Signal Process Image Commun 80:115640

    Article  Google Scholar 

  43. 43.

    Yudistira N, Kurita T (2020) Correlation net: spatiotemporal multimodal deep learning for action recognition. Signal Process Image Commun 82:115731

    Article  Google Scholar 

  44. 44.

    Zhang H-B, Zhang Y-X, Zhong B, Lei Q, Yang L, Du J-X et al (2019) A comprehensive survey of vision-based human action recognition methods. Sensors 19:1005

    Article  Google Scholar 

Download references


This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1F1A1058715).

Author information



Corresponding author

Correspondence to Sanghyun Seo.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Khan, M.A., Zhang, YD., Khan, S.A. et al. A resource conscious human action recognition framework using 26-layered deep convolutional neural network. Multimed Tools Appl (2020). https://doi.org/10.1007/s11042-020-09408-1

Download citation


  • Action recognition
  • CNN architecture
  • Features fusion
  • Features selection
  • ELM