Recognition of human actions using CNN-GWO: a novel modeling of CNN for enhancement of classification performance

Multimedia Tools and Applications

Abstract

Recognizing human actions in unconstrained videos remains a major challenge in computer vision, largely because feature classification accuracy is low. Improving classification performance therefore requires minimizing classification errors. In this work, we propose a hybrid CNN-GWO approach for recognizing human actions in unconstrained videos. The weights of the proposed deep Convolutional Neural Network (CNN) classifiers are initialized from solutions generated by the Grey Wolf Optimization (GWO) algorithm, which in turn minimizes classification errors. Action bank and local spatio-temporal features are generated for each video and fed into the CNN classifiers. The CNN classifiers are trained by a gradient descent algorithm, which finds a local minimum during the fitness computation of the GWO search agents. The global search capability of GWO and the local search capability of gradient descent are thus combined to identify a solution close to the global optimum. Finally, classification performance is further enhanced by fusing the evidence of the classifiers produced by the GWO algorithm. The efficiency of the proposed classification framework is evaluated on four action recognition datasets: HMDB51, UCF50, Olympic Sports, and VIRAT Release 2.0. Experimental validation shows that the proposed approach achieves better results, with 99.9% recognition accuracy.
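The interplay described above (GWO proposing initial weights, gradient descent refining them inside the fitness computation, and the resulting classifiers being fused) can be illustrated with a minimal sketch. This is not the authors' implementation: a three-parameter logistic classifier on a hypothetical toy dataset stands in for the CNN and the video features, the position update follows the standard grey wolf optimizer equations, and the averaging rule in `fuse` is an assumption, since the abstract does not state how classifier evidence is combined. All names and parameter values (`DATA`, `gwo`, `fuse`, `n_wolves`, `bound`, and so on) are illustrative.

```python
import math
import random

# Hypothetical toy data standing in for video features: label 1 when x0 + x1 > 1.
DATA = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.4, 0.1), 0),
        ((0.9, 0.8), 1), ((1.0, 0.6), 1), ((0.7, 1.0), 1)]

def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))

def predict(w, x):
    # Logistic classifier (assumed stand-in for the CNN): w = (w0, w1, bias).
    return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])

def loss(w):
    # Mean cross-entropy, used here as the classification error to minimize.
    eps = 1e-12
    total = 0.0
    for x, y in DATA:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(DATA)

def gradient_descent(w, steps=20, lr=0.5):
    # Local search: a few full-batch gradient steps starting from w.
    w = list(w)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in DATA:
            err = predict(w, x) - y
            grad[0] += err * x[0]
            grad[1] += err * x[1]
            grad[2] += err
        w = [wi - lr * g / len(DATA) for wi, g in zip(w, grad)]
    return w

def fitness(w):
    # Fitness of a search agent = loss reached after local refinement.
    refined = gradient_descent(w)
    return loss(refined), refined

def gwo(n_wolves=8, n_iters=15, dim=3, bound=5.0, seed=0):
    # Global search: standard grey wolf position update around the three leaders.
    rng = random.Random(seed)
    wolves = [[rng.uniform(-bound, bound) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_loss, best_w = float("inf"), None
    for t in range(n_iters):
        # Each entry is (loss after refinement, refined weights, raw position).
        scored = sorted((fitness(w) + (w,) for w in wolves), key=lambda s: s[0])
        if scored[0][0] < best_loss:
            best_loss, best_w = scored[0][0], scored[0][1]
        leaders = [s[2] for s in scored[:3]]  # alpha, beta, delta positions
        a = 2.0 * (1.0 - t / n_iters)         # decreases linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                est = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * lead[d] - w[d])
                    est += lead[d] - A * D
                # Average the three leader-guided estimates and keep within bounds.
                pos.append(max(-bound, min(bound, est / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves
    return best_loss, best_w

def fuse(classifiers, x):
    # Assumed fusion rule: average the class-1 probabilities of all classifiers.
    return sum(predict(w, x) for w in classifiers) / len(classifiers)

if __name__ == "__main__":
    best_loss, best_w = gwo()
    ensemble = [gwo(seed=s)[1] for s in range(3)]
    print(best_loss, fuse(ensemble, (0.9, 0.9)))
```

In this sketch the wolves keep their unrefined positions between iterations and only the refined weights of the best agent are returned; writing the refined weights back into the pack would be an equally plausible reading of the abstract.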

Acknowledgements

This work was partially supported by my supervisor, Dr. U. Srinivasulu Reddy, Assistant Professor, Department of Computer Applications, NIT Trichy, Tamil Nadu, India.

Author information

Corresponding author

Correspondence to N. Kumaran.

About this article

Cite this article

Kumaran, N., Vadivel, A. & Kumar, S.S. Recognition of human actions using CNN-GWO: a novel modeling of CNN for enhancement of classification performance. Multimed Tools Appl 77, 23115–23147 (2018). https://doi.org/10.1007/s11042-017-5591-z

  • DOI: https://doi.org/10.1007/s11042-017-5591-z
