Recognition of human actions using CNN-GWO: a novel modeling of CNN for enhancement of classification performance

Kumaran, N.; Vadivel, A.; Kumar, S. Saravana

doi:10.1007/s11042-017-5591-z

Published: 19 January 2018

Volume 77, pages 23115–23147, (2018)
Multimedia Tools and Applications

N. Kumaran¹,
A. Vadivel¹ &
S. Saravana Kumar²

Abstract

Recognizing human actions in unconstrained videos is a challenging computer vision task, largely because feature classification accuracy is low in such settings. Improving classification performance therefore requires minimizing classification errors. In this work, we propose a hybrid CNN-GWO approach for recognizing human actions in unconstrained videos. The weights of the proposed deep Convolutional Neural Network (CNN) classifiers are initialized from the solutions generated by the Grey Wolf Optimization (GWO) algorithm, which in turn minimizes the classification errors. Action bank and local spatio-temporal features are extracted from each video and fed into the CNN classifiers. During the fitness computation of each GWO search agent, the CNN classifiers are trained with gradient descent to reach a local minimum. The global search capability of GWO and the local search capability of gradient descent are thus combined to identify a solution close to the global optimum. Finally, classification performance is further enhanced by fusing the evidence of the classifiers produced by the GWO algorithm. The efficiency of the proposed classification framework for human action recognition is evaluated on four action recognition datasets: HMDB51, UCF50, Olympic Sports and VIRAT Release 2.0. The experimental results show that the proposed approach achieves improved human action recognition performance, with a recognition accuracy of 99.9%.
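The abstract describes the hybrid search only at a high level, so the following is a minimal pure-Python sketch of the general idea, not the authors' implementation. It rests on several assumptions: a logistic-regression model stands in for the CNN; the fitness of a search agent is the training loss reached after a fixed number of gradient-descent steps started from the agent's position (the agent's position itself is not overwritten by the refined weights); alpha, beta and delta are taken from the current pack at each iteration, a simplification of the standard Grey Wolf Optimizer of Mirjalili et al. (2014), which keeps the best-so-far leaders; and "evidence fusion" is approximated by averaging the probabilities of classifiers found in independent runs. All names (`fitness`, `gwo_search`, `fused_prob`) and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob(w, x):
    # w = [bias, w1, ..., wd]: a logistic model standing in for a CNN (assumption).
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def loss_and_grad(w, data):
    """Mean binary cross-entropy over (x, y) pairs and its gradient."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        p = predict_prob(w, x)
        pc = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= y * math.log(pc) + (1 - y) * math.log(1.0 - pc)
        err = p - y
        grad[0] += err
        for j, xj in enumerate(x):
            grad[j + 1] += err * xj
    n = len(data)
    return loss / n, [g / n for g in grad]


def fitness(w, data, steps=20, lr=0.5):
    """Local search: run gradient descent from the agent's position and
    score the agent by the loss reached. Returns (loss, refined weights)."""
    w = list(w)
    for _ in range(steps):
        _, g = loss_and_grad(w, data)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return loss_and_grad(w, data)[0], w


def gwo_search(data, dim, n_wolves=8, iters=15, bound=2.0, seed=0):
    """Global search with the GWO position update.
    Returns (best loss, refined weights of the best agent found)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(-bound, bound) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_loss, best_w = float("inf"), None
    for t in range(iters):
        scored = []
        for w in wolves:
            f, refined = fitness(w, data)
            scored.append((f, w))
            if f < best_loss:
                best_loss, best_w = f, refined
        # Alpha, beta and delta: the three fittest agents of the current pack.
        scored.sort(key=lambda s: s[0])
        leaders = [s[1] for s in scored[:3]]
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            new_w = []
            for j in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    total += leader[j] - A * D
                # Average of the leader-guided moves, clipped to the search box.
                new_w.append(max(-bound, min(bound, total / len(leaders))))
            new_wolves.append(new_w)
        wolves = new_wolves
    return best_loss, best_w


def fused_prob(models, x):
    # Evidence fusion assumed here to be a plain mean of probabilities.
    return sum(predict_prob(w, x) for w in models) / len(models)


if __name__ == "__main__":
    gen = random.Random(1)
    data = []
    for _ in range(60):
        x = [gen.uniform(-1, 1), gen.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    models = [gwo_search(data, dim=3, seed=s)[1] for s in range(3)]
    correct = sum((fused_prob(models, x) > 0.5) == (y == 1) for x, y in data)
    print(f"fused training accuracy: {correct / len(data):.2f}")
```

In this sketch GWO only chooses starting points, while gradient descent does the fine-grained fitting, which mirrors the global/local division of labour the abstract describes. Writing the refined weights back into the pack (a Lamarckian variant) is an equally plausible reading that the abstract does not settle.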

Acknowledgements

This work was partially supported by my supervisor, Dr. U. Srinivasulu Reddy, Assistant Professor, Department of Computer Applications, NIT Trichy, Tamilnadu, India.

Author information

Authors and Affiliations

Department of Computer Applications, NIT, Trichy, Tamilnadu, India
N. Kumaran & A. Vadivel
Sree Dattha Institute of Engineering and Science, Hyderabad, India
S. Saravana Kumar

Corresponding author

Correspondence to N. Kumaran.

About this article

Cite this article

Kumaran, N., Vadivel, A. & Kumar, S.S. Recognition of human actions using CNN-GWO: a novel modeling of CNN for enhancement of classification performance. Multimed Tools Appl 77, 23115–23147 (2018). https://doi.org/10.1007/s11042-017-5591-z

Received: 18 May 2017
Revised: 08 November 2017
Accepted: 27 December 2017
Published: 19 January 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-017-5591-z