Abstract
Human activity recognition (HAR) involves predicting the type of movement a person performs, based on raw data captured from wearable or vision-based sensors. HAR systems have attracted great interest over time owing to their wide applications in healthcare, surveillance, and the emerging metaverse. This article proposes an artificial intelligence (AI) based system that uses a hybrid deep learning model to recognise various human activities from video footage. Deep convolutional layers and a recurrent neural network (RNN) are combined into a hybrid deep learning model known as a convolutional recurrent neural network (CRNN). The convolutional layers of the deep convolutional neural network Inception V3 extract feature values from the video frames corresponding to each human activity, and each resulting feature vector is classified into the appropriate activity class by the gated recurrent unit (GRU) variant of the RNN classifier. The performance of the proposed HAR system has been evaluated on three widely used public datasets: KTH, UCF101, and the UCF sports action dataset. Because the GRU classifier can store and remember a long temporal sequence of video frames, capturing the pattern of a human activity over a long duration, experimental results show that the proposed hybrid deep learning based HAR system outperforms the state-of-the-art methods in this research domain.
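The following is a minimal sketch of the CRNN pipeline described above, not the authors' released code: frozen Inception V3 convolutional layers extract one feature vector per video frame, and a GRU classifies the resulting frame-feature sequence. The frame count, GRU width, and six-class output are assumptions for illustration (six matches the KTH activity classes).

```python
# Hedged sketch of a CRNN for video-based HAR: per-frame Inception V3
# features fed to a GRU classifier. Hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 20    # assumed number of frames sampled per video clip
NUM_CLASSES = 6    # e.g., the six KTH activity classes

# Inception V3 without its classification head; global average pooling
# turns each 299x299 RGB frame into a 2048-dimensional feature vector.
cnn = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(299, 299, 3))
cnn.trainable = False  # use the pretrained conv layers as a fixed extractor

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, 299, 299, 3)),
    # Apply the same CNN to every frame of the clip.
    layers.TimeDistributed(cnn),
    # The GRU summarizes the temporal pattern across frame features.
    layers.GRU(256),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Frames would be preprocessed with `tf.keras.applications.inception_v3.preprocess_input` before being fed to the network; freezing the CNN and training only the GRU head is one common transfer-learning choice, and the paper's exact training regime may differ.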
Data Availability
This manuscript has no associated data.
Ethics declarations
Conflicts of interest
The authors have no conflicts of interest or competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Ullah, M.S., Ghosh, R. An approach combining convolutional layers and gated recurrent unit to recognize human activities. Multimed Tools Appl 83, 56489–56516 (2024). https://doi.org/10.1007/s11042-023-17697-5