An approach combining convolutional layers and gated recurrent unit to recognize human activities

Multimedia Tools and Applications

Abstract

Human activity recognition (HAR) involves predicting the type of movement a human performs, based on raw data captured from wearable sensors or vision-based sensors. HAR systems have attracted great interest over time due to their wide applications in healthcare, surveillance, and the coming generation of the metaverse. This article proposes an artificial intelligence (AI) based system that uses a hybrid deep learning model to recognize various human activities from video footage. Deep convolutional layers and a recurrent neural network (RNN) have been combined to form the hybrid deep learning model, known as a convolutional recurrent neural network (CRNN). The deep convolutional layers of the deep convolutional neural network Inception V3 have been used to extract feature values from the video frames corresponding to each human activity, and each generated feature vector has been classified into the appropriate activity class by the gated recurrent unit (GRU) variant of the RNN classifier. The performance of the proposed HAR system has been evaluated on three widely used public datasets: KTH, UCF101, and the UCF sports action dataset. Because the GRU variant of the RNN classifier can store and remember a long temporal sequence of video frames, capturing the pattern of a human activity over a long duration, the experimental results show that the proposed hybrid deep learning based HAR system outperforms the state-of-the-art methods in this research domain.
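The pipeline the abstract describes maps naturally onto a short Keras model. The sketch below is not the authors' implementation; it is a minimal illustration under assumed settings (a TensorFlow/Keras setup, the sequence length SEQ_LEN, a GRU width of 256, and KTH's six activity classes): a frozen Inception V3 backbone produces one feature vector per frame, and a GRU classifies the resulting frame-feature sequence.

```python
# Minimal sketch (not the authors' code) of the CRNN described above:
# InceptionV3 extracts a feature vector per video frame, and a GRU
# classifies the frame sequence into an activity class. SEQ_LEN, the
# GRU width, and NUM_CLASSES are assumptions, not values from the paper.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

SEQ_LEN = 30        # assumed number of sampled frames per clip
NUM_CLASSES = 6     # e.g. the six KTH activity classes

# Frozen InceptionV3 backbone; global average pooling yields a
# 2048-dimensional feature vector per frame.
backbone = InceptionV3(include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, 299, 299, 3)),
    # Apply the CNN to every frame of the sequence independently.
    layers.TimeDistributed(backbone),
    # The GRU consumes the temporal sequence of frame features.
    layers.GRU(256),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone reflects the transfer-learning role Inception V3 plays here as a fixed feature extractor; the paper's exact frame sampling, input size, and training details may differ.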


Data Availability

The manuscript has no associated data.


Author information

Corresponding author

Correspondence to Rajib Ghosh.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest or competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ullah, M.S., Ghosh, R. An approach combining convolutional layers and gated recurrent unit to recognize human activities. Multimed Tools Appl 83, 56489–56516 (2024). https://doi.org/10.1007/s11042-023-17697-5
