An approach combining convolutional layers and gated recurrent unit to recognize human activities

Multimedia Tools and Applications

Abstract

Human activity recognition (HAR) involves predicting the type of movement a human performs, based on raw data captured from wearable sensors or vision-based sensors. HAR systems have attracted great interest over time due to their wide applications in healthcare, surveillance, and the coming generation of the metaverse. This article proposes an artificial intelligence (AI) based system that uses a hybrid deep learning model to recognize various human activities from video footage. Deep convolutional layers and a recurrent neural network (RNN) have been combined to form the hybrid deep learning model, known as a convolutional recurrent neural network (CRNN). The deep convolutional layers of the deep convolutional neural network Inception V3 have been used to extract feature values from the video frames corresponding to each human activity, and each generated feature vector has been classified into the appropriate activity class by the gated recurrent unit (GRU) variant of the RNN classifier. The performance of the proposed HAR system has been evaluated on three widely used public datasets: KTH, UCF101, and the UCF sports action dataset. Because the GRU variant of the RNN classifier can store and remember a long temporal sequence of video frames, capturing the pattern of a human activity over a long duration, the experimental results show that the proposed hybrid deep learning based HAR system outperforms the state-of-the-art methods in this research domain.
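The pipeline the abstract describes maps naturally onto a short Keras model. The sketch below is not the authors' implementation; it is a minimal illustration under assumed settings (a TensorFlow/Keras setup, the sequence length SEQ_LEN, a GRU width of 256, and KTH's six activity classes): a frozen Inception V3 backbone produces one feature vector per frame, and a GRU classifies the resulting frame-feature sequence.

```python
# Minimal sketch (not the authors' code) of the CRNN described above:
# InceptionV3 extracts a feature vector per video frame, and a GRU
# classifies the frame sequence into an activity class. SEQ_LEN, the
# GRU width, and NUM_CLASSES are assumptions, not values from the paper.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

SEQ_LEN = 30        # assumed number of sampled frames per clip
NUM_CLASSES = 6     # e.g. the six KTH activity classes

# Frozen InceptionV3 backbone; global average pooling yields a
# 2048-dimensional feature vector per frame.
backbone = InceptionV3(include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, 299, 299, 3)),
    # Apply the CNN to every frame of the sequence independently.
    layers.TimeDistributed(backbone),
    # The GRU consumes the temporal sequence of frame features.
    layers.GRU(256),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone reflects the transfer-learning role Inception V3 plays here as a fixed feature extractor; the paper's exact frame sampling, input size, and training details may differ.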


Data Availability

The manuscript has no associated data.


Author information

Corresponding author

Correspondence to Rajib Ghosh.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest or competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ullah, M.S., Ghosh, R. An approach combining convolutional layers and gated recurrent unit to recognize human activities. Multimed Tools Appl 83, 56489–56516 (2024). https://doi.org/10.1007/s11042-023-17697-5
