Computer Vision with Deep Learning for Human Activity Recognition: Features Representation

Haddad, Laila El; Hanoune, Mostafa; Ettaoufik, Abdelaziz

doi:10.1007/978-3-031-50300-9_3

Part of the book series: Synthesis Lectures on Engineering, Science, and Technology (SLEST)

Abstract

Deep learning (DL) using artificial neural networks has made remarkable progress, fueled by powerful GPUs and the availability of copious online data. This progress has made computers far more capable across many fields, with computer vision being a prominent area of research and development (R&D). In particular, human activity recognition plays a pivotal role in applications such as healthcare monitoring, surveillance and security systems, and human–machine interfaces. However, challenges persist in unconstrained environments, including occlusions, variations in clothing, and background noise, which make these tasks difficult to solve. This review article offers a succinct examination of deep learning algorithms, with a specific emphasis on convolutional neural networks (CNNs), which have been proposed as a solution to classical artificial intelligence problems. It also surveys the notable outcomes and contributions of the methodologies that have been explored for human activity classification using DL techniques. In conclusion, the article emphasizes the potential of a hybrid approach that combines convolutional and recurrent neural networks (RNNs) in future solutions for human action/activity recognition. By combining the strengths of CNNs in extracting spatial features and of RNNs in capturing temporal dependencies, hybrid CNN-RNN models hold promise for analyzing video data effectively, leading to improved accuracy in classifying human activities. Ongoing research aims to further enhance these hybrid models to tackle the challenges of unconstrained environments and advance the field of human activity recognition.
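The hybrid CNN-RNN idea summarized in the abstract can be illustrated with a minimal sketch. The code below is not the chapter's model: it is a toy written in plain Python (standard library only) under several assumptions, namely grayscale frames stored as nested lists, a single convolutional layer with global average pooling as the spatial stage, a plain Elman-style recurrent cell without biases as the temporal stage, and random untrained weights. All names (`conv2d_valid`, `spatial_features`, `rnn_step`, `classify_clip`, `random_params`) are hypothetical. A practical system would use a deep learning framework, a deeper CNN, and typically an LSTM or GRU cell.

```python
import math
import random


def conv2d_valid(frame, kernel):
    """Valid 2D cross-correlation of one grayscale frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def spatial_features(frame, kernels):
    """CNN stage: convolution, ReLU, global average pooling (one value per kernel)."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(frame, kernel)
        activated = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(activated) / len(activated))
    return feats


def rnn_step(x, h, w_xh, w_hh):
    """RNN stage: h_t = tanh(W_xh x_t + W_hh h_{t-1}), biases omitted."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[k], x))
            + sum(w * hi for w, hi in zip(w_hh[k], h))
        )
        for k in range(len(h))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frames, params):
    """Run the CNN on every frame, feed the features to the RNN in order,
    and classify the clip from the final hidden state."""
    h = [0.0] * len(params["w_hh"])
    for frame in frames:
        x = spatial_features(frame, params["kernels"])
        h = rnn_step(x, h, params["w_xh"], params["w_hh"])
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in params["w_out"]]
    return softmax(logits)


def random_params(n_kernels=2, ksize=3, hidden=4, n_classes=3, seed=0):
    """Untrained random weights, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": [mat(ksize, ksize) for _ in range(n_kernels)],
        "w_xh": mat(hidden, n_kernels),
        "w_hh": mat(hidden, hidden),
        "w_out": mat(n_classes, hidden),
    }


# A synthetic "video": 5 frames of 8x8 random pixel intensities.
rng = random.Random(1)
clip = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
probs = classify_clip(clip, random_params())
print(probs)  # one probability per (hypothetical) activity class
```

Because the weights are random, the printed probabilities carry no meaning. The sketch only shows the data flow: per-frame spatial features are computed independently, and the recurrent state is the only place where information from earlier frames reaches the final prediction.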

Author information

Authors and Affiliations

LTIM, Faculty of Sciences Ben M’sik, Hassan II University, Casablanca, Morocco
Laila El Haddad, Mostafa Hanoune & Abdelaziz Ettaoufik

Corresponding author

Correspondence to Laila El Haddad.

Editor information

Editors and Affiliations

Faculty of Law, Economics and Social Sciences, Hassan II University, Casablanca, Morocco
Aziza Chakir
Information Systems, Universitas Bunda Mulia, Jakarta, Indonesia
Johanes Fernandes Andry
Department of Computer Science, Faculty of Computing and Artificial Intelligence, Air University, Islamabad, Pakistan
Arif Ullah
Department of Management Studies, Vaish College of Engineering, Rohtak, India
Rohit Bansal
Department of Mathematics and Computer Science, University of Hassan II, Casablanca, Morocco
Mohamed Ghazouani

About this chapter

Cite this chapter

Haddad, L.E., Hanoune, M., Ettaoufik, A. (2024). Computer Vision with Deep Learning for Human Activity Recognition: Features Representation. In: Chakir, A., Andry, J.F., Ullah, A., Bansal, R., Ghazouani, M. (eds) Engineering Applications of Artificial Intelligence. Synthesis Lectures on Engineering, Science, and Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-50300-9_3

DOI: https://doi.org/10.1007/978-3-031-50300-9_3
Published: 20 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50299-6
Online ISBN: 978-3-031-50300-9
eBook Packages: Synthesis Collection of Technology (R0)
