Computer Vision with Deep Learning for Human Activity Recognition: Features Representation

  • Chapter
Engineering Applications of Artificial Intelligence

Abstract

Deep learning (DL) with artificial neural networks has made remarkable progress, fueled by powerful GPUs and the availability of copious online data. This progress has substantially improved machine performance across many fields, with computer vision being a prominent area of research and development (R&D). In particular, human activity recognition plays a pivotal role in applications such as healthcare monitoring, surveillance and security systems, and human–machine interfaces. However, challenges persist in unconstrained environments, where occlusions, variations in clothing, and background noise make the task difficult. This review article offers a succinct examination of deep learning algorithms, with a specific emphasis on convolutional neural networks (CNNs), which have been proposed as a solution to classical artificial intelligence problems. It also surveys the notable outcomes and contributions of various DL-based methodologies for human activity classification. In conclusion, the article emphasizes the potential of a hybrid approach that combines convolutional and recurrent neural networks (RNNs) for future human action/activity recognition. By combining the strength of CNNs in extracting spatial features with that of RNNs in capturing temporal dependencies, hybrid CNN-RNN models hold promise for analyzing video data effectively and improving accuracy in classifying human activities. Ongoing research aims to further enhance these hybrid models to tackle the challenges of unconstrained environments and advance the field of human activity recognition.
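
The hybrid idea summarized in the abstract can be illustrated with a minimal sketch in plain Python: a convolutional stage turns each video frame into a small spatial feature vector, and a recurrent stage folds those vectors into a hidden state over time. This is not the chapter's implementation. Every function name, kernel, and weight below is a hypothetical placeholder, the weights are fixed rather than trained, and a practical system would use a deep learning framework with learned parameters.

```python
import math


def conv2d_valid(frame, kernel):
    """Valid 2-D cross-correlation of one grayscale frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def spatial_feature(frame, kernels):
    """CNN stage: convolve, apply ReLU, then global-average-pool each map."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(frame, kernel)
        vals = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(vals) / len(vals))
    return feats


def rnn_step(h, x, w_xh, w_hh):
    """RNN stage: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""
    return [
        math.tanh(
            sum(w_xh[i][j] * x[j] for j in range(len(x)))
            + sum(w_hh[i][j] * h[j] for j in range(len(h)))
        )
        for i in range(len(h))
    ]


def classify_clip(frames, kernels, w_xh, w_hh, w_out):
    """Run the CNN on every frame, the RNN across frames, then score classes."""
    h = [0.0] * len(w_hh)
    for frame in frames:
        h = rnn_step(h, spatial_feature(frame, kernels), w_xh, w_hh)
    scores = [sum(w * hj for w, hj in zip(row, h)) for row in w_out]
    return scores.index(max(scores)), h


if __name__ == "__main__":
    # Hypothetical toy clip: four identical 3x3 frames, two 2x2 kernels,
    # a 3-unit hidden state, and two activity classes.
    frame = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
    kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    w_xh = [[0.5, -0.5], [0.25, 0.25], [1.0, 0.0]]
    w_hh = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
    w_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    label, hidden = classify_clip([frame] * 4, kernels, w_xh, w_hh, w_out)
    print(label, hidden)
```

The split mirrors the division of labor the abstract describes: `spatial_feature` sees one frame at a time and knows nothing about order, while `rnn_step` carries information from earlier frames forward through the hidden state.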

Author information

Corresponding author

Correspondence to Laila El Haddad.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Haddad, L.E., Hanoune, M., Ettaoufik, A. (2024). Computer Vision with Deep Learning for Human Activity Recognition: Features Representation. In: Chakir, A., Andry, J.F., Ullah, A., Bansal, R., Ghazouani, M. (eds) Engineering Applications of Artificial Intelligence. Synthesis Lectures on Engineering, Science, and Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-50300-9_3

  • DOI: https://doi.org/10.1007/978-3-031-50300-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50299-6

  • Online ISBN: 978-3-031-50300-9

  • eBook Packages: Synthesis Collection of Technology (R0)
