
Efficiency in Human Actions Recognition in Video Surveillance Using 3D CNN and DenseNet

  • Conference paper
  • First Online:
Advances in Information and Communication (FICC 2022)

Abstract

Human action recognition in video is a topic of growing interest in the computing research community, both because of its application to real problems in domains such as video surveillance, medicine, and psychiatry, and because of the proliferation of video-capture devices around the world. Extracting features from video and then classifying or recognizing actions is a complex task: the input spans a spatial dimension (the frame) and a temporal dimension, so the volume of input data grows rapidly and the problem becomes challenging. Two main approaches exist for recognizing human actions in video: handcrafted approaches based on optical flow, and approaches based on deep learning. The latter have achieved high accuracy, but their high computational cost makes them difficult to apply in specific domains and nearly impossible in real-time scenarios. We therefore propose a deep-learning architecture for human action recognition in video, oriented to the video-surveillance domain and to real-time operation; the proposal combines 3D CNN and DenseNet techniques. The results show that the proposal is efficient and can be used for real-time video surveillance. We also propose general guidelines on the minimum resolution and frames per second that guarantee recognition.
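The two building blocks the abstract names have simple shape arithmetic behind them: a 3D convolution slides a kernel over height, width, and time, and a DenseNet block concatenates each layer's output onto its input, growing the channel count linearly with the growth rate. The sketch below illustrates only that arithmetic; the clip size, kernel size, and growth rate are illustrative assumptions, not the paper's actual configuration.

```python
def conv3d_out(size, kernel, stride=1, padding=0):
    """Output length along one axis (time, height, or width) of a 3D convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def dense_block_channels(c_in, num_layers, growth_rate):
    """Channels after a DenseNet block: each layer concatenates growth_rate new maps."""
    return c_in + num_layers * growth_rate

# Example: a 16-frame 112x112 clip through a 3x3x3 convolution, stride 1, padding 1
t = conv3d_out(16, 3, stride=1, padding=1)    # temporal axis -> 16
h = conv3d_out(112, 3, stride=1, padding=1)   # spatial axis -> 112
assert (t, h) == (16, 112)

# A 4-layer dense block with growth rate 12, starting from 64 channels
assert dense_block_channels(64, 4, 12) == 112
```

Because dense connectivity reuses earlier feature maps instead of recomputing them, each layer can be narrow (small growth rate), which is one reason DenseNet-style backbones keep parameter counts and computational cost low enough for real-time use.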



Author information

Correspondence to Herwin Alayn Huillcen Baca.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Huillcen Baca, H.A., Gutierrez Caceres, J.C., de Luz Palomino Valdivia, F. (2022). Efficiency in Human Actions Recognition in Video Surveillance Using 3D CNN and DenseNet. In: Arai, K. (eds) Advances in Information and Communication. FICC 2022. Lecture Notes in Networks and Systems, vol 438. Springer, Cham. https://doi.org/10.1007/978-3-030-98012-2_26
