
Efficiency in Human Actions Recognition in Video Surveillance Using 3D CNN and DenseNet

  • Conference paper
  • First Online:
Advances in Information and Communication (FICC 2022)

Abstract

Human action recognition in video is a topic of growing interest in the computing research community, both because of its application to real problems in domains such as video surveillance, medicine, and psychiatry, and because of the proliferation of video-capture devices around the world. Extracting features from video and then classifying or recognizing actions is a complex task: the input spans a spatial dimension (the frame) and a temporal dimension, so the volume of input data grows rapidly and the problem becomes challenging. Two main approaches exist for recognizing human actions in video: handcrafted approaches based on optical flow, and approaches based on deep learning. The latter have achieved high accuracy, but their high computational cost makes them difficult to apply in specific domains and nearly impossible in real-time scenarios. We therefore propose a deep-learning architecture for human action recognition in video, oriented to the video-surveillance domain and to real-time operation; the proposal combines 3D CNN and DenseNet techniques. The results show that the proposal is efficient and can be used for real-time video surveillance. We also propose general guidelines on the minimum resolution and frames per second that guarantee recognition.
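The two building blocks the abstract names have simple shape arithmetic behind them: a 3D convolution slides a kernel over height, width, and time, and a DenseNet block concatenates each layer's output onto its input, growing the channel count linearly with the growth rate. The sketch below illustrates only that arithmetic; the clip size, kernel size, and growth rate are illustrative assumptions, not the paper's actual configuration.

```python
def conv3d_out(size, kernel, stride=1, padding=0):
    """Output length along one axis (time, height, or width) of a 3D convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def dense_block_channels(c_in, num_layers, growth_rate):
    """Channels after a DenseNet block: each layer concatenates growth_rate new maps."""
    return c_in + num_layers * growth_rate

# Example: a 16-frame 112x112 clip through a 3x3x3 convolution, stride 1, padding 1
t = conv3d_out(16, 3, stride=1, padding=1)    # temporal axis -> 16
h = conv3d_out(112, 3, stride=1, padding=1)   # spatial axis -> 112
assert (t, h) == (16, 112)

# A 4-layer dense block with growth rate 12, starting from 64 channels
assert dense_block_channels(64, 4, 12) == 112
```

Because dense connectivity reuses earlier feature maps instead of recomputing them, each layer can be narrow (small growth rate), which is one reason DenseNet-style backbones keep parameter counts and computational cost low enough for real-time use.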



Author information

Correspondence to Herwin Alayn Huillcen Baca.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Huillcen Baca, H.A., Gutierrez Caceres, J.C., de Luz Palomino Valdivia, F. (2022). Efficiency in Human Actions Recognition in Video Surveillance Using 3D CNN and DenseNet. In: Arai, K. (eds) Advances in Information and Communication. FICC 2022. Lecture Notes in Networks and Systems, vol 438. Springer, Cham. https://doi.org/10.1007/978-3-030-98012-2_26
