
Recognizing human violent action using drone surveillance within real-time proximity


The world is witnessing a significant rise in both reported and unnoticed acts of violence. In response, video surveillance can help by capturing otherwise unobserved actions that lead to violence, contributing to a more secure life. In everyday settings, such surveillance can be carried out efficiently by classifying activities in drone videos. Activity classification has already been applied in police work, video categorization, biometrics, and human–computer interaction. So far, however, no public dataset is available for violent activity classification using drone surveillance. This work therefore investigates the machine-driven recognition and classification of human actions from drone videos. The dataset for this study was recorded with drones at different heights in an unconstrained environment. The method first extracts key points and generates 2D skeletons for the persons in each frame. The extracted key points are then passed as features to a classification module that recognizes the actions. The classification models used in the proposed method are SVM (support vector machine) and Random Forest. Experimental results show that the SVM model with an RBF (radial basis function) kernel is more efficient for activity classification than previously proposed approaches and the other models tested. The work also analyzes the run-time performance of the proposed system and shows that it achieves real-time performance.
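The pipeline described above (2D key points used as features, then a kernel-based classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the centering and scaling step, the function names `skeleton_to_features` and `rbf_kernel`, and the `gamma` value are assumptions made for illustration. The sketch only shows how a skeleton might be turned into a feature vector and how the RBF kernel, K(u, v) = exp(-gamma * ||u - v||^2), scores the similarity of two such vectors; it does not train an SVM.

```python
import math


def skeleton_to_features(keypoints):
    """Flatten 2D key points into one feature vector.

    Assumed normalization (not taken from the paper): subtract the centroid
    and divide by the largest distance from it, so the vector does not depend
    on where the person is in the frame or how large they appear.
    """
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    centered = [(x - cx, y - cy) for x, y in keypoints]
    # Fall back to 1.0 if all points coincide, to avoid dividing by zero.
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [value / scale for point in centered for value in point]


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance between u and v)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Toy 4-point "skeletons" (hypothetical coordinates, for illustration only).
pose_a = [(0, 0), (0, 2), (1, 1), (-1, 1)]
# Same pose, three times larger and shifted, as if seen from a lower height.
pose_b = [(10, 20), (10, 26), (13, 23), (7, 23)]
# A different pose.
pose_c = [(0, 0), (0, 2), (2, 2), (-2, 2)]

features_a = skeleton_to_features(pose_a)
features_b = skeleton_to_features(pose_b)
features_c = skeleton_to_features(pose_c)

same_pose_similarity = rbf_kernel(features_a, features_b)
different_pose_similarity = rbf_kernel(features_a, features_c)
```

With this normalization, the shifted and rescaled copy of a pose maps to the same feature vector, so its kernel value is 1, while a different pose scores below 1. A kernel SVM builds its decision function from such similarity values between a new sample and the support vectors.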






The authors would like to thank everyone who helped them implement this project. The authors would also like to thank their organizations for the opportunity to work collaboratively.


The authors declare that there is no funding associated with this project.

Author information



Corresponding author

Correspondence to Ankit Vidyarthi.

Ethics declarations

Conflict of interest

The authors of this manuscript declare that there is no conflict of interest.

Ethics statement

The authors of this manuscript confirm that: (i) informed, written consent has been obtained from the relevant sources wherever required; (ii) all procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and its later amendments; (iii) approval and/or informed consent were obtained from human subjects wherever applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Srivastava, A., Badal, T., Garg, A. et al. Recognizing human violent action using drone surveillance within real-time proximity. J Real-Time Image Proc 18, 1851–1863 (2021).



Keywords

  • Video surveillance
  • Unconstrained environment
  • Drone videos
  • Key-point extraction
  • Activity classification