Detecting Violent Robberies in CCTV Videos Using Deep Learning

  • Giorgio Morales
  • Itamar Salazar-Reque
  • Joel Telles
  • Daniel Díaz
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 559)


Video surveillance through security cameras remains inefficient in practice because many systems still rely on manual human inspection to identify violent or suspicious scenarios. The contribution of this paper is therefore twofold: the presentation of a video dataset called UNI-Crime, and the proposal of a method for detecting violent robberies in CCTV videos using a deep-learning sequence model. Each of the 30 frames of our videos passes through a pre-trained VGG-16 feature extractor; the resulting sequence of features is then processed by two convolutional long short-term memory (convLSTM) layers; finally, the last hidden state passes through a series of fully-connected layers to obtain a single classification result. The method is able to detect a variety of violent robberies (i.e., armed robberies involving firearms or knives, or robberies showing different levels of aggressiveness) with an accuracy of 96.69%.
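
To make the data flow described above concrete, the following is a minimal sketch that only tracks tensor shapes through the three stages (per-frame VGG-16 features, two convLSTM layers, fully-connected head). It is not the authors' implementation. The 224×224 input resolution, the 64 convLSTM filters, the 'same' padding, and the dense-layer widths are assumptions chosen for illustration; the abstract states only the 30-frame clip length, the VGG-16 extractor, the two convLSTM layers, and the single output. The names `Shape`, `vgg16_features`, `conv_lstm`, and `classifier_head` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shape:
    frames: int    # number of time steps in the clip
    height: int
    width: int
    channels: int


def vgg16_features(clip: Shape) -> Shape:
    """Shape after applying the VGG-16 convolutional base to every frame.

    The base has five 2x2 max-pooling stages, so height and width shrink
    by a factor of 32, and its last convolutional block has 512 channels.
    """
    return Shape(clip.frames, clip.height // 32, clip.width // 32, 512)


def conv_lstm(seq: Shape, filters: int, return_sequences: bool) -> Shape:
    """Shape after a convLSTM layer with 'same' padding (assumed).

    Spatial size is preserved and the channel count becomes `filters`.
    With return_sequences=False only the last hidden state is kept.
    """
    frames = seq.frames if return_sequences else 1
    return Shape(frames, seq.height, seq.width, filters)


def classifier_head(state: Shape, hidden_units: List[int]) -> List[int]:
    """Layer widths of the fully-connected head, ending in one output unit."""
    flat = state.frames * state.height * state.width * state.channels
    return [flat, *hidden_units, 1]


if __name__ == "__main__":
    clip = Shape(frames=30, height=224, width=224, channels=3)  # assumed size
    feats = vgg16_features(clip)
    h1 = conv_lstm(feats, filters=64, return_sequences=True)    # assumed 64
    h2 = conv_lstm(h1, filters=64, return_sequences=False)      # last state
    print(feats)
    print(h2)
    print(classifier_head(h2, hidden_units=[256, 64]))          # assumed widths
```

Under these assumptions, each frame is reduced to a 7×7×512 feature map, the second convLSTM layer returns a single 7×7×64 hidden state, and the head maps its 3136 flattened values to one robbery/no-robbery score.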


Keywords: Action recognition, convLSTM, Robbery detection



Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. National Institute of Research and Training in Telecommunications (INICTEL-UNI), National University of Engineering, Lima, Peru
