Violence Detection in Video Using Computer Vision Techniques

  • Enrique Bermejo Nievas
  • Oscar Deniz Suarez
  • Gloria Bueno García
  • Rahul Sukthankar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6855)


Whereas the action recognition community has focused mostly on detecting simple actions such as clapping, walking or jogging, the detection of fights or, more generally, aggressive behavior has received comparatively little study. Such a capability could be extremely useful in video surveillance scenarios such as prisons, psychiatric or elderly care centers, or even in camera phones. After an analysis of previous approaches, we test the well-known Bag-of-Words framework used for action recognition on the specific problem of fight detection, along with two of the best action descriptors currently available: STIP and MoSIFT. To support evaluation and to foster research on violence detection in video, we introduce a new video database containing 1000 sequences divided into two groups: fights and non-fights. Experiments on this database and on another one with fights from action movies show that fights can be detected with nearly 90% accuracy.
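
As a rough illustration of the Bag-of-Words idea mentioned in the abstract, the following minimal Python sketch quantizes local descriptors into visual words, turns each clip into a word histogram, and labels a new clip by its most similar training histogram. It is a sketch under stated assumptions, not the authors' implementation: random vectors stand in for STIP or MoSIFT descriptors (whose extraction is not shown), the vocabulary is built with a plain k-means, and the nearest-neighbour rule with histogram intersection is an illustrative choice of classifier. All function names, including the `make_clip` helper, are hypothetical.

```python
import numpy as np


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain k-means over local descriptors; returns k centres (visual words)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].astype(float)
    for _ in range(iters):
        # Squared distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[nearest == j]
            if len(members):  # leave an empty cluster's centre where it is
                centres[j] = members.mean(axis=0)
    return centres


def bow_histogram(descriptors, centres):
    """Normalized histogram of nearest visual words for one clip."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(centres))
    return counts / counts.sum()


def histogram_intersection(h1, h2):
    """Similarity between two histograms: sum of bin-wise minima."""
    return float(np.minimum(h1, h2).sum())


def classify(query_hist, train_hists, train_labels):
    """Assumed classifier: label of the most similar training histogram."""
    sims = [histogram_intersection(query_hist, h) for h in train_hists]
    return train_labels[int(np.argmax(sims))]


def make_clip(rng, fight, n=200, dim=8):
    """Hypothetical stand-in for real descriptors extracted from one clip.

    "Fight" clips draw most descriptors from one Gaussian blob and
    "non-fight" clips draw most from another; this is synthetic data only.
    """
    p = 0.8 if fight else 0.2
    from_a = rng.random(n) < p
    means = np.where(from_a[:, None], 3.0, -3.0)
    return means + rng.normal(size=(n, dim))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train_clips = [make_clip(rng, fight=True) for _ in range(5)]
    train_clips += [make_clip(rng, fight=False) for _ in range(5)]
    train_labels = ["fight"] * 5 + ["non-fight"] * 5

    centres = build_vocabulary(np.vstack(train_clips), k=4)
    train_hists = [bow_histogram(c, centres) for c in train_clips]

    query = bow_histogram(make_clip(rng, fight=True), centres)
    print(classify(query, train_hists, train_labels))
```

In a real pipeline the synthetic clips would be replaced by descriptors computed from video, and the vocabulary size and classifier would be tuned on held-out data.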


Keywords: action recognition, fight detection, video surveillance





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Enrique Bermejo Nievas (1)
  • Oscar Deniz Suarez (1)
  • Gloria Bueno García (1)
  • Rahul Sukthankar (2)
  1. E.T.S.I. Industriales, Universidad de Castilla-La Mancha, Ciudad Real, Spain
  2. Intel Labs Pittsburgh and Robotics Institute, Carnegie Mellon, USA
