Multimedia Tools and Applications, Volume 75, Issue 12, pp 7327–7349

A new method for violence detection in surveillance scenes

  • Tao Zhang
  • Zhijie Yang
  • Wenjing Jia
  • Baoqing Yang
  • Jie Yang
  • Xiangjian He


Violence detection is an active topic in surveillance research, yet it has been studied far less than action recognition. Existing vision-based methods focus mainly on detecting violence and make little effort to localize it. In this paper, we propose a fast and robust framework for detecting and localizing violence in surveillance scenes. A Gaussian Model of Optical Flow (GMOF) is proposed to extract candidate violence regions, which are adaptively modeled as deviations from the normal crowd behavior observed in the scene. Violence detection is then performed on each video volume constructed by densely sampling the candidate violence regions. To distinguish violent events from nonviolent ones, we also propose a novel descriptor, named Orientation Histogram of Optical Flow (OHOF), which is fed into a linear SVM for classification. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in both detection accuracy and processing speed, even in crowded scenes.
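The abstract names two building blocks without giving their formulation: a Gaussian model over optical flow that flags deviations from normal motion, and an orientation histogram computed from optical flow. The following Python sketch illustrates the general idea only; it is not the authors' GMOF or OHOF. The class `RunningGaussian`, the function `orientation_histogram`, and the parameters `learning_rate`, `min_var`, `k`, and `bins` are all hypothetical names and values chosen for illustration, and a single scalar per region (for example, mean flow magnitude) is an assumed simplification.

```python
import math


class RunningGaussian:
    """Online Gaussian over one scalar, e.g. mean flow magnitude in a region.

    Illustrative assumption, not the paper's GMOF formulation.
    """

    def __init__(self, learning_rate=0.05, init_var=1.0, min_var=1e-4):
        self.mean = 0.0
        self.var = init_var
        self.lr = learning_rate
        self.min_var = min_var  # floor so the variance cannot collapse to zero

    def is_outlier(self, x, k=3.0):
        # Flag values more than k standard deviations from the running mean.
        return abs(x - self.mean) > k * math.sqrt(self.var)

    def update(self, x):
        # Exponentially weighted update of mean and variance.
        diff = x - self.mean
        self.mean += self.lr * diff
        self.var = max((1.0 - self.lr) * self.var + self.lr * diff * diff,
                       self.min_var)


def orientation_histogram(flow, bins=8):
    """Magnitude-weighted histogram of flow directions, normalised to sum to 1.

    `flow` is an iterable of (dx, dy) vectors. Illustrative assumption,
    not the paper's OHOF descriptor.
    """
    hist = [0.0] * bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # zero vectors have no direction
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        idx = min(int(angle / (2.0 * math.pi) * bins), bins - 1)
        hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

In a pipeline of the kind the abstract describes, regions flagged by `is_outlier` would serve as candidates, and histograms computed over the sampled video volumes would form the feature vectors passed to a linear classifier; that classifier is omitted here.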


Action recognition; Violence detection; Surveillance scenes; Gaussian model of optical flow (GMOF); Orientation histogram of optical flow (OHOF)



This research is partly supported by NSFC, China (No: 61273258, 61105001).



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
  2. Faculty of Engineering and Information Technology, University of Technology, Sydney, Sydney, Australia
