MoWLD: a robust motion image descriptor for violence detection

Multimedia Tools and Applications

Abstract

Automatic violence detection from video is an active research topic with many video surveillance applications. However, there has been little success in designing an algorithm that can detect violence in surveillance videos with high performance. Existing methods typically apply the Bag-of-Words (BoW) model to local spatiotemporal descriptors. However, traditional spatiotemporal features are not discriminative enough, and the BoW model coarsely assigns each feature vector to only one visual word and therefore ignores the spatial relationships among the features. To tackle these problems, we propose a novel Motion Weber Local Descriptor (MoWLD), in the spirit of the well-known Weber Local Descriptor (WLD), as a powerful and robust descriptor for motion images. We extend the spatial WLD description by adding a temporal component to the appearance descriptor, which implicitly captures local motion information as well as low-level image appearance information. To eliminate redundant and irrelevant features, non-parametric Kernel Density Estimation (KDE) is applied to the MoWLD descriptor. To obtain more discriminative features, we adopt a sparse coding and max pooling scheme to further process the selected MoWLDs. Experimental results on three benchmark datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
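As a rough illustration of two building blocks named in the abstract, the sketch below computes the standard WLD differential excitation on a grayscale frame and max-pools a set of sparse codes. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the MoWLD formulas, so `temporal_excitation` is a hypothetical Weber-style frame difference, the `eps` guard against division by zero is an added assumption, and pooling over absolute code values is one common convention that may differ from the paper's.

```python
import math


def differential_excitation(img, eps=1e-6):
    """WLD differential excitation for the interior pixels of a 2D image.

    For a centre pixel x_c with 8 neighbours x_i:
        xi(x_c) = arctan( sum_i (x_i - x_c) / x_c )
    `eps` is an assumption added here to avoid division by zero.
    Border pixels are left at 0.0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            diff = sum(
                img[y + dy][x + dx] - c
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            out[y][x] = math.atan(diff / (c + eps))
    return out


def temporal_excitation(prev, curr, eps=1e-6):
    """Hypothetical temporal analogue (not taken from the paper):
    relative intensity change between two consecutive frames."""
    return [
        [math.atan((c - p) / (p + eps)) for p, c in zip(row_p, row_c)]
        for row_p, row_c in zip(prev, curr)
    ]


def max_pool(codes):
    """Max pooling over a list of code vectors: per-dimension maximum
    of the absolute values (an assumed, commonly used convention)."""
    return [max(abs(v) for v in column) for column in zip(*codes)]
```

In such a pipeline, the pooled vector from `max_pool` would serve as the clip-level feature passed to a classifier; the KDE-based feature selection step described in the abstract is omitted here because the abstract gives no details on how it is applied.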

Acknowledgments

This research was partly supported by NSFC, China (No: 61273258, 61375048, 61170109).

Author information

Corresponding author

Correspondence to Jie Yang.

Cite this article

Zhang, T., Jia, W., Yang, B. et al. MoWLD: a robust motion image descriptor for violence detection. Multimed Tools Appl 76, 1419–1438 (2017). https://doi.org/10.1007/s11042-015-3133-0

  • DOI: https://doi.org/10.1007/s11042-015-3133-0
