Multimedia Tools and Applications, Volume 74, Issue 17, pp 7379–7404

VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation

  • Claire-Hélène Demarty
  • Cédric Penet
  • Mohammad Soleymani
  • Guillaume Gravier
Article

Abstract

Content-based analysis to find where violence appears in multimedia content has several applications, from parental control and child protection to surveillance. This paper presents the design and annotation of the Violent Scene Detection (VSD) dataset, a corpus targeting the detection of physical violence in Hollywood movies. We discuss definitions of physical violence and provide a simple and objective definition that was used to annotate a set of 18 movies, resulting in the largest freely available dataset for such a task. We discuss borderline cases and compare with annotations based on a subjective definition, which requires multiple annotators. We provide a detailed analysis of the corpus, in particular regarding the relationship between violence and a set of key audio and visual concepts that were also annotated. The VSD dataset results from two years of benchmarking in the framework of the MediaEval initiative. We provide results from the 2011 and 2012 benchmarks as a validation of the dataset and as a state-of-the-art baseline. The VSD dataset is freely available at http://www.technicolor.com/en/innovation/research-innovation/scientific-data-sharing/violent-scenes-dataset.

Keywords

Content-based analysis; Multimedia evaluation; Violent scene detection; Corpus design; Physical violence definition; Semantic audio concepts; Semantic video concepts

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Claire-Hélène Demarty (1)
  • Cédric Penet (1)
  • Mohammad Soleymani (2)
  • Guillaume Gravier (3)
  1. Technicolor, Cesson Sévigné, France
  2. Imperial College London, London, UK
  3. CNRS - Irisa, Rennes Cedex, France
