VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation

Demarty, Claire-Hélène; Penet, Cédric; Soleymani, Mohammad; Gravier, Guillaume

doi:10.1007/s11042-014-1984-4

Published: 15 May 2014

Multimedia Tools and Applications, volume 74, pages 7379–7404 (2015)

Claire-Hélène Demarty¹,
Cédric Penet¹,
Mohammad Soleymani² &
Guillaume Gravier³

Abstract

Content-based analysis to locate violence in multimedia content has several applications, from parental control and child protection to surveillance. This paper presents the design and annotation of the Violent Scene Detection (VSD) dataset, a corpus targeting the detection of physical violence in Hollywood movies. We discuss definitions of physical violence and provide a simple and objective definition, which was used to annotate a set of 18 movies, resulting in the largest freely available dataset for this task. We discuss borderline cases and compare with annotations based on a subjective definition, which requires multiple annotators. We provide a detailed analysis of the corpus, in particular regarding the relationship between violence and a set of key audio and visual concepts that were also annotated. The VSD dataset results from two years of benchmarking in the framework of the MediaEval initiative. We provide results from the 2011 and 2012 benchmarks as a validation of the dataset and as a state-of-the-art baseline. The VSD dataset is freely available at http://www.technicolor.com/en/innovation/research-innovation/scientific-data-sharing/violent-scenes-dataset.

Acknowledgments

This work was partially supported by the Quaero Program. We would also like to acknowledge the MediaEval Multimedia Benchmark for providing the framework to evaluate the task of violent scene detection.

Author information

Authors and Affiliations

Technicolor, 975 avenue des Champs Blancs, ZAC des Champs Blancs, 35576, Cesson Sévigné, France
Claire-Hélène Demarty & Cédric Penet
Imperial College London, Huxley building, 180 Queen’s gate, London, SW7 2AZ, UK
Mohammad Soleymani
CNRS - Irisa, Inria Rennes Campus de Beaulieu, 35042, Rennes Cedex, France
Guillaume Gravier

Corresponding author

Correspondence to Claire-Hélène Demarty.

About this article

Cite this article

Demarty, CH., Penet, C., Soleymani, M. et al. VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation. Multimed Tools Appl 74, 7379–7404 (2015). https://doi.org/10.1007/s11042-014-1984-4

Issue Date: September 2015