Chapter

Smart Information Systems

Part of the series Advances in Computer Vision and Pattern Recognition, pp. 291–314

Detecting Violent Content in Hollywood Movies and User-Generated Videos

  • Esra Acar (Technische Universität Berlin), email author
  • Melanie Irrgang (Technische Universität Berlin)
  • Dominique Maniry (Technische Universität Berlin)
  • Frank Hopfgartner (Technische Universität Berlin)

Abstract

Detecting violent scenes in videos is an important content-understanding functionality, e.g., for providing automated youth-protection services. The key issues in designing violence detection algorithms are the choice of discriminative features and the learning of effective models. We employ low- and mid-level audio-visual features and evaluate their discriminative power within the context of the MediaEval Violent Scenes Detection (VSD) task, fusing the audio-visual cues at the decision level. As audio features we use Mel-Frequency Cepstral Coefficients (MFCC); as visual features we use dense Histograms of Oriented Gradients (HoG), Histograms of Oriented Optical Flow (HoF), Violent Flows (ViF), and affect-related color descriptors. We partition the feature space of the violent training samples through k-means clustering and train a separate model for each cluster. These models are then used to predict the violence level of videos by employing two-class support vector machines (SVMs). Experimental results on Hollywood movies and short web videos show that mid-level audio features are more discriminative than the visual features, and that performance is further enhanced by fusing the audio-visual cues at the decision level.
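The modelling steps sketched in the abstract — partitioning the violent training samples with k-means, training one two-class SVM per cluster, and fusing the per-modality scores at the decision level — can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the synthetic features, the max-over-clusters scoring rule, and the weighted-average fusion are assumptions made for illustration only.

```python
# Illustrative sketch (not the chapter's actual code) of the pipeline in the
# abstract: k-means partitioning of violent samples, per-cluster two-class
# SVMs, and decision-level (late) fusion of audio and visual scores.
# All feature data below is synthetic; real inputs would be MFCC (audio)
# and HoG/HoF/ViF/color descriptors (visual).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def train_partitioned_svms(violent, nonviolent, n_clusters=2):
    """Cluster the violent samples, then train one two-class SVM per
    cluster against the full non-violent set."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(violent)
    models = []
    for c in range(n_clusters):
        pos = violent[km.labels_ == c]
        X = np.vstack([pos, nonviolent])
        y = np.r_[np.ones(len(pos)), np.zeros(len(nonviolent))]
        models.append(SVC(probability=True, random_state=0).fit(X, y))
    return models

def violence_score(models, x):
    """Score a sample as the max violence probability over the
    cluster-specific models (an assumed aggregation rule)."""
    return max(m.predict_proba(x.reshape(1, -1))[0, 1] for m in models)

# Synthetic training data: well-separated violent / non-violent features.
violent_a = rng.normal(3.0, 0.5, (40, 8))    # stand-in for audio features
nonviolent_a = rng.normal(0.0, 0.5, (40, 8))
violent_v = rng.normal(3.0, 0.5, (40, 12))   # stand-in for visual features
nonviolent_v = rng.normal(0.0, 0.5, (40, 12))

audio_models = train_partitioned_svms(violent_a, nonviolent_a)
visual_models = train_partitioned_svms(violent_v, nonviolent_v)

def fused_score(x_audio, x_visual, w_audio=0.5):
    """Decision-level fusion: weighted average of the modality scores."""
    return (w_audio * violence_score(audio_models, x_audio)
            + (1 - w_audio) * violence_score(visual_models, x_visual))

violent_s = fused_score(np.full(8, 3.0), np.full(12, 3.0))
calm_s = fused_score(np.zeros(8), np.zeros(12))
```

With clearly separated synthetic classes, the fused score of a violent-like sample exceeds that of a calm one; in the actual task the fusion weight and cluster count would be tuned on the VSD development data.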