Evaluation of multiple features for violent scenes detection
- Cite this article as:
- Lam, V., Phan, S., Le, DD. et al. Multimed Tools Appl (2017) 76: 7041. doi:10.1007/s11042-016-3331-4
Violent scenes detection (VSD) is a challenging problem because of the heterogeneous content, large variations in video quality, and complex semantic meanings of the concepts involved. In the last few years, combining multiple features from multiple modalities has proven to be an effective strategy for general multimedia event detection (MED), but specific event detection tasks such as VSD have been comparatively less studied. Here, we evaluated the use of multiple features and their combination in a violent scenes detection system. We rigorously analyzed a set of low-level features and a deep learning feature that capture the appearance, color, texture, motion and audio in video. We also evaluated the utility of mid-level visual information obtained from detecting concepts related to violence. Experiments were performed on the publicly available MediaEval VSD 2014 dataset. The results showed that visual and motion features perform better than audio features. Moreover, the performance of the mid-level features was nearly as good as that of the low-level visual features. Experiments with a number of fusion methods showed that the single features are complementary and that combining them improves overall performance. This study also provides an empirical foundation for selecting feature sets that can deal with the heterogeneous content of violent scenes in movies.
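To make the idea of combining complementary features concrete, the following is a minimal sketch of one common fusion scheme, late fusion by weighted score averaging. The abstract does not name the fusion methods that were evaluated, so this is an illustrative assumption rather than the authors' method; the function names, feature names, and scores are all hypothetical.

```python
from typing import Dict, List, Optional


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one feature's scores to [0, 1] so features are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: this feature carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion(
    scores_by_feature: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-feature detector scores by a weighted average (assumed scheme)."""
    names = list(scores_by_feature)
    if weights is None:
        # Default to equal weights when none are supplied.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    normalized = {n: min_max_normalize(scores_by_feature[n]) for n in names}
    n_segments = len(next(iter(normalized.values())))
    return [
        sum(weights[n] * normalized[n][i] for n in names) / total
        for i in range(n_segments)
    ]


# Hypothetical detector scores for four video segments (not real data).
scores = {
    "visual": [0.9, 0.2, 0.6, 0.1],
    "motion": [0.8, 0.4, 0.3, 0.2],
    "audio": [0.5, 0.5, 0.9, 0.1],
}
fused = late_fusion(scores)
# Rank segments from most to least likely to be violent.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this toy example the audio detector alone would rank the third segment first, while the fused ranking places the first segment on top because the visual and motion scores agree on it. Per-feature weights could be tuned on a validation set if some modalities are more reliable than others.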
Keywords: Violent scenes detection; Video retrieval; Multi-modal fusion; Multiple features
|Funder Name|Grant Number|Funding Note|
|Vietnam National University Ho Chi Minh City (VNU-HCM) Grant|||