Detecting Violent Scenes in Movies by Auditory and Visual Cues
To detect violence in movies, we present a three-stage method that integrates visual and auditory cues. First, shots with potentially violent content are identified according to universal film-making rules. A modified semi-supervised learning technique based on semi-supervised cross feature learning (SCFL) is exploited, since it is capable of combining different types of features and using unlabeled data to improve classification performance. Then, typical violence-related audio effects are detected in the candidate shots, and the confidences output by the classifiers of the various audio events are transformed into a shot-based violence score. Finally, the probabilistic outputs of the first two stages are integrated in a boosting manner to generate the final inference. Preliminary experimental results on four typical action movies show the effectiveness of our method.
Keywords: Violence Detection; Semi-supervised Cross Feature Learning; Audio Effects
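The three-stage flow described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the noisy-OR rule that turns audio-event confidences into a shot score, the AdaBoost-style stage weights, the error rates, the candidate threshold, and all function and field names (`audio_violence_score`, `stage_weight`, `fuse`, `detect_violent_shots`, `p_visual`, `audio_confidences`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def audio_violence_score(event_confidences: Dict[str, float]) -> float:
    """Assumed noisy-OR rule: the shot score is the probability that at
    least one violence-related audio event is present in the shot."""
    prob_none = 1.0
    for confidence in event_confidences.values():
        prob_none *= 1.0 - confidence
    return 1.0 - prob_none


def stage_weight(error_rate: float) -> float:
    """AdaBoost-style weight for a stage with the given error rate
    (an assumed reading of 'integrated in a boosting manner')."""
    eps = 1e-6
    e = min(max(error_rate, eps), 1.0 - eps)  # keep the logarithm finite
    return 0.5 * math.log((1.0 - e) / e)


def fuse(p_visual: float, p_audio: float,
         err_visual: float, err_audio: float) -> float:
    """Weighted vote: each probability is mapped to [-1, 1] and scaled by
    its stage weight. A positive result means 'violent'."""
    vote_visual = 2.0 * p_visual - 1.0
    vote_audio = 2.0 * p_audio - 1.0
    return (stage_weight(err_visual) * vote_visual
            + stage_weight(err_audio) * vote_audio)


def detect_violent_shots(shots: List[dict],
                         err_visual: float = 0.3,
                         err_audio: float = 0.2,
                         candidate_threshold: float = 0.5) -> List[bool]:
    """Run the three stages on a list of shots.

    Each shot is a dict with a stage-1 probability 'p_visual' and a dict
    'audio_confidences' of per-event classifier confidences.
    """
    labels = []
    for shot in shots:
        # Stage 1: keep only shots flagged as candidates by the visual stage.
        if shot["p_visual"] < candidate_threshold:
            labels.append(False)
            continue
        # Stage 2: turn audio-event confidences into one shot-level score.
        p_audio = audio_violence_score(shot["audio_confidences"])
        # Stage 3: fuse both probabilistic outputs into the final decision.
        score = fuse(shot["p_visual"], p_audio, err_visual, err_audio)
        labels.append(score > 0.0)
    return labels


if __name__ == "__main__":
    example_shots = [
        {"p_visual": 0.9, "audio_confidences": {"gunshot": 0.8, "explosion": 0.1}},
        {"p_visual": 0.2, "audio_confidences": {"gunshot": 0.9}},
        {"p_visual": 0.6, "audio_confidences": {"gunshot": 0.05, "scream": 0.05}},
    ]
    print(detect_violent_shots(example_shots))
```

In this sketch a shot rejected at the first stage is never passed to the audio stage, which mirrors the abstract's statement that audio effects are detected only for the candidate shots. Giving the more reliable stage a larger weight is a design choice of the sketch, not a documented property of the method.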