Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training

  • Jian Lin
  • Weiqiang Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5879)


In this work, we present a novel method to detect violent shots in movies. The detection process is split into two views: the audio view and the video view. In the audio view, a weakly-supervised method is exploited to improve classification performance. In the video view, a classifier is used to detect violent shots. Finally, the audio and video classifiers are combined through co-training. Preliminary experimental results on several movies with violent content show the effectiveness of our method.
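The co-training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifiers or features, so a nearest-centroid classifier stands in for both the audio and the video classifier, and the function names, the margin-based confidence, and the toy feature vectors are all assumptions made for illustration. Each view trains on the shared labeled set, then adds its most confident guesses on unlabeled shots to that set so the other view can learn from them.

```python
def fit_centroids(features, labels):
    """Nearest-centroid 'training': one mean vector per class label.
    A stand-in classifier, not the one used in the paper."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_with_margin(centroids, x):
    """Return (label, margin); a larger margin means a more confident guess.
    Assumes at least two classes are present in `centroids`."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label)
        for label, c in centroids.items()
    )
    (best_dist, best_label), (next_dist, _) = ranked[0], ranked[1]
    return best_label, next_dist - best_dist


def co_train(audio, video, seed_labels, rounds=5, per_round=2):
    """Grow a shared labeled set from a few seed shots.

    audio, video: per-shot feature vectors for the two views (same order).
    seed_labels:  {shot_index: 0 or 1}; must contain both classes.
    """
    labeled = dict(seed_labels)
    for _ in range(rounds):
        for view in (audio, video):
            known = sorted(labeled)
            model = fit_centroids([view[i] for i in known],
                                  [labeled[i] for i in known])
            pool = [i for i in range(len(view)) if i not in labeled]
            guesses = [(i,) + predict_with_margin(model, view[i]) for i in pool]
            guesses.sort(key=lambda g: -g[2])  # most confident first
            # The most confident guesses of this view become training
            # labels that the other view sees on its next turn.
            for i, label, _margin in guesses[:per_round]:
                labeled[i] = label
    return labeled


# Toy data (hypothetical): 12 shots, odd indices are "violent" (label 1).
truth = [i % 2 for i in range(12)]
audio = [[5.0 * t + 0.1 * i, 5.0 * t - 0.1 * i] for i, t in enumerate(truth)]
video = [[5.0 * t - 0.05 * i, 5.0 * t + 0.2 * i] for i, t in enumerate(truth)]

result = co_train(audio, video, seed_labels={0: 0, 1: 1})
correct = sum(result[i] == truth[i] for i in range(len(truth)))
print(correct, "of", len(truth), "shots labeled correctly")
```

Starting from one labeled shot per class, the loop labels the remaining shots a few at a time. In practice the confidence measure and the number of shots added per round would need tuning, since early mistakes propagate to both views.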


Keywords: Violence, Weakly-supervised, pLSA, Audio, Video, Co-training





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jian Lin (1)
  • Weiqiang Wang (1, 2)
  1. Graduate University of Chinese Academy of Sciences, Beijing, China
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
