Multimedia Event Detection Using Segment-Based Approach for Motion Feature

  • Sang Phan
  • Thanh Duc Ngo
  • Vu Lam
  • Son Tran
  • Duy-Dinh Le
  • Duc Anh Duong
  • Shin’ichi Satoh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7674)


Detecting events in multimedia videos has become a popular research topic. One of the most important clues for determining the event in a video is its motion features. Currently, motion features are often extracted from the whole video using a dense sampling strategy. However, this extraction method is computationally prohibitive for large-scale video datasets. Moreover, video lengths can vary widely, which makes comparing features between videos unreliable. In this paper, we propose a segment-based approach to extracting motion features. Specifically, the original videos are quantized into fixed-length segments for both training and testing, while evaluation is still performed at the video level. Our approach has achieved promising results when applied to the dense trajectory motion feature on the TRECVID 2010 Multimedia Event Detection (MED) dataset. Combined with global and local features, our event detection system performs comparably to other state-of-the-art MED systems, while its computational cost is significantly reduced.
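The core idea of splitting each video into fixed-length segments while still scoring at the video level can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. Segment length is measured in frames. A trailing partial segment is kept. `score_segment` is a hypothetical callback standing in for whatever per-segment classifier is used, for example one trained on a bag-of-words encoding of the segment's dense trajectory descriptors. Per-segment scores are pooled into a video score with max or mean pooling, which is an illustrative choice and not necessarily the authors' method.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_segments(num_frames: int, segment_length: int) -> List[Tuple[int, int]]:
    """Split a video of num_frames frames into half-open [start, end) ranges.

    Every segment has segment_length frames except possibly the last one,
    which is kept even if shorter (an assumption; it could also be dropped).
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [
        (start, min(start + segment_length, num_frames))
        for start in range(0, num_frames, segment_length)
    ]


def video_score(segment_scores: Sequence[float], pooling: str = "max") -> float:
    """Pool per-segment scores into a single video-level score.

    Max and mean pooling are illustrative choices, not taken from the paper.
    """
    if not segment_scores:
        raise ValueError("video has no segments")
    if pooling == "max":
        return max(segment_scores)
    if pooling == "mean":
        return sum(segment_scores) / len(segment_scores)
    raise ValueError(f"unknown pooling: {pooling}")


def detect_event(
    num_frames: int,
    segment_length: int,
    score_segment: Callable[[int, int], float],  # hypothetical per-segment classifier
    pooling: str = "max",
) -> float:
    """Score every fixed-length segment, then evaluate at the video level."""
    segments = split_into_segments(num_frames, segment_length)
    scores = [score_segment(start, end) for start, end in segments]
    return video_score(scores, pooling)


if __name__ == "__main__":
    # A 250-frame video cut into 100-frame segments yields two full segments
    # and one 50-frame remainder.
    print(split_into_segments(250, 100))  # [(0, 100), (100, 200), (200, 250)]
    print(video_score([0.1, 0.7, 0.3]))   # 0.7
```

Because every segment covers the same number of frames, descriptors extracted from a short clip and a long clip are computed over comparable spans. This is the motivation the abstract gives for the fixed-length design.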


multimedia event detection · segment-based · keyframe-based · dense trajectory





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sang Phan (1)
  • Thanh Duc Ngo (1)
  • Vu Lam (2)
  • Son Tran (2)
  • Duy-Dinh Le (4)
  • Duc Anh Duong (3)
  • Shin’ichi Satoh (4)

  1. The Graduate University for Advanced Studies, Japan
  2. University of Science, VNU-HCM, Vietnam
  3. University of Information Technology, VNU-HCM, Vietnam
  4. National Institute of Informatics, Japan
