Multi-level Fusion for Semantic Video Content Indexing and Retrieval

  • Rachid Benmokhtar
  • Benoit Huet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4918)


In this paper, we present our work on an automatic semantic video content indexing and retrieval system based on the fusion of several low-level visual descriptors. Global MPEG-7 features extracted from video shots are described via an IVSM (Image Vector Space Model) signature in order to obtain a compact description of the content. Both static and dynamic feature fusion are introduced to obtain effective signatures. Support Vector Machines (SVMs) are employed for classification, with one classifier per feature; the task of each classifier is to detect the semantic content of the video. The classifier outputs are then fused using a neural network based on evidence theory (NNET) in order to provide a decision on the content of each shot. Experiments are conducted in the framework of the TRECVid feature extraction task.
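To make the classifier-fusion step more concrete, the following is a minimal sketch of combining two per-feature classifier outputs with Dempster's rule of combination from evidence theory. It is not the NNET described in the abstract: the neural network is replaced here by the plain combination rule, and the function names (`score_to_mass`, `dempster_combine`, `fuse`), the reliability weights, and the example scores are all hypothetical values chosen for illustration.

```python
from typing import Dict, List

# A mass function over the frame {concept present, concept absent}.
# "theta" holds the mass assigned to ignorance (the whole frame).
Mass = Dict[str, float]


def score_to_mass(p: float, reliability: float) -> Mass:
    """Convert a classifier score p in [0, 1] into a mass function.

    Assumption (not from the paper): a source with the given reliability
    commits that fraction of its belief and leaves the rest as ignorance.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= reliability <= 1.0):
        raise ValueError("p and reliability must lie in [0, 1]")
    return {
        "pos": reliability * p,
        "neg": reliability * (1.0 - p),
        "theta": 1.0 - reliability,
    }


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule."""
    # Conflict: one source says "present" while the other says "absent".
    conflict = m1["pos"] * m2["neg"] + m1["neg"] * m2["pos"]
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict
    pos = (m1["pos"] * m2["pos"] + m1["pos"] * m2["theta"]
           + m1["theta"] * m2["pos"]) / norm
    neg = (m1["neg"] * m2["neg"] + m1["neg"] * m2["theta"]
           + m1["theta"] * m2["neg"]) / norm
    theta = (m1["theta"] * m2["theta"]) / norm
    return {"pos": pos, "neg": neg, "theta": theta}


def fuse(masses: List[Mass]) -> Mass:
    """Fold Dempster's rule over the outputs of all classifiers."""
    combined = masses[0]
    for m in masses[1:]:
        combined = dempster_combine(combined, m)
    return combined


# Hypothetical scores from two per-feature classifiers for one shot.
color = score_to_mass(0.8, reliability=0.9)
texture = score_to_mass(0.7, reliability=0.6)
fused = fuse([color, texture])
decision = fused["pos"] > fused["neg"]  # True means "concept detected"
```

In this sketch, two sources that agree reinforce each other, so the fused belief in the concept exceeds that of either source alone, while the mass left on ignorance shrinks.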





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Rachid Benmokhtar (1)
  • Benoit Huet (1)
  1. Département Communications Multimédias, Institut Eurécom, Sophia-Antipolis, France
