Large Scale Concept Detection in Video Using a Region Thesaurus

  • Evaggelos Spyrou
  • Giorgos Tolias
  • Yannis Avrithis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5371)


This paper presents an approach to high-level feature detection in video documents using a region thesaurus. Each video shot is represented by a single keyframe, and MPEG-7 features are extracted locally from coarsely segmented regions. A clustering algorithm is then applied to the extracted regions, and a region thesaurus is constructed to describe each keyframe at a level higher than the low-level descriptors but lower than the high-level concepts. A model vector representation is formed, and several high-level concept detectors are trained using a global keyframe annotation. The approach is evaluated on the TRECVID 2007 development data for the detection of nine high-level concepts, demonstrating sufficient performance on large data sets.
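The pipeline outlined in the abstract (cluster region descriptors into a thesaurus, then describe each keyframe relative to the resulting region types) can be illustrated with a minimal sketch. Several choices here are assumptions rather than details taken from the abstract: plain k-means stands in for the unspecified clustering algorithm, the model vector is taken to hold the smallest distance from any keyframe region to each region type, and short toy vectors replace real MPEG-7 descriptors. The function names `build_thesaurus` and `model_vector` are hypothetical, and detector training is left out.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_thesaurus(region_features, k, iterations=20, seed=0):
    """Cluster region descriptors with plain k-means (an assumed stand-in).

    The returned centroids play the role of the thesaurus "region types".
    """
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(region_features, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in region_features:
            nearest = min(range(k), key=lambda i: euclidean(f, centroids[i]))
            clusters[nearest].append(f)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def model_vector(keyframe_regions, thesaurus):
    """One entry per region type: distance to the closest region of the keyframe.

    Using the minimum distance is an assumption made for this sketch.
    """
    return [min(euclidean(r, t) for r in keyframe_regions) for t in thesaurus]


# Toy 2-D "descriptors" for regions pooled from several keyframes.
regions = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
thesaurus = build_thesaurus(regions, k=2)

# Describe one keyframe, made of two regions, against the thesaurus.
vector = model_vector([[0.05, 0.05], [0.8, 0.9]], thesaurus)
```

Under these assumptions, every keyframe maps to a vector whose length equals the thesaurus size regardless of how many regions the keyframe contains, so it can be fed to any standard classifier trained per concept.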


Average Precision, Model Vector, Dominant Color, Edge Histogram, Video Document





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Evaggelos Spyrou (1)
  • Giorgos Tolias (1)
  • Yannis Avrithis (1)
  1. Image, Video and Multimedia Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
