Towards a High-Level Audio Framework for Video Retrieval Combining Conceptual Descriptions and Fully-Automated Processes

  • Mbarek Charhad
  • Mohammed Belkhatir
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3767)


The growing need for 'intelligent' video retrieval systems is driving new architectures that combine multiple characterizations of video content within highly expressive frameworks while keeping indexing and retrieval fully automated. Combining modalities within expressive frameworks for video indexing and retrieval is therefore of great importance, and we argue it is the only way to achieve significant retrieval performance. This paper presents a multifaceted conceptual framework that integrates multiple characterizations of the audio content for automatic video retrieval. It relies on an expressive representation formalism that handles high-level audio descriptions of a video document, together with a full-text query framework, in an attempt to index and retrieve video on audio features beyond state-of-the-art architectures that operate on low-level features and keyword-annotation frameworks. Experiments on the multimedia topic search task of the TRECVID 2004 evaluation campaign validate our proposal.


Keywords: Automatic Speech Recognition; Video Retrieval; Video Shot; Audio Feature; Query Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Mbarek Charhad (1)
  • Mohammed Belkhatir (1)
  1. IMAG-CNRS, Grenoble, France
