Lessons for the Future from a Decade of Informedia Video Analysis Research

  • Alexander G. Hauptmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3568)


The overarching goal of the Informedia Digital Video Library project has been to achieve machine understanding of video media, including all aspects of search, retrieval, visualization and summarization in both contemporaneous and archival content collections. The base technology developed by the Informedia project combines speech, image and natural language understanding to automatically transcribe, segment and index broadcast video for intelligent search and image retrieval. While speech processing has been the most influential component in the success of the Informedia project, other modalities can be critical in various situations. Evaluations done in the context of the TRECVID benchmarks show that while some progress has been made, there is still a lot of work ahead. The fundamental “semantic gap” still exists, but there are a number of promising approaches to bridging it.


Keywords: Speech Recognition; Automatic Speech Recognition; News Story; Semantic Concept; Video Retrieval





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Alexander G. Hauptmann (1)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
