
Overview of VideoCLEF 2009: New Perspectives on Speech-Based Multimedia Content Enrichment

  • Martha Larson
  • Eamonn Newman
  • Gareth J. F. Jones
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6242)

Abstract

VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatically tagging videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the “Beeldenstorm” collection, containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches used topic changes, elevated speaking pitch, increased speaking intensity, and radical visual changes. The Linking Task, also called “Finding Related Resources Across Languages,” involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language “Beeldenstorm” collection and were expected to return target pages drawn from the English-language Wikipedia. The best-performing methods used the transcript of the speech spoken during the multimedia anchor to build a query against an index of the Dutch-language Wikipedia; the Dutch pages returned were then used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation, and methods targeting proper names.
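The best-performing Linking Task pipeline described above can be sketched roughly as follows. This is a minimal illustration only: the tiny Dutch “collection,” the term-overlap scoring, and the language-link table are toy stand-ins, not the actual VideoCLEF resources or participants’ systems, which used full-text indexes of the Dutch Wikipedia and its interlanguage links.

```python
from collections import Counter

# Toy stand-in for an indexed Dutch-language Wikipedia (assumption:
# real systems searched a full-text index, not an in-memory dict).
dutch_wikipedia = {
    "Rembrandt": "rembrandt schilder nachtwacht amsterdam barok",
    "Van Gogh": "gogh schilder zonnebloemen post impressionisme",
    "Nachtwacht": "nachtwacht rembrandt schilderij schutterij",
}

# Dutch page title -> English Wikipedia page title (interlanguage links).
langlinks = {
    "Rembrandt": "Rembrandt",
    "Van Gogh": "Vincent van Gogh",
    "Nachtwacht": "The Night Watch",
}

def link_anchor(transcript: str, top_k: int = 2) -> list[str]:
    """Use the anchor's speech transcript as a query against the Dutch
    collection, then map the best-matching Dutch pages to English targets."""
    query = Counter(transcript.lower().split())
    scores = {
        title: sum(query[word] for word in text.split())  # term-overlap score
        for title, text in dutch_wikipedia.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [langlinks[title] for title in ranked if scores[title] > 0]

print(link_anchor("de schilder rembrandt en de nachtwacht"))
# → ['Rembrandt', 'The Night Watch']
```

Routing the query through the Dutch Wikipedia first, rather than translating the Dutch transcript into an English query, sidesteps speech-recognition and translation errors compounding: the interlanguage links provide the cross-language step for free once a relevant Dutch page is found.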

Keywords

Mean Average Precision · Primary Link · Mean Reciprocal Rank · Secondary Link · Query Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Martha Larson (1)
  • Eamonn Newman (2)
  • Gareth J. F. Jones (2)
  1. Multimedia Information Retrieval Lab, Delft University of Technology, Delft, Netherlands
  2. Centre for Digital Video Processing, Dublin City University, Dublin 9, Ireland
