Annotating News Video with Locations

  • Jun Yang
  • Alexander G. Hauptmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4071)


The location of a video scene is an important semantic descriptor, especially for broadcast news video. In this paper, we propose a learning-based approach to annotating shots of news video with locations extracted from the video transcript, based on features from multiple video modalities, including the syntactic structure of transcript sentences, speaker identity, and temporal video structure. Machine learning algorithms are adopted to combine these multi-modal features to solve two sub-problems: (1) whether the location of a video shot is mentioned in the transcript, and if so, (2) which of the many locations in the transcript are the correct one(s) for this shot. Experiments on the TRECVID dataset demonstrate that our approach labels the location of any shot in news video with approximately 85% accuracy.
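As a concrete illustration of this two-stage formulation, the sketch below trains one binary classifier to decide whether a shot's location is mentioned in the transcript at all, and a second classifier to score each candidate location extracted from the transcript for that shot. It uses scikit-learn SVMs (the keyword list mentions support vector machines), but the random feature vectors, their dimensions, and the annotate_shot helper are placeholders standing in for the paper's multi-modal features, not the authors' actual implementation.

```python
# Minimal sketch of the two-stage shot-location labeling pipeline, assuming
# SVM classifiers and placeholder features; all helper names and dimensions
# here are illustrative assumptions, not the paper's implementation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stage 1: is the shot's true location mentioned in the transcript at all?
# Binary classifier over per-shot features (placeholder: 20-dim vectors).
shot_features = rng.normal(size=(200, 20))
shot_has_location = rng.integers(0, 2, size=200)        # placeholder labels
stage1 = SVC(kernel="rbf", probability=True).fit(shot_features, shot_has_location)

# Stage 2: among the candidate locations extracted from the transcript,
# which one(s) are correct for this shot? Modeled as binary classification
# over (shot, candidate-location) pairs (placeholder: 30-dim vectors).
pair_features = rng.normal(size=(500, 30))
pair_is_correct = rng.integers(0, 2, size=500)          # placeholder labels
stage2 = SVC(kernel="rbf", probability=True).fit(pair_features, pair_is_correct)

def annotate_shot(shot_vec, candidate_vecs, candidate_names, threshold=0.5):
    """Return the candidate locations accepted for one shot, or an empty list
    if the stage-1 classifier decides the location is not in the transcript."""
    if stage1.predict([shot_vec])[0] == 0:
        return []
    scores = stage2.predict_proba(candidate_vecs)[:, 1]
    return [name for name, s in zip(candidate_names, scores) if s >= threshold]

# Example call on one synthetic shot with three candidate locations.
print(annotate_shot(rng.normal(size=20),
                    rng.normal(size=(3, 30)),
                    ["Baghdad", "Washington", "Pittsburgh"]))
```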


Support Vector Machine · Noun Phrase · True Location · Candidate Location · Parse Tree





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jun Yang (1)
  • Alexander G. Hauptmann (1)

  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
