World Wide Web, Volume 6, Issue 2, pp 187–208

A Multi-Modal Approach to Story Segmentation for News Video

  • Lekha Chaisorn
  • Tat-Seng Chua
  • Chin-Hui Lee

Abstract

This research proposes a two-level, multi-modal framework for segmenting and classifying news video into single-story semantic units. The video is analyzed at the shot level and at the story unit (or scene) level using a variety of features and techniques. At the shot level, we employ a decision tree technique to classify each shot into one of 13 predefined categories (or mid-level features). At the scene/story level, we perform Hidden Markov Model (HMM) analysis to locate story boundaries. Our initial results indicate an accuracy of over 95% for shot classification and an F1 measure of over 89% for scene/story boundary detection. Detailed analysis reveals that the HMM is effective in identifying dominant features, which helps in locating story boundaries. Our eventual goal is to support the retrieval of news video at the story unit level, together with associated texts retrieved from related news sites on the web.

Keywords: news story segmentation, shot classification, multi-modal approach, learning-based approach
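The abstract reports story boundary detection quality as an F1 measure, the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed. It assumes boundaries are identified by shot index and that a detection counts only on an exact match; the abstract does not state the matching rule, and the function name `boundary_f1` and the sample indices are hypothetical.

```python
def boundary_f1(predicted, reference):
    """Return (precision, recall, f1) for two collections of boundary positions.

    Assumption: a predicted boundary is correct only if it falls on exactly
    the same shot index as a reference boundary.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)  # boundaries detected at the right shot
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical shot indices at which a new story starts (not data from the paper).
reference = [0, 12, 27, 41, 58]
predicted = [0, 12, 30, 41, 58, 63]

precision, recall, f1 = boundary_f1(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# prints: precision=0.667 recall=0.800 F1=0.727
```

In this toy case four of the six predicted boundaries are correct and four of the five true boundaries are found, so one spurious or missed boundary lowers both terms of the harmonic mean.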

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Lekha Chaisorn (1)
  • Tat-Seng Chua (1)
  • Chin-Hui Lee (1)
  1. The School of Computing, National University of Singapore, Singapore
