A Multi-Modal Approach to Story Segmentation for News Video
This research proposes a two-level, multi-modal framework for segmenting and classifying news video into single-story semantic units. The video is analyzed at the shot and story unit (or scene) levels using a variety of features and techniques. At the shot level, we employ a Decision Tree technique to classify each shot into one of 13 predefined categories, or mid-level features. At the scene/story level, we perform HMM (Hidden Markov Model) analysis to locate story boundaries. Our initial results indicate that we can achieve an accuracy of over 95% for shot classification, and an F1 measure of over 89% for scene/story boundary detection. Detailed analysis reveals that the HMM is effective in identifying dominant features, which helps in locating story boundaries. Our eventual goal is to support the retrieval of news video at the story unit level, together with associated texts retrieved from related news sites on the web.
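For readers unfamiliar with the F1 measure reported above, the following is a minimal sketch of how it could be computed for story boundary detection. It is not the authors' evaluation code: the function name `boundary_f1`, the representation of boundaries as shot indices, and the exact-match criterion are all assumptions made for illustration.

```python
def boundary_f1(predicted, reference):
    """Return (precision, recall, F1) for predicted story boundaries.

    Boundaries are given as shot indices. A predicted boundary counts as
    correct only if it matches a reference boundary exactly; this matching
    rule is an assumption of the sketch, not taken from the paper.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 predicted boundaries, 5 reference boundaries.
precision, recall, f1 = boundary_f1([0, 12, 30, 47], [0, 12, 31, 47, 60])
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a detector scores well only when it both avoids spurious boundaries and misses few true ones.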
World Wide Web
Volume 6, Issue 2, pp 187-208
- Kluwer Academic Publishers
- news story segmentation
- shot classification
- multi-modal approach
- learning-based approach