
A Multi-Modal Approach to Story Segmentation for News Video

World Wide Web

Abstract

This research proposes a two-level, multi-modal framework for segmenting and classifying news video into single-story semantic units. The video is analyzed at the shot level and at the story unit (or scene) level using a variety of features and techniques. At the shot level, we employ a Decision Tree technique to classify each shot into one of 13 predefined categories (mid-level features). At the scene/story level, we perform Hidden Markov Model (HMM) analysis to locate story boundaries. Our initial results indicate that we can achieve a high accuracy of over 95% for shot classification and an F1 measure of over 89% for scene/story boundary detection. Detailed analysis reveals that the HMM is effective in identifying dominant features, which helps in locating story boundaries. Our eventual goal is to support the retrieval of news video at the story unit level, together with associated texts retrieved from related news sites on the web.
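
As a rough illustration of the second level described above, the following sketch decodes a sequence of shot category labels with the Viterbi algorithm and reads story boundaries off the decoded state sequence. It is a minimal sketch under stated assumptions, not the model used in the paper: the two hidden states, the three shot categories (the paper uses 13), and every probability value are hypothetical, and the helper names `viterbi`, `story_boundaries`, and `f1_measure` are introduced here only for illustration.

```python
import math

# Hypothetical hidden states: a shot either starts a story or continues one.
STATES = ("start", "inside")

# All probabilities below are made-up values for illustration only.
START_P = {"start": 0.9, "inside": 0.1}
TRANS_P = {
    "start": {"start": 0.1, "inside": 0.9},
    "inside": {"start": 0.3, "inside": 0.7},
}
# Hypothetical shot categories standing in for the 13 used in the paper.
EMIT_P = {
    "start": {"anchor": 0.8, "live": 0.1, "speech": 0.1},
    "inside": {"anchor": 0.1, "live": 0.5, "speech": 0.4},
}


def viterbi(observations):
    """Return the most likely hidden-state path (log-space Viterbi)."""
    first = observations[0]
    scores = [{s: math.log(START_P[s]) + math.log(EMIT_P[s][first])
               for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS_P[p][s]))
            row[s] = (prev[best] + math.log(TRANS_P[best][s])
                      + math.log(EMIT_P[s][obs]))
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def story_boundaries(path):
    """Indices of shots decoded as the first shot of a story."""
    return [i for i, s in enumerate(path) if s == "start"]


def f1_measure(predicted, reference):
    """F1 = harmonic mean of precision and recall over boundary indices."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


shots = ["anchor", "live", "speech", "anchor", "live"]
decoded = viterbi(shots)
boundaries = story_boundaries(decoded)
```

In this toy setup the shot labels would come from the first-level classifier, so errors in shot classification propagate into the decoded boundaries. The F1 helper scores only exact index matches, which is a simplification; a real evaluation would normally allow a tolerance window around each reference boundary.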




About this article

Cite this article

Chaisorn, L., Chua, TS. & Lee, CH. A Multi-Modal Approach to Story Segmentation for News Video. World Wide Web 6, 187–208 (2003). https://doi.org/10.1023/A:1023622605600
