Unsupervised News Video Segmentation by Combined Audio-Video Analysis

  • M. De Santo
  • G. Percannella
  • C. Sansone
  • M. Vento
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4105)


Segmenting news video into stories is one of the key issues in achieving efficient management of news-based digital libraries. In this paper we present a novel unsupervised algorithm that combines audio and video information to automatically partition news videos into stories. The proposed algorithm is based on the detection of anchor shots within the video. In particular, a set of audio/video templates of anchorperson shots is first extracted in an unsupervised way; shots are then classified by comparing them to the templates using both video and audio similarity. Finally, a story is obtained by linking each anchor shot with all successive shots until another anchor shot, or the end of the news video, occurs. Audio similarity is evaluated by means of a new index and helps achieve better anchor shot detection performance than a purely video-based approach. The method has been tested on a large database and compared with other state-of-the-art algorithms, demonstrating its effectiveness relative to them.
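The classification and story-linking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the shot features, the new audio similarity index, or the rule used to combine audio and video similarity. The sketch therefore assumes generic per-shot feature vectors, uses cosine similarity as a stand-in for both similarity measures, and fuses them with a weighted sum controlled by a hypothetical weight `alpha` and a hypothetical `threshold`. Template extraction is assumed to have already been done. All names (`ShotFeatures`, `cosine`, `is_anchor_shot`, `link_stories`, `segment_news_video`) are illustrative. Only the story-linking rule in `link_stories` follows directly from the abstract.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ShotFeatures:
    """Hypothetical per-shot descriptors (assumption: the paper's features are not given here)."""
    video: List[float]  # e.g. a visual descriptor of a key frame (assumption)
    audio: List[float]  # e.g. a speaker-related audio descriptor (assumption)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as a stand-in for the paper's similarity measures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_anchor_shot(shot: ShotFeatures, templates: Sequence[ShotFeatures],
                   alpha: float = 0.5, threshold: float = 0.8) -> bool:
    """Label a shot as an anchor shot if its fused similarity to the
    best-matching template reaches `threshold`.

    `alpha` weights video similarity against audio similarity; both `alpha`
    and `threshold` are hypothetical parameters, not values from the paper.
    """
    if not templates:
        return False
    best = max(
        alpha * cosine(shot.video, t.video) + (1.0 - alpha) * cosine(shot.audio, t.audio)
        for t in templates
    )
    return best >= threshold


def link_stories(anchor_flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group shots into stories as inclusive (first_shot, last_shot) index pairs.

    Each story starts at an anchor shot and runs up to the shot before the
    next anchor shot, or to the last shot of the video. Shots preceding the
    first anchor shot are not assigned to any story.
    """
    stories: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, anchor in enumerate(anchor_flags):
        if anchor:
            if start is not None:
                stories.append((start, i - 1))  # close the previous story
            start = i  # open a new story at this anchor shot
    if start is not None:
        stories.append((start, len(anchor_flags) - 1))  # last story ends with the video
    return stories


def segment_news_video(shots: Sequence[ShotFeatures],
                       templates: Sequence[ShotFeatures]) -> List[Tuple[int, int]]:
    """Classify every shot against the templates, then link shots into stories."""
    flags = [is_anchor_shot(s, templates) for s in shots]
    return link_stories(flags)


if __name__ == "__main__":
    anchor = ShotFeatures(video=[1.0, 0.0], audio=[1.0, 0.0])
    report = ShotFeatures(video=[0.0, 1.0], audio=[0.0, 1.0])
    shots = [report, anchor, report, report, anchor, report]
    print(segment_news_video(shots, templates=[anchor]))  # [(1, 3), (4, 5)]
```

The weighted sum is just one simple late-fusion rule chosen for illustration; the paper may combine the two similarities differently. Shots before the first anchor shot (for example, opening titles) are left unassigned, which is consistent with the linking rule stated in the abstract.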


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • M. De Santo (1)
  • G. Percannella (1)
  • C. Sansone (2)
  • M. Vento (1)

  1. Dip. di Ingegneria dell’Informazione ed Ingegneria Elettrica, Università degli Studi di Salerno, Fisciano (SA), Italy
  2. Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli “Federico II”, Napoli, Italy