Multimedia Tools and Applications

Volume 78, Issue 22, pp 31925–31957

Segmenting with style: detecting program and story boundaries in TV news broadcast videos

  • Raghvendra Kannao
  • Prithwijit Guha


Television news is an important medium for conveying information to the masses, which motivates several stakeholders to monitor and analyze news broadcasts. Segmenting a streaming broadcast into programs and stories is a necessary first step for such analysis. Television news producers use predefined and unique presentation styles to create channel content. These styles vary with program and news story category, broadcast time, targeted audience, etc. This motivated us to use presentation styles as features for segmenting news broadcasts. We propose a novel approach for characterizing spatio-temporal presentation styles. Spatial styles are characterized using a set of presentation-style-specific semantic shot categories derived from the LSCOM-Lite ontology. We also identify features and classifiers to automate shot labeling for spatial style characterization. Further, the temporal presentation styles of shots are modeled using conditional random fields. This spatio-temporal modeling of presentation styles is used to segment the broadcast into programs and stories. We also contribute a 360-hour broadcast video dataset acquired from three Indian English news channels, with ground-truth annotations of semantic shot categories, program genres and story boundaries. Experiments on this dataset show the utility of our proposal for news broadcast video segmentation.
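As a rough illustration of how a linear-chain model over shot categories can yield story boundaries, the sketch below runs Viterbi decoding on a short sequence of shot labels. It is not the authors' implementation: the shot categories ("anchor", "report", "graphics"), the two labels (B for a shot that starts a story, I for a shot inside one), and all scores are hypothetical values set by hand, standing in for weights that a trained conditional random field would learn.

```python
# Hypothetical labels: B = shot that starts a story, I = shot inside a story.
LABELS = ["B", "I"]

# Hand-set scores standing in for learned CRF weights (assumed values only).
EMISSION = {
    ("B", "anchor"): 1.5, ("I", "anchor"): -0.5,
    ("B", "report"): -1.0, ("I", "report"): 1.0,
    ("B", "graphics"): -0.5, ("I", "graphics"): 0.5,
}
TRANSITION = {
    ("B", "B"): -2.0, ("B", "I"): 0.5,
    ("I", "B"): 0.0, ("I", "I"): 0.5,
}


def viterbi(shots):
    """Return the highest-scoring label sequence for a list of shot categories."""
    if not shots:
        return []
    # best[t][y] = best score of any labeling of shots[:t+1] that ends in label y
    best = [{y: EMISSION[(y, shots[0])] for y in LABELS}]
    back = []
    for shot in shots[1:]:
        scores, pointers = {}, {}
        for y in LABELS:
            # Pick the previous label that maximizes score-so-far plus transition.
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, y)])
            scores[y] = best[-1][prev] + TRANSITION[(prev, y)] + EMISSION[(y, shot)]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label to recover the full path.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


def story_starts(shots):
    """Indices of shots decoded as story beginnings."""
    return [i for i, y in enumerate(viterbi(shots)) if y == "B"]


if __name__ == "__main__":
    shots = ["anchor", "report", "report", "anchor", "graphics", "report"]
    print(viterbi(shots))
    print(story_starts(shots))
```

With these toy scores, an anchor shot following field reports is decoded as the start of a new story, which mirrors the intuition that recurring presentation patterns signal boundaries. In practice the weights would be learned from annotated broadcasts rather than set by hand.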


TV news broadcast; Semantic shot classification; News program detection; News story segmentation




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, North Guwahati, India
