Multimodal Video Annotation for Retrieval and Discovery of Newsworthy Video in a News Verification Scenario

  • Lyndon Nixon
  • Evlampios Apostolidis
  • Foteini Markatopoulou
  • Ioannis Patras
  • Vasileios Mezaris
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11295)


This paper describes the combination of advanced technologies for social-media-based story detection, story-based video retrieval and concept-based video (fragment) labeling under a novel approach for multimodal video annotation. The approach involves textual metadata, structural information and visual concepts, together with a multimodal analytics dashboard that enables journalists to discover videos of news events posted to social networks, in order to verify the details of the events shown. The paper outlines the characteristics of each individual method and describes how these techniques are blended to facilitate the content-based retrieval, discovery and summarization of (parts of) news videos. A set of case-driven experiments conducted with the help of journalists indicates that the proposed multimodal video annotation mechanism, combined with a professional analytics dashboard that presents the collected and generated metadata about the news stories and their visual summaries, can support journalists in their content discovery and verification work.


News video verification; Story detection; Video retrieval; Video fragmentation; Video annotation; Video summarization



This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreement H2020-687786 InVID.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Lyndon Nixon (1)
  • Evlampios Apostolidis (2, 3)
  • Foteini Markatopoulou (2)
  • Ioannis Patras (3)
  • Vasileios Mezaris (2)

  1. MODUL Technology GmbH, Vienna, Austria
  2. Centre for Research and Technology Hellas, Thermi-Thessaloniki, Greece
  3. School of EECS, Queen Mary University of London, London, UK
