Video Fragmentation and Reverse Search on the Web

  • Evlampios Apostolidis
  • Konstantinos Apostolidis
  • Ioannis Patras
  • Vasileios Mezaris


This chapter focuses on methods and tools for video fragmentation and reverse search on the web. These technologies can assist journalists in dealing with fake news, now spread rapidly via social media platforms, that relies on the reuse of a previously posted video from a past event with the intention of misleading viewers about a contemporary event. Fragmenting a video into visually and temporally coherent parts and extracting a representative keyframe for each fragment provides a complete and concise keyframe-based summary of the video. Compared with straightforward approaches that sample video frames at a constant step, the summary generated through video fragmentation and keyframe extraction is considerably more effective for discovering the video content and performing a fragment-level search for the video on the web. The chapter begins by explaining the nature and characteristics of this type of reuse-based fake news, and continues with an overview of existing approaches for the temporal fragmentation of single-shot videos into sub-shots (the most appropriate level of temporal granularity when dealing with user-generated videos) and of tools for performing reverse video search on the web. It then describes two state-of-the-art methods for video sub-shot fragmentation, one relying on the assessment of visual coherence over sequences of frames and the other based on the identification of camera activity during the video recording, and presents the InVID web application, which enables fine-grained (fragment-level) reverse search for near-duplicates of a given video on the web.
Next, the chapter reports the findings of a series of experimental evaluations of the above-mentioned technologies, which indicate their ability to generate a concise and complete keyframe-based summary of the video content and the usefulness of this fragment-level representation for fine-grained reverse video search on the web. Finally, it draws conclusions about the effectiveness of the presented technologies and outlines our future plans for further advancing them.
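
The contrast between constant-step sampling and fragment-based keyframe extraction can be illustrated with a minimal sketch. This is not either of the chapter's methods: it is a toy stand-in that assumes each frame is already described by a feature vector (for example, a colour histogram), starts a new fragment whenever the cosine similarity between consecutive frames drops below a threshold, and picks the frame closest to the fragment's mean as its keyframe. The function names, the 0.9 threshold, and the synthetic features are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fragment(features, threshold=0.9):
    """Return (start, end) frame-index pairs, splitting where consecutive
    frames are less similar than `threshold` (an assumed, illustrative value)."""
    if not features:
        return []
    fragments, start = [], 0
    for i in range(1, len(features)):
        if cosine_similarity(features[i - 1], features[i]) < threshold:
            fragments.append((start, i - 1))
            start = i
    fragments.append((start, len(features) - 1))
    return fragments


def keyframe(features, start, end):
    """Index of the frame most similar to the fragment's mean feature vector."""
    count = end - start + 1
    dims = len(features[start])
    mean = [sum(features[i][d] for i in range(start, end + 1)) / count
            for d in range(dims)]
    return max(range(start, end + 1),
               key=lambda i: cosine_similarity(features[i], mean))


def summarize(features, threshold=0.9):
    """One keyframe index per detected fragment."""
    return [keyframe(features, s, e) for s, e in fragment(features, threshold)]


def uniform_sample(num_frames, step):
    """Baseline: take every `step`-th frame regardless of content."""
    return list(range(0, num_frames, step))


if __name__ == "__main__":
    # Synthetic video: 5 frames of scene A, 2 of scene B, 5 of scene C.
    video = [[1.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.0]] * 2 + [[0.0, 0.0, 1.0]] * 5
    print(summarize(video))                # one keyframe per scene
    print(uniform_sample(len(video), 4))   # skips the short middle scene
```

On this synthetic input the constant-step baseline never lands on the two-frame middle scene, while the fragment-based summary yields one keyframe for each of the three scenes, which is the kind of completeness the abstract attributes to fragmentation-based summaries.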



The work reported in this chapter was supported by the EU's Horizon 2020 research and innovation program under grant agreements H2020-687786 InVID and H2020-732665 EMMA.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Evlampios Apostolidis (1, 2)
  • Konstantinos Apostolidis (1)
  • Ioannis Patras (2)
  • Vasileios Mezaris (1)

  1. Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
  2. School of Electronic Engineering and Computer Science, Queen Mary University, London, UK
