Video summarization using textual descriptions for authoring video blogs

Multimedia Tools and Applications

Abstract

Authoring video blogs requires video editing, which is cumbersome for ordinary users. Video summarization can automate this process by extracting important segments from the original videos. Because bloggers typically have a particular story in mind for each blog post, a video summary for a post should take the author’s intentions into account. However, most prior work addresses video summarization by mining patterns from the original videos without considering the blog author’s intentions. To generate a video summary that reflects the author’s intention, we focus on the supporting text in video blog posts and present a text-based method in which the supporting text serves as a prior for the video summary. Given videos and a text that describes the scenes of interest, our method segments the videos and assigns each segment a priority in the summary based on its relevance to the input text. It then selects a subset of segments whose content is similar to the input text. Accordingly, our method produces different video summaries from the same set of videos, depending on the input text. In a user study, we evaluated the generated summaries from the perspectives of both blog viewers and blog authors. Experimental results demonstrate the advantages of the proposed text-based method for video blog authoring.
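The pipeline outlined in the abstract (segment the videos, score each segment against the input text, select a subset) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the features, the relevance measure, or the selection procedure. The sketch assumes each segment already carries a few descriptive words (a hypothetical stand-in for real content analysis), uses bag-of-words cosine similarity as the relevance score, and picks segments greedily under a duration budget. The names `cosine`, `summarize`, and `budget` are illustrative only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(segments, text, budget):
    """Select segments relevant to `text` within `budget` seconds.

    `segments` is a list of (segment_id, duration_seconds, words) tuples,
    with ids assumed to follow temporal order. The per-segment words are
    an assumption of this sketch, not something the abstract describes.
    """
    query = Counter(text.lower().split())
    scored = [
        (cosine(Counter(words), query), seg_id, duration)
        for seg_id, duration, words in segments
    ]
    chosen, used = [], 0.0
    # Visit segments from most to least relevant; skip irrelevant ones
    # and any segment that would exceed the duration budget.
    for score, seg_id, duration in sorted(scored, key=lambda s: -s[0]):
        if score > 0 and used + duration <= budget:
            chosen.append(seg_id)
            used += duration
    return sorted(chosen)  # restore temporal order


# Toy data: four segments from the same set of videos.
segments = [
    (0, 5.0, ["beach", "sea", "walk"]),
    (1, 4.0, ["restaurant", "dinner", "food"]),
    (2, 6.0, ["beach", "sunset"]),
    (3, 3.0, ["hotel", "room"]),
]

print(summarize(segments, "we walked along the beach at sunset", 12.0))
print(summarize(segments, "dinner at a restaurant near the hotel", 12.0))
```

The two calls use the same segments but different texts, so they select different subsets, which mirrors the property stated in the abstract that the summary depends on the input text.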

Acknowledgements

This work was partially supported by Grants-in-Aid for Scientific Research No. 25730115, No. 25540086 and Young Scientists (B) No. 16K16086 from the Japan Society for the Promotion of Science (JSPS).

Author information

Corresponding author

Correspondence to Mayu Otani.

About this article

Cite this article

Otani, M., Nakashima, Y., Sato, T. et al. Video summarization using textual descriptions for authoring video blogs. Multimed Tools Appl 76, 12097–12115 (2017). https://doi.org/10.1007/s11042-016-4061-3

  • DOI: https://doi.org/10.1007/s11042-016-4061-3
