Multimedia Tools and Applications, Volume 76, Issue 9, pp 12097–12115

Video summarization using textual descriptions for authoring video blogs

  • Mayu Otani
  • Yuta Nakashima
  • Tomokazu Sato
  • Naokazu Yokoya

Abstract

Authoring video blogs requires video editing, which is cumbersome for ordinary users. Video summarization can automate this process by extracting important segments from the original videos. Because bloggers typically have a particular story in mind for each blog post, the video summary for a post should take the author’s intentions into account. However, most prior work addresses video summarization by mining patterns from the original videos without considering the blog author’s intentions. To generate a video summary that reflects the blog author’s intention, we focus on the supporting text in video blog posts and present a text-based method in which the supporting text serves as a prior for the video summary. Given videos and a text that describes the scenes of interest, our method segments the videos and assigns each video segment a priority in the summary based on its relevance to the input text. Our method then selects a subset of segments whose content is similar to the input text. Accordingly, our method produces different video summaries from the same set of videos, depending on the input text. In a user study, we evaluated the generated summaries from the perspectives of both blog viewers and blog authors. Experimental results demonstrate the advantages of the proposed text-based method for video blog authoring.
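
The pipeline outlined in the abstract (segment, score each segment against the text, select a subset) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Segment` class, the feature vectors, the cosine-similarity score, the greedy selection, and the `max_duration` budget are all assumptions made for illustration, since the abstract does not specify how relevance is computed or how the subset is chosen.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Segment:
    """Hypothetical video segment; `features` is an assumed content vector."""
    start: float  # seconds
    end: float    # seconds
    features: list

    @property
    def duration(self):
        return self.end - self.start


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(segments, text_features, max_duration):
    """Greedily pick the segments most similar to the text within a time budget.

    Assumption: relevance is plain cosine similarity and selection is greedy;
    the source only states that priority depends on relevance to the input text.
    """
    ranked = sorted(
        segments,
        key=lambda seg: cosine(seg.features, text_features),
        reverse=True,
    )
    chosen, used = [], 0.0
    for seg in ranked:
        if used + seg.duration <= max_duration:
            chosen.append(seg)
            used += seg.duration
    # Restore temporal order so the summary plays chronologically.
    return sorted(chosen, key=lambda seg: seg.start)


# Toy data: three 5-second segments with made-up 2-D content vectors.
segments = [
    Segment(0.0, 5.0, [1.0, 0.0]),
    Segment(5.0, 10.0, [0.0, 1.0]),
    Segment(10.0, 15.0, [1.0, 1.0]),
]
summary_a = summarize(segments, [1.0, 0.0], max_duration=10.0)
summary_b = summarize(segments, [0.0, 1.0], max_duration=10.0)
```

With the toy data, the two text vectors yield different summaries from the same set of segments, which mirrors the behavior the abstract describes.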

Keywords

Text-based video summarization; Video skimming; User study

Notes

Acknowledgements

This work was partially supported by Grants-in-Aid for Scientific Research No. 25730115, No. 25540086 and Young Scientists (B) No. 16K16086 from the Japan Society for the Promotion of Science (JSPS).

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Mayu Otani (1)
  • Yuta Nakashima (1)
  • Tomokazu Sato (1)
  • Naokazu Yokoya (1)
  1. Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
