Video summarization using textual descriptions for authoring video blogs

Otani, Mayu; Nakashima, Yuta; Sato, Tomokazu; Yokoya, Naokazu

doi:10.1007/s11042-016-4061-3

Published: 28 October 2016

Volume 76, pages 12097–12115, (2017)

Multimedia Tools and Applications

Abstract

Authoring video blogs requires video editing, which is cumbersome for ordinary users. Video summarization can automate this process by extracting important segments from the original videos. Because bloggers typically have particular stories in mind for their blog posts, the video summary of a blog post should take the author’s intentions into account. However, most prior work addresses video summarization by mining patterns from the original videos without considering the blog author’s intentions. To generate a video summary that reflects the blog author’s intention, we focus on the supporting text in video blog posts and present a text-based method in which the supporting text serves as a prior for the video summary. Given videos and a text describing the scenes of interest, our method segments the videos and assigns each video segment a priority in the summary based on its relevance to the input text. Our method then selects a subset of segments whose content is similar to the input text. Accordingly, our method produces different video summaries from the same set of videos, depending on the input text. We evaluated the generated summaries from both blog viewers’ and authors’ perspectives in a user study. Experimental results demonstrate the advantages of the proposed text-based method for video blog authoring.
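
The pipeline outlined in the abstract (segment the videos, score each segment against the input text, then select a subset) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each segment already carries a list of descriptive words (the hypothetical `labels` field), that relevance is the fraction of text words found among those labels, and that selection is a greedy pass under a duration budget. All names here (`Segment`, `relevance`, `summarize`, `max_duration`) are introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    labels: list[str]   # hypothetical: words describing the segment's content

    @property
    def duration(self) -> float:
        return self.end - self.start


def relevance(segment: Segment, query_words: set[str]) -> float:
    """Fraction of the query words that appear among the segment's labels."""
    if not query_words:
        return 0.0
    labels = {w.lower() for w in segment.labels}
    return len(labels & query_words) / len(query_words)


def summarize(segments: list[Segment], text: str, max_duration: float) -> list[Segment]:
    """Greedily pick the most text-relevant segments within a duration budget."""
    query = set(text.lower().split())
    # Highest relevance first; sorted() is stable, so ties keep temporal order.
    ranked = sorted(segments, key=lambda s: relevance(s, query), reverse=True)
    chosen: list[Segment] = []
    total = 0.0
    for seg in ranked:
        if relevance(seg, query) == 0.0:
            break  # remaining segments share no words with the text
        if total + seg.duration <= max_duration:
            chosen.append(seg)
            total += seg.duration
    # Present the summary in the original temporal order.
    return sorted(chosen, key=lambda s: s.start)


segments = [
    Segment(0, 5, ["beach", "sea"]),
    Segment(5, 10, ["dinner", "restaurant"]),
    Segment(10, 15, ["beach", "sunset"]),
]
beach_summary = summarize(segments, "sunset at the beach", max_duration=10)
dinner_summary = summarize(segments, "dinner at a restaurant", max_duration=10)
print([s.start for s in beach_summary])
print([s.start for s in dinner_summary])
```

With the same three segments, the two input texts yield different summaries, which mirrors the behavior the abstract describes. A real system would replace the word-overlap score with a learned measure of similarity between video content and text.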

Acknowledgements

This work was partially supported by Grants-in-Aid for Scientific Research No. 25730115, No. 25540086 and Young Scientists (B) No. 16K16086 from the Japan Society for the Promotion of Science (JSPS).

Author information

Authors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
Mayu Otani, Yuta Nakashima, Tomokazu Sato & Naokazu Yokoya

Corresponding author

Correspondence to Mayu Otani.

About this article

Cite this article

Otani, M., Nakashima, Y., Sato, T. et al. Video summarization using textual descriptions for authoring video blogs. Multimed Tools Appl 76, 12097–12115 (2017). https://doi.org/10.1007/s11042-016-4061-3

Received: 15 April 2016
Revised: 26 September 2016
Accepted: 10 October 2016
Published: 28 October 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11042-016-4061-3
