Abstract
Authoring video blogs requires video editing, which is cumbersome for ordinary users. Video summarization can automate this process by extracting the important segments from the original videos. Because bloggers typically have a particular story in mind for each blog post, the video summary for a post should take the author’s intention into account. However, most prior work addresses video summarization by mining patterns from the original videos without considering the blog author’s intention. To generate video summaries that reflect the author’s intention, we focus on the supporting text in video blog posts and present a text-based method in which this text serves as a prior for the video summary. Given a set of videos and a text describing the scenes of interest, our method segments the videos and assigns each segment a priority in the summary based on its relevance to the input text. It then selects a subset of segments whose content is similar to the input text. Consequently, our method produces different video summaries from the same set of videos, depending on the input text. We evaluated the generated summaries in a user study from the perspectives of both blog viewers and blog authors. The experimental results demonstrate the advantages of the proposed text-based method for video blog authoring.
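The abstract does not specify how segments and text are represented or how the subset is chosen, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, hypothetically, that each segment and the input text have already been mapped to feature vectors in a shared space, uses cosine similarity as the relevance-based priority, and substitutes a simple greedy selection under a total-duration budget. The names `Segment`, `cosine`, `prioritize`, and `select_summary` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A video segment with a hypothetical precomputed feature vector."""
    video_id: str
    start: float          # start time in seconds
    end: float            # end time in seconds
    feature: list[float]  # assumed to share a vector space with the text feature

    @property
    def duration(self) -> float:
        return self.end - self.start


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def prioritize(segments, text_feature):
    """Pair each segment with a priority equal to its relevance to the input text."""
    return [(cosine(s.feature, text_feature), s) for s in segments]


def select_summary(segments, text_feature, budget):
    """Greedily keep the most text-relevant segments that fit in `budget` seconds.

    This greedy rule is an assumed stand-in for the paper's selection step.
    The result is ordered by (video_id, start) so it plays back chronologically.
    """
    ranked = sorted(prioritize(segments, text_feature),
                    key=lambda pair: pair[0], reverse=True)
    chosen, used = [], 0.0
    for score, seg in ranked:
        if score <= 0.0:
            break  # remaining segments are unrelated to the text
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    return sorted(chosen, key=lambda s: (s.video_id, s.start))


if __name__ == "__main__":
    # Toy 2-D features: axis 0 ~ "food" scenes, axis 1 ~ "beach" scenes (made up).
    segs = [
        Segment("v1", 0.0, 4.0, [1.0, 0.0]),
        Segment("v1", 4.0, 8.0, [0.0, 1.0]),
        Segment("v2", 0.0, 5.0, [0.9, 0.1]),
    ]
    food = select_summary(segs, [1.0, 0.0], budget=5.0)
    beach = select_summary(segs, [0.0, 1.0], budget=5.0)
    print([(s.video_id, s.start) for s in food])   # [('v1', 0.0)]
    print([(s.video_id, s.start) for s in beach])  # [('v1', 4.0)]
```

With the same three segments, the two toy text vectors yield different summaries, mirroring the text dependence described in the abstract. The greedy rule is used only for brevity; it does not guarantee the best subset under the duration budget.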
Acknowledgements
This work was partially supported by Grants-in-Aid for Scientific Research No. 25730115, No. 25540086 and Young Scientists (B) No. 16K16086 from the Japan Society for the Promotion of Science (JSPS).
About this article
Cite this article
Otani, M., Nakashima, Y., Sato, T. et al. Video summarization using textual descriptions for authoring video blogs. Multimed Tools Appl 76, 12097–12115 (2017). https://doi.org/10.1007/s11042-016-4061-3