Abstract
The availability of smartphones with embedded video cameras, along with enormous storage capacity, has led to the generation of a plethora of videos. This deluge of videos has drawn the attention of the computer vision research community to the problems of efficiently browsing, indexing, and retrieving the intended video. Video summarization has emerged as a solution to these issues: it generates a short summary video that contains the important information from the original video. This paper proposes a supervised attentive convolution network for summarization (ACN-SUM) framework for binary labeling of video frames. ACN-SUM is based on an encoder–decoder architecture in which the encoder is an attention-aware convolution network module and the decoder is a deconvolution network module. In ACN-SUM, the self-attention module captures long-range temporal dependencies among frames, and concatenating the feature maps of the convolution network and the attention module yields more informative encoded frame descriptors. These encoded features are passed to the deconvolution module, which generates frame labels for keyframe selection. The performance of the proposed network has been evaluated on two benchmark datasets, and the experimental results demonstrate its efficiency compared with state-of-the-art methods.
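The encoder–decoder flow described above can be illustrated with a small, untrained sketch in plain Python. It is not the authors' implementation: the single convolution layer, kernel sizes, channel width, ReLU and sigmoid activations, the scaled dot-product form of self-attention, and the 0.5 threshold are all assumptions, and `acn_sum_sketch` and its helpers are hypothetical names. Input frames are assumed to be precomputed per-frame feature vectors (for example, from a pretrained CNN). The sketch shows how a strided temporal convolution, a self-attention pass over its output, channel-wise concatenation of the two feature maps, and a transposed convolution ("deconvolution") combine to produce one binary label per frame.

```python
import math
import random

Matrix = list[list[float]]  # rows are time steps, columns are channels


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for learned (trained) parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv1d(x: Matrix, w: list[Matrix], stride: int) -> Matrix:
    """Zero-padded temporal convolution followed by ReLU; w[k] maps C_in -> C_out for tap k."""
    k_size, c_in, c_out = len(w), len(x[0]), len(w[0][0])
    pad = k_size // 2
    padded = [[0.0] * c_in] * pad + x + [[0.0] * c_in] * pad
    out = []
    for start in range(0, len(padded) - k_size + 1, stride):
        acc = [0.0] * c_out
        for k in range(k_size):
            for o in range(c_out):
                acc[o] += sum(padded[start + k][c] * w[k][c][o] for c in range(c_in))
        out.append([max(0.0, v) for v in acc])
    return out


def softmax(row: list[float]) -> list[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over time: every step attends to every step."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def conv_transpose1d(x: Matrix, w: list[Matrix], stride: int) -> Matrix:
    """'Deconvolution': each input step scatters into len(w) output steps."""
    k_size, c_out = len(w), len(w[0][0])
    out = [[0.0] * c_out for _ in range((len(x) - 1) * stride + k_size)]
    for i, step in enumerate(x):
        for k in range(k_size):
            for o in range(c_out):
                out[i * stride + k][o] += sum(step[c] * w[k][c][o] for c in range(len(step)))
    return out


def acn_sum_sketch(frames: Matrix, channels: int = 8, seed: int = 0) -> tuple[list[int], Matrix]:
    """Hypothetical ACN-SUM-style forward pass returning per-frame 0/1 labels and attention."""
    rng = random.Random(seed)
    d = len(frames[0])
    # Encoder, part 1: kernel-3, stride-2 temporal convolution (roughly halves T).
    conv_feats = conv1d(frames, [rand_matrix(d, channels, rng) for _ in range(3)], stride=2)
    # Encoder, part 2: self-attention over the convolved sequence.
    wq, wk, wv = (rand_matrix(channels, channels, rng) for _ in range(3))
    attn_feats, attn = self_attention(conv_feats, wq, wk, wv)
    # Concatenate conv and attention feature maps along the channel axis.
    encoded = [c + a for c, a in zip(conv_feats, attn_feats)]
    # Decoder: kernel-2, stride-2 transposed convolution back to one score per frame.
    dec_w = [rand_matrix(2 * channels, 1, rng) for _ in range(2)]
    logits = conv_transpose1d(encoded, dec_w, stride=2)[: len(frames)]
    probs = [1.0 / (1.0 + math.exp(-z[0])) for z in logits]
    return [1 if p >= 0.5 else 0 for p in probs], attn


if __name__ == "__main__":
    data_rng = random.Random(42)
    frames = [[data_rng.random() for _ in range(16)] for _ in range(8)]  # 8 frames, 16-D features
    labels, attn = acn_sum_sketch(frames)
    print(labels)  # one 0/1 label per frame; meaningless until the weights are trained
```

With random weights the labels carry no meaning; in the supervised setting the abstract describes, the weights would be trained against ground-truth keyframe labels. The stride-2 convolution roughly halves the temporal length and the kernel-2, stride-2 transposed convolution doubles it back, so the output is cropped to the input length when the frame count is odd.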
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gupta, D., Sharma, A. (2021). Attentive Convolution Network-Based Video Summarization. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_25
DOI: https://doi.org/10.1007/978-981-16-3067-5_25
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3066-8
Online ISBN: 978-981-16-3067-5
eBook Packages: Computer Science, Computer Science (R0)