Attentive Convolution Network-Based Video Summarization

Gupta, Deeksha; Sharma, Akashdeep

doi:10.1007/978-981-16-3067-5_25

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 778)

Abstract

The availability of smartphones with embedded video capture and large storage capacity has led to the generation of a plethora of videos. This deluge of videos has drawn the attention of the computer vision research community to the problem of efficiently browsing, indexing, and retrieving an intended video. Video summarization has emerged as a solution to these issues: a short summary video is generated that contains the important information from the original video. This paper proposes a supervised attentive convolution network for summarization (ACN-SUM) framework for binary labeling of video frames. ACN-SUM is based on an encoder–decoder architecture in which the encoder is an attention-aware convolution network module and the decoder is a deconvolution network module. In ACN-SUM, the self-attention module captures long-range temporal dependencies among frames, and concatenating the feature maps of the convolution network and the attention module results in more informative encoded frame descriptors. These encoded features are passed to the deconvolution module to generate frame labels for keyframe selection. The proposed network has been evaluated on two benchmark datasets, and the experimental results demonstrate its efficiency relative to state-of-the-art methods.
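The data flow described in the abstract (convolutional encoding, self-attention, concatenation, deconvolution to per-frame binary labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the scaled dot-product form of attention, the single-layer depth, the kernel size and stride of 2, and the random untrained weights are all assumptions made for illustration, and the sketch expects an even number of frames.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over T frames; x has shape (T, D)."""
    scores = x @ x.T / np.sqrt(x.shape[1])         # (T, T) pairwise frame similarity
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ x                             # (T, D) attended features

def temporal_conv(x, kernel, stride=2):
    """Strided 1D convolution along time; kernel has shape (K, D_in, D_out)."""
    k = kernel.shape[0]
    steps = range(0, x.shape[0] - k + 1, stride)
    return np.stack([np.einsum("kd,kde->e", x[t:t + k], kernel) for t in steps])

def temporal_deconv(z, kernel, stride=2):
    """Transposed 1D convolution along time; kernel has shape (K, D_in, D_out)."""
    k, _, d_out = kernel.shape
    out = np.zeros(((z.shape[0] - 1) * stride + k, d_out))
    for i, row in enumerate(z):
        # Each encoded step contributes a K-frame patch to the output.
        out[i * stride:i * stride + k] += np.einsum("d,kde->ke", row, kernel)
    return out

def acn_sum_sketch(frames, rng):
    """Hypothetical forward pass: frames (T, D) -> binary labels (T,)."""
    _, d = frames.shape
    enc_kernel = rng.standard_normal((2, d, d)) * 0.1      # untrained encoder weights
    dec_kernel = rng.standard_normal((2, 2 * d, 2)) * 0.1  # untrained decoder weights
    conv_feat = temporal_conv(frames, enc_kernel)          # (T/2, D)
    attn_feat = self_attention(conv_feat)                  # (T/2, D)
    encoded = np.concatenate([conv_feat, attn_feat], axis=1)  # (T/2, 2D)
    scores = temporal_deconv(encoded, dec_kernel)          # (T, 2) class scores
    return scores.argmax(axis=1)                           # 1 = keyframe, 0 = not

rng = np.random.default_rng(0)
frame_features = rng.standard_normal((8, 4))  # 8 frames, 4-dim placeholder features
print(acn_sum_sketch(frame_features, rng))
```

With random weights the labels are meaningless; in a supervised setting such as the one the abstract describes, the kernels would be learned from frame-level annotations.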

Author information

Authors and Affiliations

University Institute of Engineering and Technology, Panjab University, Chandigarh, India
Deeksha Gupta & Akashdeep Sharma
Mehr Chand Mahajan DAV College for Women, Panjab University, Chandigarh, India
Deeksha Gupta

Corresponding author

Correspondence to Akashdeep Sharma.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Ankur Choudhary
Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Arun Prakash Agrawal
Asia Pacific Centre for Analytics (APCA), Asia Pacific University of Technology and Innovation (APU), Kuala Lumpur, Malaysia
Rajasvaran Logeswaran
Information Technology, University of South Florida Sarasota–Manatee Campus, Sarasota, FL, USA
Bhuvan Unhelkar

About this paper

Cite this paper

Gupta, D., Sharma, A. (2021). Attentive Convolution Network-Based Video Summarization. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_25

DOI: https://doi.org/10.1007/978-981-16-3067-5_25
Published: 27 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3066-8
Online ISBN: 978-981-16-3067-5
eBook Packages: Computer Science, Computer Science (R0)
