Attentive Convolution Network-Based Video Summarization

  • Conference paper
Applications of Artificial Intelligence and Machine Learning

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 778))

Abstract

The availability of smartphones with embedded video capture and large storage capacity has led to the generation of a plethora of videos. This deluge of videos has drawn the attention of the computer vision research community to the problem of efficiently browsing, indexing, and retrieving an intended video. Video summarization has emerged as a solution to these issues: a short summary video is generated that contains the important information from the original video. This paper proposes a supervised attentive convolution network for summarization (ACN-SUM) framework for binary labeling of video frames. ACN-SUM is based on an encoder–decoder architecture in which the encoder is an attention-aware convolution network module and the decoder is a deconvolution network module. In ACN-SUM, the self-attention module captures the long-range temporal dependencies among frames, and concatenating the feature maps of the convolution network and the attention module results in more informative encoded frame descriptors. These encoded features are passed to the deconvolution module to generate frame labels for keyframe selection. The performance of the proposed network has been evaluated on two benchmark datasets, and the experimental results demonstrate the efficiency of the proposed model against state-of-the-art methods.
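The data flow described in the abstract (temporal convolution, self-attention over the convolved features, concatenation of the two feature maps, deconvolution, binary frame labels) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer sizes, kernels, or training details, so every function name, kernel value, stride, and the linear scoring head `head` below are hypothetical choices made only to show the shape of the pipeline. The attention step uses plain scaled dot-product attention without learned projections, which is also an assumption.

```python
import math

def temporal_conv(frames, kernel, stride=2):
    """Depthwise 1D convolution along time, zero-padded (assumed encoder layer)."""
    pad = len(kernel) // 2
    dim = len(frames[0])
    zero = [0.0] * dim
    padded = [zero] * pad + frames + [zero] * pad
    out = []
    for start in range(0, len(frames), stride):
        window = padded[start:start + len(kernel)]
        out.append([sum(w * f[d] for w, f in zip(kernel, window)) for d in range(dim)])
    return out

def self_attention(feats):
    """Scaled dot-product self-attention with Q = K = V = feats (no learned weights)."""
    scale = math.sqrt(len(feats[0]))
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in feats]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, feats)) for d in range(len(q))])
    return out

def temporal_deconv(feats, kernel, stride=2):
    """Transposed 1D convolution along time: upsamples the encoded sequence."""
    dim = len(feats[0])
    length = (len(feats) - 1) * stride + len(kernel)
    out = [[0.0] * dim for _ in range(length)]
    for i, f in enumerate(feats):
        for k, w in enumerate(kernel):
            for d in range(dim):
                out[i * stride + k][d] += w * f[d]
    return out

def label_frames(frames, conv_kernel, deconv_kernel, head, threshold=0.5):
    """Encoder (conv + attention, concatenated) then decoder (deconv) to 0/1 labels."""
    conv = temporal_conv(frames, conv_kernel)
    attn = self_attention(conv)
    encoded = [c + a for c, a in zip(conv, attn)]  # channel-wise concatenation
    decoded = temporal_deconv(encoded, deconv_kernel)[:len(frames)]
    labels = []
    for f in decoded:
        logit = sum(w * x for w, x in zip(head, f))  # hypothetical linear scoring head
        prob = 1.0 / (1.0 + math.exp(-logit))
        labels.append(1 if prob >= threshold else 0)
    return labels

# Toy input: 8 frames with 2-dimensional descriptors (made-up values).
frames = [[0.1 * t, 1.0 - 0.1 * t] for t in range(8)]
labels = label_frames(frames, [0.25, 0.5, 0.25], [1.0, 1.0], [0.5, -0.5, 0.5, -0.5])
```

With a stride of 2 in both directions, the encoder halves the temporal length and the decoder restores it, so `labels` holds one binary keyframe decision per input frame. In a trained model the kernels and the scoring head would be learned from annotated summaries rather than fixed by hand.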

Author information

Corresponding author

Correspondence to Akashdeep Sharma.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gupta, D., Sharma, A. (2021). Attentive Convolution Network-Based Video Summarization. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_25

  • DOI: https://doi.org/10.1007/978-981-16-3067-5_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-3066-8

  • Online ISBN: 978-981-16-3067-5

  • eBook Packages: Computer Science, Computer Science (R0)
