A multi-stage deep adversarial network for video summarization with knowledge distillation

Sreeja, M. U.; Kovoor, Binsu C.

doi:10.1007/s12652-021-03641-8

Original Research
Published: 24 January 2022

Volume 14, pages 9823–9838 (2023)

Journal of Ambient Intelligence and Humanized Computing

Abstract

Video summarization is the process of automatically identifying and extracting the content that best represents a video. The proposed model implements a video summarization framework that uses a generative adversarial network (GAN) for feature extraction and knowledge distillation for key frame or segment selection. The ideal characteristics of a video summary are diversity and representativeness. The first stage of the proposed model, based on adversarial learning, ensures that the extracted features contain diverse and representative elements of the video. The generator is a convolutional recurrent autoencoder that learns a hidden representation of the video through a reconstruction loss. It is paired with a discriminator that improves the generator by learning to distinguish original video samples from reconstructed ones. The adversarial network is followed by a knowledge distillation phase that acts as a key frame or segment selector, employing a simple network whose input is taken from the preceding GAN model. Comprehensive evaluations on public and custom datasets substantiate the contribution of both the GAN and the knowledge distillation phase to video summarization. Quantitative and qualitative evaluations further show that the proposed model produces summaries that are diverse, representative and concise.
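The abstract names four ingredients: a reconstruction loss for the convolutional recurrent autoencoder, an adversarial loss from the discriminator, a knowledge distillation step, and a final key frame selection. Because the full formulation is not reproduced on this page, the following is only an illustrative sketch under common assumptions, not the authors' method: mean squared error for reconstruction, binary cross-entropy for the adversarial terms (with the non-saturating generator loss), a Hinton-style temperature-softened KL divergence for distillation in which GAN-derived frame scores are assumed to act as the teacher and the simple selector network's scores as the student, and top-k selection under a frame budget. All function names and the `adv_weight`, `temperature` and `budget` parameters are hypothetical, and plain Python lists stand in for the tensors a real model would use.

```python
import math
from typing import List, Sequence

# Hypothetical illustration of the losses named in the abstract. The paper's
# exact formulations, loss weights and network shapes are NOT given here.


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Reconstruction loss: mean squared error between original and reconstructed features."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def bce(p: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one discriminator probability p, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def generator_loss(x: Sequence[float], x_hat: Sequence[float],
                   d_fake: float, adv_weight: float = 0.1) -> float:
    """Autoencoder (generator) objective: reconstruct x and push D(x_hat) towards 1.
    d_fake is the discriminator's probability that x_hat is an original sample;
    adv_weight is an assumed balancing hyperparameter."""
    return mse(x, x_hat) + adv_weight * bce(d_fake, 1.0)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Discriminator objective: label original samples 1 and reconstructions 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax, shifted by the max score for numerical stability."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_loss(teacher_scores: Sequence[float], student_scores: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over softened per-frame distributions, scaled by T^2."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


def select_key_frames(frame_scores: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` highest-scoring frames in temporal order."""
    ranked = sorted(range(len(frame_scores)), key=lambda i: frame_scores[i], reverse=True)
    return sorted(ranked[:budget])
```

For instance, `select_key_frames([0.1, 0.9, 0.4, 0.8], 2)` returns `[1, 3]`: frames 1 and 3 have the highest scores and are returned in temporal order so the summary plays back chronologically. Scaling the distillation term by `temperature ** 2` follows the usual convention of keeping its gradient magnitude comparable across temperatures.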

Availability of data

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

Division of Information Technology, Cochin University of Science and Technology, Kochi, Kerala, India
M. U. Sreeja & Binsu C. Kovoor

Corresponding author

Correspondence to M. U. Sreeja.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare that they have no conflict of interest.

Consent for publication

We give our consent for this article to be published in the Springer Journal of Ambient Intelligence and Humanized Computing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sreeja, M.U., Kovoor, B.C. A multi-stage deep adversarial network for video summarization with knowledge distillation. J Ambient Intell Human Comput 14, 9823–9838 (2023). https://doi.org/10.1007/s12652-021-03641-8

Received: 19 July 2021
Accepted: 01 December 2021
Published: 24 January 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s12652-021-03641-8