A multi-stage deep adversarial network for video summarization with knowledge distillation

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Video summarization is the process of automatically identifying and extracting the content of a video that best represents the video as a whole. The proposed model is a video summarization framework that uses a generative adversarial network (GAN) for feature extraction and knowledge distillation for key frame or segment selection. The ideal characteristics of a video summary are diversity and representativeness. The first stage of the proposed model, based on adversarial learning, ensures that the extracted features contain diverse and representative elements of the video. The generator is a convolutional recurrent autoencoder that learns a hidden representation of the video through a reconstruction loss. It is followed by a discriminator that aims to improve the generator by trying to discriminate between original and reconstructed video samples. The adversarial network is followed by a knowledge distillation phase, which acts as a key frame or segment selector by employing a simple network whose input is taken from the preceding GAN model. Comprehensive evaluations on public and custom datasets substantiate the relevance of GANs and the knowledge distillation phase for video summarization. Quantitative and qualitative evaluations further indicate that the proposed model produces summaries that are diverse, representative and concise.
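
The objectives named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not give the loss formulas, so the use of mean squared error for reconstruction and distillation, binary cross-entropy for the adversarial terms, the `adv_weight` value, and top-k selection of key frames are all assumptions made for illustration, as are the function names.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for a single discriminator probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def generator_loss(frames, recon, d_on_recon, adv_weight=0.1):
    """Reconstruction loss plus a term for fooling the discriminator.

    adv_weight is a hypothetical balancing factor, not from the paper.
    """
    rec = sum(mse(f, r) for f, r in zip(frames, recon)) / len(frames)
    adv = sum(bce(p, 1.0) for p in d_on_recon) / len(d_on_recon)
    return rec + adv_weight * adv


def discriminator_loss(d_on_real, d_on_recon):
    """Originals should score near 1, reconstructions near 0."""
    real = sum(bce(p, 1.0) for p in d_on_real) / len(d_on_real)
    fake = sum(bce(p, 0.0) for p in d_on_recon) / len(d_on_recon)
    return real + fake


def distillation_loss(teacher_scores, student_scores):
    """Assumed form: the simple student network regresses onto
    frame-importance scores derived from the GAN stage."""
    return mse(teacher_scores, student_scores)


def select_key_frames(scores, k):
    """Indices of the k highest-scoring frames, in temporal order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

For example, `select_key_frames([0.1, 0.9, 0.3, 0.8], 2)` picks frames 1 and 3. In a real system the frame vectors would be features produced by the convolutional recurrent autoencoder, and each loss would be minimized with a deep learning framework rather than evaluated on Python lists.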

Availability of data

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to M. U. Sreeja.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare that they have no conflict of interest.

Consent for publication

We give our consent for this article to be published in the Springer Journal of Ambient Intelligence and Humanized Computing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sreeja, M.U., Kovoor, B.C. A multi-stage deep adversarial network for video summarization with knowledge distillation. J Ambient Intell Human Comput 14, 9823–9838 (2023). https://doi.org/10.1007/s12652-021-03641-8

  • DOI: https://doi.org/10.1007/s12652-021-03641-8