
Video summarization via global feature difference optimization

  • Published in: Optoelectronics Letters

Abstract

Video summarization aims to select valuable clips so that videos can be browsed efficiently. Previous approaches typically focus on aggregating temporal features while overlooking the potential role of visual representations in summarizing videos. In this paper, we present a global difference-aware network (GDANet) that exploits feature differences at the frame and video levels as guidance to enhance visual features. First, a difference optimization module (DOM) is devised to enhance the discriminability of visual features, yielding gains in accurately aggregating temporal cues. Second, a dual-scale attention module (DSAM) is introduced to capture informative contextual information. Finally, an adaptive feature fusion module (AFFM) is designed so that the network adaptively learns context representations and performs feature fusion effectively. Experiments on benchmark datasets demonstrate the effectiveness of the proposed framework.
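To make the core intuition concrete, the sketch below scores each frame by how far its feature vector deviates from the video-level mean feature, then softmax-normalizes the scores over frames. This is only a crude, hand-written proxy for a "global feature difference" signal: the paper's DOM, DSAM, and AFFM are learned modules whose internals are not given in the abstract, and the function and frame features here are illustrative inventions.

```python
import math

def global_difference_scores(features):
    """Score frames by their distance from the video-level mean feature.

    features: list of per-frame feature vectors (lists of floats).
    Returns softmax-normalized scores, one per frame. A hypothetical
    proxy for a global frame-vs-video difference cue, not the paper's
    actual (learned) difference optimization module.
    """
    n, d = len(features), len(features[0])
    # Video-level ("global") feature: the mean over all frames.
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    # Per-frame Euclidean distance from the global feature.
    diff = [math.sqrt(sum((f[j] - mean[j]) ** 2 for j in range(d)))
            for f in features]
    # Softmax so the scores form a distribution over frames.
    m = max(diff)
    exp = [math.exp(x - m) for x in diff]
    total = sum(exp)
    return [e / total for e in exp]

# Toy video: four frames with 2-D features; frame 2 is the outlier.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.1, -0.1]]
scores = global_difference_scores(frames)
```

Under this toy scoring, the frame that differs most from the rest of the video receives the highest weight, which is the behavior a difference-aware enhancement would encourage before temporal aggregation.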




Author information


Corresponding author

Correspondence to Yunzuo Zhang.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

This work has been supported by the National Natural Science Foundation of China (Nos.61702347 and 62027801), the Natural Science Foundation of Hebei Province (Nos.F2022210007 and F2017210161), the Science and Technology Project of Hebei Education Department (Nos.ZD2022100 and QN2017132), and the Central Guidance on Local Science and Technology Development Fund (No.226Z0501G).


Cite this article

Zhang, Y., Liu, Y. Video summarization via global feature difference optimization. Optoelectron. Lett. 19, 570–576 (2023). https://doi.org/10.1007/s11801-023-2212-0

