Abstract
Video summarization aims to select the most valuable clips so that videos can be browsed efficiently. Previous approaches typically focus on aggregating temporal features while overlooking the role that visual representations themselves can play in summarization. In this paper, we present a global difference-aware network (GDANet) that exploits frame-level and video-level feature differences as guidance to enhance visual features. First, a difference optimization module (DOM) is devised to enhance the discriminability of visual features, which in turn improves the accuracy of temporal-cue aggregation. Second, a dual-scale attention module (DSAM) is introduced to capture informative contextual information. Finally, we design an adaptive feature fusion module (AFFM) that lets the network adaptively learn context representations and fuse features effectively. Experiments on benchmark datasets demonstrate the effectiveness of the proposed framework.
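The core idea behind the DOM — gating each frame feature by how much it deviates from the global (video-level) feature — can be illustrated with a minimal NumPy sketch. The function name, the sigmoid gating form, and the gain parameter `alpha` are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def difference_guided_enhancement(frames, alpha=0.5):
    """Enhance per-frame features using their difference from the
    video-level mean feature (a loose sketch of the DOM idea).
    `alpha` is a hypothetical gain, not a parameter from the paper."""
    global_feat = frames.mean(axis=0, keepdims=True)   # video-level context
    diff = frames - global_feat                        # frame-vs-video difference
    # Sigmoid of the difference magnitude acts as a per-frame gate:
    # frames far from the global context are amplified more.
    gate = 1.0 / (1.0 + np.exp(-np.linalg.norm(diff, axis=1, keepdims=True)))
    return frames + alpha * gate * diff                # difference-gated residual

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))                       # 8 frames, 16-dim features
enhanced = difference_guided_enhancement(feats)
print(enhanced.shape)                                  # (8, 16)
```

The residual form keeps the original feature intact while the gated difference term sharpens frames that stand out from the video-level context, which is what makes the subsequent temporal aggregation more discriminative.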
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
Additional information
This work has been supported by the National Natural Science Foundation of China (Nos. 61702347 and 62027801), the Natural Science Foundation of Hebei Province (Nos. F2022210007 and F2017210161), the Science and Technology Project of Hebei Education Department (Nos. ZD2022100 and QN2017132), and the Central Guidance on Local Science and Technology Development Fund (No. 226Z0501G).
Cite this article
Zhang, Y., Liu, Y. Video summarization via global feature difference optimization. Optoelectron. Lett. 19, 570–576 (2023). https://doi.org/10.1007/s11801-023-2212-0