VSMCNN-dynamic summarization of videos using salient features from multi-CNN model

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

A dynamic video summarization system detects the key parts of an input video and generates a compact representation of it. Such summaries support efficient management of video data. This paper proposes Video Summarization based on Multi-CNN model (VSMCNN), an approach that exploits major aspects of human cognition to generate meaningful summaries from videos. Because the method targets dynamic summarization, the input video is first divided into a set of shots. A multi-CNN model, which combines several pre-trained CNN models, extracts features from each shot. Salient features are then selected from the resulting high-dimensional feature vector using an unsupervised feature reduction technique that is applied in multiple subspaces to rank the features in the vector. The distance between feature vectors is then thresholded to detect the prime parts of the test video. Experiments on the SumMe dataset show that the approach successfully detects the portions of a test video that carry its essential message. The analysis shows that the method outperforms state-of-the-art methods in the literature, and a further comparison with the human-generated summaries in the ground truth confirms its effectiveness. The paper also presents a detailed analysis of which combination of pre-trained CNN models is best suited for generating dynamic summaries.
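
The pipeline described in the abstract (per-shot features from several CNNs, unsupervised feature ranking, distance thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `fuse_features`, `select_salient`, and `key_shots` are hypothetical, the per-shot feature matrices are assumed to have been computed already by some pre-trained CNNs, plain variance ranking stands in for the paper's multi-subspace feature reduction, and the Euclidean distance between consecutive shots is an assumed choice of distance measure.

```python
import numpy as np


def fuse_features(per_model_features):
    """Concatenate per-shot features from several CNNs along the feature axis.

    Each array has shape (n_shots, n_features_of_that_model).
    """
    return np.concatenate(per_model_features, axis=1)


def select_salient(features, k):
    """Keep the k features with the highest variance across shots.

    Assumption: variance ranking is only a simple stand-in for the
    unsupervised multi-subspace feature reduction named in the abstract.
    """
    variances = features.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    # Sort the chosen indices so the original column order is preserved.
    return features[:, np.sort(top)]


def key_shots(features, threshold):
    """Return indices of shots whose distance to the previous shot exceeds threshold.

    Assumption: Euclidean distance between consecutive shot vectors; the
    first shot is always kept as a starting point.
    """
    dists = np.linalg.norm(np.diff(features, axis=0), axis=1)
    keep = [0]
    keep.extend(int(i) + 1 for i in np.flatnonzero(dists > threshold))
    return keep


if __name__ == "__main__":
    # Toy features for 4 shots from two hypothetical models.
    model_a = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
    model_b = np.array([[1.0], [1.0], [1.0], [9.0]])
    fused = fuse_features([model_a, model_b])
    salient = select_salient(fused, k=2)
    print(key_shots(salient, threshold=1.0))
```

In this sketch the threshold controls summary length: a larger value keeps fewer shots. How the threshold and the number of retained features are chosen in VSMCNN is not stated in the abstract.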

Acknowledgements

This work is supported by Cochin University of Science and Technology (CUSAT) through the Seed Money for New Research Initiatives (SMNRI) project (File No. PL.(UGC)1/SPG/SMNRI/2018-19 dated 14.11.2018).

Author information

Corresponding author

Correspondence to Madhu S. Nair.

About this article

Cite this article

Nair, M.S., Mohan, J. VSMCNN-dynamic summarization of videos using salient features from multi-CNN model. J Ambient Intell Human Comput 14, 14071–14080 (2023). https://doi.org/10.1007/s12652-022-04112-4

  • DOI: https://doi.org/10.1007/s12652-022-04112-4
