Video summarization and captioning using dynamic mode decomposition for surveillance

Original Research · International Journal of Information Technology

Abstract

Video surveillance has become a major tool in security maintenance, but reviewing recorded footage to detect motion is tedious: typically only a short portion of a long video contains any movement, so much time is wasted, and it is difficult to pinpoint the exact frames where transitions occur. A summary video that captures only the changes and motion is therefore needed. With advances in image processing using OpenCV and in deep learning, video summarization is now practical. Captions are generated for the summarized videos using an encoder–decoder captioning model; large, well-labelled datasets such as Common Objects in Context (COCO) and Microsoft Video Description (MSVD) make video captioning feasible. Since the advent of long short-term memory (LSTM) networks, encoder–decoder models have been used extensively to generate text from visual features, and attention mechanisms are widely applied on the decoder for video captioning. Keyframes are extracted from very long videos using methods such as dynamic mode decomposition (DMD), an algorithm originating in fluid dynamics, and OpenCV's absdiff(). We propose these tools for motion detection and video/image captioning on the very long videos that are common in surveillance.
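As a concrete sketch of the frame-differencing step, the snippet below keeps frames whose pixel-wise change from the previous frame is large. It is a minimal illustration in Python with OpenCV and NumPy, not the authors' code; the two thresholds are assumed values for illustration rather than parameters from the paper.

```python
import cv2
import numpy as np

def detect_motion_frames(video_path, diff_threshold=30, min_changed_ratio=0.01):
    """Keep frames whose pixel-wise difference from the previous frame
    indicates motion. Thresholds are illustrative assumptions."""
    cap = cv2.VideoCapture(video_path)
    keyframes = []
    ok, prev = cap.read()
    if not ok:
        return keyframes
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Absolute per-pixel difference between consecutive frames
        diff = cv2.absdiff(prev_gray, gray)
        # Binarize: mark pixels that changed by more than diff_threshold
        _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
        # Keep the frame if a large enough fraction of the image changed
        if np.count_nonzero(mask) / mask.size > min_changed_ratio:
            keyframes.append(frame)
        prev_gray = gray
    cap.release()
    return keyframes
```

Dynamic mode decomposition treats the flattened frames as snapshots of a linear dynamical system: modes whose eigenvalues have near-zero logarithm (near-zero frequency) model the static background, and subtracting their reconstruction isolates the moving foreground. The sketch below implements exact DMD under the same assumptions; the rank and frequency cutoff are illustrative, not the paper's settings.

```python
def dmd_background(frames_gray, rank=10):
    """Exact DMD on a (num_frames, H, W) stack of grayscale frames;
    returns the low-rank background reconstruction."""
    n, h, w = frames_gray.shape
    X = frames_gray.reshape(n, -1).T.astype(np.float64)  # pixels x frames
    X1, X2 = X[:, :-1], X[:, 1:]
    U, S, Vh = np.linalg.svd(X1, full_matrices=False)
    U, S, V = U[:, :rank], S[:rank], Vh[:rank].conj().T
    # Rank-r approximation of the one-step operator A (X2 ~ A @ X1)
    A_tilde = U.conj().T @ X2 @ V / S
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = X2 @ V / S @ W                     # DMD modes
    omega = np.log(eigvals)                  # ~0 => static (background) mode
    bg = np.abs(omega) < 1e-2
    b = np.linalg.lstsq(Phi, X[:, 0].astype(complex), rcond=None)[0]
    t = np.arange(n)
    # Background dynamics: Phi_bg diag(b_bg) exp(omega_bg t)
    background = (Phi[:, bg] * b[bg]) @ np.exp(np.outer(omega[bg], t))
    return np.abs(background).T.reshape(n, h, w)
```

Frames where the residual |frame − background| carries the most energy can then be selected as keyframes for the summary.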

References

  1. Min Z (2007) Key frame extraction from scenery video. In: 2007 International conference on wavelet analysis and pattern recognition, pp 540–543. https://doi.org/10.1109/ICWAPR.2007.4420729

  2. Shi Y, Yang H, Gong M, Liu X, Xia Y (2017) A fast and robust key frame extraction method for video copyright protection. J Electr Comput Eng 2017:1231794

  3. Xu N, Liu AA, Wong Y, Zhang Y, Nie W, Su Y, Kankanhalli M (2018) Dual-stream recurrent neural network for video captioning. IEEE Trans Circuits Syst Video Technol 29(8):2482–2493

  4. Song J, Guo Y, Gao L, Li X, Hanjalic A, Shen HT (2018) From deterministic to generative: multimodal stochastic RNNs for video captioning. IEEE Trans Neural Netw Learn Syst 30(10):3047–3058

  5. Gao L, Li X, Song J, Shen HT (2019) Hierarchical LSTMs with adaptive attention for visual captioning. IEEE Trans Pattern Anal Mach Intell 42(5):1112–1131

  6. Pan Y, Yao T, Li H, Mei T (2017) Video captioning with transferred semantic attributes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6504–6512

  7. Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp 311–318

  8. Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, Barcelona, Spain, pp 74–81

  9. Elliott D, Keller F (2013) Image description using visual dependency representations. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp 1292–1302

  10. Vedantam R, Zitnick CL, Parikh D (2015) CIDEr: consensus-based image description evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4566–4575

  11. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, PMLR, pp 6105–6114

  12. Sturman DJ, Zeltzer D (1994) A survey of glove-based input. IEEE Comput Graph Appl 14(1):30–39

  13. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  14. Yamashita R, Nishio M, Do RKG et al (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9:611–629

  15. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  16. Kenter T, Borisov A, de Rijke M (2016) Siamese CBOW: optimizing word embeddings for sentence representations. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics

  17. Potdar K, Pardawala TS, Pai CD (2017) A comparative study of categorical variable encoding techniques for neural network classifiers. Int J Comput Appl 175(4):7–9

  18. Pan R, Tian Y, Wang Z (2010) Key-frame extraction based on clustering. In: 2010 IEEE international conference on progress in informatics and computing IEEE, vol 2, pp 867–871

  19. Basaldella M, Antolli E, Serra G, Tasso C (2018) Bidirectional LSTM recurrent neural network for keyphrase extraction. https://doi.org/10.1007/978-3-319-73165-0

  20. Wang Y, Sun Y, Ma Z, Gao L, Xu Y, Wu Y (2020) A method of relation extraction using pre-training models. In: 2020 13th International Symposium on Computational Intelligence and Design (ISCID) pp 176–179. https://doi.org/10.1109/ISCID51228.2020.00046

  21. Shi Y, Yang H, Gong M, Liu X, Xia Y (2017) A fast and robust key frame extraction method for video copyright protection. J Electr Comput Eng 2017:1–7. https://doi.org/10.1155/2017/1231794

  22. Pandey S, Dwivedy P, Meena S, Potnis A (2017) A survey on key frame extraction methods of a MPEG video. In: 2017 International Conference on Computing, Communication and Automation (ICCCA) IEEE, pp 1192–1196

  23. Sun L, Zhou Y (2011) A key frame extraction method based on mutual information and image entropy. In: 2011 International Conference on Multimedia Technology, IEEE pp 35–38

  24. Mentzelopoulos M, Psarrou A (2004) Key-frame extraction algorithm using entropy difference. In: Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval pp 39–45

  25. Gao L, Guo Z, Zhang H, Xu X, Shen HT (2017) Video captioning with attention-based LSTM and semantic consistency. IEEE Trans Multimedia 19(9):2045–2055. https://doi.org/10.1109/TMM.2017.2729019

  26. Lei X, Jiang X, Wang C (2013) Design and implementation of a real-time video stream analysis system based on FFMPEG. In: 2013 Fourth World Congress on Software Engineering IEEE, pp 212–216

  27. Qaiser S, Ali R (2018) Text mining: use of TF-IDF to examine the relevance of words to documents. Int J Comput Appl 181(1):25–29

  28. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  29. Chen D, Dolan WB (2011) Collecting highly parallel data for paraphrase evaluation. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies pp 190–200

Funding

Our research was supported by Dr. Anand Kumar M. Due references have been provided for all supporting literature and resources.

Author information

Corresponding author

Correspondence to Anand Kumar M.

Ethics declarations

Availability of data and material

The Microsoft Video Description (MSVD) dataset [29], containing YouTube video clips with well-labelled captions, was used.

Code availability

Custom code was used to build the models with TensorFlow 1.15.0; Google Colab was used to train and evaluate them on a GPU.
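Since the custom code itself is not published, the following is only a rough tf.keras sketch of an encoder–decoder captioning model of the kind the abstract describes; every size here (frame count, feature dimension, vocabulary, units) is a hypothetical placeholder, and the decoder-side attention mentioned in the abstract is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes; the paper's exact architecture is not reproduced here.
NUM_FRAMES, FEAT_DIM = 28, 2048        # e.g. per-frame CNN features
VOCAB_SIZE, EMBED_DIM, UNITS = 5000, 256, 512
MAX_CAPTION_LEN = 20

# Encoder: an LSTM summarizes the sequence of frame features into a state.
frame_feats = layers.Input(shape=(NUM_FRAMES, FEAT_DIM))
_, state_h, state_c = layers.LSTM(UNITS, return_state=True)(frame_feats)

# Decoder: an LSTM initialized with the video state emits caption tokens.
caption_in = layers.Input(shape=(MAX_CAPTION_LEN,))
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)
dec_out = layers.LSTM(UNITS, return_sequences=True)(
    embedded, initial_state=[state_h, state_c])
logits = layers.Dense(VOCAB_SIZE)(dec_out)  # per-step vocabulary scores

model = Model([frame_feats, caption_in], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

At training time the decoder input is the caption shifted right (teacher forcing); at inference, generated tokens are fed back one step at a time.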

Cite this article

Radarapu, R., Gopal, A.S.S., NH, M. et al. Video summarization and captioning using dynamic mode decomposition for surveillance. Int J Inf Technol 13, 1927–1936 (2021). https://doi.org/10.1007/s41870-021-00668-0
