Skip to main content
Log in

A robust framework to generate surveillance video summaries using combination of zernike moments and r-transform and deep neural network

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

A huge number of cameras records scenes everywhere, generating enormous bulks of videos. Processing these huge masses of videos and detection of abnormal object activities demands adequate resources like time, manpower, and hardware storage, etc. To cope with the aforementioned challenges, our proposed model for an automatic video summarization of abnormal events plays an important role in providing the well-organized storage, quick browsing, and retrieval of the large collection of video data without losing important aspects due to its lightweight. In this research, abnormal object activity detection and summary generation are performed based on two stages i.e. 1) machine learning technique for key event detection, 2) deep learning algorithm to remove extra frames generating summarized video. Firstly, Silhouette images are formed, and two feature descriptors such as Zernike Moments and R-Transform are used to create a combined feature vector. The combined feature vector provides more informative features from images and makes our model lightweight keeping only relevant features. Furthermore, on the combined feature vector, K Nearest Neighbor (KNN) clustering is applied to extract keyframes sequentially. In the end, to improve the performance, Deep Learning Algorithm i.e. ALexNet is trained over preprocessed frames from the dataset. Moreover, the DL classifier aims to eliminate the non-Key Frames and generate surveillance video summaries demonstrating abnormal object activities. The efficiency of the proposed algorithm is analyzed performing an extensive experimentation attaining 99% accuracy approximately.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Abbreviations

OCR:

Optical Character Recognition

SI:

Silhouette Image

K-NN:

K Nearest Neighbour

KARD:

Kinetic Activity Recognition Dataset

References

  1. Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Disc 29(3):626–688

    Article  MathSciNet  Google Scholar 

  2. AlMaadeed N (2020) Face recognition and summarization for surveillance video sequences

  3. Bansal M, Kumar M, Kumar M (2021) 2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors. Multimed Tools Appl 80(12):18839–18857

    Article  Google Scholar 

  4. Bansal M, Kumar M, Sachdeva M, Mittal A (2021) Transfer learning for image classification using VGG19: Caltech-101 image data set. J Ambient Intell Humaniz Comput:1–12

  5. Blank M, et al. (2005) Actions as space-time shapes. In tenth IEEE international conference on computer vision (ICCV'05) volume 1. IEEE

  6. Dang C, Moghadam A, Radha H (2014) RPCA-KFE: key frame extraction for consumer video based robust principal component analysis. arXiv preprint arXiv:1405.1678

  7. Dhiman C, Vishwakarma DK (2017) High dimensional abnormal human activity recognition using histogram oriented gradients and zernike moments. In 2017 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE

  8. Doulamis AD, Doulamis ND, Kollias SD (2000) A fuzzy video content representation for video summarization and content-based retrieval. Signal Process 80(6):1049–1067

    Article  MATH  Google Scholar 

  9. Dupont C, Tobias L, Luvison B (2017) Crowd-11: A dataset for fine grained crowd behaviour analysis. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

  10. Dürr O, Sick B (2013) Deep learning: a novel approach to classify phenotypes in high content screening. PLoS One 8:e80999

    Google Scholar 

  11. Ejaz N, Tariq TB, Baik SW (2012) Adaptive key frame extraction for video summarization using an aggregation mechanism. J Vis Commun Image Represent 23(7):1031–1040

    Article  Google Scholar 

  12. Elharrouss O, Almaadeed N, al-Maadeed S, Bouridane A, Beghdadi A (2021) A combined multiple action recognition and summarization for surveillance video sequences. Appl Intell 51(2):690–712

    Article  Google Scholar 

  13. Gaglio S, Re GL, Morana M (2014) Human activity recognition process using 3-D posture data. IEEE Trans Human-Mach Syst 45(5):586–597

    Article  Google Scholar 

  14. Gianluigi C, Raimondo S (2006) An innovative algorithm for key frame extraction in video summarization. J Real-Time Image Proc 1(1):69–88

    Article  Google Scholar 

  15. Gygli M, et al. (2014) Creating summaries from user videos. In European conference on computer vision. 2014 (pp. 505–520). Springer, Cham

  16. Huang H, Liu H, Zhang L (2014) Videoweb: space-time aware presentation of a videoclip collection. IEEE J Emerg Select Topics Circuits Syst 4(1):142–152

    Article  Google Scholar 

  17. Hung M-H, Hsieh C-H (2008) Event detection of broadcast baseball videos. IEEE Trans Circuits Syst Vid Technol 18(12):1713–1726

    Article  Google Scholar 

  18. Javed A, Bajwa KB, Malik H, Irtaza A (2016) An efficient framework for automatic highlights generation from sports videos. IEEE Signal Process Lett 23(7):954–958

    Article  Google Scholar 

  19. Ji Z, Xiong K, Pang Y, Li X (2019) Video summarization with attention-based encoder–decoder networks. IEEE Trans Circuits Syst Vid Technol 30(6):1709–1717

    Article  Google Scholar 

  20. Jiang J, He X, Gao M, Wang X, Wu X (2015) Human action recognition via compressive-sensing-based dimensionality reduction. Optik 126(9–10):882–887

    Article  Google Scholar 

  21. Kamiński Ł, Maćkowiak S, Domański M (2017) Human activity recognition using standard descriptors of MPEG CDVS. In 2017 IEEE international conference on Multimedia & Expo Workshops (ICMEW). IEEE

  22. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst 25:1097–1105

    Google Scholar 

  23. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90

    Article  Google Scholar 

  24. Kumar A, Kumar M, Kaur A (2021) Face detection in still images under occlusion and non-uniform illumination. Multimed Tools Appl 80(10):14565–14590

    Article  Google Scholar 

  25. Lazaridis L, Dimou A, Daras P (2018) Abnormal behavior detection in crowded scenes using density heatmaps and optical flow. In 2018 26th European signal processing conference (EUSIPCO). IEEE

  26. Li B, Pan H, Sezan I (2003) A general framework for sports video summarization with its application to soccer. In 2003 IEEE international conference on acoustics, speech, and signal processing, 2003. Proceedings. (ICASSP’03), vol.3, pp. III–169. IEEE

  27. Li, C., et al. (2009) Motion-focusing key frame extraction and video summarization for lane surveillance system. In 2009 16th IEEE international conference on image processing (ICIP), pp. 4329–4332. IEEE

  28. Lin J, Zhong S-h, Fares A (2022) Deep hierarchical LSTM networks with attention for video summarization. Comput Electr Eng 97:107618

    Article  Google Scholar 

  29. Ma M, Mei S, Wan S, Hou J, Wang Z, Feng DD (2020) Video summarization via block sparse dictionary selection. Neurocomputing 378:197–209

    Article  Google Scholar 

  30. Mahasseni B, Lam M, Todorovic S (2017) Unsupervised video summarization with adversarial lstm networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 202–211)

  31. Mahum R, Rehman SU, Okon OD, Alabrah A, Meraj T, Rauf HT (2021) A novel hybrid approach based on deep CNN to detect glaucoma using fundus imaging. Electronics 11(1):26

    Article  Google Scholar 

  32. Mahum R, Rehman SU, Meraj T, Rauf HT, Irtaza A, el-Sherbeeny AM, el-Meligy MA (2021) A novel hybrid approach based on deep cnn features to detect knee osteoarthritis. Sensors 21(18):6189

    Article  Google Scholar 

  33. Mahum R, et al. (2022) A novel framework for potato leaf disease detection using an efficient deep learning model. Human Ecol Risk Assess: An Int J, p. 1–24

  34. Muhammad K, Hussain T, del Ser J, Palade V, de Albuquerque VHC (2019) DeepReS: a deep learning-based video summarization strategy for resource-constrained industrial surveillance scenarios. IEEE Trans Industrial Informa 16(9):5938–5947

    Article  Google Scholar 

  35. Muhammad K, Hussain T, Baik SW (2020) Efficient CNN based summarization of surveillance videos for resource-constrained devices. Pattern Recogn Lett 130:370–375

    Article  Google Scholar 

  36. Munir MH, et al. (2022) An automated framework for Corona virus severity detection using combination of AlexNet and faster RCNN

  37. Murugan AS et al (2018) A study on various methods used for video summarization and moving object detection for video surveillance applications. Multimed Tools Appl 77(18):23273–23290

    Article  Google Scholar 

  38. Napoletano P, Boccignone G, Tisato F (2015) Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy. IEEE Trans Image Process 24(11):3266–3281

    Article  MathSciNet  MATH  Google Scholar 

  39. Ou S-H et al (2014) On-line multi-view video summarization for wireless video sensor network. IEEE J Select Topics Signal Process 9(1):165–179

    Google Scholar 

  40. Pan H, Van Beek P, Sezan M.I (2001) Detection of slow-motion replay segments in sports video for highlights generation. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (cat. No. 01CH37221). IEEE

  41. Pan H, Li B, Sezan MI (2002) Automatic detection of replay segments in broadcast sports programs by detection of logos in scene transitions. In 2002 IEEE international conference on acoustics, speech, and signal processing. IEEE

  42. Reed S, et al. (2014) Training deep neural networks on noisy labels with bootstrapping. arXiv preprint arXiv:1412.6596

  43. Rezaee K, Rezakhani SM, Khosravi MR, Moghimi MK (2021) A survey on deep learning-based real-time crowd anomaly detection for secure distributed video surveillance. Pers Ubiquit Comput:1–17

  44. Shaheed K, Mao A, Qureshi I, Kumar M, Hussain S, Ullah I, Zhang X (2022) DS-CNN: a pre-trained Xception model based on depth-wise separable convolutional neural network for finger vein recognition. Expert Syst Appl 191:116288

    Article  Google Scholar 

  45. Sharif M, Khan MA, Akram T, Javed MY, Saba T, Rehman A (2017) A framework of human detection and action recognition based on uniform segmentation and combination of Euclidean distance and joint entropy-based features selection. EURASIP J Image Vid Process 2017(1):1–18

    Google Scholar 

  46. Song Y, et al (2015) Tvsum: Summarizing web videos using titles. in Proceedings of the IEEE conference on computer vision and pattern recognition

  47. Tabbone S, Wendling L, Salmon J-P (2006) A new shape descriptor defined on the radon transform. Comput Vis Image Underst 102(1):42–51

    Article  Google Scholar 

  48. Tang L-X, Mei T, Hua X-S (2009) Near-lossless video summarization. in Proceedings of the 17th ACM international conference on Multimedia

  49. Taskiran CM et al (2006) Automated video program summarization using speech transcripts. IEEE Trans Multimed 8(4):775–791

    Article  Google Scholar 

  50. Tavassolipour M, Karimian M, Kasaei S (2013) Event detection and summarization in soccer videos using bayesian network and copula. IEEE Trans Circ Syst Vid Technol 24(2):291–304

    Article  Google Scholar 

  51. Tran TN, Wehrens R, Buydens LM (2006) KNN-kernel density-based clustering for high-dimensional multivariate data. Comput Stat Data Anal 51(2):513–525

    Article  MathSciNet  MATH  Google Scholar 

  52. Varghese EB, Thampi SM (2018) A deep learning approach to predict crowd behavior based on emotion. In international conference on smart multimedia. Springer

  53. Varghese E, Thampi SM, Berretti S (2020) A psychologically inspired fuzzy cognitive deep learning framework to predict crowd behavior. IEEE Trans Affect Comput

  54. Wang F, Ngo C-W (2007) Rushes video summarization by object and event understanding. In Proceedings of the international workshop on TRECVID video summarization, pp. 25–29

  55. Wang T, et al. (2007) Video collage: a novel presentation of video sequence. In 2007 IEEE international conference on multimedia and expo. IEEE

  56. Wang M, Hong R, Li G, Zha ZJ, Yan S, Chua TS (2012) Event driven web video summarization by tag localization and key-shot identification. IEEE Trans Multimed 14(4):975–985

    Article  Google Scholar 

  57. Xu J, Sun Z, Ma C (2021) Crowd aware summarization of surveillance videos by deep reinforcement learning. Multimed Tools Appl 80(4):6121–6141

    Article  Google Scholar 

  58. Yao T, Mei T, Rui Y (2016) Highlight detection with pairwise deep ranking for first-person video summarization. in Proceedings of the IEEE conference on computer vision and pattern recognition

  59. You J, Liu G, Sun L, Li H (2007) A multiple visual models based perceptive analysis framework for multilevel video summarization. IEEE Trans Circuits Syst Vid Technol 17(3):273–285

    Article  Google Scholar 

  60. Zawbaa HM, El-Bendary N, Hassanien AE, Kim TH (2011) Machine learning-based soccer video summarization system. In International Conference on Multimedia, Computer Graphics, and Broadcasting. 2011 (pp. 19–28). Springer, Berlin, Heidelberg

  61. Zhang L, Xu QK, Nie LZ, Huang H (2014) VideoGraph: a non-linear video representation for efficient exploration. Vis Comput 30(10):1123–1132

    Article  Google Scholar 

  62. Zhang S, Zhu Y, Roy-Chowdhury AK (2016) Context-aware surveillance video summarization. IEEE Trans Image Process 25(11):5469–5478

    Article  MathSciNet  MATH  Google Scholar 

  63. Zhang S, Zhang W, Li Y (2016) Human action recognition based on multifeature fusion. In Chinese intelligent systems conference. 2016. Springer

  64. Zhao W, Wang J, Bhat D, Sakiewicz K, Nandhakumar N, Chang W (1999) Improving color based video shot detection. In Proceedings IEEE international conference on multimedia computing and systems (vol. 2, pp. 752–756). IEEE

  65. Zhu X, et al. (2003) Medical video mining for efficient database indexing, management and access. In proceedings 19th international conference on data engineering (cat. No. 03CH37405). IEEE

Download references

Acknowledgements

The authors extend their appreciation to King Saud University, Riyadh, Saudi Arabia and UET Taxila for supporting this work.

Funding

The authors extend their appreciation to “King Saud University” for funding through researchers supporting project number (RSP- 2021/164), King Saud University, Riyadh, Saudi Arabia.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rabbia Mahum.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mahum, R., Irtaza, A., Nawaz, M. et al. A robust framework to generate surveillance video summaries using combination of zernike moments and r-transform and deep neural network. Multimed Tools Appl 82, 13811–13835 (2023). https://doi.org/10.1007/s11042-022-13773-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-022-13773-4

Keywords

Navigation