A robust framework to generate surveillance video summaries using combination of zernike moments and r-transform and deep neural network

Mahum, Rabbia; Irtaza, Aun; Nawaz, Marriam; Nazir, Tahira; Masood, Momina; Shaikh, Sarang; Nasr, Emad Abouel

doi:10.1007/s11042-022-13773-4

Published: 04 October 2022

Volume 82, pages 13811–13835, (2023)

Multimedia Tools and Applications

Rabbia Mahum ORCID: orcid.org/0000-0003-1983-8201¹,
Aun Irtaza¹,
Marriam Nawaz¹˒²,
Tahira Nazir³,
Momina Masood¹,
Sarang Shaikh⁴ &
Emad Abouel Nasr⁵

Abstract

A huge number of cameras record scenes everywhere, generating enormous volumes of video. Processing this footage and detecting abnormal object activities demands substantial resources such as time, manpower, and hardware storage. To cope with these challenges, we propose a lightweight model for automatic video summarization of abnormal events that supports well-organized storage, quick browsing, and retrieval of large video collections without losing important content. Abnormal object activity detection and summary generation are performed in two stages: 1) a machine learning technique for key event detection, and 2) a deep learning algorithm that removes extra frames to generate the summarized video. First, silhouette images are formed, and two feature descriptors, Zernike Moments and R-Transform, are used to create a combined feature vector. The combined feature vector provides more informative features from the images and keeps the model lightweight by retaining only relevant features. K Nearest Neighbor (KNN) clustering is then applied to the combined feature vector to extract keyframes sequentially. Finally, to improve performance, a deep learning (DL) classifier, AlexNet, is trained on preprocessed frames from the dataset; it eliminates the non-keyframes and generates surveillance video summaries showing abnormal object activities. Extensive experiments show that the proposed algorithm attains approximately 99% accuracy.
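The combined feature vector described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a binary silhouette given as a list of rows, uses a coarse nearest-bin discrete Radon projection for the R-Transform, computes Zernike moment magnitudes up to an assumed order of 4 around the silhouette centroid, and concatenates the two. The function names, the number of angles, the moment order, and both normalisations are hypothetical choices made for illustration.

```python
import cmath
import math
from collections import Counter
from math import factorial


def foreground(silhouette):
    """Return (x, y) coordinates of the non-zero pixels of a binary silhouette."""
    return [(x, y) for y, row in enumerate(silhouette)
            for x, value in enumerate(row) if value]


def r_transform(silhouette, n_angles=8):
    """Discrete R-transform: for each angle, sum the squared Radon projection."""
    pixels = foreground(silhouette)
    result = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        # Radon projection at this angle: foreground-pixel count per rho bin.
        bins = Counter(round(x * c + y * s) for x, y in pixels)
        result.append(sum(count * count for count in bins.values()))
    return result


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - m even and 0 <= m <= n."""
    return sum(
        (-1) ** s * factorial(n - s)
        / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )


def zernike_magnitudes(silhouette, max_order=4):
    """Magnitudes |Z_nm| for n <= max_order, centred on the silhouette centroid."""
    pixels = foreground(silhouette)
    if not pixels:
        raise ValueError("silhouette has no foreground pixels")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    # Scale so that the farthest foreground pixel lies on the unit circle.
    radius = max(math.hypot(x - cx, y - cy) for x, y in pixels) or 1.0
    magnitudes = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            total = 0j
            for x, y in pixels:
                rho = math.hypot(x - cx, y - cy) / radius
                phi = math.atan2(y - cy, x - cx)
                total += radial_poly(n, m, rho) * cmath.exp(-1j * m * phi)
            # Dividing by the pixel count is an assumed area normalisation.
            magnitudes.append(abs((n + 1) / math.pi * total / len(pixels)))
    return magnitudes


def combined_features(silhouette, n_angles=8, max_order=4):
    """Concatenate the max-normalised R-transform with the Zernike magnitudes."""
    r = r_transform(silhouette, n_angles)
    peak = max(r) or 1
    return [value / peak for value in r] + zernike_magnitudes(silhouette, max_order)
```

Calling `combined_features(frame_silhouette)` on each frame's silhouette would yield one fixed-length vector per frame, which is the kind of input a clustering step such as the KNN-based keyframe extraction could consume. Centring on the centroid makes the Zernike magnitudes independent of where the silhouette sits in the frame, while the binned projection used here is only approximately translation invariant at oblique angles.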

Abbreviations

OCR: Optical Character Recognition
SI: Silhouette Image
K-NN: K Nearest Neighbour
KARD: Kinetic Activity Recognition Dataset

Acknowledgements

The authors extend their appreciation to King Saud University, Riyadh, Saudi Arabia and UET Taxila for supporting this work.

Funding

The authors extend their appreciation to King Saud University, Riyadh, Saudi Arabia, for funding this work through Researchers Supporting Project number RSP-2021/164.

Author information

Authors and Affiliations

Department of Computer Science, UET, Taxila, 47050, Pakistan
Rabbia Mahum, Aun Irtaza, Marriam Nawaz & Momina Masood
Department of Software Engineering, UET, Taxila, 47050, Pakistan
Marriam Nawaz
Faculty of Computing, Riphah International University, Islamabad, Pakistan
Tahira Nazir
Department of Information Security and Communication Technology, Norwegian University of Science and Technology (NTNU), 2815, Gjøvik, Norway
Sarang Shaikh
Industrial Engineering Department, College of Engineering, King Saud University, Riyadh, 11421, Saudi Arabia
Emad Abouel Nasr

Corresponding author

Correspondence to Rabbia Mahum.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahum, R., Irtaza, A., Nawaz, M. et al. A robust framework to generate surveillance video summaries using combination of zernike moments and r-transform and deep neural network. Multimed Tools Appl 82, 13811–13835 (2023). https://doi.org/10.1007/s11042-022-13773-4

Received: 04 June 2021
Revised: 03 June 2022
Accepted: 01 September 2022
Published: 04 October 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11042-022-13773-4
