Abstract
Cameras are the primary data sources in video surveillance systems and produce massive amounts of data every second. Video surveillance is a highly beneficial capability of modern technology, and an essential application in public security is facilitating the observation and analysis of events. Video surveillance systems require high-bandwidth links to transfer data, high-capacity media to store it, and high-performance hardware to process it. Consequently, these systems impose substantial costs on organizations. Video compression techniques can reduce the amount of data transferred or stored by surveillance systems and thereby lower these costs. Fixed CCTV cameras are the largest category of surveillance cameras. Backgrounds in their videos are typically constant, so saving them for every frame is redundant. A background-aware approach can therefore achieve a higher compression ratio on these videos than conventional approaches. This paper proposes a video codec for fixed cameras based on background extraction and moving-object detection algorithms. Through background extraction, the pure backgrounds of the video are generated for consecutive time intervals and stored in JPEG format. Through moving-object detection, the objects and their coordinates are extracted in each frame using YOLOv7 and stored in JPEG format separately from the backgrounds. At the decoder side, each frame is rebuilt from the generated background and the detected objects, which have been stored and transmitted as JPEG files. Evaluation on an appropriate set of videos from the CDnet2014 and EWAP datasets shows that the proposed method compresses the videos effectively, with compression ratios of up to 46.76x, while in the worst case the quality after compression is 0.99 according to the SSIM measure.
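The encode and decode flow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-pixel temporal median is only a stand-in for the background extraction step, the bounding boxes are written by hand rather than produced by YOLOv7, JPEG encoding is omitted entirely, and all function names (`estimate_background`, `crop_object`, `reconstruct_frame`, `compression_ratio`) are hypothetical.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames.

    Assumption: a simple stand-in for the paper's background extraction.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[int(median(f[y][x] for f in frames)) for x in range(width)]
            for y in range(height)]


def crop_object(frame, box):
    """Cut out the patch for box = (x, y, w, h); a detector would supply the box."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def reconstruct_frame(background, objects):
    """Decoder side: copy the background, then paste each (box, patch) pair."""
    frame = [row[:] for row in background]
    for (x, y, w, h), patch in objects:
        for dy in range(h):
            frame[y + dy][x:x + w] = patch[dy]
    return frame


def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of original size to compressed size."""
    return original_bytes / compressed_bytes


# Toy data: three 4x6 frames with a bright 2x2 object moving over a flat background.
boxes = [(0, 0, 2, 2), (2, 1, 2, 2), (4, 2, 2, 2)]
frames = []
for x, y, w, h in boxes:
    frame = [[10] * 6 for _ in range(4)]
    for dy in range(h):
        frame[y + dy][x:x + w] = [200] * w
    frames.append(frame)

# Encoder: one background for the interval, one (box, patch) pair per frame.
background = estimate_background(frames)
encoded = [[(box, crop_object(frame, box))] for frame, box in zip(frames, boxes)]

# Decoder: rebuild every frame from the shared background and its objects.
decoded = [reconstruct_frame(background, objects) for objects in encoded]
```

In this toy setting reconstruction is exact because nothing is JPEG-compressed and the boxes cover every moving pixel. In a real pipeline, both the lossy JPEG storage and any object pixels missed by the detector would reduce the similarity between the decoded and original frames.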
Data availability
The datasets used in this paper are available on the web and have already been used by other authors.
Author information
Contributions
I, Asadollah Shahbahrami, proposed the idea of video compression using background extraction and moving-object detection with YOLO. My Ph.D. student, Soheib Hadi, implemented and tested the proposed technique, and Dr. Hossien Azgomi, as advisor, helped him write the paper. This work is part of my student’s Ph.D. thesis.
Ethics declarations
Funding and/or Conflicts of Interests/Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Hadi, S., Shahbahrami, A. & Azgomi, H. A video codec based on background extraction and moving object detection. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17933-y
DOI: https://doi.org/10.1007/s11042-023-17933-y