Intelligent surveillance support system

The Intelligent Surveillance Support System (ISSS) is an innovative software solution that enables real-time monitoring and analysis of security footage to detect and identify potential threats. The system incorporates advanced features such as face recognition, alarm on theft detection, visitor in/out detection, and motion detection to provide a comprehensive and reliable security solution. The implementation of this software aims to improve the efficiency of surveillance systems, thereby enhancing the safety and security of public and private spaces. The focus of this study is on performing the aforementioned tasks in real time using enhanced algorithms from the OpenCV library, such as LBPH and Haar cascading, which strengthen machine perception and yield an accuracy of about 95% over multiple runs. With the rapid advancement of technology and the increasing need for surveillance in today's world, the Intelligent Surveillance Support System holds immense potential in the field of security and surveillance.


Introduction
The Intelligent Surveillance Support System (ISSS) is a sophisticated platform crafted to augment the surveillance infrastructure of public and private domains. ISSS integrates a host of features such as monitoring, recording, noise detection, motion detection, identification, and rectangle selection. The system's user interface is effortlessly navigable and is developed using Python's Tkinter library. The software enables users to interact seamlessly with the graphical interface to conduct diverse surveillance operations, such as face recognition, alarm on theft detection, visitor in/out detection, and motion detection.
Computer vision is a rapidly evolving scientific field within artificial intelligence that aims to give computers human-like visual capabilities. It is the study of how computers can gain sophisticated knowledge from digital image or video sources. Understanding and automating processes carried out by the human visual system is beneficial to engineers. The Intelligent Surveillance Support System is designed to be lightweight so that it does not burden the hardware it runs on. The minimum required hardware is a working PC or laptop; a webcam with installed drivers, or another camera source such as a CCTV, USB, or wireless camera connected to the PC or laptop; and, for night vision, a flashlight, an LED, or a night-vision-enabled camera.

This section presents a survey of previous work in the same domain.
A study [1] presented inferences that emphasised OpenCV, an open-source computer vision library for identifying and transforming useful information from images. Another researcher [2] showed that the purpose of image processing is to help a computer understand the content of an image. The majority of image processing is done using the group of libraries offered by OpenCV, which provides the de facto standard API for computer vision applications. Many issues that arise in real time can be managed using image processing software. Real-time OpenCV applications for image processing are also presented, along with instructions and examples.
According to some researchers [3], the Local Binary Pattern Histogram (LBPH) technique has been offered as a simple solution to the face identification problem since it can recognise both frontal and side faces. Yet, in the presence of varied lighting, changing expressions, and deflected pose, the LBPH algorithm's recognition rate decreases. This problem is addressed by a modified LBPH method based on the pixel neighbourhood grey median (MLBPH). The grey value of the pixel is changed to the median value of its neighbourhood sampling values; after the feature values are extracted from the sub-blocks, the statistical histogram is established to create the MLBPH feature dictionary, which is used to recognise the identity of the human face in comparison to the test image. It has also been stated [4] that the Internet of Things can make facial recognition useful for improving smart home facilities. Recognition is done with the LBPH technique to identify a person, which can be highly useful for home residents. The challenging areas are securing, monitoring, and controlling real-time automation. The required components are a web camera, a speaker, a stepper motor, and a Raspberry Pi 3 system.
Studies by some researchers [5] in 2021 illustrated how important a person's face is to their identity; in the real world, it is used to tell apart the personalities of two or more people. To ensure that only the right person has access to their particular accounts, both real and virtual, some biological characteristics have recently been adopted. Biometrics, which uses identification methods including fingerprints, palm veins, DNA, palm prints, and facial recognition, is one of the methods that has been developed. Their research demonstrates how image processing can be used with facial detection and recognition algorithms to create a tool that recognises students' frontal faces in a classroom. In 2021, another work [6] explained that, since its debut, the digitisation of images has played a substantial and crucial role in computer science. It encompasses the techniques and methods used when modifying a digital image with a computer. It is a form of signal processing where the input and output can each be either a picture or characteristics of that picture. One crucial area of image processing is image inpainting, a type of picture preservation and restoration. The work additionally selects the most efficient inpainting algorithm based on runtime metrics.
The idea [7] of a facial identification system has been proposed to increase reliability by employing facial recognition for a variety of purposes, such as making access simpler for individuals with the right security measures during Covid-19, as well as providing security when people try to disguise their identity. The technique considers models such as Eigenfaces, Fisherfaces, and LBPH faces, together with software such as Python and OpenCV. The units of analysis are still images and video clips that capture facial expressions; facial recognition algorithms are then trained on their patterns. According to the results, LBPH was able to identify faces with 95% certainty and in less time, which increased the accuracy of facial recognition.
Another work [8] presented a project whose goal was to use face recognition to track attendance in real time across all institutional domains, one of the main issues facing all organisations. Compared with machine learning approaches based on other biometric measurements such as fingerprint, iris, hand, and retina scans, the proposed approach was simpler to process. The LBPH method identifies the face after the Haar cascade classifier has detected it. Real-time face data creation is the experiment's focus. According to another study [9], NumPy arrays, the Python language's standard representation for numerical data, enable the efficient implementation of numerical operations in a high-level language; the study shows how to vectorise calculations, eliminate in-memory data copies, and reduce operation counts to improve NumPy speed. It has also been stated [10] that Tkinter programming is intended for Python users who need to create applications with graphical user interfaces (GUIs).
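The vectorisation speed-up described in [9] can be illustrated with a minimal sketch of our own (the function names are illustrative, not taken from the cited study): a single NumPy expression replaces an explicit Python loop over every pixel when differencing two frames, which is exactly the operation the motion modules rely on.

```python
import numpy as np

def frame_diff_loop(a, b):
    """Per-pixel absolute difference using an explicit Python loop."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = abs(int(a[i, j]) - int(b[i, j]))
    return out

def frame_diff_vec(a, b):
    """Same result as one vectorised NumPy expression.
    Promote to int16 first so the uint8 subtraction cannot wrap around."""
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(a.dtype)
```

Both functions produce identical output, but the vectorised version performs the arithmetic in compiled code rather than the Python interpreter.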
Another work [11] describes scikit-image, a collection of Python-based image processing methods made available by a thriving community of volunteers under the permissive BSD open-source license. Python's expanding popularity as a scientific programming language and the growing accessibility of a sizeable ecosystem of auxiliary tools make it an ideal environment for developing an image processing toolkit. Goala et al. in their work [12] employ a fuzzy multi-criteria decision support system to prioritise the components of a smart city that could be at risk of terrorist attacks. To achieve this, a novel aggregation operation on intuitionistic fuzzy sets has been introduced, and a case study on a smart city has been conducted to demonstrate the practicality of the proposed approach. A specialised optimal data aggregation approach, enabled by the Internet of Things (IoT), has been developed for intelligent surveillance systems in smart cities. This approach [13] aims to transform raw data values into refined ones while minimising data loss. Furthermore, the proposed scheme ensures that each server is responsible for performing the data refinement process, thus maintaining the desired accuracy and precision ratio. Researchers in [14] emphasise the need for the healthcare industry to implement digital protection coordinated by blockchain technology to safeguard crucial clinical assets in the context of artificial intelligence utilisation in clinical settings. By leveraging blockchain-based applications, it becomes possible to accurately identify and address the most critical and potentially harmful errors within the medical field. Decentralised information protection, supported by blockchain, ensures the security of patient health records and protects them from data theft (Figs. 1, 2, 3 and 4).
A summary of some other studies in this domain is presented in Table 1.

Methodology
This paper approaches the aspects listed below, each of which requires a particular set of algorithms and techniques to operate: monitoring, recognising a family member, listening for noises, and looking out for visitors in the room. We will go over the methodologies and algorithms used for each function. First, the GUI has seven buttons, each with its own function. The motion detection code captures frames from the default camera, calculates the difference between two consecutive frames, and applies thresholding and contour detection to identify motion in the video. If motion is detected, it draws a green rectangle around the moving object and displays "MOTION" text on the screen. If no motion is detected, it displays "NO-MOTION" in red text.
Our noise algorithm consists of the following steps:
1. Import the OpenCV library.
2. Define a function called "noise".
3. Capture a video stream from the default camera using the VideoCapture(0) method.
4. Start an infinite loop to process each frame of the video stream.
5. Read two frames from the video stream and store them as frame1 and frame2.
6. Calculate the absolute difference between the two frames and convert the result to grayscale.
7. Blur the grayscale image using a kernel of size 5 × 5.
8. Threshold the blurred image using a threshold value of 25, setting all values above the threshold to 255.
9. Find all the contours in the thresholded image using the findContours method.
10. If the number of contours is greater than zero, find the contour with the maximum area using the max method.
11. Get the bounding rectangle for the maximum-area contour using the boundingRect method.
12. Draw a green rectangle around the area with motion using the rectangle method.
13. If no motion is detected, put the text "NO-MOTION" in red.
14. If motion is detected, put the text "MOTION" in green.
15. Display the resulting frame.
16. If the 'Esc' key is pressed, release the video capture object and destroy all windows.
17. End the loop.

F. Module 6: Person Identification
This module defines three functions, "collect_data()", "train()", and "identify()", which together enable the collection of face images, the training of a face recognition model, and the real-time recognition of known faces from a webcam. The algorithm uses OpenCV, Haar cascades, and LBPH (Local Binary Pattern Histograms) face recognition to achieve this, and is implemented with a Tkinter GUI that allows the user to choose between adding new faces to the system and recognising known faces. Face detection in pictures or video frames is accomplished using the Haar cascade classifier, a machine learning-based technique for object detection. A sizeable collection of photos with both positive and negative examples of the object being recognised is used to train the classifier. To find the object, the classifier employs a collection of characteristics represented as Haar wavelets. The LBPH face recogniser is a technique for identifying faces in pictures. First, the system uses a Haar cascade classifier to find faces in the input picture. Local binary patterns are then extracted from the detected facial areas, and a histogram of these patterns is computed. The face is then identified using this histogram.
Using grid parameters, the image produced in the previous step can be divided into several grids, as seen in the following image. The model is then trained; subsequently, when we want to make predictions, we follow the same steps and compare the histograms with the previously trained model. This is how this functionality works. The OpenCV library is used to implement the proposed methodology [22].
The algorithm for this module stores the captured image in the visitors folder with a timestamp when a person enters the frame. Fig. 9 shows the result where the proposed module correctly identifies that the lamp has been stolen or is missing from the frame.
In Fig. 5, one can see that our model correctly identifies that there is no motion in the frame.
In Fig. 6, the model identifies the motion in the frame caused by the movement of a hand. The rectangle selected as the subframe correctly identifies that there is no motion within it, as shown in Fig. 7. Fig. 8 shows that when a person moves across the frame, our model identifies the entry and stores the image shown above, along with a timestamp, in the visitors folder.
Our proposed model uses SSIM (Structural Similarity Index Metric), a metric used to examine how similar two given images are to one another. Although SSIM has existed since 2004 and its theory is extensively documented, few resources cover it in great detail, particularly gradient-based implementations, even though SSIM is frequently employed as a loss function. The SSIM metric extracts three crucial elements from an image: structure, contrast, and luminance. On the basis of these three attributes, a comparison between the two photos is made.
Speaking of datasets, the algorithms use real-time frame comparisons for the majority of the features. This system calculates the Structural Similarity Index between two given images, which ranges from -1 to +1. A value of +1 denotes that the two photographs are identical or extremely similar, whereas a value of -1 denotes that the two images are highly dissimilar. These values are routinely rescaled to the range [0, 1], where the extremes carry the same meaning (Figs. 5, 6 and 7 show Module-1 with no motion, Module-1 with motion found, and Module-2 rectangular noise where a frame is selected, respectively). Let us now turn to an analysis of the main aspect. A machine learning approach known as Haar cascading is used to identify the faces in a given frame. The research conducted in [23] on facial identification using Haar cascade and LBP classifiers justifies the adoption of the Haar cascading algorithm over the LBP algorithm. With the help of their work, we are able to evaluate the algorithm in the context of our use case, the smart-CCTV-based real-time threat detection system. The Haar cascade classifier used in this system and its accuracy measures are evaluated with respect to the number of faces detected, as shown in Table 2.
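The SSIM comparison described above can be sketched with scikit-image, the library surveyed in [11]; this is our own minimal illustration under that assumption, not the system's exact implementation, and the function names are hypothetical.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def compare_frames(img1, img2):
    """Return (score, diff) for two grayscale frames. A score near +1
    means near-identical frames; a sharp drop signals a structural
    change such as a removed object. `diff` is a per-pixel similarity
    map, so its low-valued regions localise the change."""
    return ssim(img1, img2, full=True)

def normalised(score):
    """Rescale an SSIM score from [-1, +1] to [0, 1], as in the text."""
    return (score + 1) / 2
```

In the theft-detection module, a reference frame of the monitored scene would be compared against each live frame, and a score falling below a chosen threshold would raise the alarm.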
After detection, the LBPH algorithm is used to carry out identification. Instead of LBPH, we could have utilised algorithms such as Eigenfaces or Fisherfaces; the paper by researchers [24] is cited in support of the decision to use the LBPH method.
Face photos taken in unconstrained weather are not part of any currently available datasets. Unfortunately, large web-based dataset sources such as ImageNet do not contain enough such images for the experiment. As a result, a dataset called "LUDB" was produced for this type of research and used by researchers [24]. Along with the LUDB dataset, two more well-known datasets, AT&T and 5_Celebrity [24], were employed for their performance study; the popular machine learning platform Kaggle provided the 5_Celebrity dataset. The accuracy, precision, recall, and F1-score metrics of LBPH were determined using the scikit-learn tool. Through their work, we are able to analyse the algorithm in the context of our use case, the Intelligent Surveillance Support System (ISSS), which also contains the facial recognition feature. Based on our ISSS framework, Table 3, drawn from their study, summarises the LBPH comparison for the three datasets mentioned above and shows how this method compares under various weather conditions. Figure 9 presents the precision and recall measures together with the LBPH accuracy graph for the datasets.

Conclusion and future scope
The Intelligent Surveillance Support System is an innovative solution for advanced video surveillance. Leveraging the power of computer vision through OpenCV, it offers capabilities that far exceed those of traditional systems. With its advanced motion detection, theft detection, facial recognition, and in/out movement tracking, this system provides a highly sophisticated level of security that can keep any environment safe and secure (Fig. 10). Furthermore, the system has a wide range of future extensions that will further enhance its capabilities. For instance, it can be integrated with mobile CCTV and used to monitor offline centre-based examinations without the need for human intervention. The addition of built-in night vision will make it ideal for use in low-light environments, while the incorporation of deep learning technology will enable it to identify deadly weapons and detect accidental fires.
This system can also be developed as a standalone device that requires no external support, making it an ideal solution for deployment in remote locations or areas with limited connectivity.Moreover, the creation of a standalone program that does not require any prerequisites, such as Python, will make it more accessible to users who may not have the technical expertise to operate traditional security systems.
In conclusion, the Intelligent Surveillance Support System (ISSS) is a sophisticated software solution that leverages advanced features and optimised algorithms from the OpenCV library to enhance security and surveillance operations.By providing real-time monitoring and analysis of security footage, the ISSS offers a comprehensive and reliable security solution that improves the efficiency and effectiveness of surveillance systems.
Key factors associated with the ISSS include its ability to perform face recognition, theft detection, visitors in/out detection, and motion detection in real time.These features enable proactive threat detection, efficient monitoring of individuals, prevention of security breaches, and instant alerts for unusual activities.By automating these processes and leveraging machine perception, the ISSS minimizes human error, reduces response time, and enhances the overall safety and security of public and private spaces.
In terms of future improvements, possible areas include the incorporation of advanced AI algorithms, multi-camera integration, behavioural analysis, IoT devices, cloud-based processing, and data analytics with predictive insights. By exploring these areas and incorporating the latest advancements in technology and research, the Intelligent Surveillance Support System can continue to evolve and provide even more robust and efficient security solutions in the future.
In summary, the Intelligent Surveillance Support System represents modern video surveillance technology; its advanced features, extensive functionality, and flexible deployment options make it a valuable asset for a broad range of applications.

The modules can be summarised as follows:
• Monitor (Module 5): The function uses OpenCV to detect motion in a video stream by comparing successive frames and finding contours in the thresholded difference image, and helps identify missing objects.
• Rectangle (Module 2): This feature tracks motion in a user-selected region of interest in the camera frame by comparing consecutive frames of the video. It displays "MOTION" in green text if motion is detected and "NO-MOTION" in red text otherwise, and ends when the Esc key is pressed.
• Noise (Module 1): This feature detects motion in a video captured by the default camera by comparing consecutive frames and applying thresholding and contour detection. It displays "MOTION" in green text if motion is detected and "NO-MOTION" in red text if no motion is detected.
• Record (Module 4): This feature records a timestamped video using the default camera and saves it in AVI format at 640 × 480 resolution and 20 FPS; recording can be stopped and saved by pressing the Esc key.

Fig. 4 Identification of related feature

Fig. 8 Module-3 visitor found and will be stored in the visitors folder
Fig. 9

Table 2 Accuracy measures for the number of faces detected