Application of foreground object patterns analysis for event detection in an innovative video surveillance system
Abstract
SmartMonitor is an innovative surveillance system based on video content analysis. It is a modular solution that can work in several predefined scenarios, mainly concerned with protecting homes and their surroundings against unauthorized intrusion, supervising ill persons and detecting crime. Each scenario is associated with several actions and conditions, which imply the utilization of algorithms with various input parameters. In this paper, the focus is on the analysis of foreground object patterns for the purposes of event recognition, as well as on the experimental investigation of selected methods and algorithms developed and employed for the SmartMonitor system prototype. The prototype performs three main tasks: detection and localization of foreground regions using adaptive background modelling based on Gaussian Mixture Models, candidate object extraction and classification using Haar and HOG descriptors, and tracking using the Mean-Shift algorithm. The main goal of the work described here is to match system parameters to each scenario in order to provide the highest effectiveness and to decrease the number of false alarms.
Keywords
SmartMonitor · Visual surveillance system · Video content analysis · Foreground object pattern · Pattern analysis · Event detection
1 Introduction
Video surveillance systems have recently become more autonomous and functional. Advances in video content analysis (VCA) algorithms have undoubtedly contributed to the application of such systems in new areas and demanding locations. This has lowered the demand for operators of monitoring systems and, at the same time, facilitated the work of those who have to handle a large number of cameras and peripherals combined into a single system. Intelligent monitoring systems with VCA functionality are implemented mainly for monitoring wide areas and public buildings, and the infrastructure utilized for this purpose is specialized and expensive. Nevertheless, there are still people who want to ensure their own safety and protect their houses or small businesses and the surrounding areas. For this reason, there is a growing demand for systems that utilize common electronic devices, work without human control and are affordable for individuals. Such surveillance systems have to operate under widely varying conditions; however, the concept of 'universality' cannot be applied here directly, since no system works equally well in various environments and circumstances using the same parameters. Since the solution cannot be universal, it should offer the possibility of adjustment to enable customization and better adaptation. In response to these needs, SmartMonitor was developed as a customizable visual surveillance system for personal use.
The SmartMonitor system is designed to work under several independent scenarios that protect homes and their surroundings against unauthorized intrusion, allow for the supervision of people who are ill and detect suspicious behaviour. Each scenario is characterized by a group of performed actions and is activated when certain conditions are fulfilled, for instance when movement is detected in a protected area or there is no movement for a specific period of time. In these cases, it is crucial to configure the associated thresholds properly to avoid multiple false alarms. The most important parameters are associated with the duration of the actions performed by the object and with the object's physical (two-dimensional) features. In this paper, these parameters are investigated to find the most appropriate values for event detection in each scenario.
Simplified diagram of the system modules
Because the SmartMonitor system combines security and surveillance solutions with different degrees of advancement, it can be compared to alarm systems based on sensors, small CCTV installations, home automation, video surveillance and advanced systems based on video content analysis algorithms. To provide a background of existing solutions, we will focus on some examples and indicate the differences. Solutions like ADT Pulse [1] or Vivint [2] generate alerts based on various sensors, while ZoneMinder [3] is intended for video surveillance with motion detection. The key features of these systems include the use of wireless cameras, remote access and some home automation functionality. They have features similar to our system, but require human intervention and provide neither automatic differentiation of dangerous situations nor an automatic response. AgentVi [4] provides a solution for video analysis in large installations based on an open-architecture approach; its software is distributed between an edge device and a server. Another VCA-based industry solution is IVA 5.60 Intelligent Video Analysis by Bosch [5], a guard assistant system based on intelligent video analysis which detects, tracks and analyses moving objects. Its analytics is built into cameras and encoders, which increases the cost of installation. Both AgentVi and IVA 5.60 offer advanced video analysis, but are not intended for home use. They also differ in architecture: SmartMonitor is a centralized solution and does not process any data on edge devices. Moreover, the mentioned systems do not enable the use of controlled devices. More advanced solutions are also present on the market, such as AISight by BRS Labs [6] for behaviour recognition. This system is able to autonomously build a knowledge base and generate real-time alerts to support security teams in various industry areas. It analyses traffic, detects perimeter intrusion, secures facilities, generates transit and manufacturing alerts, and identifies various events and activities with respect to the usual time of day. Compared to AISight, the SmartMonitor system is not capable of self-learning. However, for the purposes of SmartMonitor, such functionality is redundant and might increase the final price. Moreover, this solution does not enable the use of controlled devices.
The rest of the paper is organized as follows. The second section provides a description of the system working scenarios. The third section describes the algorithms applied in building the system prototype. The fourth section presents the experimental conditions, and the fifth discusses the results of the experiments and their interpretation. The last section concludes the paper.
2 Brief description of the system scenarios
Two sample frames presenting running objects in the garden—basic human/not-human classification excludes the alarm activation when dogs are detected (scenario A) [7]
Sample frames presenting the simulation of an ill person fainting (scenario B) [7]
Sample frames presenting the simulation of a crime scene (scenario C) [7]
3 Methods and algorithms developed and employed for the system prototype
The following assumptions were made for the system prototype:
- A camera has to be placed in a fixed location and observe the same area in a continuous manner;
- Exposure parameters have to remain unchanged for a long period of time;
- Frame resolution has to enable the extraction of single objects;
- Camera noise and weather conditions may not cause problems during the extraction of foreground objects;
- Individual frames of the video stream are processed;
- Information about future frames is not included.
3.1 Background modelling and foreground extraction
General scheme of applied background subtraction process [13]
Foreground extraction: a sample input image (a) and the results of foreground extraction using three models—RGB (b), intensity (c) and chrominance (d) [13]
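For intuition, the following minimal sketch replaces the adaptive Gaussian Mixture Model of [8, 9] with a single running-average background per pixel and thresholded differencing. The function names, threshold and learning rate are illustrative assumptions, not the prototype's actual parameters.

```python
import numpy as np

def extract_foreground(background, frame, threshold=30.0):
    """Mark pixels whose difference from the model exceeds the threshold."""
    return np.abs(frame - background) > threshold

def update_background(background, frame, alpha=0.05):
    """Blend the current frame into the model (running average per pixel)."""
    return (1.0 - alpha) * background + alpha * frame

# Toy greyscale scene: a flat background with one bright moving object.
background = np.full((6, 6), 100.0)
frame = background.copy()
frame[2:4, 2:4] = 200.0  # the object

mask = extract_foreground(background, frame)        # True only at the object
background = update_background(background, frame)   # model drifts towards frame
```

A GMM generalizes this idea by keeping several weighted Gaussians per pixel, so repetitive background motion (e.g. waving leaves) can be absorbed into the model instead of appearing as foreground.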
3.2 Types of false detections and the artefacts removal process
Depending on the environment and the utilized colour model, false detections can take various forms—from large coherent regions to single isolated pixels. Artefacts occur for several reasons, e.g. sudden illumination changes, shadows of moving objects, background movement and background initialization in the presence of moving objects [17]. Short and sudden changes in illumination may appear due to turning the light on and off or the sun shining through the clouds. These change the background colours, increasing the difference between the model and the current frame. Moving shadows are usually detected when the intensity model is used, and very often their regions are connected with the region of the actual object of interest (OOI); hence, a shadow may be mistakenly classified as a foreground region. Background movement can be defined as the relocation of part of the background, caused for example by grass and leaves moving in the wind, and results in a high level of noise in the foreground areas. Since building the background model usually starts from the first captured frame, the selected image cannot contain any moving objects—if it does, they are incorrectly incorporated into the background image and partially occlude it.
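The cleanup stage can be illustrated with a minimal sketch: a single 3×3 binary erosion removes isolated false-positive pixels while merely shrinking genuine foreground regions (the prototype uses a median filter followed by double erosion; this simplified version is an illustrative assumption, not the system's implementation).

```python
import numpy as np

def erode(mask):
    """3x3 binary erosion: a pixel stays foreground only if its whole
    3x3 neighbourhood is foreground (border pixels are cleared)."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = mask[y - 1:y + 2, x - 1:x + 2].all()
    return out

# An isolated false-positive pixel disappears, while a genuine 3x3
# foreground region merely shrinks to its centre.
mask = np.zeros((7, 7), dtype=bool)
mask[1, 1] = True       # single-pixel artefact
mask[3:6, 3:6] = True   # compact foreground object
cleaned = erode(mask)
```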
The artefact removal process—exemplary images obtained at each stage
3.3 Object classification
Results of the experiment utilizing the HOG descriptor with a fixed template size [19]
The second classifier is based on Haar-like features [20], which are simple features combined into a cascade. The AdaBoost machine learning technique is used to select the most appropriate Haar features and to set correct threshold values. During classification with the Haar-like feature cascade, an object's subsequent features are calculated only when the answer of the previous feature is consistent with the learned value; otherwise, the object is rejected. The cascade is designed so as to reject negative objects at the earliest possible stage of recognition [21].
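The early-rejection idea behind the cascade can be sketched as follows. The stage functions and thresholds below are hypothetical stand-ins; a real Haar cascade evaluates rectangular intensity features selected by AdaBoost, but the control flow is the same.

```python
def cascade_classify(window, stages):
    """Evaluate stage classifiers in order, rejecting at the first
    failing stage so that cheap tests eliminate most negatives early."""
    for stage_score, threshold in stages:
        if stage_score(window) < threshold:
            return False  # early rejection: later stages are never computed
    return True  # the window passed every stage

# Hypothetical toy stages: each scores one feature of a candidate window,
# with the cheapest feature placed first.
stages = [
    (lambda w: w["edge_score"], 0.3),
    (lambda w: w["texture_score"], 0.5),
]

person_like = cascade_classify({"edge_score": 0.8, "texture_score": 0.9}, stages)
clutter = cascade_classify({"edge_score": 0.1, "texture_score": 0.9}, stages)
```

Ordering stages from cheapest to most expensive is what makes the cascade fast: the vast majority of scanned windows are background and are discarded after the first test.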
3.4 Object tracking
Results of the experiment utilizing the Mean-Shift algorithm [19]
4 The experimental conditions
The previous section briefly presented the main algorithms and methods developed and employed for the system prototype, namely background modelling, the artefact removal process, the tracking method and the human silhouette classifiers. The reasons for selecting the described solutions were provided, along with examples of other approaches that were initially taken into account, and some experimental results of employing the individual solutions were given. These experiments demonstrated the accuracy and efficiency of the individual approaches; however, their fusion into the prototype software has to be investigated. This section explains the conditions of the experiments that investigate the combined algorithmic approaches in the context of object pattern analysis and event recognition, in order to determine appropriate system working parameters for each scenario.
Alarm activation depends on three object-related parameters:
- P defines the proportion of an object's bounding box;
- K defines the maximum number of frames, that is the time for which a person stays in the protected area or does not move;
- T defines the number of frames, that is the time in which a change in proportion, if other conditions are met, causes the activation of the alarm.
Step 1. Set initial parameters and threshold values.
Step 2. Open input file (video sequence).
Step 3. Build a background model.
Step 4. Retrieve current video frame.
Step 5. Localize foreground areas in each processed frame.
Step 6. Perform Haar and HOG classification for each detected object.
Step 7. If the classification step gives a positive result, go to Step 8. Otherwise perform Mean-Shift tracking for the recently detected object.
Step 8. Check thresholds predefined for each scenario:
Step 8.1. For scenario A, check object’s location—if an object remains in the protected area longer than K frames, then start the alarm;
Step 8.2. For scenario B, check the object's position—if the object's position does not change for more than K frames and the object's proportions do not change to exceed P over T frames, then start the alarm;
Step 8.3. For scenario C, check object’s location and proportions—if object’s proportions changed and exceed P, then start the alarm.
Step 9. If it is not the end of the video sequence, go to Step 4. Otherwise, terminate the processing.
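Assuming that per-frame object observations (location, movement and bounding-box proportions) are already available from Steps 4–7, the scenario checks of Step 8 can be sketched as below. The observation format, function name and exact conditions are simplified illustrative assumptions, not the prototype's code; the T parameter is omitted because the experiments reported later found it unnecessary for scenario B.

```python
def run_scenario(frames, scenario, K, P):
    """Return the index of the frame at which the alarm fires, or None.

    `frames` is an iterable of per-frame observations of the tracked
    object: dicts with 'in_area' (bool), 'moved' (bool) and 'ratio'
    (bounding-box proportion).
    """
    still_count = 0  # consecutive frames satisfying the scenario condition
    for i, obs in enumerate(frames):
        if scenario == "A":  # intrusion: object stays in the protected area
            still_count = still_count + 1 if obs["in_area"] else 0
            if still_count > K:
                return i
        elif scenario == "B":  # ill person: motionless, proportions below P
            still_count = still_count + 1 if not obs["moved"] else 0
            if still_count > K and obs["ratio"] < P:
                return i
        elif scenario == "C":  # crime: proportions suddenly exceed P
            if obs["ratio"] > P:
                return i
    return None

# An intruder present in the protected area on every frame trips the
# alarm once more than K consecutive frames have elapsed.
frames_a = [{"in_area": True, "moved": True, "ratio": 2.0}] * 10
alarm_frame = run_scenario(frames_a, "A", K=3, P=0.7)
```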
Initial system parameters.
| Parameter | Value |
|---|---|
| Frame spatial resolution | \(640 \times 360\) pixels or less |
| Scaling factor | 0.5—\(320 \times 180\) pixel frames are processed |
| Learning mode for background model | 40 frames without any movements are used for learning (the model calibrates within 2–3 s) |
| Morphological operations | Median filter and double erosion using foreground binary mask |
| The size of the area subjected to further classification | Minimum 60 pixels |
| Proportions of object’s bounding box that is classified as a human | 2:1 (height to width) |
| Minimum object’s size for the Haar classifier | \(24 \times 12\) pixels |
| Minimum object’s size for the HOG classifier | \(96 \times 48\) pixels |
| The area on which the tracked object is searched for using the Mean-Shift algorithm | Half of the minimum object’s size (width, height) |
5 Practical verification of the developed approach
In this section, the experimental results of applying the SmartMonitor system prototype are presented. Several experiments were carried out to investigate the effectiveness of the combined algorithmic approaches and the accuracy of object pattern analysis for event recognition related to the system scenarios and alarm activation conditions. Experimental results are provided in two ways: as figures, and as descriptions of the object pattern analysis, the object-related parameters and their thresholds. Each figure contains a sample frame before alarm activation, a sample frame after alarm activation and three graphs: the object trajectory as an XY position, and the aspect ratio and area of the object's bounding box.
Sample frames presenting the outdoor scene (scenario A). The two upper frames are (respectively): movement detected without triggering the alarm, and tracked object crossing a virtual line (alarm was triggered). The lower plots present: tracked object position in the image plane, aspect ratio of the bounding box and the area in pixels
Sample frames presenting the outdoor scene (scenario A). The two upper frames are (respectively): movement detected without triggering the alarm, and tracked object crossing a virtual line (alarm was triggered). The lower plots present: tracked object position in the image plane, aspect ratio of the bounding box and the area in pixels
Sample frames presenting the indoor scene (scenario B, supervision over an ill person). The two upper frames are (respectively): movement detected without triggering the alarm, and tracked object changing its proportions (alarm was triggered as an effect of falling down). The lower plots present: tracked object position in the image plane, aspect ratio of the bounding box and the area in pixels
Sample frames presenting an indoor scene (scenario C, crime protection). The two upper frames are (respectively): movement detected without triggering the alarm, and tracked object changing its proportions (alarm was triggered as an effect of raised hands). The lower plots present: tracked object position in the image plane, aspect ratio of the bounding box and the area in pixels
6 Summary and conclusions
The main goal of this paper was to provide experimental results of the algorithms prepared for the prototype SmartMonitor software. SmartMonitor is an innovative surveillance system based on image analysis, created to protect individual users and their property in small areas. The system enables the user to set individual safety rules, which in turn determine the degree of the system's sensitivity. Human interaction is required only during calibration. The system is now ready to be placed on the market.
Threshold values of parameters determining alarm activation in each scenario.
| Scenario | Parameter | Threshold values |
|---|---|---|
| A | K | Minimum 15 and maximum 45 frames, which represents 1–2 s |
| B | K | Minimum 50 and maximum 150 frames, which represents 2–10 s for 15 fps |
| B | P | Minimum 0.7 and maximum 1.2 |
| B | T | The parameter turned out to be unnecessary |
| C | K | Minimum 30 and maximum 60 frames (in order to eliminate false alarms) |
| C | P | Minimum 0.5 |
The system prototype consists of three key modules: background modelling using adaptive Gaussian Mixture Models, object classification using the Haar and HOG classifiers, and tracking using the Mean-Shift algorithm. The proposed combination of algorithms proved effective and appropriate for the system. The experiments helped to determine suitable threshold values of the parameters responsible for triggering alarms in three situations corresponding to the system working scenarios. The most important task was to analyse the patterns of moving objects, especially human silhouettes, and their features. It turned out that the aspect ratio of an object's bounding box and the time for which a person remains in the protected area or does not move are the crucial parameters for the recognition of specific events.
Acknowledgments
The project 'Innovative security system based on image analysis—"SmartMonitor" prototype construction' (original title: Budowa prototypu innowacyjnego systemu bezpieczeństwa opartego o analizę obrazu—"SmartMonitor") was co-funded by the European Union (project number: UDA-POIG.01.04.00-32-008/10-02, value: 9,996,604 PLN, EU contribution: 5,848,560 PLN, realization period: 07.2011–04.2013). European Funds—for the development of innovative economy (Fundusze Europejskie—dla rozwoju innowacyjnej gospodarki).
References
- 1. ADT (2014) Webpage. http://www.adt.com/video-surveillance. Accessed 07 May 2014
- 2. Vivint (2014) Webpage. http://www.vivint.com/en/solutions/. Accessed 07 May 2014
- 3. ZoneMinder (2014) Online documentation. http://www.zoneminder.com/documentation. Accessed 07 May 2014
- 4. AgentVi (2014) Webpage. http://www.agentvi.com/61-Products-62-Vi_System. Accessed 07 May 2014
- 5. Bosch (2014) Webpage. http://us.boschsecurity.com/us_product/03_solutions_2/solutions. Accessed 07 May 2014
- 6. BRS Labs (2014) Webpage. http://www.brslabs.com/. Accessed 07 May 2014
- 7. Frejlichowski D, Gościewska K, Forczmański P, Nowosielski A, Hofman R (2013) Extraction of the foreground regions by means of the adaptive background modelling based on various colour components for a visual surveillance system. In: Burduk R et al. (eds) CORES 2013, Advances in Intelligent Systems and Computing, vol 226, pp 351–360
- 8. Stauffer C, Grimson WEL (1999) Adaptive background mixture models for real-time tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2
- 9. Zivkovic Z (2004) Improved adaptive Gaussian mixture model for background subtraction. In: Proceedings of the 17th International Conference on Pattern Recognition, vol 2, pp 28–31
- 10. Gurwicz Y, Yehezkel R, Lachover B (2011) Multiclass object classification for real-time video surveillance systems. Pattern Recognit Lett 32:805–815
- 11. Frejlichowski D (2008) Automatic localisation of moving vehicles in image sequences using morphological operations. In: Proceedings of the 1st IEEE International Conference on Information Technology, Gdansk, pp 439–442
- 12. Frejlichowski D, Forczmański P, Nowosielski A, Gościewska K, Hofman R (2012) SmartMonitor: an approach to simple, intelligent and affordable visual surveillance system. In: Bolc L et al. (eds) ICCVG 2012, Lecture Notes in Computer Science, vol 7594, pp 726–734
- 13. Frejlichowski D, Gościewska K, Forczmański P, Nowosielski A, Hofman R (2013) The removal of false detections from foreground regions extracted using adaptive background modelling for a visual surveillance system. In: Saeed K et al. (eds) CISIM 2013, Lecture Notes in Computer Science, vol 8104, pp 253–264
- 14. Sen-Ching SCS, Kamath C (2004) Robust techniques for background subtraction in urban traffic video. In: Bhaskaran V, Panchanathan S (eds) Visual Communications and Image Processing, vol 5308, pp 881–892
- 15. Kaewtrakulpong P, Bowden R (2001) An improved adaptive background mixture model for real-time tracking with shadow detection. In: Video-Based Surveillance Systems: Computer Vision and Distributed Processing. Kluwer Academic Publishers
- 16. Forczmański P, Seweryn M (2010) Surveillance video stream analysis using adaptive background model and object recognition. Lecture Notes in Computer Science (Computer Vision and Graphics) 6374:114–121
- 17. Javed O, Shafique K, Shah M (2002) A hierarchical approach to robust background subtraction using color and gradient information. In: Workshop on Motion and Video Computing, pp 22–27
- 18. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 1, pp 886–893
- 19. Frejlichowski D, Gościewska K, Forczmański P, Nowosielski A, Hofman R (2012) SmartMonitor: recent progress in the development of an innovative visual surveillance system. J Theor Appl Comput Sci 6(3):28–35
- 20. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 1, pp 511–518
- 21. Avidan S (2005) Ensemble tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego
- 22. Yilmaz A, Javed O, Shah M (2006) Object tracking: a survey. ACM Comput Surv 38(4):13
- 23. Welch G, Bishop G (2006) An introduction to the Kalman filter. UNC-Chapel Hill, TR 95-041
- 24. Cheng Y (1995) Mean shift, mode seeking, and clustering. IEEE Trans Pattern Anal Mach Intell 17(8):790–799
- 25. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.