Keywords

1 Introduction

Many computer vision applications require the detection of changes within video streams, e.g. video surveillance, smart environments, video indexing and retrieval [5, 28]. For all these applications, a robust change detection algorithms with a low false alarm rate is required as a pre-processing step. Many algorithms have been proposed to solve the problem of video change detection. Most of them rely on background subtraction techniques to segment the scene into foreground and background components. The outputs of a change detection algorithm are usually binary images of the foreground areas corresponding to moving objects. These algorithms are designed to cope with the challenges that can be found in a real-world videos such as high variation in environmental conditions, illumination changes, shadows, camera movements and camera-induced distortions and so on. To this end, algorithms are becoming increasingly more complex and thus computationally expensive both in terms of time and memory space. Parallelization of background subtraction algorithms on GPU is a possible way to speed up the computation to make them usable in real-time applications (e.g. [24]).

Notwithstanding the improvements, it is still difficult to design general-purpose background subtraction algorithms. These algorithms have been demonstrated to perform well on some types of videos but there is no single algorithm that is able to tackle all the challenges in a robust and computationally efficient way. This can be clearly seen in the CDNET 2014 competition [26] (ChangeDetection.net) where change detection algorithms are evaluated on a common dataset composed of different types of videos sequences and classified according to their performance. To date, more than 35 different algorithms have been evaluated, but many more exists in the literature.

Finally the output of a background subtraction algorithm is usually refined in order to reduce noisy patterns such as isolated pixels, holes, and jagged boundaries. To improve algorithm accuracy, post-processing of the foreground component, ranging from simple noise removal to more complex object-level techniques, has been investigated. Results indicate that significant improvements in performance are possible if a specific post-processing algorithm is designed and the corresponding parameters are set appropriately [18].

Given the existence of so many change detection algorithms, each which its own peculiarities and strengths, here we are interested in finding how far can we get, in terms of performances, by leveraging existing algorithms to create new ones. Instead of designing from scratch a new algorithm, we combine existing ones with the aim to build a better change detection algorithm. We are interested in testing this idea under two scenarios: a purely performance-based scenario and a performance/efficiency balanced scenario. In [3] we have investigated the first scenario by considering the best change detection algorithms in the CDNET 2014 competition, disregarding their computational complexity, and combining then under a Genetic Programming (GP) framework [11]. The resulting algorithms significantly outperform the algorithms used in the combination and even other, more recent, approaches. In this work, we present our findings with respect to the second scenario. We apply the same general approach used in [3] but considering state-of-the-art algorithms that are computationally efficient but not top-performing. We want to investigate if also in this scenario, we are able to create an effective algorithm and what kind of performances we can achieve.

2 The Proposed Approach

In this section, we summarize our GP-based combining approach. A detailed description of the method can be found in [3]. GP is a domain-independent evolutionary method that genetically breeds a population of functions, or more generally, computer programs to solve a given problem [11]. Evolution is driven by the best fit individuals according to an objective function (i.e. fitness function) that must be maximized or minimized. The solutions can be represented as trees, lines of code, expressions in prefix or postfix notations, strings of variable length, etc. GP has been widely used for finding suitable solutions for a wide range of problems needing optimization. For example, in image processing and computer vision applications GP has been used for: image segmentation, enhancement, layouting, classification, feature extraction, and object recognition [1, 2, 4, 7, 13]. For our purposes, we feed GP the set of the binary foreground images that correspond to the outputs of the single change detection algorithms, and a set operators represented by unary, binary, and n-ary functions that are used to combine the outputs (via logical AND, logical OR, etc...) as well as to perform post-processing (via filter operators).

More formally, given a set of n change detection algorithms \(\mathcal {C}=\{C_i\}_{i=1}^n\), the solutions evolved by GP are built using the set of functionals symbols \(\mathcal {F}\) and the set of terminal symbols \(\mathcal {T}=\mathcal {C}\). We build the set of functionals symbols considering operators that work in the spatial neighborhood of the image pixel, or combine (stack) the information at the same pixel location but across different change detection algorithms. The list of functional symbols used is given below:

  • ERO (Erosion): it requires one input, works in the spatial domain and performs morphological erosion with a \(3\times 3\) square structuring element;

  • DIL (Dilation): it requires one input, works in the spatial domain and performs morphological dilation with a \(3\times 3\) square structuring element;

  • MF (Median Filter): it requires one input, works in the spatial domain an performs median filtering with a \(5 \times 5\) kernel;

  • OR (Logical OR): it requires two inputs, works in the stack domain and performs the logical OR operation;

  • AND (Logical AND): it requires two inputs, works in the stack domain and performs the logical AND operation;

  • MV (Majority Vote): it requires two or more inputs, works in the stack domain and performs the majority vote operation;

We define the fitness function used in GP by taking inspiration from the CDNET website, where change detection algorithms are evaluated using different performance measures and ranked accordingly. Given a set of video sequences \(\mathcal {V}=\{V_1,\ldots ,V_S\}\), a set of performance measures \(\mathcal {M}=\{m_1,\ldots ,m_M\}\) the fitness function of a candidate solution \(C_0\), \(f(C_0)\) is defined as the average rank across video sequences and performance measures:

$$\begin{aligned} f(C_0)= \frac{1}{M} \sum _{j=1}^M \bigg ( \mathrm{{rank}}_{C_0} \Big ( C_0; \big \{ m_j\big (C_k(\mathcal {V})\big ) \big \} _{k=1}^{n} \Big ) + \sum _{i=1}^2{w_i P_i(C_0)} \bigg ) \end{aligned}$$
(1)

where \(\mathrm{{rank}_{C_0}}(\cdot )\) computes the rank of the candidate solution \(C_0\) with respect to the set of algorithms \(\mathcal {C}\) according to the measure \(m_j\). \(P_1(C_0)\) is defined as the signed distance between the candidate solution \(C_0\) and the best algorithm in \(\mathcal {C}\) according to the measure \(m_j\):

$$\begin{aligned} P_1(C_0)= {\left\{ \begin{array}{ll} \displaystyle -m_j(C_0(\mathcal {V}))+\max _{C_k \in \mathcal {C}}{m_j(C_k(\mathcal {V}))} \\ \qquad \qquad \qquad \qquad \text{ if } \text{ the } \text{ higher } m_j \text { the better} \\ \displaystyle m_j(C_0(\mathcal {V}))-\min _{C_k \in \mathcal {C}}{m_j(C_k(\mathcal {V}))} \\ \qquad \qquad \qquad \qquad \text{ if } \text{ the } \text{ lower } m_j \text { the better} \\ \end{array}\right. } \end{aligned}$$
(2)

and \(P_2(C_0)\) is a penalty term corresponding to the number of different algorithms selected for the candidate solution \(C_0\):

$$\begin{aligned} P_2(C_0) = \frac{\# \text{ of } \text{ algorithms } \text{ selected } \text{ in } C_0}{\# \text{ of } \text{ algorithms } \text{ in } \mathcal {C}} \end{aligned}$$
(3)

The role of \(P_1\) is to produce a fitness function \(f(C_0) \in \mathbb {R}\), so that in case of candidate solutions having the same average rank, the one having better performance measures is considered a fitter individual in GP. The penalty term \(P_2\) is used to force GP to select a small number of algorithms in \(\mathcal {C}\) to build the candidate solutions. The relative importance of \(P_1\) and \(P_2\) is independently regulated by the weights \(w_1\) and \(w_2\) respectively.

3 Experimental Setup

Since we wanted to test computationally efficient and simple change detection algorithms, we chose the set of change detection algorithms \(\mathcal {C}\) to be combined among those implemented in the BGSLibraryFootnote 1. BGSLibrary is a free, open source and platform independent library which provides a C++ framework to perform background subtraction using code provided by different authors. We used the 1.9.1 version of the library which implements more than 30 different algorithms. We base our choice of the algorithms on the recent review paper of the authors of BGSLibrary [20] where the computational costs as well as the performances of the different algorithms have been assessed. The rationale is to use computationally efficient algorithms having above average performances, and possibly exploiting different background subtraction strategies. Based on the results in [20], and on some preliminary tests that we have performed, we selected the following algorithms: Static Frame Difference (SFD), Adaptive-Selective Background Learning (ASB), Adaptive Median (AM) [15], Gaussian Average (GA) [27], Gaussian Mixture Model (ZMM) [29], Gaussian Mixture Model (MoG) [6], Gaussian Mixture Model (GMM) [22], Eigenbackground/SL-PCA (EIG) [17], VuMeter (VM) [9], \(\varSigma \varDelta \) Background Estimation (SD) [12], Multiple Cues (MC) [16]. All the algorithms have been tested in [20] with the exception of SigmaDelta and SJNMultiCue algorithms. These have been added in recent versions of the BGSLibrary. We decide to include them since they show interesting performances although they are slightly more computationally intensive with respect to the simpler algorithms.

The performance measures \(\mathcal {M}\) are computed using the framework of the CDNET 2014 challenge [26]. The framework implements the following seven different measures: recall, specificity, false positive ratio (FPR), false negative ratio (FNR), percentage of wrong classifications (PWC), precision, and F-measure. A ranking of the tested algorithms is also computed starting from the partial ranks on these measures. The CDNET 2014 dataset is composed of 11 video categories, with four to six videos sequences in each category. The categories exhibit different video contents and are chosen to test the background subtraction algorithms under different operating conditions. The challenge rules impose that each algorithm should use only a single set of parameters for all the video sequences. For this reason we set the parameters of the algorithms to their default values, i.e. the values in the configuration files provided in the BGSLibrary.

We set the parameters of GP as in [3]. Also, the GP solutions are generated by considering the shortest video sequence in each of the 11 CDNET 2014 categories as training set. The images in this set are less than 10% of the total images in the whole dataset; this minimizes the over-fitting effect if more images were used. We name the best solution found by GP in this way as IUTIS-2 (the term IUTIS is derived by quoting the Greek fabulists Aesop 620BC-560BC: “In Unity There Is Strength”). We also created a different algorithm, IUTIS-1, by considering a smaller training set composed of all video sequences in the “Baseline” category. As the name suggests, this category contains basic video sequences. IUTIS-1 exhibits worse performances than IUTIS-2, and since its results are not directly comparable with the reported ones, it will not be further considered in the discussion.

Fig. 1.
figure 1

IUTIS-2 solution tree and example masks.

Table 1. Detailed evaluation results of the IUTIS-2 algorithm for each category of the evaluation dataset.
Table 2. Overall F-measure all the change detection algorithms considered to build IUTIS-2 (left), and IUTIS-3, IUTIS-5 and IUTIS-7 (right). An empty circle means that the algorithm was in \(\mathcal {C}\) but was not selected, a full circle otherwise.

4 Results

The tree structure for the IUTIS-2 solution is shown in Fig. 1. In the same figure an example of the output at each node on a sample frame is also reported. From the solution tree it is possible to notice that IUTIS-2 selected and combined a subset of four simple change detection algorithms out of the 11 available: it selected GA, ZMM, MC, and ASB. Concerning the tree structure, we can notice that IUTIS-2 presents a single long branch in its right-hand side. Starting from the functionals defined in Sect. 2, GP was able to create new ones. For instance, the solution tree uses a sequence of the operator \(\text {MF}\), which can be seen as an approximation of what could be obtained using a larger kernel for the median filter. The detailed results of the IUTIS-2 algorithm, computed using the evaluation framework of the CDNET 2014 challenge on its 11 video categories, are reported in Table 1. The overall F-measure of IUTIS-2 and of all the change detection algorithms considered are reported in Table 2. From the values reported it is possible to see that our solution is better than the best algorithm fed to GP (i.e. MC), achieving a F-measure that is 7.2% higher.

For comparison, we also report here (see Fig. 2) the solution trees of IUTIS-3 and IUTIS-5, that we recall have been generated by the same method here described but in a purely performance-based scenario [3]. The set of algorithms \(\mathcal {C}\) available for GP to build IUTIS-3 were the three top performing algorithms on CDNET 2014, i.e.: Flux Tensor with Split Gaussian models (FTS) [25], Self-Balanced SENsitivity SEgmenter (SBS) [21], and Change Detection with Weightless Neural Networks (CWS) [10]. The set \(\mathcal {C}\) available for IUTIS-5 also included Change Detection based on Spectral Reflectaces (SPC) [19] and Extension of the Adapting Multi-resolution Background Extractor (AMB) [24].

From the comparison of the solution trees reported in Figs. 1 and 2, it is possible to notice that IUTIS-2 is more complex in terms of functionals used (see for example the right branch). This is due to the fact that very simple algorithms are used and more operations are needed on their output to achieve higher performance. The overall F-measure of IUTIS-3, IUTIS-5 and of all the change detection algorithms considered are reported in Table 2. From the results it is possible to see that IUTIS-3 is better than any other single algorithms, with a F-measure that is 4.13% higher than the best algorithm used by GP. In the case of IUTIS-5 this difference increases to 5.4%. It is worth noting that all our solutions are better than majority vote solutions (denoted with MV) applied to the corresponding sets \(\mathcal {C}\). In [3] we also experimented with larger cardinalities of \(\mathcal {C}\), i.e. \(\#\mathcal {C}=7\) and 9, but in both cases the corresponding solutions found by GP, i.e. IUTIS-7 and IUTIS-9, obtained identical performance with respect to IUTIS-5 and thus we only report them in Table 2.

Outputs of some of the tested algorithms on sample frames in the CDNET 2014 dataset, together with input images and ground truth masks, are shown in Fig. 3.

Fig. 2.
figure 2

IUTIS-3 solution tree (left) and IUTIS-5 solution tree (right).

Fig. 3.
figure 3

Examples of binary masks created by the proposed algorithms and some of the algorithms used in the combination.

5 Computational Time

IUTIS-2 algorithm has been implemented in C++ and use the OpenCV library for image processing. Table 3 reports the computational time of the proposed algorithm in frames per seconds. For evaluation purpose, we have implemented two versions of IUTIS-2: a “Sequential” one, and a “Parallel” one. The first version refers to the implementation of the algorithms without any particular optimization. While the second one refers to an optimized implementation of the algorithms obtained by exploiting parallelism on a multicore CPU. We used the OpenMP directives (parallel for and sections) to parallelize both the computation of the masks, and the execution branches of the solution tree. The timing measurements are carried out on a 3.3 GHz Intel Core-i5 (quadcore) with 16 GB RAM and Windows 7 Professional operating system. As it can be seen, the IUTIS-2 algorithm can be efficiently parallelized. Specifically, the frame rates of the parallel version is, on average, about 2 times faster than that the sequential version.

In Table 3 we also report the computational time of IUTIS-3, and IUTIS-5 used in [3]. For these algorithms, the computational time is an estimated of an hypothetical parallel implementation and corresponds to the slowest algorithm used in the solution tree. For completeness we compare the computational time of the different IUTIS algorithms with the top five algorithms in Table 2 (right). The slowest algorithm, with 10 frame-per-seconds, is FTS (that is also used in IUTIS-3 and IUTIS-5). This algorithm is implemented in MATLAB while the other algorithms are all implemented in C++. The AMB algorithm is the most efficient one with an impressive 843 frame per second. This result is achieved thanks to the parallel implementation on GPU using the CUDA architecture.

Table 3. Computational time, in frames per seconds and at the resolution of \(320\times 240\) pixels, of different change detection algorithms on a i5-2500K@3.3Ghz computer with 16 GB RAM. For IUTIS-3 and IUTIS-5 we report an estimate corresponding to the slowest algorithm in an hypothetical parallel implementation.

6 Conclusion

In this paper we have presented an evolutionary approach, based on Genetic Programming, to combine simple change detection algorithms to create a more robust algorithm. The solutions provided by Genetic Programming allow us to select a subset of the simple algorithms. Moreover, we are able to automatically combine them in different ways, and perform post-processing on their outputs using suitable operators to produce the best results. Our combination strategy, is able to produce algorithms that are more effective in solving the change detection problem in different scenario. If we are interested in obtaining the maximum performance disregarding the computational complexity of the algorithms themselves, we can combining few top-performing algorithms and achieve the best overall performances (i.e. IUTIS-3 and IUTIS-5). On the contrary, if we want to improve the performances of existing algorithms while maintaining a limited computational complexity, we can effectively combine several simple algorithms and achieve comparable results of more complex state-of-the-art change detection algorithms (i.e. IUTIS-2). In particular, the parallelized version of IUTIS-2 exhibits remarkable performance while being computationally affordable for real-time applications.