Abstract
A large number of modern video background modeling algorithms deal with computationally costly minimization problems that often need parameter adjustments. While in most cases spatial and temporal constraints are added artificially to the minimization process, our approach is to exploit Dynamic Mode Decomposition (DMD), a spectral decomposition technique that naturally extracts spatiotemporal patterns from data. Applied to video data, DMD can compute background models. However, the original DMD algorithm for background modeling is neither efficient nor robust. In this paper, we present an equivalent reformulation with constraints leading to a more suitable decomposition into foreground and background. Due to the reformulation, which uses sparse and low-dimensional structures, an efficient and robust algorithm is derived that computes accurate background models. Moreover, we show how our approach can be extended to RGB data, data with periodic parts, and streaming data, enabling versatile use.
1 Introduction
Modern video background modeling algorithms are often inherently based on matrix decompositions [6]. Besides pre- and postprocessing, the core of these algorithms is to explicitly or implicitly decompose the data into a low-rank matrix characterizing the background and a sparse matrix that shows the foreground. To perform such a decomposition, a computationally costly minimization problem with parameter adjustment is typically established that neglects spatial and temporal coherence at first (due to the use of simple matrix norms, seminorms, and quasinorms in the loss function). Since this formulation may fail to produce adequate background models for more complex video sequences, spatial and/or temporal constraints are taken into account to meet the requirements. Another, more recent approach is to integrate neural networks [1] or related concepts into the process of background modeling, which leads to supervised methods in general. In this paper, however, we focus neither on a minimization-based background modeling approach as described above nor on learning-based concepts. Instead, we follow a different approach and exploit Dynamic Mode Decomposition (DMD), a versatile spectral decomposition technique that naturally extracts spatiotemporal patterns from data.
DMD was originally established in the fluid dynamics community [30] and has evolved into an effective analysis tool. Numerically, DMD produces the following triples: DMD modes, amplitudes, and eigenvalues. For time-dependent data, the modes represent the spatial contribution and the amplitudes specify their impact. Each associated eigenvalue characterizes the temporal development. According to this interpretation, Grosek and Kutz [14] were the first to model the background in video data by DMD. Compared to state-of-the-art algorithms, the approach suffers in terms of accuracy, efficiency, and robustness. Some publications address the high computational costs of DMD by efficient subsampling strategies or related concepts, but these techniques struggle even more with problems regarding accuracy and robustness.
Instead of applying subsampling techniques, we present an equivalent reformulation of DMD with constraints that enforces the existence of background components. Due to the reformulation, which uses sparse and low-dimensional structures, extremely efficient algorithms (e.g., the power method) can be leveraged for the computation. The contribution of this paper can be summarized as follows:

We elaborate on the drawbacks of previous methods and thus motivate our approach.

An equivalent reformulation of DMD with constraints is presented, leading to a suitable decomposition into foreground and background.

We derive an efficient and robust algorithm that computes accurate background models.

The approach is extended to RGB data, data with periodic background parts, and streaming data.
The remainder of this paper is organized as follows: After clarifying related work in the subsequent section, Sect. 3 reviews the traditional DMD approach for foreground–background separation. Section 4 presents our approach including a motivation, new formulation, the associated algorithm, extensions, and implementation details. Section 5 provides a qualitative and quantitative evaluation in terms of accuracy, efficiency, and robustness. Finally, a conclusion is drawn in Sect. 6.
2 Related Work
There are numerous background modeling algorithms in the literature [5], and they can be classified as unsupervised [26, 29, 37], supervised [1, 4, 7, 23, 24], and semi-supervised methods [12, 13]. Supervised methods allow for more accurate results than unsupervised methods, especially in complex scenarios, but they typically need large amounts of training data. Semi-supervised approaches achieve comparably accurate results with less labeled training data; however, they are prone to supervision issues as well. Since our approach does not rely on training processes or related concepts, it is an unsupervised method. Such methods were published in large numbers and varieties over the past decade. Depending on the task, papers [11, 34, 39] classify them differently. To our knowledge, the most recent review of unsupervised background modeling algorithms was written by Bouwmans et al. [6] in combination with the LRSLibrary [32]. Using the approach of decomposing video data into a low-rank and a sparse matrix, they classify algorithms into the following categories: robust principal component analysis (RPCA), matrix completion (MC), nonnegative matrix factorization (NMF), subspace tracking (ST), low-rank recovery (LRR), three-term decomposition (TTD), and tensor decompositions. Bouwmans et al. [6] put DMD into the RPCA category. However, DMD substantially differs from conventional RPCA approaches, e.g., the structure of the established minimization problems is completely different. We thus compare our approach only with DMD-based approaches in the following.
The first background modeling approach with DMD was proposed by Grosek and Kutz [14]. An extension to RGB video was presented by Tirunagari [35]. However, alternative state-of-the-art background modeling algorithms achieve better results at substantially lower computational costs. Kutz et al. [21] later proposed using multiresolution DMD for background modeling. It integrates multiresolution analysis techniques into DMD, which enables the identification of multi-time-scale features and, therefore, may lead to better results. Nevertheless, the computational costs are even higher. The issue of efficiency was addressed by Erichson and Donovan [10] and slightly later by Erichson et al. [9] and Pendergrass et al. [28]. All these approaches use efficient sampling strategies, either based on a randomized singular value decomposition (randomized DMD) or on compressed data (compressed DMD). Whereas these subsampling techniques show promising efficiency, the background model suffers from the subsampling process in terms of accuracy and robustness. Instead of applying subsampling techniques, we use an equivalent reformulation of DMD with constraints that allows efficient and robust computation and leads to a more suitable decomposition into foreground and background.
Recently, Haq et al. [16] also presented an efficient DMD-based method for background modeling that uses dictionary learning as a subsampling strategy. As the dictionary needs to be trained, it substantially differs from our approach.
Finally, there are also other DMD-based techniques for videos, e.g., video shot detection [3] or saliency detection [27]. However, those approaches pursue objectives and technical contributions different from ours.
3 Theoretical Foundation
This section provides the basics of the foreground–background separation with DMD, including the general theory of DMD and its traditional application to video data (see Fig. 1).
3.1 Dynamic Mode Decomposition
Before formulating the DMD background modeling algorithm, we summarize the principles of DMD in general [19, 20]. These aspects help understand foreground–background separation and, thus, our approach.
Let us consider data \(x_0,\dots ,x_m \in \mathbb {R}^n\) sampled uniformly in time. Typically, the data dimension n is considerably greater than m (compare Fig. 1, left box). DMD basically computes eigenvalues and eigenvectors of a high-dimensional matrix \(A_{\text {\tiny DMD}} \in \mathbb {R}^{n \times n}\) (characterizing the dynamics of the data) via a low-dimensional representation \(S_{\text {\tiny DMD}}\) (see Fig. 1, middle box). The matrix \(A_{\text {\tiny DMD}}\) is derived from the minimization problem
$$\begin{aligned} \min _{A \in \mathbb {R}^{n \times n}} \Vert AX - Y \Vert , \end{aligned}$$(1)
where \(X = \begin{bmatrix} x_0&\dots&x_{m-1} \end{bmatrix}\) and \(Y = \begin{bmatrix} x_1&\dots&x_{m} \end{bmatrix}\). In general, the minimization problem has infinitely many solutions (as we will prove later). DMD therefore aims for the minimum-norm solution, which is explicitly given by
$$\begin{aligned} A_{\text {\tiny DMD}} = Y X^+, \end{aligned}$$(2)
where \(X^+\) is the Moore–Penrose pseudoinverse of X. To obtain the eigenvalues and eigenvectors of \(A_{\text {\tiny DMD}}\), the low-dimensional representation \(S_{\text {\tiny DMD}}\) is computed via the thin singular value decomposition (SVD) of X, i.e., \(X = U \Sigma V^*\) with \(r=\text {rank}(X)\), and the following reformulation:
$$\begin{aligned} S_{\text {\tiny DMD}} = U^* A_{\text {\tiny DMD}} U = U^* Y V \Sigma ^{-1}. \end{aligned}$$(3)
Eq. 3 is the core of DMD, forming the starting point of Algorithm 1 (Lines 1–4). In the next steps, the (DMD) eigenvalues \(\lambda _j \in \mathbb {C}\) and eigenvectors \(v_j \in \mathbb {C}^{r}\) of \(S_{\text {\tiny DMD}}\) are computed (Line 5). Finally, all eigenvectors with nonzero eigenvalues are transformed into modes \(\vartheta _j = \frac{1}{\lambda _j} Y V \Sigma ^{-1} v_j \in \mathbb {C}^n\) (Lines 6–8) and the corresponding amplitudes \(a_j \in \mathbb {C}\) are computed by \(\min \Vert \Theta a - x_0 \Vert \) (Lines 9–11). Algorithm 1 results in the following decomposition of the data:
$$\begin{aligned} x_k \approx \sum _{j} a_j \lambda _j^k \vartheta _j. \end{aligned}$$(4)
This approximation property is crucial for the interpretation of DMD triples \((\vartheta _j,\lambda _j,a_j)\), in particular, concerning the foreground–background separation approach.
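For concreteness, the steps of Algorithm 1 can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the rank tolerance and variable names are our own, and nonzero eigenvalues are assumed when forming the modes.

```python
import numpy as np

def dmd(snapshots):
    """Exact DMD sketch: return modes, eigenvalues, and amplitudes.

    snapshots has shape (n, m+1); its columns are x_0, ..., x_m.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)        # thin SVD of X
    r = int(np.sum(s > 1e-10 * s[0]))                       # numerical rank
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    S = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)     # Eq. 3
    lam, W = np.linalg.eig(S)                               # eigenvalues of S
    # theta_j = (1/lambda_j) Y V Sigma^{-1} v_j (assumes lambda_j != 0)
    modes = (Y @ Vh.conj().T @ np.diag(1.0 / s) @ W) / lam
    # amplitudes from min || Theta a - x_0 ||
    amps = np.linalg.lstsq(modes, snapshots[:, 0].astype(complex), rcond=None)[0]
    return modes, lam, amps
```

For data generated by a linear map, the computed modes are eigenvectors of \(A_{\text {\tiny DMD}} = YX^+\) to the corresponding eigenvalues, which is a quick sanity check of the implementation.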
3.2 Background Modeling with DMD
To compute a background model with DMD, the DMD components have to be interpreted in terms of video data. An overview of the traditional DMD background modeling is illustrated in Fig. 1. After applying DMD to the vectorized video frames, DMD produces modes \(\vartheta _j \in \mathbb {C}^n\), eigenvalues \(\lambda _j \in \mathbb {C}\), and amplitudes \(a_j \in \mathbb {C}\) with a computational complexity of \(\mathcal {O}(nm^2)\) in general as \(n \gg m\). The entries of a DMD mode \(\vartheta _j\) correspond to grayscale pixel values at specific pixel locations. According to Eq. 4, the associated eigenvalue \(\lambda _j\) characterizes the temporal development. To extract a background model, Grosek and Kutz [14] propose selecting DMD triples that vary extremely slowly over time or not at all. These triples are characterized by eigenvalues that satisfy \(\lambda _j \approx 1\) or, more precisely, by the index set
$$\begin{aligned} I_\mathbb {1} = \{ j ~ ; ~ |\lambda _j - 1| \le \delta \} \end{aligned}$$
for a given threshold \(\delta \) (see Fig. 2, top). Therefore, the video frames can be separated by DMD into foreground and background:
$$\begin{aligned} x_k \approx \underbrace{\sum _{j \in I_\mathbb {1}} a_j \lambda _j^k \vartheta _j}_{\text {background}} + \underbrace{\sum _{j \notin I_\mathbb {1}} a_j \lambda _j^k \vartheta _j}_{\text {foreground}}. \end{aligned}$$(5)
Since the eigenvalues \(\lambda _j\), amplitudes \(a_j\), and modes \(\vartheta _j\) are complex in general, the sums are not real-valued. Whereas Grosek and Kutz [14] use the absolute value as well as a rearrangement of negative values, Erichson et al. [9] recommend the use of the real part. In this paper, we follow the technique of Grosek and Kutz.
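The traditional separation of Eq. 5 can be sketched as follows. This is a hypothetical NumPy illustration operating on precomputed DMD triples; the default threshold and the simplified absolute-value handling are our own choices.

```python
import numpy as np

def split_background(modes, lam, amps, num_frames, delta=1e-2):
    """Split frames into background and foreground from DMD triples (Eq. 5)."""
    bg_idx = np.abs(lam - 1.0) <= delta                  # eigenvalues close to 1
    k = np.arange(num_frames)
    dynamics = lam[:, None] ** k[None, :]                # lambda_j^k per frame k
    full = np.real(modes @ (amps[:, None] * dynamics))   # Eq. 4 reconstruction
    background = np.abs(modes[:, bg_idx] @ (amps[bg_idx][:, None] * dynamics[bg_idx]))
    foreground = full - background                       # remainder is foreground
    return background, foreground
```

If no eigenvalue passes the threshold, the background is all zeros, which is exactly the robustness issue discussed in Sect. 4.1.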
4 Proposed Approach
In this section, we present our DMD background modeling approach. After a short motivation, the theory and the algorithm (based on grayscale data) are explained in detail. We then extend our approach to RGB data, data with periodic parts, and streaming data, which enables its use in video surveillance. Finally, the implementation is described.
4.1 Motivation
Leading background modeling algorithms are characterized by accuracy, robustness, and efficiency. While DMD is capable of producing accurate background models in general, the traditional DMD background modeling algorithm is neither robust nor efficient. This fact can be seen in Fig. 1, where the traditional approach fails to produce an accurate background model. We additionally highlight the computational costs of each step of DMD. Since both an SVD and an eigenvalue decomposition are needed, DMD has a computational complexity of \(\mathcal {O}(nm^2)\) in general as \(n \gg m\).
Even though the improvements via subsampling strategies boost efficiency at the cost of accuracy, they do not solve the following underlying problem: In Fig. 2 (top), the background model of the traditional approach is shown at an earlier frame. It can be observed that a pair of complex-conjugated eigenvalues close to the value 1 is detected. According to the traditional approach, the two corresponding DMD triples \((\vartheta _j, a_j, \lambda _j)\) should be selected. If they are not close enough to the value 1, no components are selected due to thresholding and no background can be computed, i.e., the algorithm is not robust. Otherwise, this small inaccuracy causes numerical instability and a mixing of foreground and background parts as shown in Fig. 2. These issues may be even worse for subsampled data.
Consequently, DMD has to be modified such that components clearly belong to either the foreground or the background. Since DMD only uses one or two triples (often occurring in complex-conjugated pairs) for the background model [20], we propose enforcing the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\) by a DMD-consistent projection. Moreover, we also recommend using the efficient power method to compute the relevant DMD components for an adequate background model. We can do this because our experiments have shown that all other eigenvalues lie inside the unit circle due to the fact that foreground objects appear and disappear over time. This leads to a more suitable decomposition into foreground and background, as shown in Fig. 2, that is produced in an efficient manner.
4.2 Theory
As mentioned before, we want to enforce the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\) to obtain a suitable decomposition into foreground and background. To understand the general idea of our approach, we briefly summarize the steps in the following:
 (§1):

Prove the existence of infinitely many solutions of the minimization problem 1 (see Proposition 1).
 (§2):

Formulate a new minimization problem (see Eq. 6) and find a more accessible representation (see Theorem 1, Theorem 2, and Eq. 11).
 (§3):

Find a feasible solution to the final minimization problem (see Theorem 3).
(§1) If the data \(x_0,\dots ,x_m \in \mathbb {R}^n\) are linearly independent (which is the case for real video footage in general), we first note that the high-dimensional minimization problem 1 has infinitely many solutions that solve the equation exactly, i.e., \(AX = Y\).
Proposition 1
For linearly independent data \(x_0,\dots ,x_m \in \mathbb {R}^n\) with \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\), the minimization problem 1 has infinitely many solutions that solve the equation exactly, i.e., \(AX = Y\). In particular, every combination of \(m+1\) distinct complex numbers \(z_0,\dots ,z_m \in \mathbb {C}\) defines a solution \(A_z\) that has \(z_0,\dots ,z_m\) among its eigenvalues.
Proof
Let us consider distinct complex numbers \(z_0,\dots ,z_m \in \mathbb {C}\) with \(\Lambda _z = \text {diag}(z_0,\dots ,z_m)\) as well as the respective Vandermonde matrix
$$\begin{aligned} V(z) = \begin{bmatrix} 1 & z_0 & \dots & z_0^m \\ \vdots & \vdots & & \vdots \\ 1 & z_m & \dots & z_m^m \end{bmatrix} \in \mathbb {C}^{(m+1) \times (m+1)}. \end{aligned}$$
Defining \(\Xi _z = Z V(z)^{-1}\) (the columns of this matrix are basically scaled modes), the matrix \(A_z = \Xi _z \Lambda _z \Xi _z^+ \) solves the minimization problem 1 exactly:
$$\begin{aligned} A_z x_k = \Xi _z \Lambda _z \Xi _z^+ x_k = \Xi _z \Lambda _z V(z) Z^+ x_k = x_{k+1}, \quad k = 0, \dots , m-1, \end{aligned}$$
where we used that \(\Xi _z^+ = V(z) Z^+\) (as the columns of the matrix Z are linearly independent). Since \(z_0,\dots ,z_m\) are chosen arbitrarily, the assertion is proven. \(\square \)
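The construction in the proof can be checked numerically. The following sketch uses synthetic data; the names mirror the proof, and the particular values of \(z\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
Z = rng.standard_normal((n, m + 1))        # generic, hence linearly independent, data
X, Y = Z[:, :-1], Z[:, 1:]

z = np.array([0.2, 0.5, 0.9, 1.3])         # arbitrary distinct "eigenvalues"
V = np.vander(z, m + 1, increasing=True)   # Vandermonde matrix V(z), V[j, k] = z_j^k
Xi = Z @ np.linalg.inv(V)                  # scaled modes Xi_z = Z V(z)^{-1}
A_z = Xi @ np.diag(z) @ np.linalg.pinv(Xi)

assert np.allclose(A_z @ X, Y, atol=1e-8)  # A_z solves AX = Y exactly
```

Any other choice of distinct values z works equally well, which illustrates why the plain minimization problem cannot pin down adequate temporal components.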
The above proposition states that every combination of distinct temporal components \(\lambda _0,\dots ,\lambda _m\) (which are eigenvalues of a respective matrix A) is actually a possible outcome. However, since DMD uses the minimum-norm solution \(A_{\text {\tiny DMD}} = YX^+\), the DMD eigenvalues fit the data better than a random solution. Nonetheless, the existence of adequate eigenvalues for background modeling is not guaranteed.
(§2) Hence, the naive approach is to find a matrix \(A \in \mathbb {R}^{n \times n}\) that solves the minimization problem 1 and has an eigenvalue \(\lambda _\mathbb {1} = 1\):
$$\begin{aligned} \min _{A \in \mathbb {R}^{n \times n}} \Vert AX - Y \Vert \quad \text {s.t.}\quad Av = v \text { for some } v \ne 0. \end{aligned}$$(6)
However, this formulation cannot be solved practically due to the high dimensionality of the problem and the eigenvalue equation in the constraints.
To formulate an adequate constrained minimization problem, we use the low-dimensional representation C of a matrix A with respect to the data matrix \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\):
$$\begin{aligned} C = Z^+ A Z \in \mathbb {R}^{(m+1) \times (m+1)}. \end{aligned}$$(7)
This matrix is a companion matrix and completely characterizes the data-relevant spectral properties of A. In addition, the retransformed matrix \(Z C Z^+ \in \mathbb {R}^{n \times n}\) inherits many algebraic properties, in particular, the minimization property, as the following theorem shows.
Theorem 1
For linearly independent data \(x_0,\dots ,x_m \in \mathbb {R}^n\) with \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\), the low-dimensional representation \(C = Z^+ A Z \in \mathbb {R}^{(m+1) \times (m+1)}\) of a matrix \(A \in \mathbb {R}^{n \times n}\) satisfying \(AX=Y\) has the following properties:

(a)
The low-dimensional representation C is a companion matrix
$$\begin{aligned} C = \left[ \begin{array}{cc} \begin{matrix} 0^T \\ I_m \end{matrix}&\tilde{c} \end{array}\right] = \begin{bmatrix} 0^T & c_0 \\ I_m & c \end{bmatrix} \end{aligned}$$(8)
and the related vector \(\tilde{c}\) satisfies
$$\begin{aligned} \tilde{c} = \begin{pmatrix} c_0 \\ c \end{pmatrix} = Z^+ A x_m \in \mathbb {R}^{m+1}. \end{aligned}$$(9) 
(b)
An eigenvalue \(\lambda \) of the matrix C with eigenvector \(\tilde{v}\) is also an eigenvalue of A with corresponding eigenvector \(v = Z \tilde{v}\), if \(A v \in \text {span}(x_0,\dots ,x_m)\).

(c)
An eigenvalue \(\lambda \ne 0\) of the matrix A with eigenvector v is also an eigenvalue of C with corresponding eigenvector \(\tilde{v} = Z^+v\), if \(v \in \text {span}(x_0,\dots ,x_m)\).

(d)
The retransformed matrix \(ZCZ^+ \in \mathbb {R}^{n \times n}\) satisfies the equation \((ZCZ^+)X = Y\).
Proof
(a) The following calculation shows that C is a companion matrix: for \(k = 0,\dots ,m-1\),
$$\begin{aligned} C e_k = Z^+ A Z e_k = Z^+ A x_k = Z^+ x_{k+1} = e_{k+1}, \end{aligned}$$
where \(c_0 \in \mathbb {R}\) and \(c \in \mathbb {R}^m\) are the values of the last column that satisfy
$$\begin{aligned} \begin{pmatrix} c_0 \\ c \end{pmatrix} = C e_m = Z^+ A x_m. \end{aligned}$$
(b) Let \(\lambda \) be an eigenvalue of the matrix C with eigenvector \(\tilde{v}\). Then, the nonzero vector \(v = Z \tilde{v}\) (as Z has linearly independent columns) satisfies the following equation
$$\begin{aligned} A v = A Z \tilde{v} = Z Z^+ A Z \tilde{v} = Z C \tilde{v} = \lambda Z \tilde{v} = \lambda v, \end{aligned}$$
where we have used that \(ZZ^+\) is the projection onto the image of Z.
(c) Let \(\lambda \) be an eigenvalue of the matrix A with eigenvector v. Then, the vector \(\tilde{v} = Z^+v\) satisfies the following equation
$$\begin{aligned} C \tilde{v} = Z^+ A Z Z^+ v = Z^+ A v = \lambda Z^+ v = \lambda \tilde{v}, \end{aligned}$$
where we used that \(v \in \text {span}(x_0,\dots ,x_m)\) implies \(Z Z^+ v = v\).
To show that \(\tilde{v}\) is an eigenvector, we assume that \(\tilde{v} = 0\). This implies \(0 = A Z \tilde{v}\), which results in \(0 = A Z Z^+ v = A v = \lambda v\). Since \(v \ne 0\), the eigenvalue has to be zero, i.e., \(\lambda = 0\). However, this contradicts the assumption. Therefore, \(\tilde{v} \ne 0\).
(d) Using the companion matrix structure and the linearly independent columns of the matrix Z, the following calculation shows that the retransformed matrix \(ZCZ^+\) solves the first constraint: for \(k = 0,\dots ,m-1\),
$$\begin{aligned} (Z C Z^+) x_k = Z C e_k = Z e_{k+1} = x_{k+1}, \end{aligned}$$
i.e., \((ZCZ^+)X = Y\).
\(\square \)
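Theorem 1 can likewise be verified numerically for synthetic data generated by a linear map. The sketch below uses arbitrary dimensions and is only an illustration of properties (a) and (d).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 7, 3
A = 0.5 * rng.standard_normal((n, n))
x0 = rng.standard_normal(n)
# snapshots x_k = A^k x_0, so AX = Y holds by construction
Z = np.column_stack([np.linalg.matrix_power(A, k) @ x0 for k in range(m + 1)])
X, Y = Z[:, :-1], Z[:, 1:]

C = np.linalg.pinv(Z) @ A @ Z            # low-dimensional representation of A
# (a): the first m columns form the block [0^T; I_m] of the companion matrix
assert np.allclose(C[:, :m], np.vstack([np.zeros((1, m)), np.eye(m)]), atol=1e-6)
# (d): the retransformed matrix still maps X to Y
assert np.allclose(Z @ C @ np.linalg.pinv(Z) @ X, Y, atol=1e-6)
```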
With the help of Theorem 1, every solution A to the minimization problem 1 can be represented by an appropriate companion matrix C that naturally satisfies the first constraint in minimization problem 6. To reformulate the entire constrained minimization problem 6, the lowdimensional representation of \(A_{\text {\tiny DMD}} = Y X^+\) has to be calculated as well, which will be done in the following theorem.
Theorem 2
For linearly independent data \(x_0,\dots ,x_m \in \mathbb {R}^n\) with \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\), the low-dimensional representation \(C_{\text {\tiny DMD}} = Z^+ A_{\text {\tiny DMD}} Z \in \mathbb {R}^{(m+1) \times (m+1)}\) of the matrix \(A_{\text {\tiny DMD}} = Y X^+ \in \mathbb {R}^{n \times n}\) has the following properties:

(a)
The low-dimensional representation \(C_{\text {\tiny DMD}}\) is a companion matrix with \(\text {rank}(C_{\text {\tiny DMD}}) = m\), and the related vector \(\tilde{c}_{\text {\tiny DMD}}\) satisfies
$$\begin{aligned} \tilde{c}_{\text {\tiny DMD}} = \begin{pmatrix} c_0 \\ c_{\text {\tiny DMD}} \end{pmatrix} = \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix}. \end{aligned}$$(10) 
(b)
An eigenvalue \(\lambda \) of the matrix \(C_{\text {\tiny DMD}}\) with eigenvector \(\tilde{v}\) is also an eigenvalue of \(A_{\text {\tiny DMD}}\) with corresponding eigenvector \(v = Z \tilde{v}\).

(c)
An eigenvalue \(\lambda \ne 0\) of the matrix \(A_{\text {\tiny DMD}}\) with eigenvector v is also an eigenvalue of \(C_{\text {\tiny DMD}}\) with corresponding eigenvector \(\tilde{v} = Z^+v\).

(d)
The retransformed matrix \(ZC_{\text {\tiny DMD}}Z^+\) is equal to \(A_{\text {\tiny DMD}}\), i.e., \(ZC_{\text {\tiny DMD}}Z^+ = A_{\text {\tiny DMD}}\) (trivially satisfying \((ZC_{\text {\tiny DMD}}Z^+)X = Y\)).
Proof
(a) According to Theorem 1, the matrix \(C_{\text {\tiny DMD}}\) is a companion matrix of the form of Eq. 8, where the corresponding vector \(\tilde{c}_{\text {\tiny DMD}}\) is given by
$$\begin{aligned} \tilde{c}_{\text {\tiny DMD}} = Z^+ A_{\text {\tiny DMD}} x_m = Z^+ Y X^+ x_m = \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix}. \end{aligned}$$
Since \(c_0 = 0\), the matrix \(C_{\text {\tiny DMD}}\) has \(\text {rank}(C_{\text {\tiny DMD}}) = m\).
\((b) + (c)\) Since \(A_{\text {\tiny DMD}}\) has rank m and satisfies \(A_{\text {\tiny DMD}}X =Y\) by assumption [19], the image of \(A_{\text {\tiny DMD}}\) is given by
$$\begin{aligned} \text {im}(A_{\text {\tiny DMD}}) = \text {span}(x_1,\dots ,x_m) \subset \text {span}(x_0,\dots ,x_m). \end{aligned}$$
Therefore, the requirements of (b) and (c) in Theorem 1 are met for the matrix \(A_{\text {\tiny DMD}}\).
(d) To show the equality \(Z C_{\text {\tiny DMD}} Z^+ = A_{\text {\tiny DMD}}\), we first notice that both matrices solve the equation \(AX=Y\) (see Theorem 1). Therefore, the two matrices are equal on \(\text {span}(x_0,\dots ,x_{m-1})\). In addition, this assertion is also true for the \((m+1)\)-th snapshot \(x_m\) because
$$\begin{aligned} Z C_{\text {\tiny DMD}} Z^+ x_m = Z C_{\text {\tiny DMD}} e_m = Z \tilde{c}_{\text {\tiny DMD}} = Y X^+ x_m = A_{\text {\tiny DMD}} x_m. \end{aligned}$$
Hence, \(Z C_{\text {\tiny DMD}} Z^+ x = A_{\text {\tiny DMD}} x\) for \(x \in \text {span}(x_0,\dots ,x_m)\). To conclude the proof, we show the equality for an arbitrary vector y from the orthogonal complement \(\text {span}(x_0,\dots ,x_m)^\bot \):
$$\begin{aligned} Z C_{\text {\tiny DMD}} Z^+ y = 0 = Y X^+ y = A_{\text {\tiny DMD}} y, \end{aligned}$$
where we used that \(Z^+ y = 0\) and \(X^+ y = 0\).
Since a subspace and its orthogonal complement span the entire space, the two matrices are equal. \(\square \)
Theorem 1 and Theorem 2 show that a high-dimensional matrix A that solves the minimization problem 1 exactly, such as \(A_{\text {\tiny DMD}}\), can be adequately represented by a low-dimensional representation capturing the data-relevant properties. Moreover, the low-dimensional representation has the structure of a companion matrix that naturally satisfies the first constraint in the minimization problem 6. By these characteristics, the high-dimensional constrained minimization problem 6 can be reformulated into the following low-dimensional constrained minimization problem (using the companion matrices from Theorem 1 and Theorem 2):
$$\begin{aligned} \min _{\tilde{c} \in \mathbb {R}^{m+1}} \Vert \tilde{c} - \tilde{c}_{\text {\tiny DMD}} \Vert \quad \text {s.t.}\quad \lambda _\mathbb {1} = 1 \text { is an eigenvalue of the companion matrix } C \text { to } \tilde{c}. \end{aligned}$$(11)
(§3) The constraint in minimization problem 11 seems to have an impractical form as well. However, it can be handled in a feasible way: Since the characteristic polynomial of a companion matrix C with vector \(\tilde{c}\) is given by
$$\begin{aligned} p_{\tilde{c}}(\lambda ) = \lambda ^{m+1} - \sum _{k=0}^{m} \tilde{c}_k \lambda ^k, \end{aligned}$$
the constraint is equivalent to \(p_{\tilde{c}}(1) = 0\) or, more precisely,
$$\begin{aligned} \sum _{k=0}^{m} \tilde{c}_k = 1. \end{aligned}$$
Using the notation \(e_{m+1} = (1, 1, \dots , 1)^T \in \mathbb {R}^{m+1}\), the low-dimensional constrained minimization problem 11 is equivalent to
$$\begin{aligned} \min _{y \in \mathbb {R}^{m+1}} \Vert y - \tilde{c}_{\text {\tiny DMD}} \Vert \quad \text {s.t.}\quad e_{m+1}^T y = 1. \end{aligned}$$(12)
The solution space \(M_\mathbb {1} = \{ y \in \mathbb {R}^{m+1} ~ ; ~ e_{m+1}^T y = 1 \}\) has several nice properties: It is an affine subspace and therefore convex, which implies existence and uniqueness of a solution for the constrained minimization problem 12 as the following theorem shows.
Theorem 3
For the constrained minimization problem 12, there exists a unique solution given by the orthogonal projection \(P_\mathbb {1}\) onto \(M_\mathbb {1} = \{ y \in \mathbb {R}^{m+1} ~ ; ~ e_{m+1}^T y = 1 \}\). The projection can be computed via
$$\begin{aligned} P_\mathbb {1}(y) = y + \frac{1 - e_{m+1}^T y}{m+1} \, e_{m+1}. \end{aligned}$$(13)
Proof
Since the constraint is linear (more precisely, affine-linear), the solution space \(M_\mathbb {1}\) is an affine subspace. Hence, \(M_\mathbb {1}\) is a nonempty, closed, and convex set. This implies existence and uniqueness of a solution given by the orthogonal projection onto \(M_\mathbb {1}\). The explicit formula can be easily derived from the fact that \(M_\mathbb {1}\) is a hyperplane that is orthogonal to the vector \(e_{m+1}\) and passes through the point \(r_0 = \frac{1}{m+1} \cdot e_{m+1} \in M_\mathbb {1}\):
$$\begin{aligned} P_\mathbb {1}(y) = y - \frac{e_{m+1}^T (y - r_0)}{\Vert e_{m+1} \Vert ^2} \, e_{m+1} = y + \frac{1 - e_{m+1}^T y}{m+1} \, e_{m+1}. \end{aligned}$$
\(\square \)
By Theorem 3, denoting \(P_\mathbb {1}\) as the (orthogonal) projection onto \(M_\mathbb {1}\), the solution to the constrained minimization problem 12 is given by
$$\begin{aligned} c_\mathbb {1} = P_\mathbb {1}(\tilde{c}_{\text {\tiny DMD}}). \end{aligned}$$(14)
The projection solves the robustness issues of DMD by enforcing the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\) for the companion matrix \(C_\mathbb {1}\) to the vector \(c_\mathbb {1}\). The spectral properties of \(C_\mathbb {1}\), i.e., the eigenvector \(v_\mathbb {1}\) to the eigenvalue \(\lambda _\mathbb {1}\), can therefore be used to compute an accurate background model as illustrated in Figs. 2 and 3. The details will be explained in combination with the algorithm in the next subsection.
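The projection and the resulting companion matrix can be sketched as follows. This is an illustrative NumPy snippet; the function names and the example vector are our own.

```python
import numpy as np

def project_onto_M1(c):
    """Orthogonal projection of c onto the hyperplane sum(y) = 1 (Theorem 3)."""
    m1 = len(c)                                   # m + 1
    return c - (np.sum(c) - 1.0) / m1 * np.ones(m1)

def companion(c_tilde):
    """Build the (m+1)x(m+1) companion matrix of Eq. 8 from the vector c_tilde."""
    m1 = len(c_tilde)
    C = np.zeros((m1, m1))
    C[1:, :-1] = np.eye(m1 - 1)                   # subdiagonal identity block I_m
    C[:, -1] = c_tilde                            # last column carries c_tilde
    return C

c1 = project_onto_M1(np.array([0.0, 0.3, 0.1, 0.2]))
C1 = companion(c1)
assert np.isclose(np.sum(c1), 1.0)                # constraint e^T c = 1 holds
assert np.any(np.isclose(np.linalg.eigvals(C1), 1.0))  # eigenvalue 1 enforced
```

Since the characteristic polynomial of the companion matrix evaluates to \(1 - \sum _k \tilde{c}_k\) at \(\lambda = 1\), the projected vector guarantees the eigenvalue \(\lambda _\mathbb {1} = 1\) exactly.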
4.3 Algorithm
In the following, we describe our algorithm “efficient exact dynamic mode decomposition for background modeling” (effexDMD4B), shown in Algorithm 2. Although our algorithm is based on the theorems of the previous subsection, it can be applied to any video data without any problems, even if the assumptions of the theorems do not hold.
The first step is to obtain the companion matrix \(C_{\text {\tiny DMD}}\). By Eq. 10 (and the related Eq. 8), we have to compute \(X^+ x_m\), which is the minimum-norm solution to the minimization problem
$$\begin{aligned} \min _{c \in \mathbb {R}^{m}} \Vert X c - x_m \Vert . \end{aligned}$$(15)
This step is performed by an iterative least-squares method (lsqr), which has a computational complexity of \(\mathcal {O}(nm)\). To keep the constant small, a special start value \(c_{\text {start}} \in \mathbb {R}^{m+1}\) is used, adapted to the minimization problem 15 (Lines 2–3). Due to the temporal coherence of video frames, the vector \(c_{\text {start}}\) should particularly emphasize the last data points. Moreover, it should be a partition of unity. We propose using
which has shown fast convergence rates, even if the number of frames grows (as we will see later).
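The first step, computing \(X^+ x_m\), can be illustrated as follows. In this sketch, a dense least-squares solver stands in for the iterative lsqr method and its special start value; the function name is ours.

```python
import numpy as np

def companion_vector(Z):
    """Return c_tilde_DMD = (0, X^+ x_m) (Eq. 10) for a data matrix Z (n x (m+1))."""
    X, x_m = Z[:, :-1], Z[:, -1]
    c = np.linalg.lstsq(X, x_m, rcond=None)[0]   # minimum-norm least-squares solution
    return np.concatenate([[0.0], c])
```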
In Line 4 of Algorithm 2, the projection according to Eq. 14 is applied. This step enforces the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\).
Finally, the eigenvector \(v_\mathbb {1}\) to the eigenvalue \(\lambda _\mathbb {1}\) is computed by the power method (without normalization) and transformed into a mode (Lines 5–6) that represents the background model statically. The power method uses the repeated application of the companion matrix \(C_\mathbb {1}\) (which is constructed via Eqs. 8 and 9) to a start value \(v_{\text {start}}\). This step allows extremely efficient computation at a complexity of \(\mathcal {O}(m)\) because the companion matrix has a low-dimensional representation as well as a sparse structure with only \(2m+1\) nonzero entries.
Moreover, we propose using a special start value based on the temporal arithmetic mean \(\overline{x} = \frac{1}{m+1} \sum _{k=0}^m x_k\) because the temporal mean represents a good initial guess for the background model. Since the power method operates in the low-dimensional space, the start vector is given by
$$\begin{aligned} v_{\text {start}} = Z^+ \overline{x} = \frac{1}{m+1} \, e_{m+1}. \end{aligned}$$(17)
Although there is generally no guarantee that the power method converges for this start vector, our experiments have shown a stable convergence behavior. This is due to the fact that eigenvalues that do not belong to the background are typically located inside the unit circle (see Fig. 2).
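The \(\mathcal {O}(m)\) cost per iteration follows directly from the sparse companion structure: applying \(C_\mathbb {1}\) touches only its \(2m+1\) nonzero entries. A pure-Python sketch (function names are ours, not the paper's MATLAB code):

```python
def apply_companion(c_tilde, v):
    """Compute C @ v for C = [[0^T, c_0], [I_m, c]] without forming C (O(m))."""
    last = v[-1]
    out = [c_tilde[0] * last]                        # row 0: (0, ..., 0, c_0) . v
    for i in range(1, len(v)):
        out.append(v[i - 1] + c_tilde[i] * last)     # row i: v_{i-1} + c_i * v_m
    return out

def power_method(c_tilde, v, iters=100):
    """Unnormalized power iteration; converges when the eigenvalue 1 dominates."""
    for _ in range(iters):
        v = apply_companion(c_tilde, v)
    return v
```

No normalization is needed precisely because the dominant eigenvalue is enforced to be 1, so the iterates neither grow nor shrink.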
4.4 Extensions
In the previous subsection, the basic algorithm was presented, which is capable of computing grayscale background models that are sufficient for most applications. However, our approach is versatile and can be extended to RGB data, video data with periodic parts, and streaming data. Moreover, these extensions can be combined as well.
RGB Data Most background modeling algorithms only work with grayscale video frames because many formulations cannot handle multiple color channels at the same time. Moreover, the extensive computational overhead produced by the additional data usually does not justify the slight improvements regarding accuracy. In contrast, our approach can be easily extended while keeping the computational overhead small. For this, we first follow Tirunagari et al. [35] to perform DMD on RGB-based video frames. Analogously to the left box of Fig. 3, the three channels of an RGB-based video frame are vectorized and stacked on top of each other. Hence, the data for DMD has dimensionality \(3n \times (m+1)\). The remaining steps of our approach can be performed in the same way as shown in Fig. 3 or Algorithm 2 (note that the background model has the same organized structure as the RGB input data).
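The channel stacking can be sketched as follows (an illustrative snippet; the frame shapes are made up):

```python
import numpy as np

def stack_rgb_frames(frames):
    """frames: list of (h, w, 3) arrays -> DMD data matrix of shape (3*h*w, m+1)."""
    cols = []
    for f in frames:
        r, g, b = f[:, :, 0], f[:, :, 1], f[:, :, 2]
        cols.append(np.concatenate([r.ravel(), g.ravel(), b.ravel()]))
    return np.column_stack(cols)
```

The resulting matrix simply replaces the grayscale data matrix; the computed background mode is reshaped back channel by channel.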
Data with periodic background parts Since DMD is based on a superposition principle (see Eq. 4 and 5), periodic parts in the background can be represented by additional DMD triples whose eigenvalues \(\lambda _j\) capture the frequency. As Fig. 4 shows, periodic processes are characterized by eigenvalues that lie on the unit circle. Hence, our approach can be easily extended in the following way:
Instead of computing only the static component, we use the power method for the components that lie on the unit circle as well (see Fig. 4). More precisely, since the power method computes the greatest (in absolute value) eigenvalue, the shifted power method is used as long as values on the unit circle are detected (multiple applications with a fixed maximum still result in a computational complexity of \(\mathcal {O}(m)\)). Using the set
$$\begin{aligned} I_{\mathbb {1}+} = \{ j ~ ; ~ 1 - |\lambda _j| \le \delta \} \end{aligned}$$
for a given threshold \(\delta \), the background of the video frames can be determined via (compare Eq. 5)
$$\begin{aligned} x_k^{\text {background}} = \sum _{j \in I_{\mathbb {1}+}} b_j \lambda _j^k \vartheta _j, \end{aligned}$$
where the coefficients \(b = (b_j)_{j \in I_{\mathbb {1}+}}\) need to be computed separately. To do this, we represent the background components by matrices, i.e.,
$$\begin{aligned} \begin{bmatrix} x_0^{\text {background}}&\dots&x_m^{\text {background}} \end{bmatrix} = \Theta _{\mathbb {1}+} \, \text {diag}(b) \, V_{\mathbb {1}+}, \end{aligned}$$
where \(\Theta _{\mathbb {1}+} = \begin{bmatrix} \vartheta _j \end{bmatrix}_{j \in I_{\mathbb {1}+}}\) contains the respective modes columnwise and \(V_{\mathbb {1}+}\) is the Vandermonde matrix to the eigenvalues \((\lambda _j)_{j \in I_{\mathbb {1}+}}\). Then, a minimization problem in the spirit of RPCA is used:
$$\begin{aligned} \min _{b} \Vert \Theta _{\mathbb {1}+} \, \text {diag}(b) \, V_{\mathbb {1}+} - Z \Vert . \end{aligned}$$
The problem can be easily solved by the Moore–Penrose pseudoinverse as \(I_{\mathbb {1}+}\) is typically very small:
$$\begin{aligned} B = \Theta _{\mathbb {1}+}^+ \, Z \, V_{\mathbb {1}+}^+. \end{aligned}$$
Since this matrix is nearly diagonal, the off-diagonal elements are neglected, i.e., \(b_j = B_{jj}\). The resulting solution respects the complex conjugation of the eigenvalues \(\lambda _j\) such that the scaled modes \(b_j \vartheta _j\) occur in the same complex-conjugated pairs. Consequently, the sum is real-valued and, therefore, we can readily generate a background model. This technique is illustrated with an example in Fig. 4. It can be observed that our extended approach captures the movement of the escalator precisely by two additional complex-conjugated DMD components.
However, since the existence of additional appropriate eigenvalues (besides \(\lambda _\mathbb {1} = 1\)) is not guaranteed, this approach may select DMD components that do not belong to the background. Nevertheless, our experiments have shown that the procedure yields a robust algorithm if periodic patterns occur in the data (like the escalator in Fig. 4).
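The coefficient fit of the periodic extension can be sketched as follows. This is an illustrative NumPy snippet under the assumptions above; as described in the text, it solves the fit via pseudoinverses and keeps only the diagonal of the resulting matrix.

```python
import numpy as np

def fit_background_coefficients(Theta, lam, Z):
    """Theta: modes (n x p), lam: eigenvalues (p,), Z: data (n x (m+1)).

    Solve min_b || Theta diag(b) V - Z || via pseudoinverses and neglect the
    off-diagonal elements of the solution matrix.
    """
    V = np.vander(lam, Z.shape[1], increasing=True)   # Vandermonde: V[j, k] = lam_j^k
    B = np.linalg.pinv(Theta) @ Z @ np.linalg.pinv(V)
    return np.diag(B)                                 # keep only b_j = B_jj
```

If the data are exactly a superposition of the selected modes, the fit recovers the coefficients exactly, since both pseudoinverse factors act as identities in that case.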
Streaming data for video surveillance Background modeling is often used for video surveillance, where data are continually acquired, i.e., it is assumed that data \(x_0,\dots ,x_T\) are given (as well as a solution for them) and \(x_{T+1}\) is new. For the application of our approach to such streaming data, Algorithm 2 needs to be modified slightly. Whereas the projection (Line 4), the power method (Line 5), and the background model computation (Line 6) can be used in the same way, the least-squares method has to be altered.
In short, the problem is to find a solution \(c_\text {new} \in \mathbb {R}^{T+1}\) for the minimization problem
whereas the previously computed solution \(c_\text {old} \in \mathbb {R}^{T}\) solves the minimization problem \(\min _c \Vert \begin{bmatrix} x_0&\dots&x_{T-1} \end{bmatrix} c - x_{T} \Vert \). There are two different techniques to tackle this problem, and both enable a streaming application in real time.
The first one uses an incremental approach for solving least-squares problems, e.g., an incremental singular value decomposition, for which many implementations are available. The second technique uses the least-squares method from Algorithm 2 with a modified start value that stems from the previous computation. We propose choosing
because \(c_\text {new}\) approximates the last relevant snapshot:
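The second technique can be prototyped as follows. The sketch below uses CGLS (conjugate gradient on the normal equations) as a stand-in for lsqr, and padding the previous solution with a zero is an assumed start value, since Eq. 16 is not reproduced here.

```python
import numpy as np

def cgls(A, b, x0, iters=50):
    """Conjugate gradient on the normal equations A^T A x = A^T b,
    started from x0 (warm start)."""
    x = x0.astype(float).copy()
    r = b - A @ x
    s = A.T @ r
    gamma = s @ s
    if gamma < 1e-28:          # already optimal
        return x
    p = s.copy()
    for _ in range(iters):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new < 1e-28:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

def update_coefficients(X_cols, c_old, x_new):
    """X_cols holds the snapshots x_0 ... x_T as columns; c_old solved the
    previous problem with x_T as target. Start value: c_old padded with a
    zero (an assumption; the paper's Eq. 16 may define it differently)."""
    c0 = np.append(c_old, 0.0)
    return cgls(X_cols, x_new, c0)
```

Because the previous solution is already close to the new one, the warm-started iteration typically needs only a few steps per update.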
4.5 Implementation
Algorithm 2 was implemented in MATLAB R2019a, which already provides many optimized functions. For the least-squares method in Line 3, the internal function “lsqr” was used with a start value according to Eq. 16. The maximum number of iterations was not limited. For the power method in the non-extended case, the companion matrix \(C_\mathbb {1}\) was applied to the start value \(v_\text {start}\) (see Eq. 17) 100 times. Our tests have shown that this number of iterations produces accurate and robust background models. The procedure is done without normalization of the vector. Moreover, for the matrix-vector multiplication, the low-dimensional and sparsely organized structure of the companion matrix should be taken into account (i.e., only \(2m+1\) entries are nonzero). For the power method in the extended case, the internal function “eigs” was used to compute several eigenvalues.
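The sparse structure can be exploited as sketched below. This is a hedged NumPy illustration using a generic companion matrix (ones on the subdiagonal, the least-squares coefficients in the last column); the paper's \(C_\mathbb {1}\) is additionally low-dimensional with only \(2m+1\) nonzero entries.

```python
import numpy as np

def companion_matvec(c, v):
    """O(T) product of a companion matrix (ones on the subdiagonal,
    coefficient vector c in the last column) with a vector v."""
    w = np.empty_like(v)
    w[0] = c[0] * v[-1]
    w[1:] = v[:-1] + c[1:] * v[-1]   # shift plus scaled last component
    return w

def power_method(c, v_start, iters=100):
    """Unnormalized power iteration, as described for the non-extended case."""
    v = v_start.copy()
    for _ in range(iters):
        v = companion_matvec(c, v)
    return v
```

Avoiding the explicit \(T \times T\) matrix keeps each iteration linear in the number of frames instead of quadratic.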
5 Experimental Evaluation
For the evaluation of our approach, two well-known datasets for foreground-background separation were used: the CDnet 2014 dataset [38] and the I2R dataset [22] (used for the periodic background extension). We compare our approach effexDMD4B (both gray and RGB) with ten algorithms from the LRSLibrary [32], which were chosen from different method categories and speed classes: ncRPCA [18], R2PCP [17], FWT [25], GROUSE [2], PGRMC [8], RPCAGD [40], pROST [15], OSTD [33], ROSL [31], and DSNMF [36]. The selection was also oriented toward the results of the corresponding review by Bouwmans et al. [6]. Moreover, the traditional DMD approach (exDMD) [14] and the compressed variant (cDMD) [9] are evaluated as well.
The evaluation includes qualitative (see Fig. 5) and quantitative (see Table 1) experimental results as well as computation time measurements. For each algorithm, either the standard parameter setup (from the LRSLibrary) or the suggested settings from the associated publication were chosen. For the final foreground mask, a \(5 \times 5\) median filter was used, and the best result from four fixed thresholds was chosen to guarantee a fair comparison. The results were computed with MATLAB R2019a on a machine with a 2.90 GHz Intel Core i9-8950HK processor and 32 GB RAM.
5.1 Accuracy
In Fig. 5, results from the above-mentioned algorithms are compared qualitatively. An overview of the challenges of each scenario can be found in Table 2. Whereas some algorithms struggle with shadows (2nd row), rippling water (3rd row), or low frame rates (9th and 10th row), our approach achieves adequate and robust results. For the bad weather (5th and 6th row) and thermal (7th and 8th row) scenarios, we achieve visually appealing foreground masks, in particular for the 5th and 7th row. Only in the 1st and 4th row does our foreground mask (as well as the others) struggle with specific components of the frame, like glass, reflections, or spatial coherence.
These observations match the quantitative outcomes in Table 1, where the \(F_1\)-measure is computed for a selection of frames (for which we have access to the ground truth). We observe that our approach, both in the grayscale version and in the RGB version, consistently obtains good results and remains robust, which is also true for GROUSE, RPCAGD, DSNMF, and ROSL. In particular, the previous DMD approaches, cDMD and exDMD, often fail to detect foreground objects accurately (e.g., for canoe, skating, and tram), leading to scores less than 0.6. Moreover, many algorithms have problems computing background models for larger scenarios because computationally intensive matrix calculations have to be performed. To avoid these issues, either subsampling techniques need to be exploited (which we have not used) or partitioning techniques (see Table 2).
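For reference, the \(F_1\)-measure on binary foreground masks is the harmonic mean of precision and recall and can be computed directly from the true/false positives and false negatives:

```python
import numpy as np

def f1_measure(mask, gt):
    """F1 score of a binary foreground mask against a binary ground truth."""
    tp = np.sum(mask & gt)           # foreground pixels correctly detected
    fp = np.sum(mask & ~gt)          # background pixels marked as foreground
    fn = np.sum(~mask & gt)          # missed foreground pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```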
According to the quantitative analysis, our approach effexDMD4B can compete with other state-of-the-art algorithms in all scenarios. Whereas our RGB version achieves the best average scores along with RPCAGD, our grayscale approach has the fourth-best average score after the algorithm ROSL. Other accurate algorithms are DSNMF, GROUSE, and pROST, achieving average scores higher than 0.8. The two other DMD-based background algorithms have the worst average scores due to their robustness issues. We therefore emphasize that enforcing an eigenvalue \(\lambda _\mathbb {1}\) resolves the robustness problems of DMD and leads to more accurate results. To substantiate this statement, several precision-recall curves are presented in Fig. 6. Whereas cDMD and exDMD often fail to achieve good precision and recall values, we observe that our approach achieves accurate results for a variety of thresholds.
5.2 Computation Time
To demonstrate the efficiency of our algorithm, we first compare the overall computation time (except for the foreground mask procedure) of all algorithms on a given test dataset. The test dataset is based on the blizzard scenario, has a resolution of \(720 \times 480\), and consists of 500 frames. The comparison is illustrated in Fig. 7. We observe that the grayscale version of effexDMD4B is extremely efficient, calculating the background model in nearly 6 seconds. The RGB version needs approximately three times as much computation time. This highlights the fact that Algorithm 2 scales linearly in space, because effexDMD4B_RGB has to deal with three color channels. The traditional DMD algorithm exDMD requires nearly 35 times more computation time than our algorithm (gray), whereas cDMD only needs 1.5 times more computation time due to its high compression rate. Among the algorithms from the LRSLibrary, DSNMF is almost as fast as our approach. The other algorithms, like ncRPCA, FWT, and ROSL (which also belong to the highest speed classification), are at least an order of magnitude slower.
Besides the absolute computation time, we also want to demonstrate how the algorithm scales as the number of frames grows. Therefore, Fig. 8 shows the computation time of the six most efficient algorithms as a function of the number of frames. We observe that our approach effexDMD4B (grayscale and RGB) as well as DeepSemiNMF scale linearly, as the lines coincide with their dashed lines (linear least-squares fits). Previous DMD-based methods, like cDMD (or exDMD), suffer heavily from operations with quadratic and cubic computational complexity (due to the computation of all DMD components, which our approach avoids). This can be observed by the purple dashed line, which does not match the solid purple line. A similar behavior can be observed for ROSL, and even worse for FWT.
6 Discussion and Conclusion
In this paper, we have presented a DMD-based approach for the computation of background models. According to the qualitative and quantitative results, our approach can compete with other state-of-the-art methods. It produces accurate background models (just like RPCAGD and ROSL) and remains robust, which is a significant improvement compared to previous DMD-based algorithms.
However, the real strengths of effexDMD4B are its high efficiency and flexibility. It is extremely fast and, additionally, scales linearly in time and space, whereas most other algorithms exhibit quadratic or even cubic computational complexities. Moreover, it can be applied to a wide range of scenarios with different challenges without adjustment, while many background modeling algorithms need modified parameter settings to produce accurate results. Another sort of flexibility is the extension of our approach to RGB data, data with periodic background parts, and streaming data.
Since DMD offers great potential, we plan to extend our approach to video data that exhibit panning, zooming, jittering, and intermittent elements. This may include the use of different pre- and postprocessing tools, like spatiotemporal smoothing, to obtain a more accurate foreground mask. Another idea is to apply other iterative least-squares methods (besides lsqr) that respect the special temporal coherence structure of our minimization problem. This consideration is particularly important for the application to streaming data, where special emphasis is on the least-squares method (since an execution is needed in every update step). Moreover, since DMD can deal with compressed data (due to the spectral decomposition), a version of effexDMD4B with compression techniques may further boost efficiency.
References
Akilan, T., Wu, Q.J., Jiang, W., Safaei, A., Huo, J.: New trend in video foreground detection using deep learning. In: IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 889–892 (2018). https://doi.org/10.1109/MWSCAS.2018.8623825
Balzano, L., Nowak, R., Recht, B.: Online identification and tracking of subspaces from highly incomplete information. In: 48th Annual Allerton Conference on Communication, Control, and Computing, pp. 704–711 (2010). https://doi.org/10.1109/ALLERTON.2010.5706976
Bi, C., Yuan, Y., Zhang, J., Shi, Y., Xiang, Y., Wang, Y., Zhang, R.: Dynamic mode decomposition based video shot detection. IEEE Access 6, 21397–21407 (2018). https://doi.org/10.1109/ACCESS.2018.2825106
Bouwmans, T., Javed, S., Sultana, M., Jung, S.K.: Deep neural network concepts for background subtraction: A systematic review and comparative evaluation. Neural Netw. 117, 8–66 (2019). https://doi.org/10.1016/j.neunet.2019.04.024
Bouwmans, T., Porikli, F., Höferlin, B., Vacavant, A.: Background modeling and foreground detection for video surveillance. CRC Press, USA (2014)
Bouwmans, T., Sobral, A., Javed, S., Jung, S.K., Zahzah, E.H.: Decomposition into low-rank plus additive matrices for background/foreground separation. Computer Sci. Rev. 23, 1–71 (2017). https://doi.org/10.1016/j.cosrev.2016.11.001
Braham, M., Van Droogenbroeck, M.: Deep background subtraction with scene-specific convolutional neural networks. In: 2016 International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 1–4 (2016). https://doi.org/10.1109/IWSSIP.2016.7502717
Cherapanamjeri, Y., Gupta, K., Jain, P.: Nearly optimal robust matrix completion. In: International Conference on Machine Learning (ICML), pp. 797–805 (2017). https://doi.org/10.5555/3305381.3305464
Erichson, N.B., Brunton, S.L., Kutz, J.N.: Compressed dynamic mode decomposition for background modeling. J. Real-Time Image Process. (JRTIP) 16(5), 1479–1492 (2019). https://doi.org/10.1007/s11554-016-0655-2
Erichson, N.B., Donovan, C.: Randomized lowrank dynamic mode decomposition for motion detection. Computer Vision Image Understand. (CVIU) 146, 40–50 (2016). https://doi.org/10.1016/j.cviu.2016.02.005
GarciaGarcia, B., Bouwmans, T., Silva, A.J.R.: Background subtraction in real applications: challenges, current models and future directions. Computer Sci. Rev. (2020). https://doi.org/10.1016/j.cosrev.2019.100204
Giraldo, J.H., Bouwmans, T.: GraphBGS: Background subtraction via recovery of graph signals. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 6881–6888 (2021). https://doi.org/10.1109/ICPR48806.2021.9412999
Giraldo, J.H., Javed, S., Bouwmans, T.: Graph moving object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 56, 1–10 (2020). https://doi.org/10.1109/TPAMI.2020.3042093
Grosek, J., Kutz, J.N.: Dynamic mode decomposition for realtime background/foreground separation in video. http://arxiv.org/abs/1404.7592 (2014)
Hage, C., Kleinsteuber, M.: Robust PCA and subspace tracking from incomplete observations using \(\ell _0\)-surrogates. Comput. Stat. (CompStat) 29(3–4), 467–487 (2014). https://doi.org/10.1007/s00180-013-0435-4
Haq, I.U., Fujii, K., Kawahara, Y.: Dynamic mode decomposition via dictionary learning for foreground modeling in videos. Computer Vision Image Understand. (CVIU) (2020). https://doi.org/10.1016/j.cviu.2020.103022
Hintermüller, M., Wu, T.: Robust principal component pursuit via inexact alternating minimization on matrix manifolds. J. Math. Imag. Vision (JMIV) 51(3), 361–377 (2015). https://doi.org/10.1007/s10851-014-0527-y
Kang, Z., Peng, C., Cheng, Q.: Robust PCA via nonconvex rank approximation. In: IEEE International Conference on Data Mining (ICDM), pp. 211–220 (2015). https://doi.org/10.1109/ICDM.2015.15
Krake, T., Weiskopf, D., Eberhardt, B.: Dynamic mode decomposition: Theory and data reconstruction. http://arxiv.org/abs/1909.10466 (2019)
Kutz, J.N., Brunton, S.L., Brunton, B.W., Proctor, J.L.: Dynamic mode decomposition: data-driven modeling of complex systems. SIAM (2016). https://doi.org/10.1137/1.9781611974508
Kutz, J.N., Fu, X., Brunton, S.L., Erichson, N.B.: Multiresolution dynamic mode decomposition for foreground/background separation and object tracking. IEEE International Conference on Computer Vision Workshop (ICCVW) pp. 921–929 (2015). https://doi.org/10.1109/ICCVW.2015.122
Li, L., Huang, W., Gu, I.Y.H., Tian, Q.: Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. (TIP) 13(11), 1459–1472 (2004). https://doi.org/10.1109/TIP.2004.836169
Mandal, M., Vipparthi, S.K.: An empirical review of deep learning frameworks for change detection: Model design, experimental frameworks, challenges and research needs. IEEE Transactions on Intelligent Transportation Systems (2021). (to appear)
Minematsu, T., Shimada, A., Taniguchi, R.i.: Rethinking background and foreground in deep neural network-based background subtraction. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3229–3233 (2020). https://doi.org/10.1109/ICIP40778.2020.9191151
Mu, C., Zhang, Y., Wright, J., Goldfarb, D.: Scalable robust matrix recovery: Frank-Wolfe meets proximal methods. SIAM J. Sci. Comput. (SISC) 38(5), A3291–A3317 (2016). https://doi.org/10.1137/15M101628X
Narayanamurthy, P., Vaswani, N.: A fast and memory-efficient algorithm for robust PCA (MEROP). In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4684–4688 (2018). https://doi.org/10.1109/ICASSP.2018.8461540
Ngo, T.T., Nguyen, V., Pham, X.Q., Hossain, M.A., Huh, E.N.: Motion saliency detection for surveillance systems using streaming dynamic mode decomposition. Symmetry 12(9), 1397 (2020). https://doi.org/10.3390/sym12091397
Pendergrass, S.D., Brunton, S.L., Kutz, J.N., Erichson, N.B., Askham, T.M.: Dynamic mode decomposition for background modeling. In: IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 1862–1870 (2017). https://doi.org/10.1109/ICCVW.2017.220
Rodriguez, P., Wohlberg, B.: Incremental principal component pursuit for video background modeling. J. Math. Imag. Vision (JMIV) 55(1), 1–18 (2016). https://doi.org/10.1007/s10851-015-0610-z
Schmid, P., Sesterhenn, J.: Dynamic mode decomposition of numerical and experimental data. In 61st Annual Meeting of the APS Division of Fluid Dynamics. American Physical Society 53(15) (2008)
Shu, X., Porikli, F., Ahuja, N.: Robust orthonormal subspace learning: Efficient recovery of corrupted lowrank matrices. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3874–3881 (2014). https://doi.org/10.1109/CVPR.2014.495
Sobral, A., Bouwmans, T., Zahzah, E.H.: LRSLibrary: Low-rank and sparse tools for background modeling and subtraction in videos. Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing (2016)
Sobral, A., Javed, S., Jung, S.K., Bouwmans, T., Zahzah, E.H.: Online stochastic tensor decomposition for background subtraction in multispectral video sequences. In: Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 106–113 (2015). https://doi.org/10.1109/ICCVW.2015.125
Sobral, A., Vacavant, A.: A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Computer Vision Image Understand. (CVIU) 122, 4–21 (2014). https://doi.org/10.1016/j.cviu.2013.12.005
Tirunagari, S., Poh, N., Bober, M., Windridge, D.: Can DMD obtain a scene background in color? In: International Conference on Image, Vision and Computing (ICIVC), pp. 46–50 (2016). https://doi.org/10.1109/ICIVC.2016.7571272
Trigeorgis, G., Bousmalis, K., Zafeiriou, S., Schuller, B.W.: A deep matrix factorization method for learning attribute representations. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 39(3), 417–429 (2016). https://doi.org/10.1109/TPAMI.2016.2554555
Vaswani, N., Bouwmans, T., Javed, S., Narayanamurthy, P.: Robust subspace learning: robust PCA, robust subspace tracking, and robust subspace recovery. IEEE Signal Process. Mag. 35(4), 32–55 (2018). https://doi.org/10.1109/MSP.2018.2826566
Wang, Y., Jodoin, P.M., Porikli, F., Konrad, J., Benezeth, Y., Ishwar, P.: CDnet 2014: An expanded change detection benchmark dataset. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 393–400 (2014). https://doi.org/10.1109/CVPRW.2014.126
Xu, Y., Dong, J., Zhang, B., Xu, D.: Background modeling methods in video analysis: a review and comparative evaluation. CAAI Trans. Intell. Technol. 1(1), 43–60 (2016). https://doi.org/10.1016/j.trit.2016.03.005
Yi, X., Park, D., Chen, Y., Caramanis, C.: Fast algorithms for robust PCA via gradient descent. Adv. Neural Inf. Process Syst. (NIPS) 29, 4152–4160 (2016). https://doi.org/10.5555/3157382.3157562
Acknowledgements
This work is partly supported by the “Kooperatives Promotionskolleg Digital Media” at Hochschule der Medien and the University of Stuttgart. Besides that, it is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project-ID 251654672, TRR 161 (B01, B04).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Cite this article
Krake, T., Bruhn, A., Eberhardt, B. et al.: Efficient and Robust Background Modeling with Dynamic Mode Decomposition. J Math Imaging Vis 64, 364–378 (2022). https://doi.org/10.1007/s10851-022-01068-0
Keywords
 Dynamic mode decomposition
 Spectral decomposition
 Background modeling
 Foreground detection