Efficient and Robust Background Modeling with Dynamic Mode Decomposition

A large number of modern video background modeling algorithms deal with computationally costly minimization problems that often need parameter adjustments. While in most cases spatial and temporal constraints are added artificially to the minimization process, our approach is to exploit Dynamic Mode Decomposition (DMD), a spectral decomposition technique that naturally extracts spatio-temporal patterns from data. Applied to video data, DMD can compute background models. However, the original DMD algorithm for background modeling is neither efficient nor robust. In this paper, we present an equivalent reformulation with constraints leading to a more suitable decomposition into fore- and background. Due to the reformulation, which uses sparse and low-dimensional structures, an efficient and robust algorithm is derived that computes accurate background models. Moreover, we show how our approach can be extended to RGB data, data with periodic parts, and streaming data, enabling versatile use.


Introduction
Modern video background modeling algorithms are often inherently based on matrix decompositions [6]. Besides pre- and post-processing, the core of these algorithms is to explicitly or implicitly decompose the data into a low-rank matrix characterizing the background and a sparse matrix that shows the foreground. To perform such a decomposition, a computationally costly minimization problem with parameter adjustment is typically established that neglects spatial and temporal coherence at first (due to the use of simple matrix norms, seminorms, and quasinorms in the loss function). Since this formulation may fail to produce adequate background models for more complex video sequences, spatial and/or temporal constraints are taken into account to meet the requirements. Another more recent approach is to integrate neural networks [1] or related concepts into the process of background modeling, which leads to supervised methods in general. In this paper, however, we focus neither on a minimization-based background modeling approach as described above nor on learning-based concepts. Instead, we follow a different approach and exploit Dynamic Mode Decomposition (DMD), a versatile spectral decomposition technique that naturally extracts spatio-temporal patterns from data.
DMD was originally established in the fluid dynamics community [30] and has evolved to an effective analysis tool. Numerically, DMD produces the following triples: DMD modes, amplitudes, and eigenvalues. For time-dependent data, the modes represent the spatial contribution and the amplitudes specify their impact. Each associated eigenvalue characterizes the temporal development. According to this interpretation, Grosek and Kutz [14] were the first to model the background in video by DMD. Compared to state-of-the-art algorithms, the approach suffers in terms of accuracy, efficiency, and robustness. Some publications address the high computational costs of DMD by efficient subsampling strategies or related concepts, but these techniques struggle even more with problems regarding accuracy and robustness.
Instead of applying subsampling techniques, we present an equivalent reformulation of DMD with constraints that enforces the existence of background components. Due to the reformulation, which uses sparse and low-dimensional structures, extremely efficient algorithms (e.g., the power method) can be leveraged for the computation. The contribution of this paper can be summarized as follows:
- We elaborate on the drawbacks of previous methods and thus motivate our approach.
- An equivalent reformulation of DMD with constraints is presented, leading to a suitable decomposition into fore- and background.
- We derive an efficient and robust algorithm that computes accurate background models.
- The approach is extended to RGB data, data with periodic background parts, and streaming data.
The remainder of this paper is organized as follows: After clarifying related work in the subsequent section, Sect. 3 reviews the traditional DMD approach for foreground-background separation. Section 4 presents our approach including a motivation, new formulation, the associated algorithm, extensions, and implementation details. Section 5 provides a qualitative and quantitative evaluation in terms of accuracy, efficiency, and robustness. Finally, a conclusion is drawn in Sect. 6.

Related Work
There are numerous background modeling algorithms developed in the literature [5], and they can be classified as unsupervised [26,29,37], supervised [1,4,7,23,24], and semi-supervised methods [12,13]. Supervised methods allow for more accurate results than unsupervised methods, especially in complex scenarios, but they typically need large amounts of training data. Semi-supervised approaches achieve comparably accurate results with less labeled training data; however, they are prone to supervision aspects as well. Since our approach does not rely on training processes or related concepts, it is an unsupervised method. Such methods were published in large numbers and varieties over the past decade. Depending on the tasks, papers [11,34,39] classify them differently. To our knowledge, the most recent review of unsupervised background modeling algorithms was written by Bouwmans et al. [6] in combination with the LRSlibrary [32]. Using the approach of decomposing video data into a low-rank and a sparse matrix, they classify algorithms into the following categories: robust principal component analysis (RPCA), matrix completion (MC), non-negative matrix factorization (NMF), subspace tracking (ST), low-rank recovery (LRR), three-term decomposition (TTD), and tensor decompositions. Bouwmans et al. [6] put DMD into the RPCA category. However, DMD substantially differs from conventional RPCA approaches, e.g., the structure of the established minimization problems is completely different. We thus compare our approach only with DMD-based approaches in the following.
The first background modeling approach with DMD was proposed by Grosek and Kutz [14]. An extension to RGB video was presented by Tirunagari et al. [35]. However, alternative state-of-the-art background modeling algorithms achieve better results with substantially lower computational costs. Kutz et al. [21] later proposed using multi-resolution DMD for background modeling. It integrates multi-resolution analysis techniques into DMD, which enables the identification of multi-time scale features and, therefore, may lead to better results. Nevertheless, computational costs are even higher. The issue of efficiency was addressed by Erichson and Donovan [10] and slightly later by Erichson et al. [9] and Pendergrass et al. [28]. All these approaches use efficient sampling strategies, either based on a randomized singular value decomposition (randomized DMD) or on compressed data (compressed DMD). Whereas these subsampling techniques show promising efficiency, the background model suffers from the subsampling process in terms of accuracy and robustness. Instead of applying subsampling techniques, we use an equivalent reformulation of DMD with constraints that allows efficient and robust computation and leads to a more suitable decomposition into fore- and background.

Fig. 1 The traditional DMD approach for background modeling: After vectorizing the $m + 1$ video frames into $n$-dimensional data, the DMD modes, eigenvalues, and amplitudes are computed via a low-dimensional representation. Finally, the background model is extracted

Algorithm 1 Exact Dynamic Mode Decomposition
1: function exDMD($x_0, \ldots, x_m$)
2: Define $X = [x_0 \; \ldots \; x_{m-1}]$ and $Y = [x_1 \; \ldots \; x_m]$
3: Reduced SVD: $X = U \Sigma V^*$ with $\operatorname{rank}(X) = r$
4: Calculate $S_{\text{DMD}} = U^* Y V \Sigma^{-1}$
5: Calculate $\lambda_1, \ldots, \lambda_r$ and $v_1, \ldots, v_r$ of $S_{\text{DMD}}$
6: for $\lambda_i \neq 0$ do
7: $\vartheta_i = \frac{1}{\lambda_i} Y V \Sigma^{-1} v_i$
8: end for
9: $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{r_0})$ with $\lambda_1, \lambda_2, \ldots, \lambda_{r_0} \neq 0$
10: $\Theta = [\vartheta_1 \; \vartheta_2 \; \ldots \; \vartheta_{r_0}]$
11: Calculate $a = \operatorname{lsqr}(\Theta, x_0)$ with $a = (a_1, \ldots, a_{r_0})$
12: return $\lambda_j, \vartheta_j, a_j$
13: end function
Recently, Haq et al. [16] also presented an efficient DMD-based method for background modeling that uses dictionary learning as a subsampling strategy. As the dictionary needs to be trained, it substantially differs from our approach.
Finally, there are also other DMD-based techniques for videos, e.g., video shot detection [3] or saliency detection [27]. However, those approaches have other objectives and technical contributions than our paper.

Theoretical Foundation
This section provides the basics of foreground-background separation with DMD, including the general theory of DMD and its traditional application to video data (see Fig. 1).

Dynamic Mode Decomposition
Before formulating the DMD background modeling algorithm, we summarize the principles of DMD in general [19,20]. These aspects help understand foreground-background separation and, thus, our approach.
Let us consider data $x_0, \ldots, x_m \in \mathbb{R}^n$ sampled uniformly in time. Typically, the data dimension $n$ is considerably greater than $m$ (compare Fig. 1, left box). DMD basically computes eigenvalues and eigenvectors of a high-dimensional matrix $A_{\text{DMD}} \in \mathbb{R}^{n \times n}$ (characterizing the dynamics of the data) via a low-dimensional representation $S_{\text{DMD}}$ (see Fig. 1, middle box). The matrix $A_{\text{DMD}}$ is derived from the minimization problem

$$\min_{A \in \mathbb{R}^{n \times n}} \| A X - Y \|_F, \tag{1}$$

where $X = [x_0 \; \ldots \; x_{m-1}]$ and $Y = [x_1 \; \ldots \; x_m]$. In general, the minimization problem has infinitely many solutions (as we will prove later). DMD therefore aims for the minimum-norm solution, which is explicitly given by

$$A_{\text{DMD}} = Y X^+, \tag{2}$$

where $X^+$ is the Moore-Penrose pseudoinverse of $X$. To obtain the eigenvalues and eigenvectors of $A_{\text{DMD}}$, the low-dimensional representation $S_{\text{DMD}}$ is computed via the thin singular value decomposition (SVD) of $X$, i.e., $X = U \Sigma V^*$ with $r = \operatorname{rank}(X)$, and the following reformulation:

$$S_{\text{DMD}} = U^* A_{\text{DMD}} U = U^* Y V \Sigma^{-1}. \tag{3}$$

Eq. 3 is the core of DMD, forming the starting point of Algorithm 1 (Lines 1-4). In the next steps, the (DMD) eigenvalues $\lambda_j \in \mathbb{C}$ and eigenvectors $v_j \in \mathbb{C}^r$ of $S_{\text{DMD}}$ are computed (Line 5). Finally, all eigenvectors with nonzero eigenvalues are transformed into modes $\vartheta_j = \frac{1}{\lambda_j} Y V \Sigma^{-1} v_j \in \mathbb{C}^n$ (Lines 6-8), and the corresponding amplitudes $a_j \in \mathbb{C}$ are computed by $\min_a \| \Theta a - x_0 \|_2$ (Lines 9-11). Algorithm 1 results in the following decomposition of the data:

$$x_k \approx \sum_{j=1}^{r_0} \lambda_j^k \, a_j \, \vartheta_j, \qquad k = 0, \ldots, m. \tag{4}$$

This approximation property is crucial for the interpretation of DMD triples $(\vartheta_j, \lambda_j, a_j)$, in particular, concerning the foreground-background separation approach.
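As a rough illustration of the steps of Algorithm 1, the following NumPy sketch computes DMD eigenvalues, modes, and amplitudes for a snapshot matrix. It is not the implementation used in this paper (which relies on MATLAB). The function name `exact_dmd`, the rank tolerance `tol`, and the use of `numpy.linalg.lstsq` in place of an iterative lsqr solver are assumptions made for the example.

```python
import numpy as np

def exact_dmd(snapshots, tol=1e-10):
    """Exact DMD of an (n, m+1) snapshot matrix.

    Returns (eigenvalues, modes, amplitudes). Names and tolerance are
    illustrative assumptions, not taken from the paper.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))            # numerical rank of X
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    B = Y @ Vh.conj().T / s                    # Y V Sigma^{-1}
    S = U.conj().T @ B                         # low-dimensional S_DMD
    lam, V = np.linalg.eig(S)
    keep = np.abs(lam) > tol                   # modes only for nonzero eigenvalues
    lam, V = lam[keep], V[:, keep]
    modes = (B @ V) / lam                      # theta_j = (1/lambda_j) Y V Sigma^{-1} v_j
    # Amplitudes: least-squares fit of the first snapshot (direct solver, not lsqr)
    x0 = snapshots[:, 0].astype(complex)
    amps = np.linalg.lstsq(modes, x0, rcond=None)[0]
    return lam, modes, amps
```

With the returned triples, a snapshot can be approximated as `(modes * lam**k) @ amps`, which mirrors the decomposition discussed above.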

Background Modeling with DMD
To compute a background model with DMD, the DMD components have to be interpreted in terms of video data. An overview of the traditional DMD background modeling is illustrated in Fig. 1. After applying DMD to the vectorized video frames, DMD produces modes $\vartheta_j \in \mathbb{C}^n$, eigenvalues $\lambda_j \in \mathbb{C}$, and amplitudes $a_j \in \mathbb{C}$ with a computational complexity of $O(nm^2)$ in general as $n \gg m$. The entries of a DMD mode $\vartheta_j$ correspond to grayscale pixel values at specific pixel locations. According to Eq. 4, the associated eigenvalue $\lambda_j$ characterizes the temporal development. To extract a background model, Grosek and Kutz [14] propose selecting DMD triples that vary extremely slowly over time or not at all. These triples are characterized by eigenvalues that satisfy $\lambda_j \approx 1$ or, more precisely, by the index set

$$I_1 = \{\, j \in \{1, \ldots, r_0\} : |\lambda_j - 1| < \delta \,\}$$

for a given threshold $\delta$ (see Fig. 2, top). Therefore, the video frames can be separated by DMD into fore- and background:

$$x_k \approx \underbrace{\sum_{j \in I_1} \lambda_j^k \, a_j \, \vartheta_j}_{\text{background}} + \underbrace{\sum_{j \notin I_1} \lambda_j^k \, a_j \, \vartheta_j}_{\text{foreground}}. \tag{5}$$

Fig. 2 Comparison of the traditional approach with ours: The traditional approach uses eigenvalues of the matrix $S_{\text{DMD}}$ for the background that are only approximately 1. Therefore, fore- and background components can be mixed up, as the extracted background at a certain time shows. In contrast, our approach uses a projection that enforces the existence of an eigenvalue $\lambda_1 = 1$, which leads to a more suitable decomposition. To further improve the background model, the extended method should be consulted, where multiple eigenvalues are used

Since the eigenvalues $\lambda_j$, amplitudes $a_j$, and modes $\vartheta_j$ are complex in general, the sums are not real-valued. Whereas Grosek and Kutz [14] use the absolute value as well as a rearrangement of negative values, Erichson et al. [9] recommend the use of the real part. In this paper, we follow the technique of Grosek and Kutz.
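The threshold-based selection can be illustrated with a small sketch. It assumes that "close to 1" is measured by the distance $|\lambda_j - 1| < \delta$; the function name `background_indices` and the example spectrum are hypothetical and only serve to show how sensitive the selection is to the choice of $\delta$.

```python
def background_indices(eigenvalues, delta):
    """Indices j with |lambda_j - 1| < delta (candidate background triples)."""
    return [j for j, lam in enumerate(eigenvalues) if abs(lam - 1.0) < delta]

# Hypothetical spectrum: a complex-conjugate pair near 1 and two decaying
# foreground eigenvalues inside the unit circle.
eigs = [complex(0.998, 0.02), complex(0.998, -0.02), 0.7 + 0.1j, 0.3 + 0.0j]

print(background_indices(eigs, delta=0.05))  # the conjugate pair is selected
print(background_indices(eigs, delta=0.01))  # nothing is selected
```

In this toy spectrum, a slightly too strict threshold leaves the background empty, while a looser one selects a pair that is only approximately static.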

Proposed Approach
In this section, we present our DMD background modeling approach. After a short motivation, the theory and the algorithm (based on grayscaled data) are explained in detail. We then extend our approach to RGB data, data with periodic parts, and streaming data, which enables the usage in video surveillance. Finally, the implementation is described.

Motivation
Leading background modeling algorithms are characterized by accuracy, robustness, and efficiency. While DMD is capable of producing accurate background models in general, the traditional DMD background modeling algorithm is neither robust nor efficient. This fact can be seen in Fig. 1, where the traditional approach fails to produce an accurate background model. We additionally highlight the computational costs of each step of DMD. Since both an SVD and an eigenvalue decomposition are needed, DMD has a computational complexity of $O(nm^2)$ in general as $n \gg m$.
Even though the improvements via subsampling strategies boost efficiency at the cost of accuracy, they do not solve the following underlying problem: In Fig. 2 (top), the background model of the traditional approach is shown at an earlier frame. It can be observed that a pair of complex-conjugated eigenvalues close to the value 1 is detected. According to the traditional approach, the two corresponding DMD triples $(\vartheta_j, a_j, \lambda_j)$ should be selected. If they are not close enough to the value 1, no components are selected due to thresholding and no background can be computed, i.e., the algorithm is not robust. Otherwise, this small inaccuracy causes numerical instability and a mixing of fore- and background parts as shown in Fig. 2. These issues may be even worse for subsampled data.
Consequently, DMD has to be modified such that components clearly belong either to the fore- or to the background. Since DMD only uses one or two triples (often occurring in complex-conjugated pairs) for the background model [20], we propose enforcing the existence of an eigenvalue $\lambda_1 = 1$ by a DMD-consistent projection. Moreover, we recommend using the efficient power method to compute the relevant DMD components for an adequate background model. This is possible because our experiments have shown that all other eigenvalues lie inside the unit circle, since foreground objects appear and disappear over time. This leads to a more suitable decomposition into fore- and background, as shown in Fig. 2, that is produced in an efficient manner.

Theory
As mentioned before, we want to enforce the existence of an eigenvalue $\lambda_1 = 1$ to obtain a suitable decomposition into fore- and background. To convey the general idea of our approach, we briefly summarize the steps in the following: (§1) If the data $x_0, \ldots, x_m \in \mathbb{R}^n$ are linearly independent (which is the case for real video footage in general), we first note that the high-dimensional minimization problem 1 has infinitely many solutions that solve the equation exactly, i.e., $AX = Y$.

Proposition 1
For linearly independent data $x_0, \ldots, x_m \in \mathbb{R}^n$ with $Z = [x_0 \; \ldots \; x_m] \in \mathbb{R}^{n \times (m+1)}$, the minimization problem 1 has infinitely many solutions that solve the equation exactly, i.e., $AX = Y$. In particular, every combination of $m + 1$ distinct complex numbers $z_0, \ldots, z_m \in \mathbb{C}$ defines a solution $A_z$ in the form of its eigenvalues.
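Proof Let us consider distinct complex numbers $z_0, \ldots, z_m \in \mathbb{C}$ with $\Lambda_z = \operatorname{diag}(z_0, \ldots, z_m)$ as well as the respective Vandermonde matrix

$$V(z) = \begin{pmatrix} 1 & z_0 & \cdots & z_0^m \\ \vdots & \vdots & & \vdots \\ 1 & z_m & \cdots & z_m^m \end{pmatrix}.$$

Defining $\Theta_z = Z V(z)^{-1}$ (the columns of the matrix are basically scaled modes), the matrix $A_z = \Theta_z \Lambda_z \Theta_z^+$ solves the minimization problem 1 exactly:

$$A_z X = \Theta_z \Lambda_z \Theta_z^+ X = Z V(z)^{-1} \Lambda_z V(z) Z^+ X = Y,$$

where we used that $\Theta_z^+ = V(z) Z^+$ (as the columns of the matrix $Z$ are linearly independent). The last equality holds because $Z^+ x_k$ is the $k$-th unit vector and $\Lambda_z$ maps the $k$-th column of $V(z)$ to its $(k+1)$-th column. Since $z_0, \ldots, z_m$ are chosen arbitrarily, the assertion is proven.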
Proposition 1 states that every combination of distinct temporal components $\lambda_0, \ldots, \lambda_m$ (which are eigenvalues of a respective matrix $A$) is actually a possible outcome. However, since DMD uses the minimum-norm solution $A_{\text{DMD}} = Y X^+$, the DMD eigenvalues fit the data better than those of an arbitrary solution. Nonetheless, the existence of adequate eigenvalues for background modeling is not guaranteed.
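The statement of Proposition 1 can be checked numerically on small random data. The sketch below builds the matrix $A_z$ from the proof for a prescribed set of distinct numbers and is only meant as a sanity check; the function name `operator_with_spectrum` and the test values are assumptions for illustration.

```python
import numpy as np

def operator_with_spectrum(Z, z):
    """Build A_z = Theta_z diag(z) Theta_z^+ with Theta_z = Z V(z)^{-1}.

    Z: (n, m+1) matrix with linearly independent columns.
    z: m+1 distinct (possibly complex) numbers.
    """
    V = np.vander(z, increasing=True)           # V[i, j] = z_i ** j
    Theta = Z @ np.linalg.inv(V)
    return Theta @ np.diag(z) @ np.linalg.pinv(Theta)

# Hypothetical example: 4 snapshots in R^7 and an arbitrary choice of z.
rng = np.random.default_rng(1)
Z = rng.standard_normal((7, 4))
A = operator_with_spectrum(Z, np.array([0.9, 0.1, 1j, -1j]))
print(np.allclose(A @ Z[:, :-1], Z[:, 1:]))     # A maps each snapshot to the next
```

Any other choice of distinct numbers passes the same check, which is why exact solvability alone does not single out eigenvalues that are suitable for background modeling.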
(§2) Hence, the naive approach is to find a matrix $A \in \mathbb{R}^{n \times n}$ that solves the minimization problem 1 and has an eigenvalue $\lambda_1 = 1$ (minimization problem 6). However, this formulation cannot be solved in practice due to the high dimensionality of the problem and the eigenvalue equation in the constraints.
To formulate an adequate constrained minimization problem, we use the low-dimensional representation $C$ of a matrix $A$ with respect to the data matrix $Z = [x_0 \; \ldots \; x_m] \in \mathbb{R}^{n \times (m+1)}$:

$$C = Z^+ A Z.$$

This matrix is a companion matrix and completely characterizes the data-relevant spectral properties of $A$. In addition, the retransformed matrix $Z C Z^+ \in \mathbb{R}^{n \times n}$ inherits many algebraic properties, in particular, the minimization property, as the following theorem shows.
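Theorem 1 For linearly independent data $x_0, \ldots, x_m \in \mathbb{R}^n$ with $Z = [x_0 \; \ldots \; x_m] \in \mathbb{R}^{n \times (m+1)}$, the low-dimensional representation $C = Z^+ A Z \in \mathbb{R}^{(m+1) \times (m+1)}$ to a matrix $A \in \mathbb{R}^{n \times n}$ satisfying $AX = Y$ has the following properties: (a) The low-dimensional representation $C$ is a companion matrix and the related vector $\tilde{c}$ satisfies

$$\tilde{c} = Z^+ A x_m.$$

Proof (a) The following calculation shows that $C$ is a companion matrix:

$$C e_k = Z^+ A Z e_k = Z^+ A x_k = Z^+ x_{k+1} = e_{k+1}, \qquad k = 0, \ldots, m-1,$$

where $e_k$ denotes the $k$-th unit vector of $\mathbb{R}^{m+1}$ (counted from 0). The last column is given by $C e_m = Z^+ A x_m = \tilde{c}$.

(b) Let $\lambda$ be an eigenvalue of the matrix $C$ with eigenvector $\tilde{v}$. Then, the nonzero vector $v = Z \tilde{v}$ (as $Z$ has linearly independent columns) satisfies the following equation

$$A v = A Z \tilde{v} = Z Z^+ A Z \tilde{v} = Z C \tilde{v} = \lambda Z \tilde{v} = \lambda v,$$

where we have used that $Z Z^+$ is the projection onto the image of $Z$.

(c) Let $\lambda$ be an eigenvalue of the matrix $A$ with eigenvector $v$. Then, the vector $\tilde{v} = Z^+ v$ satisfies the following equation

$$C \tilde{v} = Z^+ A Z Z^+ v = Z^+ A v = \lambda Z^+ v = \lambda \tilde{v}.$$

To show that $\tilde{v}$ is an eigenvector, we assume that $\tilde{v} = 0$. This implies $0 = A Z \tilde{v}$, which results in $0 = A Z Z^+ v = A v = \lambda v$.

(d) Using the companion matrix structure and the linearly independent columns of the matrix $Z$, the following calculation shows that the retransformed matrix $Z C Z^+$ solves the first constraint:

$$Z C Z^+ X = Z C \, [e_0 \; \ldots \; e_{m-1}] = Z \, [e_1 \; \ldots \; e_m] = Y.$$

With the help of Theorem 1, every solution $A$ to the minimization problem 1 can be represented by an appropriate companion matrix $C$ that naturally satisfies the first constraint in minimization problem 6. To reformulate the entire constrained minimization problem 6, the low-dimensional representation of $A_{\text{DMD}} = Y X^+$ has to be calculated as well, which will be done in the following theorem.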
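Theorem 2 For linearly independent data $x_0, \ldots, x_m \in \mathbb{R}^n$ with $Z = [x_0 \; \ldots \; x_m] \in \mathbb{R}^{n \times (m+1)}$, the low-dimensional representation $C_{\text{DMD}} = Z^+ A_{\text{DMD}} Z \in \mathbb{R}^{(m+1) \times (m+1)}$ to the matrix $A_{\text{DMD}} = Y X^+ \in \mathbb{R}^{n \times n}$ has the following properties: (a) The low-dimensional representation $C_{\text{DMD}}$ is a companion matrix with $\operatorname{rank}(C_{\text{DMD}}) = m$ and the related vector $\tilde{c}_{\text{DMD}}$ satisfies

$$\tilde{c}_{\text{DMD}} = \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix}.$$

(b) An eigenvalue $\lambda$ of the matrix $C_{\text{DMD}}$ with eigenvector $\tilde{v}$ is also an eigenvalue of $A_{\text{DMD}}$ with corresponding eigenvector $v = Z \tilde{v}$.

Proof (a) According to Theorem 1, the matrix $C_{\text{DMD}}$ is a companion matrix where the corresponding vector $\tilde{c}_{\text{DMD}}$ is given by

$$\tilde{c}_{\text{DMD}} = Z^+ A_{\text{DMD}} x_m = Z^+ Y X^+ x_m = \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix},$$

because $Z^+ Y = [e_1 \; \ldots \; e_m]$. Since $c_0 = 0$, the matrix $C_{\text{DMD}}$ has $\operatorname{rank}(C_{\text{DMD}}) = m$.

Since $A_{\text{DMD}}$ has rank $m$ and satisfies $A_{\text{DMD}} X = Y$ by the assumption [19], the image of $A_{\text{DMD}}$ is given by $\operatorname{span}(x_1, \ldots, x_m)$. Therefore, the requirements of (b) and (c) in Theorem 1 are met for the matrix $A_{\text{DMD}}$.

(d) To show the equality $Z C_{\text{DMD}} Z^+ = A_{\text{DMD}}$, we first notice that both matrices solve the equation $AX = Y$ (see Theorem 1). Therefore, the two matrices are equal for vectors $x \in \operatorname{span}(x_0, \ldots, x_{m-1})$. In addition, this assertion is also true for the $(m+1)$-th snapshot $x_m$ because

$$Z C_{\text{DMD}} Z^+ x_m = Z C_{\text{DMD}} e_m = Z \tilde{c}_{\text{DMD}} = Z Z^+ A_{\text{DMD}} x_m = A_{\text{DMD}} x_m.$$

To conclude the proof, we show the equality for an arbitrary vector $y$ from the orthogonal complement $\operatorname{span}(x_0, \ldots, x_m)^{\perp}$:

$$Z C_{\text{DMD}} Z^+ y = 0 = Y X^+ y = A_{\text{DMD}} y.$$

Since a subspace and its orthogonal complement span the entire space, the two matrices are equal.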
Theorem 1 and Theorem 2 show that a high-dimensional matrix $A$ that solves the minimization problem 1 exactly, such as $A_{\text{DMD}}$, can be adequately represented by a low-dimensional representation capturing the data-relevant properties. Moreover, the low-dimensional representation has the structure of a companion matrix that naturally satisfies the first constraint in the minimization problem 6. By these characteristics, the high-dimensional constrained minimization problem 6 can be reformulated into a low-dimensional constrained minimization problem (minimization problem 11) that uses the companion matrices from Theorem 1 and Theorem 2.

(§3) The constraint in minimization problem 11 seems to have an impractical form as well. However, it can be handled in a feasible way: Since the characteristic polynomial of a companion matrix $C$ with vector $\tilde{c}$ is given by

$$p_{\tilde{c}}(\lambda) = \lambda^{m+1} - c_m \lambda^m - \cdots - c_1 \lambda - c_0,$$

the constraint is equivalent to $p_{\tilde{c}}(1) = 0$ or, more precisely, $c_0 + c_1 + \cdots + c_m = 1$.
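For a concrete picture of this equivalence, the following worked example treats the case $m = 2$. It assumes the companion matrix carries ones on the subdiagonal and the vector $\tilde{c}$ in its last column; the numerical values are hypothetical.

```latex
% Worked example for m = 2 (the values of c_0, c_1, c_2 are hypothetical)
C = \begin{pmatrix} 0 & 0 & c_0 \\ 1 & 0 & c_1 \\ 0 & 1 & c_2 \end{pmatrix},
\qquad
\det(\lambda I - C) = \lambda^3 - c_2 \lambda^2 - c_1 \lambda - c_0 .

% Evaluating at lambda = 1:
\det(I - C) = 1 - (c_0 + c_1 + c_2),

% so lambda = 1 is an eigenvalue exactly when c_0 + c_1 + c_2 = 1,
% e.g. (c_0, c_1, c_2) = (0.2, 0.3, 0.5) gives 1 - 1 = 0.
```

The eigenvalue condition thus reduces to a single linear equation in the entries of $\tilde{c}$, which is what makes the constraint tractable.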
Using the notation $e_{m+1} = (1, 1, \ldots, 1)^T \in \mathbb{R}^{m+1}$, the low-dimensional constrained minimization problem 11 is equivalent to

$$\min_{\tilde{c} \in \mathbb{R}^{m+1}} \| \tilde{c} - \tilde{c}_{\text{DMD}} \|_2 \quad \text{s.t.} \quad e_{m+1}^T \tilde{c} = 1. \tag{12}$$

The solution space $M_1 = \{ y \in \mathbb{R}^{m+1} : e_{m+1}^T y = 1 \}$ has several useful properties: It is an affine subspace and therefore convex, which implies existence and uniqueness of a solution for the constrained minimization problem 12, as the following theorem shows.

Theorem 3 For the constrained minimization problem 12, there exists a unique solution given by the orthogonal projection $P_1$ of $\tilde{c}_{\text{DMD}}$ onto $M_1$. The projection can be computed via

$$P_1(y) = y - \frac{e_{m+1}^T y - 1}{m+1} \, e_{m+1}.$$

Proof Since the constraint is linear (more precisely, affine-linear), the solution space $M_1$ is an affine subspace. Hence, $M_1$ is a non-empty, closed, and convex set. This implies existence and uniqueness of a solution given by the orthogonal projection onto $M_1$. The explicit formula can be easily derived from the fact that $M_1$ is a hyperplane that is orthogonal to the vector $e_{m+1}$ and passes through the point $r_0 = \frac{1}{m+1} \cdot e_{m+1} \in M_1$:

$$P_1(y) = y - \frac{e_{m+1}^T (y - r_0)}{e_{m+1}^T e_{m+1}} \, e_{m+1} = y - \frac{e_{m+1}^T y - 1}{m+1} \, e_{m+1}.$$

By Theorem 3, denoting $P_1$ as the (orthogonal) projection onto $M_1$, the solution to the constrained minimization problem 12 is given by

$$\tilde{c}_1 = P_1(\tilde{c}_{\text{DMD}}). \tag{14}$$

Algorithm 2 Efficient Exact Dynamic Mode Decomposition for Background Modeling (effexDMD4B)
1: function effexDMD4B($x_0, \ldots, x_m$, $c_{\text{start}}$, $v_{\text{start}}$)
2: …

The projection solves the robustness issues of DMD by enforcing the existence of an eigenvalue $\lambda_1 = 1$ for the companion matrix $C_1$ to the vector $\tilde{c}_1$. The spectral properties of $C_1$, i.e., the eigenvector $v_1$ to the eigenvalue $\lambda_1$, can therefore be used to compute an accurate background model as illustrated in Figs. 2 and 3. The details will be explained in combination with the algorithm in the next subsection.
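The orthogonal projection onto the hyperplane of vectors whose entries sum to one amounts to subtracting the same constant from every entry. The following minimal sketch shows this; the function name `project_onto_unit_sum` and the example vector are hypothetical.

```python
def project_onto_unit_sum(c):
    """Orthogonal projection of c onto the hyperplane {y : sum(y) = 1}."""
    shift = (sum(c) - 1.0) / len(c)   # signed distance along (1, ..., 1), per entry
    return [ci - shift for ci in c]

# Hypothetical last column of a companion matrix whose entries sum to 0.9.
c_dmd = [0.0, 0.2, 0.3, 0.4]
c_1 = project_onto_unit_sum(c_dmd)
print(sum(c_1))   # numerically 1, so lambda = 1 is an eigenvalue of the companion matrix
```

Because the correction is spread uniformly over all entries, the projected vector stays as close as possible to the original one in the Euclidean sense.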

Algorithm
In the following, we describe our algorithm "efficient exact dynamic mode decomposition for background modeling" (effexDMD4B), shown in Algorithm 2. Although our algorithm is based on the theorems of the previous subsection, it can be applied to any video data, even if the assumptions of the theorems do not hold.
The first step is to obtain the companion matrix $C_{\text{DMD}}$. By Eq. 10 (and the related Eq. 8), we have to compute $X^+ x_m$, which is the minimum-norm solution to the minimization problem

$$\min_{c} \| X c - x_m \|_2. \tag{15}$$

This step is performed by an iterative least-squares method (lsqr), which has a computational complexity of $O(nm)$. To keep the constant small, a special start value $c_{\text{start}} \in \mathbb{R}^{m+1}$ is used, adapted to the minimization problem 15 (Lines 2-3). Due to the temporal coherence of video frames, the vector $c_{\text{start}}$ should particularly emphasize the last data points. Moreover, it should be a partition of unity. We propose using c start = 1 2 m+1 , . . . , which has shown fast convergence rates, even if the number of frames grows (as we will see later). In Line 4 of Algorithm 2, the projection according to Eq. 14 is applied. This step enforces the existence of an eigenvalue $\lambda_1 = 1$.
Finally, the eigenvector $v_1$ to the eigenvalue $\lambda_1$ is computed by the power method (without normalization) and transformed to a mode (Lines 5-6) that represents the background model statically. The power method uses the repeated application of the companion matrix $C_1$ (which is constructed via Eq. 8 and 9) to a start value $v_{\text{start}}$. This step allows extremely efficient computation at a complexity of $O(m)$ because the companion matrix has a low-dimensional representation as well as a sparse structure with only $2m + 1$ non-zero entries.

Fig. 3 Our DMD approach for (grayscale) background modeling: After vectorizing the $m + 1$ video frames into $n$-dimensional data, the matrix $C_{\text{DMD}}$ is computed and subsequently projected. Then, the power method is applied to the low-dimensional and sparse companion matrix $C_1$ for the computation of background-relevant DMD components that yield the background model
Moreover, we propose using a special start value based on the temporal arithmetic mean $\bar{x} = \frac{1}{m+1} \sum_{k=0}^{m} x_k$ because the temporal mean represents a good initial guess for the background model. Since the power method operates in the low-dimensional space, the start vector is given by

$$v_{\text{start}} = Z^+ \bar{x} = \frac{1}{m+1} \, e_{m+1}. \tag{17}$$

Although there is generally no guarantee that the power method converges for this start vector, our experiments have shown a stable convergence behavior. This is due to the fact that eigenvalues that do not belong to the background are typically located inside the unit circle (see Fig. 2).
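The steps described in this subsection can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the implementation behind Algorithm 2: it replaces the iterative lsqr solver with a start value by the direct solver `numpy.linalg.lstsq`, it uses the uniform vector as the low-dimensional image of the temporal mean, and the function name `effex_dmd_background` is hypothetical.

```python
import numpy as np

def effex_dmd_background(Z, iterations=100):
    """Sketch of a static background estimate from an (n, m+1) frame matrix Z.

    Returns (background, weights), where background = Z @ weights.
    """
    X, x_m = Z[:, :-1], Z[:, -1]
    size = Z.shape[1]                                   # m + 1
    # Last column of the companion matrix: (0, X^+ x_m). A direct
    # least-squares solver stands in for lsqr here (assumption).
    c = np.concatenate(([0.0], np.linalg.lstsq(X, x_m, rcond=None)[0]))
    # Projection onto {c : sum(c) = 1} enforces the eigenvalue 1.
    c -= (c.sum() - 1.0) / size
    # Power method without normalization; the start vector is the
    # low-dimensional representation of the temporal mean.
    v = np.full(size, 1.0 / size)
    for _ in range(iterations):
        last = v[-1]
        # Companion matrix times v: shift entries down, add last * c.
        v = np.concatenate(([0.0], v[:-1])) + last * c
    return Z @ v, v
```

Each application of the companion matrix touches only the shifted entries and the last column, which reflects the linear cost per iteration mentioned above. Since the entries of the projected vector sum to one, the entries of `v` keep summing to one, so the result is an affine combination of the input frames.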

Extensions
In the previous subsection, the basic algorithm was presented that is capable of computing grayscale background models that are sufficient for most applications. However, our approach is versatile and can be extended to RGB data, video data with periodic parts, and streaming data. Moreover, these extensions can be combined as well.

RGB Data Most background modeling algorithms only work with grayscale video frames because many formulations cannot handle multiple color channels at the same time. Moreover, the extensive computational overhead produced by additional data usually does not justify the slight improvements regarding accuracy. In contrast, our approach can be easily extended while keeping the computational overhead small. For this, we first follow Tirunagari et al. [35] to perform DMD on RGB-based video frames. Analogously to the left box of Fig. 3, the three channels of an RGB-based video frame are vectorized and stacked on top of each other. Hence, the data for DMD has dimensionality $3n \times (m+1)$. The remaining steps of our approach can be performed in the same way as shown in Fig. 3 or Algorithm 2 (note that the background model has the same organized structure as the RGB input data).

Data with periodic background parts Since DMD is based on a superposition principle (see Eq. 4 and 5), periodic parts in the background can be represented by additional DMD triples whose eigenvalues $\lambda_j$ capture the frequency. As Fig. 4 shows, periodic processes are characterized by eigenvalues that lie on the unit circle. Hence, our approach can be easily extended in the following way: Instead of computing only the static component, we use the power method for the ones that lie on the unit circle as well (see Fig. 4).
More precisely, since the power method computes the greatest (in absolute value) eigenvalue, the shifted power method is used as long as values on the unit circle are detected (multiple applications with a fixed maximum still result in a computational complexity of $O(m)$). Using the set

$$I_{1+} = \{\, j : \big|\, |\lambda_j| - 1 \,\big| < \delta \,\}$$

for a given threshold $\delta$, the background of the video frames can be determined via (compare Eq. 5)

$$x_k^{\text{background}} \approx \sum_{j \in I_{1+}} \lambda_j^k \, b_j \, \vartheta_j,$$

where the coefficients $b = (b_j)_{j \in I_{1+}}$ need to be computed separately. To do this, we represent the background components by matrices, i.e.,

$$\Theta_{1+} \operatorname{diag}(b) \, V_{1+},$$

where $\Theta_{1+} = [\vartheta_j]_{j \in I_{1+}}$ contains the respective modes column-wise and $V_{1+}$ is the Vandermonde matrix to the eigenvalues $(\lambda_j)_{j \in I_{1+}}$. Then, a minimization problem in the spirit of RPCA is used:

$$\min_{b} \| Z - \Theta_{1+} \operatorname{diag}(b) \, V_{1+} \|_F.$$

The problem can be easily solved by the Moore-Penrose pseudoinverse as $I_{1+}$ is typically very small:

$$\operatorname{diag}(b) \approx \Theta_{1+}^+ \, Z \, V_{1+}^+.$$

Since the matrix is nearly diagonal, the off-diagonal elements have to be neglected. The resulting solution respects the complex conjugation of eigenvalues $\lambda_j$ such that the scaled modes $b_j \vartheta_j$ occur in the same complex conjugate pairs. Consequently, the sum is real-valued and, therefore, we can readily generate a background model. This technique is illustrated with an example in Fig. 4. It can be observed that our extended approach captures the movement of the escalator precisely by two additional complex-conjugated DMD components. However, since the existence of additional appropriate eigenvalues (besides $\lambda_1 = 1$) is not guaranteed, this approach may select DMD components that do not belong to the background. Nevertheless, our experiments have shown that the procedure yields a robust algorithm if periodic patterns occur in the data (like the escalator in Fig. 4).

Fig. 4 Our extended DMD approach for background modeling (panels: Video Frames, Non-extended Approach, Proposed Extended DMD Approach). Using the power method multiple times, additional periodic parts in the background can be extracted. These periodic background components are characterized by DMD triples with eigenvalues on the unit circle (highlighted by orange circles)

Streaming data for video surveillance Background modeling is often used for video surveillance, where data are continually acquired, i.e., it is assumed that data $x_0, \ldots, x_T$ are given (as well as a solution for them) and $x_{T+1}$ is new. For the application of our approach to such streaming data, Algorithm 2 needs to be modified slightly.
Whereas the projection (Line 4), the power method (Line 5), and the background model computation (Line 6) can be used in the same way, the least-squares method has to be altered.
In short, the problem is to find a solution $c_{\text{new}} \in \mathbb{R}^{T+1}$ for the minimization problem

$$\min_{c} \| [x_0 \; \ldots \; x_T] \, c - x_{T+1} \|_2,$$

whereas the previously computed solution $c_{\text{old}} \in \mathbb{R}^{T}$ solves the minimization problem $\min_{c} \| [x_0 \; \ldots \; x_{T-1}] \, c - x_T \|_2$. There are two different techniques to tackle this problem, and both enable a streaming application in real time.
The first one uses an incremental approach for solving least-squares problems, e.g., an incremental singular value decomposition, for which many implementations are available. The second technique uses the least-squares method from Algorithm 2 with a modified start value that stems from the previous computation. We propose choosing because c new approximates the last relevant snapshot:

Implementation
Algorithm 2 was implemented in MATLAB R2019a, which already provides many optimized functions. For the least-squares method in Line 3, the internal function named "lsqr" was used with a start value according to Eq. 16. The maximum number of iterations was not limited. For the power method in the non-extended case, the companion matrix $C_1$ was applied to the start value $v_{\text{start}}$ (see Eq. 17) 100 times. Our tests have shown that this number of iterations produces accurate and robust background models. The procedure is done without normalization of the vector. Moreover, for the matrix-vector multiplication, the low-dimensional and sparse structure of the companion matrix should be taken into account (i.e., only $2m + 1$ entries are non-zero). For the power method in the extended case, the internal function "eigs" was used to compute various eigenvalues.
The evaluation includes qualitative (see Fig. 5) and quantitative (see Table 1) experimental results as well as computation time measurements. For each algorithm, either the standard parameter setup (from the LRSlibrary) or the suggested settings from the associated publication were chosen. For the final foreground mask, a 5 × 5 median filter was used and the best result from four fixed thresholds was chosen to guarantee a fair comparison. The results were computed with MATLAB R2019a on a machine with a 2.90 GHz Intel Core i9-8950HK processor and 32 GB RAM.

Accuracy
In Fig. 5, results from the above-mentioned algorithms are compared qualitatively. An overview of the challenges of each scenario can be found in Table 2. Whereas some algorithms struggle with shadows (2nd row), rippling water (3rd row), or low frame rates (9th and 10th row), our approach achieves adequate and robust results. For the bad weather (5th and 6th row) and thermal (7th and 8th row) scenarios, we achieve visually appealing foreground masks, in particular, for the 5th and 7th row. Only in the 1st and 4th row does our foreground mask (as well as the others) struggle with specific components of the frame, like glass, reflections, or spatial coherence.
These observations match the quantitative outcomes in Table 1, where the F1-measure is computed for a selection of frames (those for which we have access to the ground truth). We observe that our approach, in both the grayscale and the RGB version, consistently obtains good results and remains robust, which is also true for GROUSE, RPCA-GD, D-S-NMF, and ROSL. In contrast, the previous DMD approaches, cDMD and exDMD, often fail to detect foreground objects accurately (e.g., for canoe, skating, and tram), leading to scores below 0.6. Moreover, many algorithms have problems computing background models for bigger scenarios because computationally intensive matrix calculations have to be performed. To avoid these issues, either subsampling techniques (which we have not used) or partitioning techniques (see Table 2) need to be exploited.
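For reference, the F1-measure is the harmonic mean of precision and recall over the foreground pixels. The following Python sketch computes it for binary masks given as nested lists; the function names are hypothetical, and returning 0 when there are no true positives is a convention assumed here, not one stated in the paper. The helper `best_f1` mirrors the selection of the best result among several candidate masks, for instance one mask per threshold.

```python
def f1_measure(predicted, truth):
    """F1-measure of a binary foreground mask against a ground-truth mask."""
    tp = fp = fn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # foreground correctly detected
            elif p and not t:
                fp += 1      # background marked as foreground
            elif t and not p:
                fn += 1      # foreground missed
    if tp == 0:
        return 0.0           # assumed convention for the degenerate case
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(candidate_masks, truth):
    """Best F1-measure among several candidate masks (e.g., one per threshold)."""
    return max(f1_measure(mask, truth) for mask in candidate_masks)


truth = [[0, 1, 1],
         [0, 1, 0]]
predicted = [[0, 1, 0],
             [1, 1, 0]]
score = f1_measure(predicted, truth)  # 2 true positives, 1 false positive, 1 false negative
```

Here precision and recall both equal 2/3, so the F1-measure is 2/3 as well.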
According to the quantitative analysis, our approach effexDMD4B can compete with other state-of-the-art algorithms in all scenarios. Whereas our RGB version achieves the best average score alongside RPCA-GD, our grayscale approach has the fourth best average score, directly after ROSL. Other accurate algorithms are D-S-NMF, GROUSE, and pROST, which achieve average scores higher than 0.8. The two other DMD-based background modeling algorithms have the worst average scores due to their robustness issues. We therefore emphasize that enforcing an eigenvalue λ_1 resolves the robustness problems of DMD and leads to more accurate results. To substantiate this statement, several precision-recall curves are presented in Fig. 6. Whereas cDMD and exDMD often fail to score good precision and recall values,

Table 1. Compared methods: [18], R2PCP [17], FW-T [25], GROUSE [2], PG-RMC [8], RPCA-GD [40], pROST [15], OSTD [33], ROSL [31], D-S-NMF [36], cDMD [9], exDMD [14], Our Gray. The best scores are marked in boldface.

Computation Time
To demonstrate the efficiency of our algorithm, we first compare the overall computation time (excluding the foreground mask procedure) of all algorithms for a given test dataset. The test dataset is based on the blizzard scenario, has a resolution of 720 × 480, and consists of 500 frames. The comparison is illustrated in Fig. 7. We observe that the grayscale version of effexDMD4B is extremely efficient, computing the background model in nearly 6 seconds. The RGB version needs approximately three times as much computation time. This is consistent with Algorithm 2 scaling linearly in space, since effexDMD4B_RGB has to process three color channels. The traditional DMD algorithm exDMD requires nearly 35 times more computation time than our algorithm (gray), whereas cDMD only needs 1.5 times more computation time due to its high compression rate. Among the algorithms from the LRSlibrary, D-S-NMF is almost as fast as our approach. The other algorithms, such as ncRPCA, FW-T, and ROSL (which also belong to the highest speed class), are at least an order of magnitude slower.

Besides the absolute computation time, we also want to demonstrate how the algorithm scales as the number of frames grows. Therefore, Fig. 8 shows the computation time of the six most efficient algorithms as a function of the number of frames. We observe that our approach effexDMD4B (grayscale and RGB) as well as Deep-Semi-NMF scale linearly, since their solid lines coincide with their dashed lines (linear least-squares fits). Previous DMD-based methods, like cDMD (or exDMD), suffer heavily from operations with quadratic and cubic computational complexity (because they compute all DMD components, which our approach avoids). This is visible in the purple dashed line, which does not match the solid purple line. A similar behavior can be observed for ROSL, and an even worse one for FW-T.
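The comparison of a measured curve with its linear least-squares fit can be made concrete with a small Python sketch. The timing values below are synthetic and purely illustrative; they are not measurements from the paper. The function names and the use of the coefficient of determination R^2 as an agreement score are assumptions added here, since the paper compares the curves visually.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the linear fit (1.0 means perfectly linear)."""
    slope, intercept = linear_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic runtimes in seconds (illustrative only, not taken from the paper).
frames = [100, 200, 300, 400, 500]
linear_times = [0.012 * n for n in frames]        # linear growth
quadratic_times = [1e-5 * n * n for n in frames]  # quadratic growth

r2_linear = r_squared(frames, linear_times)
r2_quadratic = r_squared(frames, quadratic_times)
```

For the linearly growing series the fit is essentially exact, whereas the quadratic series leaves systematic residuals, which corresponds to a solid curve departing from its dashed fit line.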

Discussion and Conclusion
In this paper, we have presented a DMD-based approach for the computation of background models. According to the qualitative and quantitative results, our approach can compete with other state-of-the-art methods. It produces accurate background models (just like RPCA-GD and ROSL) and remains robust, which is a significant improvement compared to previous DMD-based algorithms.
However, the real strengths of effexDMD4B are its high efficiency and flexibility. It is extremely fast and, in addition, scales linearly in time and space, whereas most other algorithms exhibit quadratic or even cubic computational complexity. Moreover, it can be applied to a wide range of scenarios with different challenges without adjustment, whereas many background modeling algorithms require modified parameter settings to produce accurate results. A further kind of flexibility is the extension of our approach to RGB data, data with periodic background parts, and streaming data.
Since DMD offers great potential, we plan to extend our approach to video data that exhibit panning, zooming, jittering, and intermittent elements. This may include the use of different pre- and post-processing tools, like spatiotemporal smoothing, to obtain a more accurate foreground mask. Another idea is to replace lsqr with other iterative least-squares methods that respect the special temporal coherence structure of our minimization problem. This consideration is particularly important for the application to streaming data, where the least-squares method carries special weight (since it has to be executed in every update step). Moreover, since DMD can deal with compressed data (due to the spectral decomposition), a version of effexDMD4B with compression techniques may further boost efficiency.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.