1 Introduction

Modern video background modeling algorithms are often inherently based on matrix decompositions [6]. Besides pre- and post-processing, the core of these algorithms is to explicitly or implicitly decompose the data into a low-rank matrix characterizing the background and a sparse matrix that shows the foreground. To perform such a decomposition, a computationally costly minimization problem with parameter adjustment is typically established that initially neglects spatial and temporal coherence (due to the use of simple matrix norms, seminorms, and quasinorms in the loss function). Since this formulation may fail to produce adequate background models for more complex video sequences, spatial and/or temporal constraints are taken into account to meet the requirements. Another more recent approach is to integrate neural networks [1] or related concepts into the process of background modeling, which leads to supervised methods in general. In this paper, however, we focus neither on a minimization-based background modeling approach as described above nor on learning-based concepts. Instead, we follow a different approach and exploit Dynamic Mode Decomposition (DMD), a versatile spectral decomposition technique that naturally extracts spatio-temporal patterns from data.

Fig. 1 The traditional DMD approach for background modeling: After vectorizing the \(m+1\) video frames into n-dimensional data, the DMD modes, eigenvalues, and amplitudes are computed via a low-dimensional representation. Finally, the background model is extracted.

DMD was originally established in the fluid dynamics community [30] and has evolved into an effective analysis tool. Numerically, DMD produces the following triples: DMD modes, amplitudes, and eigenvalues. For time-dependent data, the modes represent the spatial contribution and the amplitudes specify their impact. Each associated eigenvalue characterizes the temporal development. According to this interpretation, Grosek and Kutz [14] were the first to model the background in video by DMD. Compared to state-of-the-art algorithms, the approach suffers in terms of accuracy, efficiency, and robustness. Some publications address the high computational costs of DMD by efficient subsampling strategies or related concepts, but these techniques struggle even more with problems regarding accuracy and robustness.

Instead of applying subsampling techniques, we present an equivalent reformulation of DMD with constraints that enforces the existence of background components. Due to the reformulation, which uses sparse and low-dimensional structures, extremely efficient algorithms (e.g., the power method) can be leveraged for the computation. The contribution of this paper can be summarized as follows:

  • We elaborate on the drawbacks of previous methods and thus motivate our approach.

  • An equivalent reformulation of DMD with constraints is presented, leading to a suitable decomposition into fore- and background.

  • We derive an efficient and robust algorithm that computes accurate background models.

  • The approach is extended to RGB data, data with periodic background parts, and streaming data.

The remainder of this paper is organized as follows: After clarifying related work in the subsequent section, Sect. 3 reviews the traditional DMD approach for foreground–background separation. Section 4 presents our approach including a motivation, new formulation, the associated algorithm, extensions, and implementation details. Section 5 provides a qualitative and quantitative evaluation in terms of accuracy, efficiency, and robustness. Finally, a conclusion is drawn in Sect. 6.

2 Related Work

There are numerous background modeling algorithms in the literature [5], and they can be classified as unsupervised [26, 29, 37], supervised [1, 4, 7, 23, 24], and semi-supervised methods [12, 13]. Supervised methods allow for more accurate results than unsupervised methods, especially in complex scenarios, but they typically need large amounts of training data. Semi-supervised approaches achieve comparably accurate results with less labeled training data; however, they are subject to the limitations of supervision as well. Since our approach does not rely on training processes or related concepts, it is an unsupervised method. Such methods were published in large numbers and varieties over the past decade. Depending on the tasks, papers [11, 34, 39] classify them differently. To our knowledge, the most recent review of unsupervised background modeling algorithms was written by Bouwmans et al. [6] in combination with the LRSlibrary [32]. Using the approach of decomposing video data into a low-rank and a sparse matrix, they classify algorithms into the following categories: robust principal component analysis (RPCA), matrix completion (MC), non-negative matrix factorization (NMF), subspace tracking (ST), low-rank recovery (LRR), three-term decomposition (TTD), and tensor decompositions. Bouwmans et al. [6] put DMD into the RPCA category. However, DMD substantially differs from conventional RPCA approaches, e.g., the structure of the established minimization problems is completely different. We thus compare our approach only with DMD-based approaches in the following.

The first background modeling approach with DMD was proposed by Grosek and Kutz [14]. An extension to RGB video was presented by Tirunagari [35]. However, alternative state-of-the-art background modeling algorithms achieve better results with substantially lower computational costs. Kutz et al. [21] later proposed using multi-resolution DMD for background modeling. It integrates multi-resolution analysis techniques into DMD, which enables the identification of multi-time scale features and, therefore, may lead to better results. Nevertheless, computational costs are even higher. The issue of efficiency was addressed by Erichson and Donovan [10] and slightly later by Erichson et al. [9] and Pendergrass et al. [28]. All these approaches use efficient sampling strategies, either based on a randomized singular value decomposition (randomized DMD) or on compressed data (compressed DMD). Whereas these subsampling techniques show promising efficiency, the background model suffers from the subsampling process in terms of accuracy and robustness. Instead of applying subsampling techniques, we use an equivalent reformulation of DMD with constraints that allows efficient and robust computation and leads to a more suitable decomposition into fore- and background.

Recently, Haq et al. [16] also presented an efficient DMD-based method for background modeling that uses dictionary learning as a subsampling strategy. As the dictionary needs to be trained, it substantially differs from our approach.

Finally, there are also other DMD-based techniques for videos, e.g., video shot detection [3] or saliency detection [27]. However, those approaches have other objectives and technical contributions than our paper.

3 Theoretical Foundation

This section provides the basics of foreground–background separation with DMD, including the general theory of DMD and its traditional application to video data (see Fig. 1).

3.1 Dynamic Mode Decomposition

Before formulating the DMD background modeling algorithm, we summarize the principles of DMD in general [19, 20]. These aspects help understand foreground–background separation and, thus, our approach.

Let us consider data \(x_0,\dots ,x_m \in \mathbb {R}^n\) sampled uniformly in time. Typically, the data dimension n is considerably greater than m (compare Fig. 1, left box). DMD basically computes eigenvalues and eigenvectors of a high-dimensional matrix \(A_{\text {\tiny DMD}} \in \mathbb {R}^{n \times n}\) (characterizing the dynamics of the data) via a low-dimensional representation \(S_{\text {\tiny DMD}}\) (see Fig. 1, middle box). The matrix \(A_{\text {\tiny DMD}}\) is derived from the minimization problem

$$\begin{aligned} \min \Vert A X - Y \Vert _F^2, \end{aligned}$$

where \(X = \begin{bmatrix} x_0&\dots&x_{m-1} \end{bmatrix}\), \(Y = \begin{bmatrix} x_1&\dots&x_{m} \end{bmatrix}\). In general, the minimization problem has infinitely many solutions (as we will prove later). DMD therefore aims for the minimum-norm solution, which is explicitly given by

$$\begin{aligned} A_{\text {\tiny DMD}} = YX^+, \end{aligned}$$

where \(X^+\) is the Moore–Penrose pseudoinverse of X. To obtain the eigenvalues and eigenvectors of \(A_{\text {\tiny DMD}}\), the low-dimensional representation \(S_{\text {\tiny DMD}}\) is computed via the thin singular value decomposition (SVD) of X, i.e., \(X = U \Sigma V^*\) with \(r=\text {rank}(X)\), and the following reformulation:

$$\begin{aligned} S_{\text {\tiny DMD}} = U^* A_{\text {\tiny DMD}} U = U^* Y V \Sigma ^{-1} U^* U = U^* Y V \Sigma ^{-1}. \end{aligned}$$

Eq. 3 is the core of DMD, forming the starting point of Algorithm 1 (Lines 1–4). In the next steps, the (DMD) eigenvalues \(\lambda _j \in \mathbb {C}\) and eigenvectors \(v_j \in \mathbb {C}^{r}\) of \(S_{\text {\tiny DMD}}\) are computed (Line 5). Finally, all eigenvectors with nonzero eigenvalues are transformed into modes \(\vartheta _j = \frac{1}{\lambda _j} Y V \Sigma ^{-1} v_j \in \mathbb {C}^n\) (Lines 6–8) and the corresponding amplitudes \(a_j \in \mathbb {C}\) are computed by \(\min \Vert \Theta a - x_0 \Vert \), where the columns of \(\Theta \) are the modes \(\vartheta _j\) (Lines 9–11). Algorithm 1 results in the following decomposition of the data:

$$\begin{aligned} x_k \approx \sum _{j=1}^{r_0} {\lambda _j^k a_j \vartheta _j}. \end{aligned}$$

This approximation property is crucial for the interpretation of DMD triples \((\vartheta _j,\lambda _j,a_j)\), in particular, concerning the foreground–background separation approach.
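The steps summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation of Algorithm 1: the function names `dmd` and `reconstruct` as well as the relative rank tolerance `tol` are hypothetical choices made for this example.

```python
import numpy as np

def dmd(Z, tol=1e-10):
    """Sketch of DMD for a snapshot matrix Z of shape (n, m+1).

    Returns modes (n, r0), eigenvalues (r0,), and amplitudes (r0,).
    The tolerance `tol` is an assumption of this sketch.
    """
    X, Y = Z[:, :-1], Z[:, 1:]
    # Thin SVD of X, truncated to the numerical rank r.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    B = (Y @ Vh.conj().T) / s           # Y V Sigma^{-1}
    S = U.conj().T @ B                  # low-dimensional representation S_DMD
    lam, W = np.linalg.eig(S)
    keep = np.abs(lam) > tol            # only nonzero eigenvalues yield modes
    lam, W = lam[keep], W[:, keep]
    modes = ((B @ W) / lam).astype(complex)
    # Amplitudes: least-squares fit of the first snapshot.
    amps = np.linalg.lstsq(modes, Z[:, 0].astype(complex), rcond=None)[0]
    return modes, lam, amps

def reconstruct(modes, lam, amps, k):
    """Evaluate sum_j lam_j^k * a_j * mode_j for snapshot index k."""
    return modes @ (lam ** k * amps)
```

For data generated by a few exponentially evolving patterns, `reconstruct(modes, lam, amps, k)` should reproduce the k-th snapshot up to numerical error.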

3.2 Background Modeling with DMD

To compute a background model with DMD, the DMD components have to be interpreted in terms of video data. An overview of the traditional DMD background modeling is illustrated in Fig. 1. After applying DMD to the vectorized video frames, DMD produces modes \(\vartheta _j \in \mathbb {C}^n\), eigenvalues \(\lambda _j \in \mathbb {C}\), and amplitudes \(a_j \in \mathbb {C}\) with a computational complexity of \(\mathcal {O}(nm^2)\) in general as \(n \gg m\). The entries of a DMD mode \(\vartheta _j\) correspond to grayscaled pixel values at specific pixel locations. According to Eq. 4, the associated eigenvalue \(\lambda _j\) characterizes the temporal development. To extract a background model, Grosek and Kutz [14] propose selecting DMD triples that vary over time extremely slowly or not even at all. These triples are characterized by eigenvalues that satisfy \(\lambda _j \approx 1\) or, more precisely, by the index set

$$\begin{aligned} I_\mathbb {1} = \{ j \in \{1,\dots ,r_0\} ~ ; ~ |\lambda _j - 1 |< \delta \} \end{aligned}$$

for a given threshold \(\delta \) (see Fig. 2, top). Therefore, the video frames can be separated by DMD into fore- and background:

$$\begin{aligned} x_k \approx x_k^{\text {FG}} + x_k^{\text {BG}} = \sum _{j \notin I_\mathbb {1}} \lambda _j^k a_j \vartheta _j + \sum _{j \in I_\mathbb {1}} \lambda _j^k a_j \vartheta _j. \end{aligned}$$

Since the eigenvalues \(\lambda _j\), amplitudes \(a_j\), and modes \(\vartheta _j\) are complex in general, the sums are generally not real-valued. Whereas Grosek and Kutz [14] use the absolute value as well as a rearrangement of negative values, Erichson et al. [9] recommend the use of the real part. In this paper, we follow the technique of Grosek and Kutz.
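The thresholding rule can be illustrated as follows. The sketch assumes that DMD triples are already available as NumPy arrays; the function name `split_triples` and the default threshold `delta=1e-2` are hypothetical. It returns the complex-valued sums and leaves the conversion to real-valued images (absolute value or real part, as discussed above) to the caller.

```python
import numpy as np

def split_triples(modes, lam, amps, k, delta=1e-2):
    """Split frame k into background and foreground parts.

    modes: (n, r0) array, lam and amps: (r0,) arrays.
    Triples with |lam_j - 1| < delta form the background.
    """
    lam = np.asarray(lam, dtype=complex)
    amps = np.asarray(amps, dtype=complex)
    is_bg = np.abs(lam - 1.0) < delta      # index set I_1
    weights = lam ** k * amps              # lam_j^k * a_j
    background = modes[:, is_bg] @ weights[is_bg]
    foreground = modes[:, ~is_bg] @ weights[~is_bg]
    return background, foreground
```

If no eigenvalue passes the threshold, the background part is the zero vector, which mirrors the robustness issue of the traditional approach.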

4 Proposed Approach

In this section, we present our DMD background modeling approach. After a short motivation, the theory and the algorithm (based on grayscaled data) are explained in detail. We then extend our approach to RGB data, data with periodic parts, and streaming data, which enables the usage in video surveillance. Finally, the implementation is described.

Fig. 2 Comparison of the traditional approach with ours: The traditional approach uses eigenvalues of the matrix \(S_{\text {\tiny DMD}}\) for the background that are only approximately 1. Therefore, fore- and background components can be mixed up, as the extracted background at a certain time shows. In contrast, our approach uses a projection that enforces the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\), which leads to a more suitable decomposition. To further improve the background model, the extended method should be consulted, where multiple eigenvalues are used.

4.1 Motivation

Leading background modeling algorithms are characterized by accuracy, robustness, and efficiency. While DMD is capable of producing accurate background models in general, the traditional DMD background modeling algorithm is neither robust nor efficient. This fact can be seen in Fig. 1, where the traditional approach fails to produce an accurate background model. We additionally highlight the computational costs of each step of DMD. Since both an SVD and an eigenvalue decomposition are needed, DMD has a computational complexity of \(\mathcal {O}(nm^2)\) in general as \(n \gg m\).

Even though the improvements via subsampling strategies boost efficiency at the cost of accuracy, they do not solve the following underlying problem: In Fig. 2 (top), the background model of the traditional approach is shown at an earlier frame. It can be observed that a pair of complex-conjugate eigenvalues close to the value 1 is detected. According to the traditional approach, the two corresponding DMD triples \((\vartheta _j, \lambda _j, a_j)\) should be selected. If they are not close enough to the value 1, no components are selected due to thresholding and no background can be computed, i.e., the algorithm is not robust. Otherwise, this small inaccuracy causes numerical instability and a mixing of fore- and background parts as shown in Fig. 2. These issues may be even worse for subsampled data.

Consequently, DMD has to be modified such that each component clearly belongs either to the foreground or to the background. Since DMD only uses one or two triples (often occurring in complex-conjugate pairs) for the background model [20], we propose enforcing the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\) by a DMD-consistent projection. Moreover, we also recommend using the efficient power method to compute the relevant DMD components for an adequate background model. We can do this because our experiments have shown that all other eigenvalues are inside the unit circle due to the fact that foreground objects appear and disappear over time. This leads to a more suitable decomposition into fore- and background, as shown in Fig. 2, that is produced in an efficient manner.

4.2 Theory

As mentioned before, we want to enforce the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\) to obtain a suitable decomposition into fore- and background. To understand the general idea of our approach, we briefly summarize the steps in the following:

  (§1) Prove the existence of infinitely many solutions of the minimization problem 1 (see Proposition 1).

  (§2) Formulate a new minimization problem (see Eq. 6) and find a more accessible representation (see Theorem 1, Theorem 2, and Eq. 11).

  (§3) Find a feasible solution to the final minimization problem (see Theorem 3).

(§1) If the data \(x_0,\dots ,x_m \in \mathbb {R}^n\) are linearly independent (which is the case for real video footage in general), we first note that the high-dimensional minimization problem 1 has infinitely many solutions that solve the equation exactly, i.e., \(AX = Y\).

Proposition 1

For linearly independent data \(x_0,\dots ,x_m \in \mathbb {R}^n\) with \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\), the minimization problem 1 has infinitely many solutions that solve the equation exactly, i.e., \(AX = Y\). In particular, every combination of \(m+1\) distinct complex numbers \(z_0,\dots ,z_m \in \mathbb {C}\) defines a solution \(A_z\) in the form of its eigenvalues.


Proof

Let us consider distinct complex numbers \(z_0,\dots ,z_m \in \mathbb {C}\) with \(\Lambda _z = \text {diag}(z_0,\dots ,z_m)\) as well as the respective Vandermonde matrix

$$\begin{aligned} V(z) = \begin{pmatrix} z_0^0 &{} \dots &{} z_0^m \\ \vdots &{} \ddots &{} \vdots \\ z_m^0 &{} \dots &{} z_m^m \end{pmatrix} \in \mathbb {C}^{(m+1) \times (m+1)}. \end{aligned}$$

Defining \(\Xi _z = Z V(z)^{-1}\) (the columns of the matrix are basically scaled modes), the matrix \(A_z = \Xi _z \Lambda _z \Xi _z^+ \) solves the minimization problem 1 exactly:

$$\begin{aligned} A_z X&= \Xi _z \Lambda _z \Xi _z^+ X \\&= Z V(z)^{-1} \Lambda _z V(z) Z^+ X \\&= Z V(z)^{-1} \Lambda _z V(z) \begin{bmatrix} I_m \\ 0^T \end{bmatrix} \\&= Z V(z)^{-1} \Lambda _z \begin{pmatrix} z_0^0 &{} \dots &{} z_0^{m-1} \\ \vdots &{} \ddots &{} \vdots \\ z_m^0 &{} \dots &{} z_m^{m-1} \end{pmatrix} \\&= Z V(z)^{-1} \begin{pmatrix} z_0^1 &{} \dots &{} z_0^{m} \\ \vdots &{} \ddots &{} \vdots \\ z_m^1 &{} \dots &{} z_m^{m} \end{pmatrix} = Z \begin{bmatrix} 0^T \\ I_m \end{bmatrix} = Y \end{aligned}$$

where we used that \(\Xi _z^+ = V(z) Z^+\) (as the columns of the matrix Z are linearly independent). Since \(z_0,\dots ,z_m\) are chosen arbitrarily, the assertion is proven. \(\square \)
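Proposition 1 can be checked numerically on a small example. The following sketch builds \(A_z\) for a prescribed set of distinct numbers and is intended for illustration only: the function name `solution_with_spectrum` is hypothetical, real numbers are used for simplicity, and the explicit inversion of the Vandermonde matrix is only advisable for small m because such matrices quickly become ill-conditioned.

```python
import numpy as np

def solution_with_spectrum(Z, z):
    """Build A_z = Xi Lambda Xi^+ with Xi = Z V(z)^{-1}.

    Z: (n, m+1) matrix with linearly independent columns.
    z: (m+1,) array of distinct numbers (prescribed eigenvalues).
    """
    z = np.asarray(z)
    # Row i of V holds the powers z_i^0, ..., z_i^m.
    V = np.vander(z, increasing=True)
    Xi = Z @ np.linalg.inv(V)
    return Xi @ np.diag(z) @ np.linalg.pinv(Xi)
```

For any admissible choice of `z`, the returned matrix should satisfy `A @ Z[:, :-1] == Z[:, 1:]` up to rounding, while its nonzero eigenvalues coincide with the entries of `z`.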

The above proposition states that every combination of distinct temporal components \(\lambda _0,\dots ,\lambda _m\) (which are eigenvalues of a respective matrix A) is actually a possible outcome. However, since DMD uses the minimum-norm solution \(A_{\text {\tiny DMD}} = YX^+\), the DMD eigenvalues fit the data better than a random solution. The existence of adequate eigenvalues for background modeling is not guaranteed nonetheless.

(§2) Hence, the naive approach is to find a matrix \(A \in \mathbb {R}^{n \times n}\) that solves the minimization problem 1 and has an eigenvalue \(\lambda _\mathbb {1} = 1\):

$$\begin{aligned} \begin{aligned} \min \quad&\Vert A - A_{\text {\tiny DMD}} \Vert _F^2 \\ \text {s.t.} \quad&AX = Y \\&\lambda _\mathbb {1} \text { is eigenvalue of }A. \end{aligned} \end{aligned}$$

However, this formulation cannot be solved practically due to the high dimensionality of the problem and the eigenvalue equation in the constraints.

To formulate an adequate constrained minimization problem, we use the low-dimensional representation C of a matrix A with respect to the data matrix \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\):

$$\begin{aligned} C = Z^+ A Z \end{aligned}$$

This matrix is a companion matrix and completely characterizes the data-relevant spectral properties of A. In addition, the retransformed matrix \(Z C Z^+ \in \mathbb {R}^{n \times n}\) inherits many algebraic properties, in particular, the minimization property, as the following theorem shows.

Theorem 1

For linearly independent data \(x_0,\dots ,x_m \in \mathbb {R}^n\) with \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\), the low-dimensional representation \(C = Z^+ A Z \in \mathbb {R}^{(m+1) \times (m+1)}\) to a matrix \(A \in \mathbb {R}^{n \times n}\) satisfying \(AX=Y\) has the following properties:

  (a) The low-dimensional representation C is a companion matrix

    $$\begin{aligned} C = \left[ \begin{array}{c|c} \begin{matrix} 0^T \\ I_m \end{matrix}&\tilde{c} \end{array}\right] = \begin{bmatrix} 0^T &{} c_0 \\ I_m &{} c \end{bmatrix} \end{aligned}$$

    and the related vector \(\tilde{c}\) satisfies

    $$\begin{aligned} \tilde{c} = \begin{pmatrix} c_0 \\ c \end{pmatrix} = Z^+ A x_m \in \mathbb {R}^{m+1}. \end{aligned}$$

  (b) An eigenvalue \(\lambda \) of the matrix C with eigenvector \(\tilde{v}\) is also an eigenvalue of A with corresponding eigenvector \(v = Z \tilde{v}\), if \(A v \in \text {span}(x_0,\dots ,x_m)\).

  (c) An eigenvalue \(\lambda \ne 0\) of the matrix A with eigenvector v is also an eigenvalue of C with corresponding eigenvector \(\tilde{v} = Z^+v\), if \(v \in \text {span}(x_0,\dots ,x_m)\).

  (d) The retransformed matrix \(ZCZ^+ \in \mathbb {R}^{n \times n}\) satisfies the equation \((ZCZ^+)X = Y\).


Proof

(a) The following calculation shows that C is a companion matrix:

$$\begin{aligned} C = Z^+ A Z = Z^+ \begin{bmatrix} Y&Ax_m \end{bmatrix} = \begin{bmatrix} 0^T &{} c_0 \\ I_m &{} c \end{bmatrix}, \end{aligned}$$

where \(c_0 \in \mathbb {R}\) and \(c \in \mathbb {R}^m\) are values that satisfy

$$\begin{aligned} \tilde{c} =\begin{pmatrix} c_0 \\ c \end{pmatrix} = Z^+ A x_m \in \mathbb {R}^{m+1}. \end{aligned}$$

(b) Let \(\lambda \) be an eigenvalue of the matrix C with eigenvector \(\tilde{v}\). Then, the nonzero vector \(v = Z \tilde{v}\) (as Z has linearly independent columns) satisfies the following equation

$$\begin{aligned} A v = Z Z^+ A v = Z Z^+ A Z \tilde{v} = Z C \tilde{v} = \lambda Z \tilde{v} = \lambda v, \end{aligned}$$

where we have used that \(ZZ^+\) is the projection onto the image of Z.

(c) Let \(\lambda \) be an eigenvalue of the matrix A with eigenvector v. Then, the vector \(\tilde{v} = Z^+v\) satisfies the following equation

$$\begin{aligned} C \tilde{v} = C Z^+ v = Z^+ A Z Z^+ v = Z^+ A v = \lambda Z^+ v = \lambda \tilde{v}. \end{aligned}$$

To show that \(\tilde{v}\) is an eigenvector, we assume to the contrary that \(\tilde{v} = 0\). This implies \(0 = A Z \tilde{v}\), which results in \(0 = A Z Z^+ v = A v = \lambda v\) (as v lies in the image of Z). Since \(v \ne 0\), the eigenvalue has to be zero, i.e., \(\lambda = 0\). However, this contradicts the assumption \(\lambda \ne 0\). Therefore, \(\tilde{v} \ne 0\).

(d) Using the companion matrix structure and the linearly independent columns of the matrix Z, the following calculation shows that the retransformed matrix \(ZCZ^+\) solves the first constraint:

$$\begin{aligned} (Z C Z^+) X = Z C \begin{bmatrix} I_m \\ 0^T \end{bmatrix} = Z \begin{bmatrix} 0^T &{} c_0 \\ I_m &{} c \end{bmatrix} \begin{bmatrix} I_m \\ 0^T \end{bmatrix} = Z \begin{bmatrix} 0^T \\ I_m \end{bmatrix} = Y. \end{aligned}$$

\(\square \)

With the help of Theorem 1, every solution A to the minimization problem 1 can be represented by an appropriate companion matrix C that naturally satisfies the first constraint in minimization problem 6. To reformulate the entire constrained minimization problem 6, the low-dimensional representation of \(A_{\text {\tiny DMD}} = Y X^+\) has to be calculated as well, which will be done in the following theorem.

Theorem 2

For linearly independent data \(x_0,\dots ,x_m \in \mathbb {R}^n\) with \(Z = \begin{bmatrix} x_0&\dots&x_m \end{bmatrix} \in \mathbb {R}^{n \times (m+1)}\), the low-dimensional representation \(C_{\text {\tiny DMD}} = Z^+ A_{\text {\tiny DMD}} Z \in \mathbb {R}^{(m+1) \times (m+1)}\) to the matrix \(A_{\text {\tiny DMD}} = Y X^+ \in \mathbb {R}^{n \times n}\) has the following properties:

  (a) The low-dimensional representation \(C_{\text {\tiny DMD}}\) is a companion matrix with \(\text {rank}(C_{\text {\tiny DMD}}) = m\) and the related vector \(\tilde{c}_{\text {\tiny DMD}}\) satisfies

    $$\begin{aligned} \tilde{c}_{\text {\tiny DMD}} = \begin{pmatrix} c_0 \\ c_{\text {\tiny DMD}} \end{pmatrix} = \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix}. \end{aligned}$$

  (b) An eigenvalue \(\lambda \) of the matrix \(C_{\text {\tiny DMD}}\) with eigenvector \(\tilde{v}\) is also an eigenvalue of \(A_{\text {\tiny DMD}}\) with corresponding eigenvector \(v = Z \tilde{v}\).

  (c) An eigenvalue \(\lambda \ne 0\) of the matrix \(A_{\text {\tiny DMD}}\) with eigenvector v is also an eigenvalue of \(C_{\text {\tiny DMD}}\) with corresponding eigenvector \(\tilde{v} = Z^+v\).

  (d) The retransformed matrix \(ZC_{\text {\tiny DMD}}Z^+\) is equal to \(A_{\text {\tiny DMD}}\), i.e., \(ZC_{\text {\tiny DMD}}Z^+ = A_{\text {\tiny DMD}}\) (trivially satisfying \((ZC_{\text {\tiny DMD}}Z^+)X = Y\)).


Proof

(a) According to Theorem 1, the matrix \(C_{\text {\tiny DMD}}\) is a companion matrix

$$\begin{aligned} C_{\text {\tiny DMD}} = \begin{bmatrix} 0^T &{} c_0 \\ I_m &{} c_{\text {\tiny DMD}} \end{bmatrix}, \end{aligned}$$

where the corresponding vector \(\tilde{c}_{\text {\tiny DMD}}\) is given by

$$\begin{aligned} \tilde{c}_{\text {\tiny DMD}} =\begin{pmatrix} c_0 \\ c_{\text {\tiny DMD}} \end{pmatrix} = Z^+ A_{\text {\tiny DMD}} x_m&= Z^+ Y X^+ x_m \\&= \begin{bmatrix} 0^T \\ I_m \end{bmatrix} X^+ x_m = \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix}. \end{aligned}$$

Since \(c_0 = 0\), the matrix \(C_{\text {\tiny DMD}}\) has \(\text {rank}(C_{\text {\tiny DMD}}) = m\).

\((b) + (c)\) Since \(A_{\text {\tiny DMD}}\) has rank m and satisfies \(A_{\text {\tiny DMD}}X =Y\) by the assumption [19], the image of \(A_{\text {\tiny DMD}}\) is given by

$$\begin{aligned} \text {im}(A_{\text {\tiny DMD}}) = \text {span}(x_1,\dots ,x_m). \end{aligned}$$

Therefore, the requirements of (b) and (c) in Theorem 1 are met for the matrix \(A_{\text {\tiny DMD}}\).

(d) To show the equality \(Z C_{\text {\tiny DMD}} Z^+ = A_{\text {\tiny DMD}}\), we first notice that both matrices solve the equation \(AX=Y\) (see Theorem 1). Therefore, the two matrices are equal for vectors \(x \in \text {span}(x_0,\dots ,x_{m-1})\). In addition, this assertion is also true for the \((m+1)\)-th snapshot \(x_m\) because \(Z^+ x_m\) is the last canonical unit vector of \(\mathbb {R}^{m+1}\), denoted here by \(e_{m+1}\):

$$\begin{aligned} Z C_{\text {\tiny DMD}} Z^+ x_m= & {} Z C_{\text {\tiny DMD}} e_{m+1} = Z \begin{pmatrix} 0 \\ X^+ x_m \end{pmatrix} \\= & {} Y X^+ x_m = A_{\text {\tiny DMD}} x_m. \end{aligned}$$

Hence, \(Z C_{\text {\tiny DMD}} Z^+ x = A_{\text {\tiny DMD}} x\) for \(x \in \text {span}(x_0,\dots ,x_m)\). To conclude the proof, we show the equality for an arbitrary vector y from the orthogonal complement \(\text {span}(x_0,\dots ,x_m)^\bot \):

$$\begin{aligned} (Z C_{\text {\tiny DMD}} Z^+) y&= Z C_{\text {\tiny DMD}} (Z^T Z)^{-1} Z^T y \\&= 0 \\&= Y (X^T X)^{-1} X^T y = Y X^+ y = A_{\text {\tiny DMD}} y. \end{aligned}$$

Since a subspace and its orthogonal complement span the entire space, the two matrices are equal. \(\square \)
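The statements of Theorem 2 lend themselves to a quick numerical sanity check. The sketch below is an illustration on random data (which is linearly independent with probability one), not part of the proof; the helper names `companion`, `dmd_companion_vector`, and `low_dim_representation` are hypothetical.

```python
import numpy as np

def companion(c_tilde):
    """Companion matrix with ones on the subdiagonal and last column c_tilde."""
    c_tilde = np.asarray(c_tilde, dtype=float)
    size = c_tilde.shape[0]
    C = np.zeros((size, size))
    C[1:, :-1] = np.eye(size - 1)
    C[:, -1] = c_tilde
    return C

def dmd_companion_vector(Z):
    """Vector (0, X^+ x_m) from Theorem 2(a)."""
    X, x_m = Z[:, :-1], Z[:, -1]
    return np.concatenate(([0.0], np.linalg.pinv(X) @ x_m))

def low_dim_representation(Z, A):
    """C = Z^+ A Z for a high-dimensional matrix A."""
    return np.linalg.pinv(Z) @ A @ Z

# Small demonstration on random data with n = 12 and m = 4.
rng = np.random.default_rng(0)
Z = rng.standard_normal((12, 5))
A_dmd = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
C_dmd = companion(dmd_companion_vector(Z))
print(np.allclose(low_dim_representation(Z, A_dmd), C_dmd))   # Theorem 2(a)
print(np.allclose(Z @ C_dmd @ np.linalg.pinv(Z), A_dmd))      # Theorem 2(d)
```

Note that the explicit \(n \times n\) matrix is formed here only for verification; the proposed method works exclusively with the \((m+1)\)-dimensional vector.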

Theorem 1 and Theorem 2 show that a high-dimensional matrix A that solves the minimization problem 1 exactly, such as \(A_{\text {\tiny DMD}}\), can be adequately represented by a low-dimensional representation capturing the data-relevant properties. Moreover, the low-dimensional representation has the structure of a companion matrix that naturally satisfies the first constraint in the minimization problem 6. By these characteristics, the high-dimensional constrained minimization problem 6 can be reformulated into the following low-dimensional constrained minimization problem (using the companion matrices from Theorem 1 and Theorem 2):

$$\begin{aligned} \begin{aligned} \min \quad&\Vert \tilde{c} - \tilde{c}_{\text {\tiny DMD}} \Vert _2^2 \\ \text {s.t.} \quad&\lambda _\mathbb {1} \text { is eigenvalue of }\left[ \begin{array}{c|c} \begin{matrix} 0^T \\ I_m \end{matrix}&\tilde{c} \end{array}\right] . \end{aligned} \end{aligned}$$

(§3) The constraint in minimization problem 11 seems to have an impractical form as well. However, it can be solved in a feasible way: Since the characteristic polynomial of a companion matrix C with vector \(\tilde{c}\) is given by

$$\begin{aligned} p_{\tilde{c}}(s) = -\tilde{c}_0 - \tilde{c}_1 s - \dots - \tilde{c}_m s^m + s^{m+1}, \end{aligned}$$

the constraint is equivalent to \(p_{\tilde{c}}(1) = 0\) or, more precisely,

$$\begin{aligned} \tilde{c}_0 + \tilde{c}_1 + \dots + \tilde{c}_m = 1. \end{aligned}$$

Using the notation \(e_{m+1} = (1, 1, \dots , 1)^T \in \mathbb {R}^{m+1}\), the low-dimensional constrained minimization problem 11 is equivalent to

$$\begin{aligned} \begin{aligned} \min \quad&\Vert \tilde{c} - \tilde{c}_{\text {\tiny DMD}} \Vert _2^2 \\ \text {s.t.} \quad&e_{m+1}^T \tilde{c} = 1. \end{aligned} \end{aligned}$$

The solution space \(M_\mathbb {1} = \{ y \in \mathbb {R}^{m+1} ~ ; ~ e_{m+1}^T y = 1 \}\) has several nice properties: It is an affine subspace and therefore convex, which implies existence and uniqueness of a solution for the constrained minimization problem 12 as the following theorem shows.

Theorem 3

For the constrained minimization problem 12, there exists a unique solution given by the orthogonal projection \(P_\mathbb {1}\) onto \(M_\mathbb {1} = \{ y \in \mathbb {R}^{m+1} ~ ; ~ e_{m+1}^T y = 1 \}\). The projection can be computed via

$$\begin{aligned} P_\mathbb {1} y = y + \frac{1-e_{m+1}^T y}{e_{m+1}^T e_{m+1}} e_{m+1}. \end{aligned}$$


Proof

Since the constraint is linear (more precisely, affine-linear), the solution space \(M_\mathbb {1}\) is an affine subspace. Hence, \(M_\mathbb {1}\) is a non-empty, closed, and convex set. This implies existence and uniqueness of a solution given by the orthogonal projection onto \(M_\mathbb {1}\). The explicit formula can be easily derived from the fact that \(M_\mathbb {1}\) is a hyperplane that is orthogonal to the vector \(e_{m+1}\) and passes through the point \(r_0 = \frac{1}{m+1} \cdot e_{m+1} \in M_\mathbb {1}\):

$$\begin{aligned} P_\mathbb {1} y&= r_0 + \left( I_{m+1} - \frac{e_{m+1} e_{m+1}^T}{e_{m+1}^T e_{m+1}} \right) \left( y - r_0 \right) \\&= y + \frac{1-e_{m+1}^T y}{e_{m+1}^T e_{m+1}} e_{m+1}. \end{aligned}$$

\(\square \)

Fig. 3 Our DMD approach for (grayscaled) background modeling: After vectorizing the \(m+1\) video frames into n-dimensional data, the matrix \(C_{\text {\tiny DMD}}\) is computed and subsequently projected. Then, the power method is applied to the low-dimensional and sparse companion matrix \(C_\mathbb {1}\) for the computation of background-relevant DMD components that yield the background model.

By Theorem 3, denoting \(P_\mathbb {1}\) as the (orthogonal) projection onto \(M_\mathbb {1}\), the solution to the constrained minimization problem 12 is given by

$$\begin{aligned} \tilde{c}_\mathbb {1} = P_\mathbb {1} \tilde{c}_{\text {\tiny DMD}} = \tilde{c}_{\text {\tiny DMD}} + \frac{1-e_{m+1}^T \tilde{c}_{\text {\tiny DMD}}}{e_{m+1}^T e_{m+1}} \cdot e_{m+1}. \end{aligned}$$

The projection solves the robustness issues of DMD by enforcing the existence of an eigenvalue \(\lambda _\mathbb {1} = 1\) for the companion matrix \(C_\mathbb {1}\) associated with the vector \(\tilde{c}_\mathbb {1}\). The spectral properties of \(C_\mathbb {1}\), i.e., the eigenvector \(v_\mathbb {1}\) to the eigenvalue \(\lambda _\mathbb {1}\), can therefore be used to compute an accurate background model as illustrated in Figs. 2 and 3. The details will be explained in combination with the algorithm in the next subsection.
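The projection of Theorem 3 amounts to a uniform shift of all entries. A minimal sketch, assuming NumPy arrays and using the hypothetical names `project_onto_hyperplane` and `companion`:

```python
import numpy as np

def project_onto_hyperplane(y):
    """Orthogonal projection onto {y : sum(y) = 1}."""
    y = np.asarray(y, dtype=float)
    e = np.ones_like(y)                      # all-ones vector e_{m+1}
    return y + (1.0 - e @ y) / (e @ e) * e

def companion(c_tilde):
    """Companion matrix with ones on the subdiagonal and last column c_tilde."""
    c_tilde = np.asarray(c_tilde, dtype=float)
    size = c_tilde.shape[0]
    C = np.zeros((size, size))
    C[1:, :-1] = np.eye(size - 1)
    C[:, -1] = c_tilde
    return C
```

Since every entry is shifted by the same amount \((1 - e_{m+1}^T y)/(m+1)\), the cost of the projection is linear in m. The companion matrix built from the projected vector has the eigenvalue 1, which can be verified with `np.linalg.eigvals(companion(project_onto_hyperplane(y)))`.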

4.3 Algorithm

In the following, we describe our algorithm “efficient exact dynamic mode decomposition for background modeling” (effexDMD4B), shown in Algorithm 2. Although our algorithm is based on the theorems of the previous subsection, it can be applied to any video data, even if the assumptions of the theorems do not hold.

The first step is to obtain the companion matrix \(C_{\text {\tiny DMD}}\). By Eq. 10 (and the related Eq. 8), we have to compute \(X^+ x_m\), which is the minimum-norm solution to the minimization problem

$$\begin{aligned} \min \Vert X c - x_m \Vert _2^2. \end{aligned}$$

This step is performed by an iterative least-squares method (lsqr), which has a computational complexity of \(\mathcal {O}(nm)\). To keep the constant small, a special start value \(c_{\text {start}} \in \mathbb {R}^{m+1}\) is used, adapted to the minimization problem 15 (Lines 2–3). Due to the temporal coherence of video frames, the vector \(c_{\text {start}}\) should particularly emphasize the last data points. Moreover, it should be a partition of unity. We propose using

$$\begin{aligned} c_{\text {start}} = \left( \frac{1}{2^{m+1}},\dots , \frac{1}{8}, \frac{1}{4}, \frac{1}{2}\right) ^T \in \mathbb {R}^{m+1}, \end{aligned}$$

which has shown fast convergence rates, even if the number of frames grows (as we will see later).

In Line 4 of Algorithm 2, the projection according to Eq. 14 is applied. This step enforces the existence of an eigenvalue \(\lambda _\mathbb {1}\).

Finally, the eigenvector \(v_\mathbb {1}\) to the eigenvalue \(\lambda _\mathbb {1}\) is computed by the power method (without normalization) and transformed to a mode (Lines 5–6) that represents the static background model. The power method uses the repeated application of the companion matrix \(C_\mathbb {1}\) (which is constructed via Eqs. 8 and 9) to a start value \(v_{\text {start}}\). This step allows extremely efficient computation at a complexity of \(\mathcal {O}(m)\) because the companion matrix has a low-dimensional representation as well as a sparse structure with only \(2m+1\) non-zero entries.

Moreover, we propose using a special start value based on the temporal arithmetic mean \(\overline{x} = \frac{1}{m+1} \sum _{k=0}^m x_k\) because the temporal mean represents a good initial guess for the background model. Since the power method operates in the low-dimensional space, the start vector is given by

$$\begin{aligned} v_{\text {start}} = Z^+ \overline{x} = \frac{1}{m+1} Z^+ \sum _{k=0}^m x_k = \frac{1}{m+1} e_{m+1}. \end{aligned}$$

Although there is generally no guarantee that the power method converges for this start vector, our experiments have shown a stable convergence behavior. This is due to the fact that eigenvalues that do not belong to the background are typically located inside the unit circle (see Fig. 2).
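The power iteration described above can be sketched in a few lines. The sketch assumes the companion form with ones on the subdiagonal and the coefficient vector in the last column; for a coefficient vector of length \(m+1\), this form has \(m\) ones plus \(m+1\) coefficients, i.e., \(2m+1\) non-zero entries. It further assumes that \(e_{m+1}\) denotes the all-ones vector. The function names and the numerical values are hypothetical.

```python
def companion_apply(c, v):
    """Return C v in O(d) for the assumed companion form.

    C has ones on the subdiagonal and the coefficient vector c as its last
    column, so (C v)[0] = c[0] * v[-1] and (C v)[i] = v[i-1] + c[i] * v[-1].
    """
    last = v[-1]
    return [c[0] * last] + [v[i - 1] + c[i] * last for i in range(1, len(c))]


def power_method(c, v_start, iterations=100):
    """Apply C repeatedly without normalization."""
    v = list(v_start)
    for _ in range(iterations):
        v = companion_apply(c, v)
    return v


c_one = [0.2, 0.3, 0.5]                    # hypothetical coefficients summing to 1
v_start = [1.0 / len(c_one)] * len(c_one)  # uniform start vector (temporal mean)
v_one = power_method(c_one, v_start)

# Frames as columns of a tiny 2 x 3 data matrix (illustrative values only).
frames = [[10.0, 12.0, 11.0],
          [20.0, 19.0, 21.0]]
background = [sum(row[j] * v_one[j] for j in range(len(v_one))) for row in frames]
```

In this form, each application of the matrix preserves the sum of the vector entries when the coefficients sum to one. Without normalization, the iterate therefore keeps the unit sum of the start vector, and the resulting background is an affine combination of the frames. In the example, the two remaining eigenvalues have modulus \(\sqrt{0.2} \approx 0.45\), so 100 iterations are more than sufficient.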

4.4 Extensions

In the previous subsection, we presented the basic algorithm, which computes grayscale background models that are sufficient for most applications. However, our approach is versatile and can be extended to RGB data, video data with periodic parts, and streaming data. Moreover, these extensions can be combined.

RGB Data Most background modeling algorithms only work with grayscale video frames because many formulations cannot handle multiple color channels at the same time. Moreover, the extensive computational overhead produced by additional data usually does not justify the slight improvements regarding accuracy. In contrast, our approach can be easily extended while keeping the computational overhead small. For this, we first follow Tirunagari et al. [35] to perform DMD on RGB-based video frames. Analogously to the left box of Fig. 3, the three channels of an RGB-based video frame are vectorized and stacked on top of each other. Hence, the data for DMD has dimensionality \(3n \times (m+1)\). The remaining steps of our approach can be performed in the same way as shown in Fig. 3 or Algorithm 2 (note that the background model has the same organized structure as the RGB input data).
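The stacking of the color channels can be illustrated by a short sketch. The row-wise flattening, the channel order R, G, B, and the function names are assumptions made for illustration; any fixed ordering works as long as the background model is unstacked in the same way.

```python
def stack_rgb(frame):
    """Vectorize an H x W x 3 frame into one vector of length 3 * H * W.

    Each channel is flattened row by row; the channels are then stacked
    in the order R, G, B (the ordering is an illustrative assumption).
    """
    return [pixel[ch] for ch in range(3) for row in frame for pixel in row]


def unstack_rgb(vec, height, width):
    """Invert stack_rgb: rebuild the H x W x 3 frame from the stacked vector."""
    n = height * width
    return [[[vec[ch * n + r * width + col] for ch in range(3)]
             for col in range(width)]
            for r in range(height)]


# A tiny 1 x 2 frame with two RGB pixels (illustrative values).
frame = [[[255, 0, 10], [128, 64, 20]]]
x = stack_rgb(frame)                 # one column of the 3n x (m+1) data matrix
restored = unstack_rgb(x, height=1, width=2)
```

Since the stacked vector is three times as long as a grayscale one, every step that is linear in \(n\) becomes roughly three times as expensive.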

Fig. 4
figure 4

Our extended DMD approach for background modeling. Using the power method multiple times, additional periodic parts in the background can be extracted. These periodic background components are characterized by DMD triples with eigenvalues on the unit circle (highlighted by orange circles)

Data with periodic background parts Since DMD is based on a superposition principle (see Eqs. 4 and 5), periodic parts in the background can be represented by additional DMD triples whose eigenvalues \(\lambda _j\) capture the frequency. As Fig. 4 shows, periodic processes are characterized by eigenvalues that lie on the unit circle. Hence, our approach can be easily extended in the following way:

Instead of computing only the static component, we use the power method for the ones that lie on the unit circle as well (see Fig. 4). More precisely, since the power method computes the greatest (in absolute value) eigenvalue, the shifted power method is used as long as values on the unit circle are detected (multiple applications with a fixed maximum still result in a computational complexity of \(\mathcal {O}(m)\)). Using the set

$$\begin{aligned} I_{\mathbb {1}+} = \{ j \in \{1,\dots ,r_0\} ~ ; ~ \left||\lambda _j |-1 \right|< \delta \} \end{aligned}$$

for a given threshold \(\delta \), the background of the video frames can be determined via (compare Eq. 5)

$$\begin{aligned} x_k^{\text {BG}} = \sum _{j \in I_{\mathbb {1}+}} \lambda _j^k b_j \vartheta _j, \end{aligned}$$

where the coefficients \(b = (b_j)_{j \in I_{\mathbb {1}+}}\) need to be computed separately. To do this, we represent the background components by matrices, i.e.,

$$\begin{aligned} Z^{\text {BG}} = \begin{bmatrix} x_0^{\text {BG}}&\dots&x_m^{\text {BG}} \end{bmatrix} = \Theta _{\mathbb {1}+} \text {diag}(b) V_{\mathbb {1}+}, \end{aligned}$$

where \(\Theta _{\mathbb {1}+} = \begin{bmatrix} \vartheta _j \end{bmatrix}_{j \in I_{\mathbb {1}+}}\) contains the respective modes column-wise and \(V_{\mathbb {1}+}\) is the Vandermonde matrix to the eigenvalues \((\lambda _j)_{j \in I_{\mathbb {1}+}}\). Then, a minimization problem in the spirit of RPCA is used:

$$\begin{aligned} \min _b \Vert Z - \Theta _{\mathbb {1}+} \text {diag}(b) V_{\mathbb {1}+} \Vert _F. \end{aligned}$$

The problem can be easily solved by the Moore-Penrose pseudoinverse as \(I_{\mathbb {1}+}\) is typically very small:

$$\begin{aligned} \text {diag}(b) = \Theta _{\mathbb {1}+}^+ Z V_{\mathbb {1}+}^+. \end{aligned}$$

Since the matrix is nearly diagonal, the off-diagonal elements are neglected. The resulting solution respects the complex conjugation of eigenvalues \(\lambda _j\) such that the scaled modes \(b_j \vartheta _j\) occur in the same complex conjugate pairs. Consequently, the sum is real-valued and, therefore, we can readily generate a background model. This technique is illustrated with an example in Fig. 4. It can be observed that our extended approach captures the movement of the escalator precisely by two additional complex conjugate DMD components.
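The selection of the background components and the real-valued superposition can be checked with a small sketch. All DMD triples below are hypothetical and chosen by hand; in practice they result from the shifted power method and the amplitude computation described above.

```python
import cmath


def select_background(eigenvalues, delta):
    """Indices j with | |lambda_j| - 1 | < delta."""
    return [j for j, lam in enumerate(eigenvalues) if abs(abs(lam) - 1.0) < delta]


def background_frame(k, eigenvalues, amplitudes, modes, indices):
    """Sum lambda_j^k * b_j * mode_j over the selected indices."""
    n = len(modes[0])
    return [sum(eigenvalues[j] ** k * amplitudes[j] * modes[j][i] for j in indices)
            for i in range(n)]


# Hypothetical DMD triples: one static component, one conjugate pair on the
# unit circle (period of 8 frames), and one decaying foreground component.
omega = 2 * cmath.pi / 8
eigenvalues = [1.0, cmath.exp(1j * omega), cmath.exp(-1j * omega), 0.6]
amplitudes = [2.0, 0.5 + 0.25j, 0.5 - 0.25j, 1.0]
modes = [[1.0, 1.0], [1 + 2j, 3 - 1j], [1 - 2j, 3 + 1j], [5.0, -5.0]]

indices = select_background(eigenvalues, delta=1e-3)
x_bg = background_frame(3, eigenvalues, amplitudes, modes, indices)
```

Because the second and third triple are complex conjugates of each other, their contributions add up to twice the real part of one of them. The imaginary part of `x_bg` therefore vanishes up to rounding, and the background repeats every 8 frames.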

However, since the existence of additional appropriate eigenvalues (besides \(\lambda _\mathbb {1} = 1\)) is not guaranteed, this approach may select DMD components that do not belong to the background. Nevertheless, our experiments have shown that the procedure yields a robust algorithm if periodic patterns occur in the data (like the escalator in Fig. 4).

Fig. 5
figure 5

Qualitative comparison of the results. From left to right: original frame (ORIG), groundtruth (GT), ncRPCA (a), R2PCP (b), FW-T (c), GROUSE (d), PG-RMC (e), RPCA-GD (f), pROST (g), OSTD (h), ROSL (i), D-S-NMF (j), cDMD (k), exDMD (l), our algorithm gray effexDMD4B (m) and our algorithm RGB effexDMD4B (n). From top to bottom: see description of scenarios in Table 2

Streaming data for video surveillance Background modeling is often used for video surveillance, where data are continually acquired, i.e., it is assumed that the data \(x_0,\dots ,x_T\) are given (together with the corresponding least-squares solution) and that \(x_{T+1}\) is new. For the application of our approach to such streaming data, Algorithm 2 needs to be modified slightly. Whereas the projection (Line 4), the power method (Line 5), and the background model computation (Line 6) can be used in the same way, the least-squares method has to be altered.

In short, the problem is to find a solution \(c_\text {new} \in \mathbb {R}^{T+1}\) for the minimization problem

$$\begin{aligned} \min \Vert \begin{bmatrix} x_0&\dots&x_{T} \end{bmatrix} c - x_{T+1} \Vert , \end{aligned}$$

whereas the previously computed solution \(c_\text {old} \in \mathbb {R}^{T}\) solves the minimization problem \(\min \Vert \begin{bmatrix} x_0&\dots&x_{T-1} \end{bmatrix} c - x_{T} \Vert \). There are two different techniques to tackle this problem, and both enable a streaming application in real time.

The first one uses an incremental approach for solving least-squares problems, e.g., an incremental singular value decomposition, for which many implementations are available. The second technique uses the least-squares method from Algorithm 2 with a modified start value that stems from the previous computation. We propose choosing

$$\begin{aligned} c_\text {new} = \frac{1}{2} \begin{pmatrix} c_\text {old} \\ 1 \end{pmatrix} \end{aligned}$$

because \(c_\text {new}\) approximates the last relevant snapshot:

$$\begin{aligned} \begin{bmatrix} x_0&\dots&x_{T} \end{bmatrix} c_\text {new}&= \frac{1}{2} \begin{bmatrix} x_0&\dots&x_{T} \end{bmatrix} \begin{pmatrix} c_\text {old} \\ 1 \end{pmatrix} \\&= \frac{1}{2} \left( \begin{bmatrix} x_0&\dots&x_{T-1} \end{bmatrix} c_\text {old} + x_T\right) \\&\approx \frac{1}{2} x_T + \frac{1}{2} x_T \\&= x_T. \end{aligned}$$
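The derivation above can be reproduced numerically with a toy stream. The frames, the coefficient vector, and the helper names are hypothetical; the previous solution is chosen such that it fits \(x_T\) exactly, so the approximation in the derivation becomes an equality.

```python
def matvec(columns, coeffs):
    """Linear combination of the given column vectors."""
    n = len(columns[0])
    return [sum(col[i] * cj for col, cj in zip(columns, coeffs)) for i in range(n)]


def warm_start(c_old):
    """Start value for the next least-squares solve: 0.5 * (c_old, 1)."""
    return [0.5 * cj for cj in c_old] + [0.5]


# Hypothetical stream with a slowly changing scene: x_0, ..., x_3 (T = 3).
frames = [[10.0, 20.0], [10.5, 20.5], [11.0, 21.0], [11.5, 21.5]]

# c_old reproduces x_3 from x_0, x_1, x_2 exactly here: x_3 = 2 * x_2 - x_1.
c_old = [0.0, -1.0, 2.0]
c_new = warm_start(c_old)

approx = matvec(frames, c_new)  # equals x_3 because c_old fits x_3 exactly
```

Due to the temporal coherence of video frames, \(x_T\) is usually close to the new snapshot \(x_{T+1}\), so the iterative solver starts from a small residual.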

4.5 Implementation

Algorithm 2 was implemented in MATLAB R2019a, which already provides many optimized functions. For the least-squares method in Line 3, the internal function named “lsqr” was used with a start value according to Eq. 16. The maximum number of iterations was not limited. For the power method in the non-extended case, the companion matrix \(C_\mathbb {1}\) was applied to the start value \(v_\text {start}\) (see Eq. 17) 100 times. Our tests have shown that this number of iterations produces accurate and robust background models. The procedure is done without normalization of the vector. Moreover, for the matrix-vector multiplication, the low-dimensional and sparse structure of the companion matrix should be taken into account (i.e., only \(2m+1\) entries are non-zero). For the power method in the extended case, the internal function “eigs” was used to compute various eigenvalues.

5 Experimental Evaluation

For the evaluation of our approach, two well-known datasets for foreground-background separation were used: the CDnet 2014 dataset [38] and the I2R dataset [22] (used for the periodic background extension). We compare our approach effexDMD4B (both gray and RGB) with ten algorithms that stem from the LRSlibrary [32], which were chosen from different method categories and speed classes: ncRPCA [18], R2PCP [17], FW-T [25], GROUSE [2], PG-RMC [8], RPCA-GD [40], pROST [15], OSTD [33], ROSL [31], and D-S-NMF [36]. The selection was also oriented toward the results of the corresponding review by Bouwmans et al. [6]. Moreover, the traditional DMD approach (exDMD) [14] and the compressed variant (cDMD) [9] are evaluated as well.

Table 1 Quantitative comparison of the results in terms of the \(F_1\)-measure

The evaluation includes qualitative (see Fig. 5) and quantitative (see Table 1) experimental results as well as computation time measurements. For each algorithm, either the standard parameter setup was chosen (from the LRSlibrary) or the suggested settings from the associated publication. For the final foreground mask, a \(5 \times 5\) median filter was used and the best result from four fixed thresholds was chosen to guarantee a fair comparison. The results were computed with MATLAB R2019a on a machine with a 2.90 GHz Intel Core i9-8950HK processor and 32 GB RAM.
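For reference, the \(F_1\)-measure is the harmonic mean of precision and recall. The following minimal sketch shows how the best score over a set of fixed thresholds could be determined. It assumes that the foreground mask is obtained by thresholding the absolute difference between a frame and the background model. The median filter is omitted, and the threshold values as well as the pixel values are illustrative only.

```python
def f1_score(mask, ground_truth):
    """F1-measure of a binary foreground mask against the ground truth."""
    tp = sum(1 for m, g in zip(mask, ground_truth) if m and g)
    fp = sum(1 for m, g in zip(mask, ground_truth) if m and not g)
    fn = sum(1 for m, g in zip(mask, ground_truth) if not m and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(frame, background, ground_truth, thresholds):
    """Threshold |frame - background| and keep the best F1 over the thresholds."""
    scores = []
    for t in thresholds:
        mask = [abs(f - b) > t for f, b in zip(frame, background)]
        scores.append(f1_score(mask, ground_truth))
    return max(scores)


# Vectorized toy frame; all values and thresholds are illustrative only.
frame = [12.0, 80.0, 75.0, 10.0, 30.0]
background = [10.0, 11.0, 12.0, 10.0, 11.0]
ground_truth = [False, True, True, False, False]

score = best_f1(frame, background, ground_truth, thresholds=[5, 15, 25, 35])
```

In this toy example, the two lower thresholds produce one false positive (the last pixel), whereas the two higher thresholds recover the ground truth exactly.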

Table 2 Description of chosen scenarios
Fig. 6
figure 6

Precision–recall curves for the three scenarios pedestrians, fountain02, and tram (from left to right), which are chosen from different categories. For comparison, the three most accurate algorithms RPCA-GD, ROSL, and D-S-NMF are selected as well as all other DMD-based methods

5.1 Accuracy

In Fig. 5, results from the above-mentioned algorithms are compared qualitatively. An overview of the challenges of each scenario can be found in Table 2. Whereas some algorithms struggle with shadows (2nd row), rippling water (3rd row), or low framerates (9th and 10th row), our approach achieves adequate and robust results. For the bad weather (5th and 6th row) and thermal (7th and 8th row) scenarios, we achieve visually appealing foreground masks, in particular, for the 5th and 7th row. Only in the 1st and 4th row does our foreground mask (as well as the others) struggle with specific components of the frame, like glass, reflections, or spatial coherence.

These observations match the quantitative outcomes in Table 1, where the \(F_1\)-measure is computed for a selection of frames (for which the ground truth is available). We observe that both the grayscale and the RGB version of our approach consistently obtain good results and remain robust, which is also true for GROUSE, RPCA-GD, D-S-NMF, and ROSL. In contrast, the previous DMD approaches, cDMD and exDMD, often fail to detect foreground objects accurately (e.g., for canoe, skating, and tram), leading to scores less than 0.6. Moreover, many algorithms have problems in computing background models for bigger scenarios because computationally intensive matrix calculations have to be performed. To avoid these issues, either subsampling techniques (which we have not used) or partitioning techniques need to be exploited (see Table 2).

According to the quantitative analysis, our approach effexDMD4B can compete with other state-of-the-art algorithms in all scenarios. Whereas our RGB version achieves the best average scores along with RPCA-GD, our grayscale approach has the fourth-best average score after the algorithm ROSL. Other accurate algorithms are D-S-NMF, GROUSE, and pROST, achieving average scores higher than 0.8. The two other DMD-based background algorithms have the worst average scores due to their robustness issues. We therefore emphasize that enforcing an eigenvalue \(\lambda _\mathbb {1}\) resolves the robustness problems of DMD and leads to more accurate results. To substantiate this statement, several precision–recall curves are presented in Fig. 6. Whereas cDMD and exDMD often fail to score good precision and recall values, we observe that our approach achieves accurate results for a variety of thresholds.

Fig. 7
figure 7

Comparison of computation time (in seconds) for a test dataset with a resolution of \(720 \times 480\) and 500 frames

5.2 Computation Time

To demonstrate the efficiency of our algorithm, we first compare the overall computation time (except for the foreground mask procedure) of all algorithms for a given test dataset. The test dataset is based on the blizzard scenario, has a resolution of \(720 \times 480\), and consists of 500 frames. The comparison is illustrated in Fig. 7. We observe that the grayscale version of effexDMD4B is extremely efficient, calculating the background model in nearly 6 seconds. The RGB version needs approximately three times as much computation time. This highlights the fact that Algorithm 2 evolves linearly in space because effexDMD4B_RGB has to deal with three color channels. The traditional DMD algorithm exDMD requires nearly 35 times more computation time than our algorithm (gray), whereas cDMD only needs 1.5 times more computation time due to its high compression rate. Among the algorithms from the LRSlibrary, D-S-NMF is almost as fast as our approach. The other algorithms, like ncRPCA, FW-T, and ROSL (which also belong to the highest speed classification), are at least an order of magnitude slower.

Fig. 8
figure 8

Comparison of computation time (in seconds) as function of number of frames for the five most efficient algorithms

Besides the absolute computation time, we also want to demonstrate how the algorithm scales when the number of frames grows. Therefore, Fig. 8 shows the computation time for the six most efficient algorithms as a function of the number of frames. We observe that our approach effexDMD4B (grayscale and RGB) as well as D-S-NMF evolve linearly as the lines coincide with their dashed lines (linear least-squares fit). Previous DMD-based methods, like cDMD (or exDMD), suffer considerably from operations that exhibit quadratic and cubic computational complexity (due to the computation of all DMD components, which our approach avoids). This can be observed by the purple dashed line, which does not match the solid purple line. A similar behavior can be observed for ROSL and, even worse, for FW-T.

6 Discussion and Conclusion

In this paper, we have presented a DMD-based approach for the computation of background models. According to the qualitative and quantitative results, our approach can compete with other state-of-the-art methods. It produces accurate background models (just like RPCA-GD and ROSL) and remains robust, which is a significant improvement compared to previous DMD-based algorithms.

However, the real strengths of effexDMD4B are its high efficiency and flexibility. It is extremely fast and additionally evolves linearly in time and space, whereas most other algorithms exhibit quadratic or even cubic computational complexities. Moreover, it can be applied to a wide range of scenarios with different challenges without adjustment, whereas many background modeling algorithms need modified parameter settings to produce accurate results. Another sort of flexibility is the extension of our approach to RGB data, data with periodic background parts, and streaming data.

Since DMD offers great potential, we plan to extend our approach to video data that exhibit panning, zooming, jittering, and intermittent elements. This may include the use of different pre- and post-processing tools, like spatio-temporal smoothing, to obtain a more accurate foreground mask. Another idea is to replace lsqr with other iterative least-squares methods that respect the special temporal coherence structure of our minimization problem. This consideration is particularly important for the application to streaming data, where special emphasis is on the least-squares method (since an execution is needed in every update step). Moreover, since DMD can deal with compressed data (due to the spectral decomposition), a version of effexDMD4B with compression techniques may further boost efficiency.