1 Introduction

Dynamic mode decomposition (DMD) [16] is an analysis method originally proposed in the field of fluid dynamics, and in recent years it has attracted much attention as a method for analyzing time-series data, driven by the rise of data science. The effectiveness of dynamic mode decomposition has been verified for various time-series data by many researchers, and the algorithmic developments of recent years have been remarkable [11, 12].

Time-series data are often contaminated with random noise in data science applications. Due to its practical importance, the development of numerical methods that are robust against noise has been promoted in recent years. The first such attempt appeared in 2016 [7], where noise-corrected DMD, forward-backward DMD, and total least squares DMD were proposed. The total least squares DMD was also proposed in [10] at about the same time, with detailed results given specifically for this method. In 2018 [3], the optimized DMD [6] proposed around 2012 was revisited from the viewpoint of robustness to noise, and its good performance was reported. Furthermore, in 2019, the consistent DMD [4] was proposed with an emphasis on noise resistance. Apart from such studies, [17] gives a statistical model expressing the noise, presents a framework for removing it by estimating the probability distribution from the data, and specifically proposes Bayesian DMD among other methods. As described above, although the development of algorithms aimed at removing noise has advanced recently, their effects have mainly been verified by numerical experiments. Asymptotic analysis in the statistical sense remains rare; the few existing results originate from the statistical analysis by Gleser [8, Lemma 3.3].

With this background, we focus on the statistical model of the DMD for noisy datasets with the aid of numerical analysis for matrix computation, i.e., numerical linear algebra. From the viewpoint of numerical linear algebra, the algorithmic aspect of the DMD is explained as follows. The DMD is a dimensionality reduction algorithm using matrix datasets \(X,Y\in {\mathbb {R}}^{m\times n}\) regarded as snapshots of a dynamical system. The aim is to find \(r(\le \min (m,n))\) eigenvalues and the corresponding eigenvectors of an unknown matrix A with \(\mathrm{rank}(A)\approx r\) such that \(AX\approx Y\). Usually, the matrix datasets \(X,Y\in {\mathbb {R}}^{m\times n}\) are contaminated with noise. Under a statistical model of the noise, the total least squares (TLS) method has been successfully applied to the DMD [7, 10]. In [1], a statistical model of the noise suited to convergence theory is presented, under which strong convergence of the TLS-based DMD can be proved in a straightforward manner.

The aim of this paper is to develop the convergence theory in a more general situation, where an efficient dimensionality reduction technique is applied to the DMD. This technique is interpreted as the proper orthogonal decomposition (POD) for dimensionality reduction. Importantly, the strong convergence can be proved in a straightforward manner for the TLS-DMD combined with the POD under reasonable assumptions.

This paper is organized as follows. Section 2 is devoted to a description of the previous study. In Sect. 3, we construct a general framework of the projected TLS-DMD with the POD basis. In Sect. 4, we prove strong convergence of the proposed method in a statistical sense. Section 5 describes the conclusion of this paper.

2 Previous study

We consider a dynamical system, producing a sequence of vectors \({\varvec{z}}_{k}\in {\mathbb {R}}^{m}\ (k=0,1,\ldots )\) such that

$$\begin{aligned} {\varvec{z}}_{k+1}\approx A{\varvec{z}}_{k}\quad (k=0,1,\ldots ) \end{aligned}$$
(1)

for an \(m \times m\) matrix A, where the meaning of \(\approx \) is defined in the following sections with mathematical rigor. In addition, we assume that \({\varvec{z}}_{k}\in {\mathbb {R}}^{m}\ (k=0,1,\ldots )\) can be partially observed as follows. For positive integers \(k_{1},k_{2},\ldots ,k_{n}\), data pairs \(({\varvec{z}}_{k_{\ell }},{\varvec{z}}_{k_{\ell }+1})\) are given for \(\ell =1,2,\ldots , n\). Define \(X,Y \in {\mathbb {R}}^{m\times n}\) such that

$$\begin{aligned} X:=[{\varvec{z}}_{k_{1}},{\varvec{z}}_{k_{2}},\ldots , {\varvec{z}}_{k_{n}}],\quad Y:=[{\varvec{z}}_{k_{1}+1},{\varvec{z}}_{k_{2}+1},\ldots , {\varvec{z}}_{k_{n}+1}]. \end{aligned}$$
(2)

Then we have \(AX\approx Y\) from (1).

Our problem setting is that datasets presented as \(m\times n\) matrices \(X,Y\) are given. We aim to find the r eigenvalues with large absolute values and the corresponding eigenvectors of the unknown matrix A, where r is considerably smaller than \(\min (m,n)\). This aim reflects the assumption that A can be approximated by a rank r matrix. Basically, the DMD is used to compute the above eigenpairs without explicitly constructing the large-scale matrix A. For this purpose, the singular value decomposition (SVD) [9, Sect. 6.3] is applied. More precisely, the truncated SVD is computed to approximate X in the DMD algorithm. The algorithm is written as follows.

Algorithm 1 (DMD)
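As an illustration, the following MATLAB sketch implements the standard SVD-based DMD outlined above. It is a minimal sketch only: the variable names and the exact ordering of the steps are ours and may differ from the precise statement of Algorithm 1.

% Minimal MATLAB sketch of the SVD-based DMD for snapshot matrices X, Y
% (m x n) and a target rank r; variable names are illustrative.
[U, S, V] = svd(X, 'econ');                       % thin SVD of X
Ur = U(:, 1:r); Sr = S(1:r, 1:r); Vr = V(:, 1:r); % truncated SVD (rank r)
A_tilde = Ur' * Y * Vr / Sr;                      % r x r projected matrix
[W_tilde, Lambda] = eig(A_tilde);                 % small eigenvalue problem
lambda_dmd = diag(Lambda);                        % approximate eigenvalues of A
w_dmd = Ur * W_tilde;                             % approximate eigenvectors of A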

Remark 1

Throughout the paper, the first r singular values and vectors mean the r largest singular values and the corresponding singular vectors. In addition, we assume that \(\mathrm{rank}(X)\) and \(\mathrm{rank}(Y)\) are not smaller than r. This feature is consistent with the problem setting below.

The above matrix expression of the DMD shows relations to other decomposition techniques based on the least squares methods. See [16, Sect. 2.2] for the details.

2.1 Noise model of the DMD

Usually, the observed snapshot datasets are contaminated with noise. In addition, with (1) and (2) in mind, we consider the following simple statistical model. Let

$$\begin{aligned} A{\bar{X}}={\bar{Y}}, \end{aligned}$$

where \({\bar{X}}\) and \({\bar{Y}}\) are not contaminated with noise. In addition, let

$$\begin{aligned} X={\bar{X}}+E,\ Y={\bar{Y}}+F, \end{aligned}$$

where all the elements of \(E,F\) are random variables. Hence, we have \(AX\approx Y\) for the above exact A associated with the matrix datasets without noise. The above noise model can be applied to time series data related to the Koopman spectral analysis for dynamical systems; see [2, 5, 14, 17, 18] for the details.

Noting a particular feature of the Koopman spectral analysis, we can introduce the following normalization for our problem setting. Letting \({\mathcal {R}}\) denote the range of the argument matrix, we assume \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\) in the following discussion. The theoretical justification of this assumption is explained in [1].

Simply put, if the dynamical system forming the time series is exactly \({\varvec{z}}_{k+1}=A{\varvec{z}}_{k}\ (k=0,1,\ldots )\), we can see that \({\mathcal {R}}(X)={\mathcal {R}}(Y)(\subset {\mathcal {R}}(A))\) by the definitions in (2). However, in practical situations, \({\varvec{z}}_{k+1}=A{\varvec{z}}_{k}\) holds only approximately at each step k, as in (1), and thus X and Y are defined based on \({\bar{X}}\) and \({\bar{Y}}\) above, taking stochastic perturbations into account.

With such a background, we consider the next noise model, slightly different from [1].

Problem 1

Find r nonzero eigenvalues \(\lambda _{i}\ (i=1,\ldots ,r)\) and the corresponding eigenvectors \({\varvec{w}}_{i}\ (i=1,\ldots ,r)\) of \(A \in {\mathbb {R}}^{m\times m}\) with rank \(r(\le m)\) under the following assumptions:

  • Assume that unknown matrices \({\bar{X}},{\bar{Y}}\in {\mathbb {R}}^{m\times n}\) and \(A \in {\mathbb {R}}^{m\times m}\) satisfy

    $$\begin{aligned} A{\bar{X}}={\bar{Y}},\ {\bar{Z}}=[{\bar{X}}^{\top },{\bar{Y}}^{\top }]^{\top },\ \mathrm{rank}({\bar{Z}})=r,\ {\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}}). \end{aligned}$$
  • Let \(\lambda _{i}\ (i=1,\ldots ,r)\) denote nonzero eigenvalues of A. Assume that the eigenvalues \(\lambda _{i}\ (i=1,\ldots ,r)\) are all distinct, i.e., \(\lambda _{i}\not =\lambda _{j}\ (1\le i\not = j \le r)\). Moreover, let \({\varvec{w}}_{i}\ (i=1,\ldots ,r)\) denote the eigenvectors corresponding to \(\lambda _{i}\ (i=1,\ldots ,r)\). In other words,

    $$\begin{aligned} A{\varvec{w}}_{i}=\lambda _{i}{\varvec{w}}_{i}\ (i=1,\ldots ,r),\quad \lambda _{i}\not =\lambda _{j}\ (1\le i\not = j \le r). \end{aligned}$$
  • Observed matrices are \(X,Y\in {\mathbb {R}}^{m\times n}\) such that

    $$\begin{aligned}X={\bar{X}}+E,\quad Y={\bar{Y}}+F,\end{aligned}$$

    where \(E,F \in {\mathbb {R}}^{m\times n}\) are random matrices.

In the above problem setting, we assume that the nonzero eigenvalues of A are all simple, unlike the statistical model in [1], because we aim at a different generalization concerning algorithmic development and its convergence theory under this simple condition. Our purpose is to perform an asymptotic convergence analysis for \(n\rightarrow \infty \) under reasonable assumptions. To this end, we consider the following assumptions in this paper.

Assumption 1

The columns of \([E^{\top },F^{\top }]^{\top }\) are i.i.d. with common mean vector \({\varvec{0}}\in {\mathbb {R}}^{2m}\) and common covariance matrix \(\sigma ^{2}I \in {\mathbb {R}}^{2m\times 2m}\). Moreover, \(\lim _{n\rightarrow \infty }n^{-1}{\bar{X}}{\bar{X}}^{\top }\) exists.

Assumption 2

Under Assumption 1, the rank of the matrix \(\lim _{n\rightarrow \infty }n^{-1}{\bar{X}}{\bar{X}}^{\top }\) is r in Problem 1.

Under the above assumptions, we consider numerical methods for solving Problem 1. The parameter estimation of the above noise model is relevant to the total least squares (TLS) problems as in [8, 13]. In the following, we explain the TLS and its application to the DMD.

Remark 2

In Problem 1, the condition \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\) might be too strong. In addition, one may think that Assumptions 1 and 2 are unrealistic for time series data. While the above criticisms are valid, it should be noted that, to the best of the author’s knowledge, there are currently no analytical results for stochastic perturbations when projection is used for the DMD and its variants. This study aims to prove consistency under the simplest of conditions, which is a significant contribution in itself. Each of the conditions imposed in Problem 1 is based on the following background and is reasonable to some extent.

The condition \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\) is due to (1) and (2). In terms of the theoretical aspects of Problem 1, this condition leads to the uniqueness of the target eigenpairs of A, even though A itself is not unique [1, Lemma 9].

Assumptions 1 and 2 originally come from the asymptotic analysis of an errors-in-variables linear regression model [8, Assumptions A, B, and C]. It is an important future issue to examine how natural these assumptions are for time series data from dynamical systems. The following is an intuitive reason why the assumptions seem to hold. In the DMD, we assume that the nonzero eigenvalues of the matrix A lie approximately on the unit circle of the complex plane. This is because the goal is to analyze the stable behavior of the system over time. The time series \({\varvec{z}}_{k_{\ell }}\ (\ell =1,\ldots ,n)\) in the columns of the matrix X then have approximately the same magnitude, and \(n^{-1}XX^{\top }\) is the average of \({\varvec{z}}_{k_{\ell }}{\varvec{z}}_{k_{\ell }}{}^{\top }\ (\ell =1,\ldots ,n)\). Since the absolute values of the nonzero eigenvalues of A are approximately equal, it is unlikely that the information on a particular eigenvector will disappear, and we can expect that the necessary information on the r eigenvectors will remain in X. This property leads to Assumptions 1 and 2 for \({\bar{X}}\). Formulating and proving this property with mathematical rigor is an issue to be addressed in the future.

2.2 Total least squares DMD (TLS-DMD)

For given \(X,Y \in {\mathbb {R}}^{m\times n}\), the total least squares (TLS) problem is formulated as

$$\begin{aligned}&\min _{A_{\mathrm{tls}}, \varDelta X, \varDelta Y} \Vert \varDelta X\Vert _{\mathrm{F}}^{2}+\Vert \varDelta Y\Vert _{\mathrm{F}}^{2} \\&\mathrm{s.t.}\ Y+\varDelta Y=A_{\mathrm{tls}}(X+\varDelta X). \end{aligned}$$

For computing \(A_{\mathrm{tls}}\), there is a sophisticated method using the singular value decomposition (SVD) [9, Sect. 6.3]. Although \(A_{\mathrm{tls}}\) is usually assumed to be nonsingular, the rank-deficient case is investigated in [13]. In [7, 10], such a method is successfully applied to the DMD, where \(m\le n\) is assumed. We consider the method in [10] as follows.

Algorithm 2 (TLS-DMD)
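Since the algorithm box is not reproduced here, the following MATLAB sketch is assembled from the properties stated around (3) and in Sect. 4: the data are projected onto the leading right singular vectors of the stacked matrix, followed by the Rayleigh–Ritz step. The variable names and the exact step ordering are ours and may differ from the printed Algorithm 2.

% Minimal MATLAB sketch of the TLS-DMD for X, Y (m x n) and rank r,
% based on the description in the text; names are illustrative.
Z = [X; Y];                                 % stacked 2m x n data matrix
[~, ~, Q] = svd(Z, 'econ');                 % right singular vectors of Z
Qr = Q(:, 1:r);                             % first r right singular vectors
Xhat = X * Qr;  Yhat = Y * Qr;              % m x r matrices (cf. line 3)
[U, S, V] = svd(Xhat, 'econ');              % SVD of Xhat, assumed to have rank r
A_tilde = U' * Yhat * V / S;                % = U' * A_tls * U as in (3)
[W_tilde, Lambda] = eig(A_tilde);
lambda_dmd = diag(Lambda);                  % approximate eigenvalues of A
w_dmd = U * W_tilde;                        % approximate eigenvectors of A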

Important features of this algorithm are described as follows. In the following, \(\dag \) means the Moore–Penrose inverse. In the above algorithm, let

$$\begin{aligned} {\widehat{X}}^{\dag }=V\varSigma ^{-1} U^{\top }, \quad {\widehat{A}}_{\mathrm{tls}}={\widehat{Y}}{\widehat{X}}^{\dag }. \end{aligned}$$
(3)

Note that, in the case of \(r=m\le n\), the matrix \({\widehat{A}}_{\mathrm{tls}}\) is the solution of the TLS problem. In general, for any \(r\le \min (m,n)\), it is easy to see that \({\widehat{A}}_{\mathrm{tls}}\) in (3) satisfies \({\widehat{A}}_{\mathrm{tls}}{\widehat{X}}={\widehat{Y}}\) for \(m\times r\) matrices \({\widehat{X}},{\widehat{Y}}\) in line 3 of Algorithm 2, where we assume \(\mathrm{rank}({\widehat{X}})=r\). More importantly,

$$\begin{aligned} {\widetilde{A}}_{\mathrm{dmd}}=U^{\top }{\widehat{A}}_{\mathrm{tls}}U. \end{aligned}$$

This implies that Algorithm 2 computes eigenpairs of \({\widehat{A}}_{\mathrm{tls}}\) by the Rayleigh–Ritz procedure [15, Sect. 4.3] using the r-dimensional subspace spanned by the columns of U.

2.3 Strong consistency of the TLS-DMD

First of all, note that, in general, the strong convergence of random variables is defined as follows.

Definition 1

(Strong convergence) The sequence of random variables \({\mathcal {X}}^{(n)}\ (n=0,1,\ldots )\) converges to \({\mathcal {X}}\) with probability one if and only if

$$\begin{aligned} \lim _{n \rightarrow \infty }{\mathcal {X}}^{(n)}(\omega )={\mathcal {X}}(\omega ) \quad \text {for} \quad \omega \in \varOmega {\setminus } \mathrm{N}, \end{aligned}$$

where \(\varOmega \) is a sample space and \(\mathrm{N}\) is a null set for a probability measure.

On the basis of probabilistic analysis of the TLS in [8, 13], the strong convergence of eigenpairs computed by Algorithm 2 is proved in [1]. In other words, the strong consistency of the estimation by the TLS-DMD is theoretically guaranteed for Problem 1, where we assume that \(\mathrm{rank}(A)\) can be estimated by some methods to set \(r:=\mathrm{rank}(A)\).

Theorem 1

(Strong consistency of the TLS-DMD [1]) Suppose that Algorithm 2 is applied to Problem 1 with \(r(<m)\) under Assumptions 1 and 2. Let \(\lambda _{\mathrm{dmd}}^{(n)},{\varvec{w}}_{\mathrm{dmd}}^{(n)}\) denote \(\lambda _{\mathrm{dmd}},{\varvec{w}}_{\mathrm{dmd}}\) in Algorithm 2 for n, respectively. Then each eigenvalue \(\lambda _{\mathrm{dmd}}^{(n)}\) is convergent to a nonzero eigenvalue of A with probability one, and every accumulation point of \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) is an eigenvector corresponding to \(\lambda _{\mathrm{dmd}}^{(\infty )}\) with probability one.

More precisely, the convergence theorem in [1] states that the eigenvectors (or generalized eigenvectors) of a perturbed matrix converge toward the corresponding invariant subspace. However, if we assume that all the nonzero eigenvalues are simple, all the computed eigenpairs converge to the target eigenpairs, as in the above theorem. The proof relies on the continuity of eigenpairs at simple eigenvalues.

3 General framework with the use of the POD approach

Proper orthogonal decomposition (POD) is an efficient technique for dimensionality reduction, which naturally suggests using it together with the DMD. Such a method is often called the projected DMD [10, 18] because the POD basis spans the subspace onto which the data are projected. A typical usage of the POD is based on the truncated SVD of data matrices. As an example,

$$\begin{aligned} X \approx U_{r}\varSigma _{r}V_{r}^{\top }\quad \text {(truncated SVD)}, \end{aligned}$$

where \(\varSigma _{r}\) is a diagonal matrix comprising the r largest singular values of X and \(U_{r},V_{r}\) are the corresponding singular vector matrices. Then the columns of \(U_{r}\) form a POD basis. As another example, one might choose

$$\begin{aligned} Y \approx U_{r}\varSigma _{r}V_{r}^{\top }\quad \text {(truncated SVD)} \end{aligned}$$

due to \({\mathcal {R}}({\bar{Y}})\subset {\mathcal {R}}(A)\) in general cases. If we aim to reduce errors based on the law of large numbers, we might use all the snapshots

$$\begin{aligned} Z:=[X,Y],\quad Z \approx U_{r}\varSigma _{r}V_{r}^{\top }\quad \text {(truncated SVD)} \end{aligned}$$

for the POD basis.
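For concreteness, the three POD-basis choices mentioned above can be written in MATLAB as follows; this is a brief illustrative sketch, and the variable names are ours.

% Three possible POD bases of dimension r, each given by a truncated SVD.
[Ux, ~, ~] = svd(X, 'econ');       Mr_fromX = Ux(:, 1:r);   % based on X
[Uy, ~, ~] = svd(Y, 'econ');       Mr_fromY = Uy(:, 1:r);   % based on Y
[Uz, ~, ~] = svd([X, Y], 'econ');  Mr_fromZ = Uz(:, 1:r);   % based on all snapshots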

In this paper, to cover the above various cases, we construct a general framework for computing the POD basis, resulting in the projected TLS-DMD.

3.1 General projection methods using the POD basis

The aim of this section is to design an algorithm for computing the exact eigenpairs of A in Problem 1 in the asymptotic regime \(n\rightarrow \infty \). To this end, we consider the next algorithm, regarded as a general framework to use the POD basis, where the basis vectors are the columns of a matrix \(M_{r}\in {\mathbb {R}}^{m\times r}.\)

Algorithm 3 (Projected TLS-DMD)
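Since the algorithm box is not reproduced here, the following MATLAB sketch outlines the projected TLS-DMD as described in the text: project the data onto the POD basis \(M_{r}\), apply the TLS-DMD to the projected \(r\times n\) matrices, and lift the computed eigenvectors back to \({\mathbb {R}}^{m}\). The helper tls_dmd is hypothetical and is assumed to implement Algorithm 2 (e.g., as sketched in Sect. 2.2); the names are ours.

% Minimal MATLAB sketch of the projected TLS-DMD for X, Y (m x n), rank r,
% and a POD basis Mr (m x r, orthonormal columns); tls_dmd is a hypothetical
% helper implementing Algorithm 2.
Xp = Mr' * X;   Yp = Mr' * Y;               % r x n projected snapshot matrices
[lambda_dmd, Wp] = tls_dmd(Xp, Yp, r);      % eigenpairs of the projected problem
w_dmd = Mr * Wp;                            % eigenvectors lifted back to R^m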

The matrix \(M_{r}\) is obtained by the SVD as in the above discussion. The use of \(M_{r}\) is related to the Rayleigh–Ritz procedure. More specifically, \(M_{r}\) comprises orthonormal basis vectors for the subspace of the Rayleigh–Ritz procedure. If \(\mathrm{rank}(X)=m\),

$$\begin{aligned} {M_{r}^{\top }Y}(M_{r}^{\top }X)^{\dagger }=M_{r}^{\top }YX^{\dagger }M_{r}, \end{aligned}$$

where \(\dag \) means the Moore–Penrose inverse. In other words, the Rayleigh–Ritz procedure for computing the eigenpairs of \(YX^{\dagger }\) is performed. Due to \(A\approx {Y}{X}^{\dagger }\), the above Rayleigh–Ritz procedure aims to compute the eigenpairs of A.

3.2 Numerical tests

Practical efficiency of the DMD and the POD approach is investigated in previous studies [7, 10]. Here we present simple numerical results to validate the convergence theory of Algorithm 3, proved in the next section, and to show its efficiency. All the tests are performed in MATLAB.

Let \(L\in {\mathbb {R}}^{m\times r}\) denote a random matrix given by MATLAB’s built-in function randn. In our test, let \(m=200,r=2\) and

$$\begin{aligned} A=L\begin{bmatrix} 0 &{} \quad 1 \\ -1 &{} \quad 0 \end{bmatrix}L^{\dagger }, \end{aligned}$$

implying \(\mathrm{rank}(A)=2\), where \(L^{\dagger }\) is computed by MATLAB’s built-in function pinv. In this case, the exact eigenvalues are \(\pm \mathrm{i}\). The matrices \({\bar{X}}\) and \({\bar{Y}}\) are given as follows. Generate \(K \in {\mathbb {R}}^{m\times n}\) as a random matrix with MATLAB’s built-in function randn and let \({\bar{X}}:=AK\) and \({\bar{Y}}:=A{\bar{X}}\) to satisfy the condition \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})(={\mathcal {R}}(A))\) in Problem 1.
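The construction of this test problem can be sketched in MATLAB as follows; the sample size n and the noise level follow the settings described below, and the variable names are ours.

% Sketch of the test-problem construction; n and the noise level sigma
% follow the experimental settings described below.
m = 200;  r = 2;  n = 100;  sigma = 0.1;
L = randn(m, r);
A = L * [0, 1; -1, 0] * pinv(L);            % rank-2 matrix with eigenvalues +/- i
K = randn(m, n);
Xbar = A * K;  Ybar = A * Xbar;             % noise-free data, R(Xbar) = R(Ybar)
X = Xbar + sigma * randn(m, n);             % observed noisy snapshots
Y = Ybar + sigma * randn(m, n);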

The results of the numerical tests are shown in Tables 1 and 2. We used three methods: the original DMD (Algorithm 1), the TLS-DMD (Algorithm 2), and the projected TLS-DMD (Algorithm 3). In Algorithm 3, \(M_{r}\) is given by the SVD of \([X,Y]\). The error of the eigenvalues is defined as the square root of the sum of squares of the errors of each computed eigenvalue. The error of each eigenvector is defined as follows. Let \(\theta \) denote the angle between \({\varvec{w}}\) and \({\varvec{w}}_{\mathrm{dmd}}\). Here \({\varvec{w}}\) and \({\varvec{w}}_{\mathrm{dmd}}\) are normalized such that \(\Vert {\varvec{w}}\Vert = \Vert {\varvec{w}}_{\mathrm{dmd}}\Vert =1\) for the Euclidean norm \(\Vert \cdot \Vert \). Then we have

$$\begin{aligned} |\cos \theta |=|{\varvec{w}}^{\mathrm{H}}{\varvec{w}}_{\mathrm{dmd}}|, \end{aligned}$$

where \(\mathrm{H}\) means the Hermitian transpose due to \({\varvec{w}},{\varvec{w}}_{\mathrm{dmd}}\in {\mathbb {C}}^{m}\). On the basis of this feature, we compute

$$\begin{aligned} 1-|{\varvec{w}}^{\mathrm{H}}{\varvec{w}}_{\mathrm{dmd}}| \end{aligned}$$

for each eigenvector. The square root of their sum of squares is defined as the error of the eigenvectors. The random noise in each component of \(X,Y\) is drawn from the Gaussian distribution \(\mathrm{N}(0,0.1^2)\). Tables 1 and 2 display the averages over 100 trials of the errors for sample sizes \(n=50,100\).
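The error measures described above can be computed as in the following MATLAB sketch, assuming the computed eigenpairs (lambda_dmd, w_dmd) have already been matched to the exact ones (lambda_exact, w_exact) in the same order; these names are ours.

% Error measures for matched eigenpairs; the columns of w_exact and w_dmd
% are assumed to be normalized to unit Euclidean norm.
eig_err = norm(lambda_dmd - lambda_exact);              % eigenvalue error
cosines = abs(sum(conj(w_exact) .* w_dmd, 1));          % |w^H w_dmd| per column
vec_err = norm(1 - cosines);                            % eigenvector error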

Table 1 Errors of eigenvalues for three algorithms
Table 2 Errors of eigenvectors for three algorithms

From Tables 1 and 2, we see the efficiency of the projected TLS-DMD in terms of accuracy and its convergence behavior. More specifically, the projection in Algorithm 3 appears to improve the estimation accuracy. Since the matrix \(M_{r}\) is given by the SVD of \([X,Y]\), which comprises all the snapshots, the subspace spanned by the columns of \(M_{r}\) can be a good approximation of \({\mathcal {R}}({\bar{X}})(={\mathcal {R}}({\bar{Y}}))\). Such practical performance should be investigated in depth with mathematical rigor in the future.

We would like to emphasize here that the main result of this paper is to prove strong consistency of the projected TLS-DMD (Algorithm 3), and thus the detailed numerical and theoretical comparisons with existing methods from the practical viewpoints are beyond the scope of this paper. The theoretical justification of the convergence of the projected TLS-DMD is presented in the next section.

4 Consistency analysis

In this section, we prove strong consistency for Algorithm 3. More specifically, we prove that the computed eigenpairs converge to the exact eigenpairs of A under reasonable assumptions. To this end, we first show an important feature of the Rayleigh–Ritz procedure for denoised matrices in Sect. 4.1. Next, we prove consistency in the statistical sense of Algorithm 3 for noisy datasets in Sect. 4.2. This entire mathematical analysis is the contribution of this paper. Although the strong consistency presented below is a development of the result in [1], its consistency analysis is a completely new approach. From a technical perspective, the discovery of Lemma 5 is crucial, and to take advantage of it, the key properties of projection methods are organized to complete the proof.

As a preparation, we note a basic feature of Algorithm 2. In Algorithm 2, let \(N_{r}\) be a \(2m\times r\) matrix comprising the first r left singular vectors of Z. Even if we define \({\widehat{X}},{\widehat{Y}} \in {\mathbb {R}}^{m\times r}\) such that \([{\widehat{X}}^{\top },{\widehat{Y}}^{\top }]:=N_{r}^{\top }\) in line 3 of Algorithm 2, the eigenpairs computed by this modified algorithm are equal to those in the original algorithm.

The above feature is easily proved as follows. Using a diagonal matrix \(\varSigma _{Z}\in {\mathbb {R}}^{r\times r}\) comprising the nonzero singular values of Z, we have

$$\begin{aligned} N_{r}=ZQ\varSigma _{Z}^{-1} \end{aligned}$$

due to the relationship between the left singular vectors and the right singular vectors. Thus, we have

$$\begin{aligned} {\widehat{X}}=XQ\varSigma _{Z}^{-1},\quad {\widehat{Y}}=YQ\varSigma _{Z}^{-1} \end{aligned}$$

in the modified algorithm. Therefore, \({\widehat{Y}}{\widehat{X}}^{\dagger }=YQ\varSigma _{Z}^{-1}\varSigma _{Z}(XQ)^{\dagger }=YQ(XQ)^{\dagger }\) is equal to \({\widehat{Y}}{\widehat{X}}^{\dagger }\) in Algorithm 2, where \(\dag \) means the Moore–Penrose inverse. Since the TLS-DMD is the Rayleigh–Ritz procedure for \({\widehat{Y}}{\widehat{X}}^{\dagger }\), the modified algorithm is mathematically equivalent to the original algorithm.

Note that the first r left singular vectors of Z and the first r eigenvectors of \(ZZ^{\top }\) are the same. Hence, the above modified algorithm can be described as follows.

Algorithm 4 (Modified TLS-DMD)
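Since the algorithm box is not reproduced here, the following MATLAB sketch gives the modified TLS-DMD described above; by the note preceding the algorithm, the first r left singular vectors of Z may be used in place of the eigenvectors of \(ZZ^{\top }\), and the variable names are ours.

% Minimal MATLAB sketch of the modified TLS-DMD (Algorithm 4) for X, Y
% (m x n) and rank r; names are illustrative.
Z = [X; Y];
[NZ, ~, ~] = svd(Z, 'econ');                % left singular vectors of Z
Nr = NZ(:, 1:r);                            % = first r eigenvectors of Z*Z'
Xhat = Nr(1:m, :);  Yhat = Nr(m+1:2*m, :);  % [Xhat; Yhat] = Nr (cf. line 3)
[U, S, V] = svd(Xhat, 'econ');              % SVD of Xhat, assumed to have rank r
A_tilde = U' * Yhat * V / S;                % Rayleigh-Ritz matrix for Yhat*pinv(Xhat)
[W_tilde, Lambda] = eig(A_tilde);
lambda_dmd = diag(Lambda);
w_dmd = U * W_tilde;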

The next lemma is a basic tool of our mathematical analysis.

Lemma 1

Algorithm 4 is mathematically equivalent to Algorithm 2.

We use Algorithm 4 in the following discussion.

4.1 Deterministic analysis

Our mathematical analysis begins with deterministic analysis. We aim to analyze Algorithm 3. In line 2 of Algorithm 3, Algorithm 2 is applied to the \(r \times n\) matrices \(X,Y\). Noting Lemma 1, we investigate the case of \(r=m\) in Algorithm 4, resulting in the next lemma.

Lemma 2

Suppose that Algorithm 4 is applied with input matrices \(X:={\bar{X}},Y:={\bar{Y}}\), where \({\bar{X}},{\bar{Y}}\) are defined in Problem 1 with \(r=m\). Assume that there exist a nonsingular matrix \(C \in {\mathbb {C}}^{r \times r}\), a diagonal nonsingular matrix \(D \in {\mathbb {C}}^{r \times r}\), and a full-row rank matrix \(K \in {\mathbb {C}}^{r \times n}\) such that \(X = CK\) and \(Y = C D K\). Then

$$\begin{aligned} A=CDC^{-1}, \quad D=\mathrm{diag}(\lambda _{1},\ldots ,\lambda _{r}), \quad C=[{\varvec{w}}_{1},\ldots , {\varvec{w}}_{r}]. \end{aligned}$$
(4)

Moreover, each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) in Algorithm 4 is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector that is a column of C.

Proof

Using \(AX=Y,\ X=CK,\ Y=CDK\), we see

$$\begin{aligned} ACK=CDK. \end{aligned}$$

From \(\mathrm{rank}(K)=r\), it is easy to see that \(AC=CD\) holds, where D is a diagonal matrix. Therefore, we obtain (4).

In the following, we prove that each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) in Algorithm 4 is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector. First of all, noting the definitions of \(X,Y,Z\), we have

$$\begin{aligned} Z=\begin{bmatrix} X \\ Y \end{bmatrix}=\begin{bmatrix} CK \\ CDK \end{bmatrix}=\begin{bmatrix} C \\ CD \end{bmatrix}K. \end{aligned}$$

In the following, \(\mathrm{H}\) means the Hermitian transpose for complex matrices. Thus,

$$\begin{aligned} ZZ^{\top }=\begin{bmatrix} C \\ CD \end{bmatrix}(KK^{\mathrm{H}}) \begin{bmatrix} C^{\mathrm{H}} &{}\quad C^{\mathrm{H}}D^{\mathrm{H}} \end{bmatrix} \end{aligned}$$

holds. In addition, from

$$\begin{aligned} \mathrm{rank}\left( \begin{bmatrix} C \\ CD \end{bmatrix} \right) =\mathrm{rank}(KK^{\mathrm{H}}) =\mathrm{rank}(\begin{bmatrix} C^{\mathrm{H}} &{}\quad C^{\mathrm{H}}D^{\mathrm{H}} \end{bmatrix} ) =r, \end{aligned}$$

the eigenspace spanned by the first r eigenvectors of \(ZZ^{\top }\) is the range of

$$\begin{aligned} \begin{bmatrix} C \\ CD \end{bmatrix}. \end{aligned}$$

Therefore, letting \(L \in {\mathbb {C}}^{r\times r}\) denote a nonsingular matrix, we have

$$\begin{aligned} N_{r}=\begin{bmatrix} C \\ CD \end{bmatrix}L=\begin{bmatrix} CL \\ CDL \end{bmatrix}, \end{aligned}$$

where \(N_{r}\in {\mathbb {R}}^{2m\times r}\) is an eigenvector matrix of \(ZZ^{\top }\). In other words,

$$\begin{aligned} {\widehat{X}}=CL,\quad {\widehat{Y}}=CDL \end{aligned}$$

in line 3 of Algorithm 4. Note that \({\widehat{X}}=CL \in {\mathbb {R}}^{r\times r}\) is nonsingular, where we now assume \(r=m\). It then follows that

$$\begin{aligned} {\widehat{A}}_{\mathrm{tls}}={\widehat{Y}}{\widehat{X}}^{-1}=CDL(CL)^{-1}=CDC^{-1}=A. \end{aligned}$$

Thus, Algorithm 4 computes the eigenpairs of A. Therefore, each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) in Algorithm 4 is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector. \(\square \)

Using the above lemma for Algorithm 4, we see that, in an ideal case, Algorithm 3 computes the exact eigenpairs of A defined in Problem 1, as in the next lemma.

Lemma 3

Suppose that Algorithm 3 is applied with input matrices \(X:={\bar{X}},Y:={\bar{Y}}\), where \({\bar{X}},{\bar{Y}}\) are defined in Problem 1. In addition, assume

$$\begin{aligned} {\mathcal {R}}(M_{r})={\mathcal {R}}({\bar{X}}). \end{aligned}$$
(5)

Then each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector.

Proof

Let

$$\begin{aligned} \varLambda = \mathrm{diag}(\lambda _{1},\ldots ,\lambda _{r}),\quad AW=W\varLambda , \end{aligned}$$

where W is an eigenvector matrix corresponding to the nonzero eigenvalues of A. Noting \(\mathrm{rank}(A)=r\), we have \({\mathcal {R}}(A)={\mathcal {R}}(W)\). In addition, due to \(\mathrm{rank}({\bar{Z}})=\mathrm{rank}([{\bar{X}}^{\top },{\bar{Y}}^{\top }]^{\top })=r\) and \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\), we see \(\mathrm{rank}({\bar{X}})=\mathrm{rank}({\bar{Y}})=r\). Hence, \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})={\mathcal {R}}(W)\). Thus, noting (5), we have \({\mathcal {R}}(W)={\mathcal {R}}(M_{r})\), leading to the existence of a nonsingular matrix \(B\in {\mathbb {C}}^{r\times r}\) such that

$$\begin{aligned} M_{r}=WB. \end{aligned}$$
(6)

In addition, \(M_{r}^{\top }M_{r}=I\) yields

$$\begin{aligned} B^{\mathrm{H}}W^{\mathrm{H}}WB=I. \end{aligned}$$
(7)

Moreover, from \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\), using \(K_{W}\in {\mathbb {C}}^{r \times n}\) with \(\mathrm{rank}(K_{W})=r\), we have

$$\begin{aligned} {\bar{X}}=WK_{W},\quad {\bar{Y}}=W\varLambda K_{W}. \end{aligned}$$
(8)

The above Eqs. (6)–(8) indicate

$$\begin{aligned} M_{r}^{\top }{\bar{X}}=B^{\mathrm{H}}W^{\mathrm{H}}WK_{W}=B^{-1}K_{W},\quad M_{r}^{\top }{\bar{Y}}=B^{\mathrm{H}}W^{\mathrm{H}}W\varLambda K_{W}=B^{-1}\varLambda K_{W}. \end{aligned}$$

Thus, noting Lemma 2, we see that Algorithm 4 with \(X:=M_{r}^{\top }{\bar{X}},Y:=M_{r}^{\top }{\bar{Y}}\) computes the diagonal elements of \(\varLambda \) and the columns of \(B^{-1}\) as eigenpairs of \({\widetilde{A}}_{\mathrm{dmd}}\). Let \(W_{\mathrm{dmd}}\in {\mathbb {C}}^{r\times r}\) denote an eigenvector matrix comprising the r eigenvectors \({\varvec{w}}_{\mathrm{dmd}}\) computed by Algorithm 4 with \(m=r\). Then Algorithm 3 computes the columns of

$$\begin{aligned} M_{r}W_{\mathrm{dmd}}=WBB^{-1}=W, \end{aligned}$$

which is the target eigenvector matrix of A. \(\square \)

In a real situation, the exact \({\bar{X}},{\bar{Y}}\) are unknown, and only the perturbed matrices \(X,Y\) can be observed, as in Problem 1. In the following, we analyze this situation.

4.2 Statistical analysis

This section is devoted to statistical analysis of Algorithm 3, where the data matrices \(X,Y\) are contaminated with random noise.

Recall that E and F are random matrices, and thus \(X:={\bar{X}}+E\) and \(Y:={\bar{Y}}+F\) are also random matrices. In addition, any matrix computed from X and Y, such as Z and \(M_{r}\), is a random matrix. The basic tool for the asymptotic analysis as \(n\rightarrow \infty \) is the next lemma.

Lemma 4

[8, Lemma 3.1] Under Assumption 1, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }n^{-1}ZZ^{\top }=\lim _{n\rightarrow \infty }n^{-1}{\bar{Z}}{\bar{Z}}^{\top }+\sigma ^2 I=:\varPhi \quad \text { with probability one}. \end{aligned}$$

More specifically, letting \(Z^{(n)}:=Z\) denote the matrix of random variables for the sample size n, we have

$$\begin{aligned} \lim _{n \rightarrow \infty }n^{-1}Z^{(n)}(\omega )Z^{(n)}(\omega )^{\top }=\varPhi \quad \mathrm{for} \quad \omega \in \varOmega {\setminus } \mathrm{N}_{1}, \end{aligned}$$

where \(\varOmega \) is a sample space and \(\mathrm{N}_{1}\) is a null set as in Definition 1.

Thus, \(n^{-1}ZZ^{\top }\) is not convergent to \(n^{-1}{\bar{Z}}{\bar{Z}}^{\top }\); the two limits differ by \(\sigma ^2 I\). At first glance, this property makes a convergence analysis less straightforward. However, we can easily avoid this difficulty as follows. Importantly, since adding \(\sigma ^2 I\) shifts every eigenvalue by \(\sigma ^2\) while leaving the eigenvectors unchanged, the eigenvectors of \(n^{-1}ZZ^{\top }\) are convergent to those of \(n^{-1}{\bar{Z}}{\bar{Z}}^{\top }\) by the above equality. Algorithm 4 computes the eigenvector matrix \(N_{r}\) to obtain the eigenpairs of A. Thus, Algorithm 4 can compute the exact eigenpairs in the asymptotic regime \(n\rightarrow \infty \).

In the following, as in Lemma 4, we use superscripts to denote matrices for a fixed n, as in \(Z^{(n)}\). In addition, if a discussion of the sample space is required, \(\omega \in \varOmega \) is explicitly indicated, as in \(Z^{(n)}(\omega )\). The same notation is used for random matrices other than Z.

The next lemma states that Algorithm 3 has a similar feature to Lemma 4.

Lemma 5

Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. In addition, we assume

$$\begin{aligned} \lim _{n \rightarrow \infty }M^{(n)}_{r}(\omega )={\bar{M}}_{r} \quad \mathrm{for} \quad \omega \in \varOmega {\setminus } \mathrm{N}_{2}, \quad {\mathcal {R}}({\bar{M}}_{r})={\mathcal {R}}({\bar{X}}), \end{aligned}$$

where \(\mathrm{N}_{2}\) is a null set. Then letting

$$\begin{aligned}&{Z}_{M}^{(n)}(\omega ):=\begin{bmatrix} M_{r}^{(n)}(\omega )^{\top }{X}^{(n)}(\omega ) \\ M_{r}^{(n)}(\omega )^{\top }{Y}^{(n)}(\omega ) \end{bmatrix}, \quad {\bar{Z}}_{{\bar{M}}}:=\begin{bmatrix} {\bar{M}}_{r}^{\top }{\bar{X}} \\ {\bar{M}}_{r}^{\top }{\bar{Y}} \end{bmatrix}, \nonumber \\&\lim _{n\rightarrow \infty }n^{-1}{\bar{Z}}_{{\bar{M}}}{\bar{Z}}_{{\bar{M}}}^{\top }+\sigma ^2 I=:\varPsi , \end{aligned}$$
(9)

we have

$$\begin{aligned} \lim _{n \rightarrow \infty }n^{-1}Z_{M}^{(n)}(\omega )Z_{M}^{(n)}(\omega )^{\top }=\varPsi \quad \mathrm{for} \quad \omega \in \varOmega {\setminus } (\mathrm{N}_{1}\cup \mathrm{N}_{2}), \end{aligned}$$

where \(\mathrm{N}_{1}\cup \mathrm{N}_{2}\) is a null set.

Proof

For \({Z}_{M}^{(n)}\) comprising the random variables, we see

$$\begin{aligned} n^{-1}{Z}_{{M}}^{(n)}{Z}_{{M}}^{(n)}{}^{\top }=n^{-1} \begin{bmatrix} {M}_{r}^{(n)}{}^{\top } &{} \quad O \\ O &{} \quad {M}_{r}^{(n)}{}^{\top } \end{bmatrix} Z^{(n)}Z^{(n)}{}^{\top } \begin{bmatrix} {M}_{r}^{(n)} &{} \quad O \\ O &{}\quad {M}_{r}^{(n)} \end{bmatrix} \end{aligned}$$

in view of \(Z^{(n)}=[X^{(n) {\top }},Y^{(n) {\top }}]^{\top }\). Noting

$$\begin{aligned} \begin{bmatrix} {M}_{r}^{(n)}(\omega )^{\top } &{} \quad O \\ O &{} \quad {M}_{r}^{(n)}(\omega )^{\top } \end{bmatrix} \begin{bmatrix} {M}_{r}^{(n)}(\omega ) &{}\quad O \\ O &{} \quad {M}_{r}^{(n)}(\omega ) \end{bmatrix} =I \quad \mathrm{for} \quad \omega \in \varOmega \end{aligned}$$

and Lemma 4, we have

$$\begin{aligned} \lim _{n \rightarrow \infty }n^{-1}Z_{M}^{(n)}(\omega )Z_{M}^{(n)}(\omega )^{\top }=\varPsi \quad \mathrm{for} \quad \omega \in \varOmega {\setminus } (\mathrm{N}_{1}\cup \mathrm{N}_{2}), \end{aligned}$$

where \(\varPsi \) is defined in (9). \(\square \)

The above lemma implies

$$\begin{aligned} \lim _{n\rightarrow \infty }n^{-1}Z_{M}^{(n)}Z_{M}^{(n)}{}^{\top }=\varPsi \quad \text { with probability one} \end{aligned}$$

because \(\mathrm{N}_{1}\cup \mathrm{N}_{2}\) is a null set. In other words, this indicates strong convergence in the sense of Definition 1.

We are now in a position to present the main theorem, which states the strong consistency of Algorithm 3.

Theorem 2

Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. In addition, letting \(M_{r}^{(n)}\) denote \(M_{r}\) for n, we assume

$$\begin{aligned} \lim _{n \rightarrow \infty }M^{(n)}_{r}={\bar{M}}_{r}\quad \text { with probability one}, \quad {\mathcal {R}}({\bar{M}}_{r})={\mathcal {R}}({\bar{X}}). \end{aligned}$$
(10)

Let \(\lambda _{\mathrm{dmd}}^{(n)},{\varvec{w}}_{\mathrm{dmd}}^{(n)}\) denote \(\lambda _{\mathrm{dmd}},{\varvec{w}}_{\mathrm{dmd}}\) in Algorithm 3 for n. Then each eigenvalue \(\lambda _{\mathrm{dmd}}^{(n)}\) is convergent to a nonzero eigenvalue of A with probability one, and every accumulation point of \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) is an eigenvector corresponding to \(\lambda _{\mathrm{dmd}}^{(\infty )}\) with probability one.

Proof

Algorithm 3 uses Algorithm 2, and Algorithm 2 is mathematically equivalent to Algorithm 4. Recall that Lemma 3 states that Algorithm 3 computes the exact eigenpairs in an ideal case \(X={\bar{X}},Y={\bar{Y}}\), where the TLS-DMD (Algorithm 4) is applied to \(X={{\bar{M}}_{r}}^{\top }{\bar{X}},Y={{\bar{M}}_{r}}^{\top }{\bar{Y}}\). Under the condition, \(N_{r}\) in line 2 of Algorithm 4 is an eigenvector matrix of \({\bar{Z}}_{{\bar{M}}}{\bar{Z}}_{{\bar{M}}}{}^{\top }\) in Lemma 5. Let \({\bar{N}}_{r}\) be this \(N_{r}\). Note that the eigenpairs computed from \({\bar{N}}_{r}\) in Algorithm 4 coincide with the eigenpairs of A, independent of n, as shown in Lemma 3.

Below we consider general \(X:={\bar{X}}+E, Y:={\bar{Y}}+F\) in Algorithm 3. Then the TLS-DMD (Algorithm 4) is applied to \(X:={{M}_{r}}^{\top }{X},Y:={{M}_{r}}^{\top }{Y}\). Using the matrices, define

$$\begin{aligned} {Z}_{M}:=\begin{bmatrix} M_{r}^{\top }{X} \\ M_{r}^{\top }{Y} \end{bmatrix}. \end{aligned}$$

Then \(N_{r}\) in line 2 of Algorithm 4 is an eigenvector matrix of \(Z_{M}Z_{M}^{\top }\). Let \(N_{r}^{(n)}\) denote this \(N_{r}\) for n. Due to these definitions and Lemma 5, as \(n\rightarrow \infty \), the accumulation points of \(N_{r}^{(n)}\) are equal to \({\bar{N}}_{r}\) with probability one under the assumptions in (10). Since the eigenvalues \(\lambda _{\mathrm{dmd}}^{(n)}\) and the corresponding eigenvectors \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) are computed by \(N_{r}^{(n)}\) and \(M_{r}^{(n)}\), the continuity of eigenvalues and eigenvectors yields the strong convergence of \(\lambda _{\mathrm{dmd}}^{(n)}\) and \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\). \(\square \)

Finally, we discuss how to construct \(M_{r}\) so as to satisfy the assumption for the strong consistency. The simplest choice is to take \(M_{r}\) as the matrix of the first r left singular vectors of \([X,Y]\). In this case, \(\lim _{n\rightarrow \infty }(2n)^{-1}[X,Y] [X,Y]^{\top }=\lim _{n\rightarrow \infty }(2n)^{-1}[{\bar{X}},{\bar{Y}}] [{\bar{X}},{\bar{Y}}]^{\top }+\sigma ^2 I\) with probability one, which can be proved in the same manner as Lemma 4 [8, Lemma 3.1]. Thus, \(M_{r}\) is convergent to some \({\bar{M}}_{r}\) such that \({\mathcal {R}}({\bar{M}}_{r})={\mathcal {R}}({\bar{X}})\) with probability one. Consequently, we have the next corollary.

Corollary 1

Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. In addition, assume that \(M_{r}\) comprises the first r left singular vectors of \([X,Y]\). Moreover, let \(\lambda _{\mathrm{dmd}}^{(n)},{\varvec{w}}_{\mathrm{dmd}}^{(n)}\) denote \(\lambda _{\mathrm{dmd}},{\varvec{w}}_{\mathrm{dmd}}\) in Algorithm 3 for n, respectively. Then each eigenvalue \(\lambda _{\mathrm{dmd}}^{(n)}\) is convergent to a nonzero eigenvalue of A with probability one, and every accumulation point of \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) is an eigenvector corresponding to \(\lambda _{\mathrm{dmd}}^{(\infty )}\) with probability one.

Note that the SVD of \([X,Y]\) for \(M_{r}\) can be replaced by the SVD of X or of Y, and the strong consistency can be shown in the same manner. In this sense, our consistency analysis covers general situations concerning the subspace used for the projection.

5 Conclusion

In this paper, we gave a general framework for the algorithm design of the TLS-DMD using projection onto the POD basis and proved its strong consistency. This is a pioneering work on asymptotic statistical analysis of noisy datasets in the situation where projection methods are combined with the DMD. More precisely, the contribution of this paper is that we formulated Algorithm 3, based on the DMD widely used for the analysis of time-series data, and proved its strong consistency in a straightforward manner under the simple statistical model of noise presented in Sect. 4. Theorem 2 can be interpreted as a fundamental theorem for applying the TLS-DMD with the aid of projections to noisy datasets.