Strong consistency of the projected total least squares dynamic mode decomposition for datasets with random noise

Dynamic mode decomposition (DMD) has attracted much attention in recent years as an analysis method for time series data. In this paper, we perform an asymptotic analysis of the DMD to prove strong consistency in the statistical sense. More specifically, we first give a statistical model of random noise for data with observation noise. Among the many variants of the DMD, the total least squares DMD (TLS-DMD) is known to be robust against random noise in the observations. We focus on a consistency analysis of the TLS-DMD in the statistical sense and aim to generalize the analysis to projected methods. This paper gives a general framework for designing projection methods based on efficient dimensionality reduction, analogous to the proper orthogonal decomposition, under the statistical model and proves the strong consistency of the resulting estimation.


Introduction
Dynamic mode decomposition (DMD) [16] is an analysis method originally proposed in the field of fluid analysis, and in recent years, it has attracted much attention as an analysis method for time series data with the trend of data science. The DMD assumes that the time series z_k ∈ ℝ^m (k = 0, 1, …) approximately obeys a linear dynamical system

z_{k+1} ≈ A z_k (k = 0, 1, …),    (1)

for an m × m matrix A, where the meaning of ≈ is defined in the following sections with mathematical rigor. In addition, we assume that z_k ∈ ℝ^m (k = 0, 1, …) can be partially observed as follows. For positive integers k_1, k_2, …, k_n, data pairs (z_{k_ℓ}, z_{k_ℓ+1}) are given for ℓ = 1, 2, …, n. Define X, Y ∈ ℝ^{m×n} such that

X := [z_{k_1}, z_{k_2}, …, z_{k_n}], Y := [z_{k_1+1}, z_{k_2+1}, …, z_{k_n+1}].    (2)

Then we have AX ≈ Y from (1).
Our problem setting is that datasets represented as m × n matrices X, Y are given. We aim to find the r eigenvalues with the largest absolute values and the corresponding eigenvectors of the unknown matrix A, where r is considerably smaller than min(m, n). This aim is due to the assumption that A can be approximated by a rank-r matrix. Basically, the DMD is used to compute the above eigenpairs without explicitly constructing the large-scale matrix A. For this purpose, the singular value decomposition (SVD) [9, Sect. 6.3] is applied. More precisely, the truncated SVD is computed to approximate X in the algorithm of the DMD. The algorithm is written as follows.
Algorithm 1 DMD.
Require: Matrices X, Y ∈ ℝ^{m×n}, rank r (≤ min(m, n))
1: Compute the truncated SVD of X (≈ U_r Σ_r V_r^⊤): U_r ∈ ℝ^{m×r} comprises the first r left singular vectors, Σ_r ∈ ℝ^{r×r} is a diagonal matrix comprising the first r singular values, V_r ∈ ℝ^{n×r} comprises the first r right singular vectors
2: Ã_dmd = U_r^⊤ Y V_r Σ_r^{-1}
3: Compute the eigenpairs of Ã_dmd: Ã_dmd w̃_dmd = λ_dmd w̃_dmd, and let w_dmd = U_r w̃_dmd

Remark 1 Throughout the paper, the first r singular values and vectors mean the r largest singular values and the corresponding singular vectors. In addition, we assume that rank(X) and rank(Y) are not smaller than r. This feature is consistent with the problem setting below.
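The steps of Algorithm 1 can be sketched in a few lines of NumPy (the paper's experiments use MATLAB; this Python translation, including the helper name `dmd` and the rank-2 test matrix, is illustrative only):

```python
import numpy as np

def dmd(X, Y, r):
    """Truncated-SVD DMD: approximate the r dominant eigenpairs of A with AX ~ Y.

    Returns eigenvalues lam and eigenvectors W (columns), lifted back to the
    full space via the POD basis U_r.
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vh[:r, :].T
    A_dmd = Ur.T @ Y @ Vr @ np.linalg.inv(Sr)   # r x r projected operator
    lam, W_tilde = np.linalg.eig(A_dmd)
    W = Ur @ W_tilde                            # lift Ritz vectors to R^m
    return lam, W

# Noise-free sanity check with a known rank-2 A whose nonzero eigenvalues are +-i.
rng = np.random.default_rng(0)
m, n, r = 20, 50, 2
L_mat = rng.standard_normal((m, r))
S = np.array([[0.0, 1.0], [-1.0, 0.0]])         # eigenvalues +-i
A = L_mat @ S @ np.linalg.pinv(L_mat)
X = A @ rng.standard_normal((m, n))             # force R(X) = R(A)
Y = A @ X
lam, W = dmd(X, Y, r)
```

In this noise-free check R(X) = R(A), so the r = 2 computed eigenvalues coincide with the exact eigenvalues ±i and the lifted Ritz vectors are exact eigenvectors.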
The above matrix expression of the DMD shows relations to other decomposition techniques based on the least squares methods. See [16,Sect. 2.2] for the details.

Noise model of the DMD
Usually, the observed snapshot datasets are contaminated with noise. In addition, with (1) and (2) in mind, we consider the following simple statistical model. Let

A X̄ = Ȳ,

where X̄ and Ȳ are not contaminated with noise. In addition, let

X := X̄ + E, Y := Ȳ + F,

where all the elements of E, F are random variables. Hence, we have AX ≈ Y for the above exact A associated with the matrix datasets without noise. The above noise model can be applied to time series data related to the Koopman spectral analysis for dynamical systems; see [2,5,14,17,18] for the details.
Noting a particular feature of the Koopman spectral analysis, we can introduce the following natural condition for our problem setting. Letting R denote the range of the argument matrix, we assume R(X̄) = R(Ȳ) in the following discussion. The theoretical justification of this assumption is explained in [1].
Simply put, if the dynamical system forming the time series is z_{k+1} = A z_k (k = 0, 1, …), we can see that R(X̄) = R(Ȳ) (⊂ R(A)) by the definitions as in (2). However, in practical situations, z_{k+1} = A z_k holds only approximately at each step k, as in (1), and thus X and Y are defined based on X̄ and Ȳ above, taking stochastic perturbations into account.
With such a background, we consider the next noise model, slightly different from [1].

Problem 1
Find the r nonzero eigenvalues λ_i (i = 1, …, r) and the corresponding eigenvectors w_i (i = 1, …, r) of A ∈ ℝ^{m×m} with rank r (≤ m) under the following assumptions:
• Assume that unknown matrices X̄, Ȳ ∈ ℝ^{m×n} and A ∈ ℝ^{m×m} satisfy

A X̄ = Ȳ, R(X̄) = R(Ȳ).

• Let λ_i (i = 1, …, r) denote the nonzero eigenvalues of A. Assume that the eigenvalues λ_i (i = 1, …, r) are all distinct, i.e., λ_i ≠ λ_j (1 ≤ i ≠ j ≤ r). Moreover, let w_i (i = 1, …, r) denote the eigenvectors corresponding to λ_i (i = 1, …, r). In other words,

A w_i = λ_i w_i (i = 1, …, r).

• Observed matrices are X, Y ∈ ℝ^{m×n} such that

X = X̄ + E, Y = Ȳ + F,

where E, F ∈ ℝ^{m×n} are random matrices.
In the above problem setting, we assume that the nonzero eigenvalues of A are all simple, differently from the statistical model in [1], because we aim at another generalization concerning algorithmic development and its convergence theory under the above simple condition. Our purpose is to perform an asymptotic convergence analysis for n → ∞ under reasonable assumptions. To this end, we consider the following assumptions in this paper.

Assumption 2 Under Assumption 1, the rank of the matrix lim_{n→∞} n^{-1} X̄ X̄^⊤ is r in Problem 1.
Under the above assumptions, we consider numerical methods for solving Problem 1. The parameter estimation of the above noise model is relevant to the total least squares (TLS) problems as in [8,13]. In the following, we explain the TLS and its application to the DMD.

Remark 2
In Problem 1, the condition R(X̄) = R(Ȳ) might be too strong. In addition, one may think that Assumptions 1 and 2 are unrealistic for time series data. While these criticisms are valid, it should be noted that, to the best of the author's knowledge, there are currently no analytical results for stochastic perturbations when projection is used for the DMD and its variants. This study aims to prove consistency under the simplest of conditions, which is a significant contribution in itself. Each of the conditions imposed in Problem 1 is based on the following background and is reasonable to some extent.
The condition R(X̄) = R(Ȳ) is due to (1) and (2). From a theoretical viewpoint of Problem 1, this condition leads to the uniqueness of the target eigenpairs of A, though A itself is not unique [1, Lemma 9].
Assumptions 1 and 2 originally come from the asymptotic analysis of an errors-in-variables linear regression model [8, Assumptions A, B, and C]. It is an important future issue to examine how natural these assumptions are for time series data from dynamical systems. The following is an intuitive reason why the assumptions seem to hold. In the DMD, we assume that the nonzero eigenvalues of the matrix A lie approximately on the unit circle of the complex plane. This is because the goal is to analyze the stable behavior of the system over time. The time series z_{k_ℓ} (ℓ = 1, …, n) in the columns of the matrix X then have approximately the same size, and n^{-1} X X^⊤ is the average of z_{k_ℓ} z_{k_ℓ}^⊤ (ℓ = 1, …, n). Since the absolute values of the nonzero eigenvalues of A are approximately equal, it is unlikely that the information on a particular eigenvector will disappear, and we can expect that the necessary information on the r eigenvectors will remain in X̄. This property leads to Assumptions 1 and 2 for X̄. Formulating and proving this property with mathematical rigor is an issue to be addressed in the future.
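The intuition above can be illustrated numerically. The following NumPy sketch (an illustration under assumed dynamics, not part of the paper's formal argument) generates a noise-free trajectory of z_{k+1} = A z_k whose nonzero eigenvalues lie on the unit circle, and checks that n^{-1} X X^⊤, the average of the outer products z_k z_k^⊤, stays bounded and keeps rank r, as Assumptions 1 and 2 require:

```python
import numpy as np

# Rank-2 A with eigenvalues +-i on the unit circle: snapshots neither grow
# nor decay, so the second-moment matrix n^{-1} X X^T stays bounded, rank 2.
rng = np.random.default_rng(1)
m, r = 10, 2
L_mat = rng.standard_normal((m, r))
S = np.array([[0.0, 1.0], [-1.0, 0.0]])        # rotation generator, eigenvalues +-i
A = L_mat @ S @ np.linalg.pinv(L_mat)

z = A @ rng.standard_normal(m)                 # start inside R(A)
cols = []
for _ in range(400):
    cols.append(z)
    z = A @ z
X = np.column_stack(cols)
G = X @ X.T / X.shape[1]                       # n^{-1} X X^T

rank_G = np.linalg.matrix_rank(G, tol=1e-8)    # expected: r = 2
```

Because the trajectory rotates rather than decays, no eigenvector direction is lost from G, matching the informal argument in the text.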

Total least squares DMD (TLS-DMD)
For given X, Y ∈ ℝ^{m×n}, the total least squares (TLS) problem is formulated as

minimize ‖[ΔX^⊤, ΔY^⊤]‖_F subject to Ã(X + ΔX) = Y + ΔY,

over Ã ∈ ℝ^{m×m} and ΔX, ΔY ∈ ℝ^{m×n}, where ‖·‖_F denotes the Frobenius norm and A_tls denotes a minimizer Ã. For computing A_tls, there is a sophisticated method using the singular value decomposition (SVD) [9, Sect. 6.3]. Although A_tls is usually assumed to be nonsingular, the rank-deficient case is investigated in [13]. In [7,10], such a method is successfully applied to the DMD, where m ≤ n is assumed. We consider the method in [10] as follows.
Algorithm 2 Total least squares DMD (TLS-DMD).
Require: Matrices X, Y ∈ ℝ^{m×n}, rank r (≤ min(m, n))
1: Z = [X^⊤, Y^⊤]^⊤ ∈ ℝ^{2m×n}
2: Compute the SVD of Z, and let Q be an n × r matrix comprising the first r right singular vectors
3: Define X̂, Ŷ ∈ ℝ^{m×r} such that X̂ := XQ, Ŷ := YQ
4: Compute the SVD: X̂ = UΣV^⊤, where U ∈ ℝ^{m×r}, Σ ∈ ℝ^{r×r}, V ∈ ℝ^{r×r}
5: Ã_dmd = U^⊤ Ŷ V Σ^{-1}
6: Compute the eigenpairs of Ã_dmd: Ã_dmd w̃_dmd = λ_dmd w̃_dmd, and let w_dmd = U w̃_dmd

Important features of this algorithm are described as follows. In the following, † means the Moore-Penrose inverse. In the above algorithm, let

Â_tls := Ŷ X̂†.    (3)

Note that, in the case of r = m ≤ n, the matrix Â_tls is the solution of the TLS problem. In general, for any r ≤ min(m, n), it is easy to see that Â_tls in (3) satisfies Â_tls X̂ = Ŷ for the m × r matrices X̂, Ŷ in line 3 of Algorithm 2, where we assume rank(X̂) = r. More importantly, Ã_dmd = U^⊤ Â_tls U. This implies that Algorithm 2 computes eigenpairs of Â_tls by the Rayleigh-Ritz procedure [15, Sect. 4.3] using the r-dimensional subspace R(U).
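A compact NumPy sketch of the TLS-DMD described above (an illustrative translation of the MATLAB setting; the function name `tls_dmd` and the test matrices are assumptions, not the paper's code): the stacked matrix Z = [X^⊤, Y^⊤]^⊤ is formed, the data are projected onto the first r right singular vectors Q of Z, and the DMD step is applied to X̂ = XQ, Ŷ = YQ.

```python
import numpy as np

def tls_dmd(X, Y, r):
    """TLS-DMD sketch: TLS projection via the stacked matrix Z = [X; Y],
    followed by the Rayleigh-Ritz (DMD) step on the reduced m x r data."""
    Z = np.vstack([X, Y])
    _, _, Vh = np.linalg.svd(Z, full_matrices=False)
    Q = Vh[:r, :].T                      # n x r, first r right singular vectors of Z
    Xh, Yh = X @ Q, Y @ Q                # m x r projected data
    U, s, Wh = np.linalg.svd(Xh, full_matrices=False)   # Xh = U diag(s) Wh
    A_red = U.T @ Yh @ Wh.T @ np.diag(1.0 / s)          # assumes rank(Xh) = r
    lam, W_tilde = np.linalg.eig(A_red)
    return lam, U @ W_tilde

# Noise-free check: should reproduce the exact eigenvalues +-i.
rng = np.random.default_rng(2)
m, n, r = 30, 60, 2
L_mat = rng.standard_normal((m, r))
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
A = L_mat @ S @ np.linalg.pinv(L_mat)
X = A @ rng.standard_normal((m, n))
Y = A @ X
lam, W = tls_dmd(X, Y, r)
```

With exact data the TLS projection is lossless (rank(Z) = r), so the computed eigenpairs are exact; the robustness advantage over Algorithm 1 appears only once noise is added to both X and Y.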

Strong consistency of the TLS-DMD
First of all, note that, in general, the strong convergence of random variables is defined as follows.
Definition 1 (Strong convergence) The sequence of random variables X^(n) (n = 0, 1, …) converges to X with probability one if and only if

lim_{n→∞} X^(n)(ω) = X(ω) (ω ∈ Ω ∖ N),

where Ω is a sample space and N is a null set for a probability measure.
On the basis of probabilistic analysis of the TLS in [8,13], the strong convergence of eigenpairs computed by Algorithm 2 is proved in [1]. In other words, the strong consistency of the estimation by the TLS-DMD is theoretically guaranteed for Problem 1, where we assume that rank(A) can be estimated by some methods to set r ∶= rank(A).
Theorem 1 (Strong consistency of the TLS-DMD [1]) Suppose that Algorithm 2 is applied to Problem 1 with r (< m) under Assumptions 1 and 2. Let λ_dmd^(n), w_dmd^(n) denote λ_dmd, w_dmd in Algorithm 2 for n, respectively. Then each eigenvalue λ_dmd^(n) is convergent to a nonzero eigenvalue of A with probability one, and the accumulation points of w_dmd^(n) are the eigenvector corresponding to λ_dmd^(∞) with probability one.
More precisely, in [1], the convergence theorem states that the eigenvectors (or generalized eigenvectors) of a perturbed matrix go toward the corresponding invariant subspace. However, if we assume that all the nonzero eigenvalues are simple, all the computed eigenpairs converge to the target eigenpairs, as in the above theorem. The proof is due to the continuity of eigenpairs for simple eigenvalues.

General framework with the use of the POD approach
Proper orthogonal decomposition (POD) is an efficient technique for dimensionality reduction, which leads to the idea of using it together with the DMD. Such a method is often called the projected DMD [10,18] because the POD basis spans the subspace to be projected onto. A typical usage of the POD is based on the truncated SVD of data matrices. As an example, one may use the truncated SVD

X ≈ U_r Σ_r V_r^⊤,

where Σ_r is a diagonal matrix comprising the r largest singular values of X and U_r, V_r are the corresponding singular vector matrices. Then the columns of U_r form a basis of the POD. As another example, one might choose the first r left singular vectors of Y instead, due to R(Ȳ) ⊂ R(A) in general cases. If we aim to reduce errors based on the law of large numbers, we might use all the snapshots for the POD basis. In this paper, to cover the above various cases, we construct a general framework for computing the POD basis, resulting in the projected TLS-DMD.

General projection methods using the POD basis
The aim of this section is to design an algorithm for computing the exact eigenpairs of A in Problem 1 in the asymptotic regime n → ∞. To this end, we consider the next algorithm, regarded as a general framework using a POD basis, where the basis vectors are the columns of a matrix M_r ∈ ℝ^{m×r}. The matrix M_r is obtained by the SVD, as in the above discussion. The use of M_r is related to the Rayleigh-Ritz procedure. More specifically, M_r comprises orthonormal basis vectors for the subspace of the Rayleigh-Ritz procedure. If rank(X) = m, the algorithm amounts to computing the eigenpairs of M_r^⊤ (Y X†) M_r, where † means the Moore-Penrose inverse. In other words, the Rayleigh-Ritz procedure for computing the eigenpairs of Y X† is performed. Due to A ≈ Y X†, the above Rayleigh-Ritz procedure aims to compute the eigenpairs of A.
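Since the listing of Algorithm 3 is not reproduced in this excerpt, the following NumPy sketch shows one plausible reading of the projected TLS-DMD: compute the POD basis M_r from the SVD of the concatenation [X, Y] (the choice used in the numerical tests), project the data to M_r^⊤X, M_r^⊤Y, run the TLS step on the r × n projected matrices, and lift the computed eigenvectors by M_r. All names here are illustrative assumptions.

```python
import numpy as np

def projected_tls_dmd(X, Y, r):
    """Sketch of a projected TLS-DMD: POD projection, then TLS-DMD on the
    reduced r x n data. M_r is taken from the SVD of [X, Y]; the text notes
    that the SVD of X or of Y could be used instead."""
    U, _, _ = np.linalg.svd(np.hstack([X, Y]), full_matrices=False)
    Mr = U[:, :r]                            # m x r POD basis
    Xp, Yp = Mr.T @ X, Mr.T @ Y              # r x n projected data
    # TLS step on the projected data: first r right singular vectors of [Xp; Yp]
    _, _, Vh = np.linalg.svd(np.vstack([Xp, Yp]), full_matrices=False)
    Q = Vh[:r, :].T
    Xh, Yh = Xp @ Q, Yp @ Q                  # r x r reduced matrices
    lam, W_tilde = np.linalg.eig(Yh @ np.linalg.pinv(Xh))
    return lam, Mr @ W_tilde                 # lift Ritz vectors to R^m

# Noise-free check with exact eigenvalues +-i.
rng = np.random.default_rng(3)
m, n, r = 40, 80, 2
L_mat = rng.standard_normal((m, r))
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
A = L_mat @ S @ np.linalg.pinv(L_mat)
X = A @ rng.standard_normal((m, n))
Y = A @ X
lam, W = projected_tls_dmd(X, Y, r)
```

In the noise-free case R(M_r) = R(A), so the reduced operator Yh Xh† represents A restricted to its range and the exact eigenpairs are recovered.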

Numerical tests
The practical efficiency of the DMD and the POD approach has been investigated in previous studies [7,10]. Here we present simple numerical results to validate the convergence theory of Algorithm 3, proved in the next section, and to show its efficiency. All the tests are performed in MATLAB.
Let L ∈ ℝ^{m×r} denote a random matrix given by MATLAB's built-in function randn. In our test, let m = 200, r = 2, and

A := L S L†

for a matrix S ∈ ℝ^{2×2} with the eigenvalues ±i, implying rank(A) = 2, where L† is computed by MATLAB's built-in function pinv. In this case, the exact eigenvalues of A are ±i. The matrices X̄ and Ȳ are given as follows. Generate K ∈ ℝ^{m×n} as a random matrix with MATLAB's built-in function randn and let X̄ := AK and Ȳ := AX̄ to satisfy the condition R(X̄) = R(Ȳ) (= R(A)) in Problem 1.
The results of the numerical tests are shown in Tables 1 and 2. We used three methods: the original DMD (Algorithm 1), the TLS-DMD (Algorithm 2), and the projected TLS-DMD (Algorithm 3). In Algorithm 3, M_r is given by the SVD of [X, Y]. The error of eigenvalues is defined as the square root of the sum of squares of the errors of each computed eigenvalue. The error of each eigenvector is defined as follows. Let θ denote the angle between w and w_dmd. Here w and w_dmd are normalized such that ‖w‖ = ‖w_dmd‖ = 1 for the Euclidean norm ‖·‖. Then we have

cos θ = |w^H w_dmd|,

where H means the Hermitian transpose due to w, w_dmd ∈ ℂ^m. On the basis of this feature, we compute

sin θ = (1 − |w^H w_dmd|²)^{1/2}

for each eigenvector. The square root of the sum of squares of these values is defined as the error of eigenvectors. The random noise for each component of X, Y is given by the Gaussian distribution N(0, 0.1²). Tables 1 and 2 display the averages over 100 trials of the errors for sample sizes n = 50, 100.
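The error metrics just described can be sketched as follows (NumPy; the nearest-neighbor pairing of computed to exact eigenvalues is an implementation assumption, as the text does not specify the matching):

```python
import numpy as np

def eig_errors(lam_true, lam_est, W_true, W_est):
    """Error metrics as described in the text: the eigenvalue error is the
    square root of the sum of squared per-eigenvalue errors; each eigenvector
    error is sin(theta) with cos(theta) = |w^H w_est| for unit vectors, and
    the reported value is the square root of the sum of their squares."""
    order = [int(np.argmin(np.abs(lam_true - le))) for le in lam_est]  # greedy matching
    e_val = np.sqrt(np.sum(np.abs(lam_true[order] - lam_est) ** 2))
    sines = []
    for j in range(len(lam_est)):
        w = W_true[:, order[j]] / np.linalg.norm(W_true[:, order[j]])
        we = W_est[:, j] / np.linalg.norm(W_est[:, j])
        c = min(abs(np.vdot(w, we)), 1.0)      # cos(theta) = |w^H w_est|
        sines.append(np.sqrt(1.0 - c ** 2))
    e_vec = np.sqrt(np.sum(np.array(sines) ** 2))
    return e_val, e_vec

# Trivial usage: identical inputs give zero errors.
lam = np.array([1j, -1j])
W = np.eye(4, 2).astype(complex)
e_val, e_vec = eig_errors(lam, lam, W, W)
```

The |·| in cos θ removes the arbitrary unit-modulus scaling of complex eigenvectors, so the metric depends only on the spanned directions.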
From Tables 1 and 2, we see the efficiency of the projected TLS-DMD in terms of accuracy and its convergence behavior. More specifically, the projection in Algorithm 3 appears to improve the estimation accuracy. Since the matrix M_r is given by the SVD of [X, Y] comprising all the snapshots, the subspace spanned by the columns of M_r can be a good approximation of R(X̄) (= R(Ȳ)). Such practical performance should be investigated further with mathematical rigor in the future. We would like to emphasize here that the main result of this paper is to prove the strong consistency of the projected TLS-DMD (Algorithm 3), and thus detailed numerical and theoretical comparisons with existing methods from a practical viewpoint are beyond the scope of this paper. The theoretical justification of the convergence of the projected TLS-DMD is presented in the next section.

Consistency analysis
In this section, we prove strong consistency for Algorithm 3. More specifically, we prove that the computed eigenpairs converge to the exact eigenpairs of A under reasonable assumptions. To this end, we first show an important feature of the Rayleigh-Ritz procedure for denoised matrices in Sect. 4.1. Next, we prove consistency in the statistical sense of Algorithm 3 for noisy datasets in Sect. 4.2. This entire mathematical analysis is the contribution of this paper. Although the strong consistency presented below is a development of the result in [1], its consistency analysis takes a completely new approach. From a technical perspective, the discovery of Lemma 5 is crucial, and to take advantage of it, the key properties of projection methods are organized to complete the proof.
As a preparation, we note a basic feature of Algorithm 2. In Algorithm 2, let N_r be a 2m × r matrix comprising the first r left singular vectors of Z. Even if we define X̂, Ŷ ∈ ℝ^{m×r} such that [X̂^⊤, Ŷ^⊤]^⊤ := N_r in line 3 of Algorithm 2, the eigenpairs computed by this modified algorithm are equal to those of the original algorithm.
The above feature is easily proved as follows. Using a diagonal matrix Σ_Z ∈ ℝ^{r×r} comprising the first r singular values of Z, we have

N_r = Z Q Σ_Z^{-1}

due to the relationship between the left singular vectors and the right singular vectors. Thus, we have

X̂ = X Q Σ_Z^{-1}, Ŷ = Y Q Σ_Z^{-1}

in the modified algorithm. Therefore, Ŷ X̂† = Y Q Σ_Z^{-1} (X Q Σ_Z^{-1})† = Y Q Σ_Z^{-1} Σ_Z (X Q)† = Y Q (X Q)† is equal to Ŷ X̂† in Algorithm 2, where † means the Moore-Penrose inverse. Since the TLS-DMD is the Rayleigh-Ritz procedure for Ŷ X̂†, the modified algorithm is mathematically equivalent to the original algorithm.
Note that the first r left singular vectors of Z and the first r eigenvectors of ZZ^⊤ are the same. Hence, the above modified algorithm can be described as follows.
Algorithm 4 Total least squares DMD.
Require: Matrices X, Y ∈ ℝ^{m×n}, rank r (≤ min(m, n))
1: Z = [X^⊤, Y^⊤]^⊤ ∈ ℝ^{2m×n}
2: Compute the eigenvectors of ZZ^⊤, and let N_r be a 2m × r matrix comprising the first r eigenvectors
3: Define X̂, Ŷ ∈ ℝ^{m×r} such that [X̂^⊤, Ŷ^⊤]^⊤ := N_r
4: Compute the SVD: X̂ = UΣV^⊤, where U ∈ ℝ^{m×r}, Σ ∈ ℝ^{r×r}, V ∈ ℝ^{r×r}
5: Ã_dmd = U^⊤ Ŷ V Σ^{-1}
6: Compute the eigenpairs of Ã_dmd: Ã_dmd w̃_dmd = λ_dmd w̃_dmd, and let w_dmd = U w̃_dmd

The next lemma is a basic tool of our mathematical analysis.

Lemma 1 Algorithm 4 is mathematically equivalent to Algorithm 2.
We use Algorithm 4 in the following discussion.
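The equivalence stated in Lemma 1 can also be checked numerically (an illustration, not a proof): building X̂, Ŷ from the first r right singular vectors of Z, as in Algorithm 2, or from the first r eigenvectors of ZZ^⊤, as in Algorithm 4, yields the same eigenvalues of Ŷ X̂†. A NumPy sketch under generic random data:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 8, 20, 3
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
Z = np.vstack([X, Y])                        # 2m x n stacked data

# Algorithm 2 style: project the columns with Q (first r right singular vectors).
_, _, Vh = np.linalg.svd(Z, full_matrices=False)
Q = Vh[:r, :].T
lam2 = np.linalg.eigvals(Y @ Q @ np.linalg.pinv(X @ Q))

# Algorithm 4 style: split the dominant eigenvectors N_r of Z Z^T into Xh, Yh.
evals, evecs = np.linalg.eigh(Z @ Z.T)       # eigh returns ascending order
Nr = evecs[:, np.argsort(evals)[::-1][:r]]   # first r eigenvectors
lam4 = np.linalg.eigvals(Nr[m:] @ np.linalg.pinv(Nr[:m]))
```

The diagonal scaling Σ_Z^{-1} (and any sign flips of the eigenvectors of ZZ^⊤) cancels inside Ŷ X̂† because X̂ has full column rank, which is exactly the computation in the equivalence proof above.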

Deterministic analysis
Our mathematical analysis begins with a deterministic analysis. We aim to analyze Algorithm 3. In line 2 of Algorithm 3, Algorithm 2 is applied to r × n matrices X, Y. Noting Lemma 1, we investigate the case of r = m in Algorithm 4, resulting in the next lemma.

Lemma 2
Suppose that Algorithm 4 is applied with input matrices X := X̄, Y := Ȳ, where X̄, Ȳ are defined in Problem 1 with r = m. Assume that there exist a nonsingular matrix C ∈ ℂ^{r×r}, a diagonal nonsingular matrix D ∈ ℂ^{r×r}, and a full-row-rank matrix K ∈ ℂ^{r×n} such that X̄ = CK and Ȳ = CDK. Then

AC = CD.    (4)

Moreover, each computed eigenvalue λ_dmd in Algorithm 4 is an eigenvalue of A, and w_dmd is the corresponding eigenvector, which is a column of C.
Proof Using A X̄ = Ȳ, X̄ = CK, Ȳ = CDK, we see

A C K = C D K.

From rank(K) = r, it is easy to see that AC = CD holds, where D is a diagonal matrix. Therefore, we obtain (4).

In the following, we prove that each computed eigenvalue λ_dmd in Algorithm 4 is an eigenvalue of A, and w_dmd is the corresponding eigenvector. First of all, noting the definitions of X, Y, Z, we have

Z = [X̄^⊤, Ȳ^⊤]^⊤ = [(CK)^⊤, (CDK)^⊤]^⊤.

In the following, H means the Hermitian transpose for complex matrices. Thus,

Z Z^⊤ = Z Z^H = [C^⊤, (CD)^⊤]^⊤ K K^H [C^⊤, (CD)^⊤]^H

holds. In addition, from rank(K) = r, the eigenspace spanned by the first r eigenvectors of ZZ^⊤ is the range of

[C^⊤, (CD)^⊤]^⊤.

Therefore, letting L ∈ ℂ^{r×r} denote a nonsingular matrix, we have

N_r = [C^⊤, (CD)^⊤]^⊤ L,

where N_r ∈ ℝ^{2m×r} is an eigenvector matrix of ZZ^⊤. In other words,

X̂ = CL, Ŷ = CDL

in line 3 of Algorithm 4. Note that X̂ = CL ∈ ℝ^{r×r} is nonsingular, where we now assume r = m. It then follows that

Ŷ X̂† = CDL(CL)^{-1} = CDC^{-1} = A.

Thus, Algorithm 4 computes the eigenpairs of A. Therefore, each computed eigenvalue λ_dmd in Algorithm 4 is an eigenvalue of A, and w_dmd is the corresponding eigenvector. ◻

Using the above lemma for Algorithm 4, we see that, in an ideal case, Algorithm 3 computes the exact eigenpairs of A defined in Problem 1, as in the next lemma.

Lemma 3 Suppose that Algorithm 3 is applied with input matrices X := X̄, Y := Ȳ, where X̄, Ȳ are defined in Problem 1. In addition, assume

R(M_r) = R(X̄).    (5)

Then each computed eigenvalue λ_dmd is an eigenvalue of A, and w_dmd is the corresponding eigenvector.

Proof Let W := [w_1, …, w_r], where W is an eigenvector matrix corresponding to the nonzero eigenvalues of A. Noting rank(A) = r, we have R(A) = R(W). In addition, due to rank(Z̄) = rank([X̄^⊤, Ȳ^⊤]^⊤) = r and R(X̄) = R(Ȳ), we see rank(X̄) = rank(Ȳ) = r. Hence, R(X̄) = R(Ȳ) = R(W). Thus, noting (5), the assumptions of Lemma 2 are satisfied for the projected input matrices, and the assertion follows. ◻

In a real situation, the exact X̄, Ȳ are unknown, and only the perturbed matrices X, Y can be observed, as in Problem 1. In the following, we analyze this situation.

Statistical analysis
This section is devoted to statistical analysis for Algorithm 3, where the data matrices X, Y are contaminated with random noise.
Recall that E and F are random matrices, and thus X := X̄ + E and Y := Ȳ + F are also random matrices. In addition, any matrix computed from X and Y, such as Z and M_r, is a random matrix. The basic tool for the asymptotic analysis n → ∞ is the next lemma. More specifically, letting Z^(n) := Z comprising random variables with the use of the sample size n, we have

lim_{n→∞} n^{-1} (Z^(n) Z^(n)⊤)(ω) = lim_{n→∞} n^{-1} Z̄^(n) Z̄^(n)⊤ + σ² I (ω ∈ Ω ∖ N_1),

where Ω is a sample space, N_1 is a null set as in Definition 1, and σ² is the variance of the noise entries in Assumption 1.
Thus, n^{-1} Z Z^⊤ is not convergent to n^{-1} Z̄ Z̄^⊤ itself. At first glance, this property makes a convergence analysis less straightforward. However, we can easily avoid this difficulty as follows. Importantly, the eigenvectors of n^{-1} Z Z^⊤ are convergent to those of n^{-1} Z̄ Z̄^⊤ from the above equality, because adding a multiple of the identity matrix shifts all eigenvalues equally and leaves the eigenvectors unchanged. Algorithm 4 computes the eigenvector matrix N_r to obtain the eigenpairs of A. Thus, Algorithm 4 can compute the exact eigenpairs in the asymptotic regime n → ∞.
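The reason the identity offset is harmless can be seen in a two-line experiment (Assumption 1 is not reproduced in this excerpt, so the i.i.d. variance-σ² noise model behind the offset is an assumption here): adding σ²I to a symmetric matrix shifts every eigenvalue by σ² and leaves the eigenvectors, and hence N_r, unchanged.

```python
import numpy as np

# For symmetric G, the matrix G + sigma^2 * I has the same eigenvectors as G:
# every eigenvalue is merely shifted by sigma^2, so the ordering of the top-r
# eigenvectors is unaffected.
rng = np.random.default_rng(5)
B = rng.standard_normal((6, 6))
G = B @ B.T                      # symmetric PSD, stand-in for lim n^{-1} Zbar Zbar^T
sigma2 = 0.5

w1, V1 = np.linalg.eigh(G)
w2, V2 = np.linalg.eigh(G + sigma2 * np.eye(6))
```

Here V1 and V2 agree up to column signs, while the eigenvalues w2 equal w1 + σ²; this is exactly why the limit of n^{-1} Z Z^⊤ still yields the correct eigenvector matrix N_r.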
In the following, as in Lemma 4, we use superscripts to denote matrices for a fixed n, as in Z^(n). In addition, if a discussion of the sample space is required, ω ∈ Ω is explicitly indicated, as in Z^(n)(ω). The same notation is used for random matrices other than Z.
The next lemma states that Algorithm 3 has a similar feature to Lemma 4.

Lemma 5 Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. Let λ_dmd^(n), w_dmd^(n) denote λ_dmd, w_dmd in Algorithm 3 for n, respectively. Then each eigenvalue λ_dmd^(n) is convergent to a nonzero eigenvalue of A with probability one, and the accumulation points of w_dmd^(n) are the eigenvector corresponding to λ_dmd^(∞) with probability one.
Note that the SVD of [X, Y] for M_r can be replaced by the SVD of X or of Y to establish the strong consistency. In this sense, our consistency analysis covers general situations concerning the subspace used for the projection.

Conclusion
In this paper, we gave a general framework for the algorithm design of the TLS-DMD using the projection onto a POD basis and proved its strong consistency. This is a pioneering work on the asymptotic statistical analysis of noisy datasets in the situation where projection methods are combined with the DMD. More precisely, the contribution of this paper is that we formulated Algorithm 3 based on the DMD, which is widely used for the analysis of time series data, and proved its strong consistency in a straightforward manner under a simple statistical model of noise, as presented in Sect. 4. Theorem 2 can be interpreted as a fundamental theorem for applying the TLS-DMD with the aid of projections to noisy datasets.