Beyond expectations: Residual Dynamic Mode Decomposition and Variance for Stochastic Dynamical Systems

Koopman operators linearize nonlinear dynamical systems, making their spectral information of crucial interest. Numerous algorithms have been developed to approximate these spectral properties, and Dynamic Mode Decomposition (DMD) stands out as the poster child of projection-based methods. Although the Koopman operator itself is linear, the fact that it acts in an infinite-dimensional space of observables poses challenges. These include spurious modes, essential spectra, and the verification of Koopman mode decompositions. While recent work has addressed these challenges for deterministic systems, there remains a notable gap in verified DMD methods for stochastic systems, where the Koopman operator measures the expectation of observables. We show that addressing these issues requires going beyond expectations: we incorporate variance into the Koopman framework. Through an additional DMD-type matrix, we approximate the sum of a squared residual and a variance term, each of which can be approximated individually using batched snapshot data. This allows verified computation of the spectral properties of stochastic Koopman operators, controlling the projection error. We also introduce the concept of variance-pseudospectra to gauge statistical coherency. Finally, we present a suite of convergence results for the spectral information of stochastic Koopman operators. Our study concludes with practical applications using both simulated and experimental data. In neural recordings from awake mice, we demonstrate how variance-pseudospectra can reveal physiologically significant information unavailable to standard expectation-based dynamical models.


Introduction
Stochastic dynamical systems are widely used to model and study systems that evolve under the influence of both deterministic and random effects. They offer a framework for understanding, predicting, and controlling systems exhibiting randomness. This makes them invaluable across various scientific, engineering, and economic applications.
Given a state-space Ω ⊂ R^d and a sample space Ω_s, we consider a discrete-time stochastic dynamical system

x_{n+1} = F(x_n, τ_{n+1}), n = 0, 1, 2, . . . ,  (1)

where {τ_n}_{n∈N} ∈ Ω_s are independent and identically distributed (i.i.d.) random variables with distribution ρ supported on Ω_s, x_0 ∈ Ω is an initial condition, and F : Ω × Ω_s → Ω is a function. In many applications, the function F is unknown or cannot be studied directly, which is the premise of this paper. We adopt the notation F_τ(x) = F(x, τ) for convenience and express x_n = (F_{τ_n} ∘ · · · ∘ F_{τ_1})(x_0), where '∘' denotes the composition of functions. With the assumptions above, equation (1) describes a discrete-time Markov process. For such systems, the Kolmogorov backward equation governs the evolution of an observable [34,40], with the right-hand side defined as the stochastic Koopman operator [51]. The works [51,57] have spurred increased interest in the data-driven approximation of both deterministic and stochastic Koopman operators and in analyzing their spectral properties [11,43,54]. Prominent applications span a variety of fields including fluid dynamics [31,52,66,68], epidemiology [64], neuroscience [9,14,47], finance [46], robotics [6,8], power systems [75,76], and molecular dynamics [39,59,69,70].
Although the function F is usually nonlinear, the stochastic Koopman operator is always linear; however, it operates on an infinite-dimensional space of observables. Of particular interest is the spectral content of the Koopman operator near the unit circle, which corresponds to slow subspaces encapsulating the long-term dynamics. If finite-dimensional eigenspaces can capture this spectral content effectively, they can serve as a finite-dimensional approximation. Numerous algorithms have been developed to approximate the spectral properties of Koopman operators [1,2,10,12,26,30,42,48,52,55]. Among these, Dynamic Mode Decomposition (DMD) is particularly popular [44]. Initially introduced in the fluids community [67,68], DMD's connection to the Koopman operator was established in [66]. Since then, several extensions and variants of DMD have been developed [4,15,19,63,84,85], including methods tailored for stochastic systems [24,72,82,87].
At its core, DMD is a projection method. It is widely recognized that achieving convergence and meaningful applications of DMD can be challenging due to the infinite-dimensional nature of Koopman operators [12,23,37,84]. Challenges include the presence of spurious (unphysical) modes resulting from projection, essential spectra,¹ the absence of non-trivial finite-dimensional invariant subspaces, and the verification of Koopman mode decompositions (KMDs). Residual Dynamic Mode Decomposition (ResDMD) has been introduced to address these issues for deterministic systems [20,23]. ResDMD facilitates a data-driven approach to compute residuals associated with the full infinite-dimensional Koopman operator, thus enabling the computation of spectral properties with controlled errors and the verification of learned dictionaries and KMDs. Despite the evident importance of analyzing stochastic systems through the Koopman perspective, similar verified DMD methods in this setting are absent.

¹ For an illustrative example of a transition operator with non-trivial essential spectra, refer to [3]. If the operator in question is either self-adjoint or an L² isometry, the methodologies described in [18,21] and [23], respectively, can be applied to compute spectral measures.

Fig. 1 The evolution of two eigenfunctions on the attractor of the stochastic Van der Pol oscillator from Section 5.2. The plots show the arguments. In blue, we see a sample of the true trajectories, while the expected values predicted from the stochastic Koopman operator are shown in red. Top: Eigenfunction associated with m = 0 and k = 1 in Table 1. The variance residual is small, and trajectories hug the expectation closely. Bottom: Eigenfunction associated with m = 1 and k = 1 in Table 1. The variance residual is large, and trajectories deviate from the expectation.
This paper presents several infinite-dimensional techniques for the data-driven analysis of stochastic systems. The central concept we explore is going beyond expectations to include higher moments within the Koopman framework. Figure 1 illustrates this point by depicting the evolution of two eigenfunctions associated with the stochastic Van der Pol oscillator (detailed in Section 5.2), alongside the expectation determined by the stochastic Koopman operator. Both eigenvalues and eigenfunctions are computed with a negligible projection error.² Notably, although both corresponding eigenvalues oscillate at the same frequency due to having identical arguments, the variances of the trajectories exhibit significant differences. This divergence is quantified by what we define as a variance residual (see Section 3.2).

Contributions
The contributions of our paper are as follows:

- Variance Incorporation: We integrate the concept of variance into the Koopman framework and establish its relationship with batched Koopman operators. Proposition 2 decomposes a mean squared Koopman error into an infinite-dimensional residual and a variance term. Additionally, we present methodologies (see Algorithms 1 and 2) for independently calculating these components, thereby enhancing the understanding of the spectral properties of the Koopman operator and the deviation from mean dynamics.
- Variance-Pseudospectra: We introduce a novel concept of pseudospectra, termed variance-pseudospectra (see Definition 2), which serves as a measure of statistical coherency.³ We also offer algorithms for computing these pseudospectra (see Algorithms 3 and 4) and prove their convergence.
- Convergence Theory: Section 4 of our paper is dedicated to proving a suite of convergence theorems. These pertain to the spectral properties of stochastic Koopman operators, the accuracy of KMD forecasts, and the derivation of concentration bounds for estimating Koopman matrices from a finite set of snapshot data.
Various examples are given in Section 5 and code is available at: https://github.com/MColbrook/Residual-Dynamic-Mode-Decomposition.

Previous work
Existing literature on stochastic Koopman operators primarily addresses the challenge of noisy observables in extended dynamic mode decomposition (EDMD) methodologies [82], and in techniques for debiasing DMD [27,35,77]. A related concern is the estimation error in Koopman operator approximations due to the finite nature of data sets. This issue is present in both deterministic and stochastic scenarios. As [84] describes, EDMD converges with large data sets to a Galerkin approximation of the Koopman operator. The work in [58] thoroughly analyzes kernel autocovariance operators, including non-asymptotic error bounds under classical ergodic and mixing assumptions. In [60], the authors offer the first comprehensive probabilistic bounds on the finite-data approximation error for truncated Koopman generators in stochastic differential equations (SDEs) and nonlinear control systems. They examine two scenarios: (1) i.i.d. sampling and (2) ergodic sampling, with the latter assuming exponential stability of the Koopman semigroup. Additionally, the variational approach to conformational dynamics (VAC), which bears similarities to DMD, is known for providing spectral estimates of time-reversible processes that result in a self-adjoint transition operator. The connection of VAC with Koopman operators is detailed in [83], and the approximation of spectral information with error bounds is discussed in [39].

Data-driven setup
We present data-driven methods that utilize a dataset of "snapshot" pairs alongside a dictionary of observables. While numerous approaches for selecting a dictionary exist in the literature [17,32,80-82,84,85], this topic is not the primary focus of our current study.⁴ Following the methodology outlined in [79], we consider our given data to consist of pairs of snapshots,

S = {(x^(m), y^(m)) : y^(m) = F(x^(m), τ_m), m = 1, . . . , M},  (2)

where each τ_m is an independent realization drawn from ρ. Unlike in deterministic systems, for stochastic systems it can be beneficial for S to include the same initial condition x^(m) multiple times, as each execution of the dynamics yields an independent realization of a trajectory. We say that S is M₁-batched if it can be split into M₁ subsets such that

S = ∪_{j=1}^{M₁} {(x^(j), y^(j,l)) : y^(j,l) = F(x^(j), τ_{j,l}), l = 1, . . . , M₂}.  (3)

In other words, for each x^(j), we have multiple realizations of F_τ(x^(j)). Using batched data, we can approximate higher-order stochastic Koopman operators representing the moments of the trajectories. An unbatched dataset can be adapted to approximate a batched dataset by categorizing or "binning" the x points in the snapshot data. In practical scenarios, one may encounter a combination of both batched and unbatched data. Depending on the type of snapshot data used, Galerkin approximations of stochastic Koopman operators can be achieved in the limit of large datasets (as discussed in Section 2.2).
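To make the data structures concrete, the following minimal sketch assembles an M₁-batched dataset of the form (3) for a toy one-dimensional system. The map F, the noise distribution, and all parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, tau, c=0.2):
    """One step of a hypothetical noisy circle map, standing in for the
    unknown dynamics F(x, tau); here tau ~ rho = Uniform[0, 1]."""
    return (x + 0.1 * np.sin(2 * np.pi * x) + c * tau) % 1.0

M1, M2 = 100, 50                        # initial conditions and realizations per batch
X = rng.uniform(0, 1, size=M1)          # initial conditions x^(j)
tau = rng.uniform(0, 1, size=(M1, M2))  # i.i.d. noise realizations tau_{j,l}
Y = F(X[:, None], tau)                  # Y[j, l] = F(x^(j), tau_{j,l}): an M1-batched dataset
```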

Mathematical Preliminaries
This section discusses several foundational concepts upon which our paper builds.

The stochastic Koopman operator
Let g : Ω → C be a function, commonly called an observable. Given an initial condition x_0 ∈ Ω, measuring the initial state of the dynamical system through g yields the value g(x_0). One time-step later, the measurement g(x_1) = g(F_τ(x_0)) = (g ∘ F_τ)(x_0) is obtained, where τ is a realization from a probability distribution supported on Ω_s, i.e., τ ∼ ρ. The "pullback" operator, given g, outputs the "look ahead" measurement function g ∘ F_τ. This function is a random variable, and the stochastic Koopman operator is its expectation [56]:

[K_(1) g](x) = E_τ[g(F_τ(x))] = ∫_{Ω_s} g(F(x, τ)) dρ(τ).

Here, E_τ represents the expectation with respect to the distribution ρ. The subscript (1) indicates this is the first moment.
Throughout the paper, we assume that the domain of the operator K_(1) is L²(Ω, ω), where ω is a positive measure on Ω. This space is equipped with an inner product and norm, denoted by ⟨·, ·⟩ and ∥·∥, respectively. We do not assume that K_(1) is compact or self-adjoint. We now introduce the batched Koopman operator, designed to capture the variance and other higher-order moments in the trajectories of dynamical systems. For r ∈ N and g : Ω^r → C, we define

[K_(r) g](x_1, . . . , x_r) = E_τ[g(F_τ(x_1), . . . , F_τ(x_r))],  (4)

where the same realization τ ∼ ρ is used for the r arguments of g. Notably, both the classical and the batched versions of the Koopman operators adhere to the semigroup property, as we will demonstrate.
Proposition 1 For any r, n ∈ N,

[K_(r)^n g](x_1, . . . , x_r) = E[g(x_n^{(1)}, . . . , x_n^{(r)})],

where the trajectories x_n^{(i)} start at x_0^{(i)} = x_i and are driven by the same noise realizations τ_1, . . . , τ_n.

Proof For r = 1, see [24]. For r > 1, note that K_(r) is a first-order Koopman operator of a dynamical system on Ω^r. ⊓ ⊔

This proposition indicates that n applications of the stochastic Koopman operator yield the expected value of an observable after n time steps. It is crucial to understand that K_(1) only calculates the expected value. To gain insights into the variability around this mean and to understand the projection error inherent in DMD methods, we need to consider higher-order statistics, such as the variance. These aspects are further explored in Section 3.
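As a sanity check of Proposition 1, one can estimate the action of K_(1)^n by Monte Carlo: evolve many independent trajectories and average the observable at time n. The sketch below does this for the toy noisy circle map introduced earlier; the map, observable, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
F = lambda x, tau: (x + 0.1 * np.sin(2 * np.pi * x) + 0.2 * tau) % 1.0  # toy stochastic map
g = lambda x: np.exp(2j * np.pi * x)                                    # observable

def K_n(g, x0, n=1, samples=100_000):
    """Monte Carlo estimate of [K_(1)^n g](x0) = E[g(x_n) | x_0 = x0]."""
    xs = np.full(samples, x0, dtype=float)
    for _ in range(n):  # evolve `samples` independent trajectories one step
        xs = F(xs, rng.uniform(0, 1, samples))
    return g(xs).mean()

print(K_n(g, 0.3, n=3))                  # expected value of g after three time steps
print(K_n(lambda x: abs(g(x))**2, 0.3))  # a second moment: K_(2) applied to g(x)conj(g(y)) on the diagonal
```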

Extended Dynamic Mode Decomposition
EDMD is a widely-used method for constructing a finite-dimensional approximation of the Koopman operator K_(1), utilizing the snapshot data S in (2). This approach involves projecting the infinite-dimensional Koopman operator onto a finite-dimensional matrix and approximating its entries. For notational simplicity, we will omit the subscript (1) when referring to the Koopman operator in this section. Originally, EDMD assumes that the initial conditions are independently drawn from a distribution ω [84]. However, in our adaptation, we apply EDMD to any given S, treating the x^(m) as quadrature nodes for integration with respect to ω. This flexibility allows us to use different quadrature weights depending on the specific scenario.
One first chooses a dictionary {ψ_1, . . . , ψ_N} in the space L²(Ω, ω). This dictionary consists of a list of observables that form a finite-dimensional subspace V_N = span{ψ_1, . . . , ψ_N}. EDMD computes a matrix K ∈ C^{N×N} that approximates the action of K within this subspace. Specifically, the goal is to achieve K = P_{V_N} K P*_{V_N}, where P_{V_N} : L²(Ω, ω) → V_N is the orthogonal projection onto V_N. In the Galerkin framework, this equates to:

⟨K[ψ_j], ψ_i⟩ = Σ_{k=1}^N K_{kj} ⟨ψ_k, ψ_i⟩, 1 ≤ i, j ≤ N.

A matrix K satisfying this relationship is given by K = G⁻¹A (equivalently, GK = A), where G_{ij} = ⟨ψ_j, ψ_i⟩ and A_{ij} = ⟨K[ψ_j], ψ_i⟩. Commonly, we stack the ψ_j and define the feature map

Ψ(x) = [ψ_1(x) · · · ψ_N(x)] ∈ C^{1×N}.

Then, for any g ∈ V_N, we use the shorthand g = Ψg for g(x) = Σ_{j=1}^N g_j ψ_j(x). With the previously defined K, the approximation becomes

[K g](x) ≈ Ψ(x)Kg.

The accuracy of this approximation depends on how well V_N can approximate K g.
The entries of the matrices G and A are inner products and must be approximated using the trajectory data S. For quadrature weights {w_m}, we define G̃ as the numerical approximation of G:

G̃_{jk} = Σ_{m=1}^M w_m \overline{ψ_j(x^(m))} ψ_k(x^(m)).  (5)

The weights {w_m} reflect the significance assigned to each snapshot in the dataset, influenced by factors such as data distribution or reliability, which we will explore further. Similarly, for A, we define

Ã_{jk} = Σ_{m=1}^M w_m \overline{ψ_j(x^(m))} ψ_k(y^(m)).  (6)

Let Ψ_X, Ψ_Y ∈ C^{M×N} collect the dictionary's evaluations of these samples, so that the m-th rows of Ψ_X and Ψ_Y are Ψ(x^(m)) and Ψ(y^(m)), respectively:

Ψ_X = [Ψ(x^(1))^⊤ · · · Ψ(x^(M))^⊤]^⊤,  Ψ_Y = [Ψ(y^(1))^⊤ · · · Ψ(y^(M))^⊤]^⊤,  (7)

and let W = diag(w_1, . . . , w_M). Then we can succinctly write

G̃ = Ψ_X^* W Ψ_X,  Ã = Ψ_X^* W Ψ_Y.  (8)

Throughout this paper, the symbol X̃ denotes an estimation of the quantity X.
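In code, assembling G̃, Ã, and the EDMD matrix takes only a few lines. The following sketch uses random sampling with w_m = 1/M and a Fourier dictionary on the toy circle map from before; all of these modeling choices are assumptions for illustration, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
F = lambda x, tau: (x + 0.1 * np.sin(2 * np.pi * x) + 0.2 * tau) % 1.0

M, n = 2000, 10
X = rng.uniform(0, 1, M)                 # x^(m) drawn i.i.d. from omega (random sampling)
Y = F(X, rng.uniform(0, 1, M))           # y^(m) = F(x^(m), tau_m)
w = np.full(M, 1.0 / M)                  # quadrature weights w_m = 1/M

js = np.arange(-n, n + 1)                # dictionary psi_j(x) = exp(2*pi*i*j*x)
Psi = lambda x: np.exp(2j * np.pi * np.outer(x, js))
PsiX, PsiY, W = Psi(X), Psi(Y), np.diag(w)

G = PsiX.conj().T @ W @ PsiX             # estimate of the Gram matrix, see (8)
A = PsiX.conj().T @ W @ PsiY             # estimate of A, see (8)
K = np.linalg.lstsq(G, A, rcond=None)[0] # EDMD matrix solving G K = A
lam = np.linalg.eigvals(K)               # candidate Koopman eigenvalues
```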
Various sampling methods converge in the large data limit, meaning that

lim_{M→∞} G̃ = G and lim_{M→∞} Ã = A.  (9)

We detail three convergent sampling methods:
(i) Random sampling: In the initial definition of EDMD, ω is a probability measure and {x^(m)}_{m=1}^M are independently drawn according to ω, with each quadrature weight set to w_m = 1/M. The strong law of large numbers guarantees (9).
(ii) Ergodic sampling: The snapshots are collected along a single trajectory of the dynamical system. Specifically, we use w_m = 1/M and x^(m+1) = F_{τ_m}(x^(m)). This sampling method's analysis for stochastic Koopman operators is detailed in [82]. An advantage is that knowledge of ω is not required. However, the convergence rate depends on the specific problem [36]. Note that in an ergodic system, the stochastic Koopman operator is an isometry on L¹(Ω, ω) but typically not on L²(Ω, ω).
(iii) High-order quadrature: When the dictionary and F are sufficiently regular, the dimension d is not too large, and we can choose the {x^(m)}_{m=1}^M, employing a high-order quadrature rule is advantageous. For deterministic systems, this approach can significantly increase convergence rates in (9) [23]. In stochastic systems, high-order quadrature applies primarily to batched snapshot data. We may select {x^(j)}_{j=1}^{M₁} based on an M₁-point quadrature rule with associated weights {w_j}_{j=1}^{M₁}. Convergence is achieved as M₂ → ∞, effectively applying Monte Carlo integration of the random variable τ over Ω_s for each fixed x^(j).
The convergence described in (9) implies that the eigenvalues obtained through EDMD converge to the spectrum of P_{V_N} K P*_{V_N} as M → ∞. Therefore, approximating the spectrum of K, denoted Sp(K), by the eigenvalues of K is closely related to the so-called finite section method [7]. However, just as the finite section method can be prone to spectral pollution, which refers to the appearance of spurious modes that accumulate even as the size of the dictionary increases, this is also a concern for EDMD [84]. Consequently, having a method to validate the accuracy of the proposed eigenvalue-eigenvector pairs becomes crucial, which is one of the key functions of ResDMD.

Residual Dynamic Mode Decomposition (ResDMD)
Accurately estimating the spectrum of K is critical for analyzing dynamical systems. For deterministic systems, ResDMD achieves this goal, providing robust spectral estimates [20,23]. Unlike classical DMD methods, ResDMD introduces an additional matrix specifically designed to approximate K*K. This enhancement not only offers rigorous error guarantees for the spectral approximation but also enables a posteriori assessment of the reliability of the computed spectra and Koopman modes. This capability is particularly valuable in addressing issues such as spectral pollution, which are common challenges in DMD-type methods.
ResDMD is built around the approximation of residuals associated with K, providing an error bound. For any given candidate eigenvalue-eigenvector pair (λ, g), with λ ∈ C and g = Ψg ∈ V_N, one can consider the relative squared residual

res(λ, g)² = ∥K[g] − λg∥² / ∥g∥².  (10)

This pair (λ, g) can be computed either from K or other methods. A small residual means that λ can be approximately considered as an eigenvalue of K, with g as the corresponding eigenfunction. The relative residual in (10) serves as a measure of the coherency of observables, indicating that observables with smaller residuals play a significant role in the dynamics of the system. If the relative (non-squared) residual is bounded by ε, then K^n g = λ^n g + O(nε). In other words, λ characterizes the coherent oscillation and the decay/growth in the observable g with time.
The residual is closely related to the notion of pseudospectra [78].
Definition 1 For any λ ∈ C, define

σ_inf(λ) = inf{∥K[g] − λg∥ : g ∈ L²(Ω, ω), ∥g∥ = 1},

and, for ε > 0, the approximate point ε-pseudospectrum

Sp_ε(K) = Cl({λ ∈ C : σ_inf(λ) < ε}),

where Cl denotes closure of a set. Furthermore, we say that g is an ε-pseudoeigenfunction if there exists λ ∈ C such that the relative squared residual in (10) is smaller than ε².

To compute (10), notice that three of the four inner products appearing in the numerator are:

⟨K[g], g⟩ ≈ g*Ãg,  ⟨g, K[g]⟩ ≈ g*Ã*g,  ⟨g, g⟩ ≈ g*G̃g,  (11)

with Ã, G̃ numerically approximated by EDMD (8). Hence, the success of the computation relies on finding a numerical approximation to ⟨K[g], K[g]⟩. To that end, we deploy the same quadrature rule discussed in (5)-(6) and set

L̃ = Ψ_Y^* W Ψ_Y;  (12)

then ⟨K[g], K[g]⟩ ≈ g*Ψ_Y^*WΨ_Y g = g*L̃g. We obtain a numerical approximation of (10) as

res̃(λ, g)² = [g*(L̃ − λ̄Ã − λÃ* + |λ|²G̃)g] / (g*G̃g).  (13)

The matrix L̃ introduced by ResDMD formally corresponds to an approximation of K*K. The computation utilizes the same dataset as that employed for G̃ and Ã and is computationally efficient to construct. The work presented in [23] demonstrates that the approximation outlined in (13) can be effectively used in various algorithms for rigorously computing the spectra and pseudospectra of K for deterministic systems. However, these results from [23] are not directly applicable to stochastic systems.
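Continuing the EDMD sketch above (and reusing PsiX, PsiY, W, G, A, and K from it), the ResDMD residual (13) requires only the one extra matrix L̃; a minimal sketch under those assumptions:

```python
import numpy as np

L = PsiY.conj().T @ W @ PsiY             # the extra ResDMD matrix (12), formally ~ K*K

def res_squared(lam, g):
    """Relative squared residual (13) for a candidate eigenpair (lam, Psi g)."""
    num = g.conj() @ (L - np.conj(lam) * A - lam * A.conj().T + abs(lam)**2 * G) @ g
    return (num / (g.conj() @ G @ g)).real

evals, evecs = np.linalg.eig(K)
residuals = [res_squared(l, v) for l, v in zip(evals, evecs.T)]  # validate each EDMD eigenpair
```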
Variance from the Koopman perspective

When analyzing a system with inherent stochasticity, basing conclusions only on the mean trajectory can lead to misleading interpretations, as illustrated in Figure 1. To achieve a more accurate statistical understanding of such systems, it is crucial to quantify how much and in what ways the trajectory deviates from this mean. This need for a more comprehensive analysis underpins our exploration into quantifying the variance.

Variance via Koopman operators
For any observable g ∈ L²(Ω, ω) and x ∈ Ω, g(F_τ(x)) is a random variable. One can define its moments E_τ[g(F_τ(x))^r] for r ∈ N. Recalling the definitions in (4), this becomes:

E_τ[g(F_τ(x))^r] = [K_(r)(g ⊗ · · · ⊗ g)](x, . . . , x), where (g ⊗ · · · ⊗ g)(x_1, . . . , x_r) = g(x_1) · · · g(x_r).

This means that the r-th order Koopman operator directly computes the moments of the trajectory. In particular, the combination of the first and the second moment provides the following variance term:

Var_τ[g](x) = [K_(2)(g ⊗ ḡ)](x, x) − |[K_(1)g](x)|² = E_τ[|g(F_τ(x))|²] − |E_τ[g(F_τ(x))]|².  (14)

We integrate the local definition of variance over the entire domain to define:

V(g) = ∫_Ω Var_τ[g](x) dω(x).

The following proposition provides a Koopman analog of decomposing an integrated mean squared error (IMSE).

Proposition 2 For any g, h ∈ L²(Ω, ω),

E_τ[∥g ∘ F_τ + h∥²] = ∥K_(1)[g] + h∥² + V(g).  (15)

Proof We expand |g(F_τ(x)) + h(x)|² for a fixed x ∈ Ω and take expectations to find that

E_τ[|g(F_τ(x)) + h(x)|²] = |[K_(1)g](x) + h(x)|² + Var_τ[g](x).

The result now follows by integrating over x with respect to the measure ω.

⊓ ⊔
Similarly, for any two functions g, h ∈ L²(Ω, ω), we define the covariance:

Cov_τ[g, h](x) = [K_(2)(g ⊗ h̄)](x, x) − [K_(1)g](x) \overline{[K_(1)h](x)},  C(g, h) = ∫_Ω Cov_τ[g, h](x) dω(x),  (16)

and obtain the following similar result using covariance:

E_τ[⟨g ∘ F_τ, h ∘ F_τ⟩] = ⟨K_(1)[g], K_(1)[h]⟩ + C(g, h).

Proposition 2 is analogous to the decomposition of an IMSE and is practically useful. Suppose we use an observation h to approximate −g ∘ F_τ, in an attempt to minimize ∥g ∘ F_τ + h∥². An unbiased estimator is −K_(1)[g]; however, this approximation will not be perfect due to the variance term in (15). Therefore, there is a variance-residual tradeoff for stochastic Koopman operators. Depending on the type of trajectory data collected, one can approximate the quantities E_τ∥g ∘ F_τ + h∥² and ∥K_(1)[g] + h∥² in (15) and hence estimate the third variance term.
Example 1 (Circle map) Let Ω = [0, 1]_per be the periodic interval and consider

F(x, τ) = x + f(x) + cτ mod 1,  (17)

where Ω_s = [0, 1]_per, ρ is absolutely continuous, and c is a constant. Let ψ_j(x) = e^{2πijx} for j ∈ Z. Then

[K_(1)ψ_j](x) = E_τ[e^{2πij(x + f(x) + cτ)}] = α_j e^{2πij(x + f(x))},

where we define the constants

α_j = ∫_{Ω_s} e^{2πijcτ} dρ(τ).

Let D be the operator that multiplies each ψ_j by α_j. Then K_(1) = T D, where T is the Koopman operator corresponding to x → x + f(x). Since ρ is absolutely continuous, the Riemann-Lebesgue lemma implies that lim_{|j|→∞} α_j = 0 and hence D is a compact operator. It follows that if T is bounded, then K_(1) is a compact operator. A straightforward computation using (14) shows that

Var_τ[ψ_j](x) = 1 − |α_j|².  (18)

For example, if f = 0, K_(1) has pure point spectrum with eigenvalues α_j and eigenfunctions ψ_j. However, as |j| → ∞, the variance converges to one and the ψ_j become less statistically coherent. This example is explored further in Section 5.1.

⊓ ⊔
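The closed form (18) is easy to test numerically. The sketch below takes f = 0, c = 0.2, and ρ = Uniform[0, 1] (illustrative values) and compares a Monte Carlo estimate of Var_τ[ψ_k](x) with 1 − |α_k|²:

```python
import numpy as np

rng = np.random.default_rng(3)
c, k = 0.2, 3                                   # noise amplitude and Fourier index (assumed values)
tau = rng.uniform(0, 1, 200_000)                # tau ~ rho = Uniform[0, 1]

alpha_k = np.exp(2j * np.pi * k * c * tau).mean()   # alpha_k = E[exp(2 pi i k c tau)]
x = 0.37                                        # the variance is independent of x
z = np.exp(2j * np.pi * k * ((x + c * tau) % 1.0))  # samples of psi_k(F_tau(x)) with f = 0
var_mc = (np.abs(z - z.mean())**2).mean()       # Monte Carlo variance E|Z - EZ|^2
print(var_mc, 1 - abs(alpha_k)**2)              # the two values agree up to sampling error
```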
Another immediate application of the variance term is in providing an estimated bound for the Koopman operator prediction of trajectories.
Proposition 3 For any a > 0, we have

P(|g(x_n) − [K_(1)^n g](x_0)| ≥ a) ≤ Var(g(x_n)) / a²,  (19)

where Var(g(x_n)) = [K_(2)^n(g ⊗ ḡ)](x_0, x_0) − |[K_(1)^n g](x_0)|² by Proposition 1 and (14).
Proof The result follows from combining Proposition 1 and (14) with Chernoff's bound.

⊓ ⊔
The bound can be combined with concentration bounds for the forecast error ΨK̃^n − K_(1)^n (see Section 4.2).
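A quick Monte Carlo illustration of this Chebyshev/Chernoff-type bound, under our reading of (19), on the toy circle map: the empirical tail probability sits below Var(g(x_n))/a². All parameters here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
F = lambda x, tau: (x + 0.1 * np.sin(2 * np.pi * x) + 0.2 * tau) % 1.0
g = lambda x: np.exp(2j * np.pi * x)

n, a, trials = 5, 0.5, 200_000
xs = np.full(trials, 0.3)                         # all trajectories start at x_0 = 0.3
for _ in range(n):
    xs = F(xs, rng.uniform(0, 1, trials))
z = g(xs)
dev = np.abs(z - z.mean())                        # deviation of g(x_n) from its mean
print((dev >= a).mean(), (dev**2).mean() / a**2)  # empirical tail vs. variance bound
```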

ResDMD in stochastic systems
In the deterministic setting, ResDMD provides an efficient way to evaluate the accuracy of candidate eigenpairs through the computation of an additional matrix L̃ in (12). However, what happens in the stochastic setting? Suppose that (λ, g) is a candidate eigenpair of K_(1) with g ∈ V_N. Resembling (10), we consider

res_var(λ, g)² = E_τ[∥g ∘ F_τ − λg∥²] / ∥g∥².  (20)

We can write the numerator in terms of Ã, G̃, and L̃, i.e.,

E_τ[∥g ∘ F_τ − λg∥²] ≈ g*(L̃ − λ̄Ã − λÃ* + |λ|²G̃)g.

Hence, we define

res̃_var(λ, g)² = [g*(L̃ − λ̄Ã − λÃ* + |λ|²G̃)g] / (g*G̃g),  (21)

which furnishes an approximation of (20). Setting h = −λg in Proposition 2, we see that

E_τ[∥g ∘ F_τ − λg∥²] = ∥K_(1)[g] − λg∥² + V(g).  (22)

Thus, res_var(λ, g) approximates the sum of the squared residual ∥K_(1)[g] − λg∥² and the integrated variance of g ∘ F_τ. For stochastic systems, the integrated variance of g ∘ F_τ is usually non-zero, so that res̃_var(λ, g) does not converge to zero in the large data limit, even for exact eigenpairs. Based on this notion and drawing an analogy with Definition 1, we make the following definition.
Definition 2 For any λ ∈ C, define

σ_inf^var(λ) = inf{(∥K_(1)[g] − λg∥² + V(g))^{1/2} : g ∈ L²(Ω, ω), ∥g∥ = 1}.

For ε > 0, we define the variance-ε-pseudospectrum as

Sp_ε^var(K_(1)) = Cl({λ ∈ C : σ_inf^var(λ) < ε}),  (23)

where Cl denotes the closure of a set. Furthermore, we say that g is a variance-ε-pseudoeigenfunction if there exists λ ∈ C such that (∥K_(1)[g] − λg∥² + V(g))^{1/2} < ε∥g∥.

Superficially, this definition is a straightforward extension of Definition 1. However, there are some essential differences. Both the conceptual understanding and the computation methods need to be modified.
First, the relation (22) shows that Sp_ε^var(K_(1)) takes into account uncertainty through the variance term. Hence, the variance-pseudospectrum provides a notion of statistical coherency. Furthermore, comparing Definition 1 and Definition 2, we have

Sp_ε^var(K_(1)) ⊂ Sp_ε(K_(1)).

If the dynamical system is deterministic, then Sp_ε^var(K_(1)) is equal to the approximate point ε-pseudospectrum. However, in the presence of variance, they are no longer equal.
Second, the relation (22) gives a computational surprise. Following the same derivation between (10)-(13), with L̃, Ã, and G̃ accordingly adjusted through replacing K by K_(1) in (11)-(12), we can still compute the variance-residual term. However, the original residual itself, res(λ, g), needs a modification. Recalling (10), in the same spirit of EDMD, if g ∈ V_N, we write

∥K_(1)[g] − λg∥² = g*(H − λ̄A − λA* + |λ|²G)g,

where H is a newly introduced matrix with

H_{jk} = ⟨K_(1)[ψ_k], K_(1)[ψ_j]⟩.  (24)

We employ the quadrature rule for the x-domain to approximate this new term. If S is batched with M₂ = 2, then we can form the matrix

H̃_{jk} = Σ_{l=1}^{M₁} w_l \overline{ψ_j(y^(l,1))} ψ_k(y^(l,2)).

Since τ_{l,1} and τ_{l,2} are independent, we have

lim_{M₁→∞} H̃ = H.  (25)

We stress that K_(1) is applied separately to ψ_j and ψ_k, and thus τ_{l,1} and τ_{l,2} need to be independent realizations. The convergence in (25) allows us to compute the spectral properties of K_(1) directly (see Section 3.3). In particular, instead of (13), we now have

res̃(λ, g)² = [g*(H̃ − λ̄Ã − λÃ* + |λ|²G̃)g] / (g*G̃g),  (26)

and the approximate decomposition

res̃_var(λ, g)² = res̃(λ, g)² + [g*(L̃ − H̃)g] / (g*G̃g),  (27)

which becomes exact in the large data limit.
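The following sketch assembles all four matrices from two-batched data (M₂ = 2) for the toy circle map and evaluates both residuals; it follows the averaging conventions described in the next subsection. The system and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
F = lambda x, tau: (x + 0.1 * np.sin(2 * np.pi * x) + 0.2 * tau) % 1.0
js = np.arange(-10, 11)
Psi = lambda x: np.exp(2j * np.pi * np.outer(x, js))

M1 = 2000                                       # batched data with M2 = 2
X = rng.uniform(0, 1, M1)
Y1, Y2 = F(X, rng.uniform(0, 1, M1)), F(X, rng.uniform(0, 1, M1))
W = np.diag(np.full(M1, 1.0 / M1))
PsiX, PsiY1, PsiY2 = Psi(X), Psi(Y1), Psi(Y2)

G = PsiX.conj().T @ W @ PsiX
A = 0.5 * PsiX.conj().T @ W @ (PsiY1 + PsiY2)                        # averaged over the two batches
L = 0.5 * (PsiY1.conj().T @ W @ PsiY1 + PsiY2.conj().T @ W @ PsiY2)
H = 0.5 * (PsiY1.conj().T @ W @ PsiY2 + PsiY2.conj().T @ W @ PsiY1)  # symmetrized estimate of (24)

def residuals(lam, g):
    """Return (res^2, res_var^2) from (26) and (21); their difference
    estimates the integrated variance term in (27)."""
    cross = -np.conj(lam) * A - lam * A.conj().T + abs(lam)**2 * G
    quad = lambda B: ((g.conj() @ (B + cross) @ g) / (g.conj() @ G @ g)).real
    return quad(H), quad(L)
```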

Algorithms
In the derivations above, we noticed that one-batched data permits computation only of res_var(λ, g), while two-batched data also permits the computation of res(λ, g). Algorithms 1 and 2 approximate the relative residuals of EDMD eigenpairs in the scenario of unbatched and batched data, respectively. In Algorithm 2, we have taken an average when computing Ã and L̃ to reduce quadrature error, and an average when computing H̃ to ensure that it is self-adjoint (and positive semi-definite). Algorithm 3 approximates the pseudospectrum and corresponding pseudoeigenfunctions, given batched snapshot data. Algorithm 4 approximates the variance-pseudospectrum and corresponding variance-pseudoeigenfunctions, and does not need batched data. Note that the computational complexity of all of these algorithms scales the same as those for ResDMD, which is discussed in [20,23]. In particular, Algorithms 1 and 2 scale the same as EDMD.
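For concreteness, the core step shared by Algorithms 3 and 4, minimizing the residual over g ∈ V_N at each grid point z, reduces to a generalized Hermitian eigenvalue problem. A minimal sketch, reusing the matrices G, A, and L from the batched example above (swapping in H for L gives Algorithm 3's residual instead):

```python
import numpy as np
from scipy.linalg import eigh

def min_res_var(z):
    """Smallest relative variance-residual at z: the minimal eigenvalue of
    D(z) = L - conj(z) A - z A* + |z|^2 G in the metric induced by G."""
    D = L - np.conj(z) * A - z * A.conj().T + abs(z)**2 * G
    vals = eigh(0.5 * (D + D.conj().T), G, eigvals_only=True)  # assumes G positive definite
    return np.sqrt(max(vals[0], 0.0))

grid = [x + 1j * y for x in np.linspace(-1.2, 1.2, 61)
                   for y in np.linspace(-1.2, 1.2, 61)]
eps = 0.3
Sp_var_eps = [z for z in grid if min_res_var(z) < eps]  # estimate of the variance-eps-pseudospectrum
```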

Theoretical guarantees
We now prove the correctness of the algorithms mentioned above. Specifically, through a series of theorems, we demonstrate that the computations of Ã, G̃, L̃, and H̃ are accurate and that the spectral estimates can be trusted. To achieve this, we divide the section into three subsections, each focusing on demonstrating the accuracy of the spectrum, the predictive power, and the matrices, respectively. The universal assumptions made in this section are as follows:
- K_(1) is bounded.
- {ψ_j}_{j=1}^N are linearly independent for any finite N.
- The union of the subspaces V_N is dense in L²(Ω, ω).

The algorithms and proofs can be readily adapted for an unbounded K_(1). The latter two assumptions can also be relaxed with minor modifications.

Accuracy in finding spectral quantities
In this subsection, we prove the convergence of our algorithms. We have already discussed the convergence of residuals in Algorithms 1 and 2, under the assumption of convergence of the finite matrices G̃, Ã, L̃, and H̃ in the large data limit. Hence, we focus on Algorithm 4.

Algorithm 3: Pseudospectra (batched data).
1: Compute the matrices G̃, Ã, and H̃, where Ψ_X and the Ψ_Y^{(i)} are given in (7) and the superscript for Ψ_Y corresponds to each batch of snapshot data.
2: For each z_j, compute r_j = min_{g∈C^N} res̃(z_j, Ψg) (see (26)) and the corresponding singular vectors g_j. This step is a generalized SVD problem.
Output: {z_j : r_j < ε}, an estimate of Sp_ε(K_(1)), and pseudoeigenfunctions {g_j : r_j < ε}.

We first define the functions f_{M,N}(λ) = min_{g∈C^N} res̃_var(λ, Ψg), and note that r_j = f_{M,N}(z_j) in Algorithm 4. Our first lemma describes the limit of these functions as M → ∞ and N → ∞.
Lemma 1 For any λ ∈ C, the limit f_N(λ) = lim_{M→∞} f_{M,N}(λ) exists. Moreover, f_N is a nonincreasing function of N and converges to σ_inf^var from above and uniformly on compact subsets of C as a function of the spectral parameter λ.
Proof The limit f_N(λ) = lim_{M→∞} f_{M,N}(λ) follows trivially from the convergence of the matrices. Moreover, we have

f_N(λ) = min{(∥K_(1)[g] − λg∥² + V(g))^{1/2} : g ∈ V_N, ∥g∥ = 1}.

By definition, we also have

σ_inf^var(λ) ≤ f_{N+1}(λ) ≤ f_N(λ).

Let δ > 0 and choose g ∈ L²(Ω, ω) such that ∥g∥ = 1 and

(∥K_(1)[g] − λg∥² + V(g))^{1/2} ≤ σ_inf^var(λ) + δ.

Since the union of the V_n is dense in L²(Ω, ω), there exists some n and g_n ∈ V_n such that ∥g_n∥ = 1 and

(∥K_(1)[g_n] − λg_n∥² + V(g_n))^{1/2} ≤ σ_inf^var(λ) + 2δ.

It follows that f_n(λ) ≤ σ_inf^var(λ) + 2δ. Since this holds for any δ > 0, lim_{N→∞} f_N(λ) = σ_inf^var(λ). Since σ_inf^var(λ) is continuous in λ, f_N converges uniformly down to σ_inf^var on compact subsets of C by Dini's theorem.

⊓ ⊔
Let {Grid(N) = {z_{1,N}, z_{2,N}, . . . , z_{k(N),N}}} be a sequence of finite grids such that, for any λ ∈ C,

lim_{N→∞} dist(λ, Grid(N)) = 0.

For example, we could take Grid(N) = N⁻¹(Z + iZ) ∩ B_N(0). In practice, one considers a grid of points over the region of interest in the complex plane. Lemma 1 tells us that to study Algorithm 4 in the large data limit, we must analyze

Γ_N^ε(K_(1)) = {z ∈ Grid(N) : f_N(z) < ε}.

To make the convergence of Algorithm 4 precise, we use the Attouch-Wets metric defined by [5]:

d_AW(C_1, C_2) = Σ_{m=1}^∞ 2^{−m} min{1, sup_{|z|≤m} |dist(z, C_1) − dist(z, C_2)|},

where C_1, C_2 are closed nonempty subsets of C. This metric corresponds to local uniform convergence on compact subsets of C. For any closed nonempty sets C and C_n, d_AW(C_n, C) → 0 if and only if for any δ > 0 and B_m(0) (closed ball of radius m ∈ N about 0), there exists N such that if n > N then C_n ∩ B_m(0) ⊂ C + B_δ(0) and C ∩ B_m(0) ⊂ C_n + B_δ(0). The following theorem contains our convergence result.

Theorem 1 (Convergence of Algorithm 4) For any ε > 0, Γ_N^ε(K_(1)) ⊂ Sp_ε^var(K_(1)) and lim_{N→∞} d_AW(Cl(Γ_N^ε(K_(1))), Sp_ε^var(K_(1))) = 0.

Proof Lemma 1 shows that Γ_N^ε(K_(1)) ⊂ Sp_ε^var(K_(1)). To prove convergence, we use the characterization of the Attouch-Wets topology. Suppose that m is large such that Sp_ε^var(K_(1)) ∩ B_m(0) is nonempty. Hence, we must show that given δ > 0, there exists n_0 such that if N > n_0 then Sp_ε^var(K_(1)) ∩ B_m(0) ⊂ Γ_N^ε(K_(1)) + B_δ(0). Suppose for a contradiction that this statement is false. Then there exist δ > 0, λ_{n_j} ∈ Sp_ε^var(K_(1)) ∩ B_m(0), and n_j → ∞ such that dist(λ_{n_j}, Γ_{n_j}^ε(K_(1))) ≥ δ. Without loss of generality, we can assume that λ_{n_j} → λ ∈ Sp_ε^var(K_(1)) ∩ B_m(0). There exists some z with σ_inf^var(z) < ε and |λ − z| ≤ δ/2. Let z_{n_j} ∈ Grid(n_j) be such that |z − z_{n_j}| ≤ dist(z, Grid(n_j)) + n_j⁻¹. Since σ_inf^var is continuous and f_N converges locally uniformly to σ_inf^var, we must have f_{n_j}(z_{n_j}) < ε for large n_j, so that

dist(λ_{n_j}, Γ_{n_j}^ε(K_(1))) ≤ |λ_{n_j} − λ| + |λ − z| + |z − z_{n_j}|,

which is smaller than δ for large n_j, and we reach the desired contradiction.

Error bounds for iterations
We now aim to bound the difference between K̃^n and K_(1)^n, a step crucial for measuring the accuracy of our approximation of the mean trajectories in L²(Ω, ω). This effort, in conjunction with the Chernoff-like bound presented in (19), enables us to compute the statistical properties of the trajectories and their forecasts. Our approach to establishing these bounds is twofold. First, we consider the difference between K̃^n and K_(1)^n, taking into account both the estimation errors and the errors intrinsic to the subspace. Subsequently, we establish concentration bounds for the estimation errors of G̃, Ã, and L̃.
Theorem 2 (Error bound for forecasts) Define the estimation errors ∆_G = ∥G̃ − G∥ and ∆_A = ∥Ã − A∥, and the invariant subspace errors δ_j(g) = ∥(I − P_{V_N})K_(1)^j g∥ for j = 1, . . . , n. Then the difference between the computed forecast ΨK̃^n g and the true mean forecast K_(1)^n g in L²(Ω, ω) is bounded by a sum of terms, each controlled by ∆_G, ∆_A, and the δ_j(g).

Proof We introduce two matrices comparing K̃ with the Galerkin projection of K_(1) and re-write the difference of the n-th powers as a telescoping sum. A simple proof by induction bounds the accumulated error over the n steps, the final term on the right-hand side being controlled by the subspace errors δ_j(g). The theorem now follows from the triangle inequality.

⊓ ⊔
This theorem explicitly tells us how much to trust the prediction using the computed Koopman matrix, compared with the true Koopman operator. The quantities ∆_G and ∆_A represent errors due to estimation or quadrature. They are both expected to be small. The quantity δ_n(g) is an intrinsic invariant subspace error that depends on the dictionary and observable g. To approximate δ_n(g), note that K_(1)^n g can be expanded via Proposition 1, and hence δ_n(g) can be bounded by computable quantities; see (29) and (30). To bound the term on the right-hand side, we can use the matrix H̃ in (24) and the fact that g*H̃g approximates ∥K_(1)[Ψg]∥².

Estimation error for computation of A, G, and L
To effectively estimate K_(1)g and Sp_ε^var(K_(1)) in practical applications, it is imperative to have reliable approximations of A, G, and L. We provide a justification for our ability to construct such approximations from trajectory data with high probability, employing concentration bounds. The subsequent result delineates the requisite number of samples and basis functions needed to achieve a desired level of accuracy with high probability. To ensure this level of accuracy, several reasonable assumptions about the stochastic dynamical system are necessary.
Assumption 1 We suppose that the x^(m) in the snapshot data are sampled at random according to ω, independent of τ, and for simplicity, assume that ω is a probability measure.⁶ We assume that τ : Ω_s → H for some Hilbert space H and let κ = (x, τ). In this section, E and P are with respect to the joint distribution of κ. We assume that:
- The random variable κ is sub-Gaussian, meaning that there exists some a > 0 such that E[exp(a∥κ∥²)] < ∞. This allows us to define the finite quantity E[exp(a∥κ∥²)].
- The dictionary functions are uniformly bounded, sup_{j,x} |ψ_j(x)| < ∞, and satisfy the following Lipschitz condition: |ψ_j(x) − ψ_j(y)| ≤ C∥x − y∥ for some constant C independent of j.
- The function F is Lipschitz with respect to both of its arguments.

With these assumptions, we can show that our approximations of A, G, and L are good with high probability.

⁶ Similar types of bounds to Theorem 3 can be derived for ergodic sampling and high-order quadrature sampling.

Theorem 3 (Concentration bound on estimation errors)
Under Assumption 1 we have, for any t > 0, exponential concentration bounds for P(∥Ã − A∥_Fr ≥ t), P(∥G̃ − G∥_Fr ≥ t), and P(∥L̃ − L∥_Fr ≥ t), where ∥·∥_Fr denotes the Frobenius norm, and the constants α and β in the bounds are determined by the quantities in Assumption 1.

Proof We first argue for ∥Ã − A∥_Fr. Fix j, k ∈ {1, . . . , N} and define the random variable X = ψ_k(F(x, τ))\overline{ψ_j(x)}, whose moments are controlled using Hölder's inequality together with the sub-Gaussian and Lipschitz conditions of Assumption 1. Let Y denote the real part of E[X] − X. For any b > 0 and λ > 0, we have λ|Y| ≤ λ²/(2b) + b|Y|²/2, and also b|Y|²/2 ≤ exp(b|Y|²/2). It follows that the moment generating function of Y is finite. Now let {Y^(m)}_{m=1}^M be independent copies of Y; applying Markov's inequality to exp(λ Σ_m Y^(m)) and minimizing over λ, we obtain a sub-Gaussian tail bound for the sample mean. We can argue in the same manner for −Y, and similarly for the imaginary part of E[X] − X. We now allow j, k to vary and let X_{j,k} = ψ_k(F(x, τ))\overline{ψ_j(x)}. For t > 0, consider the events S_{j,k,1} and S_{j,k,2} controlling the deviations of the real and imaginary parts; a union bound controls Σ_{j,k}(P(S_{j,k,1}^c) + P(S_{j,k,2}^c)), and the AM-GM inequality combines the two contributions. If ∩_{j,k,i} S_{j,k,i} holds, then ∥Ã − A∥_Fr < t. We can argue in the same manner, without the function F, to deduce the bound for ∥G̃ − G∥_Fr. Finally, for the matrix L and its estimate L̃, we derive similar concentration bounds for ψ_k(F(x, τ))\overline{ψ_j(F(x, τ))}. The statement of the theorem now follows.
⊓ ⊔

This theorem explicitly spells out the number of basis functions and samples required to approximate the three matrices appearing in Theorem 2. For any fixed tolerance t, the confidence tightens exponentially as M, the number of samples, increases. The idea is similar to other concentration-inequality-type bounds: if one samples from the same distribution many times, the sample mean becomes closer and closer to the true mean, and this bound gives the confidence interval for the tail.
On the other hand, when N increases, more entries in the matrices need to be approximated, which incurs a logarithmic penalty: more samples are needed to balance out the increase of N.
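The Monte Carlo scaling behind these bounds is easy to observe empirically: the Frobenius error of Ã decays like M^{-1/2}. A small illustration on the toy circle map, where the reference matrix is itself an estimate computed with far more samples:

```python
import numpy as np

rng = np.random.default_rng(5)
F = lambda x, tau: (x + 0.1 * np.sin(2 * np.pi * x) + 0.2 * tau) % 1.0
js = np.arange(-5, 6)
Psi = lambda x: np.exp(2j * np.pi * np.outer(x, js))

def A_tilde(M):
    """Monte Carlo estimate of A with M snapshot pairs and weights 1/M."""
    X, tau = rng.uniform(0, 1, M), rng.uniform(0, 1, M)
    return Psi(X).conj().T @ Psi(F(X, tau)) / M

A_ref = A_tilde(2_000_000)                        # high-accuracy proxy for the exact A
for M in (1_000, 10_000, 100_000):
    print(M, np.linalg.norm(A_tilde(M) - A_ref))  # Frobenius error, expect O(M^{-1/2})
```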

Examples
We now present three examples.The first two are based on numerically sampled trajectory data, while the final example utilizes collected experimental data.

Arnold's circle map
For our first example, we revisit the circle map discussed in Example 1, setting c = 1/5, ρ as the uniform distribution on [0, 1], and taking f to be the sinusoidal nonlinearity of Arnold's circle map. Our dictionary consists of Fourier modes {exp(2πijx) : j = −n, . . . , n} with n = 20 (yielding N = 41), and we use batched trajectory data with M₁ = 100 equally spaced {x^(j)} and M₂ = 2 × 10⁴. Figure 2 illustrates the convergence of the matrices Ã, L̃, and H̃. We do not display the convergence of G̃ as its error was on the order of machine precision, a result of the exponential convergence achieved by the trapezoidal quadrature rule across different batches. Figure 3 shows the residuals computed using Algorithm 2. The quantity res̃_var(λ, g) deviates from (18) (the formula for f = 0), particularly when |λ| is small. As n increases, the residuals res̃(λ, g) converge to zero, indicating more accurate computation of the spectral content of K_(1). However, the residuals res̃_var(λ, g) converge to finite positive values, except for the trivial eigenvalue 1, which satisfies lim_{M→∞} res̃_var(λ, g) = 0.
To underscore the significance of variance in our analysis, Figure 4 displays the absolute value of the matrix L̃ − H̃, which approximates the covariance matrix defined in (16). Notably, the covariance vanishes for the constant function (the mode with j = 0), and the matrix is diagonally dominant. Figure 5 presents the results obtained from applying Algorithms 3 and 4. These results align in areas where the variance is minimal (large |λ|). However, in regions where |λ| is small, the variance component in (27) becomes significant. This observation leads us to infer that only about seven eigenpairs are of meaningful significance in a statistically coherent framework.

Stochastic Van der Pol oscillator
We now consider the stochastic differential equation

dX₁ = X₂ dt,  dX₂ = (μ(1 − X₁²)X₂ − X₁) dt + δ dB_t,

where B_t denotes standard one-dimensional Brownian motion, δ > 0, and μ > 0.⁷ This equation represents a noisy version of the Van der Pol oscillator. In the absence of noise, the Van der Pol oscillator exhibits a limit cycle to which all initial conditions converge, except for the unstable fixed point at the origin. The introduction of noise transforms the system, resulting in a global attractor that forms a band around the deterministic system's limit cycle.
The generator of the stochastic solutions, known as the backward Kolmogorov operator, is described in [25, Section 9.3]. It is a second-order elliptic-type differential operator L, defined by

L = x₂ ∂_{x₁} + (μ(1 − x₁²)x₂ − x₁) ∂_{x₂} + (δ²/2) ∂²_{x₂}.

For a discrete time step ∆_t, the Koopman operator is given by exp(∆_t L). In the absence of noise (δ = 0), the Koopman operator has eigenvalues forming a lattice [53, Theorem 13]:

λ_{m,k} = exp(∆_t(mν + ikω₀)),  m ∈ N₀, k ∈ Z,

where ν < 0 is the Floquet exponent of the limit cycle and ω₀ ≈ 1 − μ²/16 is its base frequency [74]. When δ is moderate, the base frequency of the averaged limit cycle remains similar to that in the deterministic case [45].
We simulate the dynamics using the Euler-Maruyama method [65] with a time step of 3 × 10⁻³. Data are collected along a single trajectory of length M₁ = 10⁶ with M₂ = 2, starting the sampling after the trajectory reaches the global attractor. We employ 318 Laplacian radial basis functions with centers on the attractor as our dictionary. The parameters are set to μ = 0.5, δ = 0.02, and ∆_t = 0.3. Figure 6 displays the results obtained using Algorithms 3 and 4. Similar to observations from the circle map example, Sp_ε(K_(1)) and Sp_ε^var(K_(1)) exhibit greater similarity near the unit circle. The lattice-like structure in the eigenvalues is also evident, with the EDMD-computed eigenvalues appearing as perturbations of the set {λ_{m,k}}. Table 1 lists some of these eigenvalues alongside the residuals calculated using Algorithm 2. We observe that as |k| increases, res̃(λ, g) also increases, and similarly, res̃_var(λ, g) increases with m. For any given eigenvalue, res̃(λ, g) decreases to zero with larger dictionaries. In contrast, res̃_var(λ, g) approaches a finite non-zero value, except for the trivial eigenvalue, which has a constant eigenfunction exhibiting zero variance. Figure 7 illustrates the corresponding eigenfunctions on the attractor, showcasing their beautiful modal structure.
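A minimal Euler-Maruyama integrator for this SDE reads as follows; note that the placement of the noise in the velocity equation matches the form of the oscillator written above, which is itself our reconstruction, and the initial condition is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, delta, dt = 0.5, 0.02, 3e-3                 # parameters and time step from the text

def euler_maruyama(x1, x2, n_steps):
    """Euler-Maruyama steps for the stochastic Van der Pol oscillator."""
    for _ in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal(np.shape(x2))  # Brownian increment
        x1, x2 = (x1 + dt * x2,
                  x2 + dt * (mu * (1 - x1**2) * x2 - x1) + delta * dB)
    return x1, x2

# one Koopman time step Delta_t = 0.3 corresponds to 100 integrator steps
x1, x2 = euler_maruyama(2.0, 0.0, int(round(0.3 / dt)))
```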
In this example, the norm of the Koopman operator ∥K∥ is approximately 1, and the subspace error δ_n(g) predominantly contributes to the bound established in Theorem 2. We analyze the two observables X₁ and X₂, each starting from a point randomly selected on the attractor. Figure 8 presents the calculated values of δ_n(X₁) and δ_n(X₂) as per (29) and (30), along with the variance of the trajectory. Additionally, Figure 9 compares the values computed using K̃^n X_i with the actual values of K_(1)^n X_i, obtained by integrating the generator L. Together, these figures demonstrate the convergence of the mean trajectories towards the dominant subspace of K.

Fig. 8 Left: Subspace errors δ_n(X₁) and δ_n(X₂) for the stochastic Van der Pol oscillator, computed using (29) and (30). Right: Variance of the trajectory, Var(X_i(x_n)). We have rescaled the horizontal axis in both plots to correspond to time.

Neuronal population dynamics
As a final example, we apply our approach to experimental neuroscience data.Recent technological advancements in this field now allow for the simultaneous monitoring of large neuronal populations in the brains of awake, behaving animals.This development has spurred significant interest in employing data-driven methods to derive physically meaningful insights from high-dimensional neural measurements [62].
To analyze complex neural data, researchers have employed a variety of analytical tools to uncover features like low-dimensional manifolds, latent population dynamics, within-trial variance, and trial-to-trial variability. However, existing methods often examine these features in isolation [16,29,61,73]. From a dynamical systems perspective, a unified model that captures these distinct aspects of neural data would be highly advantageous. In this context, the Koopman operator framework offers a compelling approach to analyzing high-dimensional neural observables [47]. DMD has emerged as a prominent method for the spatiotemporal decomposition of diverse datasets [9,14]. Nevertheless, a limitation of DMD is its lack of explicit uncertainty quantification regarding the modes and forecasts it uncovers. This aspect is particularly vital in neural time series analysis, where it is challenging to identify physically meaningful spectral components [28].
Our framework offers a unified, data-driven solution to uncover validated latent dynamical modes and their associated variance in neural data. To demonstrate its efficacy, we applied it to high-dimensional neuronal recordings from the visual cortex of awake mice, as publicly shared by the Allen Brain Observatory [71], involving 400-800 neurons per mouse. Our focus was on the "Drifting Gratings" task epoch, wherein mice were presented with gratings drifting in one of eight directions (0°, 45°, etc.), modulated sinusoidally at one of five temporal frequencies. We specifically analyzed responses to gratings modulated at 15 Hz across all eight directions, as these stimuli consistently elicited an identifiable eigenvalue in the neural data corresponding to the expected frequency. This analysis encompassed 120 trials per mouse (stimulus duration of 2 s) for a total of 20 mice, as detailed in [71]. We computed distinct stochastic Koopman operators for 15 different arousal levels, categorized by the average pupil diameter measured during the 500 ms before each stimulus [49]. For this analysis, DMD was employed to identify 100 dictionary functions. Our data-driven approach was effective in identifying an isolated, population-level coherent mode at the stimulus frequency. As illustrated in Figure 10, this is evidenced by a distinct eigenvalue, highlighted in green, which consistently appears as a clear local minimum in the variance pseudospectra contour plots across various arousal states. Without the variance pseudospectra, discerning which DMD eigenvalues are reliable and indicative of coherence can be challenging. We observed that individual neurons displayed a variety of waveforms, all linked to this single linear dynamic mode. Demonstrating the diversity of these responses, Figure 11 showcases five randomly chosen sample trajectories from the KMD. These trajectories highlight the distinct spike counts and/or timings of different neurons, all parsimoniously represented by a single latent mode.
Importantly, neuronal responses demonstrate significant trial-to-trial variability, a phenomenon of considerable physiological interest due to its close relationship with ongoing fluctuations in an animal's internal state. Dynamical systems approaches are adept at modeling this type of variability, which often stems from changes in the neural population's pre-stimulus state [61]. Furthermore, the extent of this variability is heavily influenced by internal states like arousal and attention, as detailed in [50]. Our stochastic modeling approach enables us to additionally estimate this second source of trial-to-trial variability in neuronal responses.
To validate the physiological significance of our variance estimates, we analyzed the variance linked to the Koopman operators computed across each of 15 levels of pupil diameter, effectively using pupil diameter as a parameter for the Koopman operator in relation to arousal. Our hypothesis was that this analysis would reflect the well-known "U-shape" pattern described by the Yerkes-Dodson law [86], with variance minimized at intermediate arousal levels [49]. Figure 10 indicates that the identified eigenvalue, and hence the expectation dynamics derived from it, remains consistent across various arousal states. However, from Figure 12, a notable modulation in variance residuals is observed in accordance with arousal levels, aligning with our predictions: the variance associated with the leading mode is specifically reduced at intermediate arousal levels. This pattern underscores the physiological relevance of the variance estimates yielded by our modeling approach. Consequently, our findings suggest that arousal systematically influences dynamical variance, providing both practical and physiological rationales for employing dynamical models that explicitly estimate variance. Overall, our data-driven framework offers a unified and formal representation of neural dynamics, parsimoniously capturing multiple physiologically significant features in the data.

Conclusion
We have demonstrated the role of variance in the Koopman analysis of stochastic dynamical systems. To effectively study projection errors in data-driven approaches for these systems, it is crucial to move beyond expectations and study more than just the stochastic Koopman operator. Incorporating variance into the Koopman framework enhances our understanding of spectral properties and the related projection errors. By analyzing various types of residuals, we have developed data-driven algorithms capable of computing the spectral properties of infinite-dimensional stochastic Koopman operators. Furthermore, we introduced the concept of variance pseudospectra, a tool designed to assess statistical coherency. From a computational perspective, our work includes several convergence theorems pertinent to the spectral properties of these operators. In the realm of experimental neural recordings, our framework has proven effective in extracting and compactly representing multiple data features with known physiological significance.
There are several avenues of future work related to this paper. One such direction involves an analysis of the algorithms and theorems presented in Section 4 in scenarios involving noisy snapshot data. Another avenue explores the trade-offs between computing the squared residual and variance terms, as outlined in (15), potentially reflecting variance-bias trade-offs in statistical analysis. Additionally, we aim to assess the robustness and generalizability of the proposed framework across further stochastic dynamical systems.

Fig. 12 The variance relative squared residual as a function of the arousal state. The red lines show the average across the mice, and the green error bounds correspond to the standard error of the mean. The "U-shape" is characteristic of the so-called Yerkes-Dodson law, which we reproduce in a data-driven fashion from the dynamics.


Fig. 2 Estimation error for the matrices Ã, L̃ and H̃ for the circle map. The solid line shows the expected Monte-Carlo convergence rate.

Fig. 4 Absolute values of the matrix L̃ − H̃ for the circle map. This difference corresponds to the covariance matrix in (16).

Fig. 5 Pseudospectra vs. variance pseudospectra. Left: Output of Algorithm 3 for the circle map. Right: Output of Algorithm 4 for the circle map. We have shown the minimized residuals over a contour plot of ε in both cases. The red dots correspond to the EDMD eigenvalues.

Fig. 6 Pseudospectra vs. variance pseudospectra. Left: Output of Algorithm 3 for the stochastic Van der Pol oscillator. Right: Output of Algorithm 4 for the stochastic Van der Pol oscillator. We have shown the minimized residuals over a contour plot of ε in both cases. The red dots correspond to the EDMD eigenvalues.

Fig. 7 Computed eigenfunctions (real part shown) of the stochastic Van der Pol oscillator. Due to conjugate symmetry, we have only shown eigenfunctions corresponding to eigenvalues with non-negative imaginary parts.

Fig. 9 Comparison of computed K̃^n X_i, where K̃ ∈ C^{N×N} is the EDMD matrix, and the true values of K_(1)^n X_i.

Fig. 10 Variance pseudospectra for a single mouse in the neuronal population dynamics example. Each case corresponds to a pupil diameter of 8% (left), 28% (middle), and 43% (right). The identified mode is shown in green, and the red dots show the other DMD eigenvalues. The variance pseudospectra changes considerably as the arousal state changes, but the green eigenvalue shows little variability.

Table 1 Computed eigenvalues of the stochastic Van der Pol oscillator, and the residuals computed using Algorithm 2. We have ordered them according to perturbations of λ_{m,k}. Due to conjugate symmetry, we have only shown eigenvalues with non-negative imaginary parts.