1 Introduction

Stochastic dynamical systems are widely used to model and study systems that evolve under the influence of both deterministic and random effects. They offer a framework for understanding, predicting, and controlling systems exhibiting randomness. This makes them invaluable across various scientific, engineering, and economic applications.

Given a state-space \(\Omega \subset \mathbb {R}^d\) and a sample space \(\Omega _s\), we consider a discrete-time stochastic dynamical system

$$\begin{aligned} \pmb {x}_{n} = F(\pmb {x}_{n-1},\tau _n), \qquad n\ge 1, \quad \pmb {x}_n \in \Omega , \end{aligned}$$
(1)

where \(\{\tau _n\}_{n\in \mathbb {N}}\in \Omega _s\) are independent and identically distributed (i.i.d.) random variables with distribution \(\rho \) supported on \(\Omega _s\), \(\pmb {x}_0\in \Omega \) is an initial condition, and \(F: \Omega \times \Omega _s\rightarrow \Omega \) is a function. In many applications, the function F is unknown or cannot be studied directly, which is the premise of this paper. We adopt the notation \(F_{\tau }(\pmb {x})=F(\pmb {x},\tau )\) for convenience and express \(\pmb {x}_{n} = (F_{\tau _n}\circ \cdots \circ F_{\tau _1})(\pmb {x}_0)\), where ‘\(\circ \)’ denotes the composition of functions.

With the assumptions above, equation (1) describes a discrete-time Markov process. For such systems, the Kolmogorov backward equation governs the evolution of an observable [34, 40], with the right-hand side defined as the stochastic Koopman operator [51]. The works [51, 57] have spurred increased interest in the data-driven approximation of both deterministic and stochastic Koopman operators and in analyzing their spectral properties [11, 43, 54]. Prominent applications span a variety of fields including fluid dynamics [31, 52, 66, 68], epidemiology [64], neuroscience [9, 14, 47], finance [46], robotics [6, 8], power systems [75, 76], and molecular dynamics [39, 59, 69, 70].

Although the function F is usually nonlinear, the stochastic Koopman operator is always linear; however, it operates on an infinite-dimensional space of observables. Of particular interest is the spectral content of the Koopman operator near the unit circle, which corresponds to slow subspaces encapsulating the long-term dynamics. If finite-dimensional eigenspaces can capture this spectral content effectively, they can serve as a finite-dimensional approximation. Numerous algorithms have been developed to approximate the spectral properties of Koopman operators [1, 2, 10, 12, 26, 30, 42, 48, 52, 55]. Among these, dynamic mode decomposition (DMD) is particularly popular [44]. Initially introduced in the fluids community [67, 68], DMD’s connection to the Koopman operator was established in [66]. Since then, several extensions and variants of DMD have been developed [4, 15, 19, 63, 84, 85], including methods tailored for stochastic systems [24, 72, 82, 87].

At its core, DMD is a projection method. It is widely recognized that achieving convergence and meaningful applications of DMD can be challenging due to the infinite-dimensional nature of Koopman operators [12, 23, 37, 84]. Challenges include the presence of spurious (unphysical) modes resulting from projection, essential spectra, the absence of non-trivial finite-dimensional invariant subspaces, and the verification of Koopman mode decompositions (KMDs). Residual Dynamic Mode Decomposition (ResDMD) has been introduced to address these issues for deterministic systems [20, 23]. ResDMD facilitates a data-driven approach to compute residuals associated with the full infinite-dimensional Koopman operator, thus enabling the computation of spectral properties with controlled errors and the verification of learned dictionaries and KMDs. Despite the evident importance of analyzing stochastic systems through the Koopman perspective, similar verified DMD methods in this setting are absent.

This paper presents several infinite-dimensional techniques for the data-driven analysis of stochastic systems. The central concept we explore is going beyond expectations to include higher moments within the Koopman framework. Figure 1 illustrates this point by depicting the evolution of two eigenfunctions associated with the stochastic Van der Pol oscillator (detailed in Sect. 5.2), alongside the expectation determined by the stochastic Koopman operator. Both eigenvalues and eigenfunctions are computed with a negligible projection error. Notably, although both trajectories oscillate at the same frequency, since the corresponding eigenvalues have identical arguments, the variances of the trajectories exhibit significant differences. This divergence is quantified by what we define as a variance residual (see Sect. 3.2).

Fig. 1 The evolution of two eigenfunctions on the attractor of the stochastic Van der Pol oscillator from Sect. 5.2. The plots show the arguments. In blue, we see a sample of the true trajectories, while the expected values predicted from the stochastic Koopman operator are shown in red. Top: Eigenfunction associated with \(m=0\) and \(k=1\) in Table 1. The variance residual is small, and trajectories hug the expectation closely. Bottom: Eigenfunction associated with \(m=1\) and \(k=1\) in Table 1. The variance residual is large, and trajectories deviate from the expectation.

1.1 Contributions

The contributions of our paper are as follows:

  • Variance Incorporation: We integrate the concept of variance into the Koopman framework and establish its relationship with batched Koopman operators. Proposition 2 decomposes a mean squared Koopman error into an infinite-dimensional residual and a variance term. Additionally, we present methodologies (see Algorithms 1 and 2) for independently calculating these components, thereby enhancing the understanding of the spectral properties of the Koopman operator and the deviation from mean dynamics.

  • Variance-Pseudospectra: We introduce a novel concept of pseudospectra, termed variance-pseudospectra (see Definition 2), which serves as a measure of statistical coherency. We also offer algorithms for computing these pseudospectra (see Algorithms 3 and 4) and prove their convergence.

  • Convergence Theory: Sect. 4 of our paper is dedicated to proving a suite of convergence theorems. These pertain to the spectral properties of stochastic Koopman operators, the accuracy of KMD forecasts, and the derivation of concentration bounds for estimating Koopman matrices from a finite set of snapshot data.

Various examples are given in Sect. 5 and code is available at: https://github.com/MColbrook/Residual-Dynamic-Mode-Decomposition.

1.2 Previous work

Existing literature on stochastic Koopman operators primarily addresses the challenge of noisy observables in extended dynamic mode decomposition (EDMD) methodologies [82], and in techniques for debiasing DMD [27, 35, 77]. A related concern is the estimation error in Koopman operator approximations due to the finite nature of data sets. This issue is present in both deterministic and stochastic scenarios. As [84] describes, EDMD converges with large data sets to a Galerkin approximation of the Koopman operator. The work in [58] thoroughly analyzes kernel autocovariance operators, including nonasymptotic error bounds under classical ergodic and mixing assumptions. In [60], the authors offer the first comprehensive probabilistic bounds on the finite-data approximation error for truncated Koopman generators in stochastic differential equations (SDEs) and nonlinear control systems. They examine two scenarios: (1) i.i.d. sampling and (2) ergodic sampling, with the latter assuming exponential stability of the Koopman semigroup. Additionally, the variational approach to conformational dynamics (VAC), which bears similarities to DMD, is known for providing spectral estimates of time-reversible processes that result in a self-adjoint transition operator. The connection of VAC with Koopman operators is detailed in [83], and the approximation of spectral information with error bounds is discussed in [39].

1.3 Data-driven setup

We present data-driven methods that utilize a dataset of “snapshot” pairs alongside a dictionary of observables. While numerous approaches for selecting a dictionary exist in the literature [17, 32, 80–82, 84, 85], this topic is not the primary focus of our current study. Following the methodology outlined in [79], we consider our given data to consist of pairs of snapshots, which are

$$\begin{aligned} \texttt {S}=\left\{ (\pmb {x}^{(m)},\pmb {y}^{(m)})\right\} _{m=1}^M,\quad \pmb {y}^{(m)}=F(\pmb {x}^{(m)},\tau _m). \end{aligned}$$
(2)

Unlike in deterministic systems, for stochastic systems, it can be beneficial for \(\texttt {S}\) to include the same initial condition \(\pmb {x}^{(m)}\) multiple times, as each execution of the dynamics yields an independent realization of a trajectory. We say that \(\texttt {S}\) is \(M_1\)-batched if it can be split into \(M_1\) subsets such that

$$\begin{aligned} \texttt {S}&=\cup _{j=1}^{M_1}{} \texttt {S}_j,\\ \texttt {S}_j&=\{(\pmb {x}^{(j)},\pmb {y}^{(j,k)}):k=1,\ldots ,M_2,\pmb {y}^{(j,k)}=F(\pmb {x}^{(j)},\tau _{j,k})\}. \end{aligned}$$

In other words, for each \(\pmb {x}^{(j)}\), we have multiple realizations of \(F_\tau (\pmb {x}^{(j)})\). Using batched data, we can approximate higher-order stochastic Koopman operators representing the moments of the trajectories. An unbatched dataset can be adapted to approximate a batched dataset by categorizing or “binning” the \(\pmb {x}\) points in the snapshot data. In practical scenarios, one may encounter a combination of both batched and unbatched data. Depending on the type of snapshot data used, Galerkin approximations of stochastic Koopman operators can be achieved in the limit of large datasets (as discussed in Sect. 2.2).
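
For concreteness, the following minimal Python sketch generates unbatched and \(M_1\)-batched snapshot data for a hypothetical noisy circle map; the map, noise distribution, and sample sizes are illustrative placeholders rather than choices made in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, tau):
    """A toy stochastic circle map; it stands in for the unknown dynamics F."""
    return (x + 0.1 * np.sin(2 * np.pi * x) + tau) % 1.0

# Unbatched data: M independent pairs (x^(m), y^(m)), cf. (2).
M = 500
x = rng.random(M)                        # initial conditions x^(m)
y = F(x, 0.05 * rng.standard_normal(M))  # one realization of the dynamics per x^(m)

# M1-batched data: for each x^(j), M2 independent realizations y^(j,k).
M1, M2 = 100, 20
xb = rng.random(M1)
yb = F(xb[:, None], 0.05 * rng.standard_normal((M1, M2)))  # array of shape (M1, M2)
```

Binning an unbatched dataset, as described above, amounts to grouping pairs whose \(\pmb {x}\) values fall in the same cell and treating each group as one batch.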

2 Mathematical preliminaries

This section discusses several foundational concepts upon which our paper builds.

2.1 The stochastic Koopman operator

Let \(g:\Omega \rightarrow \mathbb {C}\) be a function, commonly called an observable. Given an initial condition \(\pmb {x}_0\in \Omega \), measuring the initial state of the dynamical system through g yields the value \(g(\pmb {x}_0)\). One time-step later, the measurement \(g(\pmb {x}_1) = g(F_\tau (\pmb {x}_0)) = (g\circ F_\tau )(\pmb {x}_0)\) is obtained, where \(\tau \) is a realization from a probability distribution supported on \(\Omega _s\), i.e., \(\tau \sim \rho \). The “pull-back” operator, given g, outputs the “look ahead” measurement function \(g\circ F_\tau \). This function is a random variable, and the stochastic Koopman operator is its expectation [56]:

$$\begin{aligned} \mathscr {K}_{(1)}[g] = \mathbb {E}_{\tau }\left[ g\circ F_\tau \right] =\int _{\Omega _s} g\circ F_\tau \,\textrm{d}\rho (\tau ). \end{aligned}$$
(3)

Here, \(\mathbb {E}_{\tau }\) represents the expectation with respect to the distribution \(\rho \). The subscript (1) indicates this is the first moment. Throughout the paper, we assume that the domain of the operator \(\mathscr {K}_{(1)}\) is \(L^2(\Omega ,\omega )\), where \(\omega \) is a positive measure on \(\Omega \). This space is equipped with an inner product and norm, denoted by \(\langle \cdot ,\cdot \rangle \) and \(\Vert \cdot \Vert \), respectively. We do not assume that \(\mathscr {K}_{(1)}\) is compact or self-adjoint.

We now introduce the batched Koopman operator, designed to capture the variance and other higher-order moments in the trajectories of dynamical systems. For \(r\in \mathbb {N}\) and \(g:\Omega ^{r}\rightarrow \mathbb {C}\), we define

$$\begin{aligned} \mathscr {K}_{(r)}[g] = \mathbb {E}_{\tau }\left[ g(F_\tau ,\ldots ,F_\tau )\right] , \end{aligned}$$
(4)

where the same realization \(\tau \sim \rho \) is used for the r arguments of g. Notably, both the classical and the batched versions of the Koopman operators adhere to the semigroup property, as we will demonstrate.

Proposition 1

For any \(r,n\in \mathbb {N}\),

$$\begin{aligned} \mathscr {K}_{(r)}^n[g]=\mathbb {E}_{\tau _1,\ldots ,\tau _n}\left[ g(F_{\tau _n}\circ \cdots \circ F_{\tau _1},\ldots ,F_{\tau _n}\circ \cdots \circ F_{\tau _1})\right] . \end{aligned}$$

Proof

For \(r=1\), see [24]. For \(r>1\), note that \(\mathscr {K}_{(r)}\) is a first-order Koopman operator of a dynamical system on \(\Omega ^r\). \(\square \)

This proposition indicates that n applications of the stochastic Koopman operator yield the expected value of an observable after n time steps. It is crucial to understand that \(\mathscr {K}_{(1)}\) only calculates the expected value. To gain insights into the variability around this mean and to understand the projection error inherent in DMD methods, we need to consider higher-order statistics, such as the variance. These aspects are further explored in Sect. 3.

2.2 Extended dynamic mode decomposition

EDMD is a widely-used method for constructing a finite-dimensional approximation of the Koopman operator \(\mathscr {K}_{(1)}\), utilizing the snapshot data \(\texttt {S}\) in (2). This approach involves projecting the infinite-dimensional Koopman operator onto a finite-dimensional matrix and approximating its entries. For notational simplicity, we will omit the subscript (1) when referring to the Koopman operator in this section. Originally, EDMD assumes that the initial conditions are independently drawn from a distribution \(\omega \) [84]. However, in our adaptation, we apply EDMD to any given \(\texttt {S}\), treating the \(\pmb {x}^{(m)}\) as quadrature nodes for integration with respect to \(\omega \). This flexibility allows us to use different quadrature weights depending on the specific scenario.

One first chooses a dictionary \(\{\psi _1,\ldots ,\psi _{N}\}\) in the space \(L^2(\Omega ,\omega )\). This dictionary consists of a list of observables that form a finite-dimensional subspace \(V_N=\textrm{span}\{\psi _1,\ldots ,\psi _{N}\}\). EDMD computes a matrix \(K\in \mathbb {C}^{N\times N}\) that approximates the action of \(\mathscr {K}\) within this subspace. Specifically, the goal is for K to represent the operator \(\mathscr {P}_{V_{N}}\mathscr {K}\mathscr {P}_{V_{N}}^*\) in the dictionary basis, where \(\mathscr {P}_{V_{N}}:L^2(\Omega ,\omega )\rightarrow V_N\) is the orthogonal projection onto \(V_N\). In the Galerkin framework, this equates to:

$$\begin{aligned} \langle \mathscr {K}[\psi _j],\psi _i\rangle = \sum _{s=1}^N K_{s,j}\langle \psi _s,\psi _i\rangle , \qquad 1\le i,j\le N. \end{aligned}$$

A matrix K satisfying this relationship is given by

$$\begin{aligned} K = G^{\dagger }A, \qquad G_{i,j} = \langle \psi _j,\psi _i\rangle ,\quad A_{i,j} = \langle \mathscr {K}[\psi _j],\psi _i\rangle . \end{aligned}$$

Commonly, we stack the dictionary functions into a row vector and define the feature map

$$\begin{aligned} \Psi (\pmb {x})=\begin{bmatrix}\psi _1(\pmb {x})&\cdots&\psi _N(\pmb {x}) \end{bmatrix}\in \mathbb {C}^{1\times N}. \end{aligned}$$

Then, for any \(g\in V_N\), we use the shorthand \(g=\Psi \pmb {g}\) for \(g(\pmb {x}) = \sum _{j=1}^N g_j\psi _j(\pmb {x})\). With the previously defined K, the approximation becomes

$$\begin{aligned} \mathscr {K}[g](\pmb {x}) \approx \sum _{i=1}^N \left( \sum _{j=1}^N K_{i,j}g_j\right) \psi _i(\pmb {x})=\Psi (\pmb {x})K\pmb {g}. \end{aligned}$$

The accuracy of this approximation depends on how well \(V_N\) can approximate \(\mathscr {K}g\).

The entries of the matrices G and A are inner products and must be approximated using the trajectory data \(\texttt {S}\). For quadrature weights \(\{w_m\}\), we define \(\tilde{G}\) as the numerical approximation of G:

$$\begin{aligned} \tilde{G}_{i,j} = \sum _{m=1}^{M} w_{m} \psi _j(\pmb {x}^{(m)})\overline{\psi _i(\pmb {x}^{(m)})}\approx \langle \psi _j\,,\psi _i\rangle \,= {G}_{i,j}\,. \end{aligned}$$
(5)

The weights \(\{w_m\}\) reflect the significance assigned to each snapshot in the dataset, influenced by factors such as data distribution or reliability, which we will explore further. Similarly, for A, we define

$$\begin{aligned} \tilde{A}_{i,j} = \sum _{m=1}^{M} w_{m} \psi _j(\pmb {y}^{(m)})\overline{\psi _i(\pmb {x}^{(m)})} \approx \langle \mathscr {K}[\psi _j]\,,\psi _i\rangle \,=A_{i,j}\,. \end{aligned}$$
(6)

Let \(\Psi _X,\Psi _Y\in \mathbb {C}^{M\times N}\) collect the dictionary's evaluations at these samples:

$$\begin{aligned} \Psi _X=\begin{pmatrix} \Psi (\pmb {x}^{(1)})\\ \vdots \\ \Psi (\pmb {x}^{(M)}) \end{pmatrix}\,,\quad \Psi _Y=\begin{pmatrix} \Psi (\pmb {y}^{(1)})\\ \vdots \\ \Psi (\pmb {y}^{(M)}) \end{pmatrix}\,, \end{aligned}$$
(7)

and let \(W=\textrm{diag}(w_1,\ldots ,w_{M})\). Then we can succinctly write

$$\begin{aligned} \tilde{G}=\Psi _X^*W\Psi _X,\quad \tilde{A}=\Psi _X^*W\Psi _Y. \end{aligned}$$
(8)

Throughout this paper, the symbol \(\tilde{X}\) denotes an estimation of the quantity X.
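
Concretely, the assembly in (5)–(8) amounts to a few matrix products. The sketch below uses a Fourier dictionary, uniform quadrature weights \(w_m=1/M\), and a toy stochastic circle map; all of these are illustrative choices rather than prescriptions from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
F = lambda x, tau: (x + 0.1 * np.sin(2 * np.pi * x) + tau) % 1.0  # toy stochastic map
M = 500
x = rng.random(M)                              # snapshots x^(m)
y = F(x, 0.05 * rng.standard_normal(M))        # snapshots y^(m) = F(x^(m), tau_m)

def feature_map(z, N):
    """Rows are Psi(z) = [psi_1(z), ..., psi_N(z)] with psi_j(z) = exp(2*pi*i*j*z)."""
    return np.exp(2j * np.pi * np.outer(z, np.arange(1, N + 1)))

N = 10
Psi_X, Psi_Y = feature_map(x, N), feature_map(y, N)   # cf. (7)
W = np.diag(np.full(M, 1.0 / M))                      # quadrature weights w_m = 1/M

G_tilde = Psi_X.conj().T @ W @ Psi_X                  # cf. (8)
A_tilde = Psi_X.conj().T @ W @ Psi_Y
K_tilde = np.linalg.pinv(G_tilde) @ A_tilde           # EDMD matrix, K = G^dagger A
eigvals, eigvecs = np.linalg.eig(K_tilde)             # candidate eigenvalue-eigenvector pairs
```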

Various sampling methods converge in the large data limit, meaning that

$$\begin{aligned} \lim _{M\rightarrow \infty } \tilde{G}=G,\quad \lim _{M\rightarrow \infty } \tilde{A}=A. \end{aligned}$$
(9)

We detail three convergent sampling methods:

  (i) Random sampling: In the initial definition of EDMD, \(\omega \) is a probability measure and \(\{\pmb {x}^{(m)}\}_{m=1}^M\) are independently drawn according to \(\omega \) with each quadrature weight set to \(w_m=1/M\). The strong law of large numbers guarantees that (9) holds with probability one [38, Section 3.4] [41, Section 4]. Typically, convergence occurs at a Monte Carlo rate of \(\mathscr {O}(M^{-1/2})\) [13].

  (ii) Ergodic sampling: If the stochastic dynamical system is ergodic, the Birkhoff–Khinchin theorem [33, Theorem II.8.1, Corollary 3] supports convergence using data from a single trajectory for almost every initial point. Specifically, we use:

    $$\begin{aligned} \pmb {x}^{(m+1)}=F(\pmb {x}^{(m)},\tau _{m}),\quad w_m=1/M. \end{aligned}$$

    This sampling method’s analysis for stochastic Koopman operators is detailed in [82]. An advantage is that knowledge of \(\omega \) is not required. However, the convergence rate depends on the specific problem [36]. Note that in an ergodic system, the stochastic Koopman operator is an isometry on \(L^1(\Omega ,\omega )\) but typically not on \(L^2(\Omega ,\omega )\).

  (iii) High-order quadrature: When the dictionary and F are sufficiently regular, the dimension d is not too large, and we are free to choose the \(\{\pmb {x}^{(m)}\}_{m=1}^{M}\), employing a high-order quadrature rule is advantageous. For deterministic systems, this approach can significantly increase convergence rates in (9) [23]. In stochastic systems, high-order quadrature applies primarily to batched snapshot data. We may select \(\{\pmb {x}^{(j)}\}_{j=1}^{M_1}\) based on an \(M_1\)-point quadrature rule with associated weights \(\{w_j\}_{j=1}^{M_1}\). Convergence is achieved as \(M_2\rightarrow \infty \), effectively applying Monte Carlo integration of the random variable \(\tau \) over \(\Omega _s\) for each fixed \(\pmb {x}^{(j)}\).

The convergence described in (9) implies that the eigenvalues obtained through EDMD converge to the spectrum of \(\mathscr {P}_{V_{N}}\mathscr {K}\mathscr {P}_{V_{N}}^*\) as \(M\rightarrow \infty \). Therefore, approximating the spectrum of \(\mathscr {K}\), denoted \(\textrm{Sp}(\mathscr {K})\), by the eigenvalues of \(\tilde{K}\) is closely related to the so-called finite section method [7]. However, just as the finite section method is prone to spectral pollution, that is, spurious eigenvalues that persist even as the size of the dictionary increases, so too is EDMD [84]. Consequently, having a method to validate the accuracy of the proposed eigenvalue-eigenvector pairs becomes crucial, which is one of the key functions of ResDMD.

2.3 Residual dynamic mode decomposition (ResDMD)

Accurately estimating the spectrum of \(\mathscr {K}\) is critical for analyzing dynamical systems. For deterministic systems, ResDMD achieves this goal, providing robust spectral estimates [20, 23]. Unlike classical DMD methods, ResDMD introduces an additional matrix specifically designed to approximate \(\mathscr {K}^*\mathscr {K}\). This enhancement not only offers rigorous error guarantees for the spectral approximation but also enables a posteriori assessment of the reliability of the computed spectra and Koopman modes. This capability is particularly valuable in addressing issues such as spectral pollution, which are common challenges in DMD-type methods.

ResDMD is built around the approximation of residuals associated with \(\mathscr {K}\), providing an error bound. For any given candidate eigenvalue-eigenvector pair \((\lambda ,g)\), with \(\lambda \in \mathbb {C}\) and \(g=\Psi \,\pmb {g}\in V_{N}\), one can consider the relative squared residual as follows:

$$\begin{aligned}&\frac{\int _{\Omega }\left| \mathscr {K}[g](\pmb {x})-\lambda g(\pmb {x})\right| ^2\,\textrm{d}\omega (\pmb {x})}{\int _{\Omega }\left| g(\pmb {x})\right| ^2\,\textrm{d}\omega (\pmb {x})}\nonumber \\&\quad =\frac{\langle \mathscr {K}[g],\mathscr {K}[g]\rangle -\lambda \langle g,\mathscr {K}[g]\rangle -\overline{\lambda }\langle \mathscr {K}[g],g\rangle +|\lambda |^2\langle g,g\rangle }{\langle g,g\rangle }. \end{aligned}$$
(10)

This pair \((\lambda ,g)\) can be computed from K or by other methods. A small residual means that \(\lambda \) can be approximately considered as an eigenvalue of \(\mathscr {K}\), with g as the corresponding eigenfunction. The relative residual in (10) serves as a measure of the coherency of observables, indicating that observables with smaller residuals play a significant role in the dynamics of the system. If the relative (non-squared) residual is bounded by \(\epsilon \), then \(\mathscr {K}^ng=\lambda ^n g+\mathscr {O}(n\epsilon )\). In other words, \(\lambda \) characterizes the coherent oscillation and the decay/growth in the observable g with time.
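
To see where the \(\mathscr {O}(n\epsilon )\) estimate comes from, one can use the telescoping identity below; the final bound additionally assumes \(\Vert \mathscr {K}\Vert \le 1\) and \(|\lambda |\le 1\) (as holds, for instance, when \(\omega \) is an invariant measure):

$$\begin{aligned} \mathscr {K}^n[g]-\lambda ^n g=\sum _{j=0}^{n-1}\lambda ^{j}\mathscr {K}^{n-1-j}\left[ \mathscr {K}[g]-\lambda g\right] ,\qquad \Vert \mathscr {K}^n[g]-\lambda ^n g\Vert \le \sum _{j=0}^{n-1}|\lambda |^{j}\Vert \mathscr {K}\Vert ^{n-1-j}\Vert \mathscr {K}[g]-\lambda g\Vert \le n\epsilon \Vert g\Vert . \end{aligned}$$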

The residual is closely related to the notion of pseudospectra [78].

Definition 1

For any \(\lambda \in \mathbb {C}\), define:

$$\begin{aligned} \sigma _{\textrm{inf}}(\lambda )=\inf \left\{ \Vert \mathscr {K}[g]-\lambda g\Vert :g{\in }L^2(\Omega ,\omega ),\Vert g\Vert =1\right\} . \end{aligned}$$

For \(\epsilon >0\), the approximate point \(\epsilon \)-pseudospectrum is

$$\begin{aligned} \textrm{Sp}_{\epsilon }(\mathscr {K})=\textrm{Cl}\left( \left\{ \lambda \in \mathbb {C}:\sigma _{\textrm{inf}}(\lambda )<\epsilon \right\} \right) , \end{aligned}$$

where \(\textrm{Cl}\) denotes the closure of a set. Furthermore, we say that g is an \(\epsilon \)-pseudoeigenfunction if there exists \(\lambda \in \mathbb {C}\) such that the relative squared residual in (10) is bounded by \(\epsilon ^2\).

To compute (10), notice that three of the four inner products appearing in the numerator are:

$$\begin{aligned} \langle \mathscr {K}[g],g\rangle =\pmb {g}^*A\pmb {g},\;\langle g,\mathscr {K}[g]\rangle =\pmb {g}^*A^*\pmb {g},\; \langle g,g\rangle =\pmb {g}^*G\pmb {g}, \end{aligned}$$
(11)

with A and G numerically approximated by EDMD in (8). Hence, the success of the computation relies on finding a numerical approximation to \(\langle \mathscr {K}[g],\mathscr {K}[g]\rangle \). To that end, we deploy the same quadrature rule discussed in (5)-(6) and set

$$\begin{aligned} L=[L_{i,j}]\,,\quad L_{i,j} = \langle \mathscr {K}[\psi _j],\mathscr {K}[\psi _i]\rangle ,\quad \tilde{L}=\Psi _Y^*W\Psi _Y\,, \end{aligned}$$
(12)

then \(\langle \mathscr {K}[g],\mathscr {K}[g]\rangle \approx \pmb {g}^*\Psi _Y^*W\Psi _Y\pmb {g}=\pmb {g}^*\tilde{L}\pmb {g}\). We obtain a numerical approximation of (10) as

$$\begin{aligned} \left[ \textrm{res}(\lambda ,g)\right] ^2=\frac{\pmb {g}^*\left[ \tilde{L}- \lambda \tilde{A}^* - \overline{\lambda }\tilde{A} + |\lambda |^2\tilde{G}\right] \pmb {g}}{\pmb {g}^*\tilde{G}\pmb {g}}. \end{aligned}$$
(13)

The matrix L introduced by ResDMD formally corresponds to an approximation of \(\mathscr {K}^*\mathscr {K}\). The computation utilizes the same dataset as that employed for \(\tilde{G}\) and \(\tilde{A}\) and is computationally efficient to construct. The work presented in [23] demonstrates that the approximation outlined in (13) can be effectively used in various algorithms for rigorously computing the spectra and pseudospectra of \(\mathscr {K}\) for deterministic systems. However, these results from [23] are not directly applicable to stochastic systems.
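
In code, the residual (13) requires nothing beyond the matrices already assembled for EDMD plus \(\tilde{L}\) from (12). The helper functions below are a minimal sketch of this computation (not the reference implementation from the repository linked above); they take the feature matrices and quadrature weights as inputs.

```python
import numpy as np

def resdmd_matrices(Psi_X, Psi_Y, w):
    """Assemble G, A, L from feature matrices and quadrature weights, cf. (8) and (12)."""
    W = np.diag(w)
    G = Psi_X.conj().T @ W @ Psi_X
    A = Psi_X.conj().T @ W @ Psi_Y
    L = Psi_Y.conj().T @ W @ Psi_Y
    return G, A, L

def residual(lam, g, G, A, L):
    """Relative residual (13) for a candidate eigenpair (lam, g = Psi @ g_coeffs)."""
    num = g.conj() @ (L - lam * A.conj().T - np.conj(lam) * A + abs(lam) ** 2 * G) @ g
    den = g.conj() @ G @ g
    return float(np.sqrt(max(num.real / den.real, 0.0)))
```

Candidate eigenpairs produced by EDMD can then be screened by discarding those whose residual exceeds a chosen tolerance, which is the essence of ResDMD for deterministic systems; as explained in Sect. 3.2, for stochastic systems this same quantity instead approximates a variance-residual.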

3 Variance from the Koopman perspective

When analyzing a system with inherent stochasticity, basing conclusions only on the mean trajectory can lead to misleading interpretations, as illustrated in Fig. 1. To achieve a more accurate statistical understanding of such systems, it is crucial to quantify how much and in what ways the trajectory deviates from this mean. This need for a more comprehensive analysis underpins our exploration into quantifying the variance.

3.1 Variance via Koopman operators

For any observable \(g\in L^2(\Omega ,\omega )\) and \(\pmb {x}\in \Omega \), \(g(F_\tau (\pmb {x}))\) is a random variable. One can define its moments:

$$\begin{aligned} \mathbb {E}_{\tau }[(g(F_\tau (\pmb {x})))^r]=\int _{\Omega _s} [g(F_\tau (\pmb {x}))]^r\,\textrm{d}\rho (\tau ),\quad r\in \mathbb {N}. \end{aligned}$$

Recalling the definitions in (4), this becomes:

$$\begin{aligned} \mathbb {E}_{\tau }[(g(F_\tau (\pmb {x})))^r]=\mathscr {K}_{(r)}[g\otimes \cdots \otimes g](\pmb {x},\ldots ,\pmb {x}). \end{aligned}$$

This means that the r-th order Koopman operator directly computes the moments of the trajectory. In particular, the combination of the first and the second moment provides the following variance term:

$$\begin{aligned} \text {Var}_{\tau }[g(F_\tau (\pmb {x}))]&= \mathbb {E}_\tau \left[ |g(F_\tau (\pmb {x}))|^2\right] -|\mathbb {E}_\tau [g(F_\tau (\pmb {x}))]|^2\\&= \mathscr {K}_{(2)}[g\otimes \overline{g}](\pmb {x},\pmb {x})-|\mathscr {K}_{(1)}[g](\pmb {x})|^2\,. \end{aligned}$$

We integrate the local definition of variance over the entire domain to define:

$$\begin{aligned} \text {Var}_{\tau }[g(F_\tau )]&= \int _\Omega \text {Var}_{\tau }[g(F_\tau (\pmb {x}))]\,\textrm{d}\omega (\pmb {x}). \end{aligned}$$
(14)

The following proposition provides a Koopman analog of decomposing an integrated mean squared error (IMSE).

Proposition 2

Let \(g,h\in L^2(\Omega ,\omega )\), then

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau +h\Vert ^2\right] \\&\quad =\Vert \mathscr {K}_{(1)}[g]+h\Vert ^2+\int _{\Omega }\textrm{Var}_{\tau }\left[ \left( g\circ F_\tau \right) (\pmb {x})\right] \,\textrm{d}\omega (\pmb {x}). \end{aligned} \end{aligned}$$
(15)

Proof

We expand \(|g(F_\tau (\pmb {x}))+h(\pmb {x})|^2\) for a fixed \(\pmb {x}\in \Omega \) and take expectations to find that

$$\begin{aligned}&\mathbb {E}_{\tau }\left[ |g(F_\tau (\pmb {x}))+h(\pmb {x})|^2\right] \\&\quad =\mathbb {E}_{\tau }\left[ |g(F_\tau (\pmb {x}))|^2\right] {+}\mathscr {K}_{(1)}[g](\pmb {x})\overline{h(\pmb {x})}\\&\qquad {+}h(\pmb {x})\overline{\mathscr {K}_{(1)}[g](\pmb {x})}{+}|h(\pmb {x})|^2\\&\quad =|\mathscr {K}_{(1)}[g](\pmb {x})+h(\pmb {x})|^2+\mathbb {E}_{\tau }\left[ |g(F_\tau (\pmb {x}))|^2\right] \\&\qquad -\left| \mathbb {E}_{\tau }\left[ g(F_\tau (\pmb {x}))\right] \right| ^2. \end{aligned}$$

The result now follows by integrating over \(\pmb {x}\) with respect to the measure \(\omega \). \(\square \)

Similarly, for any two functions \(g,h\in L^2(\Omega ,\omega )\), we define the covariance:

$$\begin{aligned} \mathscr {C}(g,h)=\int _{\Omega }\mathbb {E}_{\tau }[(g\circ F_\tau -\mathscr {K}_{(1)}[g])\overline{(h\circ F_\tau -\mathscr {K}_{(1)}[h])}]\,\text {d}\omega (\pmb {x}) \end{aligned}$$
(16)

and obtain the following similar result using covariance:

$$\begin{aligned}&\int _{\Omega } \mathbb {E}_{\tau }[g(F_\tau (\pmb {x}))\overline{h(F_\tau (\pmb {x}))}] \,\textrm{d} \omega (\pmb {x}) \nonumber \\&\qquad =\langle \mathscr {K}[g],\mathscr {K}[h] \rangle + \mathscr {C}(g,h)\,. \end{aligned}$$

Proposition 2 is analogous to the decomposition of an IMSE and is practically useful. Suppose we use an observable h to approximate \(-g\circ F_\tau \), in an attempt to minimize \(\Vert g\circ F_\tau +h\Vert ^2\). The optimal choice is \(h=-\mathscr {K}_{(1)}[g]\); however, this approximation will not be perfect due to the variance term in (15). Therefore, there is a variance-residual tradeoff for stochastic Koopman operators. Depending on the type of trajectory data collected, one can approximate the quantities \(\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau +h\Vert ^2\right] \) and \(\Vert \mathscr {K}_{(1)}[g]+h\Vert ^2\) in (15) and hence estimate the remaining variance term.

Example 1

[Circle map] Let \(\Omega =[0,1]_{\textrm{per}}\) be the periodic interval and consider

$$\begin{aligned} F(\pmb {x},\tau )=\pmb {x}+c+f(\pmb {x})+\tau \,\,\,\,\,\,\textrm{mod}(1), \end{aligned}$$

where \(\Omega _s=[0,1]_{\textrm{per}}\), \(\rho \) is absolutely continuous, and c is a constant. Let \(\psi _j(\pmb {x})=e^{2\pi i j\pmb {x}}\) for \(j\in \mathbb {Z}\). Then

$$\begin{aligned} \mathscr {K}_{(1)}[\psi _j](\pmb {x})=\psi _j(\pmb {x})e^{2\pi i jf(\pmb {x})}e^{2\pi i jc}\int _{\Omega _s} e^{2\pi i j\tau }\,\textrm{d}\rho (\tau ). \end{aligned}$$
(17)

Define the constants

$$\begin{aligned} \alpha _j=e^{2\pi i jc}\int _{\Omega _s} e^{2\pi i j\tau }\,\textrm{d}\rho (\tau ). \end{aligned}$$

Let D be the operator that multiplies each \(\psi _j\) by \(\alpha _j\). Then \(\mathscr {K}_{(1)}=T D\), where T is the Koopman operator corresponding to \(\pmb {x}\mapsto \pmb {x}+f(\pmb {x})\). Since \(\rho \) is absolutely continuous, the Riemann–Lebesgue lemma implies that \(\lim _{|j|\rightarrow \infty }\alpha _j=0\) and hence D is a compact operator. It follows that if T is bounded, then \(\mathscr {K}_{(1)}\) is a compact operator. A straightforward computation using (14) shows that

$$\begin{aligned} \int _{\Omega }\textrm{Var}_{\tau }[\psi _j(F_\tau (\pmb {x}))]\,\textrm{d}\omega (\pmb {x}) = 1-|\alpha _j|^2. \end{aligned}$$
(18)

For example, if \(f=0\), \(\mathscr {K}_{(1)}\) has pure point spectrum with eigenfunctions \(\psi _j\). However, as \(|j|\rightarrow \infty \), the variance converges to one and \(\psi _j\) become less statistically coherent. This example is explored further in Sect. 5.1. \(\square \)
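
The quantities in (17)–(18) are easy to check numerically. The sketch below takes \(f=0\), \(c=0.1\), and \(\rho \) uniform on \([0,0.2]\) (illustrative choices), compares a Monte Carlo estimate of \(\langle \mathscr {K}_{(1)}[\psi _k],\psi _k\rangle \) with the exact eigenvalue \(\alpha _k\), and reports the integrated variance \(1-|\alpha _k|^2\).

```python
import numpy as np

rng = np.random.default_rng(1)
c, sigma, M = 0.1, 0.2, 200_000          # drift and noise level are illustrative

def alpha(k):
    """alpha_k = exp(2*pi*i*k*c) * E[exp(2*pi*i*k*tau)] for tau ~ Uniform[0, sigma]."""
    return np.exp(2j * np.pi * k * c) * (np.exp(2j * np.pi * k * sigma) - 1) / (2j * np.pi * k * sigma)

for k in (1, 2, 5, 10):
    x = rng.random(M)                     # x ~ omega, the Lebesgue measure on [0, 1)
    tau = sigma * rng.random(M)           # tau ~ rho
    psi_y = np.exp(2j * np.pi * k * (x + c + tau))     # psi_k(F_tau(x)) with f = 0
    psi_x = np.exp(2j * np.pi * k * x)
    eig_mc = np.mean(psi_y * np.conj(psi_x))           # Monte Carlo estimate of alpha_k
    print(k, alpha(k), eig_mc, 1.0 - abs(alpha(k)) ** 2)   # last entry: variance in (18)
```

As \(|k|\) grows, \(|\alpha _k|\) decays and the reported variance approaches one, matching the loss of statistical coherency noted above.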

Another immediate application of the variance term is in providing an estimated bound for the Koopman operator prediction of trajectories.

Proposition 3

We have

$$\begin{aligned} \begin{aligned}&\mathbb {P}\left( \left| g\circ F_{\tau _n}\circ \cdots \circ F_{\tau _1}(\pmb {x})-\mathscr {K}^n[g](\pmb {x})\right| \ge a\right) \\&\quad \le \frac{1}{a^2}\text {Var}_{\tau _1,\ldots ,\tau _n} \left[ g\circ F_{\tau _n}\circ \cdots \circ F_{\tau _1}(\pmb {x})\right] \\&\quad =\frac{1}{a^2}\left( \mathscr {K}_{(2)}^n[g\otimes \overline{g}](\pmb {x},\pmb {x})-|\mathscr {K}_{(1)}^n[g](\pmb {x})|^2\right) \end{aligned} \end{aligned}$$
(19)

for any \(a>0\).

Proof

The result follows from combining Proposition 1 and (14) with Chebyshev's inequality. \(\square \)

The bound can be combined with concentration bounds for \(\Psi \tilde{K}^n-\mathscr {K}^n\) (see Sect. 4.2).

3.2 ResDMD in stochastic systems

In the deterministic setting, ResDMD provides an efficient way to evaluate the accuracy of candidate eigenpairs through the computation of an additional matrix L in (12). However, what happens in the stochastic setting?

Suppose that \((\lambda , g)\) is a candidate eigenpair of \(\mathscr {K}_{(1)}\) with \(g\in V_N\). Resembling (10), we consider

$$\begin{aligned} \frac{\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right] }{\Vert g\Vert ^2}. \end{aligned}$$
(20)

We can write the numerator in terms of A, G, and L, i.e.,

$$\begin{aligned} \mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right]&=\pmb {g}^*(L-\lambda A^*-\overline{\lambda }A+|\lambda |^2G)\pmb {g}\\&=\lim _{M\rightarrow \infty }\pmb {g}^*(\tilde{L}-\lambda \tilde{A}^*-\overline{\lambda }\tilde{A}+|\lambda |^2\tilde{G})\pmb {g}. \end{aligned}$$

Hence, we define

$$\begin{aligned} \left[ \textrm{res}^{\textrm{var}}(\lambda ,g)\right] ^2=\frac{\pmb {g}^*\left[ \tilde{L}-\lambda \tilde{A}^*-\overline{\lambda }\tilde{A}+|\lambda |^2\tilde{G}\right] \pmb {g}}{\pmb {g}^*\tilde{G}\pmb {g}}, \end{aligned}$$
(21)

which furnishes an approximation of (20). Setting \(h=-\lambda g\) in Proposition 2, we see that

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right] \\&\quad =\mathbb {E}_{\tau }\left[ \int _{\Omega }|g(F_\tau (\pmb {x}))-\lambda g(\pmb {x})|^2\,\textrm{d}\omega (\pmb {x})\right] \\&\quad =\underbrace{\Vert \mathscr {K}_{(1)}[g]-\lambda g\Vert ^2}_{\text {squared residual}} +\underbrace{\int _{\Omega }\textrm{Var}_{\tau }\left[ g(F_\tau (\pmb {x}))\right] \,\textrm{d}\omega (\pmb {x})}_{\text {integrated variance of}\, g\circ F_\tau }. \end{aligned} \end{aligned}$$
(22)

Thus, \(\Vert g\Vert ^2[\textrm{res}^{\textrm{var}}(\lambda ,g)]^2\) approximates the sum of the squared residual \(\Vert \mathscr {K}[g]-\lambda g\Vert ^2\) and the integrated variance of \(g\circ F_{\tau }\). For stochastic systems, the integrated variance of \(g\circ F_\tau \) is usually nonzero so that

$$\begin{aligned} \lim _{M\rightarrow \infty }\textrm{res}^{\textrm{var}}(\lambda ,g)> \Vert \mathscr {K}_{(1)}[g]-\lambda g\Vert /\Vert g\Vert . \end{aligned}$$
(23)

Based on this notion and drawing an analogy with Definition 1, we make the following definition.

Definition 2

For any \(\lambda \in \mathbb {C}\), define:

$$\begin{aligned} \sigma _{\text {inf}}^{\text {var}} (\lambda )=\inf \left\{ \sqrt{\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right] }: g\in L^2(\Omega ,\omega ),\Vert g\Vert =1\right\} . \end{aligned}$$

For \(\epsilon >0\), we define the variance-\(\epsilon \)-pseudospectrum as

$$\begin{aligned} \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})=\textrm{Cl}\left( \left\{ \lambda \in \mathbb {C}:\sigma _{\textrm{inf}}^{\textrm{var}}(\lambda )<\epsilon \right\} \right) , \end{aligned}$$

where \(\textrm{Cl}\) denotes the closure of a set. Furthermore, we say that g is a variance-\(\epsilon \)-pseudoeigenfunction if there exists \(\lambda \in \mathbb {C}\) such that \(\sqrt{\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau {-}\lambda g\Vert ^2\right] }\le \epsilon \).

Superficially, this definition is a straightforward extension of Definition 1. However, there are some essential differences. Both the conceptual understanding and the computation methods need to be modified.

First, the relation (22) shows that \(\textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\) takes into account uncertainty through the variance term. Hence, the variance-pseudospectrum provides a notion of statistical coherency. Furthermore, comparing Definitions 1 and 2, we have

$$\begin{aligned} \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\subset \textrm{Sp}_{\epsilon }(\mathscr {K}_{(1)}). \end{aligned}$$

If the dynamical system is deterministic, then \(\textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\) is equal to the approximate point \(\epsilon \)-pseudospectrum. However, in the presence of variance, they are no longer equal.

Second, the relation (22) gives a computational surprise. Following the same derivation between (10)–(13), with L, A, and G accordingly adjusted through replacing \(\mathscr {K}\) by \(\mathscr {K}_{(1)}\) in (11)–(12), we can still compute the variance-residual term. However, the original residual itself, \(\textrm{res}(\lambda ,g)\), needs a modification. Recalling (10), in the same spirit of EDMD, if \(g\in V_N\), we write

$$\begin{aligned}&\Vert \mathscr {K}_{(1)}[g]-\lambda g\Vert ^2\\&\quad = \langle \mathscr {K}_{(1)}[g]\,,\mathscr {K}_{(1)}[g]\rangle -\lambda \langle g,\mathscr {K}_{(1)}[g]\rangle \\&\qquad -\bar{\lambda } \langle \mathscr {K}_{(1)}[g],g\rangle + |\lambda |^2\langle g,g\rangle \\&\quad = \pmb {g}^*({H}-\lambda {A}^*-\overline{\lambda }{A}+|\lambda |^2{G})\pmb {g}, \end{aligned}$$

where H is a newly introduced matrix with

$$\begin{aligned} H_{i,j}=\langle \mathscr {K}_{(1)}[\psi _j],\mathscr {K}_{(1)}[\psi _i] \rangle . \end{aligned}$$
(24)

We employ the quadrature rule for the \(\pmb {x}\)-domain to approximate this new term. If \(\texttt {S}\) is batched with \(M_2=2\), then we can form the matrix

$$\begin{aligned} \tilde{H}_{i,j}=\sum _{l=1}^{M_1} w_{l} \psi _j(\pmb {y}^{(l,1)})\overline{\psi _i(\pmb {y}^{(l,2)})}. \end{aligned}$$

Since \(\tau _{l,1}\) and \(\tau _{l,2}\) are independent, we have

$$\begin{aligned} \lim _{M_1\rightarrow \infty } \tilde{H}_{i,j}=H_{i,j}=\langle \mathscr {K}[\psi _j],\mathscr {K}[\psi _i] \rangle . \end{aligned}$$
(25)

We stress that \(\mathscr {K}_{(1)}\) is applied separately to \(\psi _i\) and \(\psi _j\) and thus \(\tau _{l,1}\) and \(\tau _{l,2}\) need to be independent realizations.

The convergence in (25) allows us to compute the spectral properties of \(\mathscr {K}_{(1)}\) directly (see Sect. 3.3). In particular, instead of (13), we now have

$$\begin{aligned} \left[ \textrm{res}(\lambda ,g)\right] ^2=\frac{\pmb {g}^*\left[ \tilde{H}-\lambda \tilde{A}^*-\overline{\lambda }\tilde{A}+|\lambda |^2\tilde{G}\right] \pmb {g}}{\pmb {g}^*\tilde{G}\pmb {g}} \end{aligned}$$
(26)

and the approximate decomposition

$$\begin{aligned} \begin{aligned}&\int _{\Omega }\textrm{Var}_{\tau }\left[ g(F_\tau (\pmb {x}))\right] \,\textrm{d}\omega (\pmb {x})=\pmb {g}^*\left( L-H\right) \pmb {g}\\&\quad \approx \pmb {g}^*\left( \tilde{L}{-}\tilde{H}\right) \pmb {g}=\Vert g\Vert ^2\left( [\textrm{res}^{\textrm{var}}(\lambda ,g)]^2{-}[\textrm{res}(\lambda ,g)]^2\right) , \end{aligned} \end{aligned}$$
(27)

which becomes exact in the large data limit.
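
Given two-batched data (\(M_2=2\)), the extra matrix \(\tilde{H}\) in (25) and the split (27) can be formed directly. The sketch below again uses the toy circle map of Example 1 with \(f=0\) and a Fourier dictionary (hypothetical choices); for \(g=\psi _1\), an exact eigenfunction, the residual is close to zero while the variance-residual approaches the integrated variance.

```python
import numpy as np

rng = np.random.default_rng(2)
c, sigma = 0.1, 0.2
F = lambda x, tau: (x + c + tau) % 1.0               # toy circle map with f = 0
M1, N = 20_000, 8
x = rng.random(M1)                                    # nodes x^(j) with uniform weights 1/M1
y1 = F(x, sigma * rng.random(M1))                     # first independent realization  y^(j,1)
y2 = F(x, sigma * rng.random(M1))                     # second independent realization y^(j,2)

feat = lambda z: np.exp(2j * np.pi * np.outer(z, np.arange(1, N + 1)))   # Fourier dictionary
PX, PY1, PY2 = feat(x), feat(y1), feat(y2)
w = 1.0 / M1                                          # uniform quadrature weights

G = PX.conj().T @ PX * w
A = 0.5 * (PX.conj().T @ PY1 + PX.conj().T @ PY2) * w      # averaged, as in Algorithm 2
L = 0.5 * (PY1.conj().T @ PY1 + PY2.conj().T @ PY2) * w
H = 0.5 * (PY1.conj().T @ PY2 + PY2.conj().T @ PY1) * w    # uses independent realizations, cf. (25)

g = np.zeros(N, dtype=complex); g[0] = 1.0                 # g = psi_1
lam = (g.conj() @ A @ g) / (g.conj() @ G @ g)              # Rayleigh-quotient estimate of its eigenvalue
den = (g.conj() @ G @ g).real
res_var2 = (g.conj() @ (L - lam * A.conj().T - np.conj(lam) * A + abs(lam) ** 2 * G) @ g).real / den
res2     = (g.conj() @ (H - lam * A.conj().T - np.conj(lam) * A + abs(lam) ** 2 * G) @ g).real / den
print(res2, res_var2, res_var2 - res2)   # difference estimates the integrated variance (||g|| = 1), cf. (27)
```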

3.3 Algorithms

In the derivations above, we saw that unbatched (or one-batched) data permits the computation only of \(\textrm{res}^\textrm{var}(\lambda ,g)\), while two-batched data also permits the computation of \(\textrm{res}(\lambda ,g)\). Algorithms 1 and 2 approximate the relative residuals of EDMD eigenpairs in the scenarios of unbatched and batched data, respectively. In Algorithm 2, we have taken an average when computing \(\tilde{A}\) and \(\tilde{L}\) to reduce quadrature error, and an average when computing \(\tilde{H}\) to ensure that it is self-adjoint (and positive semi-definite). Algorithm 3 approximates the pseudospectrum and corresponding pseudoeigenfunctions, given batched snapshot data. Algorithm 4 approximates the variance-pseudospectrum and corresponding variance-pseudoeigenfunctions, and does not need batched data. Note that the computational complexity of all of these algorithms scales in the same way as that of ResDMD, which is discussed in [20, 23]. In particular, Algorithms 1 and 2 scale in the same way as EDMD.

Algorithm 1: Eigenpairs and residuals.

Algorithm 2: Eigenpairs and residuals (batched data).

Algorithm 3: Pseudospectra (batched data).

Algorithm 4: Variance-pseudospectra.
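
As an illustration of the computation underlying Algorithm 4 (in the large data limit it evaluates \(r_j=f_{M,N}(z_j)\) over a grid; see Sect. 4.1), the following sketch computes the minimal variance-residual at each grid point via a Hermitian generalized eigenvalue problem. It assumes \(\tilde{G}\) is positive definite and uses a placeholder grid; it is not the reference implementation from the repository linked in Sect. 1.1.

```python
import numpy as np
from scipy.linalg import eigh

def variance_pseudospectrum(G, A, L, grid, eps):
    """For each z in grid, compute r(z) = min over g of res_var(z, Psi g) as the square root
    of the smallest generalized eigenvalue of (Q(z), G); return grid points with r(z) < eps."""
    Gh = 0.5 * (G + G.conj().T)                 # symmetrize against rounding errors
    accepted = []
    for z in grid:
        Q = L - z * A.conj().T - np.conj(z) * A + abs(z) ** 2 * G
        Qh = 0.5 * (Q + Q.conj().T)
        r = np.sqrt(max(eigh(Qh, Gh, eigvals_only=True)[0], 0.0))
        if r < eps:
            accepted.append((z, r))
    return accepted

# A simple grid over a square containing the unit disc, in the spirit of (28).
s = np.linspace(-1.2, 1.2, 49)
grid = (s[:, None] + 1j * s[None, :]).ravel()
```

Calling this routine with the matrices \(\tilde{G},\tilde{A},\tilde{L}\) returns an approximation of \(\textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\); replacing \(\tilde{L}\) with \(\tilde{H}\) gives the analogous quantity in the same spirit as Algorithm 3.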

4 Theoretical guarantees

We now prove the correctness of the algorithms mentioned above. Specifically, through a series of theorems, we demonstrate that the computations of \(\tilde{A},\tilde{G},\tilde{L}\), and \(\tilde{H}\) are accurate and that the spectral estimates can be trusted. To achieve this, we divide the section into three subsections, focusing in turn on the accuracy of the spectral estimates, the predictive power, and the estimated matrices. The standing assumptions made throughout this section are as follows:

  • \(\mathscr {K}_{(1)}\) is bounded.

  • \(\{\psi _j\}_{j=1}^N\) are linearly independent for any finite N.

  • \(V_N\subset V_{N+1}\) and the union, \(\cup _N V_N\), is dense in \(L^2(\Omega ,\omega )\).

The algorithms and proofs can be readily adapted for an unbounded \(\mathscr {K}_{(1)}\). The latter two assumptions can also be relaxed with minor modifications.

4.1 Accuracy in finding spectral quantities

In this subsection, we prove the convergence of our algorithms. We have already discussed the convergence of residuals in Algorithms 1 and 2, under the assumption of convergence of the finite matrices \(\tilde{G},\tilde{A},\tilde{L}\), and \(\tilde{H}\) in the large data limit. Hence, we focus on Algorithm 4. We first define the functions

$$\begin{aligned} f_{M,N}(\lambda )=\min _{\pmb {g}\in \mathbb {C}^{N}} \textrm{res}^{\textrm{var}}(\lambda ,\Psi \pmb {g}), \end{aligned}$$

and note that \(r_j=f_{M,N}(z_j)\) in Algorithm 4. Our first lemma describes the limit of these functions as \(M\rightarrow \infty \) and \(N\rightarrow \infty \).

Lemma 1

Suppose that

$$\begin{aligned} \lim _{M\rightarrow \infty } \tilde{G}=G,\quad \lim _{M\rightarrow \infty } \tilde{A}=A,\quad \lim _{M\rightarrow \infty } \tilde{L}=L, \end{aligned}$$

then \(f_N(\lambda )=\lim _{M\rightarrow \infty }f_{M,N}(\lambda )\) exists. Moreover, \(f_N\) is a nonincreasing function of N and converges to \(\sigma _{\textrm{inf}}^{\textrm{var}}\) from above and uniformly on compact subsets of \(\mathbb {C}\) as a function of the spectral parameter \(\lambda \).

Proof

The limit \(f_N(\lambda )=\lim _{M\rightarrow \infty }f_{M,N}(\lambda )\) follows trivially from the convergence of matrices. Moreover, we have

$$\begin{aligned} f_N(\lambda )&=\min _{\pmb {g}\in \mathbb {C}^{N}}\sqrt{\frac{\pmb {g}^*(L-\lambda A^*-\overline{\lambda }A+|\lambda |^2G)\pmb {g}}{\pmb {g}^*G\pmb {g}}}\\&=\inf \left\{ \sqrt{\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right] }:g\in V_N,\Vert g\Vert =1\right\} . \end{aligned}$$

Since \({V}_{N}\subset {V}_{N+1}\), \(f_N(\lambda )\) is nonincreasing in N. By definition, we also have

$$\begin{aligned} f_N(\lambda )\ge \sigma _{\textrm{inf}}^{\textrm{var}}(\lambda ). \end{aligned}$$

Let \(\delta >0\) and choose \(g\in L^2(\Omega ,\omega )\) such that \(\Vert g\Vert =1\) and

$$\begin{aligned} \sqrt{\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right] }\le \sigma _{\textrm{inf}}^{\textrm{var}}(\lambda )+\delta . \end{aligned}$$

Since \(\cup _N V_N\) is dense in \(L^2(\Omega ,\omega )\), there exists some n and \(g_{n}\in {V}_{n}\) such that \(\Vert g_n\Vert =1\) and

$$\begin{aligned} \sqrt{\mathbb {E}_{\tau }\left[ \Vert g_n\circ F_\tau -\lambda g_n\Vert ^2\right] }\le \sqrt{\mathbb {E}_{\tau }\left[ \Vert g\circ F_\tau -\lambda g\Vert ^2\right] }+\delta . \end{aligned}$$

It follows that \(f_n(\lambda )\le \sigma _{\textrm{inf}}^{\textrm{var}}(\lambda )+2\delta \). Since this holds for any \(\delta >0\), \(\lim _{N\rightarrow \infty }f_N(\lambda )=\sigma _{\textrm{inf}}^{\textrm{var}}(\lambda )\). Since \(\sigma _{\textrm{inf}}^{\textrm{var}}(\lambda )\) is continuous in \(\lambda \), \(f_N\) converges uniformly down to \(\sigma _{\textrm{inf}}^{\textrm{var}}\) on compact subsets of \(\mathbb {C}\) by Dini’s theorem. \(\square \)

Let \(\{\textrm{Grid}(N)=\{z_{1,N},z_{2,N},\ldots ,z_{k(N),N}\}\}\) be a sequence of grids, each finite, such that for any \(\lambda \in \mathbb {C}\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\textrm{dist}(\lambda ,\textrm{Grid}(N))=0. \end{aligned}$$

For example, we could take

$$\begin{aligned} \textrm{Grid}(N)=\frac{1}{N}\left[ \mathbb {Z}+i\mathbb {Z}\right] \cap \{z\in \mathbb {C}:|z|\le N\}. \end{aligned}$$
(28)

In practice, one considers a grid of points over the region of interest in the complex plane. Lemma 1 tells us that to study Algorithm 4 in the large data limit, we must analyze

$$\begin{aligned} \Gamma ^\epsilon _{N}(\mathscr {K}_{(1)})=\left\{ \lambda \in \textrm{Grid}(N):f_N(\lambda )<\epsilon \right\} . \end{aligned}$$

To make the convergence of Algorithm 4 precise, we use the Attouch–Wets metric defined by [5]:

$$\begin{aligned} d_{\text {AW}} (C_1,C_2)=\sum _{n=1}^{\infty } 2^{-n}\min \big \{1,\underset{\left| x\right| \le n}{\sup }\left| \text {dist} (x,C_1) - \text {dist}(x,C_2)\right| \big \}, \end{aligned}$$

where \(C_1,C_2\) are closed nonempty subsets of \(\mathbb {C}\). This metric corresponds to uniform convergence of the distance functions \(\textrm{dist}(\cdot ,C_n)\) on compact subsets of \(\mathbb {C}\). For any closed nonempty sets C and \(C_n\), \(d_{\textrm{AW}}(C_n,C)\rightarrow {0}\) if and only if for any \(\delta >0\) and \(B_m(0)\) (closed ball of radius \(m\in \mathbb {N}\) about 0), there exists N such that if \(n>N\) then \(C_n\cap B_m(0)\subset {C+B_{\delta }(0)}\) and \(C\cap B_m(0)\subset {C_n+B_{\delta }(0)}\). The following theorem contains our convergence result.

Theorem 1

(Convergence to variance-pseudospectrum) Let \(\epsilon >0\). Then, \(\Gamma ^\epsilon _{N}(\mathscr {K}_{(1)})\subset \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\) and

$$\begin{aligned} \lim _{N\rightarrow \infty }d_{\textrm{AW}}\left( \Gamma ^\epsilon _{N}(\mathscr {K}_{(1)}),\textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\right) =0. \end{aligned}$$

Proof

Lemma 1 shows that \(\Gamma ^\epsilon _{N}(\mathscr {K}_{(1)})\subset \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\). To prove convergence, we use the characterization of the Attouch–Wets topology. Suppose that m is large such that \(B_m(0)\cap \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\ne \emptyset \). Since \(\Gamma ^\epsilon _{N}(\mathscr {K}_{(1)})\subset \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\), we clearly have \(\Gamma _{N}^{\epsilon }(\mathscr {K}_{(1)})\cap B_m(0)\subset \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\). Hence, we must show that given \(\delta >0\), there exists \(n_0\) such that if \(N>n_0\) then \(\textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\cap B_m(0)\subset {\Gamma _{N}^{\epsilon }(\mathscr {K}_{(1)})+B_{\delta }(0)}\). Suppose for a contradiction that this statement is false. Then, there exists \(\delta >0\), \(\lambda _{n_j}\in \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\cap B_m(0)\), and \(n_j\rightarrow \infty \) such that

$$\begin{aligned} \textrm{dist}(\lambda _{n_j},\Gamma _{n_j}^{\epsilon }(\mathscr {K}_{(1)}))\ge \delta . \end{aligned}$$

Without loss of generality, we can assume that \(\lambda _{n_j}\rightarrow \lambda \in \textrm{Sp}_{\epsilon }^{\textrm{var}}(\mathscr {K}_{(1)})\cap B_m(0)\). There exists some z with \(\sigma _{\textrm{inf}}^{\textrm{var}}(z)<\epsilon \) and \(\left| \lambda -z\right| \le \delta /2\). Let \(z_{n_j}\in \textrm{Grid}(n_j)\) such that \(|z-z_{n_j}|\le \textrm{dist}(z,\textrm{Grid}(n_j))+{n_j}^{-1}.\) Since \(\sigma _{\textrm{inf}}^{\textrm{var}}\) is continuous and \(f_N\) converges locally uniformly to \(\sigma _{\textrm{inf}}^{\textrm{var}}\), we must have \(f_{n_j}(z_{n_j})<\epsilon \) for large \(n_j\) so that \(z_{n_j}\in \Gamma _{n_j}^{\epsilon }(\mathscr {K}_{(1)})\). But \( \left| z_{n_j}-\lambda \right| \le \left| z-\lambda \right| +\left| z_{n_j}-z\right| \le \delta /2 + |z-z_{n_j}|, \) which is smaller than \(\delta \) for large \(n_j\), and we reach the desired contradiction. \(\square \)

4.2 Error bounds for iterations

We now aim to bound the difference between \(\tilde{K}^n\) and \(\mathscr {K}^n\), a step crucial for measuring the accuracy of our approximation of the mean trajectories in \(L^2(\Omega ,\omega )\). This effort, in conjunction with the Chebyshev-type bound presented in (19), enables us to compute the statistical properties of the trajectories and their forecasts. Our approach to establishing these bounds is twofold. First, we consider the difference between \(\tilde{K}^n\) and \(\mathscr {K}^n\), taking into account both the estimation errors and the errors intrinsic to the subspace. Subsequently, we establish concentration bounds for the estimation errors of \(\tilde{G}\), \(\tilde{A}\), and \(\tilde{L}\).

Theorem 2

(Error bound for forecasts) Define the quantities

$$\begin{aligned} I_G&=G^{\frac{1}{2}}\tilde{G}^{-\frac{1}{2}},\\ \Delta _G&=\Vert I_G\Vert \Vert (I-I_G^{-1})\Vert +\Vert (I-I_G)\Vert ,\\ \Delta _A&=\Vert \mathscr {K}\Vert (1+\Vert I_G\Vert )\Vert I_G-I\Vert +\Vert I_G\Vert ^2 \Vert G^{-\frac{1}{2}}(A-\tilde{A})G^{- \frac{1}{2}}\Vert . \end{aligned}$$

Let \(g=\sum _{j=1}^N\pmb {g}_j\psi _j\in V_N\) and suppose that

$$\begin{aligned} \Vert \mathscr {K}^n_{(1)}g-\mathscr {P}_{V_N}^*(\mathscr {P}_{V_N}\mathscr {K}_{(1)}\mathscr {P}_{V_N}^*)^ng\Vert \le \delta _n(g)\Vert g\Vert . \end{aligned}$$

Then

$$\begin{aligned} \Vert \Psi \tilde{K}^n\pmb {g}-\mathscr {K}^n_{(1)}g\Vert \le C_n\Vert g\Vert , \end{aligned}$$

where

$$\begin{aligned} C_n=\left[ \frac{\Vert \mathscr {K}\Vert ^n-\Delta _A^n}{\Vert \mathscr {K}\Vert -\Delta _A}\Delta _A(\Delta _G+1)+\Vert \mathscr {K}\Vert ^n\Delta _G+\delta _n(g)\right] . \end{aligned}$$

Proof

We introduce the two matrices

$$\begin{aligned} T=G^{-1/2}AG^{-1/2},\quad \tilde{T}=\tilde{G}^{-1/2}\tilde{A}\tilde{G}^{-1/2}. \end{aligned}$$

Note that

$$\begin{aligned} \Vert T\Vert =\sup _{x\in \mathbb {C}^N}\frac{\Vert TG^{1/2}x\Vert }{\Vert G^{1/2}x\Vert }&=\sup _{x\in \mathbb {C}^N}\frac{\Vert G^{1/2}Kx\Vert }{\Vert G^{1/2}x\Vert }\\&=\Vert \mathscr {P}_{V_{N}}\mathscr {K}\mathscr {P}_{V_{N}}^*\Vert \le \Vert \mathscr {K}\Vert . \end{aligned}$$

We can re-write \(\tilde{T}\) as

$$\begin{aligned} \tilde{T}&=I_G^*G^{-1/2}{\tilde{A}} G^{-1/2}I_G\\&=I_G^*TI_G+I_G^*G^{-1/2}({\tilde{A}}-A)G^{-1/2}I_G\\&=T+(I_G-I)^*TI_G+T(I_G-I)\\&\quad +I_G^*G^{-1/2}({\tilde{A}}-A)G^{-1/2}I_G. \end{aligned}$$

It follows that

$$\begin{aligned} \Vert T-\tilde{T}\Vert&\le \Vert \mathscr {K}\Vert (1+\Vert I_G\Vert )\Vert I_G-I\Vert \\&\quad +\Vert I_G\Vert ^2\Vert G^{-1/2}(A-\tilde{A})G^{-1/2}\Vert \\&=\Delta _A. \end{aligned}$$

We have that

$$\begin{aligned} T^n-\tilde{T}^n=T(T^{n-1}-\tilde{T}^{n-1})+({T}-\tilde{T})\tilde{T}^{n-1}. \end{aligned}$$

A simple proof by induction now shows that

$$\begin{aligned} \Vert T^n-\tilde{T}^n\Vert&\le \Vert {T}-\tilde{T}\Vert \sum _{j=0}^{n-1}\Vert T\Vert ^j\Vert \tilde{T}\Vert ^{n-1-j}\\&\le \Delta _A\sum _{j=0}^{n-1}\Vert \mathscr {K}\Vert ^{j}(\Vert \mathscr {K}\Vert +\Delta _A)^{n-1-j}\\&= \Delta _A\frac{\Vert \mathscr {K}\Vert ^n-\Delta _A^n}{\Vert \mathscr {K}\Vert -\Delta _A}. \end{aligned}$$

We wish to bound the quantity

$$\begin{aligned}&\Vert \Psi K^n\pmb {g}-\Psi \tilde{K}^n\pmb {g}\Vert =\Vert {T}^n{G}^{1/2}\pmb {g}-I_G\tilde{T}^n\tilde{G}^{1/2}\pmb {g}\Vert \\&\quad \le \Vert {T}^n-\tilde{T}^n\Vert \Vert g\Vert +\Vert \tilde{T}^n{G}^{1/2}\pmb {g}-I_G\tilde{T}^n\tilde{G}^{1/2}\pmb {g}\Vert . \end{aligned}$$

We can express the final term on the right-hand side as

$$\begin{aligned} \tilde{T}^n{G}^{1/2}\pmb {g}-I_G{\tilde{T}}^n\tilde{G}^{1/2}\pmb {g}&=I_G\tilde{T}^n(I-I_G^{-1}){G}^{1/2}\pmb {g}\\&\quad +(I-I_G){\tilde{T}}^n{G}^{1/2}\pmb {g}. \end{aligned}$$

It follows that

$$\begin{aligned}&\Vert \tilde{T}^n{G}^{1/2}\pmb {g}-I_G\tilde{T}^n\tilde{G}^{1/2}\pmb {g}\Vert \le \Vert \tilde{T}^n\Vert \Vert {G}^{1/2}\pmb {g}\Vert \Delta _G\\&\quad \le \left( \Vert \mathscr {K}\Vert ^n+\Vert {T}^n-\tilde{T}^n\Vert \right) \Delta _G\Vert g\Vert \end{aligned}$$

and hence that

$$\begin{aligned}&\Vert \Psi K^n\pmb {g}{-}\Psi \tilde{K}^n\pmb {g}\Vert {\le } \left[ \Vert {T}^n{-}\tilde{T}^n\Vert (\Delta _G{+}1){+}\Vert \mathscr {K}\Vert ^n\Delta _G\right] \Vert g\Vert \\&\quad \le \left[ \frac{\Vert \mathscr {K}\Vert ^n-\Delta _A^n}{\Vert \mathscr {K}\Vert -\Delta _A}\Delta _A(\Delta _G+1)+\Vert \mathscr {K}\Vert ^n\Delta _G\right] \Vert g\Vert . \end{aligned}$$

The theorem now follows from the triangle inequality.

\(\square \)

This theorem explicitly tells us how much to trust the prediction using the computed Koopman matrix, compared with the true Koopman operator. The quantities \(\Delta _G\) and \(\Delta _A\) represent errors due to estimation or quadrature. They are both expected to be small. The quantity \(\delta _n(g)\) is an intrinsic invariant subspace error that depends on the dictionary and observable g. To approximate \(\delta _n(g)\), note that

$$\begin{aligned} \mathscr {K}^{n}[g]{-}\Psi K^n\pmb {g}{=}\sum _{j=1}^n\mathscr {K}^{n-j}[\mathscr {K}[\Psi K^{j-1}\pmb {g}]{-}\Psi K^j \pmb {g}] \end{aligned}$$

and hence

$$\begin{aligned} \Vert \mathscr {K}^{n}[g]{-}\Psi K^n\pmb {g}\Vert {\le }\sum _{j=1}^n\Vert \mathscr {K}\Vert ^{n{-}j}\Vert \mathscr {K}[\Psi K^{j{-}1}\pmb {g}]{-}\Psi K^j \pmb {g}\Vert . \end{aligned}$$
(29)

To bound the term on the right-hand side, we can use the matrix H in (24) and the fact that

$$\begin{aligned} \Vert \mathscr {K}\Psi \pmb {v}{-}\Psi Kv\Vert =\sqrt{\pmb {v}^*H\pmb {v}{-}2\textrm{Re}(\pmb {v}^*K^*A\pmb {v}){+}\pmb {v}^*K^*GK\pmb {v}} \end{aligned}$$
(30)

for any \(\pmb {v}\in \mathbb {C}^N\).
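
With data-driven estimates of G, A, and H in hand, the right-hand sides of (29) and (30) are directly computable. The following sketch evaluates them for a given coefficient vector; it assumes a bound on \(\Vert \mathscr {K}_{(1)}\Vert \) is supplied as the parameter K_norm (for instance, 1 when \(\omega \) is an invariant measure).

```python
import numpy as np

def dictionary_error(v, K, G, A, H):
    """|| K_(1)[Psi v] - Psi K v ||, evaluated via (30) from the matrices G, A, H."""
    Kv = K @ v
    val = (v.conj() @ H @ v
           - 2.0 * np.real(v.conj() @ K.conj().T @ A @ v)
           + Kv.conj() @ G @ Kv)
    return float(np.sqrt(max(np.real(val), 0.0)))

def subspace_forecast_bound(g, K, G, A, H, n, K_norm=1.0):
    """Right-hand side of (29): a bound on || K_(1)^n[g] - Psi K^n g ||."""
    total, v = 0.0, np.array(g, dtype=complex)
    for j in range(1, n + 1):            # v holds K^(j-1) g at the start of iteration j
        total += K_norm ** (n - j) * dictionary_error(v, K, G, A, H)
        v = K @ v
    return total
```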

4.3 Estimation error for computation of A, G, and L

To effectively estimate \(\mathscr {K}_{(1)}g\) and \(\textrm{Sp}_\epsilon ^\textrm{var}(\mathscr {K}_{(1)})\) in practical applications, it is imperative to have reliable approximations of A, G, and L. We provide a justification for our ability to construct such approximations from trajectory data with high probability, employing concentration bounds. The subsequent result delineates the requisite number of samples and basis functions needed to achieve a desired level of accuracy with high probability. To ensure this level of accuracy, several reasonable assumptions about the stochastic dynamical system are necessary.

Assumption 1

We suppose that \(\pmb {x}^{(m)}\) in the snapshot data are sampled at random according to \(\omega \), independent of \(\tau \), and for simplicity, assume that \(\omega \) is a probability measure. We assume that \(\tau :\Omega _s\rightarrow \mathscr {H}\) for some Hilbert space \(\mathscr {H}\) and let \(\kappa =(\pmb {x},\tau )\). In this section, \(\mathbb {E}\) and \(\mathbb {P}\) are with respect to the joint distribution of \(\kappa \). We assume that

  • The random variable \(\kappa \) is sub-Gaussian, meaning that there exists some \(a>0\) such that

    $$\begin{aligned} \mathbb {E}\left[ e^{\Vert \kappa -\mathbb {E}(\kappa )\Vert ^2/a^2}\right] <\infty . \end{aligned}$$

    This allows us to define the following finite quantity:

    $$\begin{aligned} \Upsilon =\inf \left\{ s>0:e^{\frac{\mathbb {E}[\Vert \kappa -\mathbb {E}(\kappa )\Vert ^2]}{s^2}}\mathbb {E}\left[ e^{\frac{1}{s^2}\Vert \kappa -\mathbb {E}(\kappa )\Vert ^2}\right] \le 2\right\} . \end{aligned}$$
  • The dictionary functions are uniformly bounded and satisfy the following Lipschitz condition:

    $$\begin{aligned} |\psi _k(\pmb {x})-\psi _k(\pmb {x}')|\le c_k\Vert \pmb {x}-\pmb {x}'\Vert . \end{aligned}$$
  • The function F is Lipschitz with

    $$\begin{aligned} \Vert F(\kappa )-F(\kappa ')\Vert \le c\Vert \kappa -\kappa '\Vert . \end{aligned}$$

With these assumptions, we can show that our approximations of A, G, and L are good with high probability.

Theorem 3

(Concentration bound on estimation errors) Under Assumption 1 we have, for any \(t>0\),

$$\begin{aligned}&\mathbb {P}\left( \Vert \tilde{A}{-} A\Vert _{\textrm{Fr}}< t\right) {\ge }1{-}\exp \left( 2\log (2N){-}\frac{Mt^2}{24\Upsilon ^2(c^2{+}1)\alpha ^2\beta ^2}\right) \\&\mathbb {P}\left( \Vert \tilde{G}{-} G\Vert _{\textrm{Fr}}< t\right) {\ge }1{-}\exp \left( 2\log (2N){-}\frac{Mt^2}{48\Upsilon ^2\alpha ^2\beta ^2}\right) \\&\mathbb {P}\left( \Vert \tilde{L}{-} L\Vert _{\textrm{Fr}}< t\right) {\ge }1{-}\exp \left( 2\log (2N){-}\frac{Mt^2}{48\Upsilon ^2c^2\alpha ^2\beta ^2}\right) , \end{aligned}$$

where \(\Vert \cdot \Vert _{\textrm{Fr}}\) denotes the Frobenius norm, and \(\alpha \) and \(\beta \) are given by

$$\begin{aligned} \alpha =\sqrt{\sum _{k=1}^Nc_k^2},\quad \beta =\sqrt{\sum _{k=1}^N\Vert \psi _k\Vert _{L^\infty }^2}. \end{aligned}$$

Proof

We first argue for \(\Vert \tilde{A}-A\Vert _{\textrm{Fr}}\). Fix \(j,k\in \{1,\ldots ,N\}\) and define the random variable

$$\begin{aligned} X=\psi _k(F(\pmb {x},\tau ))\overline{\psi _j(\pmb {x})}. \end{aligned}$$

Then

$$\begin{aligned} \left| X(\kappa )-X(\kappa ')\right| \le (c_kc\Vert \psi _j\Vert _{L^\infty }+c_j\Vert \psi _k\Vert _{L^\infty })\Vert \kappa -\kappa '\Vert . \end{aligned}$$

Let \(c_{j,k}=c_kc\Vert \psi _j\Vert _{L^\infty }+c_j\Vert \psi _k\Vert _{L^\infty }\). The above Lipschitz bound for X implies that

$$\begin{aligned} \left| \mathbb {E}[X]-X(\kappa ')\right|&\le c_{j,k}\int _{\Omega \times \Omega _s}\Vert \kappa -\kappa '\Vert \,\textrm{d} \mathbb {P}(\kappa )\\&\le c_{j,k}\sqrt{\Vert \kappa '-\mathbb {E}(\kappa )\Vert ^2+\mathbb {E}(\Vert \kappa -\mathbb {E}(\kappa )\Vert ^2)}, \end{aligned}$$

where we have used Hölder’s inequality to derive the last line. It follows that

$$\begin{aligned} \mathbb {E}\left[ \exp \left( \frac{\left| \mathbb {E}[X]-X\right| ^2}{\Upsilon ^2c_{j,k}^2}\right) \right] \le 2. \end{aligned}$$

Let \(Y=\textrm{Re}\left( \mathbb {E}\left[ X\right] -X\right) \) and \(\lambda \ge 0\). Since \(\mathbb {E}[Y]=0\), we have

$$\begin{aligned} \mathbb {E}\left[ \exp \left( \lambda Y\right) \right]= & {} 1+\sum _{l=2}^\infty \frac{\lambda ^l\mathbb {E}[Y^l]}{l!}\\\le & {} 1+\frac{\lambda ^2}{2}\mathbb {E}\left[ Y^2\exp (\lambda |Y|)\right] . \end{aligned}$$

For any \(b>0\), we have \(\lambda |Y|\le \lambda ^2/(2b)+b|Y|^2/2\). We also have \(bY^2\le \exp (bY^2/2)\). It follows that

$$\begin{aligned} \mathbb {E}\left[ \exp \left( \lambda Y\right) \right] \le 1+\frac{\lambda ^2}{2b}e^{\lambda ^2/(2b)}\mathbb {E}\left[ \exp (bY^2)\right] . \end{aligned}$$

We select \(b=1/(\Upsilon ^2c_{j,k}^2)\) and use the fact that \(\mathbb {E}\left[ \exp (bY^2)\right] \le \mathbb {E}\left[ \exp (b|\mathbb {E}[X]-X|^2)\right] \le 2\) to obtain

$$\begin{aligned} \mathbb {E}\left[ \exp \left( \lambda Y\right) \right] \le 1+\frac{\lambda ^2}{b}e^{\frac{\lambda ^2}{2b}} \le \left( 1+\frac{\lambda ^2}{b}\right) e^{\frac{\lambda ^2}{2b}} \le e^{\frac{3\lambda ^2}{2b}}. \end{aligned}$$

Now let \(\{Y^{(m)}\}_{m=1}^{M}\) be independent copies of Y. Then

$$\begin{aligned}&\mathbb {P}\left( \frac{1}{M}\sum _{m=1}^{M}Y^{(m)}\ge t\right) \\&\quad =\mathbb {P}\left( \exp (\lambda \sum _{m=1}^{M}Y^{(m)})\ge \exp (\lambda Mt) \right) \\&\quad \le e^{-\lambda Mt}\mathbb {E}\left[ \exp \left( \lambda \sum _{m=1}^{M}Y^{(m)}\right) \right] \\&\quad = e^{-\lambda Mt}\prod _{m=1}^{M}\mathbb {E}\left[ \exp \left( \lambda Y\right) \right] \\&\quad \le \exp \left( 3M\lambda ^2/(2b)-\lambda M t\right) , \end{aligned}$$

where the first inequality is Markov’s inequality. Minimizing over \(\lambda \) (the minimum is attained at \(\lambda =bt/3\)), we obtain

$$\begin{aligned} \mathbb {P}\left( \frac{1}{M}\sum _{m=1}^{M}Y^{(m)}\ge t\right) \le \exp \left( -Mbt^2/6\right) . \end{aligned}$$

We can argue in the same manner for \(-Y\) and deduce that

$$\begin{aligned} \mathbb {P}\left( \frac{1}{M}\left| \sum _{m=1}^{M}Y^{(m)}\right| \ge t\right) \le 2\exp \left( -Mbt^2/6\right) . \end{aligned}$$

Similarly, we can argue for the imaginary part of \(\mathbb {E}[X]-X\).

We now allow j and k to vary and let \(X_{j,k}=\psi _k(F(\pmb {x},\tau ))\overline{\psi _j(\pmb {x})}\). For \(t>0\), consider the events

$$\begin{aligned} S_{j,k,1}&:\frac{1}{M}\left| \sum _{m=1}^{M}\textrm{Re}\left( \mathbb {E}[X_{j,k}]-X_{j,k}(\kappa _m)\right) \right| \\&\quad< \frac{t\Upsilon c_{j,k}}{\sqrt{2\Upsilon ^2 \sum _{l,p=1}^Nc_{l,p}^2}},\\ S_{j,k,2}&:\frac{1}{M}\left| \sum _{m=1}^{M}\textrm{Im}\left( \mathbb {E}[X_{j,k}]-X_{j,k}(\kappa _m)\right) \right| \\&\quad < \frac{t\Upsilon c_{j,k}}{\sqrt{2\Upsilon ^2 \sum _{l,p=1}^Nc_{l,p}^2}}. \end{aligned}$$

Then

$$\begin{aligned} \mathbb {P}(\cap _{j,k,i}S_{j,k,i})&\ge 1 - \sum _{j,k=1}^N (\mathbb {P}(S_{j,k,1}^c)+\mathbb {P}(S_{j,k,2}^c))\\&\ge 1-4N^2\exp \left( -\frac{Mt^2}{12\Upsilon ^2\sum _{l,p=1}^Nc_{l,p}^2}\right) . \end{aligned}$$

Moreover, the AM-GM inequality implies that

$$\begin{aligned} c_{j,k}^2\le 2c^2c_k^2\Vert \psi _j\Vert _{L^\infty }^2+2c_j^2\Vert \psi _k\Vert _{L^\infty }^2 \end{aligned}$$

and hence

$$\begin{aligned} \sum _{l,p=1}^Nc_{l,p}^2\le 2(c^2+1)\alpha ^2\beta ^2. \end{aligned}$$

It follows that

$$\begin{aligned} \mathbb {P}(\cap _{j,k,i}S_{j,k,i})\ge 1-\exp \left( 2\log (2N)-\frac{Mt^2}{24\Upsilon ^2(c^2+1)\alpha ^2\beta ^2}\right) . \end{aligned}$$

If the event \(\cap _{j,k,i}S_{j,k,i}\) occurs, then the real and imaginary parts of each entry \(\tilde{A}_{j,k}-A_{j,k}\) have magnitude less than \(tc_{j,k}/\sqrt{2\sum _{l,p=1}^Nc_{l,p}^2}\), and summing \(|\tilde{A}_{j,k}-A_{j,k}|^2\) over j and k gives \(\Vert \tilde{A}-A\Vert _{\textrm{Fr}}< t\). We can argue in the same manner, without the function F, to deduce that

$$\begin{aligned} \mathbb {P}(\Vert \tilde{G}-G\Vert _{\textrm{Fr}}< t)\ge 1-\exp \left( 2\log (2N)-\frac{Mt^2}{48\Upsilon ^2\alpha ^2\beta ^2}\right) . \end{aligned}$$

Finally, for the matrix L and its estimate \(\tilde{L}\), we derive similar concentration bounds for \(\psi _k(F(\pmb {x},\tau ))\overline{\psi _j(F(\pmb {x},\tau ))}\) to see that

$$\begin{aligned} \mathbb {P}(\Vert \tilde{L}-L\Vert _{\textrm{Fr}}< t)\ge 1-\exp \left( 2\log (2N)-\frac{Mt^2}{48\Upsilon ^2c^2\alpha ^2\beta ^2}\right) . \end{aligned}$$

The statement of the theorem now follows. \(\square \)
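For concreteness, the matrices estimated in Theorem 3 can be assembled from snapshot pairs as plain Monte Carlo averages of the quantities appearing in the proof, e.g. \(A_{j,k}=\mathbb {E}[\psi _k(F(\pmb {x},\tau ))\overline{\psi _j(\pmb {x})}]\), and analogously for G and L. The sketch below is schematic (equal quadrature weights and the helper name are our assumptions, not the algorithm used in the examples).

```python
import numpy as np

def estimate_matrices(psi, X, Y):
    """Monte Carlo estimates of A, G and L from M snapshot pairs.

    psi : callable mapping an (M, d) array of states to the (M, N) array
          of dictionary evaluations [psi_1, ..., psi_N].
    X   : (M, d) array of states x^{(m)} drawn from omega.
    Y   : (M, d) array of images y^{(m)} = F(x^{(m)}, tau_m).
    """
    PsiX, PsiY = psi(X), psi(Y)
    M = X.shape[0]
    A = PsiX.conj().T @ PsiY / M   # A_{jk} ~ mean of psi_k(F(x,tau)) * conj(psi_j(x))
    G = PsiX.conj().T @ PsiX / M   # G_{jk} ~ mean of psi_k(x)        * conj(psi_j(x))
    L = PsiY.conj().T @ PsiY / M   # L_{jk} ~ mean of psi_k(F(x,tau)) * conj(psi_j(F(x,tau)))
    return A, G, L
```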

This theorem explicitly spells out the number of basis functions and samples required to approximate the three matrices appearing in Theorem 2. Roughly speaking, if we set

$$\begin{aligned} \exp \left( 2\log (2N)-{Mt^2}\right) \sim N^2\exp \left( -Mt^2\right) \le \delta , \end{aligned}$$

then

$$\begin{aligned} M\sim |\ln {\delta }-2\ln {N}|/{t^2}. \end{aligned}$$

For any fixed tolerance t, the failure probability decreases exponentially as the number of samples M increases. The idea is the same as for other concentration-inequality bounds: repeated sampling from the same distribution drives the sample mean toward the true mean, and the bound quantifies the tail probability. Increasing N, on the other hand, means that more matrix entries must be approximated; this enters the bound only logarithmically, and a modest increase in M compensates for it.
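As a worked instance of this scaling, the first bound in Theorem 3 gives \(M\ge 24\Upsilon ^2(c^2+1)\alpha ^2\beta ^2\bigl (2\ln (2N)+\ln (1/\delta )\bigr )/t^2\) for a failure probability of at most \(\delta \). A small helper makes the dependence explicit (the numerical values passed at the end are placeholders, not quantities from the examples):

```python
import math

def samples_needed(N, t, delta, Upsilon, c, alpha, beta):
    """Smallest M with exp(2 log(2N) - M t^2 / (24 Upsilon^2 (c^2+1) alpha^2 beta^2)) <= delta,
    i.e. the bound for A in Theorem 3 holds with confidence at least 1 - delta."""
    const = 24.0 * Upsilon**2 * (c**2 + 1.0) * alpha**2 * beta**2
    return math.ceil(const * (2.0 * math.log(2.0 * N) + math.log(1.0 / delta)) / t**2)

# Placeholder values: N = 41 dictionary functions, tolerance t = 0.1, confidence 99%.
print(samples_needed(N=41, t=0.1, delta=0.01, Upsilon=1.0, c=1.0, alpha=1.0, beta=1.0))
```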

5 Examples

We now present three examples. The first two are based on numerically sampled trajectory data, while the final example utilizes collected experimental data.

5.1 Arnold’s circle map

For our first example, we revisit the circle map discussed in Example 1, setting \(c=1/5\), \(\rho \) as the uniform distribution on [0, 1], and defining

$$\begin{aligned} f(\pmb {x})=\frac{1}{4\pi }\sin (2\pi \pmb {x}). \end{aligned}$$

Our dictionary consists of Fourier modes \(\{\exp (ij\pmb {x}):j=-n,\ldots ,n\}\) with \(n=20\) (yielding \(N=41\)), and we use batched trajectory data with \(M_1=100\) equally spaced \(\{\pmb {x}^{(j)}\}\), and \(M_2=2\times 10^4\). Figure 2 illustrates the convergence of the matrices \(\tilde{A},\tilde{L}\), and \(\tilde{H}\). We do not display the convergence of \(\tilde{G}\) as its error was on the order of machine precision, a result of the exponential convergence achieved by the trapezoidal quadrature rule across different batches. Figure 3 shows the residuals computed using Algorithm 2. The quantity \(\textrm{res}^{\textrm{var}}(\lambda ,g)\) deviates from (18) (the formula for \(f=0\)), particularly when \(|\lambda |\) is small. As n increases, the residuals \(\textrm{res}(\lambda ,g)\) converge to zero, indicating more accurate computation of the spectral content of \(\mathscr {K}_{(1)}\). However, the residuals \(\textrm{res}^{\textrm{var}}(\lambda ,g)\) converge to finite positive values, except for the trivial eigenvalue 1, which satisfies \(\lim _{M\rightarrow \infty }\textrm{res}^{\textrm{var}}(\lambda ,g)=0\).
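As an illustration of how the dictionary and the batched estimates above can be assembled, consider the following sketch. It is not the code behind Figs. 2–5: the Fourier modes are written here as \(e^{2\pi ij\pmb {x}}\) on [0, 1), and the placeholder map F (including the additive way the noise enters it) is an assumption standing in for the precise map of Example 1.

```python
import numpy as np

c, n = 0.2, 20
js = np.arange(-n, n + 1)                  # Fourier modes j = -n, ..., n  (N = 41)

def f(x):
    return np.sin(2 * np.pi * x) / (4 * np.pi)

def F(x, tau):
    # Hypothetical stand-in for the stochastic circle map of Example 1:
    # the additive form of the noise below is an assumption, not the paper's map.
    return (x + c * tau + f(x)) % 1.0

def psi(x):
    return np.exp(2j * np.pi * np.outer(x, js))   # (len(x), N) dictionary matrix

M1, M2 = 100, 20_000
x = (np.arange(M1) + 0.5) / M1             # M1 equally spaced states on [0, 1)
taus = np.random.default_rng(0).random(M2) # M2 i.i.d. samples of tau ~ Uniform[0, 1]

PsiX = psi(x)
G_tilde = PsiX.conj().T @ PsiX / M1        # trapezoidal rule on the equispaced grid
A_tilde = np.zeros((2 * n + 1, 2 * n + 1), dtype=complex)
for tau in taus:                           # average over the noise batch for every state
    A_tilde += PsiX.conj().T @ psi(F(x, tau)) / (M1 * M2)

K = np.linalg.solve(G_tilde, A_tilde)      # EDMD matrix approximating the action of K_(1)
eigvals = np.linalg.eigvals(K)
```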

To underscore the significance of variance in our analysis, Fig. 4 displays the absolute value of the matrix \(\tilde{L}-\tilde{H}\), which approximates the covariance matrix defined in (16). Notably, the covariance vanishes for the constant function \(\exp (ij\pmb {x})\) with \(j=0\), and the matrix is diagonally dominant. Figure 5 presents the results obtained from applying Algorithms 3 and 4. These results align in areas where the variance is minimal (large \(|\lambda |\)). However, in regions where \(|\lambda |\) is small, the variance component in (27) becomes significant. This observation suggests that only about seven eigenpairs are meaningful in a statistically coherent sense.

Fig. 2

Estimation error for the matrices \(\tilde{A},\tilde{L}\) and \(\tilde{H}\) for the circle map. The solid line shows the expected Monte-Carlo convergence rate

Fig. 3

Residuals for the circle map computed using Algorithm 2

Fig. 4

Absolute values of the matrix \(\tilde{L}-\tilde{H}\) for the circle map. This difference corresponds to the covariance matrix in (16)

Fig. 5

Pseudospectra versus variance pseudospectra. Left: Output of Algorithm 3 for the circle map. Right: Output of Algorithm 4 for the circle map. We have shown the minimized residuals over a contour plot of \(\epsilon \) in both cases. The red dots correspond to the EDMD eigenvalues

5.2 Stochastic Van der Pol oscillator

We now consider the stochastic differential equation

$$\begin{aligned} \textrm{d} X_1&= X_2 \textrm{d}t\\ \textrm{d}X_2&= \left[ \mu (1-X_1^2)X_2-X_1\right] \textrm{d}t +\sqrt{2\delta }\textrm{d} B_t, \end{aligned}$$

where \(B_t\) denotes standard one-dimensional Brownian motion, \(\delta >0\), and \(\mu >0\).Footnote 7 This equation represents a noisy version of the Van der Pol oscillator. In the absence of noise, the Van der Pol oscillator exhibits a limit cycle to which all initial conditions converge, except for the unstable fixed point at the origin. The introduction of noise transforms the system, resulting in a global attractor that forms a band around the deterministic system’s limit cycle.

Fig. 6

Pseudospectra versus variance pseudospectra. Left: Output of Algorithm 3 for the stochastic Van der Pol oscillator. Right: Output of Algorithm 4 for the stochastic Van der Pol oscillator. We have shown the minimized residuals over a contour plot of \(\epsilon \) in both cases. The red dots correspond to the EDMD eigenvalues

Table 1 Computed eigenvalues of the stochastic Van der Pol oscillator, and the residuals computed using Algorithm 2. We have ordered them according to perturbations of \(\hat{\lambda }_{m,k}\). Due to conjugate symmetry, we have only shown eigenvalues with non-negative imaginary parts

The generator of the stochastic solutions, known as the backward Kolmogorov operator, is described in [25, Section 9.3]. It is a second-order elliptic type differential operator \(\mathscr {L}\), defined by

$$\begin{aligned} {[}\mathscr {L}g](X_1,X_2)&= \begin{pmatrix} X_2\\ \mu (1-X_1^2)X_2-X_1 \end{pmatrix} \cdot \nabla g(X_1,X_2)\\&\quad +\delta \frac{\partial ^2 g}{\partial X_2^2}(X_1,X_2). \end{aligned}$$

For a discrete time step \(\Delta _t\), the Koopman operator is given by \(\exp (\Delta _t \mathscr {L})\). In the absence of noise (\(\delta =0\)), the Koopman operator has eigenvalues forming a lattice [53, Theorem 13]:

$$\begin{aligned} \left\{ \hat{\lambda }_{m,k}=\exp ([-m\mu + ik\omega _0]\Delta _t):k\in \mathbb {Z},m\in \mathbb {N}\cup \{0\}\right\} , \end{aligned}$$

where \(\omega _0\approx 1-\mu ^2/16\) is the base frequency of the limit cycle [74]. When \(\delta \) is moderate, the base frequency of the averaged limit cycle remains similar to that in the deterministic case [45].
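For later comparison with the computed eigenvalues, this reference lattice is straightforward to tabulate. A minimal sketch with the parameter values used below (\(\mu =0.5\), \(\Delta _t=0.3\)) and a truncated range of m and k, chosen here only for illustration:

```python
import numpy as np

mu, Dt = 0.5, 0.3
omega0 = 1.0 - mu**2 / 16.0                          # base frequency of the limit cycle
m, k = np.meshgrid(np.arange(0, 4), np.arange(-4, 5))
lattice = np.exp((-m * mu + 1j * k * omega0) * Dt)   # reference eigenvalues lambda_{m,k}
```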

We simulate the dynamics using the Euler–Maruyama method [65] with a time step of \(3\times 10^{-3}\). Data are collected along a single trajectory of length \(M_1=10^6\) with \(M_2=2\), starting the sampling after the trajectory reaches the global attractor. We employ 318 Laplacian radial basis functions with centers on the attractor as our dictionary. The parameters are set to \(\mu =0.5\), \(\delta =0.02\), and \(\Delta _t=0.3\).
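A minimal sketch of this data collection is given below. A shorter trajectory and a fixed seed are used so the snippet runs quickly, and since the role of the \(M_2=2\) realizations per state is not restated in this section, the sketch simply records consecutive snapshot pairs along the trajectory.

```python
import numpy as np

mu, delta = 0.5, 0.02
dt, Dt = 3e-3, 0.3
steps_per_snapshot = round(Dt / dt)        # 100 Euler-Maruyama steps per Koopman step
M1 = 10_000                                # the example uses 10^6 snapshots; fewer here
rng = np.random.default_rng(0)

def em_step(x1, x2):
    # One Euler-Maruyama step of dX1 = X2 dt, dX2 = [mu(1-X1^2)X2 - X1] dt + sqrt(2 delta) dB.
    dB = rng.normal(scale=np.sqrt(dt))
    return x1 + x2 * dt, x2 + (mu * (1.0 - x1**2) * x2 - x1) * dt + np.sqrt(2.0 * delta) * dB

x1, x2 = 2.0, 0.0
for _ in range(50 * steps_per_snapshot):   # burn-in so sampling starts near the attractor
    x1, x2 = em_step(x1, x2)

snapshots = np.empty((M1 + 1, 2))
snapshots[0] = x1, x2
for m in range(1, M1 + 1):
    for _ in range(steps_per_snapshot):
        x1, x2 = em_step(x1, x2)
    snapshots[m] = x1, x2
# Consecutive rows of `snapshots` are pairs separated by Dt and feed the
# estimators of A, G, and L from Theorem 3.
```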

Figure 6 displays the results obtained using Algorithms 3 and 4. Similar to observations from the circle map example, \(\textrm{Sp}_\epsilon (\mathscr {K}_{(1)})\) and \(\textrm{Sp}_\epsilon ^\textrm{var}(\mathscr {K}_{(1)})\) exhibit greater similarity near the unit circle. The lattice-like structure in the eigenvalues is also evident, with the EDMD-computed eigenvalues appearing as perturbations of the set \(\{\hat{\lambda }_{m,k}\}\). Table 1 lists some of these eigenvalues alongside the residuals calculated using Algorithm 2. We observe that as |k| increases, \(\textrm{res}(\lambda ,g)\) also increases, and similarly, \(\textrm{res}^{\textrm{var}}(\lambda ,g)\) increases with m. For any given eigenvalue, \(\textrm{res}(\lambda ,g)\) decreases to zero with larger dictionaries. In contrast, \(\textrm{res}^{\textrm{var}}(\lambda ,g)\) approaches a finite nonzero value, except for the trivial eigenvalue, which has a constant eigenfunction exhibiting zero variance. Figure 7 illustrates the corresponding eigenfunctions on the attractor, showcasing their beautiful modal structure.

In this example, the norm of the Koopman operator \(\Vert \mathscr {K}\Vert \) is approximately 1, and the subspace error \(\delta _n(g)\) predominantly contributes to the bound established in Theorem 2. We analyze the two observables \(X_1\) and \(X_2\), each starting from a point randomly selected on the attractor. Figure 8 presents the calculated values of \(\delta _n(X_1)\) and \(\delta _n(X_2)\) as per (29) and (30), along with the variance of the trajectory. Additionally, Fig. 9 compares the values computed using \(K^nX_i\) with the actual values of \(\mathscr {K}^nX_i\), obtained by integrating the generator \(\mathscr {L}\). Together, these figures demonstrate the convergence of the mean trajectories toward the dominant subspace of \(\mathscr {K}\).

5.3 Neuronal population dynamics

As a final example, we apply our approach to experimental neuroscience data. Recent technological advancements in this field now allow for the simultaneous monitoring of large neuronal populations in the brains of awake, behaving animals. This development has spurred significant interest in employing data-driven methods to derive physically meaningful insights from high-dimensional neural measurements [62].

To analyze complex neural data, researchers have employed a variety of analytical tools to uncover features like low-dimensional manifolds, latent population dynamics, within-trial variance, and trial-to-trial variability. However, existing methods often examine these features in isolation [16, 29, 61, 73]. From a dynamical systems perspective, a unified model that captures these distinct aspects of neural data would be highly advantageous. In this context, the Koopman operator framework offers a compelling approach to analyzing high-dimensional neural observables [47]. DMD has emerged as a prominent method for the spatiotemporal decomposition of diverse datasets [9, 14]. Nevertheless, a limitation of DMD is its lack of explicit uncertainty quantification regarding the modes and forecasts it uncovers. This aspect is particularly vital in neural time series analysis, where it is challenging to identify physically meaningful spectral components [28].

Our framework offers a unified, data-driven solution to uncover validated latent dynamical modes and their associated variance in neural data. To demonstrate its efficacy, we applied it to high-dimensional neuronal recordings from the visual cortex of awake mice, as publicly shared by the Allen Brain Observatory [71], involving 400–800 neurons per mouse. Our focus was on the “Drifting Gratings” task epoch, wherein mice were presented with gratings drifting in one of eight directions (0\(^{\circ }\), 45\(^{\circ }\), etc.), modulated sinusoidally at one of five temporal frequencies. We specifically analyzed responses to gratings modulated at 15 Hz across all eight directions, as these stimuli consistently elicited an identifiable eigenvalue in the neural data corresponding to the expected frequency. This analysis encompassed 120 trials per mouse (stimulus duration of 2 s) for a total of 20 mice, as detailed in [71]. We computed distinct stochastic Koopman operators for 15 different arousal levels, categorized by the average pupil diameter measured during the 500ms before each stimulus [49]. For this analysis, DMD was employed to identify 100 dictionary functions.

Fig. 7

Computed eigenfunctions (real part shown) of the stochastic Van der Pol oscillator. Due to conjugate symmetry, we have only shown eigenfunctions corresponding to eigenvalues with non-negative imaginary parts

Fig. 8

Left: Subspace errors \(\delta _n(X_1)\) and \(\delta _n(X_2)\) for the stochastic Van der Pol oscillator, computed using (29) and (30). Right: Variance of trajectory. We have rescaled the horizontal axis in both plots to correspond to time

Fig. 9

Comparison of computed \(K^nX_i\), where \(K\in \mathbb {C}^{N\times N}\) is the EDMD matrix, and the true values of \(\mathscr {K}^nX_i\)

Our data-driven approach was effective in identifying an isolated, population-level coherent mode at the stimulus frequency. As illustrated in Fig. 10, this is evidenced by a distinct eigenvalue, highlighted in green, which consistently appears as a clear local minimum in the variance pseudospectra contour plots across various arousal states. Without the variance pseudospectra, discerning which DMD eigenvalues are reliable and indicative of coherence can be challenging. We observed that individual neurons displayed a variety of waveforms, all linked to this single linear dynamic mode. Demonstrating the diversity of these responses, Fig. 11 showcases five randomly chosen sample trajectories from the KMD. These trajectories highlight the distinct spike counts and/or timings of different neurons, all parsimoniously represented by a single latent mode.

Fig. 10

Variance pseudospectra for a single mouse in the neuronal population dynamics example. Each case corresponds to a pupil diameter of \(8\%\) (left), \(28\%\) (middle), and \(43\%\) (right). The identified mode is shown in green, and the red dots show the other DMD eigenvalues. The variance pseudospectra change considerably as the arousal state changes, but the green eigenvalue shows little variability

Importantly, neuronal responses demonstrate significant trial-to-trial variability, a phenomenon of considerable physiological interest due to its close relationship with ongoing fluctuations in an animal’s internal state. Dynamical systems approaches are adept at modeling this type of variability, which often stems from changes in the neural population’s pre-stimulus state [61]. Furthermore, the extent of this variability is heavily influenced by internal states like arousal and attention, as detailed in [50]. Our stochastic modeling approach enables us to additionally estimate this second source of trial-to-trial variability in neuronal responses.

To validate the physiological significance of our variance estimates, we analyzed the variance linked to the Koopman operators computed across each of 15 levels of pupil diameter, effectively using pupil diameter as a parameter for the Koopman operator in relation to arousal. Our hypothesis was that this analysis would reflect the well-known “U-shape” pattern described by the Yerkes–Dodson law [86], with variance minimized at intermediate arousal levels [49]. Figure 10 indicates that the identified eigenvalue (i.e., the expectation component) remains consistent across various arousal states. However, Fig. 12 reveals a notable modulation of the variance residuals with arousal level, in line with our prediction: the variance associated with the leading mode is specifically reduced at intermediate arousal levels. This pattern underscores the physiological relevance of the variance estimates yielded by our modeling approach. Consequently, our findings suggest that arousal systematically influences dynamical variance, providing both practical and physiological rationales for employing dynamical models that explicitly estimate variance. Overall, our data-driven framework offers a unified and formal representation of neural dynamics, parsimoniously capturing multiple physiologically significant features in the data.

Fig. 11

Randomly selected sample trajectories from the Koopman mode corresponding to the eigenvalue shown in green in Fig. 10. The gray reference region on the left shows the wavelength predicted by the eigenvalue

Fig. 12

The variance relative squared residual as a function of the arousal state. The red lines show the average across the mice, and the green error bounds correspond to the standard error of the mean. The “U-shape” is characteristic of the so-called Yerkes–Dodson law, which we produce in a data-driven fashion from the dynamics

6 Conclusion

We have demonstrated the role of variance in the Koopman analysis of stochastic dynamical systems. To effectively study projection errors in data-driven approaches for these systems, it is crucial to move beyond expectations and study more than just the stochastic Koopman operator. Incorporating variance into the Koopman framework enhances our understanding of spectral properties and the related projection errors. By analyzing various types of residuals, we have developed data-driven algorithms capable of computing the spectral properties of infinite-dimensional stochastic Koopman operators. Furthermore, we introduced the concept of variance pseudospectra, a tool designed to assess statistical coherency. From a computational perspective, our work includes several convergence theorems pertinent to the spectral properties of these operators. In the realm of experimental neural recordings, our framework has proven effective in extracting and compactly representing multiple data features with known physiological significance.

There are several avenues of future work related to this paper. One such direction involves an analysis of the algorithms and theorems presented in Sect. 4 in scenarios involving noisy snapshot data. Another avenue explores the trade-offs between computing the squared residual and variance terms, as outlined in (15), potentially reflecting variance-bias trade-offs in statistical analysis. Additionally, we aim to assess the robustness and generalizability of the proposed framework across further stochastic dynamical systems.