1 Introduction

The Koopman framework (Koopman 1931) is the operator-theoretic basis for a wide range of data-driven methodologies to predict the evolution of nonlinear dynamical systems using linear techniques, see, e.g., Mezić (2005), Rowley et al. (2009) or the recent survey (Brunton et al. 2022) and the references therein. The underlying concept is that observables, which may also be understood as outputs from the systems-and-control perspective, can be propagated forward in time using the linear yet infinite-dimensional Koopman operator or its generator, instead of simulating the nonlinear system and evaluating the observable functions. Its recent success is closely linked to numerically tractable approximation techniques like extended Dynamic Mode Decomposition (eDMD), see, e.g., Williams et al. (2015), Klus et al. (2016), Korda and Mezić (2018b), Klus et al. (2020) for numerical techniques and convergence results.

While the Koopman framework is well established, approximation results are typically available only in the infinite-data limit, i.e., as the number of data points tends to infinity. Recently, Lu and Tartakovsky (2020) discussed error bounds w.r.t. DMD invoking the seminal work of Korda and Mezić (2018b). While the authors numerically demonstrate the effectiveness of their approach even for nonlinear parabolic Partial Differential Equations (PDEs), see also their extension (Lu and Tartakovsky 2021), there remains a significant gap from a more theoretical point of view since the approximation error is assumed to be zero for finite data, see Lu and Tartakovsky (2020, Remark 3.1). Mamakoukas et al. (2021) mimic a Taylor-series expansion based on a particular set of observables to approximate the system dynamics of an Ordinary Differential Equation (ODE). This work may be understood as a promising approach to incorporate (local) knowledge on the system dynamics in the Koopman framework. However, a bound on the prediction error in terms of data is not deduced. Error bounds for Koopman eigenvalues in terms of the finite-data estimation error were derived in Webber et al. (2021), but the estimation error itself was not quantified. In Mollenhauer et al. (2020), concentration inequalities were applied to bound the estimation error for the covariance and cross-covariance operators involved in Koopman estimation. In the exhaustive preprint (Kurdila and Bobade 2018), the authors treat the projection error for different approximation spaces such as, e.g., reproducing kernel Hilbert spaces and wavelets. The estimation error is also discussed briefly in Sect. 8.5 therein. In Zhang and Zuazua (2021), besides providing a finite-data error bound on the approximation of the Koopman operator in the context of ODEs, the authors estimate the projection error by means of finite-element analysis.
In conclusion, to the best of our knowledge, Zhang and Zuazua (2021) and Kurdila and Bobade (2018) are the only works providing rigorous error bounds for Koopman-based approximations of a dynamical system governed by a nonlinear ODE.

In this paper, we rigorously derive probabilistic bounds on the approximation error (or finite-data estimation error) and the (multi-step) prediction error for nonlinear Stochastic Differential Equations (SDEs). This, of course, also includes nonlinear ODEs. The deduced bounds on the approximation error and prediction accuracy explicitly depend on the number of data points used in eDMD. To this end, besides using mass concentration inequalities and a numerical error analysis to deal with the error propagation in time, we employ substantially different techniques in comparison to Kurdila and Bobade (2018), Zhang and Zuazua (2021) to provide an additional alternative assumption based on ergodic sampling tailored to stationary SDEs. Our results in this setting focus on the concept of asymptotic variance, see Lelièvre and Stoltz (2016) and the references therein. In contrast to most concentration inequalities, the asymptotic variance is a genuinely dynamical quantity. Even though it cannot be directly accessed for most complex systems, it provides a solid basis for further theoretical study of the estimation error. For instance, it was shown in Duncan et al. (2016) that a spectral analysis of the generator can be used to speed up convergence to equilibrium and, by extension, the convergence of empirical estimators. In this study, we use a simple Ornstein–Uhlenbeck process to illustrate our error bounds in practice and show that they are surprisingly sharp. This serves as additional motivation to continue the study of the sampling error for ergodic sampling by means of asymptotic variances. Let us stress that we do not fully analyze the projection error; in other words, how much the Koopman generator fails to be invariant on the approximation subspace, referring to the existing literature Kurdila and Bobade (2018) and Zhang and Zuazua (2021) in the autonomous case and our follow-up work (Schaller et al. 2022) in the control setting.

W.r.t. the application of Koopman theory in control, a lot of research has been invested over the past years, beginning with the popular DMD with control (Proctor et al. 2016), which was later used in Model Predictive Control (MPC) (Korda and Mezić 2018a). Another popular method is to use a coordinate transformation into Koopman eigenfunctions (Kaiser et al. 2021) or the already mentioned component-wise Taylor series expansion (Mamakoukas et al. 2021). In Lu et al. (2020), the prediction error of the method proposed in Proctor et al. (2016) was estimated using the convergence result of Korda and Mezić (2018b). However, the result is of purely asymptotic nature, i.e., it does not state a convergence rate in terms of data points. All approaches mentioned until now yield linear surrogate models of the form \(Ax+Bu\), i.e., the control enters linearly. For general control-affine systems, numerical simulation studies indicate that bilinear surrogate models are better suited, see Goswami and Paley (2017), Peitz et al. (2020), Bruder et al. (2021), Peitz and Bieker (2021). The technique proposed in Peitz and Klus (2019); Peitz et al. (2020) constructs its surrogate model from \(n_c+1\) autonomous Koopman operators, where \(n_c\) is the control dimension. The key feature is that the state-space dimension is not augmented by the number of control inputs, which counteracts the curse of dimensionality in comparison with the more widespread approach introduced in Korda and Mezić (2018a). Compared to Peitz et al. (2020), we present a detailed analysis of the accuracy regarding both the dictionary size and the amount of training data. Even though the bound is rather coarse on the operator level, we demonstrate that it correctly captures the qualitative behavior. In this context, we provide a probabilistic bound on the approximation error of the projected Koopman generator, the projected Koopman semigroup and the respective trajectories. 
To this end, we extend our results toward nonlinear control systems. Besides a rigorous bound on the approximation error, we present estimates on the (auto-regressive) prediction accuracy, i.e., in an open-loop prediction (without feedback). For control systems, we also refer to the follow-up work (Schaller et al. 2022), where we obtained the following two extensions: On the one hand, we deduced quantitative estimates of the projection error depending on the (finite) dictionary size. Combining this with the error bounds depending on the (finite) number of data points proposed in this work yields a complete analysis of the approximation error. On the other hand, we further elaborated the estimates such that the error bounds uniformly hold for a set of admissible control functions rendering the approach applicable for optimal and predictive control.

The paper is structured as follows. Firstly, in Sect. 2, we deduce a rigorous bound on the approximation error for nonlinear SDEs. Then, we extend our analysis to nonlinear control-affine systems in Sect. 3. In Sect. 4, two numerical simulation studies for the Ornstein–Uhlenbeck system (SDE) and the controlled Duffing equation (nonlinear control-affine system) are presented before conclusions are drawn in Sect. 5.

2 Finite-Data Bounds on the Approximation Error: Nonlinear SDEs

In this section, we analyze the approximation quality of extended Dynamic Mode Decomposition (eDMD) with finitely many data points for the finite-dimensional stochastic differential equation

$$\begin{aligned} \text {d}X_t = F(X_t)\,\text {d}t + \sigma (X_t) \,\text {d}W_t, \end{aligned}\tag{SDE}$$

where \(X_t \in {\mathbb {X}}\subset {\mathbb {R}}^d\) is the state, \(F : {\mathbb {X}} \rightarrow {\mathbb {R}}^d\) is the drift vector field, \(\sigma : {\mathbb {X}} \rightarrow {\mathbb {R}}^{d\times d}\) is the diffusion matrix field, and \(W_t\) is a d-dimensional Brownian motion. We assume that \(F, \, \sigma \) satisfy standard Lipschitz properties to ensure global existence of solutions to (SDE), see the textbook (Oksendal 2013) for an introduction to this class of systems. We stress that the deterministic case is included by simply setting \(\sigma \equiv 0\), leading to the ordinary differential equation

$$\begin{aligned} \tfrac{\text {d}}{\text {d}t} x(t)= F(x(t)). \end{aligned}$$

The state space is assumed to be a measure space \(({\mathbb {X}}, \Sigma _{{{\mathbb {X}}}}, \mu )\) with Borel \(\sigma \)-algebra \(\Sigma _{{{\mathbb {X}}}}\) and probability measure \(\mu \). In case of an ODE, the set \({\mathbb {X}}\) is often assumed to be compact and forward-invariant and the probability measure is the normalized Lebesgue measure, cf. Zhang and Zuazua (2021).

Definition 1

(Koopman operator) Let \(X_t\) satisfy (SDE) for \(t \ge 0\). The Koopman operator semigroup associated with (SDE) is defined by

$$\begin{aligned} {\mathcal {K}}^t f(x_0) = {\mathbb {E}}^{x_0}[f(X_t)] = {\mathbb {E}}[f(X_t)\,|\,X_0 = x_0] \end{aligned}$$

for all bounded measurable functions f.
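
To make Definition 1 concrete, the expectation can be estimated by Monte Carlo simulation. The following is a minimal sketch under stated assumptions, not part of the analysis in this paper: we assume a scalar Ornstein–Uhlenbeck process with hypothetical parameters `alpha` and `sigma`, we discretize (SDE) with the Euler–Maruyama scheme, and the helper name `koopman_mc` is ours. For this process \(X_t\) is Gaussian, so a closed-form value for the bounded observable \(f = \cos \) is available for comparison.

```python
import numpy as np

def koopman_mc(f, x0, t, drift, diffusion, n_paths=20_000, n_steps=100, seed=0):
    """Monte Carlo estimate of (K^t f)(x0) = E[f(X_t) | X_0 = x0] for a scalar SDE.

    Paths are generated with the Euler-Maruyama scheme (an assumption of this
    sketch), so the estimate carries both a sampling and a discretization error.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        x = x + drift(x) * dt + diffusion(x) * dw
    return f(x).mean()

# Hypothetical test case: dX = -alpha X dt + sigma dW, observable f = cos.
alpha, sigma, x0, t = 1.0, 1.0, 1.0, 1.0
estimate = koopman_mc(
    np.cos, x0, t,
    drift=lambda x: -alpha * x,
    diffusion=lambda x: sigma * np.ones_like(x),
)

# X_t given X_0 = x0 is Gaussian with the mean and variance below,
# and E[cos(Z)] = cos(mean) * exp(-var / 2) for Gaussian Z.
mean_t = np.exp(-alpha * t) * x0
var_t = sigma**2 / (2 * alpha) * (1 - np.exp(-2 * alpha * t))
exact = np.cos(mean_t) * np.exp(-var_t / 2)
print(f"Monte Carlo: {estimate:.4f}, closed form: {exact:.4f}")
```

The two values should agree up to the Monte Carlo and time-stepping error, both of which shrink as `n_paths` and `n_steps` grow.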

In case of ergodic sampling, that is, obtaining data points from a single long trajectory, we will assume invariance of the measure \(\mu \) w.r.t. the stochastic process \(X_t\).

Definition 2

(Invariant measure with positive density) A probability measure \(\mu \) is called invariant if it satisfies

$$\begin{aligned} \int _{{{\mathbb {X}}}} {\mathcal {K}}^t f \,\mathrm {d}\mu = \int _{{{\mathbb {X}}}} f \,\mathrm {d}\mu \end{aligned}$$

for all bounded measurable functions f and all \(t\ge 0\). Further, \(\mu \) has an everywhere positive density \(\rho :{{\mathbb {X}}}\rightarrow {\mathbb {R}}\) if \(\mu (A) = \int _A \rho (x) \,\mathrm {d}x\) holds for all \(A\in \Sigma _{{{\mathbb {X}}}}\).

We can now formulate our assumption on the underlying dynamics.

Assumption 3

Let either of the following hold:


(a) The set \({\mathbb {X}}\) is compact and forward invariant \((\forall \,x_0 \in {\mathbb {X}}: {\mathbb {P}}^{x_0}(X_t \in {\mathbb {X}})=1\) for all \(t \ge 0)\) and \(\mu \) is the normalized Lebesgue measure. Moreover, the Koopman operator can be extended to a strongly continuous semigroup on the Hilbert space \(L^2_\mu ({\mathbb {X}})\).

(b) The probability measure \(\mu \) is an invariant measure in the sense of Definition 2.

We briefly comment on this assumption and first note that forward invariance of \({\mathbb {X}}\) can be weakened, if one is only interested in estimates for states contained in \({\mathbb {X}}\), see also Zhang and Zuazua (2021, Section 3.2). Moreover, if the dynamics obey an ODE, it was shown that the Koopman operator can indeed be extended to a strongly continuous semigroup on \(L^2_\mu ({{\mathbb {X}}})\), see also Zhang and Zuazua (2021). Second, the assumption of invariance of the underlying probability measure is satisfied for a broad class of SDEs, see, e.g., Risken (1996). It can be checked that \(\mu \) is then invariant for \(X_t\), that is, \({\mathbb {P}}(X_t \in A) = \mu (A)\) holds for all \(A\in \Sigma _{{{\mathbb {X}}}}\) and \(t \ge 0\), provided \(X_0\) is distributed according to \(\mu \). Under Assumption 3(b), Definition 1 can be extended to the Lebesgue spaces \(L^p_\mu ({{\mathbb {X}}})\), \(1 \le p < \infty \), i.e., the Banach spaces of all (equivalence classes of) measurable functions \(f:{\mathbb {X}}\rightarrow {\mathbb {R}}\) with \(\int _{\mathbb {X}}|f|^p \,\text {d}\mu < \infty \). Then, the Koopman operators \({\mathcal {K}}^t\) form a strongly continuous semigroup of contractions on all spaces \(L^p_\mu ({{\mathbb {X}}})\), see Bakry et al. (2013). The functions in any of these spaces are often referred to as observables.

Next, we recall the definition of the generator associated with the semigroup \(({\mathcal {K}}^t)_{t \ge 0}\):

Definition 4

(Koopman generator) The infinitesimal generator \({\mathcal {L}}\) is defined via

$$\begin{aligned} {\mathcal {L}}f := \lim _{t\rightarrow 0} \frac{({\mathcal {K}}^t - {\text {Id}})f}{t} \end{aligned}\tag{1}$$

for all \(f \in D({\mathcal {L}})\), where \(D({\mathcal {L}})\) is the set of functions for which limit (1) exists in the appropriate topology.

For sufficiently smooth functions f, Ito’s lemma (Oksendal 2013) shows that the generator acts as a second-order differential operator, defined in terms of the coefficients of (SDE), i.e.,

$$\begin{aligned} {\mathcal {L}} = F \cdot \nabla + \tfrac{1}{2} \sigma \sigma ^\top : \nabla ^2 \end{aligned}\tag{2}$$

with \(A: B := \sum _{i,j}a_{i,j}b_{i,j}\) being the standard Frobenius inner product for matrices. In what follows, we will focus exclusively on the Koopman semigroup on the Hilbert space \(L^2_\mu ({{\mathbb {X}}})\) with inner product \(\langle f, g \rangle _\mu = \int _{{{\mathbb {X}}}} f g \, \mathrm {d}\mu \). As the semigroup is strongly continuous on \(L^2_\mu ({{\mathbb {X}}})\) by our assumptions, by standard semigroup theory, the domain \(D({\mathcal {L}})\) together with the graph norm forms a dense Banach space in \(L^2_\mu ({{\mathbb {X}}})\).
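
As an illustration of the differential-operator form of the generator, the sketch below applies \({\mathcal {L}}\) to polynomial observables in one space dimension. It rests on assumptions that are ours: a scalar Ornstein–Uhlenbeck process with hypothetical parameters `alpha` and `sigma`, monomial observables, and the helper name `generator_1d`. For this example one expects \({\mathcal {L}}1 = 0\), \({\mathcal {L}}x = -\alpha x\) and \({\mathcal {L}}x^2 = \sigma ^2 - 2\alpha x^2\).

```python
import numpy as np
from numpy.polynomial import Polynomial

def generator_1d(f, drift, diffusion_sq):
    """Apply L = F d/dx + (1/2) sigma^2 d^2/dx^2 to a polynomial observable f.

    All three arguments are numpy Polynomial objects; diffusion_sq is sigma(x)^2.
    """
    return drift * f.deriv(1) + 0.5 * diffusion_sq * f.deriv(2)

# Hypothetical example: dX = -alpha X dt + sigma dW.
alpha, sigma = 1.0, 1.0
drift = Polynomial([0.0, -alpha])        # F(x) = -alpha * x
diffusion_sq = Polynomial([sigma**2])    # sigma(x)^2 is constant

# Monomials 1, x, x^2 (coefficients are listed from lowest to highest degree).
monomials = [Polynomial([1.0]), Polynomial([0.0, 1.0]), Polynomial([0.0, 0.0, 1.0])]
images = [generator_1d(p, drift, diffusion_sq) for p in monomials]

for p, Lp in zip(monomials, images):
    print("coefficients", p.coef, "are mapped to", Lp.coef)
```

In this example the span of the three monomials is mapped into itself, which is the invariance property discussed for the Galerkin projection in Sect. 2.1.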

2.1 Extended Dynamic Mode Decomposition

In this part we introduce the data-driven finite-dimensional approximation by eDMD of the Koopman generator defined in (1), see, e.g., Williams et al. (2015), Klus et al. (2016), Klus et al. (2018). To this end, for a fixed set of linearly independent observables \(\psi _1,\ldots ,\psi _N\in D({\mathcal {L}})\), we consider the finite-dimensional subspace

$$\begin{aligned} {\mathbb {V}} := {\text {span}}\{\{\psi _j\}_{j=1}^N\} \subset D({\mathcal {L}}). \end{aligned}$$

Let \(P_{\mathbb {V}}\) denote the orthogonal projection onto \({\mathbb {V}}\). We define the Galerkin projection of the Koopman generator by \({\mathcal {L}}_{\mathbb {V}} := P_{\mathbb {V}} {\mathcal {L}}|_{\mathbb {V}}\). Note that this is not the restriction of \({\mathcal {L}}\) onto \({\mathbb {V}}\), as the image is also projected back onto \({\mathbb {V}}\). If \({\mathbb {V}}\) is an invariant set under the action of the generator, then \({\mathcal {L}}_{\mathbb {V}} = {\mathcal {L}}|_{\mathbb {V}}\) holds. As \(\dim {\mathbb {V}} = N\), the linear operator \({\mathcal {L}}_{\mathbb {V}}: {\mathbb {V}}\rightarrow {\mathbb {V}}\) may be represented by a matrix. In what follows, we denote the matrix representation of \({\mathcal {L}}_{{\mathbb {V}}}\) in terms of the basis functions \(\psi _1,\ldots ,\psi _N\) by the same symbol \({\mathcal {L}}_{{\mathbb {V}}}\) as the operator itself in a slight abuse of notation. Thus, using Klus et al. (2020), we get

$$\begin{aligned} {\mathcal {L}}_{\mathbb {V}}= C^{-1}A \end{aligned}$$

with \(C, A \in {\mathbb {R}}^{N\times N}\) defined by \(C_{i,j}=\langle \psi _i,\psi _j\rangle _{L_\mu ^2({\mathbb {X}})}\) and \(A_{i,j} =\langle \psi _i,{\mathcal {L}}\psi _j\rangle _{L_\mu ^2({\mathbb {X}})}\). The norm of the isomorphism from \({\mathbb {V}}\) to \({\mathbb {R}}^N\) depends on the smallest resp. largest eigenvalues of C, cf. Proposition 21 in “Appendix A.1.”

Consider data points \(x_0, \ldots , x_{m-1} \in {\mathbb {X}}\). In the following, these data are either drawn from a trajectory of an ergodic system or sampled independently and identically distributed (i.i.d.). We state this as the following assumption, using the notation:

$$\begin{aligned} L^2_{\mu ,0}({{\mathbb {X}}}) := \{f \in L^2_\mu ({{\mathbb {X}}}):\, \langle f, 1 \rangle _\mu = 0\}. \end{aligned}$$

Assumption 5

Let Assumption 3 hold and assume either of the following.


(iid) The data are drawn i.i.d. from the measure specified via Assumption 3.

(erg) Assumption 3.(b) holds and the data are obtained as snapshots from a single ergodic trajectory, that is, from a single long trajectory of dynamics (SDE) with \(x_0\) drawn from the unique invariant measure \(\mu \). Further assume the Koopman semigroup is exponentially stable on \(L^2_{\mu ,0}({{\mathbb {X}}})\), i.e., \(\Vert {\mathcal {K}}^t \Vert _{L^2_{\mu , 0}({{\mathbb {X}}})} \le Me^{-\omega t}\) for some \(M\ge 1\), \(\omega > 0\).

The second assumption (erg) is satisfied for a broad class of ergodic SDEs that are considered widely in, for example, statistical physics and molecular simulation. However, it should also be noted that it is not at all universal. For instance, for ergodic ODEs, the Koopman operator is unitary and hence cannot be exponentially stable. In this case one can still resort to i.i.d. sampling.

Let us form the transformed data matrices

$$\begin{aligned} \Psi (X)&:= \left( \left. \left( {\begin{matrix} \psi _1(x_0)\\ \vdots \\ \psi _N(x_0) \end{matrix}}\right) \right| \ldots \left| \left( {\begin{matrix} \psi _1(x_{m-1})\\ \vdots \\ \psi _N(x_{m-1}) \end{matrix}}\right) \right. \right) \\ {\mathcal {L}}\Psi (X)&:= \left( \left. \left( {\begin{matrix} ({\mathcal {L}}\psi _1)(x_0)\\ \vdots \\ ({\mathcal {L}}\psi _N)(x_0) \end{matrix}}\right) \right| \ldots \left| \left( {\begin{matrix} ({\mathcal {L}}\psi _1)(x_{m-1})\\ \vdots \\ ({\mathcal {L}}\psi _N)(x_{m-1}) \end{matrix}}\right) \right. \right) . \end{aligned}$$

The evaluation of \({\mathcal {L}}\) can be realized via representation (2). If the coefficients of dynamics (SDE) are not available in explicit form, they need to be approximated, for example, using Kramers–Moyal expansions. The analysis of this source of error is beyond the scope of this study. Furthermore, there might occur challenges when evaluating the time derivatives of the observables to compute \({\mathcal {L}}\Psi (X)\) if the system is not explicitly given. This is a well-known problem and may be addressed by various numerical differentiation techniques (Brunton et al. 2016; van Breugel et al. 2020). Alternatively, one can resort to the finite-time Koopman operator as performed in Sect. 2.3, which has been observed to provide robust results in various applications, cf., e.g., Peitz et al. (2020); Klus et al. (2022). The empirical estimator for the Galerkin projection \({\mathcal {L}}_{\mathbb {V}}\) is then given by

$$\begin{aligned} \tilde{{\mathcal {L}}}_m= {\tilde{C}}_m^{-1} {\tilde{A}}_m\end{aligned}$$

with \({\tilde{C}}_m= \tfrac{1}{m} \Psi (X) \Psi (X)^\top \), \({\tilde{A}}_m= \tfrac{1}{m} \Psi (X) {\mathcal {L}}\Psi (X)^\top \in {\mathbb {R}}^{N\times N}\). In all scenarios of Assumption 5, we have with probability one that

(1) \(\tilde{{\mathcal {L}}}_m\) is well-defined for large enough m, that is, \({\tilde{C}}_m\) is invertible, and

(2) \(\tilde{{\mathcal {L}}}_m\) converges to \({\mathcal {L}}_{\mathbb {V}}\) for \(m\rightarrow \infty \), see, e.g., Klus et al. (2018, 2020).

For the case of a long trajectory, this result follows from ergodic theory, which is concerned with the convergence of time averages to spatial averages as the data size grows to infinity (Beck and Schwartz 1957). Ergodic theory particularly applies to systems with a unique invariant measure.
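
The construction of the empirical estimator \(\tilde{{\mathcal {L}}}_m= {\tilde{C}}_m^{-1} {\tilde{A}}_m\) can be sketched in a few lines. The example below is only illustrative and relies on assumptions of ours: a scalar Ornstein–Uhlenbeck process with hypothetical parameters `alpha` and `sigma`, the monomial dictionary \(\{1, x, x^2\}\), i.e., \(N = 3\), and i.i.d. samples from the Gaussian invariant measure with variance \(\sigma ^2/(2\alpha )\). For this dictionary, \({\mathcal {L}}\psi _j\) is known in closed form and \({\mathbb {V}}\) is invariant under \({\mathcal {L}}\).

```python
import numpy as np

alpha, sigma, m = 1.0, 1.0, 2_000      # hypothetical parameters and data size
s2 = sigma**2 / (2 * alpha)            # variance of the invariant measure
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(s2), size=m)   # i.i.d. data points x_0, ..., x_{m-1}

# Transformed data matrices Psi(X) and L Psi(X), both of shape (N, m).
Psi = np.vstack([np.ones(m), x, x**2])
LPsi = np.vstack([np.zeros(m), -alpha * x, sigma**2 - 2 * alpha * x**2])

# Empirical mass and stiffness matrices and the eDMD estimator.
C_m = Psi @ Psi.T / m
A_m = Psi @ LPsi.T / m
L_m = np.linalg.solve(C_m, A_m)        # C_m^{-1} A_m without forming the inverse

# Reference quantities: column j of L_exact holds the coefficients of L psi_j
# in the basis (1, x, x^2); C holds the Gaussian moments E[x^(i+j)].
L_exact = np.array([[0.0, 0.0, sigma**2],
                    [0.0, -alpha, 0.0],
                    [0.0, 0.0, -2 * alpha]])
C = np.array([[1.0, 0.0, s2],
              [0.0, s2, 0.0],
              [s2, 0.0, 3 * s2**2]])

print("||C_m - C||_F     =", np.linalg.norm(C_m - C))
print("||L_m - L_exact|| =", np.linalg.norm(L_m - L_exact))
```

Because \({\mathcal {L}}\psi _j\) lies in \({\mathbb {V}}\) pointwise in this particular example, \({\tilde{A}}_m = {\tilde{C}}_m {\mathcal {L}}_{\mathbb {V}}\) holds for every data set and the estimator is exact up to round-off, although \({\tilde{C}}_m\) itself still fluctuates around \(C\). For a dictionary that is not invariant, both matrices contribute to the estimation error.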

2.2 Error Bounds on Approximations of Projected Koopman Generator And Operator

Next, we quantify the approximation quality of the data-driven finite-dimensional approximation of the Koopman generator, i.e., for a given linear space \({\mathbb {V}}\) of observables and data \(x_0,\ldots ,x_{m-1}\in {\mathbb {X}}\), we aim to estimate

$$\begin{aligned} \Vert {\mathcal {L}}_{\mathbb {V}}- \tilde{{\mathcal {L}}}_m\Vert _F = \Vert C^{-1}A - {\tilde{C}}_m^{-1}{\tilde{A}}_m\Vert _F. \end{aligned}$$

2.2.1 Concentration Bounds for Random Matrices

We start by deriving entry-wise error bounds for the data-driven mass and stiffness matrix, respectively. Since most of the arguments are significantly simpler for i.i.d. sampling, cf. Remark 11 at the end of this subsection, we first consider the more involved situation, i.e., ergodic sampling. This is of particular interest as simulation data of dynamics (SDE) can, then, be directly used.

For \(x \in {\mathbb {X}}\), consider a centered scalar random variable

$$\begin{aligned} \phi : \, {\mathbb {X}} \rightarrow {\mathbb {R}}, \quad \int _{\mathbb {X}} \phi (x) \,\mathrm {d}\mu (x) = 0. \end{aligned}$$

We denote its variance with respect to the invariant measure by

$$\begin{aligned} \sigma ^2_\phi = {\mathbb {E}}^\mu [\phi ^2] = \Vert \phi \Vert ^2_{L^2_\mu }.\end{aligned}$$

Moreover, we set \(\phi _k = \phi (x_k)\) for given data points \(x_k\), \(k \in \{0,1,\ldots ,m-1\}\) and define the averaged random variable

$$\begin{aligned} {\bar{\phi }}_m&:= \frac{1}{m}\sum _{k= 0 }^{m-1} \phi _k. \end{aligned}$$

In Lemma 6 below, we quantify the variance of the averaged random variable \({\bar{\phi }}_m\). The key point is the decomposition of the variance into an asymptotic contribution, independent of m, and a second contribution, which decays with an explicitly given (polynomial) dependence on the amount of data m.

Lemma 6

Let Assumption 5.(erg) hold. Then we have

$$\begin{aligned} \sigma ^2_{{\bar{\phi }}_m}&= \frac{1}{m}\left[ \sigma _{\phi , \infty }^2 - R_\phi ^m\right] . \end{aligned}$$

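Here, \(\Delta _t > 0\) denotes the time step between consecutive snapshots \(x_k\) and \(x_{k+1}\) along the trajectory. The asymptotic variance \(\sigma _{\phi , \infty }^2\) and the remainder term \(R_\phi ^m\) are given by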

$$\begin{aligned} \sigma _{\phi , \infty }^2&{=} \sigma _{\phi }^2 + 2 \sum _{l=1}^\infty \langle \phi , \, {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu ,&R_\phi ^m&{=} 2 \sum _{l=m}^\infty \langle \phi , \, {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu {+} \frac{2}{m} \sum _{l=1}^{m-1} l \langle \phi , \, {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu . \end{aligned}$$


We repeat the proof given in Lelièvre and Stoltz (2016, Section 3.1.2) for the sake of illustration:

$$\begin{aligned} \nonumber \sigma ^2_{{\bar{\phi }}_m}&= \frac{1}{m^2} \sum _{k, l = 0}^{m-1} {\mathbb {E}}^\mu \left[ \phi _k \, \phi _l \right] = \frac{1}{m} \sigma ^2_\phi + \frac{2}{m^2} \sum _{k=0}^{m-1} \sum _{l=k+1}^{m-1} {\mathbb {E}}^\mu \left[ \phi _k\, \phi _l \right] \\ \nonumber&= \frac{1}{m} \left[ \sigma ^2_\phi + \frac{2}{m} \sum _{k=0}^{m-1} \sum _{l=k+1}^{m-1} {\mathbb {E}}^\mu \left[ \phi _0\, \phi _{l-k} \right] \right] = \frac{1}{m} \left[ \sigma ^2_\phi + \frac{2}{m} \sum _{k=0}^{m-1}\sum _{l=1}^{m-k-1} {\mathbb {E}}^\mu \left[ \phi _0 \, \phi _{l} \right] \right] \\ \nonumber&{=} \frac{1}{m} \left[ \sigma ^2_\phi + \frac{2}{m}\sum _{l=1}^{m-1} (m -l) {\mathbb {E}}^\mu \left[ \phi _0 \, \phi _{l} \right] \right] {=} \frac{1}{m} \left[ \sigma ^2_\phi {+} 2\sum _{l=1}^{m-1} (1 - \tfrac{l}{m}) \langle \phi ,\, {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu \right] . \end{aligned}$$

The result follows by adding and subtracting the term \(2 \sum _{l=m}^\infty \langle \phi , \, {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu \). \(\square \)

Remark 7

The assumption of exponential stability is satisfied, for example, if the generator \({\mathcal {L}}\) is self-adjoint (also known as detailed balance or reversibility) and additionally satisfies a Poincaré or spectral gap inequality (Lelièvre and Stoltz 2016). The requirement \(\langle f, 1 \rangle _\mu = 0\) is necessary, as the constant function is invariant for \({\mathcal {K}}^t\).

Remark 8

The proof of Lemma 6 shows that \(\sigma ^2_{\phi , \infty } = \lim _{m \rightarrow \infty } m\, \sigma ^2_{{\bar{\phi }}_m} \ge 0\); hence, it can indeed be interpreted as a variance.

For reversible systems, we have \(\langle \phi ,\, {\mathcal {K}}^{l \Delta _t} \phi \rangle _\mu \ge 0\) by symmetry of the Koopman operator. Therefore, \(\sigma ^2_{\phi , \infty } \ge \sigma ^2_\phi > 0\) is guaranteed in this case, and the variance \(\sigma ^2_{{\bar{\phi }}_m}\) approaches \(\frac{1}{m}\sigma ^2_{\phi , \infty }\) from below.

Next, we derive an estimate for the remainder term in terms of the number m of data points.

Lemma 9

Let Assumption 5.(erg) hold, and set \(q = e^{-\omega \Delta _t} < 1\). Then

$$\begin{aligned} |R_\phi ^m | \le \frac{2\sigma ^2_{\phi }}{m}\frac{q}{(1-q)^2}. \end{aligned}$$


Proof

We first observe that, by the Cauchy–Schwarz inequality,

$$\begin{aligned} |\langle \phi , {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu |&\le \Vert {\mathcal {K}}^{l \Delta _t} \Vert _{L(L^2_{\mu , 0}({\mathbb {X}}), L^2_{\mu }({\mathbb {X}}))} \Vert \phi \Vert ^2_{L^2_{\mu }({\mathbb {X}})} \le e^{-\omega \Delta _t l} \sigma ^2_\phi , \end{aligned}$$

and therefore:

$$\begin{aligned} |R_\phi ^m |&\le 2\sigma ^2_{\phi } \left[ \sum _{l=m}^\infty e^{-\omega \Delta _t l} + \frac{1}{m} \sum _{l=1}^{m-1} l e^{-\omega \Delta _t l} \right] \\&= 2\sigma ^2_{\phi } \left[ \frac{q^m}{1-q} + \frac{1}{m} \frac{(m-1)q^{m+1} - mq^m + q }{(1-q)^2} \right] \\&= \frac{2\sigma ^2_{\phi }}{m}\frac{q(1 - q^m)}{(1-q)^2} \le \frac{2\sigma ^2_{\phi }}{m}\frac{q}{(1-q)^2} . \end{aligned}$$

In the second line, we have used the geometric series for the first term and the closed-form expression of the finite sum \(\sum _{l=1}^{m-1} l q^l, \,q < 1\), for the second term. The third line is obtained by direct simplification. \(\square \)
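
The decomposition of Lemma 6 and the bound of Lemma 9 can be checked numerically whenever the autocorrelations are known. The sketch below assumes, as our own example, that \(\langle \phi , {\mathcal {K}}^{l\Delta _t} \phi \rangle _\mu = \sigma ^2_\phi \, q^l\), which holds, e.g., for the observable \(\phi (x) = x\) of a scalar Ornstein–Uhlenbeck process with decay rate \(\alpha = \omega \). The parameter values and the helper name `variance_terms` are hypothetical.

```python
import numpy as np

def variance_terms(s2, q, m):
    """Variance of the time average, asymptotic variance, remainder and its bound.

    Assumes autocorrelations <phi, K^{l dt} phi> = s2 * q**l (a special case).
    """
    l = np.arange(1, m)
    corr = s2 * q**l                                      # lags l = 1, ..., m-1
    var_mean = (s2 + 2 * np.sum((1 - l / m) * corr)) / m  # last line of the proof of Lemma 6
    asym = s2 * (1 + q) / (1 - q)                         # s2 + 2 * sum_{l >= 1} s2 q^l
    tail = 2 * s2 * q**m / (1 - q)                        # 2 * sum_{l >= m} s2 q^l
    remainder = tail + 2 / m * np.sum(l * corr)           # R_phi^m
    bound = 2 * s2 * q / (m * (1 - q) ** 2)               # right-hand side in Lemma 9
    return var_mean, asym, remainder, bound

alpha, sigma, dt, m = 1.0, 1.0, 0.1, 200
s2 = sigma**2 / (2 * alpha)
q = np.exp(-alpha * dt)
var_mean, asym, remainder, bound = variance_terms(s2, q, m)

print("m * variance of the average:", m * var_mean)
print("asymptotic variance minus remainder:", asym - remainder)
print("remainder:", remainder, " bound from Lemma 9:", bound)
```

In this special case the remainder equals the bound up to the factor \(1 - q^m\), so the estimate of Lemma 9 is nearly attained.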

We can now combine the results of Lemmas 6 and 9 in order to obtain a concentration bound for a centered, matrix-valued random variable. To this end, we consider an \(N\times N\) random matrix \(\Phi \) with all entries \(\phi _{ij} \in L^2_{\mu , 0}\) centered. We define \(\Phi _k\) and \({\overline{\Phi }}_m\) as for the scalar case, i.e., \(\Phi _k = \Phi (x_k)\) and \({\overline{\Phi }}_m = \tfrac{1}{m}\sum _{k=0}^{m-1}\Phi _k\).

Proposition 10

Let Assumption 5.(erg) hold, set \(q = e^{-\omega \Delta _t}\), and assume \(\sigma ^2_{\phi _{ij}, \infty } > 0\) for all \((i,j)\). Let \(\Phi \in {\mathbb {R}}^{N \times N}\) be a centered, matrix-valued random variable in \(L^2_\mu \). Denote the matrices of all entry-wise variances and asymptotic variances by

$$\begin{aligned} \Sigma _{\Phi }&= \left( \sigma _{\phi _{ij}} \right) _{i,j=1}^N,&\Sigma _{\Phi , \infty }&= \left( \sigma _{\phi _{ij}, \infty } \right) _{i,j=1}^N \end{aligned}$$

Then, for any given \(\delta > 0\), and \(m \in {\mathbb {N}}\), we have with probability at least \(1 - \delta \) that

$$\begin{aligned} \Vert {\overline{\Phi }}_m\Vert _F \le \frac{N}{\sqrt{m \delta }} \left[ \Vert \Sigma _{\Phi ,\infty }\Vert ^2_F + \frac{2q}{m (1- q)^2} \Vert \Sigma _{\Phi } \Vert ^2_F \right] ^{1/2}. \end{aligned}$$

For reversible systems, we obtain the simplified bound

$$\begin{aligned} \Vert {\overline{\Phi }}_m\Vert _F \le \frac{N}{\sqrt{m \delta }} \Vert \Sigma _{\Phi ,\infty }\Vert _F. \end{aligned}\tag{5}$$


Proof

Noting that \([{\overline{\Phi }}_m]_{ij} = [\overline{\phi _{ij}}]_m\), the scalar Chebyshev inequality together with Lemmas 6 and 9 yields, for all \((i,j)\):

$$\begin{aligned} {\mathbb {P}}\left( [{\overline{\Phi }}_m]_{ij}^2 \le \varepsilon ^2 \right)&\ge 1 - \frac{\sigma ^2_{[\overline{\phi _{ij}}]_m}}{\varepsilon ^2} = 1 - \frac{\frac{1}{m}[\sigma ^2_{\phi _{ij},\infty } - R_{\phi _{ij}}^m]}{\varepsilon ^2}\\&\ge 1 - \frac{1}{m \varepsilon ^2}\left[ \sigma ^2_{\phi _{ij},\infty } + \frac{2 \sigma ^2_{\phi _{ij}} q}{m (1- q)^2}\right] . \end{aligned}$$

The second term on the right-hand side does not exceed \(\frac{\delta }{N^2}\) if

$$\begin{aligned} \varepsilon ^2 \ge \frac{N^2}{m \delta }\left[ \sigma ^2_{\phi _{ij},\infty } + \frac{2 \sigma ^2_{\phi _{ij}} q}{m (1- q)^2}\right] , \end{aligned}$$

in other words, there is a set of trajectories of probability at least \(1 - \frac{\delta }{N^2}\) such that

$$\begin{aligned}{}[{\overline{\Phi }}_m]_{ij}^2 \le \frac{N^2}{m \delta }\left[ \sigma ^2_{\phi _{ij},\infty } + \frac{2 \sigma ^2_{\phi _{ij}} q}{m (1- q)^2}\right] . \end{aligned}$$

On the intersection of these sets, we have that

$$\begin{aligned} \Vert {\overline{\Phi }}_m\Vert _F \le \frac{N}{\sqrt{m \delta }} \left[ \Vert \Sigma _{\Phi ,\infty }\Vert ^2_F + \frac{2q}{m (1- q)^2} \Vert \Sigma _{\Phi } \Vert ^2_F \right] ^{1/2}, \end{aligned}$$

and the probability of the intersection is at least \(1 - \delta \) by Lemma 22. In the reversible case, we know that \(R_{\phi _{ij}}^m \ge 0\) for all \((i,j)\), and therefore

$$\begin{aligned} {\mathbb {P}}\left( [{\overline{\Phi }}_m]_{ij}^2 \le \varepsilon ^2 \right)&\ge 1 - \frac{1}{m \varepsilon ^2}\sigma ^2_{\phi _{ij},\infty }. \end{aligned}$$

Simplified bound (5) follows by repeating the above argument starting from this inequality. \(\square \)

Remark 11

(I.i.d. sampling) If the data are sampled i.i.d., that is, Assumption 5.(iid) holds instead of Assumption 5.(erg), then by standard results, one has \(\sigma ^2_{{\bar{\phi }}_m} = \frac{1}{m}\sigma ^2_\phi \). The bounds from Proposition 10 simplify significantly in this case. By the Chebyshev inequality:

$$\begin{aligned} {\mathbb {P}}\left( [{\overline{\Phi }}_m]_{ij}^2 \le \varepsilon ^2 \right)&\ge 1 - \frac{\frac{1}{m} \sigma ^2_{\phi _{ij}}}{\varepsilon ^2}, \end{aligned}$$

which leads to the following error estimate for fixed \(m \in {\mathbb {N}}\) and \(\delta > 0\):

$$\begin{aligned} \Vert {\overline{\Phi }}_m\Vert _F \le \frac{N}{\sqrt{m \delta }} \Vert \Sigma _{\Phi } \Vert _F. \end{aligned}$$

The setting of sampling via the Lebesgue measure on a compact set \({\mathbb {X}}\) was thoroughly considered in Zhang and Zuazua (2021).
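
The i.i.d. estimate of Remark 11 can be probed empirically. The sketch below is a hedged illustration under assumptions of ours: the dictionary \(\{1, x, x^2\}\), samples from a centered Gaussian with hypothetical variance `s2`, and \(\Phi = \Psi \Psi ^\top - C\), so that \({\overline{\Phi }}_m = {\tilde{C}}_m - C\). Entry-wise variances follow from Gaussian moments, and the bound is compared with the observed error over many independent data sets.

```python
import numpy as np

def gaussian_moment(n, s2):
    """E[x^n] for x ~ N(0, s2): zero for odd n, (n-1)!! * s2^(n/2) for even n."""
    if n % 2:
        return 0.0
    return float(np.prod(np.arange(n - 1, 0, -2))) * s2 ** (n // 2)

N, s2, m, delta, trials = 3, 0.5, 500, 0.1, 2_000   # hypothetical settings
idx = np.add.outer(np.arange(N), np.arange(N))      # psi_i * psi_j = x^(i+j)

# Exact mass matrix C and entry-wise variances of psi_i psi_j - C_ij.
C = np.array([[gaussian_moment(i + j, s2) for j in range(N)] for i in range(N)])
var = np.array([[gaussian_moment(2 * (i + j), s2) - gaussian_moment(i + j, s2) ** 2
                 for j in range(N)] for i in range(N)])
sigma_frob = np.sqrt(var.sum())                     # ||Sigma_Phi||_F
bound = N / np.sqrt(m * delta) * sigma_frob         # estimate from Remark 11

# Repeat the experiment: each row of x is one i.i.d. data set of size m.
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(s2), size=(trials, m))
emp_moments = np.stack([(x**n).mean(axis=1) for n in range(2 * N - 1)], axis=1)
C_m = emp_moments[:, idx]                           # shape (trials, N, N)
err = np.linalg.norm(C_m - C, axis=(1, 2))          # Frobenius norm per data set

coverage = np.mean(err <= bound)
print("bound:", bound, " largest observed error:", err.max())
print("fraction of data sets within the bound:", coverage)
```

Since the estimate rests on the Chebyshev inequality and a union bound over all entries, the observed fraction is expected to be well above \(1 - \delta \) in this setting.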

2.2.2 Error Bound for the Projected Generator

Next, we deduce our first main result by applying the probabilistic bounds obtained in Proposition 10 to estimate the error for the data-driven Galerkin projection \(\tilde{{\mathcal {L}}}_m\).

Theorem 12

(Approximation error: probabilistic bound) Let Assumption 5 hold. Then, for any error bound \({\tilde{\varepsilon }} > 0\) and probabilistic tolerance \({\tilde{\delta }} \in (0,1)\), we have

$$\begin{aligned} {\mathbb {P}}\left( \Vert {\mathcal {L}}_{\mathbb {V}}- \tilde{{\mathcal {L}}}_m\Vert _F\le {\tilde{\varepsilon }}\right) \ge 1-{\tilde{\delta }} \end{aligned}$$

for any number \(m \in {\mathbb {N}}\) of data points that satisfies the respective condition below, where

$$\begin{aligned} \varepsilon = \min \left\{ 1,\frac{1}{\Vert A\Vert \Vert C^{-1}\Vert }\right\} \cdot \frac{\Vert A\Vert {\tilde{\varepsilon }}}{2\Vert A\Vert \Vert C^{-1}\Vert + {\tilde{\varepsilon }}}\quad \text {and}\quad \delta = \frac{{\tilde{\delta }}}{3}. \end{aligned}$$
  • In case of ergodic sampling, i.e., Assumption 5.(erg),

    $$\begin{aligned} m \ge \frac{N^2}{\delta \varepsilon ^2} \left[ \Vert \Sigma _{\Phi ,\infty }\Vert ^2_F + \frac{2q}{m (1- q)^2} \Vert \Sigma _{\Phi } \Vert ^2_F \right] . \end{aligned}$$
  • In case of ergodic sampling of a reversible system, i.e., Assumption 5.(erg),

    $$\begin{aligned} m \ge \frac{N^2}{\delta \varepsilon ^2} \Vert \Sigma _{\Phi ,\infty }\Vert ^2_F. \end{aligned}$$
  • In case of i.i.d. sampling, i.e., Assumption 5.(iid),

    $$\begin{aligned} m \ge \frac{N^2}{\delta \varepsilon ^2} \Vert \Sigma _{\Phi } \Vert ^2_F. \end{aligned}$$
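
To get a feeling for the sample sizes required by Theorem 12 in the i.i.d. case, the condition can be evaluated directly. The following sketch is an assumption-laden illustration: the function name `required_samples_iid` and all numerical inputs are hypothetical, and the quantity \(\Vert \Sigma _{\Phi }\Vert _F\) is passed in as a single number `sigma_frob`, which one may take as the larger of the values for \(\Phi _C\) and \(\Phi _A\) from the proof.

```python
import math

def required_samples_iid(eps_tilde, delta_tilde, norm_A, norm_C_inv, sigma_frob, N):
    """Smallest integer m with m >= N^2 ||Sigma_Phi||_F^2 / (delta * eps^2).

    eps and delta are derived from the target accuracy eps_tilde and the
    probabilistic tolerance delta_tilde as in the statement of Theorem 12.
    """
    eps = (min(1.0, 1.0 / (norm_A * norm_C_inv))
           * norm_A * eps_tilde / (2 * norm_A * norm_C_inv + eps_tilde))
    delta = delta_tilde / 3
    return math.ceil(N**2 * sigma_frob**2 / (delta * eps**2))

# Hypothetical inputs: target error 0.1 with probability at least 0.9.
m_required = required_samples_iid(
    eps_tilde=0.1, delta_tilde=0.1,
    norm_A=2.0, norm_C_inv=4.0, sigma_frob=3.5, N=3,
)
print("required number of i.i.d. samples:", m_required)
```

The result scales like \({\tilde{\varepsilon }}^{-2}\) for small \({\tilde{\varepsilon }}\) and like \({\tilde{\delta }}^{-1}\), which reflects the use of the Chebyshev inequality.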


In this proof, we will omit the subscript for the norm and set \(\Vert \cdot \Vert = \Vert \cdot \Vert _F\). Let us introduce the centered matrix-valued random variables

$$\begin{aligned} \Phi _C(x) := \Psi (x)\Psi (x)^\top - C \qquad \text {and}\qquad \Phi _A(x) := \Psi (x){\mathcal {L}}\Psi (x)^\top - A, \end{aligned}$$

where \(\Psi = [\psi _1,\ldots ,\psi _N]^\top \). Then \(\widetilde{C}_m - C = [\overline{\Phi _C}]_m\) and \(\widetilde{A}_m - A = [\overline{\Phi _A}]_m\). Hence, we may apply Proposition 10 to these matrix-valued random variables. First, by the choice of m above we have

$$\begin{aligned} {\mathbb {P}}\left( \Vert C-\widetilde{C}_m\Vert \le \frac{R}{\Vert A\Vert \Vert C^{-1}\Vert }\right) \ge 1-\tfrac{{\tilde{\delta }}}{3} \qquad \text {and}\qquad {\mathbb {P}}\left( \Vert {\tilde{A}}_m-A\Vert \le R\right) \ge 1-\tfrac{{\tilde{\delta }}}{3}, \end{aligned}$$

where


$$\begin{aligned} R := \frac{\Vert A\Vert {\tilde{\varepsilon }}}{2\Vert A\Vert \Vert C^{-1}\Vert + {\tilde{\varepsilon }}} = \frac{{\tilde{\varepsilon }}}{2\left( \Vert C^{-1}\Vert + \frac{{\tilde{\varepsilon }}}{2\Vert A\Vert }\right) }. \end{aligned}$$

Moreover, we compute

$$\begin{aligned} \Vert {\tilde{C}}_m^{-1}-C^{-1}\Vert {=} \Vert {\tilde{C}}_m^{-1}(C - {\tilde{C}}_m)C^{-1}\Vert {\le } \Vert C^{-1}\Vert \Vert C{-}{\tilde{C}}_m\Vert \left( \Vert {\tilde{C}}_m^{-1}{-} C^{-1}\Vert {+} \Vert C^{-1}\Vert \right) \end{aligned}$$

which implies

$$\begin{aligned} \Vert {\tilde{C}}_m^{-1}-C^{-1}\Vert \le \frac{\Vert C^{-1}\Vert ^2\Vert C-{\tilde{C}}_m\Vert }{1-\Vert C^{-1}\Vert \Vert C-{\tilde{C}}_m\Vert }. \end{aligned}$$
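
The rearrangement behind this implication is short. As a sketch, abbreviate \(x = \Vert {\tilde{C}}_m^{-1}-C^{-1}\Vert \), \(a = \Vert C^{-1}\Vert \) and \(b = \Vert C-{\tilde{C}}_m\Vert \); this shorthand is introduced here for illustration only. Then

$$\begin{aligned} x \le ab\,(x + a) \quad \Longleftrightarrow \quad x\,(1-ab) \le a^2 b \quad \Longrightarrow \quad x \le \frac{a^2 b}{1-ab} \quad \text {whenever } ab < 1. \end{aligned}$$

The condition \(ab<1\) holds on the event \(\Vert C-\widetilde{C}_m\Vert \le \frac{R}{\Vert A\Vert \Vert C^{-1}\Vert }\), because there \(ab \le \frac{R}{\Vert A\Vert } = \frac{{\tilde{\varepsilon }}}{2\Vert A\Vert \Vert C^{-1}\Vert + {\tilde{\varepsilon }}} < 1\). The same algebra shows, for any \(c>0\) and \(ab<1\),

$$\begin{aligned} \frac{a^2 b}{1-ab} \le c \quad \Longleftrightarrow \quad b \le \frac{c}{a\,(a+c)}, \end{aligned}$$

and for \(c = \tfrac{{\tilde{\varepsilon }}}{2\Vert A\Vert }\) the right-hand side equals \(\frac{{\tilde{\varepsilon }}}{\Vert C^{-1}\Vert (2\Vert A\Vert \Vert C^{-1}\Vert + {\tilde{\varepsilon }})} = \frac{R}{\Vert A\Vert \Vert C^{-1}\Vert }\).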

Hence, by straightforward computations we obtain

$$\begin{aligned} {\mathbb {P}}\left( \Vert {\tilde{C}}_m^{-1}-C^{-1}\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\Vert A\Vert }\right)&\ge {\mathbb {P}}\left( \frac{\Vert C^{-1}\Vert ^2\Vert C-{\tilde{C}}_m\Vert }{1-\Vert C^{-1}\Vert \Vert C-{\tilde{C}}_m\Vert } \le \tfrac{{\tilde{\varepsilon }}}{2\Vert A\Vert }\right) \\&={\mathbb {P}}\left( \Vert C-\widetilde{C}_m\Vert \le \frac{R}{\Vert A\Vert \Vert C^{-1}\Vert }\right) \ge 1-\tfrac{{\tilde{\delta }}}{3}. \end{aligned}$$

Furthermore, we have


$$\begin{aligned}&{\mathbb {P}}\left( \Vert {\tilde{A}}_m-A\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\Vert {\tilde{C}}_m^{-1}\Vert }\right) \ge {\mathbb {P}}\left( \Vert {\tilde{A}}_m-A\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\left( \Vert C^{-1}\Vert + \Vert {\tilde{C}}_m^{-1}-C^{-1}\Vert \right) }\right) \\&\quad \ge {\mathbb {P}}\left( \Vert {\tilde{A}}_m-A\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\left( \Vert C^{-1}\Vert +\tfrac{{\tilde{\varepsilon }}}{2\Vert A\Vert }\right) } \;\wedge \; \Vert {\tilde{C}}_m^{-1}-C^{-1}\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\Vert A\Vert } \right) \\&\quad \ge \left( 1-\tfrac{{\tilde{\delta }}}{3}\right) + \left( 1-\tfrac{{\tilde{\delta }}}{3}\right) - 1 = 1-\tfrac{2{\tilde{\delta }}}{3}. \end{aligned}$$

Thus, we conclude

$$\begin{aligned} {\mathbb {P}}(\Vert C^{-1}A -{\tilde{C}}_m^{-1}{\tilde{A}}_m\Vert \le {\tilde{\varepsilon }})&= {\mathbb {P}}\left( \Vert {\tilde{C}}_m^{-1}(A-{\tilde{A}}_m) + \left( C^{-1}-{\tilde{C}}_m^{-1}\right) A\Vert \le {\tilde{\varepsilon }}\right) \\&\ge {\mathbb {P}}\left( \Vert {\tilde{C}}_m^{-1}\Vert \Vert A-{\tilde{A}}_m\Vert + \Vert C^{-1}-{\tilde{C}}_m^{-1}\Vert \Vert A\Vert \le {\tilde{\varepsilon }}\right) \\&\ge {\mathbb {P}}\left( \Vert A-{\tilde{A}}_m\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\Vert {\tilde{C}}_m^{-1}\Vert } \wedge \Vert C^{-1}-{\tilde{C}}_m^{-1}\Vert \le \tfrac{{\tilde{\varepsilon }}}{2\Vert A\Vert }\right) \\&\ge (1-\tfrac{2{\tilde{\delta }}}{3}) + (1-\tfrac{{\tilde{\delta }}}{3}) - 1 = 1-{\tilde{\delta }}, \end{aligned}$$

which is (8). \(\square \)
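
To make the data requirement of Theorem 12 concrete, the following minimal Python sketch evaluates \(\varepsilon \), \(\delta \) and a sufficient m for the i.i.d. case and the reversible ergodic case, where the condition has the closed form \(m \ge N^2\Vert \Sigma \Vert _F^2/(\delta \varepsilon ^2)\). The function names and the numerical values are hypothetical, and the norms \(\Vert A\Vert \), \(\Vert C^{-1}\Vert \) and \(\Vert \Sigma \Vert _F\) are assumed to be known. The general ergodic case is not covered, since there m appears on both sides of the condition.

```python
import math


def theorem12_parameters(eps_tilde, delta_tilde, norm_A, norm_C_inv):
    """Return (eps, delta) as defined in Theorem 12 (Frobenius norms assumed)."""
    if eps_tilde <= 0 or not 0 < delta_tilde < 1:
        raise ValueError("need eps_tilde > 0 and delta_tilde in (0, 1)")
    r = norm_A * eps_tilde / (2 * norm_A * norm_C_inv + eps_tilde)
    eps = min(1.0, 1.0 / (norm_A * norm_C_inv)) * r
    return eps, delta_tilde / 3


def sufficient_samples(eps_tilde, delta_tilde, norm_A, norm_C_inv,
                       sigma_frob, n_obs):
    """Return an integer m with m >= N^2 * sigma^2 / (delta * eps^2).

    sigma_frob stands for ||Sigma_Phi||_F (i.i.d. sampling) or for
    ||Sigma_{Phi,infty}||_F (ergodic sampling of a reversible system).
    """
    eps, delta = theorem12_parameters(eps_tilde, delta_tilde,
                                      norm_A, norm_C_inv)
    return math.ceil(n_obs ** 2 * sigma_frob ** 2 / (delta * eps ** 2))


# Hypothetical numbers, chosen only to show how m scales with eps_tilde.
for eps_tilde in (0.2, 0.1, 0.05):
    m = sufficient_samples(eps_tilde, 0.1, norm_A=1.0, norm_C_inv=2.0,
                           sigma_frob=1.5, n_obs=4)
    print(eps_tilde, m)
```

Halving \({\tilde{\varepsilon }}\) roughly quadruples the required amount of data, which reflects the \(\varepsilon ^{-2}\) scaling of the bound.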

A result similar to Theorem 12 was obtained for ODE systems in Zhang and Zuazua (2021) under the assumption that the data are drawn i.i.d.

An immediate consequence of the estimate on the generator approximation error is a bound on the error of the trajectories. To this end, consider the systems

$$\begin{aligned}&{\dot{z}}&= {\mathcal {L}}_{\mathbb {V}}z&z(0)=z_0\\&\dot{{\tilde{z}}}&= \tilde{{\mathcal {L}}}_m{{\tilde{z}}}&{\tilde{z}}(0)={z}_0. \end{aligned}$$

where \(z_0\in {\mathbb {R}}^N\), which represents an ODE in terms of the coefficients in the basis representation of elements of \({\mathbb {V}}\). We will leverage the error bound obtained in Theorem 12 to derive an estimate on the resulting prediction error in the observables, i.e., \(\Vert z(t)-{\tilde{z}}(t)\Vert _2\). Note that in view of the isomorphism \({\mathbb {V}}\simeq {\mathbb {R}}^N\) this also directly translates to an error estimate for trajectories in \({\mathbb {V}}\).

Corollary 13

Let Assumption 5 hold. Then for any \(T>0\) and \(\delta ,\varepsilon >0\) there is \(m_0\in {\mathbb {N}}\) such that for \(m\ge m_0\) data points we have

$$\begin{aligned} \min _{t \in [0,T]} {\mathbb {P}}\big (\Vert z(t) - {{\tilde{z}}}(t)\Vert _2 \le \varepsilon \big ) \ge 1-\delta . \end{aligned}$$


See “Appendix A.3.” \(\square \)

A sufficient amount of data \(m_0\) can be easily specified by combining the calculations displayed in the proof of Corollary 13, i.e., Gronwall’s inequality and Condition (4). Under additional assumptions on the Koopman semigroup generated by \({\mathcal {L}}_{\mathbb {V}}\), e.g., stability, one can refine this estimate or render it uniform in T, cf. Corollary 24 in “Appendix A.3.”
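
The way a generator error propagates to trajectories can be illustrated numerically. The sketch below integrates \({\dot{z}} = {\mathcal {L}}z\) and \(\dot{{\tilde{z}}} = \tilde{{\mathcal {L}}}{\tilde{z}}\) for two hypothetical \(2\times 2\) matrices and compares the observed error with the Gronwall-type estimate \(\Vert z(t)-{\tilde{z}}(t)\Vert _2 \le t\,\Vert {\mathcal {L}}-\tilde{{\mathcal {L}}}\Vert _F\Vert z_0\Vert _2\exp (t\max \{\Vert {\mathcal {L}}\Vert _F,\Vert \tilde{{\mathcal {L}}}\Vert _F\})\). This estimate is a standard one and is used here as an assumption; the constants of Appendix A.3 may differ.

```python
import math


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def frob(M):
    return math.sqrt(sum(x * x for row in M for x in row))


def rk4_linear(M, z0, T, steps):
    """Integrate dz/dt = M z on [0, T] with the classical Runge-Kutta scheme."""
    h = T / steps
    z = list(z0)
    for _ in range(steps):
        k1 = matvec(M, z)
        k2 = matvec(M, [zi + 0.5 * h * k for zi, k in zip(z, k1)])
        k3 = matvec(M, [zi + 0.5 * h * k for zi, k in zip(z, k2)])
        k4 = matvec(M, [zi + h * k for zi, k in zip(z, k3)])
        z = [zi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
    return z


def gronwall_bound(L, L_tilde, z0, T):
    """Gronwall-type bound on ||z(T) - z_tilde(T)||_2 (assumed form, see text)."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(L, L_tilde)]
    growth = max(frob(L), frob(L_tilde))
    norm_z0 = math.sqrt(sum(x * x for x in z0))
    return T * frob(diff) * norm_z0 * math.exp(growth * T)


# Hypothetical generator matrix and a perturbed estimate of it.
L = [[-1.0, 0.5], [0.0, -2.0]]
L_tilde = [[-0.98, 0.52], [0.01, -2.03]]
z0, T = [1.0, 1.0], 1.0

z = rk4_linear(L, z0, T, 1000)
z_tilde = rk4_linear(L_tilde, z0, T, 1000)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_tilde)))
bound = gronwall_bound(L, L_tilde, z0, T)
print(err, bound)
```

The bound grows exponentially in T, which is why additional stability assumptions are needed to obtain estimates that are uniform in the horizon.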

2.3 Error Bound for the Projected Koopman Operator

Similar to the derivation of the probabilistic bound on the projected generator, a bound on the Koopman operator is possible. We briefly sketch the main steps of the argumentation. Let \(t = l\Delta _t\) for some \(l\in {\mathbb {N}}\) and again choose a subspace \({\mathbb {V}} = {\text {span}}\{\{\psi _j\}_{j=1}^N\}\subset L^2_\mu ({\mathbb {X}})\) (which, in contrast to the generator-based setting, is not required to be contained in the domain). The restricted Koopman operator on this subspace is defined via

$$\begin{aligned} {\mathcal {K}}^{t}_{\mathbb {V}} = C^{-1}A \end{aligned}$$

with


$$\begin{aligned} C_{i,j}=\langle \psi _i,\psi _j\rangle _{L_\mu ^2({\mathbb {X}})} \qquad \text {and}\qquad A_{i,j} =\langle \psi _i,{\mathcal {K}}^{t}\psi _j\rangle _{L_\mu ^2({\mathbb {X}})}. \end{aligned}$$

Define the data matrices

$$\begin{aligned} \Psi (X)&:= \left( \left. \left( {\begin{matrix} \psi _1(x_0)\\ :\\ \psi _N(x_0) \end{matrix}}\right) \right| \ldots \left| \left( {\begin{matrix} \psi _1(x_{m-l-1})\\ :\\ \psi _N(x_{m-l-1}) \end{matrix}}\right) \right. \right) \\ \Psi (Y)&:= \left( \left. \left( {\begin{matrix} \psi _1(x_l)\\ :\\ \psi _N(x_l) \end{matrix}}\right) \right| \ldots \left| \left( {\begin{matrix} \psi _1(x_{m-1})\\ :\\ \psi _N(x_{m-1}) \end{matrix}}\right) \right. \right) . \end{aligned}$$

The empirical estimator is then defined similarly to the generator setting via

$$\begin{aligned} \tilde{{\mathcal {K}}}_m^{t} ={\tilde{C}}_m^{-1} {\tilde{A}}_m\end{aligned}$$

with


$$\begin{aligned} {\tilde{C}}_m= \tfrac{1}{m} \Psi (X) \Psi (X)^\top \qquad \text {and}\qquad {\tilde{A}}_m= \tfrac{1}{m} \Psi (X) \Psi (Y)^\top . \end{aligned}$$
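
The estimator can be sketched in a few lines of plain Python. The example is hypothetical: it uses the scalar linear map \(x_{k+1} = a x_k\) and the dictionary \(\{x, x^2\}\), which spans a Koopman-invariant subspace, so that \(\tilde{{\mathcal {K}}}_m^{t}\) should reproduce \(\mathrm {diag}(a^l, a^{2l})\) up to rounding. All function names are chosen for illustration only.

```python
def psi(x):
    """Illustrative dictionary of N = 2 monomials: psi(x) = [x, x^2]."""
    return [x, x * x]


def gram(P, Q, m):
    """Return (1/m) * P Q^T, where P and Q list the columns of Psi(X), Psi(Y)."""
    n = len(P[0])
    return [[sum(p[i] * q[j] for p, q in zip(P, Q)) / m for j in range(n)]
            for i in range(n)]


def solve_2x2(C, A):
    """Return C^{-1} A for 2x2 matrices using the explicit inverse of C."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    inv = [[C[1][1] / det, -C[0][1] / det],
           [-C[1][0] / det, C[0][0] / det]]
    return [[sum(inv[i][k] * A[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def edmd_koopman(traj, lag):
    """Empirical Koopman matrix C_tilde^{-1} A_tilde from one trajectory."""
    m = len(traj)
    PX = [psi(x) for x in traj[:m - lag]]  # x_0, ..., x_{m-l-1}
    PY = [psi(x) for x in traj[lag:]]      # x_l, ..., x_{m-1}
    C = gram(PX, PX, m)
    A = gram(PX, PY, m)
    return solve_2x2(C, A)


a, lag = 0.9, 2
traj = [a ** k for k in range(20)]  # x_k = a^k with x_0 = 1
K = edmd_koopman(traj, lag)
print(K)
```

The normalization factor \(\tfrac{1}{m}\) cancels in the product \({\tilde{C}}_m^{-1}{\tilde{A}}_m\); it only matters when \({\tilde{C}}_m\) and \({\tilde{A}}_m\) are compared individually to C and A.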

We now present the analogue to Theorem 12 for the Koopman operator which follows by straightforward adaptations of the results of Sect. 2.2.

Theorem 14

Let Assumption 5 hold. Then, for \(t\ge 0\), any error bound \(\varepsilon > 0\) and probabilistic tolerance \(\delta \in (0,1)\) there is \(m_0\in {\mathbb {N}}\) such that for any \(m\ge m_0\),

$$\begin{aligned} {\mathbb {P}}\left( \Vert {\mathcal {K}}^{t}_{\mathbb {V}}-\tilde{{\mathcal {K}}}_m^{t}\Vert _{F}\le \varepsilon \right) \ge 1-\delta . \end{aligned}$$

A sufficient amount of data \(m_0\) can be specified analogously to Theorem 12.

3 Extension to Control Systems

In this section, we derive probabilistic bounds on the approximation error of nonlinear control-affine SDE systems of the form

$$\begin{aligned} \text {d}X_t = \left( F(X_t)+ \sum _{i=1}^{n_c} G_i(X_t)u_i \right) \text {d}t + \sigma (X_t) \,\text {d}W_t, \end{aligned}$$

with input \(u\in {\mathbb {R}}^{n_c}\) and state \(X_t\in {{\mathbb {X}}}\), where \(F:{{\mathbb {X}}}\rightarrow {\mathbb {R}}^n\) and \(G_i: {{\mathbb {X}}}\rightarrow {\mathbb {R}}^{n}\), \(i = 1,\ldots ,n_c\), are locally Lipschitz-continuous vector fields. In the deterministic case \(\sigma \equiv 0\) the controlled SDE reduces to the control-affine ODE system

$$\begin{aligned} {\dot{x}} = F(x) + \sum _{i=1}^{n_c} G_i(x)u_i. \end{aligned}$$

We will describe how one can apply the bounds on the generators of autonomous (SDE) systems obtained in Sect. 2 in order to obtain bounds for prediction of control systems, either for i.i.d. or ergodic sampling. Again, we will analyze the error on the finite dictionary resulting from finitely many data points, depending on the chosen control variable. In the i.i.d. setting, we analyzed the projection error using a finite element dictionary in the follow-up work (Schaller et al. 2022). Further, also in Schaller et al. (2022), we derived uniform bounds for data requirements and dictionary size w.r.t. the control variable, assuming that the control is constrained to a compact subset.

Central in this part is the fact that the Koopman generators for control-affine systems are control-affine. More precisely, if \({\mathcal {L}}^{{\bar{u}}}\) denotes the Koopman generator for a control-affine system with constant control \({\bar{u}}\in {\mathbb {R}}^{n_c}\) and \({{\bar{u}}} = \sum _{i=1}^{r}\alpha _i {{\bar{u}}}_i\) is a linear combination of constant controls \({{\bar{u}}}_i\in {\mathbb {R}}^{n_c}\), we have

$$\begin{aligned} {\mathcal {L}}^{{{\bar{u}}}} = {\mathcal {L}}^0 + \sum _{i=1}^{r}\alpha _i\big ({\mathcal {L}}^{\bar{u}_i} - {\mathcal {L}}^0\big ). \end{aligned}$$

This easily follows from representation (2) of the Koopman generator, see also Peitz et al. (2020, Theorem 3.2) for the special (deterministic) case \(\sigma \equiv 0\).
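
As a sketch of this computation, assume that representation (2) is the standard second-order form of the generator and write \({\bar{u}}^{(j)}\) for the j-th component of \({\bar{u}}\) (a notation introduced here for illustration only). For a smooth observable f,

$$\begin{aligned} {\mathcal {L}}^{{\bar{u}}} f = \Big (F + \sum _{j=1}^{n_c} G_j\,{\bar{u}}^{(j)}\Big )\cdot \nabla f + \tfrac{1}{2}\,\sigma \sigma ^\top : \nabla ^2 f, \end{aligned}$$

so that the diffusion part and the drift F cancel in \({\mathcal {L}}^{{\bar{u}}_i} - {\mathcal {L}}^0\). Hence, for \({\bar{u}} = \sum _{i=1}^{r}\alpha _i {\bar{u}}_i\),

$$\begin{aligned} \sum _{i=1}^{r}\alpha _i\big ({\mathcal {L}}^{{\bar{u}}_i} - {\mathcal {L}}^0\big ) f = \sum _{i=1}^{r}\alpha _i \sum _{j=1}^{n_c} {\bar{u}}_i^{(j)}\, G_j\cdot \nabla f = \sum _{j=1}^{n_c} {\bar{u}}^{(j)}\, G_j\cdot \nabla f = \big ({\mathcal {L}}^{{\bar{u}}} - {\mathcal {L}}^0\big ) f. \end{aligned}$$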

We will utilize this property to invoke our results from Sect. 2 to approximate the Koopman generator corresponding to basis elements of the control space, that is, \({\mathcal {L}}^{e_i}\), \(i=1,\ldots ,n_c\), and \({\mathcal {L}}^0\) corresponding to the drift term to form a bilinear control system in the observables.

Analogously to Assumption 5 we have the following two cases for the collected data and the underlying measure.

Assumption 15

Let either of the following hold:


(iid) The data for each autonomous system with control \(u=e_i\), \(i=0,\ldots ,n_c\), are sampled i.i.d. either from the normalized Lebesgue measure on a compact set \({\mathbb {X}}\) or from an invariant measure \(\mu _i\) in the sense of Definition 2.


(erg) The data for each autonomous system with control \(u=e_i\), \(i=0,\ldots ,n_c\), satisfy Assumption 5.(erg), i.e., they are drawn from a single ergodic trajectory, the probability measure \(\mu _i\) of the resulting autonomous SDE is invariant in the sense of Definition 2, and the Koopman semigroup is exponentially stable on \(L^2_{\mu _i,0}({\mathbb {X}})\).

It is important to note that in the first case of (iid), we did not make any assumption of invariance of the set \({\mathbb {X}}\) for all autonomous systems corresponding to the constant controls \(e_i\), \(i=0,\ldots ,n_c\), as this would be very restrictive. Hence, we have to ensure that the state trajectories remain (with probability one in stochastic setting (11)) in the set \({\mathbb {X}}\). Sufficient conditions are, e.g., controlled forward invariance of the set \({\mathbb {X}}\) or knowing that the initial condition is contained in a suitable sub-level set of the optimal value function of a respective optimal control problem, see, e.g., Boccia et al. (2014) or Esterhuizen et al. (2020) for an illustrative application of such a technique in showing recursive stability of Model Predictive Control (MPC) without stabilizing terminal constraints for discrete- and continuous-time systems, respectively.

In the following, we set \({\mathcal {O}}_i = L^2_{\mu _i}({\mathbb {X}})\), \(i=0,\ldots ,n_c\), and consider the generators \({\mathcal {L}}^{e_i}\) in these spaces, respectively. Further, let \(\psi _1,\ldots ,\psi _N : \mathbb X\rightarrow {\mathbb {R}}\) be N linearly independent observables whose span \(\mathbb V = {\text {span}}\{\psi _1,\ldots ,\psi _N\}\) satisfies

$$\begin{aligned} {\mathbb {V}} \subset D({\mathcal {L}}^{e_0})\cap D({\mathcal {L}}^{e_1})\cap \ldots \cap D({\mathcal {L}}^{e_{n_c}}), \end{aligned}$$

where \(e_i\), \(i=1,\ldots ,n_c\), denote the standard basis vectors of \({\mathbb {R}}^{n_c}\) and \(e_0 := 0\). We now discuss two cases of sampling, one corresponding to the approach of Sect. 2 and one to the standard case of i.i.d. sampling as in Zhang and Zuazua (2021).

As the original system and the Koopman generator are control affine, the remainder of this section is split up into two parts. First, we derive error estimates corresponding to autonomous systems driven by \(n_c+1\) constant controls. Second, we use these estimates and control affinity to deduce a result for general controls.

In accordance with the notation in Sect. 2 we define \({\mathcal {L}}_{{\mathbb {V}}}^{e_i} := P_{\mathbb V}{\mathcal {L}}^{e_i}|_{{\mathbb {V}}}\) and also use this symbol to denote the matrix representation of this linear operator w.r.t. the basis \(\{\psi _1,\ldots ,\psi _N\}\) of \({\mathbb {V}}\). Its approximation based on the data \(x_0,\ldots ,x_{m-1}\in {\mathbb {X}}\) will be denoted by \({\tilde{{\mathcal {L}}}}_m^{e_i}\).

Proposition 16

Let \(i \in \{0,\ldots ,n_c\}\) be given and Assumption 15 hold. Then, for any pair consisting of a desired error bound \(\varepsilon > 0\) and a probabilistic tolerance \(\delta \in (0,1)\), there is a number of data points \(m_i\) such that for any \(m \ge m_i\), we have the estimate

$$\begin{aligned} {\mathbb {P}}\big ( \Vert {\mathcal {L}}_{\mathbb {V}}^{e_i}-\tilde{{\mathcal {L}}}_m^{e_i}\Vert _F\le \varepsilon \big ) \ge 1-\delta . \end{aligned}$$

A sufficient amount of data \(m_i\) is given by the formulas of Theorem 12.


The claim follows immediately from applying Theorem 12. \(\square \)

Having obtained an estimate for the autonomous systems corresponding to the constant controls \(e_i, i=0,\ldots n_c\), we can leverage the control affinity of the system to formulate the corresponding results for arbitrary controls. To this end, for any control \(u(t) = \sum _{i=1}^{n_c}\alpha _i(t) e_i \in L^\infty (0,T;{\mathbb {R}}^{n_c})\), we define the projected Koopman generator and its approximation corresponding to the nonautonomous system with control u by

$$\begin{aligned} {\mathcal {L}}_{\mathbb {V}}^u (t)&:= {\mathcal {L}}_{\mathbb {V}}^0 + \sum _{i=1}^{n_c}\alpha _i(t)\big ({\mathcal {L}}_{\mathbb {V}}^{e_i}-{\mathcal {L}}_{\mathbb {V}}^0\big ),\\ \tilde{{\mathcal {L}}}_m^u(t)&:= \tilde{{\mathcal {L}}}_m^0 + \sum _{i=1}^{n_c}\alpha _i(t)\big (\tilde{{\mathcal {L}}}_m^{e_i}-\tilde{{\mathcal {L}}}_m^0\big ). \end{aligned}$$
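
In matrix form, this construction is a plain affine combination. The following sketch assembles the generator for a given control value from the \(n_c+1\) matrices; the function name and the example matrices are hypothetical and serve only to show the bookkeeping.

```python
def assemble_generator(L0, L_e, alpha):
    """Return L0 + sum_i alpha[i] * (L_e[i] - L0) for square nested lists.

    L0    : matrix for the control u = 0
    L_e   : list of matrices for the controls u = e_1, ..., e_{n_c}
    alpha : control coefficients alpha_1(t), ..., alpha_{n_c}(t) at one time t
    """
    if len(L_e) != len(alpha):
        raise ValueError("need one coefficient per matrix in L_e")
    n = len(L0)
    return [[L0[r][c] + sum(a * (Li[r][c] - L0[r][c])
                            for a, Li in zip(alpha, L_e))
             for c in range(n)] for r in range(n)]


# Hypothetical 2x2 matrices for a system with n_c = 2 inputs.
L0 = [[0.0, 1.0], [-1.0, 0.0]]
L1 = [[0.0, 1.0], [-2.0, 0.0]]
L2 = [[0.0, 1.0], [-1.0, -0.5]]

print(assemble_generator(L0, [L1, L2], [0.5, 0.5]))
```

The same function applies to the exact matrices \({\mathcal {L}}_{\mathbb {V}}^{e_i}\) and to their data-driven estimates, so the surrogate only requires \(n_c+1\) autonomous identification problems.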

Theorem 17

Let Assumption 15 hold. Then, for any pair consisting of a desired error bound \({\tilde{\varepsilon }} > 0\) and probabilistic tolerance \({\tilde{\delta }} \in (0,1)\), prediction horizon \(T>0\), and control function \(u\in L^\infty (0,T;{\mathbb {R}}^{n_c})\) we have

$$\begin{aligned} {\text {ess\,inf}}_{t \in [0,T]}{\mathbb {P}}\big (\Vert {\mathcal {L}}_{\mathbb {V}}^u(t) - \tilde{{\mathcal {L}}}_m^u(t)\Vert _F \le {\tilde{\varepsilon }}\big ) \ge 1-{\tilde{\delta }}, \end{aligned}$$

provided that the number m of data points exceeds \(\max _{i=0,\ldots ,n_c} m_i\) with \(m_i\) defined as in Proposition 16 with

$$\begin{aligned} \varepsilon = \tfrac{{\tilde{\varepsilon }}}{(n_c+1)\left( 1 + \sum _{i=1}^{n_c}\Vert \alpha _i\Vert _{L^\infty (0,T)}\right) } \qquad \text {and}\qquad \delta = \tfrac{{\tilde{\delta }}}{n_c+1}. \end{aligned}$$


Again, we omit the subscript of the norm and set \(\Vert \cdot \Vert =\Vert \cdot \Vert _F\). Using the result of Proposition 16 and our choice of \(m\), we have

$$\begin{aligned} {\mathbb {P}}\left( \Vert \tilde{{\mathcal {L}}}_m^{0}-{\mathcal {L}}_{\mathbb {V}}^0\Vert \le \tfrac{{\tilde{\varepsilon }}}{(n_c+1)\left( 1 + \sum _{i=1}^{n_c}\Vert \alpha _i\Vert _{L^\infty (0,T)}\right) }\right) \ge 1-\tfrac{{\tilde{\delta }}}{n_c + 1}, \end{aligned}$$

and for all \(i\in \{1,\ldots ,n_c\}\)

$$\begin{aligned} {\mathbb {P}}\left( \Vert {\mathcal {L}}_{\mathbb {V}}^{e_i} - \tilde{{\mathcal {L}}}_m^{e_i}\Vert \le \tfrac{{\tilde{\varepsilon }}}{\left( n_c+1\right) \Vert \alpha _i\Vert _{L^\infty (0,T)}}\right) \ge 1- \tfrac{{\tilde{\delta }}}{n_c + 1}. \end{aligned}$$

Then we compute for a.e. \(t\in [0,T]\)

$$\begin{aligned}&{\mathbb {P}}\left( \Vert {\mathcal {L}}_{\mathbb {V}}^u(t) - \tilde{{\mathcal {L}}}_m^u(t) \Vert \le {\tilde{\varepsilon }}\right)&\\&\ge {\mathbb {P}}\left( \left\| \left( 1-\sum _{i=1}^{n_c}\alpha _i(t)\right) \left( {\mathcal {L}}_{\mathbb {V}}^0 - {\tilde{{\mathcal {L}}}}_m^0\right) \right\| + \sum _{i=1}^{n_c} \left\| \alpha _i(t)\left( \tilde{{\mathcal {L}}}_m^{e_i} - {\mathcal {L}}_{\mathbb {V}}^{e_i}\right) \right\| \le {\tilde{\varepsilon }}\right) \\&\ge {\mathbb {P}}\left( \left\| \left( 1-\sum _{i=1}^{n_c}\alpha _i(t)\right) \left( {\mathcal {L}}_{\mathbb {V}}^0 - {\tilde{{\mathcal {L}}}}_m^0\right) \right\| {\le } \tfrac{{\tilde{\varepsilon }}}{n_c+1}\,\wedge \, \mathop {\forall }_{i=1}^{n_c} : \left\| \alpha _i(t) \left( \tilde{{\mathcal {L}}}_m^{e_i} {-} {\mathcal {L}}_{\mathbb {V}}^{e_i}\right) \right\| {\le } \tfrac{{\tilde{\varepsilon }}}{n_c+1}\right) . \end{aligned}$$

Next, we use Lemma 22 from “Appendix A.2” with \(d = n_c+1\),

$$\begin{aligned} A_0 = \left\{ \left\| \left( 1-\sum _{i=1}^{n_c}\alpha _i(t)\right) \left( {\mathcal {L}}_{\mathbb {V}}^0 - {\tilde{{\mathcal {L}}}}_m^0\right) \right\| \le \tfrac{{\tilde{\varepsilon }}}{n_c+1}\right\} \quad \text {and}\quad A_i = \left\{ \left\| \alpha _i(t) \left( \tilde{{\mathcal {L}}}_m^{e_i} - {\mathcal {L}}_{\mathbb {V}}^{e_i}\right) \right\| \le \tfrac{{\tilde{\varepsilon }}}{n_c+1}\right\} \end{aligned}$$

for \(i=1,\ldots ,n_c\). This yields

$$\begin{aligned}&{\mathbb {P}}\left( \Vert {\mathcal {L}}_{\mathbb {V}}^u(t) - \tilde{{\mathcal {L}}}_m^u(t) \Vert \le {\tilde{\varepsilon }}\right)&\\&\ge {\mathbb {P}}\left( \left\| \left( 1-\sum _{i=1}^{n_c}\alpha _i(t)\right) \left( {\mathcal {L}}_{\mathbb {V}}^0 - {\tilde{{\mathcal {L}}}}_m^0\right) \right\| \le \tfrac{{\tilde{\varepsilon }}}{n_c+1} \right) \\&\quad + \sum _{i=1}^{n_c} {\mathbb {P}}\left( \Vert \alpha _i(t)\big (\tilde{{\mathcal {L}}}_m^{e_i}-{\mathcal {L}}_{\mathbb {V}}^{{e_i}}\big )\Vert \le \tfrac{{\tilde{\varepsilon }}}{n_c+1} \right) - n_c \\&\ge {\mathbb {P}}\left( \Vert \tilde{{\mathcal {L}}}_m^{0}-{\mathcal {L}}_{\mathbb {V}}^0\Vert \le \tfrac{{\tilde{\varepsilon }}}{\left( 1 + \sum _{i=1}^{n_c} \Vert \alpha _i\Vert _{L^\infty (0,T)}\right) (n_c+1)} \right) \\&\quad + \sum _{i=1}^{n_c} {\mathbb {P}}\left( \Vert \tilde{{\mathcal {L}}}_m^{e_i}-{\mathcal {L}}_{\mathbb {V}}^{{e_i}}\Vert \le \tfrac{{\tilde{\varepsilon }}}{\left( n_c+1\right) \Vert \alpha _i\Vert _{L^\infty (0,T)}} \right) - n_c\\&\ge 1-\tfrac{{\tilde{\delta }}}{n_c+1} + \sum _{i=1}^{n_c} \left( 1-\tfrac{{\tilde{\delta }}}{n_c+1}\right) - n_c = 1-{\tilde{\delta }}. \end{aligned}$$

Taking the essential infimum yields the result. \(\square \)
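
The bookkeeping of the proof can be checked with a short sketch. It splits \({\tilde{\varepsilon }}\) and \({\tilde{\delta }}\) among the \(n_c+1\) autonomous systems, using the per-system failure probability \({\tilde{\delta }}/(n_c+1)\) that appears in the probability estimates above, and then recombines the individual guarantees with the inequality \({\mathbb {P}}(\bigcap _i A_i) \ge \sum _i {\mathbb {P}}(A_i) - (d-1)\) applied in the proof. The function names are hypothetical.

```python
def per_system_tolerances(eps_tilde, delta_tilde, alpha_sup):
    """Split the overall tolerances among the n_c + 1 autonomous systems.

    alpha_sup lists the norms ||alpha_i||_{L^infty(0,T)}, i = 1, ..., n_c.
    """
    n_c = len(alpha_sup)
    eps = eps_tilde / ((n_c + 1) * (1.0 + sum(alpha_sup)))
    delta = delta_tilde / (n_c + 1)
    return eps, delta


def joint_probability_lower_bound(probabilities):
    """Lower bound sum_i P(A_i) - (d - 1) on P(A_1 and ... and A_d)."""
    return sum(probabilities) - (len(probabilities) - 1)


# Hypothetical example with n_c = 2 inputs.
eps_tilde, delta_tilde, alpha_sup = 0.3, 0.2, [1.0, 0.5]
eps, delta = per_system_tolerances(eps_tilde, delta_tilde, alpha_sup)

# Worst-case error of the assembled generator if every system meets eps.
worst_case = (1.0 + sum(alpha_sup)) * eps + sum(a * eps for a in alpha_sup)
confidence = joint_probability_lower_bound([1.0 - delta] * (len(alpha_sup) + 1))
print(eps, delta, worst_case, confidence)
```

In this example the worst-case error stays below \({\tilde{\varepsilon }}\) and the joint confidence equals \(1-{\tilde{\delta }}\), in line with the statement of Theorem 17.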

In Theorem 17, the data requirements depend on the chosen control. If the values of the control function are constrained to a compact subset, one can derive uniform data requirements w.r.t. the control, cf. our follow-up work (Schaller et al. 2022). Finally, similarly to the previous section, we obtain a bound on trajectories via Gronwall's inequality, provided that the state response is contained in \({\mathbb {X}}\).

Corollary 18

Let Assumption 15 hold. Let \(T,\varepsilon >0\) and \(\delta \in (0,1)\), \(z_0\in {\mathbb {R}}^N\) and \(u\in L^\infty (0,T;{\mathbb {R}}^{n_c})\) such that the solution of (SDE) is contained in \({\mathbb {X}}\) with probability one. Then there is \(m_0\in {\mathbb {N}}\) such that for \(m\ge m_0\) the solutions \(z,{\tilde{z}}\) of

$$\begin{aligned}&{\dot{z}}(t)&= {\mathcal {L}}_{\mathbb {V}}^u(t)z&z(0)=z_0\\&\dot{{\tilde{z}}}(t)&= \tilde{{\mathcal {L}}}_m^u(t){\tilde{z}}&{\tilde{z}}(0)={z}_0 \end{aligned}$$

satisfy


$$\begin{aligned} \min _{t\in [0,T]}{\mathbb {P}}\big ( \Vert z(t)-{\tilde{z}}(t)\Vert _2 \le \varepsilon \big ) \ge 1-\delta . \end{aligned}$$


See “Appendix A.3.” \(\square \)

As in Corollary 13, \(m_0\) can explicitly be computed by combining Theorem 17 with the constants in Gronwall’s inequality.

We conclude this section with a final corollary regarding the optimality of the solution obtained using an error-certified Koopman model. To this end, we consider the optimal control problem with \(x_0\in {\mathbb {X}}\) and a stage cost \(\ell :{\mathbb {R}}^n\times {\mathbb {R}}^{n_c} \rightarrow {\mathbb {R}}\):

$$\begin{aligned} \begin{aligned} \min _{u\in L^\infty (0,T;{\mathbb {R}}^{n_c})}&\int _0^T \ell (x(t),u(t))\,\text {d}t\\ \text{ s.t. }\qquad {\dot{x}} =&F(x) + \sum _{i=1}^{n_c} G_i(x)u_i, \qquad x(0)=x_0. \end{aligned} \end{aligned}$$

In what follows, we compare the optimal value of the Koopman representation of (15) projected onto the subspace of observables \({\mathbb {V}}\) with initial datum \(z_0 = \Psi (x_0)\)

$$\begin{aligned} \begin{aligned} \min _{\alpha \in L^\infty (0,T;{\mathbb {R}}^{n_c})}&\int _0^T \ell (P(z(t)),\alpha (t))\,\text {d}t\\ \text{ s.t. }\qquad \dot{{z}}(t) =&\left[ {\mathcal {L}}_{\mathbb {V}}^0 + \sum _{i=1}^{n_c}\alpha _i(t)\left( {\mathcal {L}}_{\mathbb {V}}^{e_i}-{\mathcal {L}}_{\mathbb {V}}^0\right) \right] {z}(t), \qquad {z}(0)={z}_0, \end{aligned} \end{aligned}$$

to the optimal value of the surrogate-based control problem:

$$\begin{aligned} \begin{aligned} \min _{{\tilde{\alpha }}\in L^\infty (0,T;{\mathbb {R}}^{n_c})}&\int _0^T \ell (P({\tilde{z}}(t)),\tilde{\alpha }(t))\,\text {d}t\\ \text{ s.t. }\qquad \dot{{\tilde{z}}}(t) =&\left[ \tilde{{\mathcal {L}}}_m^0 + \sum _{i=1}^{n_c}{\tilde{\alpha }}_i(t)\left( \tilde{{\mathcal {L}}}_m^{e_i}-\tilde{{\mathcal {L}}}_m^0\right) \right] {\tilde{z}}(t), \qquad {\tilde{z}}(0)={z}_0, \end{aligned} \end{aligned}$$

where P maps a trajectory of observables to a trajectory in the state space, which in practice is frequently realized by including the coordinates of the identity function in the dictionary \(\Psi \) of observables.

Corollary 19

Let \(T,\varepsilon >0\), \(\delta \in (0,1)\), \(z_0\in {\mathbb {R}}^N\), let J be locally Lipschitz continuous and let Assumption 15 hold. Furthermore, let \((z^*,\alpha ^*)\) be an optimal solution of problem (16) such that the state response of (15) emanating from the control \(\alpha ^*\) is contained in \({\mathbb {X}}\). Then there is \(m_0\in {\mathbb {N}}\) such that for \(m\ge m_0\) data points contained in \({\mathbb {X}}\), there exists a tuple \(({\tilde{z}},\tilde{\alpha })\) which is feasible for (17) such that for the cost, we have the estimate

$$\begin{aligned} {\mathbb {P}}\left( \left| \int _0^T\ell (P({\tilde{z}}(t)),\tilde{\alpha }(t)) - \ell (P(z^*(t)),\alpha ^*(t))\,\mathrm {d}t\right| \le \varepsilon \right) \ge 1-\delta . \end{aligned}$$

4 Numerical Examples

In this section, we first present numerical experiments on the derived error bound for the Koopman generator and then discuss the implications for optimal control. In particular, we emphasize that the bilinear Koopman model from Sect. 3 appears to be the best approach for a straightforward transfer of predictive error bounds to the control setting.

4.1 Generator Error Bounds: Ornstein–Uhlenbeck Process

We begin by investigating the validity and accuracy of the error bounds for the Galerkin matrices of a single SDE system, as derived in Proposition 10. To this end, we consider the one-dimensional reversible Ornstein–Uhlenbeck (OU) process

$$\begin{aligned} \mathrm {d}X_t = - X_t \mathrm {d}t + \mathrm {d}W_t. \end{aligned}$$

As the spectrum of the generator \({\mathcal {L}}\) of the OU process, as well as its invariant density, is known in analytical form, we can exactly calculate the Galerkin matrices \(C, \, A\), all variances \(\sigma ^2_{\Phi _{ij}}\), and asymptotic variances \(\sigma ^2_{\Phi _{ij}, \infty }\), if we consider a basis set comprised of monomials, see “Appendix A.4.”

We consider monomials of maximal degree four (i.e., \(N = 4\)) and set the discrete integration time step to \(\Delta _t = 10^{-3}\). For a range of different data sizes m and confidence levels \(\delta \), we estimate the minimal error \(\varepsilon \) that can be achieved with probability \(1 - \delta \) for a variety of quantities of interest. We calculate \(\varepsilon \) for all individual entries \(C_{ij}\) and \(A_{ij}\) using inequality (6). Moreover, we also calculate \(\varepsilon \) for the Frobenius norm errors in C and A by means of (5).

In order to compare our bound to the real error, we conduct 500 identical experiments. For each experiment, we generate an independent simulation of OU process (18), with initial condition drawn from the invariant distribution. For each trajectory and each of the data sizes m considered, we estimate the matrices \({\tilde{C}}_m, \, {\tilde{A}}_m\). We then calculate the absolute entry-wise errors to C and A, as well as the Frobenius norm errors \(\Vert {\tilde{C}}_m - C\Vert _F\) and \(\Vert {\tilde{A}}_m - A\Vert _F\). Finally, for each confidence level \(\delta \) considered above, we numerically compute the \((1 - \delta )\)-quantile of each of these errors, i.e., the error \(\varepsilon \) below which a fraction \(1-\delta \) of the 500 repeated experiments lie (for \(\delta = 0.1\), these are 450 of the 500 experiments). These quantiles can be directly compared to the probabilistic bounds \(\varepsilon \) obtained from our theoretical estimates.
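
The comparison between an empirical quantile and a probabilistic bound can be sketched as follows. The example is a deliberate simplification and does not reproduce the experiment above: it draws i.i.d. samples from the invariant distribution \({\mathcal {N}}(0,\tfrac{1}{2})\) of the OU process instead of simulating trajectories, it considers only the single entry \({\mathbb {E}}[x^2] = \tfrac{1}{2}\), and it assumes a Chebyshev-type bound of the form \(\varepsilon = \sigma /\sqrt{m\delta }\), as suggested by the i.i.d. condition of Theorem 12 for a single entry. All names and parameter values are hypothetical.

```python
import math
import random


def empirical_quantile(values, level):
    """Order statistic at position ceil(level * n) of the sorted sample."""
    ordered = sorted(values)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


def second_moment_errors(m, runs, rng):
    """Errors |estimate - 1/2| of the sample mean of x^2 under N(0, 1/2)."""
    std = math.sqrt(0.5)
    errors = []
    for _ in range(runs):
        estimate = sum(rng.gauss(0.0, std) ** 2 for _ in range(m)) / m
        errors.append(abs(estimate - 0.5))
    return errors


rng = random.Random(0)          # fixed seed for reproducibility
m, runs, delta = 1000, 200, 0.1

errors = second_moment_errors(m, runs, rng)
eps_data = empirical_quantile(errors, 1.0 - delta)

# Chebyshev: P(|mean - 1/2| > eps) <= Var[x^2] / (m * eps^2),
# with Var[x^2] = E[x^4] - E[x^2]^2 = 3/4 - 1/4 = 1/2.
eps_theory = math.sqrt(0.5 / (m * delta))
print(eps_data, eps_theory)
```

Since Chebyshev-type bounds do not exploit the shape of the error distribution, the theoretical value is expected to over-estimate the empirical quantile.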

The results are shown in Fig. 1. We can see in panels B and C that our estimates for individual entries of the Galerkin matrices C and A are quite accurate, as the data-based error is over-estimated by only a factor of two to three. Our estimates for Frobenius norm errors are less accurate, with approximately one order of magnitude difference between theoretical and data-based errors. It can be concluded that the factor N in (5) is too coarse in this example, as the actual Frobenius norm error only marginally exceeds the maximal entry-wise error. Nevertheless, the qualitative behavior of all theoretical error bounds is confirmed by the data.

Fig. 1

Numerical Results for one-dimensional OU Process (18). A: Exact invariant density \(\mu \) in black, compared to histograms of the first m points of an exemplary trajectory, for various data sizes m. B: Error bounds for C corresponding to confidence level \(1 - \delta = 0.9\). We show both the theoretical estimates obtained in Proposition 10 (blue), as well as the data-based estimates obtained as described in the text (red). We show the maximal error over all entries \(C_{ij}\) (dots), the average error over all matrix entries (squares), and the Frobenius norm errors \(\Vert {\tilde{C}}_m - C\Vert _F\). C: The same as B for the matrix A

4.2 Extension to Control Systems

In this section, we illustrate our findings for deterministic as well as stochastic systems regarding prediction and control. We compare the solution of the exact model to the bilinear system

$$\begin{aligned} \begin{aligned} {\dot{z}}(t)&= \left[ \tilde{{\mathcal {L}}}_m^0 + \sum _{i=1}^{n_c}u_i(t)\left( \tilde{{\mathcal {L}}}_m^{e_i}-\tilde{{\mathcal {L}}}_m^0\right) \right] {z}(t)\\ z(t_0)&= \psi (x(t_0)), \end{aligned} \end{aligned}$$

where \(n_c\) is the dimension of the control input u. Besides bilinear model (19), we also compare the true solution to the linear model obtained via eDMD with control, see Proctor et al. (2016); Korda and Mezić (2018a) for details. Optimality of the computed trajectories from a theoretical standpoint will not be addressed here, as the error bounds for \(\tilde{{\mathcal {L}}}_m\) are still too large. However, the principled approach is to choose an m such that Corollary 19 holds.

For the numerical discretization, we use eDMD with a finite lag time to obtain a discrete-time version of (19) in case of the Duffing system, which corresponds to an explicit Euler discretization (Peitz et al. 2020). For the Ornstein–Uhlenbeck example, we calculate the generator using gEDMD (Klus et al. 2020) and then obtain the resulting discrete-time version by taking the matrix exponential. In the case of eDMD with control, we use the standard algorithm from Korda and Mezić (2018a), which also results in a forward Euler version of the linear system \({\dot{z}} = {\hat{A}} z + {\hat{B}} u\), i.e.,

$$\begin{aligned} \begin{aligned} {z}_{i+1}&= A z_i + B u_i, \\ z_0&= \psi (x(t_0)). \end{aligned} \end{aligned}$$

Remark 20

Note that one can drastically improve the predictive accuracy—in particular for longer time horizons—by introducing an intermediate project-and-lift step, which only makes a difference if the space \({\mathbb {V}}\) spanned by the \(\{\{\psi _k\}_{k=1}^N\}\) is not a Koopman-invariant subspace (Proctor et al. 2018). Moreover, it becomes less and less important the more the dynamics of the \(\tilde{{\mathcal {L}}}_m\) are truly restricted to \({\mathbb {V}}\), or—alternatively—if we are not interested in long-term predictions, for instance in the MPC setting.

Considering this intermediate step, the bilinear discrete-time systems become

$$\begin{aligned} \begin{aligned} {\widehat{z}}_{i}&= \psi (P({z}_{i})) \\ {z}_{i+1}&= \left( K_0 + \sum _{j=1}^{n_c}(K_j-K_0) u_{j,i}\right) {\widehat{z}}_i \\ z_0&= \psi (x(t_0)), \end{aligned} \end{aligned}$$

where P is the projection of the lifted state z onto the full state \(x \in {\mathbb {X}}\). In the same manner, the DMDc model reads

$$\begin{aligned} \begin{aligned} {\widehat{z}}_{i+1}&= \psi (P({z}_{i+1})) \\ {z}_{i+1}&= A {\widehat{z}}_i + B u_i, \\ z_0&= \psi (x(t_0)). \end{aligned} \end{aligned}$$

However, this comes at the cost of losing the bilinear or linear structure of the DMD-based models, respectively.
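
A single step of the bilinear model with and without the project-and-lift step can be sketched as follows. The dictionary \(\psi (x) = (x, x^2)\) for a scalar state, the projection onto the first entry, and the matrices are hypothetical and chosen only for illustration.

```python
def psi(x):
    """Illustrative dictionary for a scalar state: psi(x) = [x, x^2]."""
    return [x, x * x]


def project(z):
    """P: recover the state from the lifted vector (first entry is x)."""
    return z[0]


def bilinear_step(z, u, K, project_and_lift=True):
    """One step z_{i+1} = (K[0] + sum_j (K[j] - K[0]) * u[j-1]) z_hat_i.

    K = [K_0, K_1, ..., K_{n_c}] are the discrete-time matrices and
    z_hat_i = psi(P(z_i)) if project_and_lift is set, else z_hat_i = z_i.
    """
    z_hat = psi(project(z)) if project_and_lift else list(z)
    n = len(z)
    M = [[K[0][r][c] + sum(uj * (Kj[r][c] - K[0][r][c])
                           for uj, Kj in zip(u, K[1:]))
          for c in range(n)] for r in range(n)]
    return [sum(M[r][c] * z_hat[c] for c in range(n)) for r in range(n)]


# Hypothetical matrices; V is not invariant under them.
K = [[[0.9, 0.1], [0.0, 0.5]],   # K_0
     [[0.8, 0.1], [0.0, 0.5]]]   # K_1

z0 = psi(1.0)
plain = bilinear_step(bilinear_step(z0, [0.0], K, False), [0.0], K, False)
lifted = bilinear_step(bilinear_step(z0, [0.0], K, True), [0.0], K, True)
print(plain, lifted)
```

After two steps the two variants already differ, because without the intermediate step the second entry of z is no longer the square of the first.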

4.2.1 Duffing Equation (ODE)

The first system we study is the Duffing oscillator:

$$\begin{aligned} \begin{aligned} \tfrac{\text {d}x}{\text {d}t} = \begin{pmatrix} x_2 \\ -\delta x_2 - \alpha x_1 u - 2\beta x_1^3 \end{pmatrix}, \quad x(t_0) = x_0. \end{aligned} \end{aligned}$$

with \(\alpha = -1\), \(\beta = 1\) and \(\delta = 0\). Note that the control does not enter linearly, which is a well-known challenge for DMDc (Peitz et al. 2020).

As the dictionary \(\psi \), we choose monomials with varying maximal degrees, and we also include square and cubic roots for comparison. For the data collection process, we simulate the system with constant control inputs \(u=0\) and \(u=1\) using the standard Runge–Kutta scheme of fourth order with time step \(h=0.005\). As the final time, we choose \(T = n_{lag} h\) seconds, where \(n_{lag}\) is the integer number of time steps we step forward by the discrete-time Koopman operator model. We perform experiments for both \(n_{lag}=1\) and \(n_{lag}=10\). Each trajectory yields one tuple \((x,y) = (x(0), x(T))\), and we sample various numbers m of data points with uniformly distributed random initial conditions over the rectangle \([-1.5, 1.5]^2\).
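
The data collection can be sketched in plain Python as follows. The right-hand side is taken from (23) as stated above, the step size and the sampling rectangle follow the text, and all function names as well as the random seeds are hypothetical.

```python
import random

ALPHA, BETA, DELTA = -1.0, 1.0, 0.0


def duffing_rhs(x, u):
    """Right-hand side of system (23) for a constant control value u."""
    return (x[1], -DELTA * x[1] - ALPHA * x[0] * u - 2.0 * BETA * x[0] ** 3)


def rk4_step(x, u, h):
    """One step of the classical fourth-order Runge-Kutta scheme."""
    def shifted(k, s):
        return (x[0] + s * k[0], x[1] + s * k[1])

    k1 = duffing_rhs(x, u)
    k2 = duffing_rhs(shifted(k1, 0.5 * h), u)
    k3 = duffing_rhs(shifted(k2, 0.5 * h), u)
    k4 = duffing_rhs(shifted(k3, h), u)
    return tuple(x[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(2))


def collect_pairs(m, u, n_lag, h=0.005, rng=None):
    """Return m tuples (x(0), x(T)) with T = n_lag * h and uniform x(0)."""
    rng = rng or random.Random(0)
    pairs = []
    for _ in range(m):
        x = (rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5))
        y = x
        for _step in range(n_lag):
            y = rk4_step(y, u, h)
        pairs.append((x, y))
    return pairs


# One data set per constant control, as needed for the bilinear surrogate.
data_u0 = collect_pairs(100, u=0.0, n_lag=10, rng=random.Random(0))
data_u1 = collect_pairs(100, u=1.0, n_lag=10, rng=random.Random(1))
print(data_u0[0])
```

For \(u=0\) and \(\delta =0\) the quantity \(\tfrac{1}{2}x_2^2 + \tfrac{\beta }{2}x_1^4\) is conserved along solutions, which offers a simple consistency check of the integrator.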

Fig. 2

Comparison of ODE solution, the bilinear surrogate model and the linear model obtained via eDMDc for system (23). Top row shows systems (19) and (20); bottom row uses project-and-lift versions (21) and (22)

Figure 2 shows the prediction accuracy for \(m=10000\) and \(n_{lag}=1\), where excellent agreement is observed for the bilinear surrogate model. In particular the relative error

$$\begin{aligned} \Delta x(t) = \frac{\Vert x(t) - {\tilde{x}}(t)\Vert _2}{\Vert x(t)\Vert _2}, \end{aligned}$$

where \({\tilde{x}}(t)= P(z(t))\) is the solution obtained via the surrogate model, is below 1 percent for the first second (i.e., 200 steps), whereas the eDMDc approach has a large error from the start.

To study the influence of the size of the training data set, Fig. 3 shows boxplots of the one-step prediction accuracy for various m. Each boxplot was obtained by performing 20 trainings of a bilinear system according to the procedure described above. After each training, a single time step was made for 1000 initial conditions \(x_0 \in [-1.5,1.5]^2\) and control inputs \(u \in [0,1]\), both drawn uniformly. Consequently, each boxplot consists of \(2\cdot 10^4\) data points. We see that, as expected, the training error decreases for larger m. However, what is really surprising is that a saturation can be observed already at \(m=30\) for an ODE system. Beyond that, no further improvement can be seen, which demonstrates the advantage of (i) the linearity of the Koopman approach and (ii) the usage of autonomous systems for the model reduction process.

Fig. 3

Left: Boxplot of the relative one-step prediction error over 20 training runs and 1000 different samples \((x_0,u)\) in each run for a dictionary of monomials up to degree at most five and \(n_{lag}=1\). Right: The influence of the lag time as well as the control input on the mean accuracy (the dashed line with triangle symbols corresponds to the left plot). We see that the lag time plays an important role in the control setting

Interestingly, the lag time between two consecutive data points has a critical impact on the maximal accuracy in the control case. This is due to the fact that the bilinear surrogate model is only exact for the Koopman generator (Peitz et al. 2020). For a finite lag time, the bilinear model is a first-order approximation such that smaller lag times are advantageous. Nevertheless, the accuracy still significantly exceeds that of the eDMDc approach.
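The first-order claim can be made plausible by a formal expansion. The following is only a sketch under assumptions: the symbols \(\mathcal {L}^u\) (Koopman generator for a constant input u) and \(\mathcal {K}^u_h\) (Koopman operator for lag time h) are notation introduced here for illustration, the generators are treated as bounded operators (e.g., finite-dimensional compressions), the input is a scalar \(u \in [0,1]\), and the discrete-time bilinear model is assumed to interpolate the operators obtained for \(u=0\) and \(u=1\). For control-affine dynamics, the generator is affine in u, whereas its exponential is not:

$$\begin{aligned} \mathcal {L}^u&= \mathcal {L}^0 + u\,(\mathcal {L}^1 - \mathcal {L}^0), \\ \mathcal {K}^u_h&= e^{h\mathcal {L}^u} = I + h\mathcal {L}^u + \tfrac{h^2}{2}(\mathcal {L}^u)^2 + O(h^3), \\ \mathcal {K}^u_h - \bigl [\mathcal {K}^0_h + u\,(\mathcal {K}^1_h - \mathcal {K}^0_h)\bigr ]&= -\tfrac{h^2}{2}\,u(1-u)\,(\mathcal {L}^1 - \mathcal {L}^0)^2 + O(h^3). \end{aligned}$$

Under these assumptions, the interpolated model reproduces \(\mathcal {K}^u_h\) up to a one-step error of order \(h^2\), which vanishes for \(u \in \{0,1\}\) and as \(h \rightarrow 0\).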

Another interesting observation can be made with respect to the choice of the dictionary \(\psi \). Figure 4 shows a comparison of the mean errors (analogous to the red bars in Fig. 3) for various dictionaries. We observe excellent performance for monomials with degree three or larger. The addition of roots of x is not beneficial at all, and in particular, smaller dictionaries are favorable in terms of the data requirements, which is in agreement with our error analysis and which was also reported in Peitz and Klus (2020).

Fig. 4

Mean relative one-step prediction errors for various dictionaries and data set sizes m

Next, we study the stabilization of system (23) for \(h=0.01\) and the final time \(T=1.5\). Using the time discretization as above and a straightforward single-shooting method, this yields a 150-dimensional optimization problem similar to Problem (17) from Corollary 19:

$$\begin{aligned} \begin{aligned} \min _{u} \int _{0}^{T}&\Vert P (z(t)) - x^{\mathrm {ref}}(t)\Vert ^2 \,\mathrm {d}t \\ \text{ s.t. } \qquad&(19) \end{aligned} \end{aligned}$$

where \(x^{\mathrm {ref}}\) is the reference trajectory to be tracked. Figure 5 demonstrates the performance for \(x^{\mathrm {ref}}=0\) with models using \(M=5\) and only \(m=200\) training samples (100 for each model in the bilinear setting and 200 for eDMDc). We see very good performance for the bilinear system even without the intermediate projection step. In contrast, the eDMDc approximation fails for system (23), even when initializing with the optimal solution from the full system.
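A single-shooting objective for such a tracking problem might look as follows. This is a hedged sketch rather than the implementation used here: the update rule in `bilinear_step`, which interpolates between two lifted one-step matrices `K0` and `K1`, is an assumed form of the discrete-time bilinear surrogate, the integral is replaced by a simple rectangle rule, and all function and argument names are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bilinear_step(K0, K1, z, u):
    """z_next = (K0 + u * (K1 - K0)) z (assumed form of the bilinear surrogate)."""
    z0 = matvec(K0, z)
    z1 = matvec(K1, z)
    return [a + u * (b - a) for a, b in zip(z0, z1)]

def single_shooting_cost(u_seq, z_init, K0, K1, P, x_ref, h):
    """Rectangle-rule approximation of the integral of ||P z(t) - x_ref(t)||^2.

    u_seq:  list of N piecewise-constant inputs (the only decision variables)
    z_init: lifted initial state
    P:      projection matrix mapping the lifted state back to x
    x_ref:  callable k -> reference state at step k (k = 1, ..., N)
    """
    z, cost = list(z_init), 0.0
    for k, u in enumerate(u_seq, start=1):
        z = bilinear_step(K0, K1, z, u)  # states follow from the inputs alone
        x = matvec(P, z)
        cost += h * sum((xi - ri) ** 2 for xi, ri in zip(x, x_ref(k)))
    return cost
```

With \(h=0.01\) and \(T=1.5\), `u_seq` has 150 entries, and the function can be handed to any optimizer that supports the box constraint \(u \in [0,1]\). Since the states are eliminated by forward simulation, the problem dimension equals the number of control steps.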

Fig. 5

Open-loop control performance (stabilization of the origin) using the true ODE model as well as the bilinear and eDMDc surrogate models. Top row shows systems (19) and (20); bottom row uses project-and-lift versions (21) and (22)

4.2.2 Ornstein–Uhlenbeck Process (SDE)

For the stochastic setting, we consider an Ornstein–Uhlenbeck process with a control input:

$$\begin{aligned} \mathrm {d}X_t = -\alpha (u X_t) \mathrm {d}t + \sqrt{2 \beta ^{-1}} \mathrm {d}W_t, \end{aligned}$$

with \(\alpha = 1\), \(\beta = 2\) and \(u(t) \in [0,1]\). The system is simulated numerically using an Euler–Maruyama integration scheme with a time step of \(10^{-3}\) as in Sect. 4.1. For each of the two autonomous systems corresponding to \(u=0\) and \(u=1\), we approximate the Koopman generator using the gEDMD procedure presented in Klus et al. (2020) with monomials up to degree five. We then calculate the corresponding Koopman operators for the time step \(h=0.05\) using the matrix exponential.
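The simulation of the controlled process can be sketched with a plain Euler–Maruyama loop. The snippet is illustrative only: the function names, the interface with `u` as a callable, and the Monte Carlo helper `expected_value` are assumptions made here, and the drift is read as \(-\alpha u X_t\) from the equation above.

```python
import math
import random

def euler_maruyama_ou(x0, u, t_end, dt=1e-3, alpha=1.0, beta=2.0, rng=None):
    """Simulate dX = -alpha * u(t) * X dt + sqrt(2 / beta) dW and return X(t_end).

    u is a callable t -> input value, assumed to lie in [0, 1].
    """
    if rng is None:
        rng = random.Random()
    sigma = math.sqrt(2.0 / beta)
    n_steps = round(t_end / dt)
    x = x0
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x += -alpha * u(k * dt) * x * dt + sigma * dw
    return x

def expected_value(x0, u, t_end, n_sim=100, seed=0, **kwargs):
    """Monte Carlo estimate of E[X(t_end)] from n_sim independent paths."""
    rng = random.Random(seed)
    total = sum(
        euler_maruyama_ou(x0, u, t_end, rng=rng, **kwargs) for _ in range(n_sim)
    )
    return total / n_sim
```

For constant \(u=1\) and \(X_0 = 1\), the exact mean is \(e^{-\alpha t}\), which offers a simple sanity check for the averaged estimate; the Monte Carlo error decreases only with the square root of `n_sim`.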

Fig. 6

Prediction accuracy for the expected value of the Ornstein–Uhlenbeck process (approximated by averaging over 100 simulations) of the bilinear system and eDMDc, respectively

To study the prediction performance (cf. Fig. 6), we proceed in the same way as for the Duffing system, except that we now compare the expected values, approximated by averaging over 100 SDE simulations. The results are very similar to the deterministic case, where the performance of both surrogate modeling techniques is comparable when the control enters linearly, and very poor for eDMDc otherwise. Even though the Ornstein–Uhlenbeck process is stochastic, the linearity is highly favorable for the data requirements. We do not observe any considerable deterioration even in the very low data limit.

Finally, in the control setting, we aim at tracking the expected value \({\mathbb {E}}[X_t]\), which is precisely the quantity that is predicted by the Koopman operator. Thus, Problem (24) can directly be applied to SDEs as well. In order to compare the results to the full system, we average over 20 simulations in the evaluation of the objective function value when using the SDE. However, 20 simulations appear to be insufficient, as the resulting control performance is inadequate, cf. Fig. 7. The bilinear surrogate model, on the other hand, shows very good performance with as few as \(m=100\) training data points.

Fig. 7

Control of the expected value of the Ornstein–Uhlenbeck process (approximated by averaging over 100 simulations using the optimal control input shown in the bottom plots). In the SDE-based control, we have used 20 simulations in each objective function evaluation

5 Conclusions

We presented the first rigorously derived probabilistic bounds on the finite-data approximation error for the Koopman generator of SDEs and nonlinear control systems. Furthermore, by using slightly more advanced techniques from probability theory, we also relaxed the assumption of i.i.d. data invoked in Zhang and Zuazua (2021) in the ODE setting. Moreover, we also provided an analysis for the error propagation to estimate the prediction accuracy in terms of the data size. A novelty for SDEs and in the control setting is that our bounds explicitly depend on the number of data points (and not only in the infinite-data limit). Further, the proposed techniques provide the theoretical foundation for the Koopman-based approach (Peitz et al. 2020) to control-affine systems, which seems to be superior for control and particularly well-suited for MPC, since it avoids the curse of dimensionality w.r.t. the control dimension. In future work, we will focus on the application of the derived bounds in optimal and predictive control, in particular in combination with the recently obtained (control-uniform) projection error bounds of our follow-up work (Schaller et al. 2022).