Finite-Data Error Bounds for Koopman-Based Prediction and Control

The Koopman operator has become an essential tool for data-driven approximation of dynamical (control) systems, e.g., via extended dynamic mode decomposition. Despite its popularity, convergence results and, in particular, error bounds are still scarce. In this paper, we derive probabilistic bounds for the approximation error and the prediction error depending on the number of training data points, for both ordinary and stochastic differential equations while using either ergodic trajectories or i.i.d. samples. We illustrate these bounds by means of an example with the Ornstein–Uhlenbeck process. Moreover, we extend our analysis to (stochastic) nonlinear control-affine systems. We prove error estimates for a previously proposed approach that exploits the linearity of the Koopman generator to obtain a bilinear surrogate control system and, thus, circumvents the curse of dimensionality since the system is not autonomized by augmenting the state by the control inputs. To the best of our knowledge, this is the first finite-data error analysis in the stochastic and/or control setting. Finally, we demonstrate the effectiveness of the bilinear approach by comparing it with state-of-the-art techniques, showing its superiority whenever state and control are coupled.


Introduction
The Koopman framework [14] is the operator-theoretic basis for a wide range of data-driven methodologies to predict the evolution of nonlinear dynamical systems using linear techniques, see, e.g., [23,33] or the recent survey [5] and the references therein. The underlying concept is that observables, which may also be understood as outputs from the systems-and-control perspective, can be propagated forward in time using the linear yet infinite-dimensional Koopman operator or its generator, instead of simulating the nonlinear system and evaluating the observable functions. Its recent success is closely linked to numerically tractable approximation techniques like extended Dynamic Mode Decomposition (eDMD), see, e.g., [10,11,16,35] for numerical techniques and convergence results.
While the Koopman framework is well established, approximation results are typically only established in the infinite-data limit, i.e., if sufficient data is available. Recently, Lu and Tartakovsky [19] discussed error bounds w.r.t. DMD invoking the seminal work [16] by Korda and Mezić. While the authors numerically demonstrate the effectiveness of their approach even for nonlinear parabolic Partial Differential Equations (PDEs), see also their extension [20], there remains a significant gap from a more theoretical point of view since the approximation error is assumed to be zero for finite data, see [19, Remark 3.1]. Mamakoukas and coworkers [22] mimic a Taylor-series expansion based on a particular set of observables to approximate the system dynamics of an Ordinary Differential Equation (ODE). This work may be understood as a promising approach to incorporate (local) knowledge on the system dynamics in the Koopman framework. However, a bound on the prediction error in terms of data is not deduced. Error bounds for Koopman eigenvalues in terms of the finite-data estimation error were derived in [34], but the estimation error itself was not quantified. In [24], concentration inequalities were applied to bound the estimation error for the covariance and cross-covariance operators involved in Koopman estimation. In the exhaustive preprint [17], the authors treat the projection error for different approximation spaces such as, e.g., reproducing kernel Hilbert spaces and wavelets. The estimation error is also discussed briefly in Section 8.5 therein. In [36], besides providing a finite-data error bound on the approximation of the Koopman operator in the context of ODEs, the authors estimate the projection error by means of finite-element analysis. In conclusion, to the best of our knowledge, [17,36] are the only works providing rigorous error bounds for Koopman-based approximations of a dynamical system governed by a nonlinear ODE.
In this paper, we rigorously derive probabilistic bounds on the approximation error (or finite-data estimation error) and the (multi-step) prediction error for nonlinear Stochastic Differential Equations (SDEs). This, of course, also includes nonlinear ODEs. The deduced bounds on the approximation error and prediction accuracy explicitly depend on the number of data points used in eDMD. To this end, besides using concentration inequalities and a numerical error analysis to deal with the error propagation in time, we employ substantially different techniques in comparison to [36], providing an additional, alternative assumption based on ergodic sampling tailored to stationary SDEs. Further, we illustrate the error bounds for the Ornstein–Uhlenbeck process.
W.r.t. the application of Koopman theory in control, a lot of research has been invested over the past years, beginning with the popular DMD with control [30], which was later used in Model Predictive Control (MPC) [15]. Another popular method is to use a coordinate transformation into Koopman eigenfunctions [9] or the already mentioned component-wise Taylor series expansion [22]. In [21], the prediction error of the method proposed in [30] was estimated using the convergence result of [16]. However, the result is of purely asymptotic nature, i.e., it does not state a convergence rate in terms of data points. All approaches mentioned so far yield linear surrogate models of the form Ax + Bu, i.e., the control enters linearly. For general control-affine systems, numerical simulation studies indicate that bilinear surrogate models are better suited, see [4,8,26,29]. The technique proposed in [27,29] constructs its surrogate model from n_c + 1 autonomous Koopman operators, where n_c is the control dimension. The key feature is that the state-space dimension is not augmented by the number of control inputs, which counteracts the curse of dimensionality in comparison to the more widespread approach introduced in [15]. Compared to [29], we present a detailed analysis of the accuracy regarding both the dictionary size as well as the amount of training data. Even though the bound is rather coarse on the operator level, we demonstrate that it correctly captures the qualitative behavior. In this context, we provide a probabilistic bound on the approximation error of the projected Koopman generator, the projected Koopman semigroup and the respective trajectories. To this end, we extend our results towards nonlinear control systems. Besides a rigorous bound on the approximation error, we present estimates on the (auto-regressive) prediction accuracy, i.e., in an open-loop prediction (without feedback). This allows for a direct application of our results in MPC.
The paper is structured as follows. Firstly, in Section 2, we deduce a rigorous bound on the approximation error for nonlinear SDEs. Then, we extend our analysis to nonlinear control-affine systems in Section 3. In Section 4, two numerical simulation studies for the Ornstein–Uhlenbeck system (SDE) and the controlled Duffing equation (nonlinear control-affine system) are presented before conclusions are drawn in Section 5.

Finite-data bounds on the approximation error: nonlinear SDEs
In this section, we analyze the approximation quality of extended Dynamic Mode Decomposition (eDMD) with finitely many data points for the finite-dimensional stochastic differential equation

dX_t = F(X_t) dt + σ(X_t) dW_t, (SDE)

where X_t ∈ X ⊂ R^d is the state, F : X → R^d is the drift vector field, σ : X → R^{d×d} is the diffusion matrix field, and W_t is a d-dimensional Brownian motion. We assume that F, σ satisfy standard Lipschitz properties to ensure global existence of solutions to (SDE), see the textbook [25] for an introduction to this class of systems. We stress that the deterministic case is included by simply setting σ ≡ 0, leading to the ordinary differential equation

ẋ(t) = F(x(t)). (ODE)

The state space is assumed to be a measure space (X, Σ_X, µ) with Borel σ-algebra Σ_X and probability measure µ. In case of an ODE, the set X is often assumed to be compact and forward-invariant and the probability measure is the standard Lebesgue measure, cf. [36].
Definition 1 (Koopman operator). Let X_t satisfy (SDE) for t ≥ 0. The Koopman operator semigroup (K^t)_{t≥0} associated with (SDE) is defined by

(K^t f)(x) = E[f(X_t) | X_0 = x]

for all bounded measurable functions f.
In case of ergodic sampling, that is, obtaining data points from a single long trajectory, we will assume invariance of the measure µ w.r.t. the stochastic process X_t.
Definition 2 (Invariant measure with positive density). A probability measure µ is called invariant if it satisfies

∫_X K^t f dµ = ∫_X f dµ

for all bounded measurable functions f and all t ≥ 0. Further, µ has an everywhere positive density ρ : X → R_{>0} with respect to the Lebesgue measure.

We can now formulate our assumption on the underlying dynamics.
Assumption 3. Let either of the following hold: (a) The set X is compact and forward invariant (∀ x_0 ∈ X : P_{x_0}(X_t ∈ X) = 1 for all t ≥ 0) and µ is the normalized Lebesgue measure. Moreover, the Koopman operator can be extended to a strongly continuous semigroup on the Hilbert space L²_µ(X).
(b) The probability measure is an invariant measure in the sense of Definition 2.
We briefly comment on this assumption and first note that forward invariance of X can be weakened if one is only interested in estimates for states contained in X, see also [36, Section 3.2]. Moreover, if the dynamics obey an ODE, it was shown that the Koopman operator can indeed be extended to a strongly continuous semigroup on L²_µ(X), see also [36]. Second, the assumption of invariance of the underlying probability measure is satisfied for a broad class of SDEs, see, e.g., [32]. It can be checked that µ is then invariant for X_t, that is, P(X_t ∈ A) = µ(A) holds for all A ∈ Σ_X and t ≥ 0, provided X_0 is distributed according to µ. Under Assumption 3 (b), Definition 1 can be extended to the Lebesgue spaces L^p_µ(X), 1 ≤ p < ∞, i.e., the Banach spaces of all (equivalence classes of) measurable functions f : X → R with ∫_X |f|^p dµ < ∞. Then, the Koopman operators K^t form a strongly continuous semigroup of contractions on all spaces L^p_µ(X), see [1]. The functions in any of these spaces are often referred to as observables.
Next, we recall the definition of the generator associated to the semigroup K^t:

Definition 4 (Koopman generator). The infinitesimal generator L is defined via

Lf := lim_{t↘0} (K^t f − f)/t (1)

for all f ∈ D(L), where D(L) is the set of functions for which the limit (1) exists in the appropriate topology.
For sufficiently smooth functions f, Itô's lemma [25] shows that the generator acts as a second-order differential operator, defined in terms of the coefficients of (SDE), i.e.,

(Lf)(x) = F(x) · ∇f(x) + ½ (σ(x)σ(x)^⊤) : ∇²f(x), (2)

with A : B := Σ_{i,j} a_{i,j} b_{i,j} being the standard Frobenius inner product for matrices. In what follows, we will focus exclusively on the Koopman semigroup on the Hilbert space L²_µ(X) with inner product ⟨f, g⟩_µ = ∫_X f g dµ. As the semigroup is strongly continuous on L²_µ(X) by our assumptions, by standard semigroup theory, the domain D(L) together with the graph norm forms a Banach space densely embedded in L²_µ(X).
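As a concrete illustration (a hypothetical 1-D example, not taken from the paper): for a scalar SDE with drift F(x) = −x and constant diffusion σ(x) = √2, the second-order differential operator above reduces to (Lf)(x) = −x f′(x) + f″(x). A minimal sketch, assuming analytic derivatives of the observable are available:

```python
import numpy as np

# Hypothetical 1-D example: drift F(x) = -x, diffusion sigma(x) = sqrt(2),
# so sigma * sigma^T = 2 and the generator acts as (Lf)(x) = -x f'(x) + f''(x).
def generator_apply(df, d2f, x):
    F = -x                       # drift evaluated at the data points
    a = 2.0                      # sigma(x) sigma(x)^T
    return F * df(x) + 0.5 * a * d2f(x)

# Check on the observable f(x) = x^2: the formula gives Lf = -2x^2 + 2.
x = np.linspace(-2.0, 2.0, 5)
Lf = generator_apply(lambda x: 2 * x, lambda x: 2.0 + 0 * x, x)
print(np.allclose(Lf, -2 * x**2 + 2.0))  # True
```

The same pattern extends to multivariate observables by replacing the scalar derivatives with gradients and Hessians.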

Extended Dynamic Mode Decomposition
In this part, we introduce the data-driven finite-dimensional approximation by eDMD of the Koopman generator defined in (1), see, e.g., [10,12,35]. To this end, for a fixed set of linearly independent observables ψ_1, …, ψ_N ∈ D(L), we consider the finite-dimensional subspace

V := span{ψ_1, …, ψ_N}.

Let P_V denote the orthogonal projection onto V. We define the Galerkin projection of the Koopman generator by L_V := P_V L|_V. Note that this is not the restriction of L onto V, as the image is also projected back onto V. If V is an invariant subspace under the action of the generator, then L_V = L|_V holds. As dim V = N, the linear operator L_V : V → V may be represented by a matrix. In what follows, we denote the matrix representation of L_V in terms of the basis functions ψ_1, …, ψ_N by the same symbol L_V as the operator itself, in a slight abuse of notation. Thus, using [13], we get

L_V = C^{-1} A with C = (⟨ψ_i, ψ_j⟩_µ)_{i,j=1}^N, A = (⟨ψ_i, Lψ_j⟩_µ)_{i,j=1}^N.

The norm of the isomorphism from V to R^N depends on the smallest resp. largest eigenvalue of C, cf. Proposition 20 in Appendix A.1.
Consider data points x_0, …, x_{m−1} ∈ X. In the following, this data is either drawn from a trajectory of an ergodic system or sampled independent and identically distributed (i.i.d.). We state this as the following assumption.

Assumption 5. Let Assumption 3 hold and assume either of the following.
(iid) The data is drawn i.i.d. from the measure specified via Assumption 3.
(erg) Assumption 3.(b) holds and the data is obtained as snapshots from a single ergodic trajectory, that is, from a single long trajectory of the dynamics (SDE) with x_0 drawn from the unique invariant measure µ. Further assume that the Koopman semigroup is exponentially stable on the subspace L²_{µ,0}(X) of mean-zero observables, i.e., there is ω > 0 such that ‖K^t f‖_µ ≤ e^{−ωt} ‖f‖_µ for all f ∈ L²_{µ,0}(X) and t ≥ 0.

Let us form the transformed data matrices

Ψ(X) := [ψ(x_0) ⋯ ψ(x_{m−1})] ∈ R^{N×m}, (LΨ)(X) := [(Lψ)(x_0) ⋯ (Lψ)(x_{m−1})] ∈ R^{N×m},

where ψ = (ψ_1, …, ψ_N)^⊤.
The evaluation of Lψ_j can be realized via the representation (2). The empirical estimator for the Galerkin projection L_V is then given by

L̂_m := Ĉ_m^{-1} Â_m with Ĉ_m = (1/m) Ψ(X) Ψ(X)^⊤, Â_m = (1/m) Ψ(X) ((LΨ)(X))^⊤.

In all scenarios of Assumption 5, we have with probability one that (1) L̂_m is well-defined for large enough m, that is, Ĉ_m is invertible, and (2) L̂_m converges to L_V for m → ∞, see, e.g., [12,13].
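A minimal sketch of this estimator (the monomial dictionary, the OU-type dynamics, and the i.i.d. sampling are our illustrative assumptions; Ĉ_m⁻¹Â_m is one common matrix convention):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: dynamics dX = -X dt + sqrt(2) dW with invariant
# measure N(0, 1), and a monomial dictionary psi_j(x) = x^j, j = 1..N.
N, m = 3, 10000

def psi(x):
    return np.vstack([x**j for j in range(1, N + 1)])        # shape (N, m)

def Lpsi(x):
    # Generator action for this example: L x^j = -j x^j + j (j-1) x^{j-2}.
    rows = []
    for j in range(1, N + 1):
        drift = -j * x**j                                    # F(x) psi_j'(x), F(x) = -x
        diff = j * (j - 1) * x**(j - 2) if j >= 2 else 0.0 * x
        rows.append(drift + diff)
    return np.vstack(rows)

x = rng.standard_normal(m)          # i.i.d. samples from the invariant measure

PsiX, LPsiX = psi(x), Lpsi(x)
C_m = PsiX @ PsiX.T / m             # empirical mass matrix
A_m = PsiX @ LPsiX.T / m            # empirical stiffness matrix
L_m = np.linalg.solve(C_m, A_m)     # empirical Galerkin projection of L
```

Since L ψ_1 = −ψ_1 lies exactly in the dictionary span here, the first column of `L_m` should be close to (−1, 0, 0), up to statistical fluctuations of order 1/√m.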
For the case of a long trajectory, this result follows from ergodic theory, which is concerned with the convergence of time averages to spatial averages as the data size grows to infinity [2]. Ergodic theory particularly applies to systems with a unique invariant measure.

Error bounds on approximations of projected Koopman generator and operator
Next, we quantify the approximation quality of the data-driven finite-dimensional approximation of the Koopman generator, i.e., for a given linear space V of observables and data x_0, …, x_{m−1} ∈ X, we aim to estimate the error ‖L̂_m − L_V‖_F.

Concentration bounds for random matrices
We start by deriving entry-wise error bounds for the data-driven mass and stiffness matrix, respectively. Since most of the arguments are significantly simpler for i.i.d. sampling, cf. Remark 11 at the end of this subsection, we first consider the more involved situation, i.e., ergodic sampling. This is of particular interest as simulation data of the dynamics (SDE) can then be directly used. For x ∈ X, consider a centered scalar random variable φ ∈ L²_{µ,0}(X). We denote its variance with respect to the invariant measure by σ²_φ := ‖φ‖²_µ. Moreover, we set φ_k = φ(x_k) for given data points x_k, k ∈ {0, 1, …, m − 1}, and define the averaged random variable

φ̄_m := (1/m) Σ_{k=0}^{m−1} φ_k.

In Lemma 6 below, we quantify the variance of the averaged random variable φ̄_m. The key point is the decomposition of the variance into an asymptotic contribution, independent of m, and a second contribution, which decays with an explicitly given (polynomial) dependence on the amount of data m.
Lemma 6. Let Assumption 5.(erg) hold. Then we have

E[φ̄_m²] = (1/m) (σ²_{φ,∞} − R^φ_m).

The asymptotic variance σ²_{φ,∞} and the remainder term R^φ_m are given by

σ²_{φ,∞} = σ²_φ + 2 Σ_{l=1}^∞ ⟨φ, K^{l∆t} φ⟩_µ, R^φ_m = (2/m) Σ_{l=1}^{m−1} l ⟨φ, K^{l∆t} φ⟩_µ + 2 Σ_{l=m}^∞ ⟨φ, K^{l∆t} φ⟩_µ.

Proof. We repeat the proof given in [18, Section 3.1.2] for the sake of illustration: by stationarity of the sampled trajectory,

E[φ̄_m²] = (1/m²) Σ_{k,k'=0}^{m−1} ⟨φ, K^{|k−k'|∆t} φ⟩_µ = (1/m) σ²_φ + (2/m²) Σ_{l=1}^{m−1} (m − l) ⟨φ, K^{l∆t} φ⟩_µ.

The result follows by adding and subtracting the term 2 Σ_{l=m}^∞ ⟨φ, K^{l∆t} φ⟩_µ.

Remark 7.
The assumption of exponential stability is satisfied, for example, if the generator L is self-adjoint (also known as detailed balance or reversibility) and additionally satisfies a Poincaré or spectral gap inequality [18]. The requirement ⟨f, 1⟩_µ = 0 is necessary, as the constant function is invariant under K^t.
For reversible systems, we have ⟨φ, K^{l∆t} φ⟩_µ ≥ 0 by symmetry of the Koopman operator. Therefore, σ²_{φ,∞} ≥ σ²_φ > 0 is guaranteed in this case.

Next, we derive an estimate for the remainder term in terms of the number m of data points.
Lemma 9. Let Assumption 5.(erg) hold, and set q = e^{−ω∆t} < 1. Then

R^φ_m ≤ 2σ²_φ ( q/((1−q)² m) + q^m/(1−q) ).

Proof. We first observe that ⟨φ, K^{l∆t} φ⟩_µ ≤ ‖φ‖_µ ‖K^{l∆t} φ‖_µ ≤ σ²_φ q^l by the Cauchy–Schwarz inequality and exponential stability, and therefore

R^φ_m ≤ (2σ²_φ/m) Σ_{l=1}^{m−1} l q^l + 2σ²_φ Σ_{l=m}^∞ q^l ≤ (2σ²_φ/m) · q/(1−q)² + 2σ²_φ · q^m/(1−q).

In the second step, we have used the identity Σ_{l=1}^∞ l q^l = q/(1−q)², q < 1, for the first term and the geometric series for the second.
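The two closed-form series identities used in this argument can be checked numerically (ω∆t = 0.5 is an arbitrary illustrative choice):

```python
import numpy as np

q = np.exp(-0.5)                 # e.g., q = e^{-omega * dt} with omega * dt = 0.5
l = np.arange(1, 5000)           # partial sum; the tail is negligible for q < 1

geom = q / (1 - q)               # closed form of sum_{l>=1} q^l
arith_geom = q / (1 - q)**2      # closed form of sum_{l>=1} l q^l

print(np.isclose(np.sum(q**l), geom), np.isclose(np.sum(l * q**l), arith_geom))
# True True
```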
We can now combine the results of Lemmas 6 and 9 in order to obtain a concentration bound for a centered, matrix-valued random variable. To this end, we consider an N × N random matrix Φ with all entries φ_ij ∈ L²_{µ,0} centered. We define Φ_k and Φ̄_m as in the scalar case, i.e., Φ_k = Φ(x_k) and Φ̄_m = (1/m) Σ_{k=0}^{m−1} Φ_k.

Proposition 10. Let Assumption 5.(erg) hold, set q = e^{−ω∆t}, and assume σ²_{φ_ij,∞} > 0 for all (i, j). Let Φ ∈ R^{N×N} be a centered, matrix-valued random variable in L²_µ. Denote the matrices of all entry-wise variances and asymptotic variances by Σ_Φ and Σ_{Φ,∞}, respectively. Then, for any given δ > 0 and m ∈ N, we have with probability at least 1 − δ an entry-wise and a Frobenius-norm bound on Φ̄_m in terms of Σ_{Φ,∞}, δ, and m. For reversible systems, we obtain the simplified bound (5).

Proof. For each entry, the scalar Chebyshev inequality and the result of Lemma 6 yield for all (i, j):

P(|(Φ̄_m)_{ij}| ≥ ε_{ij}) ≤ (σ²_{φ_ij,∞} − R^m_{φ_ij}) / (m ε²_{ij}).

The term on the right-hand side does not exceed δ/N² for a suitable choice of ε_{ij}; in other words, there is a set of trajectories of probability at least 1 − δ/N² on which |(Φ̄_m)_{ij}| ≤ ε_{ij}. On the intersection of these sets, all entry-wise bounds hold simultaneously, and the probability of the intersection is at least 1 − δ by Lemma 21. In the reversible case, we know that R^m_{φ_ij} ≥ 0 for all (i, j), and therefore the variance in Chebyshev's inequality may be bounded by the asymptotic variance alone. The simplified bound (5) follows by repeating the above argument starting from this inequality.
Remark 11. In the case of i.i.d. sampling, the data points are uncorrelated, so the variance of each averaged entry is simply σ²_{φ_ij}/m, which leads to the following error estimate for fixed m ∈ N and δ > 0: with probability at least 1 − δ, |(Φ̄_m)_{ij}| ≤ N σ_{φ_ij} / √(δm) for all (i, j). The setting of sampling via the Lebesgue measure on a compact set X was thoroughly considered in [36].

Error bound for the projected generator
Next, we deduce our first main result by applying the probabilistic bounds obtained in Proposition 10 to estimate the error for the data-driven Galerkin projection L̂_m.
Theorem 12 (Approximation error: probabilistic bound). Let Assumption 5 hold. Then, for any error bound ε > 0 and probabilistic tolerance δ ∈ (0, 1), we have

P(‖L̂_m − L_V‖_F ≤ ε) ≥ 1 − δ

for any amount m ∈ N of data points such that the following holds:
• In case of ergodic sampling, i.e., Assumption 5.(erg), m satisfies Condition (4).
• In case of ergodic sampling, i.e., Assumption 5.(erg), of a reversible system, the simplified condition based on the asymptotic variances suffices.
• In case of i.i.d. sampling, i.e., Assumption 5.(iid), m may be chosen based on the entry-wise variances alone, cf. Remark 11.

Proof. In this proof, we will omit the subscript for the norm and set ‖·‖ = ‖·‖_F. Let us introduce the centered matrix-valued random variables Φ_C(x) := ψ(x)ψ(x)^⊤ − C and Φ_A(x) := ψ(x)(Lψ)(x)^⊤ − A. Hence, we may apply Proposition 10 to these matrix-valued random variables. First, by the choice of m above, the Frobenius-norm errors ‖Ĉ_m − C‖ and ‖Â_m − A‖ are controlled with probability at least 1 − δ.
Moreover, combining these bounds with a perturbation argument for the product Ĉ_m^{-1} Â_m, straightforward computations yield the claimed estimate (8).
A similar result as Theorem 12 was obtained for ODE systems in [36] under the assumption that the data is drawn i.i.d. An immediate consequence of the estimate on the generator approximation error is a bound on the error of the trajectories. To this end, consider the systems

ż(t) = L_V z(t), z(0) = z_0, and (d/dt) ẑ(t) = L̂_m ẑ(t), ẑ(0) = z_0,

where z_0 ∈ R^N, which represent ODEs in terms of the coefficients in the basis representation of elements of V. We will leverage the error bound obtained in Theorem 12 to derive an estimate on the resulting prediction error in the observables, i.e., ‖z(t) − ẑ(t)‖_2. Note that in view of the isomorphism V ≅ R^N, this also directly translates to an error estimate for trajectories in V.
Corollary 13. Let Assumption 5 hold. Then, for any T > 0 and δ, ε > 0, there is m_0 ∈ N such that for m ≥ m_0 data points we have, with probability at least 1 − δ, ‖z(t) − ẑ(t)‖_2 ≤ ε for all t ∈ [0, T].

A sufficient amount of data m_0 can easily be specified by combining the calculations displayed in the proof of Corollary 13, i.e., Gronwall's inequality and Condition (4). Under additional assumptions on the Koopman semigroup generated by L_V, e.g., stability, one can refine this estimate or render it uniform in T, cf. Corollary 23 in Appendix A.3.
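The Gronwall-type propagation of a generator error into a trajectory error can be sketched numerically. The 2×2 matrices below are hypothetical, and the coarse estimate ‖z(t) − ẑ(t)‖ ≤ ε t e^{(‖L_V‖+ε)t} ‖z_0‖ for ‖L̂_m − L_V‖ ≤ ε is a standard consequence of Gronwall's inequality, not the paper's exact constant:

```python
import numpy as np

def expm_taylor(A, K=60):
    # Truncated Taylor series for the matrix exponential; adequate for small ||A||.
    E, T = np.eye(len(A)), np.eye(len(A))
    for k in range(1, K):
        T = T @ A / k
        E = E + T
    return E

L = np.array([[-1.0, 0.5], [0.0, -2.0]])            # hypothetical projected generator L_V
eps = 1e-2
E_pert = eps * np.array([[0.6, -0.8], [0.0, 0.0]])  # perturbation with spectral norm exactly eps
L_hat = L + E_pert                                  # plays the role of the estimator

z0 = np.array([1.0, -1.0])
normL = np.linalg.norm(L, 2)

for t in np.linspace(0.1, 2.0, 5):
    err = np.linalg.norm(expm_taylor(L * t) @ z0 - expm_taylor(L_hat * t) @ z0)
    bound = eps * t * np.exp((normL + eps) * t) * np.linalg.norm(z0)
    assert err <= bound                             # Gronwall-type bound holds pathwise
print("bound verified")
```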

Error bound for the projected Koopman operator
Similar to the derivation of the probabilistic bound on the projected generator, a bound on the Koopman operator is possible. We briefly sketch the main steps of the argumentation. Let t = l∆t for some l ∈ N and again choose a subspace V = span{ψ_1, …, ψ_N} ⊂ L²_µ(X) (which, in contrast to the generator-based setting, is not required to be contained in the domain). The restricted Koopman operator on this subspace is defined via K^t_V := P_V K^t|_V, with matrix representation K^t_V = C^{-1} A^t, where A^t = (⟨ψ_i, K^t ψ_j⟩_µ)_{i,j=1}^N. The empirical estimator is then defined similarly to the generator setting via K̂^t_m := Ĉ_m^{-1} Â^t_m, where Â^t_m averages the products ψ(x_k) ψ(y_k)^⊤ over the snapshot pairs (x_k, y_k) with y_k the state observed time t after x_k. We now present the analogue of Theorem 12 for the Koopman operator, which follows by straightforward adaptations of the results of Section 2.2.

Theorem 14. Let Assumption 5 hold. Then, for t ≥ 0, any error bound ε > 0 and probabilistic tolerance δ ∈ (0, 1), there is m_0 ∈ N such that for any m ≥ m_0,

P(‖K̂^t_m − K^t_V‖_F ≤ ε) ≥ 1 − δ.

A sufficient amount of data m_0 can be specified analogously to Theorem 12.

Extension to control systems
In this section, we derive probabilistic bounds on the approximation error of nonlinear control-affine SDE systems of the form

dX_t = (F(X_t) + Σ_{i=1}^{n_c} u_i G_i(X_t)) dt + σ(X_t) dW_t, (11)

with input u ∈ R^{n_c} and state X_t ∈ X, where F : X → R^n and G_i : X → R^n, i = 1, …, n_c, are locally Lipschitz-continuous vector fields. In the deterministic case σ ≡ 0, the controlled SDE reduces to the control-affine ODE system

ẋ(t) = F(x(t)) + Σ_{i=1}^{n_c} u_i(t) G_i(x(t)).

We will describe how one can apply the bounds on the generators of autonomous (SDE) systems obtained in Section 2 in order to obtain bounds for prediction of control systems, either for i.i.d. or ergodic sampling.
Central in this part is the fact that the Koopman generators of control-affine systems are themselves control-affine. More precisely, if L^ū denotes the Koopman generator for a control-affine system with constant control ū ∈ R^{n_c} and ū = Σ_{i=1}^r α_i ū_i is a linear combination of constant controls ū_i ∈ R^{n_c} with Σ_{i=1}^r α_i = 1, we have

L^ū = Σ_{i=1}^r α_i L^{ū_i}.

This easily follows from the representation (2) of the Koopman generator, see also [29, Theorem 3.2] for the special (deterministic) case σ ≡ 0. We will utilize this property to invoke our results from Section 2 to approximate the Koopman generators corresponding to basis elements of the control space, that is, L^{e_i}, i = 1, …, n_c, and L^0 corresponding to the drift term, to form the bilinear control system L^u = L^0 + Σ_{i=1}^{n_c} u_i (L^{e_i} − L^0) in the observables.
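The assembly of the bilinear surrogate from the n_c + 1 generator matrices can be sketched as follows (matrices, dimensions, and the left-action convention are our illustrative assumptions):

```python
import numpy as np

def bilinear_generator(L0, Le, u):
    """Assemble L^u = L^0 + sum_i u_i (L^{e_i} - L^0) from the generator
    matrices L0 (drift, u = 0) and Le[i] (unit constant control e_i)."""
    return L0 + sum(ui * (Li - L0) for ui, Li in zip(u, Le))

def rollout(L0, Le, z0, u_traj, dt):
    """Explicit-Euler rollout of the bilinear surrogate dz/dt = L^{u(t)} z.
    (Whether the matrix acts from the left or transposed depends on the
    chosen coordinate convention; left action is assumed here.)"""
    z, out = np.array(z0, float), []
    for u in u_traj:
        out.append(z.copy())
        z = z + dt * bilinear_generator(L0, Le, u) @ z
    out.append(z.copy())
    return np.array(out)

# Hypothetical 3-observable example with a single control input.
rng = np.random.default_rng(3)
L0, L1 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
u_traj = [np.array([np.sin(0.1 * k)]) for k in range(100)]
Z = rollout(L0, [L1], np.array([1.0, 0.0, 0.0]), u_traj, dt=1e-2)
```

By construction, u = 0 recovers `L0` and u = e_1 recovers `L1`, mirroring the affinity property above.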
Analogously to Assumption 5 we have the following two cases for the collected data and the underlying measure.
Assumption 15. Let either of the following hold:

(iid) The data for each autonomous system with control u = e_i, i = 0, …, n_c, is sampled i.i.d. either from the normalized Lebesgue measure on a compact set X or from an invariant measure µ_i in the sense of Definition 2.
(erg) The data for each autonomous system with control u = e_i, i = 0, …, n_c, satisfies Assumption 5.(erg), i.e., it is drawn from a single ergodic trajectory, the probability measure µ_i of the resulting autonomous SDE is invariant in the sense of Definition 2, and the Koopman semigroup is exponentially stable on L²_{µ_i,0}(X).

It is important to note that in the first case (iid), we did not make any assumption of invariance of the set X for all autonomous systems corresponding to the constant controls e_i, i = 0, …, n_c, as this would be very restrictive. Hence, we have to ensure that the state trajectories remain (with probability one in the stochastic setting (11)) in the set X. Sufficient conditions are, e.g., controlled forward invariance of the set X or knowing that the initial condition is contained in a suitable sub-level set of the optimal value function of a respective optimal control problem, see, e.g., [3] or [7] for an illustrative application of such a technique in showing recursive stability of Model Predictive Control (MPC) without stabilizing terminal constraints for discrete- and continuous-time systems, respectively.
In the following, we set O_i = L²_{µ_i}(X), i = 1, …, n_c, and consider the generators L^{e_i} in these spaces, respectively. Further, let ψ_1, …, ψ_N : X → R be N linearly independent observables whose span V = span{ψ_1, …, ψ_N} satisfies V ⊂ ⋂_{i=0}^{n_c} D(L^{e_i}), where e_i, i = 1, …, n_c, denote the standard basis vectors of R^{n_c} and e_0 := 0. We now discuss two cases of sampling, one corresponding to the approach of Section 2 and one to the standard case of i.i.d. sampling as in [36].
As the original system and the Koopman generator are control-affine, the remainder of this section is split into two parts. First, we derive error estimates corresponding to the autonomous systems driven by the n_c + 1 constant controls. Second, we use these estimates and control affinity to deduce a result for general controls. In accordance with the notation in Section 2, we define L^{e_i}_V := P_V L^{e_i}|_V and also use this symbol to denote the matrix representation of this linear operator w.r.t. the basis {ψ_1, …, ψ_N} of V. Its approximation based on the data x_0, …, x_{m−1} ∈ X will be denoted by L̂^{e_i}_m.

Proposition 16. Let i ∈ {0, …, n_c} be given and Assumption 15 hold. Then, for any pair consisting of a desired error bound ε > 0 and a probabilistic tolerance δ ∈ (0, 1), there is a number of data points m_i such that for any m ≥ m_i, we have the estimate

P(‖L̂^{e_i}_m − L^{e_i}_V‖_F ≤ ε) ≥ 1 − δ.

The minimal amount of data m_i is given by the formulas of Theorem 12.
Proof.The claim follows immediately from applying Theorem 12.
Having obtained an estimate for the autonomous systems corresponding to the constant controls e_i, i = 0, …, n_c, we can leverage the control affinity of the system to formulate the corresponding results for arbitrary controls. To this end, for any control u(t) = Σ_{i=1}^{n_c} α_i(t) e_i ∈ L^∞(0, T; R^{n_c}), we define the projected Koopman generator and its approximation corresponding to the non-autonomous system with control u by

L^{u(t)}_V := L^0_V + Σ_{i=1}^{n_c} α_i(t) (L^{e_i}_V − L^0_V), L̂^{u(t)}_m := L̂^0_m + Σ_{i=1}^{n_c} α_i(t) (L̂^{e_i}_m − L̂^0_m).

Theorem 17. Let Assumption 15 hold. Then, for any pair consisting of a desired error bound ε > 0 and probabilistic tolerance δ ∈ (0, 1), prediction horizon T > 0, and control function u ∈ L^∞(0, T; R^{n_c}), we have

P( ess sup_{t∈[0,T]} ‖L̂^{u(t)}_m − L^{u(t)}_V‖_F ≤ ε ) ≥ 1 − δ,

provided that the number m of data points exceeds max_{i=0,…,n_c} m_i, with m_i defined as in Proposition 16 for a suitably scaled accuracy depending on n_c and ‖u‖_{L^∞}.

Proof. Again, we omit the subscript of the norm and set ‖·‖ = ‖·‖_F. Using the result of Proposition 16 and our choice of m, the bounds on ‖L̂^{e_i}_m − L^{e_i}_V‖ hold for all i ∈ {0, …, n_c} on suitable sets of high probability. Then we compute for a.e. t ∈ [0, T]

‖L̂^{u(t)}_m − L^{u(t)}_V‖ ≤ ‖L̂^0_m − L^0_V‖ + Σ_{i=1}^{n_c} |α_i(t)| (‖L̂^{e_i}_m − L^{e_i}_V‖ + ‖L̂^0_m − L^0_V‖).

Next, we use Lemma 21 from Appendix A.2 with d = n_c + 1 to bound the probability of the intersection of the events for i = 0, …, n_c. Taking the essential supremum over t yields the result.
Again, similar to the previous section, we obtain a bound on trajectories via Gronwall's inequality, provided the state response is contained in X.
Corollary 18. Let Assumption 15 hold. Let T, ε > 0, δ ∈ (0, 1), z_0 ∈ R^N and u ∈ L^∞(0, T; R^{n_c}) such that the solution of (SDE) is contained in X with probability one. Then there is m_0 ∈ N such that for m ≥ m_0, the solutions z, ẑ of the exact and the data-driven bilinear surrogate model satisfy ‖z(t) − ẑ(t)‖_2 ≤ ε for all t ∈ [0, T] with probability at least 1 − δ.

As in Corollary 13, m_0 can explicitly be computed by combining Theorem 17 with the constants in Gronwall's inequality.
We conclude this section with a final corollary regarding the optimality of the solution obtained using an error-certified Koopman model. To this end, we consider the optimal control problem (15) with x_0 ∈ X and a stage cost ℓ : R^n × R^{n_c} → R. In what follows, we compare the optimal value of the Koopman representation (16) of (15), projected onto the subspace of observables V with initial datum z_0, to the optimal value of the surrogate-based control problem (17), where P maps a trajectory of observables to a trajectory in the state space, which in practice is frequently realized by including the coordinates of the identity function in the dictionary Ψ of observables.
Corollary 19. Let T, ε > 0, δ ∈ (0, 1), z_0 ∈ R^N, let J be locally Lipschitz continuous and let Assumption 15 hold. Furthermore, let (z*, α*) be an optimal solution of problem (16) such that the state response of (15) emanating from the control α* is contained in X. Then there is m_0 ∈ N such that for m ≥ m_0 data points contained in X, there exists a tuple (z, α) which is feasible for (17) and whose cost deviates from that of (z*, α*) by at most a constant multiple of the error bound ε.

Numerical examples
In this section, we first present numerical experiments on the derived error bound for the Koopman generator, and then discuss the implications for optimal control. In particular, we emphasize that the bilinear Koopman model from Section 3 appears to be the best approach for a straightforward transfer of predictive error bounds to the control setting.

Generator Error Bounds: Ornstein-Uhlenbeck Process
We begin by investigating the validity and accuracy of the error bounds for the Galerkin matrices of a single SDE system, as derived in Proposition 10. To this end, we consider the one-dimensional reversible Ornstein–Uhlenbeck (OU) process (18).

[Figure 1. A: Exact invariant density µ (black), compared to histograms of the first m points of an exemplary trajectory, for various data sizes m. B: Error bounds for C corresponding to confidence level 1 − δ = 0.9, showing both the theoretical estimates obtained in Proposition 10 (blue) and the data-based estimates obtained as described in the text (red): the maximal error over all entries C_ij (dots), the average error over all matrix entries (squares), and the Frobenius norm errors ‖Ĉ_m − C‖_F. C: The same as B for the matrix A.]
As the spectrum of the generator L of the OU process, as well as its invariant density, are known in analytical form, we can exactly calculate the Galerkin matrices C, A, all variances σ²_{Φ_ij}, and asymptotic variances σ²_{Φ_ij,∞}, if we consider a basis set comprised of monomials, see Appendix A.4.
We consider monomials of maximal degree four (i.e., N = 4), and set the discrete integration time step to ∆t = 10^{−3}. For a range of different data sizes m and confidence levels δ, we estimate the minimal error ε that can be achieved with probability 1 − δ for a variety of quantities of interest. We calculate ε for all individual entries C_ij and A_ij using inequality (6). Moreover, we also calculate ε for the Frobenius norm errors in C and A by means of (5).
In order to compare our bound to the real error, we conduct 500 identical experiments. For each experiment, we generate an independent simulation of the OU process (18), with initial condition drawn from the invariant distribution. For each trajectory and each of the data sizes m considered, we estimate the matrices Ĉ_m, Â_m. We then calculate the absolute entry-wise errors to C and A, as well as the Frobenius norm errors ‖Ĉ_m − C‖_F and ‖Â_m − A‖_F. Finally, we numerically compute the (1 − δ)-percentile of each of these errors for all confidence levels δ considered above (i.e., the error ε below which 450 of the 500 repeated experiments lie). These can be directly compared to the probabilistic bounds ε obtained from our theoretical estimates.
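A simplified sketch of this repeated-experiment protocol (fewer trials, a shorter trajectory, a smaller dictionary {x, x²}, and unit OU parameters are our illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler--Maruyama simulation of the OU process dX = -X dt + sqrt(2) dW
# (hypothetical parameters). Exact mass matrix for the dictionary {x, x^2}
# under the invariant measure N(0, 1): C = [[E x^2, E x^3], [E x^3, E x^4]].
trials, m, dt, delta = 200, 5000, 1e-2, 0.1
C = np.array([[1.0, 0.0], [0.0, 3.0]])

x = rng.standard_normal(trials)                 # each trial starts in N(0, 1)
S = np.zeros((trials, 2, 2))                    # running sums of psi psi^T
for _ in range(m):
    P = np.stack([x, x**2], axis=1)             # (trials, 2) dictionary values
    S += P[:, :, None] * P[:, None, :]
    x = x - x * dt + np.sqrt(2 * dt) * rng.standard_normal(trials)

errs = np.abs(S / m - C).reshape(trials, -1).max(axis=1)
eps_emp = np.quantile(errs, 1 - delta)          # empirical (1 - delta)-percentile
print(f"empirical error percentile: {eps_emp:.3f}")
```

The quantile `eps_emp` is the data-based counterpart of the theoretical ε from Proposition 10.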
The results are shown in Figure 1. We can see in panels B and C that our estimates for individual entries of the Galerkin matrices C and A are quite accurate, as the data-based error is over-estimated by only a factor of two to three. Our estimates for Frobenius norm errors are less accurate, with approximately one order of magnitude difference between theoretical and data-based errors. It can be concluded that the factor N in (5) is too coarse in this example, as the actual Frobenius norm error only marginally exceeds the maximal entry-wise error. Nevertheless, the qualitative behaviour of all theoretical error bounds is confirmed by the data.

Extension to control systems
In this section, we illustrate our findings for deterministic as well as stochastic systems regarding prediction and control. We compare the solution of the exact model to the bilinear system (19), consisting of the project-and-lift step z(t) = ψ(P(z(t))) followed by the bilinear dynamics in the lifted variable z, where n_c is the dimension of the control input u, and P is the projection of the lifted state z onto the full state x ∈ X. Note that the first line, i.e., the project-and-lift step, is not required if the space V spanned by {ψ_k}_{k=1}^N is a Koopman-invariant subspace [31]. Moreover, it becomes less and less important the more the dynamics of L̂_m are truly restricted to V, or, alternatively, if we are not interested in long-term predictions, for instance in the MPC setting. Besides the bilinear model (19), we also compare the true solution to the linear model obtained via eDMD with control, see [15, 30] for details. Optimality of the computed trajectories from a theoretical standpoint will not be addressed here, as the error bounds for L̂_m are still too large. However, the principled approach is to choose an m such that Corollary 19 holds.
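A minimal sketch of the surrogate's prediction loop follows, assuming a scalar state, a monomial dictionary, and generator matrices L0 and L1 estimated for the constant inputs u = 0 and u = 1; the bilinear combination L_u = L0 + u·(L1 − L0) exploits the affine dependence of the generator on u. All names and the read-off projection are illustrative.

```python
import numpy as np

def psi(x, N=3):
    """Monomial dictionary (1, x, x^2, ..., x^N) for a scalar state."""
    return x ** np.arange(N + 1)

def bilinear_step(z, u, L0, L1, dt):
    """One explicit Euler step of the bilinear surrogate
    z' = (L0 + u*(L1 - L0)) z, linear in the control u."""
    Lu = L0 + u * (L1 - L0)
    return z + dt * Lu @ z

def predict(x0, u_seq, L0, L1, dt, N=3, lift_every_step=True):
    """Predict the state trajectory with the project-and-lift variant:
    after each step, project onto the state (coefficient of x) and
    re-lift through the dictionary."""
    z = psi(x0, N)
    xs = [x0]
    for u in u_seq:
        z = bilinear_step(z, u, L0, L1, dt)
        x = z[1] / z[0]      # hypothetical projection P: read off the x coordinate
        if lift_every_step:
            z = psi(x, N)    # re-lift; unnecessary if span(psi) is Koopman-invariant
        xs.append(x)
    return np.array(xs)
```

For a Koopman-invariant dictionary the re-lift is a no-op; for a finite lag time it counteracts the drift of the lifted state away from the image of ψ.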
For the numerical discretization, we use eDMD with a finite lag time to obtain a discrete-time version of (19) in the case of the Duffing system, which corresponds to an explicit Euler discretization [29]. For the Ornstein-Uhlenbeck example, we calculate the generator using gEDMD [13] and then obtain the resulting discrete-time version by taking the matrix exponential. In the case of eDMD with control, we use the standard algorithm from [15], which also results in a forward Euler version of the linear system ż = Âz + B̂u, i.e., (20), where we have again added the project-and-lift step necessary for high prediction accuracy over long time horizons.
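The two discretizations differ only in how the lifted linear system is advanced over one lag time h; a small sketch, where the generator matrix L_hat stands in for a gEDMD estimate:

```python
import numpy as np
from scipy.linalg import expm

def discrete_koopman(L_hat, h):
    """Exact time-h transition matrix of z' = L_hat z
    (gEDMD generator + matrix exponential, as for the OU example)."""
    return expm(h * L_hat)

def discrete_koopman_euler(L_hat, h):
    """Forward-Euler variant, as implicitly produced by eDMD(c) at a
    finite lag time h."""
    return np.eye(L_hat.shape[0]) + h * L_hat
```

The two agree up to O(h²), which is one reason why, in the experiments below, smaller lag times are advantageous for the eDMD-based models.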

Duffing equation (ODE)
The first system we study is the Duffing oscillator (21), with α = −1, β = 1 and δ = 0. Note that the control does not enter linearly, which is a well-known challenge for DMDc [29].
As the dictionary ψ, we choose monomials with varying maximal degrees, and we also include square and cubic roots for comparison. For the data collection process, we simulate the system with constant control inputs u = 0 and u = 1 using the standard Runge-Kutta scheme of fourth order with time step h = 0.005. As the final time, we choose T = n_lag·h seconds, where n_lag is the integer number of time steps we step forward by the discrete-time Koopman operator model. We perform experiments for both n_lag = 1 and n_lag = 10. Each trajectory yields one tuple (x, y) = (x(0), x(T)), and we sample various numbers m of data points with uniformly distributed random initial conditions over the rectangle [−1.5, 1.5]². Fig. 2 shows the prediction accuracy for m = 100 and n_lag = 10, where excellent agreement is observed for the bilinear surrogate model. In particular, the relative error, where x̂(t) = P(z(t)) is the solution obtained via the surrogate model, is below 0.1 percent for almost 3 seconds, whereas the eDMDc approach has a large error of ≈ 10% from the start and becomes unstable within the first second.
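The data collection loop can be sketched as follows. The Duffing vector field from (21) is replaced by a placeholder linear field f for illustration, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def rk4_step(f, x, u, h):
    """Classical fourth-order Runge-Kutta step for x' = f(x, u), constant u."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def collect_pairs(f, m, u, h=0.005, n_lag=10, box=1.5):
    """m training tuples (x(0), x(T)) with T = n_lag * h and
    x(0) uniform on [-box, box]^2."""
    X = rng.uniform(-box, box, size=(m, 2))
    Y = X.copy()
    for _ in range(n_lag):
        Y = np.array([rk4_step(f, y, u, h) for y in Y])
    return X, Y
```

Running this once per constant input (u = 0 and u = 1) produces the two data sets from which the autonomous Koopman models are estimated.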
To study the influence of the size of the training data set, Fig. 3 shows boxplots of the one-step prediction accuracy for various m. Each boxplot was obtained by performing 20 trainings of a bilinear system according to the procedure described above. After each training, a single time step was made with 1000 uniformly drawn random initial conditions x_0 ∈ [−1.5, 1.5]² and control inputs u ∈ [0, 1]. Consequently, each boxplot consists of 2·10⁴ data points. We see that, as expected, the training error decreases for larger m. Remarkably, a saturation can be observed already at m = 30 for an ODE system. Beyond that, no further improvement can be seen, which demonstrates the advantage of (i) the linearity of the Koopman approach and (ii) the usage of autonomous systems for the model reduction process. Interestingly, the lag time between two consecutive data points has a critical impact on the maximal accuracy in the control case. This is due to the fact that the bilinear surrogate model is only exact for the Koopman generator [29]. For a finite lag time, the bilinear model is a first-order approximation, such that smaller lag times are advantageous. Nevertheless, the accuracy still significantly surpasses that of the eDMDc approach.
Another interesting observation can be made with respect to the choice of the dictionary ψ. Fig. 4 shows a comparison of the mean errors (analogous to the red bars in Fig. 3) for various dictionaries. We observe excellent performance for monomials of degree three or larger. The addition of roots of x is not beneficial at all; in particular, smaller dictionaries are favorable in terms of the data requirements, which is in agreement with our error analysis and was also reported in [28]. Next, we study the stabilization of the system (21) for the final time T = 5. Using the time discretization as above and a straightforward single-shooting method, this yields the tracking problem (22), where x_ref is the reference trajectory to be tracked. Fig. 5 demonstrates the performance for x_ref = 0 with models that were obtained using only m = 25 training samples for each of the Koopman approximations, where almost perfect agreement with the solution using the full system is achieved. In contrast, the eDMDc approximation fails for System (21), even when initializing with the optimal solution from the full system.

Ornstein-Uhlenbeck process (SDE)
For the stochastic setting, we consider an Ornstein-Uhlenbeck process with a control input, with α = 1, β = 2 and u(t) ∈ [0, 1]. The system is simulated numerically using an Euler-Maruyama integration scheme with a time step of 10⁻³ as in Section 4.1. For both constant inputs u = 0 and u = 1, we calculate the corresponding Koopman generator using the gEDMD procedure presented in [13] with monomials up to degree five. We then calculate the corresponding Koopman operators for the time step h = 0.05 using the matrix exponential.
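The simulation and the expected-value prediction target can be sketched as follows; the drift form −αX + βu is an assumption consistent with the stated parameters, and the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def euler_maruyama(x0, u_of_t, T, dt=1e-3, alpha=1.0, beta=2.0, sigma=1.0):
    """Euler-Maruyama path of the controlled OU process
    dX = (-alpha*X + beta*u(t)) dt + sigma dW  (assumed drift form)."""
    n = int(round(T / dt))
    x = x0
    for k in range(n):
        x = x + (-alpha * x + beta * u_of_t(k * dt)) * dt \
              + sigma * np.sqrt(dt) * rng.normal()
    return x

def mean_prediction(x0, u_of_t, T, n_samples=100, **kw):
    """Approximate E[X_T] by averaging independent simulations,
    as done for the comparisons in Fig. 6."""
    return np.mean([euler_maruyama(x0, u_of_t, T, **kw) for _ in range(n_samples)])
```

This Monte Carlo average is exactly what the Koopman surrogate predicts deterministically, which is why only the SDE side of the comparison needs sampling.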
To study the prediction performance (cf. Fig. 6), we proceed in the same way as for the Duffing system, except that we now compare the expected values, approximated by averaging over 100 SDE simulations. The results are very similar to the deterministic case, where the performance of both surrogate modeling techniques is comparable when the control enters linearly, and very poor for eDMDc otherwise. Even though the Ornstein-Uhlenbeck process is stochastic, the linearity is highly favorable for the data requirements. We do not observe any considerable deterioration even in the very low data limit.
Finally, in the control setting, we aim at tracking the expected value E[X_t], which is precisely the quantity that is predicted by the Koopman operator. Thus, Problem (22) can directly be applied to SDEs as well. In order to compare the results to the full system, we average over 20 simulations in the evaluation of the objective function value when using the SDE. However, this appears to be insufficient, as the performance is inadequate, cf. Fig. 7. The bilinear surrogate model, on the other hand, shows very good performance with as few as m = 100 training data points.
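Problem (22) applied to the surrogate can be sketched as a single-shooting optimization over piecewise-constant controls; the generator matrices below are illustrative stand-ins for the gEDMD estimates (dictionary (1, x)), not the paper's actual matrices.

```python
import numpy as np
from scipy.optimize import minimize

def shoot(u_seq, z0, L0, L1, dt):
    """Simulate the bilinear surrogate z' = (L0 + u*(L1 - L0)) z with
    explicit Euler and return the predicted states x_t = z_t[1]."""
    z, xs = z0.copy(), []
    for u in u_seq:
        z = z + dt * (L0 + u * (L1 - L0)) @ z
        xs.append(z[1])
    return np.array(xs)

def track(x_ref, z0, L0, L1, dt, bounds=(0.0, 1.0)):
    """Single-shooting tracking problem:
    min_u sum_t |x_t(u) - x_ref_t|^2  subject to  u_t in [0, 1]."""
    n = len(x_ref)
    obj = lambda u: np.sum((shoot(u, z0, L0, L1, dt) - x_ref) ** 2)
    res = minimize(obj, 0.5 * np.ones(n), bounds=[bounds] * n,
                   method="L-BFGS-B")
    return res.x, res.fun
```

Because only the cheap surrogate is simulated inside the objective, no Monte Carlo averaging is needed during optimization, in contrast to the SDE-based control.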

Conclusions
We presented the first rigorously derived probabilistic bounds on the finite-data approximation error for the Koopman generator of SDEs and nonlinear control systems. Furthermore, by using slightly more advanced techniques from probability theory, we relaxed the assumption of i.i.d. data invoked in [36] in the ODE setting. Moreover, we provided an analysis of the error propagation to estimate the prediction accuracy in terms of the data size. A novelty for SDEs and in the control setting is that our bounds explicitly depend on the number of data points (and not only hold in the infinite-data limit). Further, the proposed techniques provide the theoretical foundation for the Koopman-based approach [29] to control-affine systems, which seems to be superior for control and particularly well-suited for MPC, since it avoids the curse of dimensionality w.r.t. the control dimension.

A. Appendix
) and B ∈ R^{n×n} be its corresponding matrix representation. Then the operator norm and the matrix norm are equivalent, where C_{i,j} = ⟨ψ_i, ψ_j⟩_{L²_µ(X)}. Proof. This follows from the identity relating the coefficient norm to the function norm, which shows the equivalence of the vector norms. This induces the desired equivalence of the operator norms.

Proof (Corollary 13). Using the bound of Lemma 22, we obtain

  ‖z(t) − z̃(t)‖₂ ≤ ‖L̂_m − L^V‖₂ · t · e^{t‖L̂_m‖₂} e^{t‖L^V‖₂} = t ‖L̂_m − L^V‖₂ e^{t(‖L^V‖₂ + ‖L̂_m‖₂)}.

A.2. A technical lemma
We compute

  P(‖z(t) − z̃(t)‖₂ ≤ ε)
    ≥ P( t ‖L̂_m − L^V‖₂ e^{t‖L^V‖₂} e^{t‖L̂_m‖₂} ‖z₀‖ ≤ ε )
    ≥ P( t ‖L̂_m − L^V‖₂ e^{2t‖L^V‖₂} e^{t‖L̂_m − L^V‖₂} ‖z₀‖ ≤ ε )
    ≥ P( T ‖L̂_m − L^V‖₂ e^{2T‖L^V‖₂} e^{T‖L̂_m − L^V‖₂} ‖z₀‖ ≤ ε ).

By Theorem 12 and ‖·‖₂ ≤ ‖·‖_F, for any ε we can choose m₀ such that P(‖L̂_m − L^V‖₂ ≤ ε) ≥ 1 − δ. Hence, there is m₀, only depending on T, z₀, L^V and ε, such that the above probability exceeds 1 − δ for any t ≥ 0. Taking the minimum over all t ∈ [0, T] proves the claim.
Proof (Corollary 18). This proof follows with obvious modifications of the argument above, using the bound on the error of the time-dependent generators from Theorem 17.
with respect to the inner product with weight function µ, the density of a normal distribution with variance one half, yielding the orthogonality relations (24). The monomial basis can be recovered from the eigenfunction basis ψ_i by the representation formula (25). For a basis set comprised of monomials up to maximal degree N, the Galerkin matrices C and A can be obtained from the moments of the normal distribution with variance 0.5:

  C_ij = (i+j−1)!! / 2^{(i+j)/2} and A_ij = −(ij/2) · (i+j−3)!! / 2^{(i+j−2)/2} if (i + j) is even, and C_ij = A_ij = 0 if (i + j) is odd.
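These moment formulas can be cross-checked numerically. The sketch below assumes the OU generator L f = −x f′ + ½ f″ and a dictionary of monomials x, …, x^N (the constant function is omitted for simplicity); A_ij then follows from integration by parts, ⟨ψ_i, Lψ_j⟩ = −½ ⟨ψ_i′, ψ_j′⟩.

```python
import numpy as np

def moment(n):
    """E[x^n] under N(0, 1/2): (n-1)!! / 2^(n/2) for even n, 0 for odd n."""
    if n % 2:
        return 0.0
    df = 1.0
    for k in range(n - 1, 0, -2):
        df *= k
    return df / 2 ** (n // 2)

def galerkin_matrices(N):
    """Exact Galerkin matrices for monomials x, ..., x^N w.r.t. the OU
    invariant density:  C_ij = m_{i+j},  A_ij = -(i*j/2) * m_{i+j-2},
    with m_n the n-th moment of N(0, 1/2)."""
    idx = np.arange(1, N + 1)
    C = np.array([[moment(i + j) for j in idx] for i in idx])
    A = np.array([[-(i * j / 2) * moment(i + j - 2) for j in idx] for i in idx])
    return C, A
```

Both matrices are symmetric, and A is negative semi-definite, reflecting the self-adjointness of the OU generator on L²_µ.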
For their numerical estimation, we consider centered random variables. We calculate the asymptotic variance of the scalar random variable φ_ij if it is defined by either of the two expressions above. We also introduce the quantity n := i + j for C, or n := i + j − 2 for A. The analytical expressions for C_ij and A_ij above exactly equal the terms corresponding to H_0 in the general expansion for the monomial x^n in (25). As the random variables φ_ij are centered, no contribution from H_0 is left. Thereby, we obtain the decomposition for φ_ij (up to the factor −ij/2 for the estimation of A). Next, we calculate matrix elements with the Koopman operator at lag time l∆t by combining (26) with the orthogonality relation (24). Finally, by setting q_k = e^{−(n−2k)∆t}, we calculate the asymptotic variance according to the result in Lemma 6 (note that the contribution for l = 0 appears only once, and that the

Figure 1: Numerical results for the one-dimensional OU process (18). A: Exact invariant density µ in black, compared to histograms of the first m points of an exemplary trajectory, for various data sizes m. B: Error bounds for C corresponding to confidence level 1 − δ = 0.9. We show both the theoretical estimates obtained in Proposition 10 (blue) and the data-based estimates obtained as described in the text (red): the maximal error over all entries C_ij (dots), the average error over all matrix entries (squares), and the Frobenius norm errors ‖C̃_m − C‖_F. C: The same as B for the matrix A.

Figure 2: Comparison of the ODE solution, the bilinear surrogate model and the linear model obtained via eDMDc for the system (21), for a random control input with u(t) ∈ [−1, 1].

Figure 3: Left: Boxplot of the relative one-step prediction error over 20 training runs and 1000 different samples (x_0, u) in each run, for a dictionary of monomials up to degree at most five and n_lag = 1. Right: The influence of the lag time as well as the control input on the mean accuracy (the dashed line with triangle symbols corresponds to the left plot). We see that the lag time plays an important role in the control setting.

Figure 4: Mean relative one-step prediction errors for various dictionaries and data set sizes m.

Figure 5: Control performance using the true ODE model (black) and the bilinear surrogate model (orange). The results are almost indistinguishable, whereas eDMDc fails.

Figure 6: Prediction accuracy for the expected value of the Ornstein-Uhlenbeck process (approximated by averaging over 100 simulations) for the bilinear system and eDMDc, respectively.

Figure 7: Control of the expected value of the Ornstein-Uhlenbeck process (approximated by averaging over 100 simulations using the optimal control input shown in the bottom plots). In the SDE-based control, we used 20 simulations in each objective function evaluation.