Infinite-dimensional bilinear and stochastic balanced truncation with error bounds

Following the ideas of Curtain and Glover, we extend the balanced truncation method for infinite-dimensional linear systems to bilinear and stochastic systems. Specifically, we apply Hilbert space techniques used in many-body quantum mechanics to establish error bounds for the truncated system and to prove convergence results. The functional analytic setting allows us to obtain mixed Hardy space error bounds for both finite- and infinite-dimensional systems, and it is then applied to the model reduction of stochastic evolution equations driven by Wiener noise.


Introduction
Model reduction of bilinear systems has become a major field of research, partly triggered by applications in optimal control and the advancement of iterative numerical methods for solving large-scale matrix equations. High-dimensional bilinear systems often appear in connection with semi-discretised controlled partial differential equations or stochastic (partial) differential equations with multiplicative noise. A popular class of model reduction methods that is well-established in the field of linear systems theory is based on first transforming the system to a form in which highly controllable states are highly observable and vice versa ("balancing"), and then eliminating the least controllable and observable states. For finite-dimensional linear systems, balanced truncation and residualisation (a.k.a. singular perturbation approximation) feature computable error bounds and are known to preserve important system properties, such as stability or passivity [G84]; see also [A05] and the references therein. For a generalisation of (linear) balanced truncation to infinite-dimensional systems, see [CG86,GO14].
For bilinear systems, no theory as elaborate as in the linear case is available; in particular, approximation error bounds for the reduced system are not known. The purpose of this paper therefore is to extend balanced truncation to bilinear and stochastic evolution equations, specifically, to establish convergence results and prove truncation error bounds for bilinear and stochastic systems. We start by introducing a function space setting that allows us to define bilinear balanced truncation in arbitrary (separable) Hilbert spaces and that extends the finite-dimensional theory. However, instead of just extending the finite-dimensional theory to infinite dimensions, we harness the functional analytic machinery available in infinite dimensions to obtain new error bounds for finite-dimensional systems as well.
The figure of merit in our analysis is a Hankel-type operator acting between certain function spaces which are ubiquitous in many-body quantum mechanics, where they are called Fock spaces. We show that, under mild assumptions on the dynamics, the Hankel operator is a Hilbert-Schmidt or even trace-class operator. The key idea is that the algebraic structure of the Fock space, namely a direct sum of tensor products of copies of a Hilbert space, mimics the nested Volterra kernels representing the bilinear system. This allows us to analyse the singular value decomposition of this operator along the lines of the linear theory developed by Curtain and Glover [CG86]. For more recent treatments of infinite-dimensional linear systems we refer to [GO14], [RS14], and [S11]. For applications of the bilinear method to finite-dimensional open quantum systems and Fokker-Planck equations we refer to [HSZ13] and [SHSS11].
The remainder of the article is structured as follows. The rest of the introduction fixes the notation used throughout the article and states the main results. Section 2 introduces the concept of balancing based on observability and controllability (or reachability) properties of bilinear systems, which is then used in Section 3 to define the Fock space-valued Hankel operator and to study properties of its approximants. The global error bounds for the finite-rank approximation based on the singular value decomposition of the Hankel operator are given in Section 4. Finally, in Section 5 we discuss applications of the aforementioned results to the model reduction of stochastic evolution equations driven by multiplicative Lévy noise. An appendix records a technical lemma stating the Volterra series representation of the solution to infinite-dimensional bilinear systems.
Set-up and main results. Let $X$ be a separable Hilbert space and $A: D(A) \subset X \to X$ the generator of an exponentially stable $C_0$-semigroup $(T(t))_{t\ge0}$ of bounded operators, i.e. a strongly continuous semigroup that satisfies $\|T(t)\| \le Me^{-\omega t}$ for some $\omega > 0$ and $M \ge 1$.
We consider the bilinear evolution equation with bounded operators $N_1, \dots, N_n \in L(X)$, control operator $B \in L(\mathbb{R}^n, X)$ and control $u = (u_1, \dots, u_n)$,
$$\varphi'(t) = A\varphi(t) + \sum_{i=1}^n u_i(t)N_i\varphi(t) + Bu(t), \qquad \varphi(0) = \varphi_0. \tag{1.1}$$
It follows from standard fixed-point arguments that such equations always have unique mild solutions $\varphi \in C([0,T], X)$ [LY95, Proposition 5.3] that satisfy
$$\varphi(t) = T(t)\varphi_0 + \int_0^t T(t-s)\Big(\sum_{i=1}^n u_i(s)N_i\varphi(s) + Bu(s)\Big)\,ds. \tag{1.2}$$
We assume that $M^2\Gamma^2(2\omega)^{-1} < 1$ and introduce the observability Gramian $\mathcal{O} = W^*W$ and the reachability Gramian $\mathcal{P} = RR^*$ for equation (1.1) in Definition 2.1. These Gramians are factorised by an observability map $W$ and a reachability map $R$ that are constructed explicitly in Section 3. The Hankel operator is then defined as $H = WR$. From the Hankel operator construction we obtain two immediate corollaries. The full Lyapunov equations for bilinear or stochastic systems are notoriously difficult to solve. It is therefore often more convenient [B17] to compute a $k$-th order truncation of the Gramians, which we introduce in Definition 3.5. Our first result implies exponentially fast convergence of the singular values calculated from the truncated Gramians to the singular values obtained from the full Gramians $\mathcal{O}$ and $\mathcal{P}$:
Proposition 1.1. Let $(\sigma_i)_{i\in\mathbb{N}}$ denote the balanced singular values $\sigma_i := \sqrt{\lambda_i(\mathcal{O}\mathcal{P})}$ and $(\sigma_i^k)_{i\in\mathbb{N}}$ the singular values of the $k$-th order truncated Gramians. The Hankel operator $H_k$ computed from the $k$-th order truncated Gramians converges in Hilbert-Schmidt norm to $H$, and for all $i \in \mathbb{N}$, $\sigma_i^k \to \sigma_i$ exponentially fast as $k \to \infty$.
Although our framework includes infinite-dimensional systems, such systems are almost always approximated numerically by finite-dimensional systems. Let $V_1 \subset V_2 \subset \dots \subset X$ be a nested sequence of closed subspaces such that $\overline{\bigcup_{n\in\mathbb{N}} V_n} = X$, and assume that each $V_n$ is an invariant subspace of $T(t)$ and of the operators $N_i$. In this case, $V_n$ is also an invariant subspace of the generator $A$ of the semigroup [EN00, Chapter 2, Section 2.3], and we can consider the restriction of (1.1) to $V_n$. By Proposition 1.2 in Section 3, the Hankel operators of the restricted systems converge to $H$ in nuclear norm if the observability map $W$ is a Hilbert-Schmidt operator, and in Hilbert-Schmidt norm if $W$ is only bounded. Sufficient conditions for $W$ to be a Hilbert-Schmidt operator will be presented in Lemma 3.4.
Norm convergence of Hankel operators implies convergence of their singular values, so the convergence of Hankel singular values also holds under the assumptions of Proposition 1.2.
We then turn to global error bounds for bilinear systems. For linear systems, the existence of a Hardy space $H^\infty$ error bound is well known and is a major justification of the linear balanced truncation method in both theory and practice: the $H^\infty$ norm of the difference of the transfer functions of the full and the reduced system is bounded by twice the sum of the Hankel singular values that are discarded in the reduction step. To the best of our knowledge, there is no such bound for bilinear systems, and we are only aware of two recent results in that direction [R17] and [R18].
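For the linear case ($N_i = 0$) the folklore bound can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses NumPy, a small random stable system, dense Kronecker-product Lyapunov solves (viable only for small state dimensions) and the square-root form of balanced truncation; all helper names are hypothetical. A frequency grid only yields a lower estimate of the $H^\infty$ norm, so the comparison with $2\sum_{i>r}\sigma_i$ is a consistency check rather than a proof.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 by a dense Kronecker solve (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    X = np.linalg.solve(np.kron(A, I) + np.kron(I, A), -Q.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2  # remove round-off asymmetry

def psd_factor(S):
    """Return F with S = F F^T for a symmetric positive semidefinite S."""
    lam, U = np.linalg.eigh(S)
    return U * np.sqrt(np.clip(lam, 0.0, None))

def balanced_truncation(A, B, C, r):
    """Square-root balanced truncation of a stable linear system (A, B, C) to order r."""
    P = lyap(A, B @ B.T)                    # reachability Gramian
    O = lyap(A.T, C.T @ C)                  # observability Gramian
    K, V = psd_factor(O).T, psd_factor(P)   # O = K^T K, P = V V^T
    U, s, Zt = np.linalg.svd(K @ V)         # s = Hankel singular values (descending)
    S = np.diag(s[:r] ** -0.5)
    Wt = S @ U[:, :r].T @ K                 # left projection, r x n
    T = V @ Zt[:r].T @ S                    # right projection, n x r, Wt @ T = I_r
    return Wt @ A @ T, Wt @ B, C @ T, s

def transfer(A, B, C, w):
    """Evaluate G(iw) = C (iw I - A)^{-1} B."""
    return C @ np.linalg.solve(1j * w * np.eye(A.shape[0]) - A, B)

rng = np.random.default_rng(0)
n, m, p, r = 8, 2, 2, 3
A0 = rng.standard_normal((n, n))
A = A0 - (np.linalg.eigvals(A0).real.max() + 1.0) * np.eye(n)  # spectral abscissa -1
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r)
freqs = np.concatenate(([0.0], np.logspace(-3, 3, 600)))
err = max(np.linalg.norm(transfer(A, B, C, w) - transfer(Ar, Br, Cr, w), 2) for w in freqs)
bound = 2 * hsv[r:].sum()
print(f"sampled H-infinity error {err:.3e} <= 2 * (sum of discarded HSVs) = {bound:.3e}")
```

Only non-negative frequencies are sampled because $G(-i\omega)$ is the complex conjugate of $G(i\omega)$ for real system matrices.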
In [BD10], a family of transfer functions $(G_k)_{k\in\mathbb{N}_0}$ for bilinear systems was introduced. We consider the difference of those functions for two systems and write $\Delta(G_k)$ for the difference of transfer functions and $\Delta(H)$ for the difference of Hankel operators. In terms of these two quantities we obtain an error bound that extends the folklore bound for linear systems to the bilinear case:
Theorem 1. Consider two bilinear systems that both satisfy the stability condition $M^2\Gamma^2(2\omega)^{-1} < 1$ with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$. The difference of the transfer functions of the two systems $\Delta(G_k)$ in mixed $H^\infty$-$H^2$ Hardy norms, defined in (1.4), is bounded by
We recall that for a trace-class operator $T$, the nuclear (trace) norm $\|T\|_{S_1}$ is the $\ell^1$ sum of the singular values of $T$. Thus, if $\Delta(H)$ is the difference of the Hankel operators for the full and reduced system, then the nuclear norm coincides with the $\ell^1$ sum of the truncated singular values. The proof of Theorem 1 follows the lines of the linear balancing theory and extends the bound $2\|\Delta(H)\|_{S_1}$ on the $H^\infty$ norm of the transfer function for linear equations to the bilinear setting. From the Hankel estimates we then obtain an explicit error bound on the dynamics of two systems with zero initial condition:
Theorem 2. Consider two bilinear systems that both satisfy the stability condition $M^2\Gamma^2(2\omega)^{-1} < 1$ with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$. Let $\Delta(C\varphi(t))$ be the difference of the outputs of the two systems. For control functions $u \in L^\infty((0,\infty),\mathbb{R}^n) \cap L^2((0,\infty),\mathbb{R}^n)$ with $\|u\|_{L^2((0,\infty),(\mathbb{R}^n,\|\cdot\|_\infty))} < \sqrt{2\omega}/(M\Xi)$, where $\Xi := \sum_{i=1}^n \|N_i\|$, and zero initial conditions, it follows that
As an application of the theoretical results, we discuss generalised stochastic balanced truncation of stochastic (partial) differential equations in Section 5. The links between bilinear balanced truncation and stochastic balanced truncation are well known for finite-dimensional systems driven by Wiener noise (see [BD11]).
In Section 5, we extend the Hankel operator methods not only to the finite-dimensional stochastic systems covered in [BD14] and [BR15], but also to a large class of infinite-dimensional bilinear stochastic systems. By pursuing an approach similar to the linear setting, we obtain an error bound on the expected output in terms of the Hankel singular values:
Proposition 1.3. Consider two stochastic systems with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$. Let $u \in L^p((0,\infty),\mathbb{R}^n)$ for $p \in [1,\infty]$ be a deterministic control and let $\Phi$ and $\tilde\Phi$ be the stochastic flows of the two systems. The two stochastic flows are assumed to be exponentially stable in the mean square sense and to define $C_b$-Markov semigroups. The difference $\Delta(CY)$ of the processes $Y$ defined in (5.4) with zero initial conditions then satisfies
$$\|\mathbb{E}\,\Delta(CY_\bullet(u))\|_{L^p((0,\infty),\mathbb{R}^m)} \le 2\,\|\Delta(H)\|_{S_1(L^2(\Omega\times(0,\infty),\mathbb{R}^n),\,L^2(\Omega\times(0,\infty),\mathbb{R}^m))}\,\|u\|_{L^p((0,\infty),\mathbb{R}^n)}.$$
A conceptually stronger bound, however, is obtained by arguing along the lines of the bilinear framework:
Theorem 3. Consider two stochastic systems with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$ such that the respective stochastic flows $\Phi$ and $\tilde\Phi$ are independent. The two stochastic flows are assumed to be exponentially stable in the mean square sense and to define $C_b$-Markov semigroups. The difference $\Delta(CY)$ of the processes $Y$ defined in (5.4) with zero initial conditions satisfies

Finite-dimensional intermezzo. Hitherto, stochastic and bilinear balanced truncation have only been considered for finite-dimensional systems, so we devote a few preliminary remarks to this setting. For finite-dimensional systems, one computes the observability and reachability Gramians $\mathcal{O}$ and $\mathcal{P}$ from the Lyapunov equations and decomposes these symmetric positive-definite matrices as $\mathcal{O} = K^*K$ and $\mathcal{P} = VV^*$ with some (non-unique) matrices $K$ and $V$. In the next step, a singular value decomposition of the matrix $KV$ is computed. The singular values of $KV$ are the square roots $\sigma_j := \sqrt{\lambda_j(\mathcal{O}\mathcal{P})}$ of the eigenvalues of $\mathcal{O}\mathcal{P}$, independently of the particular choice of $K$ and $V$ (zero is not counted as a singular value here).
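The independence of the singular values from the chosen factorisation can be checked numerically in finite dimensions. The sketch below assumes NumPy and takes the bilinear Gramians to be the solutions of the standard finite-dimensional generalised Lyapunov equations $A\mathcal{P} + \mathcal{P}A^\top + \sum_i N_i\mathcal{P}N_i^\top + BB^\top = 0$ and $A^\top\mathcal{O} + \mathcal{O}A + \sum_i N_i^\top\mathcal{O}N_i + C^\top C = 0$; the system is small and random, and all names are hypothetical. It compares a Cholesky factorisation and an eigendecomposition-based factorisation with the square roots of the eigenvalues of $\mathcal{O}\mathcal{P}$.

```python
import numpy as np

def gen_lyap(A, Ns, Q):
    """Solve A X + X A^T + sum_i N_i X N_i^T + Q = 0 (dense Kronecker solve, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(A, I) + np.kron(I, A) + sum(np.kron(N, N) for N in Ns)
    X = np.linalg.solve(L, -Q.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2

def eig_factor(S):
    """Return F with S = F F^T for a symmetric positive semidefinite S."""
    lam, U = np.linalg.eigh(S)
    return U * np.sqrt(np.clip(lam, 0.0, None))

rng = np.random.default_rng(0)
n, m, p = 4, 2, 2                       # state, input and output dimensions
G = rng.standard_normal((n, n))
A = -(G @ G.T / n + np.eye(n))          # symmetric, all eigenvalues <= -1
Ns = [0.5 * N / np.linalg.norm(N, 2) for N in rng.standard_normal((m, n, n))]  # ||N_i|| = 0.5
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

P = gen_lyap(A, Ns, B @ B.T)                       # reachability Gramian
O = gen_lyap(A.T, [N.T for N in Ns], C.T @ C)      # observability Gramian

# Two different factorisations O = K^T K and P = V V^T
K1, V1 = np.linalg.cholesky(O).T, np.linalg.cholesky(P)
K2, V2 = eig_factor(O).T, eig_factor(P)

s_chol = np.linalg.svd(K1 @ V1, compute_uv=False)
s_eig = np.linalg.svd(K2 @ V2, compute_uv=False)
s_ref = np.sqrt(np.clip(np.sort(np.linalg.eigvals(O @ P).real)[::-1], 0.0, None))
print(s_chol, s_eig, s_ref)
```

The Cholesky variant requires strictly positive definite Gramians, which holds generically for such random data; the eigendecomposition variant also works for singular Gramians.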
By discarding a certain number of "small" singular values of $KV$, one can reduce the order of the system by applying the balancing transformations [RS14, 5.11] to the system matrices. A paradigm of such a decomposition $KV$, where $K$ and $V$ are operators rather than matrices, is the Hankel operator $H$ studied in this paper. All such decompositions are equivalent [RS14, Theorem 5.1] up to unitary transformations $U_1: \operatorname{ran}(H) \to \operatorname{ran}(KV)$ and $U_2: \ker(H)^\perp \to \ker(KV)^\perp$ such that $H|_{\ker(H)^\perp} = U_1^*KV|_{\ker(KV)^\perp}U_2$. Let $(\sigma_i)_i$ be the singular values of some compact operator $T$; then the $p$-th Schatten norm is defined as $\|T\|_{S_p} = (\sum_i \sigma_i^p)^{1/p}$. All Schatten norms are invariant under unitary transformations. Thus, the error bounds for the Hankel decomposition apply to any other decomposition as well. More precisely, let $H = U_H\Sigma_HV_H^*$ be a singular value decomposition of the Hankel operator and $KV = U_{KV}\Sigma_{KV}V_{KV}^*$ the one for $KV$. The difference of the Hankel operators for the full and reduced system, restricted to $\ker(H)^\perp$, is then unitarily equivalent to the difference of $KV$ for the two systems. Thus, although the Hankel decomposition is of limited immediate practical use itself, it possesses a rich algebraic structure that is discussed in this paper. The Hankel decomposition finally allows us to obtain error bounds on the difference of the dynamics of the reduced and full system in terms of the sum of truncated singular values, i.e. the $S_1$-norm of $\Delta(H)$.
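The balancing transformations can be sketched in the square-root form familiar from the linear case. The code below is a minimal illustration, assuming NumPy, the finite-dimensional generalised Lyapunov equations for the bilinear Gramians and Cholesky factors $\mathcal{O} = K^\top K$, $\mathcal{P} = VV^\top$; all names are hypothetical. From the singular value decomposition $KV = U\Sigma Z^\top$ it builds the projections $W_r^\top = \Sigma_r^{-1/2}U_r^\top K$ and $T_r = VZ_r\Sigma_r^{-1/2}$, which satisfy $W_r^\top T_r = I_r$; for $r = n$ the transformed system has both Gramians equal to $\operatorname{diag}(\sigma_1, \dots, \sigma_n)$.

```python
import numpy as np

def gen_lyap(A, Ns, Q):
    """Solve A X + X A^T + sum_i N_i X N_i^T + Q = 0 (dense Kronecker solve, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(A, I) + np.kron(I, A) + sum(np.kron(N, N) for N in Ns)
    X = np.linalg.solve(L, -Q.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2

def gramians(A, Ns, B, C):
    """Observability and reachability Gramians of the bilinear system (A, N_i, B, C)."""
    P = gen_lyap(A, Ns, B @ B.T)
    O = gen_lyap(A.T, [N.T for N in Ns], C.T @ C)
    return O, P

def bilinear_balanced_truncation(A, Ns, B, C, r):
    """Project (A, N_i, B, C) onto the r dominant balanced states (square-root method)."""
    O, P = gramians(A, Ns, B, C)
    K, V = np.linalg.cholesky(O).T, np.linalg.cholesky(P)   # O = K^T K, P = V V^T
    U, s, Zt = np.linalg.svd(K @ V)                         # K V = U diag(s) Z^T
    S = np.diag(s[:r] ** -0.5)
    Wt = S @ U[:, :r].T @ K                                 # r x n
    T = V @ Zt[:r].T @ S                                    # n x r
    reduced = (Wt @ A @ T, [Wt @ N @ T for N in Ns], Wt @ B, C @ T)
    return reduced, s

rng = np.random.default_rng(1)
n, m, p = 4, 2, 2
G = rng.standard_normal((n, n))
A = -(G @ G.T / n + np.eye(n))
Ns = [0.5 * N / np.linalg.norm(N, 2) for N in rng.standard_normal((m, n, n))]
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

(Ab, Nbs, Bb, Cb), s = bilinear_balanced_truncation(A, Ns, B, C, n)   # full balancing
Ob, Pb = gramians(Ab, Nbs, Bb, Cb)
print(np.round(np.diag(Pb), 6), np.round(s, 6))
(Ar, Nrs, Br, Cr), _ = bilinear_balanced_truncation(A, Ns, B, C, 2)  # reduced model of order 2
```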
Notation. The space of bounded linear operators between Banach spaces $X, Y$ is denoted by $L(X, Y)$, and just by $L(X)$ if $X = Y$. The operator norm of a bounded operator $T \in L(X, Y)$ is written as $\|T\|$. The $p$-th Schatten class is denoted by $S_p(X, Y)$. In particular, we recall that for a linear operator $T \in S_1(X, Y)$, where $X$ and $Y$ are separable Hilbert spaces, the nuclear norm can be expressed in terms of orthonormal bases (ONB) $(e_i)$ of $X$ and $(f_i)$ of $Y$ as
$$\|T\|_{S_1(X,Y)} = \sup_{(e_i),(f_i)} \sum_i |\langle f_i, Te_i\rangle_Y|. \tag{1.3}$$
We write $\partial B_X(1)$ for the unit sphere of a Banach space $X$ and say that $g = O(f)$ if there is $C > 0$ such that $g \le Cf$. In order not to specify the constant $C$, we also write $g \lesssim f$. The domain of an unbounded operator $A$ is denoted by $D(A)$.
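The Schatten norms and the characterization (1.3) are easy to illustrate for matrices. The following NumPy sketch (function names hypothetical) computes $\|T\|_{S_p}$ from the singular values, checks unitary invariance, and evaluates $\sum_i |\langle f_i, Te_i\rangle|$ both for the singular vectors, where the supremum in (1.3) is attained, and for random orthonormal bases, where the sum stays below $\|T\|_{S_1}$.

```python
import numpy as np

def schatten(T, p):
    """Schatten p-norm (p = np.inf gives the operator norm) from the singular values."""
    s = np.linalg.svd(T, compute_uv=False)
    return s.max() if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

def pairing_sum(T, E, F):
    """sum_i |<f_i, T e_i>| for orthonormal columns e_i of E and f_i of F."""
    k = min(E.shape[1], F.shape[1])
    return np.abs(np.einsum("ij,ij->j", F[:, :k], T @ E[:, :k])).sum()

rng = np.random.default_rng(2)
T = rng.standard_normal((5, 3))
Q1, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal matrices
Q2, _ = np.linalg.qr(rng.standard_normal((3, 3)))
U, s, Vt = np.linalg.svd(T)

nuclear = schatten(T, 1)
print(nuclear, schatten(Q1 @ T @ Q2, 1))            # unitary invariance
print(pairing_sum(T, Vt.T, U), pairing_sum(T, Q2, Q1))
```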
Let $H$ be a separable Hilbert space. For the $n$-fold Hilbert space tensor product of a Hilbert space $H$ we write $H^{\otimes n} := H \otimes \dots \otimes H$. To define the Hankel operator we require a decomposition of the positive Gramians. For this purpose, we introduce the Fock space $F^n(H)$ of $H$-valued functions
$$F^n(H) := \bigoplus_{k=1}^\infty F^n_k(H), \quad \text{where} \quad F^n_k(H) := L^2\big((0,\infty)^k, H \otimes (\mathbb{R}^n)^{\otimes(k-1)}\big) \quad \text{and} \quad F^n_0(H) := H.$$
Thus, elements of the Fock space $F^n$ are sequences whose $k$-th entries take values in $F^n_k$. The indicator function of an interval $I$ will be denoted by $\mathbb{1}_I$. Let $\mathbb{C}_+$ be the right complex half-plane; then we define the $H$-valued Hardy spaces $H^2$ and $H^\infty$ of multivariable holomorphic functions $F: \mathbb{C}_+^k \to H$ with finite norms
respectively. We also introduce mixed $L^1_iL^2_{k-1}$ and $H^\infty_iH^2_{k-1}$ norms, which for $H$-valued functions $f: (0,\infty)^k \to H$ and $g:$
Finally, for $k$-variable functions $h$ we occasionally use the short notation
In Section 5, the space $L^p_{\mathrm{ad}}$ denotes the $L^p$ spaces of stochastic processes that are adapted to some given filtration, and we introduce the notation $\Omega_I := I \times \Omega$, where $I$ is some interval.

The pillars of bilinear balanced truncation
We start with the definition of the Gramians on X which extend the standard definition on finite-dimensional spaces [ZL02, Eq. (6) and (7)] to arbitrary separable Hilbert spaces.
2.1. Gramians. Let $H$ be a separable Hilbert space and $C \in L(X, H)$ the state-to-output (observation) operator. The space $H$ is called the output space and, as we assume that there are $n$ control functions, the space $\mathbb{R}^n$ will be referred to as the input space. We then introduce the bilinear Gramians for times $t_i \in (0,\infty)$:
with $e_i$ denoting the standard basis vectors of $\mathbb{R}^n$.
Let $M^2\Gamma^2(2\omega)^{-1} < 1$; then the bounded operators $\mathcal{O}_k$ defined for $x, y \in X$ by
are summable in operator norm. The limiting operator is given by $\mathcal{O} :=$
Similarly, for the reachability Gramian, let $\mathcal{P}_0(t_1) := T(t_1)^*$. Then, we define for $i \ge 1$ and $y \in X$
The control operator $B \in L(\mathbb{R}^n, X)$ shall be of the form $Bu = \sum_{i=1}^n \psi_i u_i$ with $\psi_i \in X$. Let $M^2\Gamma^2(2\omega)^{-1} < 1$; the reachability Gramian is then defined as $\mathcal{P} := \sum_{k=0}^\infty \mathcal{P}_k \in S_1(X)$. The $S_1(X)$-convergence follows from the characterization (1.3) of the nuclear norm as, for any orthonormal systems $(e_i)$,
Assumption 1. We assume that $M^2\Gamma^2(2\omega)^{-1} < 1$ holds, so that both $\mathcal{O}$ and $\mathcal{P}$ exist.
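In finite dimensions, the summability of the Gramian series can be observed directly. The sketch below assumes NumPy and a symmetric negative definite $A$, so that $M = 1$ and $\omega = -\lambda_{\max}(A)$. It computes the summands recursively from Lyapunov equations, $A\mathcal{P}_0 + \mathcal{P}_0A^\top + BB^\top = 0$ and $A\mathcal{P}_k + \mathcal{P}_kA^\top + \sum_i N_i\mathcal{P}_{k-1}N_i^\top = 0$, which is one standard way to evaluate such iterated integrals in finite dimensions. Each summand is then at most $q := M^2\sum_i\|N_i\|^2/(2\omega)$ times the previous one in operator norm, and the partial sums approach the solution of the generalised Lyapunov equation. The names and the factor $q$ are illustrative assumptions, not notation from the text.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 by a dense Kronecker solve (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    X = np.linalg.solve(np.kron(A, I) + np.kron(I, A), -Q.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2

def gramian_summands(A, Ns, Q0, k_max):
    """X_0 solves A X + X A^T + Q0 = 0; X_k is driven by sum_i N_i X_{k-1} N_i^T."""
    terms = [lyap(A, Q0)]
    for _ in range(1, k_max):
        terms.append(lyap(A, sum(N @ terms[-1] @ N.T for N in Ns)))
    return terms

rng = np.random.default_rng(3)
n, m = 5, 2
G = rng.standard_normal((n, n))
A = -(G @ G.T / n + np.eye(n))                  # symmetric: ||e^{At}|| = e^{-omega t}, M = 1
omega = -np.linalg.eigvalsh(A).max()
Ns = [0.5 * N / np.linalg.norm(N, 2) for N in rng.standard_normal((m, n, n))]
B = rng.standard_normal((n, m))
q = sum(np.linalg.norm(N, 2) ** 2 for N in Ns) / (2 * omega)   # contraction factor, < 1 here

terms = gramian_summands(A, Ns, B @ B.T, 25)
norms = [np.linalg.norm(Pk, 2) for Pk in terms]
ratios = [norms[k] / norms[k - 1] for k in range(1, len(norms))]

# Reference: solve the full generalised Lyapunov equation in one go
I = np.eye(n)
L = np.kron(A, I) + np.kron(I, A) + sum(np.kron(N, N) for N in Ns)
P_full = np.linalg.solve(L, -(B @ B.T).reshape(-1)).reshape(n, n)
print(f"q = {q:.3f}, largest observed ratio = {max(ratios):.3f}")
```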
As in the finite-dimensional case [ZL02, Theorems 3 and 4], the Gramians satisfy certain Lyapunov equations. However, these equations hold only in a weak sense if the generator $A$ of the semigroup is unbounded.
Lemma 2.2. For all $x_1, y_1 \in D(A)$ and all $x_2, y_2 \in D(A^*)$ (2.1)
Proof. We restrict ourselves to the proof of the first identity, since the proof of the second one is completely analogous. Let $x \in D(A)$; then
Similarly, for $x \in D(A)$ and $k \ge 1$
Finally, we may use the polarization identity to obtain (2.1).
Analogously to [BD11, Theorem 3.1] for finite-dimensional systems, we obtain the following eponymous properties of the Gramians.
Lemma 2.3. All elements $\varphi_0 \in \ker(\mathcal{O})$ are unobservable in the homogeneous system, i.e. the solution to
Proof.
We start by showing that $\ker(\mathcal{O})$ is an invariant subspace of the semigroup $(T(t))$. Let $x \in \ker(\mathcal{O})$; then for all $t \ge 0$ and all $k$
Lemma 2.4. The closure of the range of the reachability Gramian $\mathcal{P}$ is an invariant subspace of the flow of (1.1).
Proof. Analogous to the preceding one.

Hankel operators on Fock spaces
To decompose the observability Gramian as O = W * W and the reachability Gramian as P = RR * , we commence by defining the observability and reachability maps.
Then, by Assumption 1, we can define the observability map $W \in L(X, F^n(H))$ as $W(x) := (W_k(x))_{k\in\mathbb{N}_0}$. An explicit calculation shows that $W^*$ is given by
Similarly to the decomposition of the observability Gramian, we introduce a decomposition of the reachability Gramian $\mathcal{P} = RR^*$. Let
The adjoint operators of the $R_k$ are the operators
If the Gramians exist, then the reachability map is defined as
Its adjoint is given by
To see that $R_k$ is a Hilbert-Schmidt operator, we take an ONB $(e_i)$ of $F^n_{k+1}(\mathbb{R}^n)$ such that the $e_i$ are tensor products of an ONB of $L^2((0,\infty),\mathbb{R})$ and standard unit vectors of $\mathbb{R}^n$, and an arbitrary ONB (3.1)
One can then check that the maps $W$ and $R$ indeed decompose the Gramians as $\mathcal{O} = W^*W$ and $\mathcal{P} = RR^*$. We now introduce the main object of our analysis:
Since any compact operator acting between Hilbert spaces possesses a singular value decomposition, we obtain the following corollary: (3.2)
We now state a sufficient condition under which $H$ becomes a trace-class operator such that $(\sigma_k)_{k\in\mathbb{N}} \in \ell^1(\mathbb{N})$.
Proof. Since for any $i \in \{1, \dots, m\}$ and $i_1, \dots, i_k \in \{1, \dots, n\}$ the operator
is a Carleman operator, we can apply [W00, Theorem 6.12(iii)], which characterizes Carleman operators of Hilbert-Schmidt type. The statement of the lemma follows from the summability of
In the rest of this section, we discuss immediate applications of our preceding construction.
Definition 3.5. The $k$-th order truncations of the Gramians consist of the first $k$ summands of the Gramians, i.e.
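To illustrate Definition 3.5 and the exponentially fast convergence stated in Proposition 1.1, the following NumPy sketch (finite-dimensional, hypothetical names) builds the summands of both Gramians recursively from Lyapunov equations, as one standard finite-dimensional way to evaluate them, truncates after $k$ summands, and compares the resulting singular values $\sigma_i^k$ with those of the (numerically) full Gramians. The singular values are computed as square roots of the eigenvalues of the symmetric matrix $F^\top\mathcal{O}F$ with $\mathcal{P} = FF^\top$, which has the same nonzero eigenvalues as $\mathcal{O}\mathcal{P}$.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 by a dense Kronecker solve (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    X = np.linalg.solve(np.kron(A, I) + np.kron(I, A), -Q.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2

def gramian_summands(A, Ns, Q0, k_max):
    """X_0 solves A X + X A^T + Q0 = 0; X_k is driven by sum_i N_i X_{k-1} N_i^T."""
    terms = [lyap(A, Q0)]
    for _ in range(1, k_max):
        terms.append(lyap(A, sum(N @ terms[-1] @ N.T for N in Ns)))
    return terms

def hankel_singular_values(O, P):
    """sqrt(eig(O P)) via the symmetric matrix F^T O F, where P = F F^T (descending)."""
    lam, U = np.linalg.eigh(P)
    F = U * np.sqrt(np.clip(lam, 0.0, None))
    ev = np.linalg.eigvalsh(F.T @ O @ F)
    return np.sqrt(np.clip(ev, 0.0, None))[::-1]

rng = np.random.default_rng(7)
n, m, p = 4, 2, 2
G = rng.standard_normal((n, n))
A = -(G @ G.T / n + np.eye(n))
Ns = [0.5 * N / np.linalg.norm(N, 2) for N in rng.standard_normal((m, n, n))]
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

k_max = 25
P_terms = gramian_summands(A, Ns, B @ B.T, k_max)
O_terms = gramian_summands(A.T, [N.T for N in Ns], C.T @ C, k_max)
sigma = hankel_singular_values(sum(O_terms), sum(P_terms))   # reference: all k_max summands

errors = []
for k in range(1, k_max):                   # truncation after the first k summands
    sigma_k = hankel_singular_values(sum(O_terms[:k]), sum(P_terms[:k]))
    errors.append(np.abs(sigma - sigma_k).max())
print(["%.1e" % e for e in errors[:6]])
```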
Proposition 1.1. The truncated Hankel operators $H^{(k)}$ converge in Hilbert-Schmidt norm to $H$, and for the singular values $(\sigma_i^k)$ of the truncated Hankel operators it follows that
Proof. From [K69, Corollary 2.3] it follows that for any $i \in \mathbb{N}$ the difference of the singular values can be bounded as
Next, we state the proof of Proposition 1.2 on the approximation of infinite-dimensional by finite-dimensional systems. The Hankel operator for the system on $V_n$ is then given by $H_{V_n} := WR_{V_n}$, where $R_{V_n}$ is defined in terms of the orthogonal projection $P_{V_n}$ onto $V_n$.
Proposition 1.2. Let $H_{V_n}$ be the Hankel operator of the system restricted to $V_n$ as above. If the observability map $W$ is a Hilbert-Schmidt operator, then the Hankel operator $H_{V_n}$ converges in nuclear norm to $H$. If $W$ is only assumed to be bounded, then the convergence still holds in Hilbert-Schmidt norm.
Proof. Using elementary estimates, it suffices to show $S_2$-convergence of $R_{V_n}$ to $R$. This is done along the lines of (3.1).
3.1. Convergence of singular vectors. The convergence of singular values has already been addressed in Proposition 1.1. For the convergence of singular vectors, we now assume that there is a sequence of compact operators $H(m) \in S_\infty(F^n(\mathbb{R}^n), F^n(H))$ converging in operator norm to $H$. By compactness, every operator $H(m)$ has a singular value decomposition
Assumption 2. Without loss of generality, let the singular values be ordered as $\sigma_1(m) \ge \sigma_2(m) \ge \dots$. Furthermore, for the rest of this section, all singular values of $H$ are assumed to be non-zero and non-degenerate, i.e. the eigenspaces of $HH^*$ and $H^*H$ are one-dimensional.
Lemma 3.6. Convergence in operator norm for a family of compact operators (H(m)) to the Hankel operator H implies norm convergence of singular vectors.
Proof of Lemma 3.6. We sketch the proof only for the singular vectors $(e_j)$, since the arguments for $(f_j)$ are analogous. We start by writing $e_j = r(m)e_j(m) + x_j(m)$, where $\langle e_j(m), x_j(m)\rangle = 0$. Then, the arguments stated in the proof of [CGP88, Appendix 2] show that for $m$ sufficiently large

Global error estimates
We start by defining a control tensor $U_k(s) \in L(H \otimes (\mathbb{R}^n)^{\otimes k}, H)$
Using the sets $\Delta_k(t) := \{(s_1, \dots, s_k) \in \mathbb{R}^k : 0 \le s_k \le \dots \le s_1 \le t\}$, we can decompose the output map $(0,\infty) \ni t \mapsto C\varphi(t)$, with $\varphi$ as in (1.2), for controls with $\|u\|_{L^2((0,\infty),(\mathbb{R}^n,\|\cdot\|_\infty))} < \sqrt{2\omega}/(M\Xi)$ and $\Xi := \sum_{i=1}^n \|N_i\|$ as in Lemma A.1, into two terms $C\varphi(t) = K_1(t) + K_2(t)$ given by (4.1)
The first term $K_1$ is determined by the initial state $\varphi_0$ of the evolution problem (1.1); if this state is zero, the term $K_1$ vanishes. The term $K_2$, on the other hand, captures the intrinsic dynamics of equation (1.1). A technical object that links the dynamics of the evolution equation with the operators from the balancing method are the Volterra kernels, which we study next.
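The splitting of the output into Volterra terms can be mimicked in discrete time. In the sketch below (NumPy, explicit Euler, hypothetical names and a hypothetical control), the state of a small bilinear system started at $\varphi_0 = 0$ is computed once directly and once as a sum of Volterra terms, where the first term is the linear response to $Bu$ and each further term is driven by $\sum_i u_iN_i$ applied to the previous one. With $\varphi_0 = 0$ the term $K_1$ vanishes and the output is carried entirely by these terms. On the finite horizon used here the discrete series converges for any bounded control; the smallness condition on $u$ stated above is what the text uses to control the series on the whole half-line.

```python
import numpy as np

def simulate(A, Ns, B, u, h):
    """Explicit Euler for phi' = A phi + sum_i u_i N_i phi + B u with phi(0) = 0."""
    phi = np.zeros((len(u) + 1, A.shape[0]))
    for j, uj in enumerate(u):
        bilinear = sum(ui * (N @ phi[j]) for ui, N in zip(uj, Ns))
        phi[j + 1] = phi[j] + h * (A @ phi[j] + bilinear + B @ uj)
    return phi

def volterra_terms(A, Ns, B, u, h, order):
    """The same Euler scheme, split into Volterra terms y^1, ..., y^order."""
    terms = []
    for k in range(order):
        y = np.zeros((len(u) + 1, A.shape[0]))
        for j, uj in enumerate(u):
            if k == 0:
                forcing = B @ uj                        # linear response to B u
            else:                                       # driven by the previous term
                forcing = sum(ui * (N @ terms[-1][j]) for ui, N in zip(uj, Ns))
            y[j + 1] = y[j] + h * (A @ y[j] + forcing)
        terms.append(y)
    return terms

rng = np.random.default_rng(4)
n, m, p = 3, 2, 1
G = rng.standard_normal((n, n))
A = -(G @ G.T / n + np.eye(n))
Ns = [N / np.linalg.norm(N, 2) for N in rng.standard_normal((m, n, n))]   # ||N_i|| = 1
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

h, steps = 0.01, 200
t = h * np.arange(steps)
u = 0.5 * np.column_stack([np.sin(t), np.cos(t)])    # hypothetical control, shape (steps, m)

y_direct = simulate(A, Ns, B, u, h) @ C.T
terms = volterra_terms(A, Ns, B, u, h, order=25)
y_volterra = sum(terms) @ C.T
print(np.abs(y_direct - y_volterra).max())
```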
Definition 4.1. The Volterra kernels associated with (1.1) are the functions
The Volterra kernels satisfy an invariance property for all $p, q, k, j \in \mathbb{N}_0$ such that $p + q = k + j$: (4.2)
The Volterra kernels are also the integral kernels of the components of the Hankel operator,
$$(W_kR_jf)(s) = \int h_{k,j}(s_0, \dots, s_k + t_1, \dots, t_{j+1})\,f(t)\,dt.$$
Remark 1. In particular, the kernels $h_{k,0}$ appear in the definition of the $H^2$-system norm introduced in [ZL02, Eq. 15], for which robust numerical algorithms with strong $H^2$-error performance are available [BB11].
This system norm can also be expressed in terms of the Gramians, $\|\Sigma\|_{H^2}^2 = \operatorname{tr}(BB^*\mathcal{O}) = \operatorname{tr}(C^*C\mathcal{P})$, which is well defined as $BB^*$ and $\mathcal{P}$ are both trace-class operators.
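The two trace formulas for the $H^2$ norm can be compared numerically in finite dimensions, where $BB^*$ and $C^*C$ are matrices. The sketch assumes NumPy, the finite-dimensional generalised Lyapunov equations for the bilinear Gramians and a small random system (names hypothetical). It also checks that the bilinear value dominates the value obtained from the linear Gramian with all $N_i = 0$, since the additional summands of $\mathcal{P}$ are positive semidefinite.

```python
import numpy as np

def gen_lyap(A, Ns, Q):
    """Solve A X + X A^T + sum_i N_i X N_i^T + Q = 0 (dense Kronecker solve, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(A, I) + np.kron(I, A) + sum(np.kron(N, N) for N in Ns)
    X = np.linalg.solve(L, -Q.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2

rng = np.random.default_rng(5)
n, m, p = 5, 2, 3
G = rng.standard_normal((n, n))
A = -(G @ G.T / n + np.eye(n))
Ns = [0.5 * N / np.linalg.norm(N, 2) for N in rng.standard_normal((m, n, n))]
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

P = gen_lyap(A, Ns, B @ B.T)
O = gen_lyap(A.T, [N.T for N in Ns], C.T @ C)
h2_sq_O = np.trace(B.T @ O @ B)                               # tr(B B^* O)
h2_sq_P = np.trace(C @ P @ C.T)                               # tr(C^* C P)
h2_sq_linear = np.trace(C @ gen_lyap(A, [], B @ B.T) @ C.T)   # all N_i = 0
print(h2_sq_O, h2_sq_P, h2_sq_linear)
```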
In [BD10], the $k$-th order transfer function $G_k$ was introduced as the $(k+1)$-variable Laplace transform of the Volterra kernel $h_{k,0}$
Using mixed Hardy norms as defined in (1.4), the Paley-Wiener theorem implies the following estimate for $i \in \{1, \dots, k+1\}$.
In the following lemma we derive a bound on the mixed $L^1$-$L^2$ norm of the Volterra kernels:
Lemma 4.2. Consider two systems satisfying Assumption 1 with the same number of controls and the same output space $H \simeq \mathbb{R}^m$ such that $H$ is trace-class (Lemma 3.4). Then the Volterra kernels $h_{k,j}$ satisfy
Proof. Consider the difference Volterra kernel $\Delta(h_{k,j})$ associated with $\Delta(W_kR_j)$.
The Lemma follows then from the characterization of the nuclear norm stated in (1.3).
The preceding lemma implies bounds on the difference of the dynamics for two systems $\Sigma$ and $\tilde\Sigma$ satisfying Assumption 1. Before explaining this in more detail, we recall the notation $\Delta(X) := X - \tilde X$ used in the introduction, where $X$ is some observable of system $\Sigma$ and $\tilde X$ its counterpart in system $\tilde\Sigma$.
In particular, Lemma 4.2 immediately gives the statement of Theorem 1:
Theorem 1. Consider two bilinear systems that both satisfy the stability condition $M^2\Gamma^2(2\omega)^{-1} < 1$ with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$. The difference of transfer functions $\Delta(G_k)$ in mixed $H^\infty$-$H^2$ Hardy norms, defined in (1.4), is bounded by
Proof. The Hankel operator is an infinite matrix with operator-valued entries $H_{ij} = W_iR_j$. Using the invariance property (4.2), we can combine Lemma 4.2 with estimate (4.3), which relates the transfer functions to the Volterra kernels, to obtain from the definition of the nuclear norm (1.3) that
which, by summing up the two bounds, yields the statement of the theorem.
While the preceding theorem controls the transfer functions, the subsequent theorem controls the actual dynamics starting at zero:
Theorem 2. Consider two bilinear systems that both satisfy the stability condition $M^2\Gamma^2(2\omega)^{-1} < 1$ with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$. Let $\Delta(C\varphi(t))$ be the difference of the outputs of the two systems. For control functions $u \in L^\infty((0,\infty),\mathbb{R}^n) \cap L^2((0,\infty),\mathbb{R}^n)$ such that $\|u\|_{L^2((0,\infty),(\mathbb{R}^n,\|\cdot\|_\infty))} < \sqrt{2\omega}/(M\Xi)$
Proof. The operator norm of the control tensor is bounded by
where we applied the Cauchy-Schwarz inequality to the product inside the sum to bound the $\ell^1$ norm by an $\ell^2$ norm.

Applications
Throughout this section, we assume that we are given a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge T_0}, \mathbb{P})$ satisfying the usual conditions, i.e. the filtration is right-continuous and $\mathcal{F}_{T_0}$ contains all $\mathbb{P}$-null sets of $\mathcal{F}$. We assume $X$ to be a real separable Hilbert space. In the following subsection we study an infinite-dimensional stochastic evolution equation with Wiener noise to motivate the extension of stochastic balanced truncation to infinite-dimensional systems that we introduce thereafter. We stick to the notation introduced in the preceding sections and also consider the state-to-output (observation) operator $C \in L(X, H)$ and the control-to-state (control) operator $Bu = \sum_{i=1}^n \psi_i u_i$.
such that for all $t \in [0, T]$, $\omega \in \Omega$, and $x, y \in C([0, T], X)$
Then, by [GM11, Theorem 3.5] there exists a unique continuous mild solution in $\mathcal{H}_2^{(0,T)}(X)$
As we are concerned with bilinear systems, we make the additional assumption that $N \in L(X, S_2(Y, X))$ and that there is a control $u \in L^2$
We refer to (5.2) with $F \equiv 0$ as the homogeneous part of this equation. For solutions to the homogeneous part of (5.2) starting at $t = 0$, let $\Phi(\cdot): L^2(\Omega, \mathcal{F}_0, X) \to \mathcal{H}_2^{(0,T)}(X)$ be the flow defined by the mild solution, i.e. $Z_t^{\mathrm{hom}} := \Phi(t)\xi$. If the initial time is some $T_0$ rather than $0$, we denote the (initial-time-dependent) flow by $\Phi(\cdot, T_0): L^2(\Omega, \mathcal{F}_{T_0}, X) \to \mathcal{H}_2^{(T_0,T)}(X)$. The ($X$-)adjoint of the flow is defined by $\langle\Phi(\cdot, T_0)\varphi_1, \varphi_2\rangle_X = \langle\varphi_1, \Phi(\cdot, T_0)^*\varphi_2\rangle_X$ for arbitrary $\varphi_1, \varphi_2 \in X$.
Definition 5.1 (Exponential stability in m.s.s.). The solution to the homogeneous system with flow $\Phi$ is called exponentially stable in the mean square sense (m.s.s.) if there is some $c > 0$ such that for all $\varphi_0 \in X$ and all $t \ge 0$
Lyapunov techniques to verify exponential stability for SPDEs of the form (5.2) are discussed in [GM11, Section 6.2].
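For a finite-dimensional linear Itô equation $dX = AX\,dt + \sum_i N_iX\,dW_i$, the second moment $Q(t) = \mathbb{E}(X_tX_t^\top)$ solves $Q' = AQ + QA^\top + \sum_i N_iQN_i^\top$, and exponential stability in m.s.s. is equivalent to this linear operator having its spectrum in the open left half-plane, a classical finite-dimensional criterion. A minimal NumPy check of this criterion (names hypothetical) is sketched below. The scalar case $dX = aX\,dt + bX\,dW$ gives $\mathbb{E}|X_t|^2 = e^{(2a+b^2)t}|X_0|^2$, so multiplicative noise can destroy stability even when $a < 0$.

```python
import numpy as np

def second_moment_generator(A, Ns):
    """Matrix of Q -> A Q + Q A^T + sum_i N_i Q N_i^T acting on the row-major vec(Q)."""
    n = A.shape[0]
    I = np.eye(n)
    return np.kron(A, I) + np.kron(I, A) + sum(np.kron(N, N) for N in Ns)

def is_ms_exponentially_stable(A, Ns):
    """Spectral-abscissa test for dX = A X dt + sum_i N_i X dW_i (finite-dimensional, Ito)."""
    return bool(np.linalg.eigvals(second_moment_generator(A, Ns)).real.max() < 0)

a = -1.0
for b in (1.0, 1.5):
    stable = is_ms_exponentially_stable(np.array([[a]]), [np.array([[b]])])
    print(f"b = {b}: rate 2a + b^2 = {2 * a + b**2:+.2f}, m.s.s. exponentially stable: {stable}")

# Rotational noise destabilises a stable drift: largest eigenvalue -0.8 + 1 > 0
rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(is_ms_exponentially_stable(-0.4 * np.eye(2), [rotation]))
```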
We then define the variation of constants process $Y$ of the flow $\Phi$ as
This variation of constants process coincides almost surely with the mild solution to the full SPDE (5.2) for the initial condition $\xi = 0$. This follows from the stochastic Fubini theorem [GM11, Theorem 2.8] and (5.
Another important feature of the homogeneous solution to (5.2) is the homogeneous Markov property [GM11, Section 3.4]. While the flow $\Phi$ is time-dependent as the SPDE is non-autonomous, there is an associated $C_b$-Markov semigroup $(P(t))_{t\ge0}$, $(P(t)f)(x) := \mathbb{E}(f(\Phi(t+s, s)x))$, independent of $s \ge 0$, with $P(t+s)f = P(t)P(s)f$. The $C_b$-Feller property, i.e. that $P(t)$ maps $C_b(X)$ into $C_b(X)$, will not be needed in our subsequent analysis, but reflects the continuous dependence of the solution of (5.2) on initial data.
In particular, we use that the C b -Markov semigroup can be extended to all f for which the process is still integrable, i.e. f (Φ(t, s)x) ∈ L 1 (Ω, R) for arbitrary s ≤ t and x ∈ X.
In the following subsection we introduce a generalized stochastic balanced truncation framework for systems with properties similar to the ones that we just discussed for the particular stochastic evolution equation (5.2).

Generalized stochastic balanced truncation.
For an exponentially stable flow $\Phi$ we define the stochastic observability map $W$ and reachability map $R$ by $W \in L(X, L^2(\Omega_{(0,\infty)}, H))$ with $(Wx)(t, \omega) := C\Phi(t, \omega)x$ and
From the observability and reachability maps (5.5), we define the observability Gramian $\mathcal{O} = W^*W \in L(X)$ and the reachability Gramian $\mathcal{P} = RR^* \in S_1(X)$ satisfying for all $x \in X$
To obtain an analogous interpretation of the reachability Gramian, let us recall that for compact self-adjoint operators $K: X \to X$, we can define the unbounded Moore-Penrose pseudoinverse as $K^{\#}x := \sum_{\lambda\in\sigma(K)\setminus\{0\}} \lambda^{-1}\langle v_\lambda, x\rangle v_\lambda$, for all $x$ for which this series converges, using any orthonormal eigenbasis $(v_\lambda)_{\lambda\in\sigma(K)}$ associated with eigenvalues $\lambda$ of $K$.
Then, for any $T > 0$ one defines the input energy $E^T_{\mathrm{input}}$ up to time $T$ as
where $Y_t$ is the variation of constants process of the flow defined in (5.4). We also define the (self-adjoint and trace-class) time-truncated reachability Gramians $\mathcal{P}_T$ for $x, y \in X$
Then, as in the finite-dimensional framework [BD11, Theorem 2.1], one has the following representation of the input energy (5.7):
Proof. Let $x \in \operatorname{ran}(\mathcal{P}_T)$; then the control $u_T(t) := \mathbb{1}_{(0,T)}(t)B^*\Phi(T, t)^*\mathcal{P}_T^{\#}x$ is well defined and we can consider $Y_T(u_T)$ as in (5.4). This yields $\mathbb{E}(Y_T(u_T)) = x$ since for any $z \in X$
From the definition of $u_T$ it also follows that $\|u_T\|^2_{L^2(\Omega_{(0,\infty)}, \mathbb{R}^n)} = \langle x, \mathcal{P}_T^{\#}x\rangle$. To see that $u_T$ is minimal in norm, as stated in the definition of the input energy (5.7), consider any other $u = u_T + u_\Delta$ such that $Y_T(u)$ satisfies $\mathbb{E}(Y_T(u)) = x$. Hence, it follows that for any $z \in X$
Specializing this to $z = \mathcal{P}_T^{\#}x$ shows that $u_\Delta \perp u_T$. Hence, minimality follows from $\|u\|^2 = \|u_T\|^2 + \|u_\Delta\|^2$.
Definition 5.3. The stochastic Hankel operator is defined as
By Remark 2, the Hankel operator is trace-class if $H \simeq \mathbb{R}^m$ for some $m \in \mathbb{N}$.
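The minimal-energy representation mirrors a familiar finite-dimensional, deterministic fact. As an analogue only (discrete time, no noise, NumPy, hypothetical names), the sketch below builds the reach map $R$ of $x_{k+1} = Ax_k + Bu_k$, the reachability Gramian $\mathcal{P} = RR^\top$ and the control $u = R^\top\mathcal{P}^{\#}x$, which is the discrete counterpart of $u_T(t) = \mathbb{1}_{(0,T)}(t)B^*\Phi(T,t)^*\mathcal{P}_T^{\#}x$. It reaches $x$ with energy $\langle x, \mathcal{P}^{\#}x\rangle$, and adding an element of $\ker(R)$ increases the energy by exactly its squared norm, which is the orthogonality argument $u_\Delta \perp u_T$ used in the proof above.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, steps = 4, 1, 6
A = 0.5 * rng.standard_normal((n, n))        # hypothetical discrete-time dynamics
B = rng.standard_normal((n, m))

# Reach map R: (u_0, ..., u_{steps-1}) -> x_steps = sum_k A^(steps-1-k) B u_k, with x_0 = 0
R = np.hstack([np.linalg.matrix_power(A, steps - 1 - k) @ B for k in range(steps)])
P = R @ R.T                                  # finite-horizon reachability Gramian
P_pinv = np.linalg.pinv(P)                   # Moore-Penrose pseudoinverse P^#

x = P @ rng.standard_normal(n)               # a target state in ran(P)
u_min = R.T @ P_pinv @ x                     # discrete analogue of B^* Phi(T, t)^* P_T^# x
energy = u_min @ u_min

# Any other control reaching x differs from u_min by an element of ker(R)
_, sv, Vt = np.linalg.svd(R)
rank = int(np.sum(sv > 1e-12 * sv[0]))
u_other = u_min + Vt[rank]                   # unit vector in ker(R), orthogonal to u_min
print(energy, x @ P_pinv @ x, u_other @ u_other)
```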
From standard properties of the stochastic integral it follows that the expectation value $\mathbb{E}(Z_t)$ or $\mathbb{E}(CZ_t)$ of the solution to (5.16) is just the solution $\varphi$ or $C\varphi$ to the linear deterministic equation $\varphi'(t) = A\varphi(t) + B\mathbb{E}u(t)$. Our next proposition extends this analogy between stochastic and linear systems to the error bounds for deterministic controls:
Proposition 1.3. Consider two stochastic systems with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$. Let $u \in L^p((0,\infty),\mathbb{R}^n)$ for $p \in [1,\infty]$ be a deterministic control and let $\Phi$ and $\tilde\Phi$ be the stochastic flows of the two systems. The two stochastic flows are assumed to be exponentially stable in the mean square sense and to define $C_b$-Markov semigroups. The difference $\Delta(CY)$ of the processes $Y$ defined in (5.4) with zero initial conditions satisfies
$$\|\mathbb{E}\,\Delta(CY_\bullet(u))\|_{L^p((0,\infty),\mathbb{R}^m)} \le 2\,\|\Delta(H)\|_{S_1(L^2(\Omega\times(0,\infty),\mathbb{R}^n),\,L^2(\Omega\times(0,\infty),\mathbb{R}^m))}\,\|u\|_{L^p((0,\infty),\mathbb{R}^n)}.$$
Proof. Let $(e_n)$ and $(f_n)$ be orthonormal systems in $L^2((0,\infty),\mathbb{R}^n)$ and $L^2((0,\infty),\mathbb{R}^m)$; then they are also orthonormal in $L^2(\Omega_{(0,\infty)},\mathbb{R}^n)$ and $L^2(\Omega_{(0,\infty)},\mathbb{R}^m)$.
Then, by the semigroup property of the time-homogeneous Markov process, it follows that $\mathbb{E}((P(s)q_k)(\Phi(t)\psi_j)) = (P(t)P(s)q_k)(\psi_j) = (P(t+s)q_k)(\psi_j)$ and thus
The standard estimate for linear systems [CGP88, Theorem 2.1] then implies
Using this inequality, the statement of the proposition follows from the homogeneity of the Markov semigroup and Young's inequality.
While the error bound in the preceding proposition relied essentially on linear theory, our next estimate bounds the expected error. The proof strategy resembles the proof given for bilinear systems in Lemma 4.2. We commence, as we did for bilinear systems, by introducing the Volterra kernels of the stochastic Hankel operator.
Theorem 3. Consider two stochastic systems with the same finite-dimensional input space $\mathbb{R}^n$ and output space $H \simeq \mathbb{R}^m$ such that the respective stochastic flows $\Phi$ and $\tilde\Phi$ are independent. The two stochastic flows are assumed to be exponentially stable in the mean square sense and to define $C_b$-Markov semigroups. Then the difference of compressed Volterra kernels $h$ for both systems satisfies
The difference $\Delta(CY)$ of the processes $Y$ defined in (5.4) with zero initial conditions satisfies
Proof. We start by showing how (5.9) implies (5.10)
Thus, it suffices to verify (5.9). Let $Z := L^2(\Omega, \mathbb{R}^m) \otimes L^2(\Omega, \mathbb{R}^n)$. The independence assumption in the theorem has been introduced for to hold. To see this, we consider an auxiliary function $\xi_i(x_1, x_2) := \langle e_i, Cx_1 - \tilde Cx_2\rangle_{\mathbb{R}^m}^2$, where $C$ and $\tilde C$ are the observation operators of the two systems. By the independence assumption, there is again a Markov semigroup $(P(t))_{t\ge0}$ associated with the time-homogeneous Markov process determined by the vector-valued flow $(\Phi(t), \tilde\Phi(t))_{t\ge0}$ such that $(P(t)\xi_i)(x_1, x_2) := \mathbb{E}(\xi_i(\Phi(s+t, s)x_1, \tilde\Phi(s+t, s)x_2))$. Let $(\psi_j)_{j\in\{1,\dots,n\}}$ and $(\tilde\psi_j)_{j\in\{1,\dots,n\}}$ be the vectors in $X$ comprising the control operators $B$ and $\tilde B$, respectively.
The semigroup property of $(P(t))_{t\geq 0}$ then implies
$\cdots\int_{x-\alpha/2}^{\cdots}\big\|\Delta(h(2s,\cdot))\big\|_{L^2(\Omega,S_2(\mathbb{R}^n,\mathbb{R}^m))}\,ds - \big\|\Delta(h(2x,\cdot))\big\|_{L^2(\Omega,S_2(\mathbb{R}^n,\mathbb{R}^m))}\cdots$
This is due to Lebesgue's differentiation theorem for Banach space-valued integrands, applied to the flows $\Phi$ and $\widetilde{\Phi}$, together with the estimate
$\cdots\,\cdot),(x,\cdot')))_{Z}\,ds\,dt$
Consider then the family of intervals $I_x := [x - \min(\delta_x,\gamma_x),\, x + \min(\delta_x,\gamma_x)]$ for $x \in I \cap J$. Lebesgue's covering theorem [L10, Theorem 26] states that, after possibly first shrinking the diameters of the sets $I_x$, there exists an at most countable family of disjoint sets $(I_{x_i})_{i\in\mathbb{N}}$ covering $I \cap J$ up to a null set, i.e. the Lebesgue measure of $I \cap J \cap \big(\bigcup_{i\in\mathbb{N}} I_{x_i}\big)^C$ is zero. By the countable additivity of the Lebesgue measure, for every $\varepsilon > 0$ there are finitely many points $x_1,\dots,x_n \in I \cap J$ such that the set $(I \cap J) \setminus \bigcup_{i=1}^{n} I_{x_i}$ has Lebesgue measure at most $\varepsilon$. Thus, we have obtained finitely many disjoint sets $I_{x_i}$ of total measure at least $M - \varepsilon$ such that, for $0 < \alpha_i/2 \leq \operatorname{diam}(I_{x_i})/2$, both estimates (5.12) and (5.13) hold at $x_i$.
For every fixed $i \in \{1,\dots,n\}$, we introduce the family of sesquilinear forms $(L_i)$
and, on $Z = L^2(\Omega,\mathbb{R}^m) \otimes L^2(\Omega,\mathbb{R}^n)$, we can define a Hilbert-Schmidt operator of unit $S_2$-norm given by $Q\colon$
A singular value decomposition of $Q$ yields orthonormal systems $f_{k,i} \in L^2(\Omega,\mathbb{R}^m)$, $g_{k,i} \in L^2(\Omega,\mathbb{R}^n)$ as well as singular values $\sigma_{k,i} \in [0,1]$ indexed by $k \in \mathbb{N}$. For any given $\delta > 0$ there is $N(\delta)$ large enough such that
Thus, there are also orthonormal $f_{k,i} \in L^2(\Omega,\mathbb{R}^m)$ and $g_{k,i} \in L^2(\Omega,\mathbb{R}^n)$, $N_i \in \mathbb{N}$, and $\sigma_{k,i} \in [0,1]$ such that
(5.14)
Then $s_{k,i}(s,\omega) :=$
in $L^2(\Omega_{(0,\infty)},\mathbb{R}^n)$ and $L^2(\Omega_{(0,\infty)},\mathbb{R}^m)$, respectively, both in $k$ and $i$, such that for
$\cdots\,\omega),(t,\omega')))\,g_{k,i}(\omega')\rangle_{\mathbb{R}^m}\,dt\,ds\,d\mathbb{P}(\omega)\,d\mathbb{P}(\omega')$.
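The existence of $N(\delta)$ comes from the fact that truncating the singular value decomposition of a Hilbert-Schmidt operator after $N$ terms leaves an $S_2$-error of $\big(\sum_{k > N}\sigma_k^2\big)^{1/2}$, which tends to zero. The sketch below picks the smallest such $N$ for a hypothetical sequence of singular values normalised to unit $S_2$-norm, mirroring the normalisation of $Q$; the geometric decay $\sigma_k \propto 2^{-k}$ is our assumption, chosen so that the answer can be checked by hand.

```python
import math


def truncation_rank(sigmas, delta):
    """Smallest N with ||Q - Q_N||_S2 = sqrt(sum of sigma_k^2 over discarded k) < delta."""
    for N in range(len(sigmas) + 1):
        tail = math.sqrt(sum(s * s for s in sigmas[N:]))  # S_2-norm of the discarded part
        if tail < delta:
            return N
    return len(sigmas)  # only reached for delta <= 0


raw = [2.0 ** -k for k in range(30)]       # hypothetical decaying singular values
scale = math.sqrt(sum(s * s for s in raw))
sigmas = [s / scale for s in raw]          # normalise to unit S_2-norm, as for Q
print(truncation_rank(sigmas, 1e-3))
```

With this normalisation the tail after $N$ terms is approximately $2^{-N}$, so $\delta = 10^{-3}$ requires $N = 10$.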
(5.15)
Hence, we get
The bound on the first term follows from (5.13), the bound on the second term from (5.14), and the third term is controlled by (5.11). We then further compute
where we used (5.12) to obtain the second estimate. Combining the two preceding estimates, the theorem follows from the characterization of the nuclear norm given in (1.3).
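We assume here that (1.3) is the standard variational characterization of the trace (nuclear) norm, $\|T\|_{S_1} = \sup \sum_k |\langle f_k, Te_k\rangle|$, the supremum taken over orthonormal systems $(e_k)$ and $(f_k)$. For a real $2\times 2$ matrix this can be checked directly: $\sigma_1 + \sigma_2 = \sqrt{\|T\|_F^2 + 2|\det T|}$, and every orthonormal basis of $\mathbb{R}^2$ is a rotated basis up to signs, which the absolute values make irrelevant. The sketch searches over a grid of rotation angles for a hypothetical test matrix.

```python
import math


def nuclear_norm_2x2(T):
    """sigma_1 + sigma_2 = sqrt(||T||_F^2 + 2|det T|) for a real 2x2 matrix."""
    (p, q), (r, s) = T
    return math.sqrt(p * p + q * q + r * r + s * s + 2 * abs(p * s - q * r))


def pairing(T, theta, phi):
    """sum_k |<f_k, T e_k>| for orthonormal bases e (rotated by theta) and f (rotated by phi)."""
    e = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
    f = [(math.cos(phi), math.sin(phi)), (-math.sin(phi), math.cos(phi))]
    total = 0.0
    for (e1, e2), (f1, f2) in zip(e, f):
        Te = (T[0][0] * e1 + T[0][1] * e2, T[1][0] * e1 + T[1][1] * e2)
        total += abs(f1 * Te[0] + f2 * Te[1])
    return total


T = [[3.0, 1.0], [0.0, 2.0]]  # hypothetical test matrix
angles = [2 * math.pi * i / 360 for i in range(360)]
best = max(pairing(T, th, ph) for th in angles for ph in angles)
print(best, nuclear_norm_2x2(T))  # the grid supremum approaches sqrt(26) from below
```

No pair of orthonormal systems exceeds $\sigma_1 + \sigma_2$, and the singular vectors attain it, which is exactly how such a characterization converts estimates on individual pairings into a trace-norm bound.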
Next, we study conditions under which convergence of flows implies convergence of stochastic Hankel operators. Let $(\Phi_i)$ be a sequence of flows converging to $\Phi$ in $L^2(\Omega_{(0,\infty)},\mathcal{L}(X))$, and let $W_i$ and $R_i$ be the observability and reachability maps derived from $\Phi_i$ as in (5.5). For the observability map, this yields convergence in operator norm:
If $H \simeq \mathbb{R}^m$, then an analogous estimate shows that $W_i$ also converges to $W$ in Hilbert-Schmidt norm [W00, Theorem 6.12(iii)].
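The passage from operator-norm to Hilbert-Schmidt convergence when $H \simeq \mathbb{R}^m$ relies on the elementary inequality $\|T\| \leq \|T\|_{S_2} \leq \sqrt{m}\,\|T\|$ for operators of rank at most $m$, such as any operator with values in $\mathbb{R}^m$. The sketch checks it for a hypothetical $2\times 3$ matrix, computing the operator norm by power iteration on $T^{\top}T$; it illustrates the inequality only and is not the argument of [W00].

```python
import math


def hs_norm(T):
    """Hilbert-Schmidt (Frobenius) norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in T for x in row))


def op_norm(T, iters=500):
    """Largest singular value of T via power iteration on T^T T."""
    m, n = len(T), len(T[0])
    v = [1.0] * n  # generic start vector (must not be orthogonal to the top singular vector)
    for _ in range(iters):
        Tv = [sum(T[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(T[i][j] * Tv[i] for i in range(m)) for j in range(n)]  # w = T^T T v
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    Tv = [sum(T[i][j] * v[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(x * x for x in Tv))


T = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]  # hypothetical map R^3 -> R^2, so m = 2
op, hs = op_norm(T), hs_norm(T)
print(op, hs, math.sqrt(2) * op)
```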
For the reachability map, we choose an ONB $(e_k)_{k\in\mathbb{N}}$ of $L^2(\Omega_{(0,\infty)},\mathbb{R})$, which we extend by tensorisation, $e_k^j := e_k \otimes e_j$ for $j \in \{1,\dots,n\}$, to an ONB of $L^2(\Omega_{(0,\infty)},\mathbb{R}^n)$. Using this basis and an orthonormal basis $(f_l)_{l\in\mathbb{N}}$ of $X$, it follows that
As in the bilinear case, we obtain from this a convergence result for the Hankel operators:
Corollary 5.5. Let $H_i$ denote the (compact) Hankel operators associated with flows $\Phi_i$ converging to $\Phi$ in $L^2(\Omega_{(0,\infty)},\mathcal{L}(X))$. Then the $H_i$ converge to $H$ in Hilbert-Schmidt norm and, if $H \simeq \mathbb{R}^m$, also in trace-class norm.
In
$\cdots{}_t)_{j\in\{1,\dots,n\}}$ and the control operator $B$ as before. We then study the stochastic evolution equation
for $\xi \in L^2(\Omega,\mathcal{F}_0,X)$, $A$ the generator of a $C_0$-semigroup $(T(t))$, and $N_j \in \mathcal{L}(X)$. Then the homogeneous part of (5.16), i.e. the equation without the control term $Bu$, defines a unique predictable process $Z^{\mathrm{hom}}$
To see that this Gramian coincides with the standard stochastic observability Gramian (5.6), we must show that $\mathbb{E}\|C\Phi(t)x\|_H^2 = \mathbb{E}\|C\Psi(t)^*x\|_H^2$ for all $x \in X$. Applying Itō's isometry, we obtain from (5.17)
$\cdots\,ds.$
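The tensorised basis $e_k^j = e_k \otimes e_j$ is orthonormal because $\langle a \otimes b, c \otimes d\rangle = \langle a, c\rangle\langle b, d\rangle$. In a finite-dimensional analogue, where we replace $L^2(\Omega_{(0,\infty)},\mathbb{R})$ by $\mathbb{R}^4$ with a normalised Hadamard basis (our choice, for illustration only) and take the standard basis of $\mathbb{R}^n$ with $n = 2$, this can be checked with Kronecker products: the $4n$ vectors have identity Gram matrix and hence form an ONB of $\mathbb{R}^{4n}$.

```python
def kron(a, b):
    """Kronecker product of two vectors: the coordinates of a tensor b."""
    return [x * y for x in a for y in b]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Orthonormal basis (e_k) of R^4: rows of the 4x4 Hadamard matrix scaled by 1/2.
hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
e = [[x / 2.0 for x in row] for row in hadamard]
n = 2
std = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]  # standard basis of R^n

basis = [kron(ek, ej) for ek in e for ej in std]  # e_k^j = e_k tensor e_j
gram = [[dot(u, v) for v in basis] for u in basis]
is_onb = len(basis) == 4 * n and all(
    abs(gram[i][j] - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(len(basis)) for j in range(len(basis)))
print(is_onb)
```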
A reflection of the integration domain then shows that both expressions (and hence the Gramians) coincide.
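In the scalar case the stochastic observability Gramian has a closed form, which gives a quick sanity check. We assume a one-dimensional state with $A = a$, a single noise operator $N = \nu$ and $C = c$ (hypothetical values below). Then $\Phi(t) = \exp((a - \nu^2/2)t + \nu W_t)$, $\mathbb{E}\,\Phi(t)^2 = e^{(2a+\nu^2)t}$, and $\int_0^\infty c^2\,\mathbb{E}\,\Phi(t)^2\,dt = c^2/(-(2a+\nu^2))$ whenever $2a + \nu^2 < 0$, i.e. under mean-square stability. This value solves the scalar version of the Lyapunov equation $A^*O + OA + \sum_j N_j^* O N_j + C^*C = 0$, which we take as the finite-dimensional counterpart of (5.6).

```python
import math
import random


def second_moment_mc(a, nu, t, samples, seed=1):
    """Monte Carlo estimate of E[Phi(t)^2] for dPhi = a Phi dt + nu Phi dW, Phi(0) = 1."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        w_t = rng.gauss(0.0, math.sqrt(t))                   # W_t ~ N(0, t)
        phi = math.exp((a - 0.5 * nu * nu) * t + nu * w_t)  # exact solution (geometric BM)
        acc += phi * phi
    return acc / samples


a, nu, c = -1.0, 0.5, 2.0        # hypothetical scalar A, N, C
rate = 2 * a + nu * nu           # E[Phi(t)^2] = exp(rate * t); mean-square stable iff rate < 0
gramian = c * c / (-rate)        # O = int_0^inf c^2 E[Phi(t)^2] dt
residual = 2 * a * gramian + nu * nu * gramian + c * c  # scalar Lyapunov equation residual
mc = second_moment_mc(a, nu, 1.0, 100_000)
print(gramian, residual, mc, math.exp(rate))
```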
The second Lyapunov equation can be obtained by an analogous calculation: let $x_0 \in X$ be arbitrary; we then study, in the weak sense, the evolution of the adjoint flow for the initial condition $\sqrt{C^*C}\,x_0$:
Proceeding as before, stochastic integration by parts yields
Using Parseval's identity (i.e. summing over an orthonormal basis in place of $x_0$) and taking the limit $t \to \infty$, we obtain the second Lyapunov equation.
Proof. For all $k \geq 2$ we obtain recursively the exponentially decreasing bound
Thus, (A.1) is an absolutely convergent series. To see that $\zeta$ and the mild solution coincide, it suffices to verify that the Volterra series (A.1) satisfies (1.2).
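The convergence of a Volterra series can be seen in a scalar toy model. Assuming (our choice, not the general setting of (A.1)) the one-dimensional homogeneous bilinear equation $x'(t) = ax(t) + \nu u(t)x(t)$, $x(0) = x_0$, with semigroup $T(t) = e^{at}$, the Volterra terms are $z_0(t) = e^{at}x_0$ and $z_k(t) = \int_0^t e^{a(t-s)}\nu u(s)z_{k-1}(s)\,ds$, so that $z_k(t) = e^{at}x_0(\nu U(t))^k/k!$ with $U(t) = \int_0^t u$. The sketch computes the terms with the trapezoidal rule for the hypothetical control $u(t) = \cos t$ and compares the partial sum with the exact solution $x_0e^{at + \nu\sin t}$.

```python
import math


def volterra_terms(a, nu, u, x0, T, steps, K):
    """Terms z_0, ..., z_K of the Volterra series of x' = a x + nu u(t) x on a uniform grid."""
    h = T / steps
    ts = [i * h for i in range(steps + 1)]
    z = [math.exp(a * t) * x0 for t in ts]  # z_0(t) = e^{at} x0
    terms = [z]
    for _ in range(K):
        # z_k(t) = e^{at} * int_0^t e^{-as} nu u(s) z_{k-1}(s) ds via the cumulative trapezoidal rule
        f = [math.exp(-a * t) * nu * u(t) * zt for t, zt in zip(ts, z)]
        w = [0.0]
        for i in range(steps):
            w.append(w[-1] + 0.5 * h * (f[i] + f[i + 1]))
        z = [math.exp(a * t) * wt for t, wt in zip(ts, w)]
        terms.append(z)
    return terms


a, nu, x0, T = -1.0, 0.8, 1.0, 2.0  # hypothetical parameters, control u(t) = cos(t)
terms = volterra_terms(a, nu, math.cos, x0, T, steps=4000, K=12)
partial_sum = sum(term[-1] for term in terms)  # partial sum of the series at t = T
exact = x0 * math.exp(a * T + nu * math.sin(T))
print(partial_sum, exact)
```

In this toy model the terms decay factorially, which is stronger than the exponential bound needed for absolute convergence; in the general setting only the latter is available.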