Infinite-dimensional bilinear and stochastic balanced truncation with explicit error bounds

Following the ideas of Curtain and Glover (in: Bart, Gohberg, Kaashoek (eds) Operator theory and systems, Birkhäuser, Boston, 1986), we extend the balanced truncation method for (infinite-dimensional) linear systems to arbitrary-dimensional bilinear and stochastic systems. In particular, we apply Hilbert space techniques used in many-body quantum mechanics to establish new fully explicit error bounds for the truncated system and prove convergence results. The functional analytic setting allows us to obtain mixed Hardy space error bounds for both finite- and infinite-dimensional systems, and it is then applied to the model reduction of stochastic evolution equations driven by Wiener noise.


Introduction
Model reduction of bilinear systems has become a major field of research, partly triggered by applications in optimal control and the advancement of iterative numerical methods for solving large-scale matrix equations. High-dimensional bilinear systems often appear in connection with semi-discretized controlled partial differential equations or stochastic (partial) differential equations with multiplicative noise. A popular class of model reduction methods that is well established in the field of linear systems theory is based on first transforming the system to a form in which highly controllable states are highly observable and vice versa ("balancing") and then eliminating the least controllable and observable states. For finite-dimensional linear systems, balanced truncation and residualization (a.k.a. singular perturbation approximation) feature computable error bounds and are known to preserve important system properties, such as stability or passivity [1]; see also [2] and references therein. For a generalization of (linear) balanced truncation to infinite-dimensional systems, see [3,4].
For bilinear systems, no such elaborate theory as in the linear case is available; in particular, approximation error bounds for the reduced system are not known. The purpose of this paper is therefore to extend balanced truncation to bilinear and stochastic evolution equations, specifically to establish convergence results and prove explicit truncation error bounds for the bilinear and stochastic systems. For finite-dimensional systems, our framework coincides with the established theory for bilinear and stochastic systems as studied in [5,6] and references therein. We start by introducing a function space setting that allows us to define bilinear balanced truncation in arbitrary (separable) Hilbert spaces and that extends the finite-dimensional theory. However, instead of just extending the finite-dimensional theory to infinite dimensions, we harness the functional analytic machinery available in infinite dimensions to obtain new explicit error bounds for finite-dimensional systems as well.
The figure of merit in our analysis is a Hankel-type operator acting between certain function spaces which are ubiquitous in many-body quantum mechanics and within this theory called Fock spaces. We show that under mild assumptions on the dynamics, the Hankel operator is a Hilbert-Schmidt or even trace class operator. The key idea is that the algebraic structure of the Fock space, that is, a direct sum of tensor products of copies of Hilbert spaces, mimics the nested Volterra kernels representing the bilinear system. This allows us to perform an analysis of the singular value decomposition of this operator along the lines of the linear theory developed by Curtain and Glover [3]. For more recent treatments of infinite-dimensional linear systems, we refer to [4,7,8]. For applications of the bilinear method to finite-dimensional open quantum systems and Fokker-Planck equations, we refer to [9,10].
The article is structured as follows: The rest of the introduction is devoted to fixing the notation that is used throughout the article and to stating the main results. Section 2 introduces the concept of balancing based on observability and controllability (or reachability) properties of bilinear systems, which is then used in Sect. 3 to define the Fock space-valued Hankel operator and study properties of its approximations. The global error bounds for the finite-rank approximation based on the singular value decomposition of the Hankel operator are given in Sect. 4. Finally, in Sect. 5 we discuss applications of the aforementioned results to the model reduction of stochastic evolution equations driven by multiplicative Lévy noise. The article contains two appendices. The first one records a technical lemma stating the Volterra series representation of the solution to infinite-dimensional bilinear systems. The second appendix provides more background on how to compute the error bounds found in this article.

Set-up and main results
Let $X$ be a separable Hilbert space and $A : D(A) \subset X \to X$ the generator of an exponentially stable $C_0$-semigroup $(T(t))_{t \ge 0}$ of bounded operators, i.e. a strongly continuous semigroup that satisfies $\|T(t)\| \le M e^{-\nu t}$ for some $\nu > 0$ and $M \ge 1$.
For exponentially stable semigroups generated by $A$, bounded operators $N_i \in L(X)$, $B \in L(\mathbb{R}^n, X)$, an initial state $\varphi_0 \in X$, and control functions $u = (u_1, \dots, u_n) \in L^2((0,T), \mathbb{R}^n)$, we study bilinear evolution equations on $X$ of the following type:

$$\varphi'(t) = A\varphi(t) + \sum_{i=1}^n N_i \varphi(t)\, u_i(t) + B u(t), \quad t \in (0,T), \qquad \varphi(0) = \varphi_0. \tag{1.1}$$

It follows from standard fixed-point arguments [11, Proposition 5.3] that such equations always have unique mild solutions $\varphi \in C([0,T], X)$ that satisfy

$$\varphi(t) = T(t)\varphi_0 + \int_0^t T(t-s) \Big( \sum_{i=1}^n u_i(s) N_i \varphi(s) + B u(s) \Big)\, ds. \tag{1.2}$$

Let $\Xi := \sum_{i=1}^n \|N_i\|$ and assume that $M^2 \Xi^2 (2\nu)^{-1} < 1$. We then introduce the observability gramian $O = W^*W$ and the reachability gramian $P = RR^*$ for Eq. (1.1) in Definition 2.1. For finite-dimensional system spaces $X \simeq \mathbb{R}^k$, with control matrices $B \in L(\mathbb{R}^n, \mathbb{R}^k)$ and observation matrices $C \in L(\mathbb{R}^k, \mathbb{R}^m)$, the gramians we define coincide with the gramians introduced in [12], see also [6, (6) and (7)]. We refer to these finite-dimensional definitions of the reachability gramian $P$ and the observability gramian $O$ as (1.3) and (1.4), respectively.

The condition $M^2 \Xi^2 (2\nu)^{-1} < 1$ appears naturally to ensure the existence of the two gramians. To see this, consider, for example, the reachability gramian: its $k$th-order term is bounded in norm by a constant multiple of $(M^2 \Xi^2 (2\nu)^{-1})^k$ [6, Theorem 2], so that the series defining $P$ converges under this condition.

For general bilinear and stochastic systems, the gramians will be decomposed, as indicated above, by an observability map $W$ and a reachability map $R$ that are explicitly constructed in Sect. 3. Although there are infinitely many possible decompositions of the gramians, our analysis relies on constructing an explicit decomposition. The Hankel operator is then defined as $H = WR$ and is a map between Fock spaces. From the Hankel operator construction, we obtain two immediate corollaries.

The Lyapunov equations for bilinear or stochastic systems are notoriously difficult to solve. It is therefore computationally more convenient [13] to compute a $k$th-order truncation of the gramians, which we introduce in Definition 3.5.
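For a finite-dimensional system, the gramian and its truncations can be computed directly. The following Python sketch (using NumPy) is an illustration under stated assumptions, not the construction used later in the paper. It assumes the standard finite-dimensional generalized Lyapunov equation $AP + PA^T + \sum_i N_i P N_i^T + BB^T = 0$ for the reachability gramian, solves it by vectorization, and compares the result with partial sums obtained from a sequence of linear Lyapunov equations, which is one common way to realize a truncation of finite order. The example matrices and all function names are hypothetical.

```python
import numpy as np

def lyapunov_matrix(A, Ns=()):
    """Matrix of the map P -> A P + P A^T + sum_i N_i P N_i^T on row-major vec(P)."""
    eye = np.eye(A.shape[0])
    L = np.kron(A, eye) + np.kron(eye, A)
    for N in Ns:
        L = L + np.kron(N, N)
    return L

def solve_lyapunov(L, rhs):
    """Solve L vec(P) = -vec(rhs) and reshape the solution to a square matrix."""
    k = rhs.shape[0]
    return np.linalg.solve(L, -rhs.reshape(-1)).reshape(k, k)

def reachability_gramian(A, Ns, B):
    """Solution of the generalized (bilinear) Lyapunov equation."""
    return solve_lyapunov(lyapunov_matrix(A, Ns), B @ B.T)

def truncated_reachability_gramian(A, Ns, B, order):
    """Partial sum P_0 + ... + P_order, each term solving a linear Lyapunov equation."""
    L0 = lyapunov_matrix(A)
    term = solve_lyapunov(L0, B @ B.T)           # A P_0 + P_0 A^T + B B^T = 0
    total = term.copy()
    for _ in range(order):
        # A P_k + P_k A^T + sum_i N_i P_{k-1} N_i^T = 0
        term = solve_lyapunov(L0, sum(N @ term @ N.T for N in Ns))
        total = total + term
    return total

# Hypothetical example: A is symmetric, so ||exp(At)|| = exp(-t), i.e. M = 1, nu = 1.
A = np.diag([-1.0, -2.0])
Ns = [0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])]
B = np.array([[1.0], [1.0]])

M, nu = 1.0, 1.0
Xi = sum(np.linalg.norm(N, 2) for N in Ns)       # sum of operator norms of the N_i
stability_ratio = M**2 * Xi**2 / (2.0 * nu)      # must be < 1

P = reachability_gramian(A, Ns, B)
errors = [np.linalg.norm(P - truncated_reachability_gramian(A, Ns, B, k), 2)
          for k in range(6)]
print("stability ratio:", stability_ratio)
print("truncation errors:", errors)
```

In this example the truncation error shrinks by at least the factor `stability_ratio` with every additional order, which is the geometric decay suggested by the stability condition.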
Our first result implies exponentially fast convergence of the balanced singular values calculated from the truncated gramians to the balanced singular values obtained from the full gramians $O$ and $P$.

Although our framework includes infinite-dimensional systems, such systems are usually approximated numerically by finite-dimensional systems. We therefore state a result on systems that are approximated by projections onto suitable subspaces. Let $V_1 \subset V_2 \subset \cdots \subset X$ be a nested sequence of closed vector spaces of arbitrary dimension such that $\overline{\bigcup_{i \in \mathbb{N}} V_i} = X$, and assume that each $V_i$ is an invariant subspace of both $T(t)$ and $N$. In this case, $V_i$ is also an invariant subspace of the generator $A$ of the semigroup [14, Chapter 2, Section 2.3], and we can consider the restriction of (1.1) to $V_i$.

We then turn to global error bounds for bilinear systems. For linear systems, the existence of a Hardy space $H^\infty$ error bound is well known and a major justification of the linear balanced truncation method both in theory and practice. That is, the difference of the transfer functions of the full and the reduced system in $H^\infty$ norm is controlled by the sum of the Hankel singular values that are discarded in the reduction step. To the best of our knowledge, there is no such bound for bilinear systems and we are only aware of two recent results in that direction [15,16].
In [17], a family of transfer functions $(G_k)_{k \in \mathbb{N}_0}$ for bilinear systems was introduced. We consider the difference of these transfer functions for two systems and write $\Delta(G_k)$ for the difference of the transfer functions and $\Delta(H)$ for the difference of the Hankel operators. In terms of these two quantities, we obtain an error bound that extends the folklore bound for linear systems to bilinear systems:

Theorem 1 Consider two bilinear systems that both satisfy the stability condition $M^2 \Xi^2 (2\nu)^{-1} < 1$, with $\Xi$ the sum of the norms $\|N_i\|$, and that have the same finite-dimensional input space $\mathbb{R}^n$ and output space $\mathcal{H} \simeq \mathbb{R}^m$. The difference of the transfer functions of the two systems is then controlled by the trace norm $\|\Delta(H)\|_{TC}$ of the difference of the Hankel operators.

The trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see "Appendix B", and does not require a direct computation of Hankel operators.
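For finite-dimensional systems, the evaluation of the trace distance via a composite error system can be sketched as follows. The Python code below is an illustration under assumptions, since the details of "Appendix B" are not reproduced here: it assumes the standard block-diagonal error system with output $C\varphi - C_r\varphi_r$, the generalized Lyapunov equations for the gramians $O_e$ and $P_e$ of that system, and the identity that the singular values of the error Hankel operator are the square roots of the eigenvalues of $O_e P_e$. The second system is a naive truncation chosen only for illustration; all names and matrices are hypothetical.

```python
import numpy as np

def gramian(A, Ns, rhs):
    """Solve A P + P A^T + sum_i N_i P N_i^T + rhs = 0 by vectorization."""
    k = A.shape[0]
    eye = np.eye(k)
    L = np.kron(A, eye) + np.kron(eye, A) + sum(np.kron(N, N) for N in Ns)
    return np.linalg.solve(L, -rhs.reshape(-1)).reshape(k, k)

def hankel_singular_values(A, Ns, B, C):
    """Square roots of the eigenvalues of O P for the bilinear gramians O and P."""
    P = gramian(A, Ns, B @ B.T)
    O = gramian(A.T, [N.T for N in Ns], C.T @ C)
    eigenvalues = np.linalg.eigvals(O @ P).real
    return np.sqrt(np.clip(eigenvalues, 0.0, None))

def block_diag(X, Y):
    Z = np.zeros((X.shape[0] + Y.shape[0], X.shape[1] + Y.shape[1]))
    Z[:X.shape[0], :X.shape[1]] = X
    Z[X.shape[0]:, X.shape[1]:] = Y
    return Z

def composite_error_system(full, reduced):
    """Block system whose output is the difference of the two outputs."""
    (A, Ns, B, C), (Ar, Nrs, Br, Cr) = full, reduced
    Ae = block_diag(A, Ar)
    Nes = [block_diag(N, Nr) for N, Nr in zip(Ns, Nrs)]
    Be = np.vstack([B, Br])
    Ce = np.hstack([C, -Cr])
    return Ae, Nes, Be, Ce

def hankel_trace_distance(full, reduced):
    """Sum of the Hankel singular values of the composite error system."""
    return float(np.sum(hankel_singular_values(*composite_error_system(full, reduced))))

# Hypothetical three-dimensional system and a two-dimensional comparison system.
full = (np.diag([-1.0, -2.0, -3.0]),
        [0.3 * np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])],
        np.ones((3, 1)), np.ones((1, 3)))
reduced = (np.diag([-1.0, -2.0]),
           [0.3 * np.array([[0.0, 1.0], [1.0, 0.0]])],
           np.ones((2, 1)), np.ones((1, 2)))

distance = hankel_trace_distance(full, reduced)
print("trace distance of the Hankel operators:", distance)
```

Only two generalized Lyapunov solves on the composite state space are needed; no Hankel operator is ever formed explicitly.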
The proof of Theorem 1 extends the framework of the linear balancing theory, and the result extends the $2\|\Delta(H)\|_{TC}$ bound on the $H^\infty$ norm of the transfer function for linear equations to bilinear systems. From the Hankel estimates, we then obtain an explicit error bound on the dynamics for two systems with initial condition zero:

Theorem 2 Consider two bilinear systems that both satisfy the stability condition $M^2 \Xi^2 (2\nu)^{-1} < 1$, with $\Xi := \sum_{i=1}^n \|N_i\|$, and that have the same finite-dimensional input space $\mathbb{R}^n$ and output space $\mathcal{H} \simeq \mathbb{R}^m$. Let $\Delta(C\varphi(t))$ be the difference of the outputs of the two systems. For controls that are sufficiently small in terms of $\nu$, $M$ and $\Xi$, and for initial conditions zero, $\Delta(C\varphi(t))$ is bounded in terms of $\|\Delta(H)\|_{TC}$.

As noted for Theorem 1, the trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see "Appendix B", and does not require a direct computation of Hankel operators.
As an application of the theoretical results, we discuss generalized stochastic balanced truncation of stochastic (partial) differential equations in Sect. 5. The links between bilinear balanced truncation and stochastic balanced truncation are well known for finite-dimensional systems driven by Wiener noise (see e.g. [5]). In Sect. 5, we extend the Hankel operator methods to the finite-dimensional stochastic systems discussed in [18,19]; our methods also cover a large class of infinite-dimensional stochastic systems. By pursuing an approach similar to the linear setting, we obtain an error bound on the expected output in terms of the Hankel singular values:

Proposition 1.3 Consider two stochastic systems with the same finite-dimensional input space $\mathbb{R}^n$ and output space $\mathcal{H} \simeq \mathbb{R}^m$. Let $u \in L^p((0,\infty), \mathbb{R}^n)$ for $p \in [1,\infty]$ be a deterministic control and let $\Phi$ and $\widetilde{\Phi}$ be the stochastic flows of the respective systems. The two stochastic flows shall be exponentially stable in the mean square sense and define $C_b$-Markov semigroups. Then the difference $\Delta(CY)$ of the processes $Y$ defined in (5.4) with initial conditions zero satisfies an estimate in terms of the Hankel operators of the two systems.

The trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see "Appendix B".
It was first shown in [18, Example II.2] that the difference of full and reduced stochastic systems cannot be estimated by the sum of truncated singular values, as is the case for linear systems. Instead, a corresponding result can be obtained by arguing along the lines of the bilinear framework. The trace distance of the Hankel operators can again be explicitly evaluated using the composite error system, see "Appendix B".

Finite-dimensional intermezzo and relation to balanced truncation
Hitherto, stochastic and bilinear balanced truncation have only been considered for finite-dimensional systems, and so we devote a few preliminary remarks to this setting. When applying, for example, balanced truncation to finite-dimensional systems, one computes the observability and reachability gramians $O$ and $P$ from the Lyapunov equations and decomposes these symmetric positive-definite matrices into (non-unique) factors $O = K^*K$ and $P = VV^*$. In the next step, a singular value decomposition of the matrix $KV$ is computed. The singular values of $KV$ are just the square roots of the eigenvalues of the product of the gramians, $\sigma_j := \sqrt{\lambda_j(OP)}$, independent of the particular form of $K$ and $V$. (Zero is not counted as a singular value here.) By discarding a certain number of "small" singular values of $KV$, one can reduce the order of the system by applying, for example, the balancing transformations, see [6, Proposition 2].

A paradigm of such a decomposition $KV$, where $K$ and $V$ are not matrices but operators, is the Hankel operator $H$. Most importantly, all such decompositions of the gramians are equivalent [7, Theorem 5.1]. That is, there are unitary transformations $U_1 : \operatorname{ran}(H) \to \operatorname{ran}(KV)$ and $U_2 : \ker(H)^\perp \to \ker(KV)^\perp$ such that any decomposition $KV|_{\ker(KV)^\perp}$ of the gramians is equivalent to the Hankel operator $H|_{\ker(H)^\perp}$ studied in this paper. This makes our results on error bounds widely applicable since the Hankel decomposition is as good as any other decomposition.
Moreover, the error bounds are computable: to evaluate the trace norm of the difference of Hankel operators appearing in our error bound, it suffices to compute the gramians of the composite system and not the actual Hankel operators, see the explanation given in "Appendix B". In particular, the respective gramians of the composite system can be computed, for example, directly from the Lyapunov equations of the composite error system.
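The finite-dimensional procedure described in this section can be checked numerically. The following Python sketch is a minimal illustration, assuming the standard generalized Lyapunov equations for the bilinear gramians and the usual square-root balancing formulas; the example system and all names are hypothetical. It verifies that the singular values of $KV$ equal the square roots of the eigenvalues of $OP$, that they do not depend on the chosen factorization, and that the resulting transformation makes both gramians equal to the same diagonal matrix.

```python
import numpy as np

def gramian(A, Ns, rhs):
    """Solve A P + P A^T + sum_i N_i P N_i^T + rhs = 0 by vectorization."""
    k = A.shape[0]
    eye = np.eye(k)
    L = np.kron(A, eye) + np.kron(eye, A) + sum(np.kron(N, N) for N in Ns)
    return np.linalg.solve(L, -rhs.reshape(-1)).reshape(k, k)

# Hypothetical single-input, single-output bilinear system.
A = np.diag([-1.0, -2.0, -3.0])
N = 0.3 * np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
B = np.ones((3, 1))
C = np.ones((1, 3))

P = gramian(A, [N], B @ B.T)              # reachability gramian
O = gramian(A.T, [N.T], C.T @ C)          # observability gramian

# One factorization: O = K^T K and P = V V^T via Cholesky factors.
K = np.linalg.cholesky(O).T
V = np.linalg.cholesky(P)
U, sigma, Zt = np.linalg.svd(K @ V)

# Another factorization, obtained by orthogonal rotations of the factors.
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Q2, _ = np.linalg.qr(rng.standard_normal((3, 3)))
sigma_alt = np.linalg.svd((Q1 @ K) @ (V @ Q2), compute_uv=False)

# Square-root balancing: T = V Z Sigma^{-1/2}, T^{-1} = Sigma^{-1/2} U^T K.
T = (V @ Zt.T) / np.sqrt(sigma)
T_inv = (U.T @ K) / np.sqrt(sigma)[:, None]
P_balanced = T_inv @ P @ T_inv.T
O_balanced = T.T @ O @ T

# Reduced model of order r: keep the r largest singular values.
r = 2
A_r = (T_inv @ A @ T)[:r, :r]
N_r = (T_inv @ N @ T)[:r, :r]
B_r = (T_inv @ B)[:r, :]
C_r = (C @ T)[:, :r]
print("singular values:", sigma)
```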

Notation
The space of bounded linear operators between Banach spaces $X$, $Y$ is denoted by $L(X,Y)$ and just by $L(X)$ if $X = Y$. The operator norm of a bounded operator $T \in L(X,Y)$ is written as $\|T\|$. The trace class operators from $X$ to $Y$ are denoted by $TC(X,Y)$ and the Hilbert-Schmidt operators by $HS(X,Y)$. In particular, we recall that for a linear trace class operator $T \in TC(X,Y)$, where $X$ and $Y$ are separable Hilbert spaces, the trace norm is given by the following supremum over orthonormal systems of basis vectors (ONB):

$$\|T\|_{TC} = \sup \Big\{ \sum_{n \in \mathbb{N}} |\langle f_n, T e_n \rangle_Y| \; ; \; (e_n) \text{ ONB of } X, \ (f_n) \text{ ONB of } Y \Big\}. \tag{1.6}$$

In order not to specify the constant $C > 0$ in an estimate $g \le C f$, we also write $g \lesssim f$. The indicator function of an interval $I$ is denoted by $1_I$. The domain of unbounded operators $A$ is denoted by $D(A)$.
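In finite dimensions, the three norms used here are functions of the singular values, and the supremum characterization of the trace norm is attained at the singular vectors. The short Python sketch below illustrates this for a random matrix; it is an illustration only, and the helper name `pairing_sum` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 3))                  # an operator from R^3 to R^4

U, s, Vt = np.linalg.svd(T, full_matrices=False)
trace_norm = s.sum()                             # sum of singular values
hs_norm = np.sqrt((s**2).sum())                  # Hilbert-Schmidt (Frobenius) norm
operator_norm = s.max()                          # largest singular value

def pairing_sum(T, E, F):
    """sum_n |<f_n, T e_n>| for the orthonormal columns e_n of E and f_n of F."""
    return float(np.abs(np.einsum("in,ij,jn->n", F, T, E)).sum())

# The supremum is attained for the right and left singular vectors ...
attained = pairing_sum(T, Vt.T, U)

# ... while an arbitrary pair of orthonormal systems gives a smaller value.
E, _ = np.linalg.qr(rng.standard_normal((3, 3)))
F, _ = np.linalg.qr(rng.standard_normal((4, 3)))
random_value = pairing_sum(T, E, F)
print(operator_norm, hs_norm, trace_norm, random_value)
```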
Let $\mathcal{H}$ be a separable Hilbert space. For the $n$-fold Hilbert space tensor product of a Hilbert space $\mathcal{H}$, we write $\mathcal{H}^{\otimes n} := \mathcal{H} \otimes \cdots \otimes \mathcal{H}$. To define the Hankel operator, we require a decomposition of the positive gramians. For this purpose, we introduce the Fock spaces $F^n_k$ used in Sect. 3.

Let $\mathbb{C}_+$ be the right complex half-plane; then we define the $\mathcal{H}$-valued Hardy spaces $H^2$ and $H^\infty$ of multivariable holomorphic functions $F : \mathbb{C}_+^k \to \mathcal{H}$ with finite norms (1.7). Finally, for $k$-variable functions $h$ we occasionally use a short notation. In Sect. 5, the space $L^p_{ad}$ denotes the $L^p$ spaces of stochastic processes that are adapted to an underlying filtration, and we introduce the notation $\Omega_I := I \times \Omega$, where $I$ is some interval.

The pillars of bilinear balanced truncation
We start with the definition of the gramians on X which extend the standard definition on finite-dimensional spaces (1.3), (1.4) to arbitrary separable Hilbert spaces.

Gramians
Let $\mathcal{H}$ be a separable Hilbert space and $C \in L(X, \mathcal{H})$ the state-to-output (observation) operator. The space $\mathcal{H}$ is called the output space. As we assume that there are $n$ control functions, the space $\mathbb{R}^n$ will be referred to as the input space. Adopting the notation used in (1.1) with strongly continuous semigroup $(T(t))$ generated by $A$, we then introduce the bilinear gramians for times $t_i \in (0,\infty)$ in terms of operators of the form $\cdots N_{n_{l-1}} T(t_l)\, y \otimes e_{n_1} \otimes \cdots \otimes e_{n_i}$, with $e_i$ denoting the standard basis vectors of $\mathbb{R}^n$.
are summable in operator norm. The limiting operator, given by O : To define the reachability gramian, let P 0 (t 1 ) := T (t 1 ) * . For i ≥ 1 and y ∈ X , we introduce
As in finite dimensions [6, Theorems 3 and 4], the gramians are solutions to Lyapunov equations. However, the Lyapunov equations hold only in a weak sense if the generator of the semigroup A is unbounded.
Proof We restrict ourselves to the proof of the first identity, since the proof of the second one is fully analogous. Let $x \in D(A)$. The zeroth-order term is treated using (2.1). Similarly, for $x \in D(A)$ and $k \ge 1$, the higher-order terms are treated using the fundamental theorem of calculus, the exponential decay of the semigroup at infinity, and the definition of the observability gramian. Finally, we may use the polarization identity to obtain (2.3).
As stated for finite-dimensional systems in [5, Theorem 3.1], we obtain the following eponymous properties for the gramians.
We start by showing that ker(O) is an invariant subspace of the semigroup (T (t)). Let x ∈ ker(O), then for all t ≥ 0 and all k by (2.1) and the semigroup property where we used the semigroup property of (T (t)), substituted τ = s k+1 + t, and extended the integration domain to get the final inequality.

Lemma 2.4
The closure of the range of the reachability gramian $P$ is an invariant subspace of the flow of (1.1), i.e. for $\varphi_0 \in \overline{\operatorname{ran}(P)}$ it follows that $\varphi(t) \in \overline{\operatorname{ran}(P)}$ for all times $t \ge 0$.
Proof Analogous to Lemma 2.3.

Hankel operators on Fock spaces
To decompose the observability gramian as O = W * W and the reachability gramian as P = R R * , we start by defining the observability and reachability maps.
Similarly to the decomposition of the observability gramian, we introduce a decomposition of the reachability gramian $P = RR^*$ in terms of operators $R_k$ and their adjoints $R_k^*$. If the gramians exist, then the reachability map $R$ is defined in terms of the $R_k$, and its adjoint $R^*$ in terms of the $R_k^*$. To see that $R_k$ is a Hilbert-Schmidt operator, we take an ONB $(e_i)$ of $F^n_{k+1}(\mathbb{R}^n)$, such that the $e_i$ are tensor products of an ONB of $L^2((0,\infty), \mathbb{R})$ and standard unit vectors of $\mathbb{R}^n$, together with an arbitrary ONB of $X$, and estimate the Hilbert-Schmidt norm (3.1). One can then check that the maps $W$ and $R$ indeed decompose the gramians as $O = W^*W$ and $P = RR^*$. We now introduce the main object of our analysis:

Definition 3.2 The Hankel operator is the Hilbert-Schmidt operator
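Since any compact operator acting between Hilbert spaces possesses a singular value decomposition, we conclude that the Hankel operator $H$ admits a singular value decomposition with singular values $(\sigma_k)_{k \in \mathbb{N}}$. We now state a sufficient condition under which $H$ is a trace class operator, such that $(\sigma_k)_{k \in \mathbb{N}} \in \ell^1(\mathbb{N})$.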
is a Carleman operator, we can apply [20, Theorem 6.12(iii)] that characterizes Carleman operators of Hilbert-Schmidt type. The statement of the Lemma follows from the summability of In the rest of this section, we discuss immediate applications of our preceding construction. We start by introducing the truncated gramians.
HS and by the inverse triangle inequality HS . Thus, it suffices to bound by (3.1) and Definition 3.1 We now give the proof of Proposition 1.2 on the approximation by subsystems. The Hankel operator for the subsystem on V i is then with P V i being the orthogonal projection onto V i .

Proof of Proposition 1.2 Using elementary estimates
it suffices to show HS-convergence of R V i to R. This is done along the lines of (3.1).

Convergence of singular vectors
The convergence of singular values is addressed in Proposition 1.

Proof of Lemma 3.6 We give the proof only for singular vectors $(e_j)$ since the arguments for $(f_j)$ are analogous. We start by writing $e_j = r(m)\, e_j(m) + x_j(m)$ where $\langle e_j(m), x_j(m) \rangle = 0$. Then, the arguments stated in the proof of [22, Appendix 2] show that for $m$ sufficiently large (the denominator is well defined as the singular values are non-degenerate)

Global error estimates
We start by defining a control tensor $U_k(s) \in L(\mathcal{H} \otimes (\mathbb{R}^n)^{\otimes k}, \mathcal{H})$. Using the sets $\Delta_k(t) := \{(s_1, \dots, s_k) \in \mathbb{R}^k ; 0 \le s_k \le \cdots \le s_1 \le t\}$, we can decompose the output map $(0,\infty) \ni t \mapsto C\varphi(t)$, with $\varphi$ as in (1.2), for suitable controls into two terms $K_1$ and $K_2$ (4.1). The first term $K_1$ is determined by the initial state $\varphi_0$ of the evolution problem (1.1). If this state is zero, the term $K_1$ vanishes. The term $K_2$, on the other hand, captures the intrinsic dynamics of Eq. (1.1).

Technical objects linking the dynamics of the evolution equation to the operators from the balancing method are the Volterra kernels $h_{k,j}$, which satisfy an invariance property for all $p, q, k, j \in \mathbb{N}_0$ such that $p + q = k + j$ (4.2). The Volterra kernels also appear as integral kernels of the Hankel operator. In [17], the $k$th-order transfer function $G_k$ was introduced as the $(k+1)$-variable Laplace transform of the Volterra kernel $h_{k,0}$. Using mixed Hardy norms as defined in (1.7), the Paley-Wiener theorem implies an estimate for $i \in \{1, \dots, k+1\}$ relating the transfer functions to the Volterra kernels (4.3). The next Lemma bounds the mixed $L^1$-$L^2$ norm of the difference Volterra kernel:

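Lemma 4.2 Consider two systems satisfying Assumption 1 with the same number of controls and the same output space $\mathcal{H} \simeq \mathbb{R}^m$ such that the Hankel operator $H$ is trace class (Lemma 3.4).
Then, the difference of the Volterra kernels $h_{k,j}$ of the two systems satisfies a mixed $L^1$-$L^2$ bound in terms of the trace norm $\|\Delta(H)\|_{TC}$.

Proof Consider the difference Volterra kernel $\Delta(h_{k,j})$ associated with $\Delta(W_k R_j)$. For every $z \in \mathbb{N}_0$ and $\alpha > 0$ fixed, we define a family of sesquilinear forms $(L_{z,\alpha})$ by integrating $\Delta(h_{k,j})(s, 2z\alpha)\, g(s_{k+1}, \dots, s_{k+j})$ against test functions in $\mathbb{R}^m \otimes (\mathbb{R}^n)^{\otimes k}$.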
Since h The singular value decomposition of Q provides orthonormal systems f z,i ∈ , parameterized by i ∈ N, and singular values σ z,i ∈ [0, 1] such that for any δ > 0 given there is N (δ) large enough with form orthonormal systems parameterized by z and i in spaces F n j+1 (R n ) and F n k+1 (R m ), respectively, such that using the auxiliary quantities Hence, using the above uniform continuity as well as (4.4) and (4.5) This implies immediately by uniform continuity (4.6) Summing over z up to M 2α implies by the choice of M that The Lemma follows then from the characterization of the trace norm stated in (1.6).
The preceding Lemma provides us with bounds on the difference of the dynamics of two systems satisfying Assumption 1. In particular, Lemma 4.2 allows us to prove Theorem 1.

Proof of Theorem 1
The Hankel operator is an infinite matrix with operator-valued entries H i j = W i R j . Using the invariance property (4.2), we can combine Lemma 4.2 with estimate (4.3), relating the transfer functions to the Volterra kernels, to obtain from the definition of the trace norm (1.6) that which by summing up the two bounds yields the statement of the theorem.
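To make the transfer functions compared in Theorem 1 concrete in finite dimensions, the following Python sketch evaluates multivariable Laplace transforms of kernels of the form $C e^{A t_1} N_{i_1} e^{A t_2} \cdots N_{i_k} e^{A t_{k+1}} B$. The ordering of the variables and the choice of indices are assumptions made for illustration and need not match the convention of [17]; the function name and the example matrices are hypothetical.

```python
import numpy as np

def transfer_function(A, Ns, B, C, s, indices=()):
    """Evaluate C (s_1 - A)^{-1} N_{i_1} (s_2 - A)^{-1} ... N_{i_k} (s_{k+1} - A)^{-1} B.

    `s` holds the k + 1 complex frequencies and `indices` the k positions i_1, ..., i_k
    of the bilinear operators in the list `Ns`.
    """
    if len(s) != len(indices) + 1:
        raise ValueError("need exactly one more frequency than bilinear indices")
    eye = np.eye(A.shape[0])
    # Start with the innermost resolvent applied to B and work outwards.
    value = np.linalg.solve(s[-1] * eye - A, B.astype(complex))
    for freq, i in zip(reversed(s[:-1]), reversed(indices)):
        value = np.linalg.solve(freq * eye - A, Ns[i] @ value)
    return C @ value

# Hypothetical single-input, single-output example.
A = np.diag([-1.0, -2.0])
Ns = [np.array([[0.0, 1.0], [1.0, 0.0]])]
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

G0 = transfer_function(A, Ns, B, C, s=[1.0])                    # linear part
G1 = transfer_function(A, Ns, B, C, s=[1.0, 1.0], indices=[0])  # first bilinear term

# Difference of the linear transfer functions of two systems on a frequency grid.
A_other = np.diag([-1.0, -2.5])
omegas = np.linspace(-50.0, 50.0, 2001)
delta_G0 = max(abs((transfer_function(A, Ns, B, C, [1j * w])
                    - transfer_function(A_other, Ns, B, C, [1j * w]))[0, 0])
               for w in omegas)
print(G0[0, 0], G1[0, 0], delta_G0)
```

For diagonal $A$ the values can be checked by hand: here $G_0(1) = 1/2 + 1/3$ and $G_1(1,1) = 1/6 + 1/6$.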
While Theorem 1 controls the transfer functions, the subsequent theorem controls the actual dynamics from zero:

Proof of Theorem 2 The operator norm of the control tensor is bounded by
where we applied the Cauchy-Schwarz inequality to the product inside the sum to bound the $\ell^1$ norm by an $\ell^2$ norm.

Applications
Throughout this section, we assume that we are given a filtered probability space $(\Omega, F, (F_t)_{t \ge T_0}, P)$ satisfying the usual conditions, i.e. the filtration is right-continuous and $F_{T_0}$ contains all $F$ null-sets. We assume $X$ to be a real separable Hilbert space. In the following subsection, we study an infinite-dimensional stochastic evolution equation with Wiener noise to motivate the extension of stochastic balanced truncation to infinite-dimensional systems that we introduce thereupon. We stick mostly to the notation introduced in the preceding sections and also consider the state-to-output (observation) operator $C \in L(X, \mathcal{H})$, the control-to-state (control) operator $Bu = \sum_{i=1}^n \psi_i u_i$, and $A$ the generator of an exponentially stable $C_0$-semigroup $(T(t))$ on $X$.

Stochastic evolution equation with Wiener noise.
Let $Y$ be a separable Hilbert space and $TC(Y) \ni Q = Q^* \ge 0$ a positive trace class operator. We then consider a Wiener process $(W_t)_{t \ge T_0}$ [24, Def. 2.6] adapted to the filtration $(F_t)_{t \ge T_0}$ with covariance operator $Q$.

We introduce the Banach space of $X$-valued processes adapted to the filtration $(F_t)_{t \ge T_0}$ and consider mappings $N \in L(X, L(Y, X))$ and controls, where we recall the notation $\Omega_X := \Omega \times X$. For the stochastic partial differential equation (5.1), we denote the flow of the homogeneous equation by $\Phi$. If the initial time is some $T_0$ rather than $0$, we denote the (initial time-dependent) flow by $\Phi(t, T_0)$. The ($X$-)adjoint of the flow is denoted by $\Phi^*$. Another important property of the homogeneous solution to (5.1) is that it satisfies the homogeneous Markov property [24, Section 3.4]. Although the flow is time dependent as the SPDE is non-autonomous, there is an associated $C_b$-Markov semigroup $(P(t)f)(x) := E(f(\Phi(s+t, s)x))$ for all $s \ge 0$, which satisfies $P(t+s)f = P(t)P(s)f$.
The $C_b$-Feller property, i.e. $P(t)$ maps $C_b(X)$ again into $C_b(X)$, will not be needed in our subsequent analysis, but reflects the continuous dependence of the solution to (5.1) on initial data. We shall also use that the $C_b$-Markov semigroup can be extended to all $f$ for which the process is still integrable, i.e. $f(\Phi(t,s)x) \in L^1(\Omega, \mathbb{R})$ for arbitrary $s \le t$ and $x \in X$.
By applying the Markov property to the auxiliary functions f x,y defined as follows In the following subsection, we introduce a generalized stochastic balanced truncation framework for systems similar to the stochastic evolution equation (5.1).

Generalized stochastic balanced truncation
For an exponentially stable flow $\Phi$, we define the stochastic observability map $W$ and reachability map $R$ (5.6). We define the stochastic observability gramian $O = W^*W \in L(X)$ and reachability gramian $P = RR^* \in TC(X)$ through their quadratic forms for all $x, y \in X$. To obtain a dynamical interpretation of the gramians, let us recall that for compact self-adjoint operators $K : X \to X$, we can define the (possibly unbounded) Moore-Penrose pseudoinverse using any orthonormal eigenbasis $(v_\lambda)_{\lambda \in \sigma(K)}$ associated with eigenvalues $\lambda$ of $K$.
Then, for any time $\tau > 0$ one defines the input energy $E^\tau_{\mathrm{input}} : X \to [0,\infty]$ and output energy $E^\tau_{\mathrm{output}} : X \to [0,\infty]$ up to time $\tau$ (5.8), where $Y_t$ is the variation of constants process of the flow $\Phi$ defined in (5.4). In particular, the expectation value $E(Y_\tau(u))$ appearing in the definition of the input energy is a solution to a deterministic equation (5.9), where $u \in L^2((0,\infty), \mathbb{R}^n)$ is a deterministic control. The theory of linear systems implies that $x$ is then reachable, by the dynamics of (5.9), after a fixed finite time $\tau > 0$ if $x \in \operatorname{ran} P^{\det}_\tau$, where $P^{\det}_\tau$ is the time-truncated deterministic linear gramian. The control of minimal $L^2$ norm that steers the deterministic system (5.9) into state $x$ after time $\tau$ is then given in terms of the pseudoinverse of $P^{\det}_\tau$.

We also define time-truncated stochastic reachability and observability gramians $P_\tau$ and $O_\tau$ for $x, y \in X$ by $\langle x, P_\tau y \rangle = E \int_0^\tau \langle B^* \Phi(t)^* x, B^* \Phi(t)^* y \rangle \, dt$ and an analogous expression for $O_\tau$. An application of the Cauchy-Schwarz inequality shows that $\ker(P_\tau) \subset \ker(P^{\det}_\tau)$ and thus $\operatorname{ran}(P^{\det}_\tau) \subset \operatorname{ran}(P_\tau)$. Since for $\tau_1 > \tau_2$ we have $\ker(P_{\tau_1}) \subset \ker(P_{\tau_2})$, it also follows that $\operatorname{ran}(P_{\tau_2}) \subset \operatorname{ran}(P_{\tau_1})$.
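The role of the time-truncated deterministic gramian can be illustrated in finite dimensions. The Python sketch below is a minimal example under assumptions: it takes the deterministic system to be $x' = Ax + Bu$ with $x(0) = 0$, a diagonal matrix $A$ so that the matrix exponential acts entrywise, and a single input. It builds the finite-horizon gramian in closed form, forms the classical minimal-energy control from its pseudoinverse, and checks numerically that this control steers the system to the target with energy $\langle x, (P^{\det}_\tau)^{\#} x \rangle$. All names and numbers are hypothetical.

```python
import numpy as np

a = np.array([-1.0, -2.0])        # A = diag(a), so exp(A t) = diag(exp(a t))
b = np.array([1.0, 1.0])          # single input: B u = b * u
tau = 1.5                         # time horizon
target = np.array([0.3, -0.1])    # state to be reached at time tau

# P_tau = int_0^tau exp(A t) b b^T exp(A^T t) dt, evaluated entrywise in closed form.
rate = a[:, None] + a[None, :]
P_tau = np.outer(b, b) * (np.exp(rate * tau) - 1.0) / rate

# Minimal-energy control u(t) = b^T exp(A^T (tau - t)) P_tau^# x.
weights = np.linalg.pinv(P_tau) @ target
grid = np.linspace(0.0, tau, 20001)
propagator = np.exp(np.outer(tau - grid, a)) * b     # rows: exp(A (tau - t)) b
u = propagator @ weights

def trapezoid(values, grid):
    """Composite trapezoidal rule along the first axis."""
    steps = np.diff(grid)
    return np.sum(0.5 * steps[:, None] * (values[1:] + values[:-1]), axis=0)

# x(tau) = int_0^tau exp(A (tau - t)) b u(t) dt and the energy int_0^tau |u(t)|^2 dt.
reached = trapezoid(propagator * u[:, None], grid)
energy = trapezoid((u**2)[:, None], grid)[0]
predicted_energy = float(target @ weights)
print(reached, energy, predicted_energy)
```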
Then, one has, as for finite-dimensional systems [19, Prop. 3.10], the following bound on the input energy (5.8):

Lemma 5.2 Let $x$ be reachable by the dynamics defined in (5.9) and $x \in \operatorname{ran}(P_\tau)$; then the input energy of $x$ satisfies a bound in terms of the time-truncated reachability gramian $P_\tau$. The output energy of any state $x \in X$ is represented by the time-truncated observability gramian $O_\tau$.

Proof The representation of the output energy is immediate from the definition of the (time-truncated) observability gramian. For the representation of the input energy, we have by assumption $x \in \operatorname{ran}(P^{\det}_\tau) \cap \operatorname{ran}(P_\tau)$, which implies the claim on the (time-truncated) reachability gramian.

Remark 3 (Reachability concept) Apart from the energy concept discussed above, interesting ideas relating the eigendecomposition of the reachability gramian to the set of reachable states have been recently presented in [25, Sec. 3] and apply to infinite-dimensional systems as well.

Then, by the semigroup property of the time-homogeneous Markov process it follows that

Definition 5.3 The stochastic Hankel operator is defined as
and thus By homogeneity of the Markov semigroup and Young's inequality, we find

While the error bound in Proposition 1.3 relied essentially on linear theory, our next estimate in Theorem 3 bounds the expected error. The proof strategy resembles the proof presented for bilinear systems in Lemma 4.2. We start, as we did for bilinear systems, by introducing the Volterra kernels of the stochastic Hankel operator.

Proof of Theorem 3 We will show that the difference of compressed Volterra kernels $h$ of the two systems satisfies
Thus, it suffices to verify (5.11). Let $Z := L^2(\Omega, \mathbb{R}^m) \otimes L^2(\Omega, \mathbb{R}^n)$. The independence assumption in the theorem enters as follows. We consider an auxiliary function $\xi_i(x_1, x_2)$, where $C$ and $\widetilde{C}$ are the observation operators of the two systems. By the independence assumption, there is again a Markov semigroup $(P(t))_{t \ge 0}$ associated with the time-homogeneous Markov process determined by the vector-valued flow $(\Phi(t), \widetilde{\Phi}(t))_{t \ge 0}$ such that $(P(t)\xi_i)(x_1, x_2) := E(\xi_i(\Phi(s+t, s)x_1, \widetilde{\Phi}(s+t, s)x_2))$. Let $(\psi_j)_{j \in \{1,\dots,n\}}$, $(\widetilde{\psi}_j)_{j \in \{1,\dots,n\}}$ be the vectors in $X$ comprising the control operators $B$ and $\widetilde{B}$, respectively. The semigroup property of $(P(t))_{t \ge 0}$ then yields a representation of the difference Volterra kernel.

Since the kernel $\Delta(h((s, \bullet), (t, \bullet')))$ contains the products of two flows, the function $\Delta(h((x, \bullet), (x, \bullet')))$ is a.e. well defined on the diagonal. Then, there is a set $J$ of full measure such that every $x \in J \subset (0, M)$ is a Lebesgue point of the Volterra kernel on the diagonal. Thus, as for the condensed Volterra kernel above, there is also for the full Volterra kernel some $0 < \gamma_x < \min(x, M - x)$ such that if $0 < \alpha/2 \le \gamma_x$, then the averaged $Z$-norm deviation of the kernel from its value on the diagonal is at most $\varepsilon/M$ (5.14). This is due to Lebesgue's differentiation theorem for Banach space-valued integrands applied to the flows $\Phi$ and $\widetilde{\Phi}$.

Consider then the family of intervals $I_x$. Lebesgue's covering theorem [26, Theorem 26] states that, after possibly shrinking the diameter of the sets $I_x$ first, there exists an at most countably infinite family of disjoint sets $(I_{x_i})_{i \in \mathbb{N}}$ covering $I \cap J$ such that the Lebesgue measure of $I \cap J \cap \big(\bigcup_{i \in \mathbb{N}} I_{x_i}\big)^C$ is zero. The additivity of the Lebesgue measure implies that for every $\varepsilon' > 0$ there are finitely many points $x_1, \dots, x_n \in I \cap J$ such that the set $I \cap J \cap \big(\bigcup_{i=1}^n I_{x_i}\big)^C$ has Lebesgue measure at most $\varepsilon'$.
Thus, we have obtained finitely many disjoint sets I x i of total measure M − ε such that for 0 < α i /2 ≤ diam(I x i )/2 both estimates (5.13) and (5.14) hold at x = x i where x i is the midpoint of I x i . For every i ∈ {1, . . . , n} fixed, we introduce the family of sesquilinear forms (L i ) and for Z := L 2 ( , R m ) ⊗ L 2 ( , R n ) we can define a Hilbert-Schmidt operator of unit HS-norm given by Q i : The singular value decomposition of Q i yields orthonormal systems f k,i ∈ L 2 ( , R m ) , g k,i ∈ L 2 ( , R n ) as well as singular values σ k,i ∈ [0, 1] parameterized by k ∈ N. For any δ > 0, given there is N (δ) large enough such that Thus, there are also f k,i ∈ L 2 ( , R m ) and g k,i ∈ L 2 ( , R n ) orthonormalized, N i ∈ N, and σ k,i ∈ [0, 1] such that form orthonormal systems in L 2 (0,∞) , R n and L 2 (0,∞) , R m , respectively, both in k and i, such that for (h((s, ω), (t, ω )))g k,i (ω ) R m dt ds dP(ω) dP(ω ).
The bound on the first term follows from (5.14) and The bound on the second term follows from (5.15) and the third term is (5.12). We then compute further that where we used (5.13) to obtain the second estimate. Combining the two preceding estimates, the theorem follows from the characterization of the trace norm given in (1.6).
Next, we study conditions under which convergence of flows implies convergence of stochastic Hankel operators. Let $(\Phi_i)$ be a sequence of flows converging in $L^2(\Omega_{(0,\infty)}, L(X))$ to $\Phi$, and let $W_i$, $R_i$ be the observability and reachability maps derived from $\Phi_i$ as in (5.6). For the observability map, this yields convergence in operator norm. If $\mathcal{H} \simeq \mathbb{R}^m$, then it follows by an analogous estimate that $W_i$ converges to $W$ in Hilbert-Schmidt norm, too [20, Theorem 6.12(iii)].
For the reachability map, we choose an ONB $(e_k)_{k \in \mathbb{N}}$ of $L^2(\Omega_{(0,\infty)}, \mathbb{R})$, which we extend by tensorization $e^j_k := e_k \otimes e_j$ for $j \in \{1, \dots, n\}$ to an ONB of $L^2(\Omega_{(0,\infty)}, \mathbb{R}^n)$. Using this basis and an orthonormal basis $(f_l)_{l \in \mathbb{N}}$ of $X$, it follows that $R_i$ converges to $R$ in Hilbert-Schmidt norm. As in the bilinear case, we obtain from this a convergence result for stochastic Hankel operators.

To exhibit the connection between the model reduction methods for SPDEs and bilinear systems, we finally state a weak version of the stochastic Lyapunov equations for real-valued Lévy noise as stated for finite-dimensional systems in [19, Eq. (14)]. We consider the equation (5.17) for $\xi \in L^2(\Omega, F_0, P, X)$, $A$ the generator of a $C_0$-semigroup $(T(t))$, and $N_j \in L(X)$. Then, the homogeneous part of (5.17), i.e. without the control term $Bu$, defines a unique predictable process $Z$.
A reflection of the integration domain then shows that both expressions (and hence the gramians) coincide. Finally, the gramians satisfy Lyapunov equations for scalar Lévy-type noise (cf. [19] for the finite-dimensional analogue). Stochastic integration by parts yields, after summing over $i \in \{1, \dots, n\}$, an identity for $E(\langle x_1, \Phi(t)\psi_i \rangle \langle \Phi(t)\psi_i, y_1 \rangle)$. Letting $t$ tend to infinity, we obtain the first Lyapunov equation, since by exponential stability $\lim_{t \to \infty} E(\langle x_1, \Phi(t)\psi_i \rangle \langle \Phi(t)\psi_i, y_1 \rangle) = 0$. The second Lyapunov equation can be obtained by an analogous calculation: Let $x_0 \in X$ be arbitrary; then we study the evolution for initial conditions $\sqrt{C^*C}\, x_0$ in the weak sense of the adjoint flow. Proceeding as before, stochastic integration by parts yields the corresponding identity. Using Parseval's identity, i.e. summing over an orthonormal basis replacing $x_0$, yields after taking the limit $t \to \infty$ the second Lyapunov equation.
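The link between stochastic and bilinear gramians can be made explicit in a finite-dimensional special case. The Python sketch below assumes, for illustration only, a system $dX = AX\,dt + NX\,dW_t + Bu\,dt$ driven by a scalar Wiener process with unit variance (a special case of the noise considered here). By Itô's formula, the second moment $M(t) = E(\Phi(t) BB^T \Phi(t)^T)$ then solves $M' = AM + MA^T + NMN^T$, so that the reachability gramian $\int_0^\infty M(t)\,dt$ solves the same type of generalized Lyapunov equation as in the bilinear case. Function names and example data are hypothetical.

```python
import numpy as np

def second_moment_generator(A, N):
    """Matrix of the map M -> A M + M A^T + N M N^T on row-major vec(M)."""
    eye = np.eye(A.shape[0])
    return np.kron(A, eye) + np.kron(eye, A) + np.kron(N, N)

def is_mean_square_stable(A, N):
    """Second moments decay iff the generator has spectrum in the open left half-plane."""
    return bool(np.max(np.linalg.eigvals(second_moment_generator(A, N)).real) < 0.0)

def stochastic_reachability_gramian(A, N, B):
    """Solve A P + P A^T + N P N^T + B B^T = 0 (unit-variance scalar Wiener noise)."""
    k = A.shape[0]
    L = second_moment_generator(A, N)
    return np.linalg.solve(L, -(B @ B.T).reshape(-1)).reshape(k, k)

# Scalar check: dX = a X dt + n X dW has E|X_t|^2 = exp((2a + n^2) t) |X_0|^2.
a, n, b = -1.0, 1.0, 1.0
P_scalar = stochastic_reachability_gramian(np.array([[a]]), np.array([[n]]), np.array([[b]]))
# Closed form: int_0^inf b^2 exp((2a + n^2) t) dt = b^2 / (-(2a + n^2)).

# Noise can destroy mean-square stability even if the drift is stable.
drift_stable_but_ms_unstable = not is_mean_square_stable(np.array([[-0.4]]), np.array([[1.0]]))

# Matrix example: the stochastic gramian dominates the deterministic one (N = 0).
A = np.diag([-1.0, -2.0])
N = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[1.0], [1.0]])
P_stochastic = stochastic_reachability_gramian(A, N, B)
P_deterministic = stochastic_reachability_gramian(A, np.zeros((2, 2)), B)
gap_eigenvalues = np.linalg.eigvalsh(P_stochastic - P_deterministic)
print(P_scalar[0, 0], drift_stable_but_ms_unstable, gap_eigenvalues)
```

The positive semidefinite gap between the two gramians is the finite-dimensional, infinite-horizon counterpart of the inclusion of the range of the deterministic gramian in the range of the stochastic one.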
For instance, the system on $X$ can be thought of as the full system and the system on $X_r$ as the reduced system. One can then define a composite error system on the direct sum of Hilbert spaces $X \oplus X_r$ with the same input space $\mathbb{R}^n$ and output space $\mathcal{H}$