1 Introduction

Model reduction of bilinear systems has become a major field of research, partly triggered by applications in optimal control and the advancement of iterative numerical methods for solving large-scale matrix equations. High-dimensional bilinear systems often appear in connection with semi-discretized controlled partial differential equations or stochastic (partial) differential equations with multiplicative noise. A popular class of model reduction methods that is well established in the field of linear systems theory is based on first transforming the system to a form in which highly controllable states are highly observable and vice versa (“balancing”) and then eliminating the least controllable and observable states. For finite-dimensional linear systems, balanced truncation and residualization (a.k.a. singular perturbation approximation) feature computable error bounds and are known to preserve important system properties, such as stability or passivity [1]; see also [2] and references therein. For a generalization of (linear) balanced truncation to infinite-dimensional systems, see [3, 4].

For bilinear systems, no comparably elaborate theory is available; in particular, approximation error bounds for the reduced system are not known. The purpose of this paper is therefore to extend balanced truncation to bilinear and stochastic evolution equations and, specifically, to establish convergence results and prove explicit truncation error bounds for the bilinear and stochastic systems. For finite-dimensional systems, our framework coincides with the established theory for bilinear and stochastic systems as studied in [5, 6] and references therein. We start by introducing a function space setting that allows us to define bilinear balanced truncation in arbitrary (separable) Hilbert spaces, thereby extending the finite-dimensional theory. Rather than merely transplanting the finite-dimensional theory to infinite dimensions, however, we harness the functional analytic machinery available there to obtain new explicit error bounds for finite-dimensional systems as well.

The figure of merit in our analysis is a Hankel-type operator acting between certain function spaces that are ubiquitous in many-body quantum mechanics, where they are known as Fock spaces. We show that under mild assumptions on the dynamics, the Hankel operator is a Hilbert–Schmidt or even trace class operator. The key idea is that the algebraic structure of the Fock space, that is, a direct sum of tensor products of copies of Hilbert spaces, mimics the nested Volterra kernels representing the bilinear system. This allows us to analyse the singular value decomposition of this operator along the lines of the linear theory developed by Curtain and Glover [3]. For more recent treatments of infinite-dimensional linear systems, we refer to [4, 7, 8]. For applications of the bilinear method to finite-dimensional open quantum systems and Fokker–Planck equations, we refer to [9, 10].

The article is structured as follows: The rest of the introduction is devoted to fixing the notation that is used throughout the article and to stating the main results. Section 2 introduces the concept of balancing based on observability and controllability (or reachability) properties of bilinear systems, which is then used in Sect. 3 to define the Fock space-valued Hankel operator and study properties of its approximations. The global error bounds for the finite-rank approximation based on the singular value decomposition of the Hankel operator are given in Sect. 4. Finally, in Sect. 5 we discuss applications of the aforementioned results to the model reduction of stochastic evolution equations driven by multiplicative Lévy noise. The article contains two appendices. The first one records a technical lemma stating the Volterra series representation of the solution to infinite-dimensional bilinear systems. The second appendix provides more background on how to compute the error bounds found in this article.

1.1 Set-up and main results

Let X be a separable Hilbert space and \(A:D(A) \subset X \rightarrow X\) the generator of an exponentially stable \(C_0\)-semigroup \((T(t))_{t \ge 0}\) of bounded operators, i.e. a strongly continuous semigroup that satisfies \(\left\| T(t) \right\| \le M e^{-\nu t}\) for some \(\nu >0\) and \(M \ge 1.\)

For exponentially stable semigroups generated by A, bounded operators \(N_i \in {\mathcal {L}} (X)\), \(B \in \mathcal L(\mathbb {R}^n,X)\), an initial state \(\varphi _0 \in X\), and control functions \(u=(u_1,\ldots ,u_n) \in L^{2}((0,T),\mathbb {R}^n)\), we study bilinear evolution equations on X of the following type

$$\begin{aligned} \varphi '(t) = A \varphi (t) + \sum _{i=1}^n N_i \varphi (t) u_i(t) + Bu(t) \quad \text {for } t \in (0,T), \qquad \varphi (0) =\varphi _0. \end{aligned}$$
(1.1)

It follows from standard fixed-point arguments [11, Proposition 5.3] that such equations always have unique mild solutions \(\varphi \in C([0,T],X)\) that satisfy

$$\begin{aligned} \varphi (t) = T(t)\varphi _0 + \int _{0}^{t} T(t-s)\left( \sum _{i=1}^n u_i(s)N_i \varphi (s)+Bu(s) \right) \mathrm{d}s. \end{aligned}$$
(1.2)

Let \(\Gamma :=\sqrt{ \sum _{i=1}^n \left\| N_iN_i^* \right\| }\) and assume that \(M^{2} \Gamma ^{2} ( 2 \nu )^{-1}<1.\) We then introduce the observability gramian \(\mathscr {O}= W^* W \) and the reachability gramian \(\mathscr {P} = R R^*\) for Eq. (1.1) in Definition 2.1. For finite-dimensional system spaces \(X\simeq {\mathbb {R}}^k\) with control matrix \(B \in {\mathcal {L}}({\mathbb {R}}^n,{\mathbb {R}}^k)\) and observation matrix \(C\in {\mathcal {L}}({\mathbb {R}}^k,{\mathbb {R}}^m)\), the gramians we define coincide with the gramians introduced in [12], see also [6, (6) and (7)]. More precisely, if X is finite dimensional, then the reachability gramian \({\mathscr {P}}\) is defined by

$$\begin{aligned} \begin{aligned}&P_1(t_1) = e^{At_1} B, \\&P_i(t_1,\ldots ,t_i) = e^{A t_1}\left( N_1 P_{i-1} \ N_2 P_{i-1} \ \cdots \ N_n P_{i-1} \right) (t_2,\ldots ,t_i),\quad i\ge 2\\&{\mathscr {P}} = \sum _{i=1}^{\infty } \int _{(0,\infty )^i} P_i(t_1,\ldots ,t_i) P_i(t_1,\ldots ,t_i)^T \mathrm{d}t \end{aligned} \end{aligned}$$
(1.3)

and the observability gramian \({\mathscr {O}}\) by

$$\begin{aligned} \begin{aligned}&Q_1(t_1) = Ce^{At_1} , \\&Q_i(t_1,\ldots ,t_i) = \left( Q_{i-1} N_1 \ Q_{i-1} N_2 \ \cdots \ Q_{i-1} N_n \right) (t_2,\ldots ,t_i)e^{A t_1}, \quad i\ge 2\\&{\mathscr {O}} = \sum _{i=1}^{\infty } \int _{(0,\infty )^i} Q_i(t_1,\ldots ,t_i)^T Q_i(t_1,\ldots ,t_i) \mathrm{d}t. \end{aligned} \end{aligned}$$
(1.4)

The condition \(M^{2} \Gamma ^{2} ( 2 \nu )^{-1}<1\), stated at the beginning of this paragraph, arises naturally as it ensures the existence of the two gramians. To see this, consider, for example, the reachability gramian for which we find [6, Theorem 2]

$$\begin{aligned} \left\| {\mathscr {P}} \right\| \le \sum _{i=1}^{\infty } \int _{(0,\infty )^i} \left\| P_i(t_1,\ldots ,t_i) P_i(t_1,\ldots ,t_i)^T\right\| \mathrm{d}t \le \frac{\left\| BB^T \right\| }{\Gamma ^2}\sum _{i=1}^{\infty } \left( \frac{M^2 \Gamma ^2}{2\nu }\right) ^i \end{aligned}$$

which is summable if \(M^{2} \Gamma ^{2} ( 2 \nu )^{-1}<1.\)

For general bilinear and stochastic systems, the gramians will be decomposed, as indicated above, by an observability map W and a reachability map R that are explicitly constructed in Sect. 3. Although there are infinitely many possible decompositions of the gramians, our analysis relies on constructing one explicitly. The Hankel operator is then defined as \(H= W R\) and is a map between Fock spaces. From the Hankel operator construction, we obtain two immediate corollaries. The Lyapunov equations for bilinear or stochastic systems are notoriously difficult to solve; it is therefore computationally more convenient [13] to work with a kth-order truncation of the gramians, which we introduce in Definition 3.5. Our first result implies exponentially fast convergence of the balanced singular values calculated from the truncated gramians to the balanced singular values obtained from the full gramians \({\mathscr {O}}\) and \({\mathscr {P}}\):

Proposition 1.1

Let \((\sigma _i)_{i \in {\mathbb {N}}}\) denote the balanced singular values \(\sigma _i:= \sqrt{\lambda _i(\mathscr {O} \mathscr {P})}\) and \((\sigma ^{k}_i)_{i \in {\mathbb {N}}}\) the balanced singular values obtained from the kth-order truncated gramians. The Hankel operator \(H^{(k)}\) obtained from the kth-order truncated gramians converges in Hilbert–Schmidt norm to H, and for all \(i \in \mathbb {N}\)

$$\begin{aligned} \left|\sigma _{i}-\sigma ^k_{i} \right| = {\mathcal {O}} \left( \left( \underbrace{M^{2} \Gamma ^{2} ( 2 \nu )^{-1}}_{<1} \right) ^k \right) \quad \text {and} \quad \left|\left\| \sigma \right\| _{\ell ^2}-\left\| \sigma ^k \right\| _{\ell ^2} \right| = {\mathcal {O}} \left( \left( \underbrace{M^{2} \Gamma ^{2} ( 2 \nu )^{-1}}_{<1} \right) ^k \right) . \end{aligned}$$
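In finite dimensions, the kth-order truncated gramians of Definition 3.5 can be computed without evaluating the iterated integrals in (1.3) and (1.4): each summand solves a standard Lyapunov equation whose inhomogeneity is built from the previous summand (cf. [13]). The following Python sketch, with placeholder matrices chosen so that the stability condition \(M^2\Gamma ^2(2\nu )^{-1}<1\) holds, illustrates how the balanced singular values \(\sigma ^k_i\) stabilize as the truncation order grows; it is meant as an illustration of Proposition 1.1, not as a numerically robust algorithm.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def truncated_gramians(A, N_list, B, C, k):
    """kth-order truncations of (1.3) and (1.4): each summand solves a Lyapunov
    equation fed by the previous one, e.g. A P_1 + P_1 A^T + B B^T = 0 and
    A P_i + P_i A^T + sum_j N_j P_{i-1} N_j^T = 0 for i >= 2."""
    P = solve_continuous_lyapunov(A, -B @ B.T)        # P_1
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)      # Q_1 (observability analogue)
    P_sum, Q_sum = P.copy(), Q.copy()
    for _ in range(k - 1):
        P = solve_continuous_lyapunov(A, -sum(N @ P @ N.T for N in N_list))
        Q = solve_continuous_lyapunov(A.T, -sum(N.T @ Q @ N for N in N_list))
        P_sum, Q_sum = P_sum + P, Q_sum + Q
    return P_sum, Q_sum

# Placeholder data: stable A, weak bilinear coupling, single input and output.
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
N_list = [np.array([[0.2, 0.0], [0.1, 0.1]])]
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

for k in (1, 2, 4, 8):
    P_k, Q_k = truncated_gramians(A, N_list, B, C, k)
    sigma_k = np.sqrt(np.sort(np.abs(np.linalg.eigvals(Q_k @ P_k)))[::-1])
    print(k, sigma_k)   # the balanced singular values settle geometrically in k
```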

Although our framework includes infinite-dimensional systems, such systems are usually approximated numerically by finite-dimensional ones.

We therefore state a result on systems that are approximated by projections onto suitable subspaces. Let \(V_1 \subset V_2 \subset \cdots \subset X\) be a nested sequence of closed subspaces of arbitrary dimension such that \(\overline{\bigcup _{i \in \mathbb {N}} V_i }=X\), and assume that each \(V_i\) is an invariant subspace of both T(t) and the operators \(N_1,\ldots ,N_n\). In this case, \(V_i\) is also an invariant subspace of the generator A of the semigroup [14, Chapter 2, Section 2.3], and we can consider the restriction of (1.1) to \(V_i\)

$$\begin{aligned} \varphi _{V_i}'(t) = A \varphi _{V_i}(t) + \sum _{j=1}^n u_j(t) N_j \varphi _{V_i}(t) + P_{V_i}Bu(t) \quad \text {for } t \in (0,T), \qquad \varphi _{V_i}(0) = P_{V_i} \varphi _0. \end{aligned}$$

Proposition 1.2

Let \(H_{V_i}\) be the Hankel operator of the system restricted to \(V_i\). If the observability map W is a Hilbert–Schmidt operator, then the Hankel operator \(H_{V_i}\) converges in nuclear (trace) norm to H. If W is only assumed to be bounded, then the convergence of Hankel operators is still in Hilbert–Schmidt norm.

Sufficient conditions for W to be a Hilbert–Schmidt operator are given in Lemma 3.4. Norm convergence of Hankel operators implies convergence of their singular values, so the Hankel singular values also converge under the assumptions of Proposition 1.2.

We then turn to global error bounds for bilinear systems: For linear systems, the existence of a Hardy space \(\mathscr {H}^{\infty }\) error bound is well known and a major justification of the linear balanced truncation method in both theory and practice. That is, the difference of the transfer functions of the full and reduced systems in the \(\mathscr {H}^{\infty }\) norm is controlled by the sum of the Hankel singular values that are discarded in the reduction step. To the best of our knowledge, there is no such bound for bilinear systems, and we are only aware of two recent results in that direction [15, 16].

In [17], a family of transfer functions \((G_k)_{k \in {\mathbb {N}}_0}\) for bilinear systems was introduced. We consider two systems and write \(\Delta (G_k)\) for the difference of their transfer functions and \(\Delta (H)\) for the difference of their Hankel operators. In terms of these two quantities, we obtain an error bound that extends the folklore bound for linear systems to bilinear systems:

Theorem 1

Consider two bilinear systems that both satisfy the stability condition \(M^2\Gamma ^2(2\nu )^{-1}<1\) with the same finite-dimensional input space \({\mathbb {R}}^n\) and output space \({\mathcal {H}} \simeq {\mathbb {R}}^m.\) The difference of the transfer functions of the two systems, \(\Delta (G_k)\), measured in the mixed \(\mathscr {H}^{\infty }\)-\(\mathscr {H}^2\) Hardy norms defined in (1.7), is bounded by

$$\begin{aligned} \begin{aligned}&\sum _{k=1}^{\infty } \left( \left\| \Delta (G_{2k-2}) \right\| _{{\mathscr {H}}^{\infty }_k {\mathscr {H}}^2_{2k-2}}+\left\| \Delta (G_{2k-1}) \right\| _{{\mathscr {H}}^{\infty }_k \mathscr {H}^2_{2k-1}} \right) \le 4 \left\| \Delta (H) \right\| _{{\text {TC}}}. \end{aligned} \end{aligned}$$

The trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see “Appendix B”, and does not require a direct computation of Hankel operators.

The proof of Theorem 1 extends the framework of linear balancing theory and generalizes the \(2 \left\| \Delta (H) \right\| _{{\text {TC}}}\) bound on the \(\mathscr {H}^{\infty }\) norm of the transfer function for linear equations to bilinear systems. From the Hankel estimates, we then obtain an explicit error bound on the dynamics for two systems with initial condition zero:

Theorem 2

Consider two bilinear systems that both satisfy the stability condition \(M^2\Gamma ^2(2\nu )^{-1}<1\) with the same finite-dimensional input space \({\mathbb {R}}^n\) and output space \({\mathcal {H}} \simeq {\mathbb {R}}^m\). Let \(\Delta (C\varphi (t))\) be the difference of the outputs of the two systems. For control functions \(u \in L^{\infty }((0,\infty ),{\mathbb {R}}^n) \cap L^2((0,\infty ),{\mathbb {R}}^n)\) such that \(\left\| u \right\| _{L^{2}((0,\infty ),({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty }))} < \min \left( \frac{1}{\sqrt{n}}, \frac{\sqrt{2\nu }}{M \Xi } \right) \) with \(\Xi :=\sum _{i=1}^{n}\Vert N_{i}\Vert \) and zero initial conditions, it follows that

$$\begin{aligned} \sup _{t \in (0, \infty )} \left\| \Delta (C\varphi (t)) \right\| _{{\mathbb {R}}^m} \le 4\sqrt{n} \left\| \Delta (H) \right\| _{{\text {TC}}} \left\| u \right\| _{L^{\infty }((0,\infty ),({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty }))}. \end{aligned}$$

As in Theorem 1, the trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see “Appendix B”, and does not require a direct computation of Hankel operators.
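For finite-dimensional systems, the quantity bounded in Theorem 2 can be evaluated directly by integrating the two bilinear systems for a given control. The following Python sketch does this for placeholder matrices; the lower-dimensional surrogate system merely stands in for a reduced-order model that would, in practice, be obtained by the balanced truncation procedure recalled in Sect. 1.2.

```python
import numpy as np
from scipy.integrate import solve_ivp

def output_trajectory(A, N_list, B, C, u, T=20.0, num=2001):
    """Output t -> C phi(t) of the bilinear system (1.1) with phi(0) = 0."""
    def rhs(t, x):
        ut = u(t)
        return A @ x + sum(ui * (Ni @ x) for ui, Ni in zip(ut, N_list)) + B @ ut
    sol = solve_ivp(rhs, (0.0, T), np.zeros(A.shape[0]), dense_output=True, rtol=1e-9)
    ts = np.linspace(0.0, T, num)
    return ts, C @ sol.sol(ts)

# Full system and a lower-dimensional surrogate (all matrices are placeholders).
A  = np.array([[-2.0, 1.0, 0.0], [0.0, -3.0, 0.5], [0.0, 0.0, -4.0]])
N  = [0.2 * np.eye(3)]
B  = np.array([[1.0], [0.5], [0.2]])
C  = np.array([[1.0, 0.0, 1.0]])
Ar = np.array([[-2.0, 1.0], [0.0, -3.0]]); Nr = [0.2 * np.eye(2)]
Br = np.array([[1.0], [0.5]]);             Cr = np.array([[1.0, 0.0]])

u = lambda t: np.array([0.3 * np.exp(-0.5 * t)])   # small control, in the spirit of Theorem 2
ts, y_full = output_trajectory(A, N, B, C, u)
_,  y_red  = output_trajectory(Ar, Nr, Br, Cr, u)
print(np.max(np.linalg.norm(y_full - y_red, axis=0)))   # sup_t ||Delta(C phi(t))||_{R^m}
```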

As an application of the theoretical results, we discuss generalized stochastic balanced truncation of stochastic (partial) differential equations in Sect. 5. The links between bilinear balanced truncation and stochastic balanced truncation are well known for finite-dimensional systems driven by Wiener noise (see e.g. [5]). In Sect. 5, we extend the Hankel operator methods to the finite-dimensional stochastic systems discussed in [18, 19], but our methods also cover a large class of infinite-dimensional stochastic systems. By pursuing an approach similar to the linear setting, we obtain an error bound on the expected output in terms of the Hankel singular values:

Proposition 1.3

Consider two stochastic systems with the same finite-dimensional input space \({\mathbb {R}}^n\) and output space \({\mathcal {H}} \simeq {\mathbb {R}}^m\). Let \(u \in L^p((0,\infty ),{\mathbb {R}}^n)\) for \(p \in [1,\infty ]\) be a deterministic control and let \(\Phi \) and \({\widetilde{\Phi }}\) be the stochastic flows of the respective systems. We assume that the two stochastic flows are exponentially stable in the mean square sense and define \( C_b\)-Markov semigroups. The difference \(\Delta (CY)\) of the processes Y defined in (5.4), with zero initial conditions, then satisfies

$$\begin{aligned} \left\| {\mathbb {E}} \Delta (CY_{\bullet }(u)) \right\| _{L^p((0,\infty ),{\mathbb {R}}^m)} \le 2 \left\| \Delta (H) \right\| _{{\text {TC}}} \left\| u \right\| _{L^{p}((0,\infty ),{\mathbb {R}}^n)}. \end{aligned}$$

The trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see “Appendix B”.

It was first shown in [18, Example II.2] that the difference of the full and reduced stochastic systems cannot, in general, be estimated by the sum of the truncated singular values, as is the case for linear systems. Instead, the following result can be obtained by arguing along the lines of the bilinear framework:

Theorem 3

Consider two stochastic systems with the same finite-dimensional input space \({\mathbb {R}}^n\) and output space \({\mathcal {H}} \simeq {\mathbb {R}}^m\) such that the respective stochastic flows \(\Phi \) and \({\widetilde{\Phi }}\) are independent. We assume that the two stochastic flows are exponentially stable in the mean square sense and define \( C_b\)-Markov semigroups. The difference \(\Delta (CY)\) of the processes Y defined in (5.4) with zero initial conditions satisfies

$$\begin{aligned} \sup _{t \in (0,\infty )} {\mathbb {E}} \left\| \Delta (CY_t(u)) \right\| _{{\mathbb {R}}^m} \le 2 \left\| \Delta ( H) \right\| _{{\text {TC}}} \left\| u\right\| _{\mathcal H_2^{(0,\infty )}({\mathbb {R}}^n)} \end{aligned}$$
(1.5)

with controls in the Banach space \(\left( \mathcal H_{2}^{(0,\infty )}({\mathbb {R}}^n), \sup _{t \in (0,\infty )}\left( {\mathbb {E}} \left( \left\| u(t) \right\| _{\mathbb R^n}\right) ^2\right) ^{1/2}\right) .\) The trace distance of the Hankel operators can be explicitly evaluated using the composite error system, see “Appendix B”.

1.2 Finite-dimensional intermezzo and relation to balanced truncation

Hitherto, stochastic and bilinear balanced truncation have only been considered for finite-dimensional systems, and so we devote a few preliminary remarks to this setting. When applying, for example, balanced truncation to finite-dimensional systems, one computes the observability and reachability gramians \({\mathscr {O}} \) and \(\mathscr {P}\) from the Lyapunov equations and decomposes these symmetric positive-definite matrices into (non-unique) factors \({\mathscr {O}}=K^*K\) and \({\mathscr {P}} = VV^*.\) In the next step, a singular value decomposition of the matrix KV is computed. The singular values of this matrix KV are just the square roots of the eigenvalues of the product of the gramians, \(\sigma _j:=\sqrt{\lambda _j({\mathscr {O}} {\mathscr {P}})}\), independent of the particular form of K and V. (Zero is not counted as a singular value here.)

By discarding a certain number of “small” singular values of KV, one can reduce the order of the system by applying, for example, the balancing transformations, see [6, Proposition 2]. A paradigm of such a decomposition KV, where K and V are not matrices but operators, is the Hankel operator H. Yet most importantly, all such decompositions of the gramians are equivalent [7, Theorem 5.1]. That is, there are unitary transformations \(U_1: \overline{{\text {ran}}}(H) \rightarrow \overline{{\text {ran}}}(KV)\) and \(U_2: {\text {ker}}(H)^{\perp } \rightarrow {\text {ker}}(KV)^{\perp }\) such that any decomposition \(KV\vert _{{\text {ker}}(KV)^{\perp }} \) of the gramians is equivalent to the Hankel operator studied in this paper \(H \vert _{{\text {ker}}(H)^{\perp }} = U_1^* \ KV\vert _{{\text {ker}}(KV)^{\perp }} U_2.\) This makes our results on error bounds widely applicable since the Hankel decomposition is as good as any other decomposition.

This follows because, in order to evaluate the trace norm of the difference of Hankel operators appearing in our error bound, it suffices to compute the gramians of the composite system rather than the actual Hankel operators; see the explanation given in “Appendix B”. In particular, the respective gramians of the composite system can be computed, for example, directly from the Lyapunov equations of the composite error system.
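For finite-dimensional systems, the reduction step described above can be summarized in a few lines. The following Python sketch implements the square-root variant of the balancing procedure under the assumption that the gramians are positive definite: it takes precomputed gramians \(\mathscr {P}\) and \(\mathscr {O}\) (for instance from the Lyapunov recursions sketched after Proposition 1.1), factorizes them, computes a singular value decomposition of KV, and projects the system matrices as in [6, Proposition 2]. The function name and the cut-off r are placeholders.

```python
import numpy as np

def balanced_truncation(A, N_list, B, C, P, O, r):
    """Square-root balancing: factorize O = K^* K and P = V V^*, take an SVD of KV,
    keep the r largest singular values and project the system matrices."""
    V = np.linalg.cholesky(P)          # P = V V^T (assumes P positive definite)
    K = np.linalg.cholesky(O).T        # O = K^T K (assumes O positive definite)
    U, s, Zt = np.linalg.svd(K @ V)
    S = np.diag(s[:r] ** -0.5)
    Tl = S @ U[:, :r].T @ K            # left balancing transformation
    Tr = V @ Zt[:r, :].T @ S           # right balancing transformation, Tl @ Tr = I_r
    A_r = Tl @ A @ Tr
    N_r = [Tl @ Ni @ Tr for Ni in N_list]
    B_r = Tl @ B
    C_r = C @ Tr
    return A_r, N_r, B_r, C_r, s[:r]
```

One can check that \(T_l \mathscr {P} T_l^*\) and \(T_r^* \mathscr {O} T_r\) both equal \({\text {diag}}(\sigma _1,\ldots ,\sigma _r)\), i.e. the retained coordinates are balanced.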

1.3 Notation

The space of bounded linear operators between Banach spaces X, Y is denoted by \({\mathcal {L}}(X,Y)\) and just by \({\mathcal {L}}(X)\) if \(X=Y.\) The operator norm of a bounded operator \(T \in {\mathcal {L}}(X,Y)\) is written as \(\left\| T \right\| \). The trace class operators from X to Y are denoted by \({\text {TC}}(X,Y)\) and the Hilbert–Schmidt operators by \({\text {HS}}(X,Y).\) In particular, we recall that for a trace class operator \(T \in {\text {TC}}(X,Y)\), where X and Y are separable Hilbert spaces, the trace norm is given by the following supremum over orthonormal bases (ONB),

$$\begin{aligned} \left\| T \right\| _{{\text {TC}}}=\sup \left\{ \sum _{n \in \mathbb {N}} \left|\langle f_n,T e_n \rangle _Y \right|; (e_n) \text { ONB of } X \text { and } (f_n) \text { ONB of } Y \right\} . \end{aligned}$$
(1.6)
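For a finite-dimensional operator (a matrix), the supremum in (1.6) is attained by the left and right singular vectors and equals the sum of the singular values; a small numerical sanity check with an arbitrary placeholder matrix:

```python
import numpy as np

T = np.random.default_rng(0).standard_normal((4, 3))   # placeholder operator
U, s, Vh = np.linalg.svd(T, full_matrices=False)
# With f_n = U[:, n] and e_n = Vh[n, :], the sum in (1.6) equals the sum of singular values.
print(sum(abs(U[:, n] @ T @ Vh[n, :]) for n in range(len(s))), s.sum())
```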

\(\partial B_X(1)\) denotes the unit sphere of a Banach space X and we write \(g={\mathcal {O}} (f)\) if there is \(C>0\) such that \(\left\| g \right\| \le C \left\| f \right\| .\) In order not to specify the constant C, we also write \(\left\| g \right\| \lesssim \left\| f \right\| .\) The indicator function of an interval I is denoted by \(\mathbb {1}_I.\) The domain of unbounded operators A is denoted by D(A).

Let \( H \) be a separable Hilbert space. For the n-fold Hilbert space tensor product of a Hilbert space \( H \), we write \( H ^{\otimes n}:= H \otimes \cdots \otimes H .\) To define the Hankel operator, we require a decomposition of the positive gramians. For this purpose, we introduce the Fock space \(F^n( H )\) of \( H \)-valued functions \(F^n( H ):=\bigoplus _{k=1}^{\infty }F_k^n( H )\) where \(F_k^n( H ):=L^2((0,\infty )^k, H \otimes ({\mathbb {R}}^{n})^{\otimes (k-1)})\), and \(F_0^n( H ):= H .\)

Thus, elements of the Fock space \(F^n( H )\) are sequences \((f_k)\) taking values in the spaces \(F^n_k( H ).\)
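For orientation, after discretizing \((0,\infty )\) by m grid points and identifying \( H \simeq {\mathbb {R}}^d\), a (truncated) Fock space element becomes a sequence of arrays whose rank grows with k: the component in \(F_k^n( H )\) carries k time axes, one \( H \)-axis and \(k-1\) copies of an \({\mathbb {R}}^n\)-axis. A minimal sketch with placeholder sizes:

```python
import numpy as np

m, d, n, K = 16, 3, 2, 4   # time grid points, dim H, number of inputs, truncation level

# f[k-1] discretizes the component in F_k^n(H) = L^2((0,infty)^k, H x (R^n)^{x(k-1)}).
f = [np.zeros((m,) * k + (d,) + (n,) * (k - 1)) for k in range(1, K + 1)]
for k, fk in enumerate(f, start=1):
    print(k, fk.shape)   # k time axes, one H-axis, k-1 input axes
```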

Let \({\mathbb {C}}_{+}\) be the right complex half-plane, then we define the \( H \)-valued Hardy spaces \({\mathscr {H}}^2\) and \(\mathscr {H}^{\infty }\) of multivariable holomorphic functions \(F:\mathbb C_{+}^k \rightarrow H \) with finite norms

$$\begin{aligned} \left\| F \right\| _{{\mathscr {H}}^2}:= \sup _{x \in (0,\infty )^{k}} \frac{1}{(2\pi )^{k/2}} \left( \int _{\mathbb {R}^{k}} \left\| F(x+iy) \right\| _{ H }^2 \ \mathrm{d}y\right) ^{\frac{1}{2}} \quad \text {and} \quad \left\| F \right\| _{{\mathscr {H}}^{\infty }}:= \sup _{z \in {\mathbb {C}}_{+}^{k}} \left\| F(z) \right\| _{ H }, \end{aligned}$$

respectively. We also introduce mixed \(L^1_iL^2_{k-1}\) and \(\mathscr {H}^{\infty }_i {\mathscr {H}}^{2}_{k-1}\) norms which for \( H \)-valued functions \(f:(0,\infty )^k \rightarrow H \) and \(g: {\mathbb {C}}_{+}^k \rightarrow H \) read

$$\begin{aligned} \begin{aligned}&\left\| f \right\| _{L^1_iL^2_{k-1}( H )} = \int _0^{\infty } \left\| f(\bullet ,\ldots ,\bullet ,s_i, \bullet ,\ldots ,\bullet ) \right\| _{L^2((0,\infty )^{k-1}, H )}\mathrm{d}s_i \text { and } \\&\left\| g \right\| _{{\mathscr {H}}^{\infty }_i \mathscr {H}^{2}_{k-1}( H )}= \sup _{s_i \in {\mathbb {C}}_{+}} \left\| g(\bullet ,\ldots ,\bullet ,s_i, \bullet ,\ldots ,\bullet ) \right\| _{{\mathscr {H}}^2((0,\infty )^{k-1}, H )}. \end{aligned} \end{aligned}$$
(1.7)

Finally, for k-variable functions h we occasionally use the short notation

$$\begin{aligned} h^{(i)}(s,t):=h(s_1,\ldots ,s_{i-1},t,s_{i},\ldots ,s_{k-1}). \end{aligned}$$
(1.8)

In Sect. 5, the space \(L^p_{\text {ad}}\) denotes the \(L^p\) spaces of stochastic processes that are adapted to an underlying filtration and we introduce the notation \(\Omega _I:=I \times \Omega \) where I is some interval.

2 The pillars of bilinear balanced truncation

We start with the definition of the gramians on X which extend the standard definition on finite-dimensional spaces (1.3), (1.4) to arbitrary separable Hilbert spaces.

2.1 Gramians

Let \(\mathcal {H}\) be a separable Hilbert space and \(C \in \mathcal L(X,\mathcal {H})\) the state-to-output (observation) operator. The space \({\mathcal {H}}\) is called the output space. As we assume that there are n control functions, the space \({\mathbb {R}}^n\) will be referred to as the input space. Adopting the notation used in (1.1) with strongly continuous semigroup (T(t)) generated by A, we then introduce the bilinear gramians for times \(t_i \in (0,\infty )\):

Definition 2.1

Let \(O_0(t_1):=CT(t_1)\). For \(i \ge 1\) and \(y \in X\), define

$$\begin{aligned} O_{i}(t_1,\ldots ,t_{i+1})y:=CT(t_1) \sum _{n_1,\ldots ,n_{i}=1}^n \left( \prod _{l=2}^{i+1} \left( N_{n_{l-1}}T(t_l) \right) \right) y \otimes \left( \widehat{e}_{n_1} \otimes \cdots \otimes \widehat{e}_{n_{i}}\right) \end{aligned}$$

with \(\widehat{e}_i\) denoting the standard basis vectors of \(\mathbb R^n.\)

Let \(M^2\Gamma ^2(2\nu )^{-1} < 1 \), then the bounded operators \(\mathscr {O}_k\) defined for \(x,y \in X\) by

$$\begin{aligned} \langle x, \mathscr {O}_ky \rangle _X := \int _{(0,\infty )^{k+1}}\langle O_k(s)x, O_k(s)y \rangle _{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}}\mathrm{d}s \end{aligned}$$
(2.1)

are summable in operator norm. The limiting operator, given by \(\mathscr {O} := \sum _{k=0}^{\infty } \mathscr {O}_k\), is called the observability gramian \(\mathscr {O}\) in \({\mathcal {L}}(X).\)

To define the reachability gramian, let \(P_0(t_1):=T(t_1)^*\). For \(i \ge 1\) and \(y \in X\), we introduce

$$\begin{aligned} P_{i}(t_1,\ldots ,t_{i+1})y:= \sum _{n_1,\ldots ,n_{i}=1}^n \left( \prod _{l=1}^{i} \left( T(t_l)^*N_{n_l}^*\right) \right) T(t_{i+1})^*y \otimes \left( \widehat{e}_{n_1} \otimes \cdots \otimes \widehat{e}_{n_{i}}\right) . \end{aligned}$$

The control operator \(B \in {\mathcal {L}}({\mathbb {R}}^n, X)\) is assumed to be of the form \(Bu=\sum _{i=1}^n \psi _i u_i\) for \(\psi _i \in X\) such that \(BB^*=\sum _{i=1}^n \langle \bullet , \psi _i \rangle \psi _i \) is a finite-rank operator. Define operators \(\mathscr {P}_k \) for any \(x,y \in X\) by

$$\begin{aligned} \langle x,\mathscr {P}_ky \rangle _X:=\int _{(0,\infty )^{k+1}} \left\langle P_k(s)x, \left( BB^* \otimes {\text {id}}_{\mathbb {R}^{n^{\otimes k}}}\right) P_k(s)y \right\rangle _{X \otimes \mathbb {R}^{n^{\otimes k}}}\mathrm{d}s. \end{aligned}$$
(2.2)

If \(M^2\Gamma ^2(2\nu )^{-1} < 1,\) the reachability gramian is defined as \({\mathscr {P}}:= \sum _{k=0}^{\infty } \mathscr {P}_k \in {\text {TC}}(X).\) The \({\text {TC}}(X)\)-convergence follows from the characterization (1.6) of the trace norm: For arbitrary orthonormal systems \((e_i),(f_i)\) of X

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^{{\text {dim}}(X)} \left|\langle f_i,\mathscr {P} e_i \rangle _X \right|\\&\quad \le \left\| BB^* \right\| _{{\text {TC}}(X)} \sum _{k=0}^{\infty }\int _{(0,\infty )^{k+1}} \sum _{n_1,\ldots ,n_{k}=1}^n \left\| \prod _{l=1}^{k} \left( T(t_l)^*N_{n_l}^*\right) T(t_{k+1})^* \right\| ^2 \mathrm{d}t < \infty . \end{aligned} \end{aligned}$$

Assumption 1

We assume \(M^2\Gamma ^2(2\nu )^{-1}<1\) such that both gramians \(\mathscr {O}\) and \(\mathscr {P}\) exist.

As in finite dimensions [6, Theorems 3 and 4], the gramians are solutions to Lyapunov equations. However, the Lyapunov equations hold only in a weak sense if the generator A of the semigroup is unbounded.

Lemma 2.2

For all \(x_1,y_1 \in D(A)\) and all \(x_2,y_2 \in D(A^*)\)

$$\begin{aligned} \begin{aligned}&\langle \mathscr {O}Ax_1,y_1 \rangle _X + \langle \mathscr {O}x_1,Ay_1 \rangle _X + \sum _{i=1}^n \langle \mathscr {O}N_ix_1,N_iy_1 \rangle _X + \left\langle Cx_1,Cy_1 \right\rangle _{\mathcal {H}} =0 \text { and }\\&\langle \mathscr {P}A^*x_2,y_2 \rangle _X + \langle \mathscr {P}x_2,A^*y_2 \rangle _X + \sum _{i=1}^n \langle \mathscr {P}N_i^*x_2,N_i^*y_2 \rangle _X + \langle BB^*x_2,y_2\rangle _X=0. \end{aligned} \end{aligned}$$
(2.3)

Proof

We restrict ourselves to the proof of the first identity, since the proof of the second one is fully analogous. Let \(x \in D(A)\), then by (2.1)

$$\begin{aligned} \begin{aligned}&\langle \mathscr {O}_0Ax,x \rangle +\langle \mathscr {O}_0x,Ax \rangle _X + \left\| Cx \right\| _{\mathcal {H}}^2 \\&\quad =\int _{0}^{\infty }\left( \langle CT'(s)x,CT(s)x \rangle _{\mathcal {H}} + \langle CT(s)x, CT'(s)x \rangle _{\mathcal {H}}\right) \mathrm{d}s +\left\| Cx \right\| _{\mathcal {H}}^2 \\&\quad =\int _{0}^{\infty } \frac{\mathrm{d}}{\mathrm{d}s} \left\| CT(s)x \right\| _{\mathcal {H}}^2 \mathrm{d}s +\left\| Cx \right\| _{\mathcal {H}}^2=0. \end{aligned} \end{aligned}$$

Similarly, for \(x \in D(A)\) and \(k \ge 1\) by the fundamental theorem of calculus, the exponential decay of the semigroup at infinity, and the definition of the observability gramian

$$\begin{aligned} \begin{aligned}&\langle \mathscr {O}_kAx,x \rangle _X + \langle \mathscr {O}_k x,Ax \rangle _X+\sum _{i=1}^n\langle \mathscr {O}_{k-1}N_ix,N_ix \rangle _X = \sum _{i=1}^n \langle \mathscr {O}_{k-1}N_ix,N_ix \rangle _X\\&\quad + \sum _{i=1}^n\int _{(0,\infty )^{k}} \int _{(0,\infty )}\ \frac{\mathrm{d}}{\mathrm{d}\tau } \left\| O_{k-1}(s_1,\ldots ,s_k)(N_i T(\tau )x) \right\| ^2_{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes (k-1)}}} \ \mathrm{d}\tau \mathrm{d}s =0. \end{aligned} \end{aligned}$$

The uniform convergence of \(\mathscr {O} = \sum _{k=0}^{\infty } \mathscr {O}_k\) implies that

$$\begin{aligned} \langle \mathscr {O} Ax,x \rangle _X + \langle \mathscr {O}x,Ax \rangle _X + \sum _{i=1}^n \langle \mathscr {O}N_ix,N_ix \rangle _X + \left\| Cx \right\| ^2_{\mathcal {H}}=0. \end{aligned}$$

Finally, we may use the polarization identity to obtain (2.3). \(\square \)
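In finite dimensions, where A is bounded, (2.3) is the usual pair of generalized Lyapunov equations (cf. [6, Theorems 3 and 4]). A quick numerical sanity check of the first identity, with placeholder matrices, computes \(\mathscr {O}\) by summing the series of Definition 2.1 via Lyapunov recursions and evaluates the residual of \(A^*\mathscr {O}+\mathscr {O}A+\sum _{i=1}^n N_i^*\mathscr {O}N_i+C^*C\):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-2.0, 1.0], [0.0, -3.0]])            # placeholder data
N_list = [np.array([[0.2, 0.0], [0.1, 0.1]])]
C = np.array([[1.0, 0.0]])

# Sum the series O = sum_k O_k: each summand solves a Lyapunov equation fed by the previous one.
Q = solve_continuous_lyapunov(A.T, -C.T @ C)
O = Q.copy()
for _ in range(200):
    Q = solve_continuous_lyapunov(A.T, -sum(N.T @ Q @ N for N in N_list))
    O += Q

residual = A.T @ O + O @ A + sum(N.T @ O @ N for N in N_list) + C.T @ C
print(np.linalg.norm(residual))   # ~ 0 up to the truncation error of the series
```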

As stated for finite-dimensional systems in [5, Theorem 3.1], we obtain the following properties of the gramians, which justify their names.

Lemma 2.3

All elements \(\varphi _0\in {\text {ker}}(\mathscr {O})\) are unobservable in the homogeneous system, i.e. solutions to

$$\begin{aligned} \varphi '(t)=A\varphi (t)+\sum _{i=1}^n N_i\varphi (t)u_i(t),\ \text { for }t>0 \end{aligned}$$
(2.4)

with \(\varphi (0)=\varphi _0 \in {\text {ker}}(\mathscr {O})\) satisfy \(C\varphi (t)=0\) for all \(t \ge 0.\)

Proof

An element \(x \in X\) is in \({\text {ker}}({\mathscr {O}})\) if and only if \(\left\langle {\mathscr {O}}_k x, x \right\rangle _X =0\) for all \(k \in {\mathbb {N}}_0.\) We start by showing that \({\text {ker}}({\mathscr {O}})\) is an invariant subspace of the semigroup (T(t)). Let \(x \in {\text {ker}}({\mathscr {O}})\), then for all \(t \ge 0\) and all k by (2.1) and the semigroup property

$$\begin{aligned} \begin{aligned}&0 \le \left\langle {\mathscr {O}}_k T(t)x, T(t)x \right\rangle _X =\int _{(0,\infty )^{k+1}}\left\| O_k(s)T(t)x \right\| ^2_{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}} \mathrm{d}s\\&\quad = \int _{(0,\infty )^{k}} \sum _{i=1}^n \int _{0}^{\infty }\left\| O_{k-1}(s)N_i T(s_{k+1}+t)x\right\| ^2_{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes {k-1}}}} \mathrm{d}s_{k+1} \mathrm{d}s \\&\quad = \int _{(0,\infty )^{k}} \int _{t}^{\infty }\langle O_k(s,\tau )x, O_k(s,\tau )x \rangle _{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}} \ \mathrm{d}\tau \mathrm{d}s \\&\quad \le \int _{(0,\infty )^{k+1}} \langle O_k(s)x, O_k(s)x \rangle _{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}} \mathrm{d}s = \left\langle {\mathscr {O}}_k x, x \right\rangle _X = 0, \end{aligned} \end{aligned}$$

where we used the semigroup property of (T(t)), substituted \(\tau =s_{k+1}+t,\) and extended the integration domain to get the final inequality. Thus, (T(t)) restricts to a \(C_0\)-semigroup on the closed subspace \({\text {ker}}({\mathscr {O}})\) whose generator is the part of A in \({\text {ker}}(\mathscr {O})\) [14, Chapter II, Section 2.3]. In particular, \(D(A) \cap {\text {ker}}({\mathscr {O}})\) is dense in \({\text {ker}}({\mathscr {O}}).\) Let \(x \in {\text {ker}}(\mathscr {O}) \cap D(A),\) then positivity of \(\mathscr {O}\) implies, by the first Lyapunov equation (2.3) with \(x_1=y_1=x\), that \(N_ix \in {\text {ker}}(\mathscr {O})\) and \(x \in {\text {ker}}(C)\). Thus, a density argument shows \(N_i \left( {\text {ker}}(\mathscr {O}) \right) \subset {\text {ker}}(\mathscr {O})\) and \({\text {ker}}(\mathscr {O})\subset {\text {ker}}(C).\)

This shows, by Li and Yong [11, Proposition 5.3], that (2.4) is well posed on \({\text {ker}}({\mathscr {O}})\), i.e. for initial data in \({\text {ker}}({\mathscr {O}})\) the solution to (2.4) stays in \({\text {ker}}(\mathscr {O})\). From the inclusion \({\text {ker}}(\mathscr {O}) \subset {\text {ker}}(C)\), we then obtain \(C \varphi (t)=0.\) \(\square \)

Lemma 2.4

The closure of the range of the reachability gramian \(\mathscr {P}\) is an invariant subspace of the flow of (1.1), i.e. for \(\varphi _0 \in \overline{{\text {ran}}}(\mathscr {P})\) it follows that \(\varphi (t) \in \overline{{\text {ran}}}(\mathscr {P})\) for all times \(t\ge 0.\)

Proof

Analogous to Lemma 2.3. \(\square \)

3 Hankel operators on Fock spaces

To decompose the observability gramian as \(\mathscr {O}= W^* W\) and the reachability gramian as \(\mathscr {P} = R R^* \), we start by defining the observability and reachability maps.

Definition 3.1

For \(k \in \mathbb {N}_0\), let \( W_k\in {\mathcal {L}}\left( X, F^n_{k+1}\left( \mathcal {H} \right) \right) \) be the operators that map \(X \ni x \mapsto O_k(\bullet )x,\) then \(\left\| W_k \right\| = {\mathcal {O}}\left( \left( M \Gamma ( 2 \nu )^{-1/2} \right) ^k\right) .\) The adjoint operators \( W_k^* \in \mathcal L\left( F_{k+1}^n\left( \mathcal {H} \right) ,X\right) \) are given by \(W_k^*f:=\int _{(0,\infty )^{k+1}} O_k^*(s)f(s) \mathrm{d}s.\) By Assumption 1, we can define the observability map \(W \in {\mathcal {L}}\left( X, F^n\left( \mathcal {H} \right) \right) \) as \(W(x):=\left( W_k(x)\right) _{k \in \mathbb {N}_0}\) with adjoint operator \( W^*\), given for \((f_k)_{k}\in F^n\left( \mathcal {H} \right) ,\) by \(W^*((f_k)_{k})= \sum _{k=0}^{\infty } W_k^*f_k.\)

Similarly to the decomposition of the observability gramian, we introduce a decomposition of the reachability gramian \(\mathscr {P} = R R^*\). Let

$$\begin{aligned} R_k \in {\text {HS}}\left( F_{k+1}^n\left( \mathbb {R}^{n}\right) ,X\right) \text { be given by } R_kf:=\int _{(0,\infty )^{k+1}} P_k(s)^*(B\otimes {\text {id}}_{{\mathbb {R}}^{n^{\otimes k}}}) f(s) \mathrm{d}s. \end{aligned}$$

The adjoint operators of the \(R_k\) are the operators

$$\begin{aligned} R_k^* \in {\text {HS}}\left( X,F_{k+1}^n\left( \mathbb {R}^{n} \right) \right) \text { with } R_k^*x:= \left( B^*\otimes {\text {id}}_{{\mathbb {R}}^{n^{\otimes k}}}\right) P_k(\bullet )x. \end{aligned}$$

If the gramians exist, then the reachability map is defined as

$$\begin{aligned} R \in {\text {HS}}\left( F^n\left( {{\mathbb {R}}}^n\right) ,X\right) \text { such that } (f_k)_{k \in \mathbb {N}_0} \mapsto \sum _{k=0}^{\infty } R_k f_k. \end{aligned}$$

Its adjoint is given by \( R^* \in {\text {HS}}\left( X, F^n(\mathbb {R}^n)\right) , \ X \ni x \mapsto \left( R_k^*(x) \right) _{k \in \mathbb {N}_0}.\)

To see that \( R_k\) is a Hilbert–Schmidt operator, we take an ONB \((e_i)\) of \(F_{k+1}^n\left( \mathbb {R}^{n}\right) \), such that the \(e_i\) are tensor products of an ONB of \(L^2((0,\infty ),{\mathbb {R}})\) and standard unit vectors of \({\mathbb {R}}^n\), and an arbitrary ONB \((f_j)\) of X, and compute

$$\begin{aligned} \begin{aligned}&\left\| R_k \right\| ^2_{{\text {HS}}\left( F_{k+1}^n\left( \mathbb {R}^{n}\right) ,X\right) } =\sum _{j=1}^{{\text {dim}}(X)}\sum _{i=1}^{\infty } \left|\left\langle f_j, R_k e_i \right\rangle _X \right|^2=\sum _{j=1}^{{\text {dim}}(X)}\sum _{i=1}^{\infty } \left|\left\langle R_k^*f_j, e_i \right\rangle _{F_{k+1}^n\left( \mathbb {R}^{n}\right) } \right|^2 \\&\quad = \sum _{j=1}^{{\text {dim}}(X)} \sum _{i=1}^n \sum _{n_1,\ldots ,n_k=1}^n\int _{(0,\infty )^{k+1}} \left|\left\langle f_j, P_k(s)^*(\psi _i\otimes \widehat{e}_{n_1} \otimes \cdots \otimes \widehat{e}_{n_k}) \right\rangle _{X} \right|^2 \mathrm{d}s \\&\quad = \sum _{i=1}^n \sum _{n_1,\ldots ,n_k=1}^n\int _{(0,\infty )^{k+1}} \left\| P_k(s)^*(\psi _i\otimes \widehat{e}_{n_1} \otimes \cdots \otimes \widehat{e}_{n_k}) \right\| _{X}^2 \mathrm{d}s \\&\quad \quad = {\mathcal {O}} \left( \left( M^{2} \Gamma ^{2}( 2 \nu )^{-1} \right) ^k \right) . \end{aligned} \end{aligned}$$
(3.1)

One can then check that the maps W and R indeed decompose the gramians as \(\mathscr {O} = W^* W\) and \(\mathscr {P} = R R^*;\) for instance, for the observability gramian and all \(x,y \in X,\) Definition 3.1 and (2.1) give
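$$\begin{aligned} \langle x, W^*W y \rangle _X = \sum _{k=0}^{\infty } \langle W_k x, W_k y \rangle _{F^n_{k+1}({\mathcal {H}})} = \sum _{k=0}^{\infty } \int _{(0,\infty )^{k+1}}\langle O_k(s)x, O_k(s)y \rangle _{{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}}\mathrm{d}s = \langle x, \mathscr {O} y \rangle _X. \end{aligned}$$

We now introduce the main object of our analysis: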

Definition 3.2

The Hankel operator is the Hilbert–Schmidt operator \(H:= W R \in {\text {HS}}\left( F^n(\mathbb {R}^n),F^n(\mathcal {H})\right) .\)

Since any compact operator acting between Hilbert spaces possesses a singular value decomposition, we conclude that:

Corollary 3.3

There are orthonormal systems \((e_k)_{k \in \mathbb {N}} \subset F^n(\mathbb {R}^n) \) and \((f_k)_{k \in \mathbb {N}} \subset F^n({\mathcal {H}})\) as well as singular values \((\sigma _k)_{k \in \mathbb {N}} \in \ell ^2(\mathbb {N})\) such that

$$\begin{aligned} H = \sum _{k=1}^{\infty } \sigma _k\langle \bullet ,e_k \rangle _{F^n(\mathbb {R}^n)} f_k, \quad He_k= \sigma _k f_k, \quad \text {and} \quad H^*f_k = \sigma _k e_k. \end{aligned}$$
(3.2)

We now state a sufficient condition under which H is a trace class operator such that \((\sigma _k)_{k \in \mathbb {N}} \in \ell ^1(\mathbb {N}).\)

Lemma 3.4

If \({\mathcal {H}}\simeq \mathbb {R}^m\) for any \(m \in {\mathbb {N}}\), then W is a Hilbert–Schmidt operator just like R. Consequently, \(H = W R \text { and } \mathscr {O}= W^* W \) are both of trace class.

Proof

Since for any \(i \in \left\{ 1,\ldots ,m\right\} \) and \(i_1,\ldots ,i_k \in \left\{ 1,\ldots ,n \right\} \) the operator

$$\begin{aligned} X \ni x \mapsto \left\langle \widehat{e}_i \otimes \widehat{e}_{i_1} \otimes \cdots \otimes \widehat{e}_{i_k},W_kx \right\rangle _{\mathbb {R}^m \otimes {\mathbb {R}}^{n^{\otimes k}}}=:Q_{i,i_1,\ldots ,i_k}(x) \end{aligned}$$

is a Carleman operator, we can apply [20, Theorem 6.12(iii)] that characterizes Carleman operators of Hilbert–Schmidt type. The statement of the Lemma follows from the summability of

$$\begin{aligned} \begin{aligned} \left\| W_k \right\| _{{\text {HS}}}^2&= \sum _{i=1}^m \sum _{i_1,\ldots ,i_k=1}^n \left\| Q_{i,i_1,\ldots ,i_k} \right\| _{{\text {HS}}(X,L^2((0,\infty )^{k+1},\mathbb R))}^2 \\&\quad \le \sum _{i=1}^m \sum _{i_1,\ldots ,i_k=1}^n \int _{(0,\infty )^{k+1}} \left\| O_k(t)^*\left( \widehat{e}_i \otimes \widehat{e}_{i_1} \otimes \cdots \otimes \widehat{e}_{i_k}\right) \right\| ^2_{X} \mathrm{d}t\\&\quad ={\mathcal {O}}\left( \left( M^{2} \Gamma ^{2} (2 \nu )^{-1}\right) ^{k}\right) . \end{aligned} \end{aligned}$$

\(\square \)

In the rest of this section, we discuss immediate applications of our preceding construction. We start by introducing the truncated gramians.

Definition 3.5

The kth-order truncation of the gramians consists of the first k summands of the gramian series, i.e. \(\mathscr {O}^{(k)}:=\sum _{i=0}^{k-1}\mathscr {O}_i\) and \(\mathscr {P}^{(k)}:=\sum _{i=0}^{k-1}\mathscr {P}_i.\) The associated kth-order truncated Hankel operator is \(H^{(k)}f:=( W_i \sum _{j=0}^{k-1} R_jf_j)_{i \in \{0,\ldots ,k-1\}}\).

The proof of Proposition 1.1 then follows easily from our preliminary work:

Proof of Proposition 1.1

From [21, Corollary 2.3], it follows that for any \(i \in \mathbb {N}\) the difference of singular values can be bounded as \(\left|\sigma _{i}-\sigma ^k_{i} \right|\le \left\| H-H^{(k)} \right\| \le \left\| H-H^{(k)} \right\| _{{\text {HS}}}\) and by the inverse triangle inequality \(\left|\left\| \sigma \right\| _{\ell ^2}-\left\| \sigma ^{k} \right\| _{\ell ^2} \right|\le \left\| H-H^{(k)} \right\| _{{\text {HS}}}\). Thus, it suffices to bound, using (3.1) and Definition 3.1,

$$\begin{aligned} \begin{aligned} \left\| H-H^{(k)} \right\| _{{\text {HS}}}^2 = \sum _{(i,j) \in \mathbb {N}_0^2 \backslash \{0,\ldots ,k-1\}^2} \left\| H_{ij} \right\| ^2_{{\text {HS}}}&\le \sum _{(i,j) \in \mathbb {N}_0^2 \backslash \{0,\ldots ,k-1\}^2} \left\| W_i \right\| ^2 \left\| R_j \right\| ^2_{{\text {HS}}} \\&= {\mathcal {O}} \left( \left( M^{2} \Gamma ^{2} ( 2 \nu )^{-1} \right) ^{2k} \right) . \end{aligned} \end{aligned}$$

\(\square \)

We now give the proof of Proposition 1.2 on the approximation by subsystems. The Hankel operator for the subsystem on \(V_i\) is then \(H_{V_i}:=WR_{V_i}\), where

$$\begin{aligned} R_{V_i}(f):=\sum _{k=0}^{\infty } \int _{(0,\infty )^{k+1}} P_k(s)^*(P_{V_i}B\otimes {\text {id}}_{{\mathbb {R}}^{n^{\otimes k}}}) f_k(s) \mathrm{d}s \end{aligned}$$

with \(P_{V_i}\) being the orthogonal projection onto \(V_i.\)

Proof of Proposition 1.2

Using elementary estimates

$$\begin{aligned} \left\| H-H_{V_i} \right\| _{{\text {TC}}} \le \left\| W \right\| _{{\text {HS}}}\left\| R-R_{V_i} \right\| _{{\text {HS}}}\text { and }\left\| H-H_{V_i} \right\| _{{\text {HS}}} \le \left\| W \right\| \left\| R-R_{V_i} \right\| _{{\text {HS}}}, \end{aligned}$$

it suffices to show \({\text {HS}}\)-convergence of \(R_{V_i}\) to R. This is done along the lines of (3.1). \(\square \)

3.1 Convergence of singular vectors

The convergence of singular values is addressed in Proposition 1.1. For the convergence of singular vectors, we now assume that there is a family of compact operators \(H(m) \in {\mathcal {L}} \left( F^n\left( {\mathbb {R}}^n \right) ,F^n\left( {\mathcal {H}} \right) \right) \) converging in operator norm to H. By compactness, every operator H(m) has a singular value decomposition \(H(m)= \sum _{k=1}^{\infty } \sigma _k(m) \langle \bullet ,e_{k}(m) \rangle f_{k}(m).\)

Assumption 2

Without loss of generality, let the singular values be ordered as \(\sigma _1(m) \ge \sigma _2(m) \ge \cdots \) . Furthermore, for the rest of this section, all singular values of H are assumed to be nonzero and non-degenerate, i.e. all eigenspaces of \(HH^*\) and \(H^*H\) are one dimensional.

Lemma 3.6

Let the family of compact operators (H(m)) converge to the Hankel operator H in operator norm, then the singular vectors converge in norm as well.

Proof of Lemma 3.6

We give the proof only for singular vectors \((e_j)\) since the arguments for \((f_j)\) are analogous. We start by writing \(e_j=r(m)e_j(m)+x_j(m)\) where \(\langle e_j(m),x_j(m) \rangle =0.\) Then, the arguments stated in the proof of [22, Appendix 2] show that for m sufficiently large (the denominator is well defined as the singular values are non-degenerate)

$$\begin{aligned} \begin{aligned}&\left\| x_j(m) \right\| _{F^n(\mathbb {R}^n)}^2 \le \frac{\sigma _j^2-\left( \sigma _j-2\left\| H_j-H_j(m) \right\| _{{\mathcal {L}}\left( F^n\left( {\mathbb {R}}^n \right) ,F^n\left( {\mathcal {H}} \right) \right) }\right) ^2}{\sigma _j^2-\sigma _{j+1}^2} \xrightarrow [m \rightarrow \infty ]{} 0, \end{aligned} \end{aligned}$$

where \(H_j:=H-\sum _{k=0}^{j-1} \sigma _k \langle \bullet ,e_k \rangle f_k\) and \(H_j(m):=H(m)-\sum _{k=0}^{j-1} \sigma _k(m) \langle \bullet ,e_k(m) \rangle f_k(m). \) \(\square \)

4 Global error estimates

We start by defining a control tensor \(U_k(s) \in \mathcal L\left( {\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}, \mathcal H\right) \)

$$\begin{aligned} U_k(s):=\sum _{i_1,\ldots ,i_k=1}^n u_{i_1}(s_1)\cdot \cdots \cdot u_{i_k}(s_k){\text {id}}_{{\mathcal {H}}} \otimes \ \left\langle \widehat{e}_{i_1}\otimes \cdots \otimes \widehat{e}_{i_k}, \bullet \right\rangle . \end{aligned}$$

Using sets \(\Delta _k(t):=\{(s_1,\ldots ,s_k) \in \mathbb {R}^k; 0 \le s_k \le \cdots \le s_1 \le t\}\), we can decompose the output map \((0,\infty ) \ni t \mapsto C\varphi (t)\) with \(\varphi \) as in (1.2) for controls \(\left\| u \right\| _{L^2((0,\infty ),({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty }))} < \frac{\sqrt{2 \nu }}{M \Xi }\) and \(\Xi :=\sum _{i=1}^n \left\| N_i \right\| \) according to Lemma A.1 into two terms \(C\varphi (t) = K_1(t)+K_2(t)\) such that

$$\begin{aligned} \begin{aligned} K_1(t)&:= \sum _{k=1}^{\infty } \int _{\Delta _k(t)} U_k(s) \left( O_{k}(t-s_1,\ldots ,s_{k-1}-s_k,s_k)\varphi _0\right) \mathrm{d}s + CT(t) \varphi _0 \text { and } \\ K_2(t)&:= \sum _{k=1}^{\infty } \int _{\Delta _k(t)} U_k(s) \left( \sum _{i=1}^nO_{k-1}(t-s_1,s_1-s_2,\ldots ,s_{k-1}-s_k)\psi _i \otimes \widehat{e}_i \right) \mathrm{d}s. \end{aligned} \end{aligned}$$
(4.1)

The first term \(K_1\) is determined by the initial state \(\varphi _0\) of the evolution problem (1.1). If this state is zero, the term \(K_1\) vanishes. The term \(K_2\), on the other hand, captures the intrinsic dynamics of Eq. (1.1). The technical objects linking the dynamics of the evolution equation to the operators from the balancing method are the Volterra kernels, which we introduce next:

Definition 4.1

The Volterra kernels associated with (1.1) are the functions

$$\begin{aligned} \begin{aligned}&h_{k,j} \in L^2\left( (0,\infty )^{k+j+1},{\text {HS}}\left( \mathbb R^{n^{\otimes (j+1)}}, {\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}} \right) \right) \\&h_{k,j}(\sigma _0,\ldots ,\sigma _{k}+\sigma _{k+1},\ldots ,\sigma _{k+j+1}):=O_{k}(\sigma _0,\ldots ,\sigma _k)P_{j}^*(\sigma _{k+j+1},\ldots ,\sigma _{k+1})\\&\quad (B\otimes {\text {id}}_{{\mathbb {R}}^{n^{\otimes {j}}}}). \end{aligned} \end{aligned}$$

The Volterra kernels satisfy an invariance property for all \(p,q,k,j \in {\mathbb {N}}_0\) such that \(p+q=k+j:\)

$$\begin{aligned} \left\| h_{k,j} \right\| _{L^{1}_{k+1}L^{2}_{k+j}\left( {\text {HS}}\left( \mathbb R^{n^{\otimes (j+1)}},{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}\right) \right) } = \left\| h_{p,q} \right\| _{L^{1}_{k+1}L^{2}_{k+j}\left( {\text {HS}}\left( \mathbb R^{n^{\otimes (q+1)}},{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes p}}\right) \right) }. \end{aligned}$$
(4.2)

The Volterra kernels appear also as integral kernels of the Hankel operator

$$\begin{aligned}\begin{aligned} \left( W_k R_j f \right) (s_0,\ldots ,s_k) = \int _{(0,\infty )^{j+1}} h_{k,j} (s_0,\ldots ,s_k+t_{1},\ldots ,t_{j+1}) f(t) \mathrm{d}t. \end{aligned} \end{aligned}$$

Remark 1

In particular, the kernels \(h_{k,0}\) appear in the definition of the \({\mathscr {H}}^2\)-system norm introduced in [6, Eq. 15]

$$\begin{aligned} \begin{aligned} \left\| \Sigma \right\| ^2_{\mathscr {H}^2}&:= \sum _{k=0}^{\infty } \left\| h_{k,0} \right\| ^2_{L^2\left( (0,\infty )^{k+1},{\text {HS}}\left( {\mathbb {R}}^{n},{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}\right) \right) } \\&=\sum _{k=0}^{\infty } \int _{(0,\infty )^{k+1}} \sum _{n_1,\ldots ,n_{k}=1}^n \left\| CT(t_1) \prod _{l=2}^{k+1} \left( N_{n_{l-1}}T(t_l) \right) B \right\| _{{\text {HS}}({\mathbb {R}}^n, {\mathcal {H}})}^2 \mathrm{d}t \end{aligned} \end{aligned}$$

for which robust numerical algorithms with strong \(\mathscr {H}^2\)-error performance exist [23].

This system norm can also be expressed directly in terms of the gramians

$$\begin{aligned} \left\| \Sigma \right\| ^2_{\mathscr {H}^2} = {{\,\mathrm{tr}\,}}\left( BB^*{\mathscr {O}}\right) = {{\,\mathrm{tr}\,}}\left( C^*C{\mathscr {P}}\right) \end{aligned}$$

which is well defined as \(BB^*\) and \({\mathscr {P}}\) are both trace-class operators.
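A minimal numerical check of this identity for a finite-dimensional bilinear system (placeholder matrices; the gramian series (1.3) and (1.4) are again summed via Lyapunov recursions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-2.0, 1.0], [0.0, -3.0]])            # placeholder data
N = [np.array([[0.2, 0.0], [0.1, 0.1]])]
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

# Sum the gramian series (1.3)/(1.4) via Lyapunov recursions until the summands are negligible.
P = solve_continuous_lyapunov(A, -B @ B.T);   P_sum = P.copy()
Q = solve_continuous_lyapunov(A.T, -C.T @ C); Q_sum = Q.copy()
for _ in range(200):
    P = solve_continuous_lyapunov(A, -sum(Ni @ P @ Ni.T for Ni in N));   P_sum += P
    Q = solve_continuous_lyapunov(A.T, -sum(Ni.T @ Q @ Ni for Ni in N)); Q_sum += Q

print(np.trace(B @ B.T @ Q_sum), np.trace(C.T @ C @ P_sum))   # the two numbers agree
```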

In [17], the kth-order transfer function \(G_k\) has been introduced as the \(k+1\)-variable Laplace transform of the Volterra kernel \(h_{k,0}\)

$$\begin{aligned} G_{k}(s):=\int _{(0,\infty )^{k+1}} h_{k,0}(t)e^{- \langle s, t \rangle } \mathrm{d}t. \end{aligned}$$

Using mixed Hardy norms as defined in (1.7), the Paley–Wiener theorem implies the following estimate for \(i \in \left\{ 1,\ldots ,k+1 \right\} \)

$$\begin{aligned} \begin{aligned}&\left\| G_{k} \right\| _{{\mathscr {H}}^{\infty }_{i} \mathscr {H}^{2}_{k}\left( {\text {HS}}\left( {\mathbb {R}}^n,{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}\right) \right) } \\&\quad \le \int _{0}^{\infty } \left\| \int _{(0,\infty )^{k}}h_{k,0}^{(i)}(s,\sigma )e^{-\langle \bullet , s \rangle } \mathrm{d}s \right\| _{\mathscr {H}^2\left( (0,\infty )^{k},{\text {HS}}\left( \mathbb R^n,{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}\right) \right) } \ \mathrm{d}\sigma \\&\quad =\left\| h_{k,0}^{(i)} \right\| _{L^{1}_{i}L^{2}_{k}\left( {\text {HS}}\left( \mathbb R^n,{\mathcal {H}} \otimes {\mathbb {R}}^{n^{\otimes k}}\right) \right) }. \end{aligned} \end{aligned}$$
(4.3)

For two systems \(\Sigma \) and \({{\widetilde{\Sigma }}}\) satisfying Assumption 1 with the same number of controls and the same output space \({\mathcal {H}}\), we then define the difference Volterra kernel and the difference Hankel operator \(\Delta (h):=h-{\widetilde{h}}\) and \(\Delta (H):= H -{\widetilde{H}} =\left( W_i R_j - \widetilde{W_i} \widetilde{ R_j} \right) _{ij}.\)

The next Lemma bounds the mixed \(L^1\)-\(L^2\) norm of the difference Volterra kernel:

Lemma 4.2

Consider two systems satisfying Assumption 1 with the same number of controls and the same output space \({\mathcal {H}} \simeq {\mathbb {R}}^m\) such that H is trace class (Lemma 3.4). Then, the Volterra kernels \(h_{k,j}\) satisfy

$$\begin{aligned} \left\| \Delta ( h_{k,j}) \right\| _{L^{1}_{k+1}L^{2}_{k+j}\left( {\text {HS}}\left( \mathbb R^{n^{\otimes (j+1)}},{\mathbb {R}}^m \otimes {\mathbb {R}}^{n^{\otimes k}}\right) \right) } \le 2 \left\| \Delta ( W_k R_j) \right\| _{{\text {TC}}\left( F^n_{j+1}(\mathbb R^n),F^n_{k+1}\left( {\mathbb {R}}^m\right) \right) }. \end{aligned}$$

Proof

Consider the difference Volterra kernel \(\Delta (h_{k,j})\) associated with \(\Delta ( W_k R_j).\)

For every \(z \in \mathbb {N}_0\) and \(\alpha >0\) fixed, we define a family of sesquilinear forms \((L_{z,\alpha })\)

$$\begin{aligned} \begin{aligned}&L_{z,\alpha }: F_k^1\left( \mathbb {R}^{m}\otimes \mathbb R^{n^{\otimes k}}\right) \oplus F_j^1\left( \mathbb {R}^{n^{\otimes (j+1)}}\right) \rightarrow \mathbb {R}, \quad (f,g) \\&\quad {\mapsto } \int _{(0,\infty )^{k{+}j}} \left\langle f(s_1,{\ldots },s_{k}),\Delta \left( h^{(k+1)}_{k,j}(s,2z\alpha )\right) g(s_{k{+}1},{\ldots },s_{k{+}j}) \right\rangle _{\mathbb {R}^{m}{\otimes } \mathbb R^{n^{\otimes k}}} \mathrm{d}s. \end{aligned} \end{aligned}$$

Since \(\Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \in F_k^1\left( \mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}\right) \otimes F_j^1\left( \mathbb {R}^{n^{\otimes (j+1)}}\right) =: Z\), there exists a Hilbert–Schmidt operator \(Q : F_j^1\left( \mathbb {R}^{n^{\otimes (j+1)}}\right) \rightarrow F_k^1\left( \mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}\right) \) of unit HS-norm

$$\begin{aligned} \begin{aligned}&(Q\varphi )(s):=\int _{(0,\infty )^{j}} \tfrac{\Delta \left( h^{(k+1)}_{k,j}((s,t),2z\alpha )\right) }{\left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\| _{Z}} \varphi (t) \mathrm{d}t. \end{aligned} \end{aligned}$$

The singular value decomposition of Q provides orthonormal systems \(f_{z,i} \in F_k^1\left( \mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}\right) \), \(g_{z,i} \in F_j^1\left( \mathbb {R}^{n^{\otimes (j+1)}}\right) \), parameterized by \(i \in {\mathbb {N}},\) and singular values \(\sigma _{z,i} \in [0,1]\) such that for any \(\delta >0\) given there is \(N(\delta )\) large enough with

$$\begin{aligned} \left\| \tfrac{\Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) }{\left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\| _{Z}} - \sum _{i=1}^{N(\delta )} \sigma _{z,i} (f_{z,i} \otimes g_{z,i} )\right\| _{Z}<\delta . \end{aligned}$$

Let \(\varepsilon >0,\) then for M sufficiently large \(\int _{M}^{\infty } \left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,v)\right) \right\| _{Z} \mathrm{d}v < \varepsilon .\) Thus, for \(z \in \mathbb {N}_0\) there are \(f_{z,i}\in F_k^1\left( \mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}\right) \) and \(g_{z,i} \in F_j^1\left( \mathbb {R}^{n^{\otimes (j+1)}}\right) \) orthonormalized, \(\sigma _{z,i} \in [0,1]\), and \(N_z \in {\mathbb {N}}\) such that

$$\begin{aligned} \begin{aligned}&\left|\left\langle \tfrac{\Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) }{\left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\| _{Z}}-\sum _{i=1}^{N_z} \sigma _{z,i} (f_{z,i} \otimes g_{z,i} ), \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\rangle _{Z} \right|\\&\quad =\left|\left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\| _{Z} -\sum _{i=1}^{N_z} \sigma _{z,i} L_{z,\alpha }(f_{z,i},g_{z,i}) \right|<\frac{ \varepsilon }{M}. \end{aligned} \end{aligned}$$
(4.4)

Then, \(s_{z,i}(r,u):= \frac{1}{\sqrt{\alpha }} \mathbb {1}_{[z\alpha ,(z+1)\alpha )}(r)g_{z,i}(u)\) and \(t_{z,i}(r,u):= \frac{1}{\sqrt{\alpha }} \mathbb {1}_{[z\alpha ,(z+1)\alpha )}(r) f_{z,i}(u)\) form orthonormal systems parameterized by z and i in spaces \(F^n_{j+1}(\mathbb {R}^n)\) and \(F_{k+1}^n\left( \mathbb {R}^{m}\right) \), respectively, such that using the auxiliary quantities

$$\begin{aligned} \begin{aligned}&I:=(z \alpha ,(z+1)\alpha )^2 \times (0,\infty )^{k+j}, \ J:=(2z \alpha ,2(z+1)\alpha ) \times (0,\infty )^{k+j}, \text { and }\\&\lambda (v):=\min \left\{ v-2z\alpha ,2(z+1)\alpha -v\right\} \end{aligned} \end{aligned}$$

it follows that

$$\begin{aligned} \begin{aligned}&\langle t_{z,i}, \Delta ( W_k R_j) s_{z,i} \rangle _{F_{k+1}^n ({\mathbb {R}}^m)} \\&\quad =\frac{1}{\alpha }\int _{I} \left\langle f_{z,i}(s_1,\ldots ,s_{k}),\Delta \left( h^{(k+1)}_{k,j}\right) (s,r+t) g_{z,i}(s_{k+1},\ldots ,s_{k+j})\right\rangle _{\mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}} \mathrm{d}r \mathrm{d}t \mathrm{d}s \\&\quad =\frac{1}{2\alpha } \int _{J} \int _{-\lambda (v)}^{\lambda (v)} \left\langle f_{z,i}(s_1,\ldots ,s_{k}),\Delta \left( h^{(k+1)}_{k,j}\right) (s,v) g_{z,i}(s_{k+1},\ldots ,s_{k+j})\right\rangle _{\mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}} \mathrm{d}w \mathrm{d}v \mathrm{d}s\\&\quad =\frac{1}{\alpha } \int _{J} \lambda (v) \left\langle f_{z,i}(s_1,\ldots ,s_{k}),\Delta \left( h^{(k+1)}_{k,j}\right) (s,v) g_{z,i}(s_{k+1},\ldots ,s_{k+j})\right\rangle _{\mathbb {R}^{m}\otimes {\mathbb {R}}^{n^{\otimes k}}} \mathrm{d}v \mathrm{d}s \end{aligned} \end{aligned}$$
(4.5)

where we made the change of variables \(v:=r+t\) and \(w:=r-t.\) For \(\alpha \) small enough and \(v_1, v_2 \in [0,M+1]\), we have by strong continuity of translations

$$\begin{aligned} \left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,v_1)\right) -\Delta \left( h^{(k+1)}_{k,j}(\bullet ,v_2)\right) \right\| _{Z}< \frac{\varepsilon }{M} \text { if } \left|v_1-v_2 \right|< 2 \alpha . \end{aligned}$$
(4.6)

Hence, using the above uniform continuity as well as (4.4) and (4.5)

$$\begin{aligned} \begin{aligned}&\left|\sum _{i=1}^{N_z} \sigma _{z,i} \langle t_{z,i}, \Delta ( W_k R_j) s_{z,i} \rangle _{F_{k+1}^n ({\mathbb {R}}^m)} - \alpha \left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\| _Z \right|\\&\quad \le \frac{1}{\alpha } \int _{2z\alpha }^{2(z+1)\alpha } \lambda (v) \left( \left|\sum _{i=1}^{N_z} \sigma _{z,i} \left\langle f_{z,i} \otimes g_{z,i}, \Delta \left( h^{(k+1)}_{k,j}(\bullet ,v)\right) -\Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\rangle _Z \right|\right. \\&\qquad \left. +\left|\sum _{i=1}^{N_z} \sigma _{z,i} L_{z,\alpha }(f_{z,i},g_{z,i}) -\left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,2z\alpha )\right) \right\| _Z \right|\right) \mathrm{d}v \lesssim \frac{\alpha \varepsilon }{M}. \end{aligned} \end{aligned}$$

This implies immediately by uniform continuity (4.6)

$$\begin{aligned} \left|\sum _{i=1}^{N_z} \sigma _{z,i} \langle t_{z,i}, \Delta ( W_k R_j) s_{z,i} \rangle _{F_{k+1}^n ({\mathbb {R}}^m)} - \frac{1}{2} \int _{2z\alpha }^{2(z+1)\alpha } \left\| \Delta \left( h^{(k+1)}_{k,j}(\bullet ,v)\right) \right\| _Z \mathrm{d}v \right|\lesssim \frac{\alpha \varepsilon }{M}. \end{aligned}$$

Summing over z up to \(\left\lfloor {\frac{M}{2\alpha }} \right\rfloor \) implies by the choice of M that

$$\begin{aligned} \begin{aligned}&\left|\sum _{z=0}^{\left\lfloor {\frac{M}{2\alpha }}\right\rfloor } \sum _{i=1}^{N_z} \sigma _{z,i} \langle t_{z,i}, \Delta ( W_k R_j) s_{z,i} \rangle _{F_{k+1}^n ({\mathbb {R}}^m)} - \frac{1}{2} \left\| \Delta \left( h^{(k+1)}_{k,j}\right) \right\| _{L^{1}_{k+1}L^{2}_{k+j}\left( {\text {HS}}\right) } \right|\lesssim \varepsilon . \end{aligned} \end{aligned}$$

The Lemma then follows from the characterization of the trace norm stated in (1.6). \(\square \)

The preceding Lemma provides us with bounds on the difference of the dynamics of two systems \(\Sigma \) and \({\widetilde{\Sigma }}\) satisfying Assumption 1. In particular, Lemma 4.2 allows us to prove Theorem 1.

Proof of Theorem 1

The Hankel operator is an infinite matrix with operator-valued entries \(H_{ij}= W_iR_j.\) Using the invariance property (4.2), we can combine Lemma 4.2 with estimate (4.3), relating the transfer functions to the Volterra kernels, to obtain from the definition of the trace norm (1.6) that

$$\begin{aligned} \begin{aligned}&\sum _{k=1}^{\infty } \left\| \Delta (G_{2k-1}) \right\| _{{\mathscr {H}}^{\infty }_k {\mathscr {H}}^2_{2k-1}} \le 2 \sum _{k=1}^{\infty } \left\| \Delta (W_{k}R_{k-1}) \right\| _{{\text {TC}}}\le 2 \left\| \Delta (H) \right\| _{{\text {TC}}} \text { and } \\&\sum _{k=1}^{\infty } \left\| \Delta (G_{2k-2}) \right\| _{{\mathscr {H}}^{\infty }_k {\mathscr {H}}^2_{2k-2}} \le 2 \sum _{k=0}^{\infty } \left\| \Delta (W_{k}R_{k}) \right\| _{{\text {TC}}}\le 2 \left\| \Delta (H) \right\| _{{\text {TC}}} \end{aligned} \end{aligned}$$

Summing the two bounds yields the statement of the theorem. \(\square \)

While Theorem 1 controls the transfer functions, the subsequent theorem controls the actual dynamics started from the zero initial state:

Proof of Theorem 2

The operator norm of the control tensor is bounded by

$$\begin{aligned} \begin{aligned} \left\| U_k(s) \right\|&\le \prod _{i=1}^k \left\| u(s_i) \right\| _{({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty })}\left\| {\text {id}}_{{\mathcal {H}}} \otimes \sum _{i_1,\ldots ,i_k=1}^n \langle \widehat{e}_{i_1}\otimes \cdots \otimes \widehat{e}_{i_k}, \bullet \rangle \right\| \\&\le \prod _{i=1}^k \left\| u(s_i) \right\| _{({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty })} \left\| \sum _{i_1,\ldots ,i_k=1}^n\langle \widehat{e}_{i_1}\otimes \cdots \otimes \widehat{e}_{i_k}, \bullet \rangle \cdot 1 \right\| \le n^{k/2} \prod _{i=1}^k \left\| u(s_i) \right\| _{({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty })} \end{aligned} \end{aligned}$$

where we applied the Cauchy–Schwarz inequality to the product inside the sum to bound the \(\ell ^1\) norm by an \(\ell ^2\) norm.

It follows from (4.1), Hölder’s inequality, and Minkowski’s integral inequality that

$$\begin{aligned} \begin{aligned}&\left\| \Delta (C \varphi (t)) \right\| _{{\mathbb {R}}^m} \le \sum _{k=1}^{\infty }\int _{\Delta _k(t)} \Bigg (\left\| U_k(s)\right\| _{{\mathcal {L}}({\mathbb {R}}^m \otimes \mathbb R^{n^{\otimes k}},{\mathbb {R}}^m)} \\&\qquad \cdot \left\| \sum _{i=1}^n\Delta \left( O_{k-1}(t-s_1,\ldots ,s_{k-1}-s_k)\psi _i \right) \otimes \widehat{e}_i \right\| _{{\mathbb {R}}^m \otimes \mathbb R^{n^{\otimes k}}} \Bigg )\ \mathrm{d}s \\&\le \sum _{k=1}^{\infty }\int _{\Delta _k(t)}\underbrace{\left\| U_k(s)\right\| _{{\mathcal {L}}({\mathbb {R}}^m \otimes \mathbb R^{n^{\otimes k}},{\mathbb {R}}^m)}}_{\le n^{k/2} \prod _{i=1}^k \left\| u(s_i) \right\| _{({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty })}} \left\| \Delta h_{k-1,0}(t-s_1,\ldots ,s_{k-1}-s_k) \right\| _{{\text {HS}}\left( {\mathbb {R}}^n,{\mathbb {R}}^m \otimes {\mathbb {R}}^{n^{\otimes (k-1)}}\right) } \mathrm{d}s \\&\le \sum _{k=1}^{\infty } \left( \left\| \Delta (h _{2k-1,0}) \right\| _{L^{1}_kL^{2}_{2k-1}({\text {HS}})}+ \left\| \Delta ( h _{2k-2,0}) \right\| _{L^{1}_kL^{2}_{2k-2}({\text {HS}})} \right) \sqrt{n} \left\| u \right\| _{L^{\infty }((0,\infty ),(\mathbb R^n, \left\| \bullet \right\| _{\infty }))}. \end{aligned} \end{aligned}$$

Then, by (1.6), Lemma 4.2, and the invariance property (4.2)

$$\begin{aligned} \begin{aligned} \left\| \Delta (C \varphi (t)) \right\| _{{\mathbb {R}}^m}&\le \sum _{k=1}^{\infty } \left\| \Delta (h _{k-1,k}) \right\| _{L^{1}_kL^{2}_{2k-1}({\text {HS}})} \sqrt{n} \left\| u \right\| _{L^{\infty }((0,\infty ),({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty }))} \\&\quad +\sum _{k=1}^{\infty } \left\| \Delta (h _{k-1,k-1}) \right\| _{L^{1}_kL^{2}_{2k-2}({\text {HS}})}\sqrt{n} \left\| u \right\| _{L^{\infty }((0,\infty ),({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty }))} \\&\le 4 \sqrt{n} \left\| \Delta (H) \right\| _{{\text {TC}}} \left\| u \right\| _{L^{\infty }((0,\infty ),({\mathbb {R}}^n, \left\| \bullet \right\| _{\infty }))}. \end{aligned} \end{aligned}$$

\(\square \)

5 Applications

Throughout this section, we assume that we are given a filtered probability space \((\Omega , {\mathcal {F}},({\mathcal {F}}_t)_{t \ge T_0}, {\mathbb {P}})\) satisfying the usual conditions, i.e. the filtration is right-continuous and \({\mathcal {F}}_{T_0}\) contains all \({\mathcal {F}}\) null-sets. We assume X to be a real separable Hilbert space. In the following subsection, we study an infinite-dimensional stochastic evolution equation with Wiener noise to motivate the extension of stochastic balanced truncation to infinite-dimensional systems that we introduce thereupon. We stick mostly to the notation introduced in the preceding sections and also consider the state-to-output (observation) operator \(C \in \mathcal L(X,{\mathcal {H}})\), the control-to-state (control) operator \(Bu = \sum _{i=1}^n \psi _i u_i,\) and A the generator of an exponentially stable \(C_0\)-semigroup (T(t)) on X.

5.1 Stochastic evolution equation with Wiener noise

Let Y be a separable Hilbert space and \({\text {TC}}(Y) \ni Q=Q^* \ge 0 \) a positive trace class operator. We then consider a Wiener process \((W_t)_{t \ge T_0}\) [24, Def. 2.6] adapted to the filtration \(({\mathcal {F}}_t)_{t \ge T_0}\) with covariance operator Q.

We introduce the Banach space \(\left( {\mathcal {H}}_{2}^{(T_0,T)}(X), \sup _{t \in (T_0,T)}\left( {\mathbb {E}} \left\| Z_t \right\| _{X}^2\right) ^{1/2}\right) \) of jointly measurable, X-valued processes \(((T_0,T)\times \Omega \ni (t,\omega ) \mapsto Z_t(\omega ))\) adapted to the filtration \(({\mathcal {F}}_t)_{t \ge T_0}\), and we consider mappings \(N \in {\mathcal {L}}(X, {\mathcal {L}}(Y,X))\) and controls \(u \in L^2_{\text {ad}}(\Omega _{{\mathbb {R}}_{\ge 0}}, {\mathbb {R}}^n) \cap L^{\infty }_{\text {ad}}(\Omega _{{\mathbb {R}}_{\ge 0}}, {\mathbb {R}}^n)\), where we recall the notation \(\Omega _X:= \Omega \times X.\) For the stochastic partial differential equation

$$\begin{aligned} \begin{aligned} \mathrm{d}Z_t&= (AZ_t +Bu(t))\ \mathrm{d}t + N(Z_t) \ \mathrm{d}W_t, \quad t >0, \\ Z_{0}&=\xi \in L^2(\Omega ,X), \end{aligned} \end{aligned}$$
(5.1)

there exists, by Gawarecki and Mandrekar [24, Theorem 3.5], a unique continuous mild solution in \({\mathcal {H}}_{2}^{(0,T)}(X)\), satisfying \({\mathbb {P}}\)-a.s. for \(t \in [0,T]\)

$$\begin{aligned} Z_t = T(t)\xi + \int _0^t T(t-s) Bu(s) \mathrm{d}s + \int _0^t T(t-s) N(Z_s) \ \mathrm{d}W_s. \end{aligned}$$
(5.2)
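To make the mild solution (5.2) concrete, the following minimal sketch simulates a finite-dimensional analogue of (5.1) with the Euler–Maruyama scheme. All matrices, dimensions, the control u, and the noise model (standard, identity-covariance Wiener increments) are illustrative assumptions and are not taken from the article.

```python
# Minimal sketch (illustrative, not from the article): Euler–Maruyama discretization of a
# finite-dimensional analogue of (5.1),  dZ = (A Z + B u(t)) dt + N(Z) dW,  Z_0 = xi,
# with N(z) dW := sum_j N_j z dW_j and standard (identity-covariance) Wiener increments.
import numpy as np

rng = np.random.default_rng(0)
d, n, q = 6, 2, 3                                     # assumed state, input, noise dimensions
A = -np.eye(d) + 0.1 * rng.standard_normal((d, d))    # assumed exponentially stable drift
B = rng.standard_normal((d, n))
N = [0.1 * rng.standard_normal((d, d)) for _ in range(q)]
u = lambda t: np.array([np.sin(t), np.cos(t)])        # assumed control in L^2 and L^infty

def euler_maruyama(xi, T=5.0, dt=1e-3):
    """Return one sample path of the explicit Euler–Maruyama scheme for (5.1)."""
    steps = int(T / dt)
    Z = xi.copy()
    path = np.empty((steps + 1, d))
    path[0] = Z
    for k in range(steps):
        t = k * dt
        dW = np.sqrt(dt) * rng.standard_normal(q)     # Wiener increments on [t, t+dt]
        Z = Z + (A @ Z + B @ u(t)) * dt + sum(N[j] @ Z * dW[j] for j in range(q))
        path[k + 1] = Z
    return path

print(euler_maruyama(np.zeros(d))[-1])
```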

We refer to (5.1) with \(B \equiv 0\) as the homogeneous part of that equation. For solutions \(Z_t^{\text {hom}}\) to the homogeneous part of (5.1) starting at \(t=0\), let \(\Phi (\bullet ): L^2(\Omega ,{\mathcal {F}}_0,X) \rightarrow \mathcal H_{2}^{(0,T)}(X)\) be the flow defined by the mild solution, i.e. \(\Phi (t)\xi :=Z_t^{\text {hom}}\). If the initial time is some \(T_0\) rather than 0, we denote the (initial time-dependent) flow by \(\Phi (\bullet ,T_0):L^2(\Omega ,{\mathcal {F}}_{T_0},X) \rightarrow {\mathcal {H}}_{2}^{(T_0,T)}(X)\). The (X-)adjoint of the flow is defined by \(\langle \Phi (\bullet , T_0)\varphi _1,\varphi _2 \rangle _X = \langle \varphi _1,\Phi (\bullet , T_0)^*\varphi _2 \rangle _X\) for arbitrary \(\varphi _1,\varphi _2 \in X.\)

Definition 5.1

(Exponential stability in m.s.s.) The solution to the homogeneous system with flow \(\Phi \) is called exponentially stable in the mean square sense (m.s.s.) if there is some \(c>0\) such that for all \(\varphi _0 \in X\) and all \(t \ge 0\)

$$\begin{aligned} {\mathbb {E}} \left( \left\| \Phi (t) \varphi _0 \right\| ^2_X \right) \lesssim e^{-c t} \left\| \varphi _0 \right\| _X^2. \end{aligned}$$
(5.3)

Lyapunov techniques to verify exponential stability for SPDEs of the form (5.1) are discussed in [24, Section 6.2].

We then define the variation of constants process Y of the flow \(\Phi \) as

$$\begin{aligned} Y_t(u) :=\int _0^t \Phi (t,s) Bu(s)\ \mathrm{d}s= \sum _{i=1}^n\int _0^t \Phi (t,s)\psi _i u_i(s) \mathrm{d}s. \end{aligned}$$
(5.4)

This variation of constants process coincides with the mild solution to the full SPDE (5.1) almost surely if \(\xi =0\), as an application of the stochastic Fubini theorem [24, Theorem 2.8] and (5.2) shows

$$\begin{aligned} \begin{aligned}&\int _0^t T(t-s) N (Y_s) \ \mathrm{d}W_s = \int _0^t T(t-s) N\left( \int _0^s \Phi (s,r)B u(r) \mathrm{d}r\right) \mathrm{d}W_s \\&\quad = \int _0^t T(t-s) N\left( \int _0^t \underbrace{\mathbb {1}_{[0,s]}(r)}_{=\mathbb {1}_{[r,t]}(s)} \Phi (s,r)B u(r) \ \mathrm{d}r\right) \mathrm{d}W_s \\&\quad = \int _0^t \left( \int _0^t T(t-s) N\left( \mathbb {1}_{[r,t]}(s) \Phi (s,r)B u(r)\right) \mathrm{d}W_s\right) \mathrm{d}r \\&\quad = \int _0^t \left( \int _r^t T(t-s)N\left( \Phi (s,r)B u(r)\right) \ \mathrm{d}W_s \right) \ \mathrm{d}r\\&\quad = \int _0^t \Phi (t,r)B u(r)- T(t-r)B u(r) \ \mathrm{d}r = Y_t - \int _0^t T(t-r) Bu(r) \mathrm{d}r \end{aligned} \end{aligned}$$

so that, after rearranging, \(Y_t = \int _0^t T(t-s)N(Y_s) \ \mathrm{d}W_s + \int _0^t T(t-r) Bu(r) \ \mathrm{d}r.\)

Another important property of the homogeneous solution to (5.1) is that it satisfies the homogeneous Markov property [24, Section 3.4]. Although the flow \(\Phi \) is time-dependent as the SPDE is non-autonomous, there is an associated \( C_b\)-Markov semigroup \(P(t): C_b(X) \rightarrow C_b(X)\) satisfying \(P(t)f(x)={\mathbb {E}}(f(\Phi (s+t,s)x))\) for all \(s \ge 0\) and \(P(t+s)f=P(t)P(s)f.\)

The \( C_b\)-Feller property, i.e. that P(t) maps \( C_b(X)\) again into \( C_b(X),\) will not be needed in our subsequent analysis, but it reflects the continuous dependence of the solution of (5.1) on the initial data. We shall also use that the \( C_b\)-Markov semigroup can be extended to all f for which the process is still integrable, i.e. \(f(\Phi (t,s)x) \in L^1(\Omega ,{\mathbb {R}}) \) for arbitrary \(s \le t\) and \(x \in X.\)

By applying the Markov property to the auxiliary functions \(f_{x,y}\) defined as follows

$$\begin{aligned} \begin{aligned}&\left\langle \Phi (T-t+s,s)^*x,BB^*\Phi (T-t+s,s)^*y \right\rangle \\&\quad = \sum _{i=1}^n \underbrace{\left\langle \Phi (T-t+s,s)\psi _i,y\right\rangle \left\langle x, \Phi (T-t+s,s)\psi _i\right\rangle }_{=:f_{x,y}(\Phi (T-t+s,s)\psi _i)} \end{aligned} \end{aligned}$$

with \(0 \le t \le T\), \(x,y \in X\), and \(0 \le s \le T-t\), it follows by evaluating \({\mathbb {E}}(f_{x,y}(\Phi (T-t+s,s)\psi _i))\) at \(s=0\) and \(s=t\) that

$$\begin{aligned} {\mathbb {E}}\left\langle \Phi (T-t,0)^*y,BB^*\Phi (T-t,0)^*x \right\rangle ={\mathbb {E}}\left\langle \Phi (T,t)^*y,BB^*\Phi (T,t)^*x \right\rangle . \end{aligned}$$
(5.5)

In the following subsection, we introduce a generalized stochastic balanced truncation framework for systems similar to the stochastic evolution equation (5.1).

5.2 Generalized stochastic balanced truncation

For an exponentially stable flow \(\Phi \), we define the stochastic observability map W and reachability map R

$$\begin{aligned} \begin{aligned}&W \in {\mathcal {L}} (X, L^2(\Omega _{(0,\infty )},{\mathcal {H}}))\text { with }(Wx)(t,\omega ):= C \Phi (t,\omega ) x \text { and }\\&R \in {\text {HS}}(L^2(\Omega _{(0,\infty )},{\mathbb {R}}^n), X)\text { with } Rf:={\mathbb {E}} \left( \int _{(0,\infty )} \sum _{i=1}^n\Phi (s)\psi _i \langle f(s), \widehat{e}_i \rangle \ \mathrm{d}s \right) . \end{aligned} \end{aligned}$$
(5.6)

Remark 2

If \({\mathcal {H}} \simeq {\mathbb {R}}^m,\) then each map \(x \mapsto \langle \widehat{e}_i,Wx \rangle \) is a Carleman operator, and by the characterization of Carleman operators of Hilbert–Schmidt type [20, Theorem 6.4 (iii)], the operator W is a Hilbert–Schmidt operator as well.

We define the stochastic observability gramian \(\mathscr {O}=W^*W \in {\mathcal {L}}(X)\) and the stochastic reachability gramian \(\mathscr {P}=RR^* \in {\text {TC}}(X)\), which satisfy for all \(x,y \in X\)

$$\begin{aligned} \begin{aligned} \langle x, \mathscr {O}y \rangle&=\mathbb {E} \left( \int _0^{\infty } \langle C \Phi (t)x, C \Phi (t) y \rangle \ \mathrm{d}t \right) \\ \langle x, \mathscr {P}y \rangle&=\mathbb {E} \left( \int _0^{\infty } \langle B^* \Phi (t)^* x, B^* \Phi (t)^* y \rangle \ \mathrm{d}t \right) . \end{aligned} \end{aligned}$$
(5.7)
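In finite dimensions, the gramians (5.7) can be approximated directly from sample paths of the homogeneous flow. The following minimal sketch estimates the observability gramian by Monte Carlo for an assumed multiplicative-noise system; the matrices, time horizon, step size, and sample count are illustrative assumptions, and in practice one would rather solve the Lyapunov equations of Lemma 5.6 below (see the sketch at the end of this section).

```python
# Minimal sketch (assumptions throughout): Monte Carlo estimate of the stochastic
# observability gramian in (5.7) for a finite-dimensional homogeneous flow
#   dPhi = A Phi dt + sum_j N_j Phi dW_j,  Phi(0) = I,
# obtained by averaging  int_0^T Phi(t)^T C^T C Phi(t) dt  over sample paths
# (T truncates the infinite time horizon).
import numpy as np

rng = np.random.default_rng(1)
d, m, q = 4, 2, 2
A = -np.eye(d) + 0.1 * rng.standard_normal((d, d))
C = rng.standard_normal((m, d))
N = [0.1 * rng.standard_normal((d, d)) for _ in range(q)]

def observability_gramian_mc(T=10.0, dt=1e-2, samples=100):
    steps = int(T / dt)
    O = np.zeros((d, d))
    for _ in range(samples):
        Phi = np.eye(d)
        acc = np.zeros((d, d))
        for _ in range(steps):
            acc += dt * Phi.T @ (C.T @ C) @ Phi       # quadrature of the integrand
            dW = np.sqrt(dt) * rng.standard_normal(q)
            Phi = Phi + A @ Phi * dt + sum(N[j] @ Phi * dW[j] for j in range(q))
        O += acc
    return O / samples

print(np.round(observability_gramian_mc(), 3))
```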

To obtain a dynamical interpretation of the gramians, let us recall that for compact self-adjoint operators \(K: X\rightarrow X\), we can define the (possibly unbounded) Moore–Penrose pseudoinverse as

$$\begin{aligned} \begin{aligned} K^{\#}:{\text {ran}}(K) \oplus {\text {ran}}( K)^{\perp } \subset X \rightarrow X \text { such that }K^{\#}x := \sum _{\lambda \in \sigma (K)\backslash \{0 \}} \lambda ^{-1} \langle x,v_{\lambda } \rangle v_{\lambda } \end{aligned} \end{aligned}$$

using any orthonormal eigenbasis \((v_{\lambda })_{\lambda \in \sigma (K)}\) associated with eigenvalues \(\lambda \) of K.
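As a concrete illustration of the displayed definition, the following minimal sketch forms \(K^{\#}\) for a real symmetric (hence self-adjoint) matrix by inverting the nonzero eigenvalues in an orthonormal eigenbasis; the example matrix and the truncation tolerance are illustrative assumptions.

```python
# Minimal sketch of the displayed definition of K^# for a self-adjoint compact operator,
# here a real symmetric matrix: invert the nonzero eigenvalues in an orthonormal
# eigenbasis. The example matrix and the truncation tolerance are illustrative assumptions.
import numpy as np

def pseudoinverse(K, tol=1e-12):
    lam, V = np.linalg.eigh(K)            # K = V diag(lam) V^T with orthonormal columns V
    inv = np.zeros_like(lam)
    mask = np.abs(lam) > tol
    inv[mask] = 1.0 / lam[mask]           # lambda^{-1} on the nonzero spectrum, 0 elsewhere
    return V @ np.diag(inv) @ V.T

K = np.diag([2.0, 1.0, 0.0])              # rank-deficient example
x = np.array([4.0, 3.0, 0.0])             # x in ran(K)
print(K @ pseudoinverse(K) @ x)           # recovers x: K K^# is the projection onto ran(K)
print(np.allclose(pseudoinverse(K), np.linalg.pinv(K)))
```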

Then, for any time \(\tau >0\) one defines the input energy \(E^{\tau }_{\text {input}}: X \rightarrow [0,\infty ]\) and output energy \(E^{\tau }_{\text {output}}: X \rightarrow [0,\infty ]\) up to time \(\tau \) as

$$\begin{aligned} \begin{aligned} E^{\tau }_{\text {input}}(x)&:= \inf _{u \in L^2((0,\infty ),\mathbb R^n); {\mathbb {E}}(Y_{\tau }(u))=x} \int _0^{{\tau }} \left\| u(t) \right\| ^2 \ \mathrm{d}t \text { and }\\ E^{\tau }_{\text {output}}(x)&:= \left\| C\Phi x \right\| ^2_{L^2(\Omega _{(0,\tau )},{\mathcal {H}})}, \end{aligned} \end{aligned}$$
(5.8)

where \(Y_t\) is the variation of constants process of the flow defined in (5.4). In particular, the expectation value \( {\mathbb {E}}(Y_{\tau }(u))\) appearing in the definition of the input energy is a solution to the deterministic equation

$$\begin{aligned} \varphi '(t) = A \varphi (t) + Bu(t), \quad \varphi (0) = 0, \end{aligned}$$
(5.9)

where \(u \in L^2((0,\infty ),{\mathbb {R}}^n)\) is a deterministic control. The theory of linear systems implies that x is then reachable, by the dynamics of (5.9), after a fixed finite time \({\tau }>0\) if \(x \in {\text {ran}} \mathscr {P}_{\tau }^{\text {det}}\) where \( {\mathscr {P}}_{\tau }^{\text {det}}\) is the time-truncated deterministic linear gramian which for \(x,y \in X\) is defined as

$$\begin{aligned} \langle x,{\mathscr {P}}_{\tau }^{\text {det}}y \rangle := \int _0^{\tau } \langle B^*T(s)^*x,B^* T(s)^*y \rangle \ \mathrm{d}s. \end{aligned}$$

The control of minimal \(L^2\) norm that steers the deterministic system (5.9) into the state x at time \(\tau \) is then given by \(u(t)=\mathbb {1}_{[0,\tau ]}(t) B^*T(\tau -t)^*\left( {\mathscr {P}}_{\tau }^{\text {det}}\right) ^{\#}x.\) We also define the time-truncated stochastic reachability and observability gramians \({\mathscr {P}}_{\tau }\) and \({\mathscr {O}}_{\tau }\) by, for \(x,y \in X\),

$$\begin{aligned} \begin{aligned} \langle x, \mathscr {P}_{\tau }y \rangle&=\mathbb {E} \left( \int _0^{\tau } \langle B^*\Phi (t)^* x, B^* \Phi (t)^* y \rangle \ \mathrm{d}t \right) \text { and } \\ \langle x, \mathscr {O}_{\tau }y \rangle&=\mathbb {E} \left( \int _0^{\tau } \langle C\Phi (t)x, C \Phi (t) y \rangle \ \mathrm{d}t \right) . \end{aligned} \end{aligned}$$
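The following minimal sketch illustrates, for an assumed finite-dimensional system, how the time-truncated deterministic gramian and the minimal-norm control above can be computed: the gramian is obtained by quadrature, the control via the pseudoinverse, and the claims that it steers (5.9) from 0 to x and has input energy \(\langle x,(\mathscr {P}_{\tau }^{\text {det}})^{\#}x\rangle \) are checked numerically. All matrices and discretization parameters are illustrative assumptions.

```python
# Minimal sketch (illustrative finite-dimensional data): time-truncated deterministic
# gramian  P_tau = int_0^tau e^{As} B B^T e^{A^T s} ds  by midpoint quadrature, the
# minimal-norm control  u(t) = B^T e^{A^T (tau-t)} P_tau^# x,  and a numerical check that
# it steers (5.9) from 0 to x with input energy <x, P_tau^# x>.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
d, n = 4, 2
A = -np.eye(d) + 0.2 * rng.standard_normal((d, d))    # assumed stable generator
B = rng.standard_normal((d, n))
tau, dt = 2.0, 1e-3
ts = np.arange(0.0, tau, dt) + dt / 2                 # midpoints of the quadrature grid

P_tau = sum(expm(A * s) @ B @ B.T @ expm(A.T * s) * dt for s in ts)
P_inv = np.linalg.pinv(P_tau)
x = P_tau @ rng.standard_normal(d)                    # target state in ran(P_tau)

def u(t):
    return B.T @ expm(A.T * (tau - t)) @ P_inv @ x

phi = np.zeros(d)                                     # integrate phi' = A phi + B u(t)
for t in ts:
    phi = phi + (A @ phi + B @ u(t)) * dt
print(np.linalg.norm(phi - x))                        # small discretization error
print(sum(u(t) @ u(t) for t in ts) * dt, x @ P_inv @ x)  # input energy vs <x, P_tau^# x>
```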

An application of the Cauchy–Schwarz inequality shows that \({\text {ker}}(\mathscr {P}_{\tau }) \subset {\text {ker}}(\mathscr {P}_{\tau }^{\text {det}})\) and thus \(\overline{{\text {ran}}}(\mathscr {P}_{\tau }^{\text {det}}) \subset \overline{{\text {ran}}}(\mathscr {P}_{\tau }):\)

$$\begin{aligned} \begin{aligned} \langle x, \mathscr {P}_{\tau }^{\text {det}} x \rangle&=\int _0^{\tau } \left\| B^*T(t)^* x \right\| ^2 \ \mathrm{d}t=\int _0^{\tau } \left\| {\mathbb {E}} ( B^*\Phi (t,0)^* x )\right\| ^2 \ \mathrm{d}t \\&\le {\mathbb {E}} \int _0^{\tau } \left\| B^*\Phi (t,0)^* x \right\| ^2 \ \mathrm{d}t = \langle x, \mathscr {P}_{\tau } x \rangle . \end{aligned} \end{aligned}$$

Since for \(\tau _1>\tau _2: {\text {ker}}(\mathscr {P}_{\tau _1})\subset {\text {ker}}(\mathscr {P}_{\tau _2})\), it also follows that \( \overline{{\text {ran}}}(\mathscr {P}_{\tau _2}) \subset \overline{{\text {ran}}}(\mathscr {P}_{\tau _1}).\) Then, one has, as for finite-dimensional systems [19, Prop. 3.10], the following bound on the input energy (5.8):

Lemma 5.2

Let x be reachable by the dynamics (5.9) and let \(x \in {\text {ran}}({\mathscr {P}}_{\tau })\); then

$$\begin{aligned} E^{\tau }_{\text {input}}(x)=\left\langle x,\left( \mathscr {P}_{\tau }^{{\text {det}}}\right) ^{\#}x \right\rangle \ge \langle x, {\mathscr {P}}_{\tau }^{\#} x \rangle . \end{aligned}$$

The output energy of any state \(x \in X\) satisfies

$$\begin{aligned} E^{\tau }_{\text {output}}(x)=\left\langle x, {\mathscr {O}}_{\tau } x \right\rangle \le \left\langle x, {\mathscr {O}} x \right\rangle . \end{aligned}$$

Proof

The representation of the output energy is immediate from the definition of the (time-truncated) observability gramian. For the representation of the input energy, we have by assumption \(x \in {\text {ran}}({\mathscr {P}}_{\tau }^{\text {det}}) \cap {\text {ran}}({\mathscr {P}}_{\tau })\). Consider then the functions

$$\begin{aligned} \begin{aligned} u(t)&:=B^*T(\tau -t)^* \left( \mathscr {P}_{\tau }^{\text {det}}\right) ^{\#}x \text { and } v(t):=B^*\Phi (\tau ,t)^*{\mathscr {P}}_{\tau }^{\#}x. \end{aligned} \end{aligned}$$

Hence, since \(x= {\mathscr {P}}_{\tau }^{\text {det}}\left( {\mathscr {P}}_{\tau }^{\text {det}}\right) ^{\#}x=\mathscr {P}_{\tau }{\mathscr {P}}_{\tau }^{\#} x\), both \({\mathbb {E}}\int _0^{\tau } \langle v(s),u(s) \rangle _{{\mathbb {R}}^n}\ \mathrm{d}s\) and \({\mathbb {E}}\int _0^{\tau } \left\| v(s) \right\| _{{\mathbb {R}}^n}^2\ \mathrm{d}s\) equal \(\langle {\mathscr {P}}_{\tau }^{\#}x,x \rangle \), so that

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\int _0^{\tau } \left\langle v(s),u(s)-v(s) \right\rangle _{\mathbb R^n}\ \mathrm{d}s=0, \end{aligned} \end{aligned}$$

which implies the claim on the (time-truncated) reachability gramian

$$\begin{aligned} \begin{aligned} \left\langle x,\left( {\mathscr {P}}_{\tau }^{{\text {det}}}\right) ^{\#}x \right\rangle&={\mathbb {E}} \int _0^{\tau } \left\| u(s) \right\| _{{\mathbb {R}}^n}^2 \ \mathrm{d}s\\&= {\mathbb {E}} \int _0^{\tau } \left\| v(s) \right\| _{\mathbb R^n}^2 \ \mathrm{d}s + {\mathbb {E}} \int _0^{\tau } \left\| u(s)-v(s) \right\| _{{\mathbb {R}}^n}^2 \ \mathrm{d}s \\&\ge \left\langle x,{\mathscr {P}}_{\tau }^{\#}x \right\rangle . \end{aligned} \end{aligned}$$

\(\square \)

Remark 3

(Reachability concept) Apart from the energy concept discussed above, interesting ideas relating the eigendecomposition of the reachability gramian to the set of reachable states have recently been presented in [25, Sec. 3] and apply to infinite-dimensional systems as well.

Definition 5.3

The stochastic Hankel operator is defined as

$$\begin{aligned} \begin{aligned} H \in {\text {HS}}\left( L^2(\Omega _{(0,\infty )},\mathbb R^n),L^2(\Omega _{(0,\infty )},{\mathcal {H}})\right) \text { such that }(Hf)(t,\omega ) = (WRf)(t,\omega ). \end{aligned} \end{aligned}$$
(5.10)

By Remark 2, the Hankel operator is trace class if \({\mathcal {H}}\simeq {\mathbb {R}}^m\) for some \(m \in {\mathbb {N}}.\)
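In finite dimensions, the nonzero singular values of the Hankel operator \(H=WR\) are the square roots of the nonzero eigenvalues of \(\mathscr {P}\mathscr {O}\), since \(H^*H=R^*\mathscr {O}R\) shares its nonzero spectrum with \(\mathscr {O}RR^*\). The following minimal sketch of the standard square-root balancing algorithm computes these Hankel singular values and a balancing projection from given gramians; it assumes symmetric positive definite matrices P and O (for instance obtained from the Lyapunov equations of Lemma 5.6 below) and is not specific to the operators of this article.

```python
# Minimal sketch (standard square-root balancing, stated for finite dimensions): the
# Hankel singular values are sqrt(eig(P O)), and balancing projections are built from
# Cholesky factors of symmetric positive definite gramians P and O (assumed given, e.g.
# from the Lyapunov equations of Lemma 5.6); none of the data below is from the article.
import numpy as np

def hankel_singular_values(P, O):
    return np.sort(np.sqrt(np.maximum(np.linalg.eigvals(P @ O).real, 0.0)))[::-1]

def balanced_truncation(P, O, r):
    """Return projections (V, W) with W^T V = I_r and the Hankel singular values."""
    L_P = np.linalg.cholesky(P)                   # P = L_P L_P^T
    L_O = np.linalg.cholesky(O)                   # O = L_O L_O^T
    U, s, Vt = np.linalg.svd(L_O.T @ L_P)         # s = Hankel singular values
    S = np.diag(1.0 / np.sqrt(s[:r]))
    V = L_P @ Vt[:r].T @ S                        # right projection (reduced basis)
    W = L_O @ U[:, :r] @ S                        # left projection (test basis)
    return V, W, s

rng = np.random.default_rng(6)
M1, M2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
P, O = M1 @ M1.T + 1e-3 * np.eye(5), M2 @ M2.T + 1e-3 * np.eye(5)
V, W, s = balanced_truncation(P, O, r=2)
print(np.allclose(s, hankel_singular_values(P, O)), np.allclose(W.T @ V, np.eye(2)))
```

A reduced-order model is then obtained in the usual way by compressing the system operators with the two projections, e.g. replacing A by \(W^{\top }AV\).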

From standard properties of the stochastic integral, it follows that the expectation value of the solution \({\mathbb {E}}(Z_t)\) or \({\mathbb {E}}(CZ_t)\) to (5.17) is just the solution \(\varphi \) or \(C\varphi \) to the linear and deterministic equation \(\varphi '(t)=A\varphi (t)+B\mathbb Eu(t).\) We can then show Proposition 1.3 which extends this analogy between stochastic and linear systems to the error bounds for deterministic controls:

Proof of Proposition 1.3

Let \((e_n)\) and \((f_n)\) be orthonormal systems in \(L^2((0,\infty ), \mathbb {R}^n)\) and \(L^2((0,\infty ),\mathbb {R}^m)\), respectively; then they are also orthonormal in \(L^2(\Omega _{(0,\infty )},{\mathbb {R}}^n)\) and \(L^2(\Omega _{(0,\infty )}, {\mathbb {R}}^m)\).

Let \(q_{k}(x):=\left\langle \widehat{e}_k,Cx \right\rangle _{\mathbb R^m}\) and \(g(\sigma ):=\Delta \left( {\mathbb {E}} \left( C\Phi (\sigma )B \right) \right) \in \mathbb {R}^{m \times n}.\) From the definition of the trace norm (1.6) and the semigroup property, it follows that

$$\begin{aligned} \begin{aligned} \left\| \Delta (H) \right\| _{{\text {TC}}}&\ge \sum _{i \in \mathbb {N}} \left|\langle f_i,\Delta (H) e_i \rangle \right|= \sum _{i \in \mathbb {N}} \Bigg \vert \int _{{{\mathbb {R}}}_{>0}^2} \sum _{k=1}^m\Delta \Bigg ( \int _{\Omega ^2} \left\langle f_i(s),\widehat{e}_k \right\rangle \\&\quad \left\langle \widehat{e}_k, C\Phi (s,\omega ')\Phi (t,\omega )B e_i(t) \right\rangle \ d{\mathbb {P}}(\omega ') d{\mathbb {P}}(\omega ) \Bigg ) \ \mathrm{d}s \ \mathrm{d}t \Bigg \vert \\&=\sum _{i \in \mathbb {N}} \left|\int _{{{\mathbb {R}}}_{>0}^2} \sum _{j=1}^n \sum _{k=1}^m \left\langle f_i(s),\widehat{e}_k \right\rangle \Delta \left( {\mathbb {E}}((P(s)q_k)(\Phi (t)\psi _j)) \right) \langle \widehat{e}_j,e_i(t)\rangle \ \mathrm{d}s \ \mathrm{d}t \right|. \end{aligned} \end{aligned}$$

Then, by the semigroup property of the time-homogeneous Markov process it follows that

$$\begin{aligned} \mathbb E((P(s)q_k)(\Phi (t)\psi _j))=(P(t)P(s)q_k)(\psi _j)=(P(t+s)q_k)(\psi _j) \end{aligned}$$

and thus

$$\begin{aligned} \begin{aligned}&\left\| \Delta (H) \right\| _{{\text {TC}}} \ge \sum _{i \in \mathbb {N}} \left|\int _{(0,\infty )^2} \sum _{j=1}^n\left\langle f_i(s),\Delta ( C P(s+t)\psi _j) \langle e_i(t), \widehat{e}_j\rangle \right\rangle _{{\mathbb {R}}^m} \ \mathrm{d}s \ \mathrm{d}t \right|\\&\quad = \sum _{i \in \mathbb {N}} \left|\int _{(0,\infty )^2} \left\langle f_i(s),\Delta \left( {\mathbb {E}}\left( C \Phi (s+t) B \right) \right) e_i(t) \right\rangle _{{\mathbb {R}}^m} \ \mathrm{d}s \ \mathrm{d}t \right|\\&\quad = \sum _{i \in \mathbb {N}} \left|\int _{(0,\infty )^2} \left\langle f_i(s),g(s+t) \ e_i(t) \right\rangle _{{\mathbb {R}}^m} \ \mathrm{d}s \ \mathrm{d}t \right|. \end{aligned} \end{aligned}$$

The standard estimate for linear systems [22, Theorem 2.1] then implies

$$\begin{aligned} \left\| g \right\| _{L^1((0, \infty ),{\mathcal {L}}(\mathbb {R}^n, \mathbb {R}^m))} \le 2\left\| \Delta (H) \right\| _{{\text {TC}}}. \end{aligned}$$

By homogeneity of the Markov semigroup and Young’s inequality, we find

$$\begin{aligned} \begin{aligned}&\left\| {\mathbb {E}} \Delta (CY_{\bullet }(u)) \right\| _{L^p((0,\infty ),{\mathbb {R}}^m)} \le \left\| \int _{(0,\bullet )} \left\| \Delta \left( {\mathbb {E}}(C \Phi (\bullet -s) B)\right) u(s) \right\| _{{\mathbb {R}}^m} \ \mathrm{d}s \right\| _{L^p((0,\infty ),{\mathbb {R}})} \\&\quad \le \left\| \int _{(0,\infty )} \left\| \mathbb {1}_{(0,\infty )}(\bullet -s)\Delta \left( {\mathbb {E}}(C \Phi (\bullet -s) B)\right) \right\| _{\mathcal L(\mathbb {R}^n,\mathbb {R}^m)} \left\| \mathbb {1}_{(0,\infty )}(s) u(s) \right\| _{{\mathbb {R}}^m} \ \mathrm{d}s \right\| _{L^p((0,\infty ),{\mathbb {R}})} \\&\quad \le \left\| \left\| \mathbb {1}_{(0,\infty )} \Delta \left( {\mathbb {E}}(C \Phi B)\right) \right\| _{\mathcal L(\mathbb {R}^n,\mathbb {R}^m)} * \left\| \mathbb {1}_{(0,\infty )} u \right\| _{{\mathbb {R}}^n} \right\| _{L^p((0,\infty ),{\mathbb {R}})}\\&\quad \le \left\| g \right\| _{L^1((0,\infty ), \mathcal L(\mathbb {R}^n,\mathbb {R}^m))} \left\| u \right\| _{L^{p}((0,\infty ), \mathbb {R}^n)} \le 2\left\| \Delta (H) \right\| _{{\text {TC}}}\left\| u \right\| _{L^{p}( (0,\infty ), \mathbb {R}^n)}. \end{aligned} \end{aligned}$$

\(\square \)

While the error bound in Proposition 1.3 relied essentially on linear theory, our next estimate in Theorem 3 bounds the expected error. The proof strategy resembles the proof presented for bilinear systems in Lemma 4.2. We start, as we did for bilinear systems, by introducing the Volterra kernels of the stochastic Hankel operator.

Definition 5.4

The Volterra kernel of the stochastic Hankel operator is defined as

$$\begin{aligned} h((s,\omega ),(t,\omega ')):= C\Phi (s,\omega )\Phi (t,\omega ')B \end{aligned}$$

and the compressed Volterra kernel is \(\overline{h}(s,\omega ):= C\Phi (s,\omega )B.\)

Proof of Theorem 3

We will show that the difference of compressed Volterra kernels \({\overline{h}}\) of the two systems satisfies

$$\begin{aligned} \int _{0}^{\infty } \left\| \Delta ( \overline{h}(v,\bullet )) \right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))} \ \mathrm{d}v \le 2 \left\| \Delta ( H) \right\| _{{\text {TC}}\left( L^2(\Omega _{(0,\infty )},\mathbb R^n),L^2(\Omega _{(0,\infty )},{\mathbb {R}}^m)\right) }. \end{aligned}$$
(5.11)

We start by showing how (5.11) implies (1.5)

$$\begin{aligned} \begin{aligned} \sup _{t \in (0,\infty )} {\mathbb {E}} \left\| \Delta (CY_t(u)) \right\| _{{\mathbb {R}}^m}&\le \sup _{t \in (0,\infty )} \int _{(0,t)} {\mathbb {E}} \left\| \Delta \left( C \Phi (t,s) B\right) u(s) \right\| _{{\mathbb {R}}^m} \ \mathrm{d}s \\&\le \sup _{t \in (0,\infty )} \int _{(0,t)} \left( {\mathbb {E}} \left\| \Delta \left( C \Phi (t,s) B\right) \right\| _{{\mathcal {L}}({\mathbb {R}}^n, \mathbb R^m)}^2\right) ^{1/2} \left( {\mathbb {E}} \left\| u(s) \right\| _{{\mathbb {R}}^n}^2\right) ^{1/2} \ \mathrm{d}s \\&\le \int _{(0,\infty )} \left( {\mathbb {E}} \left\| \Delta \left( C \Phi (t) B\right) \right\| _{{\mathcal {L}}({\mathbb {R}}^n, \mathbb R^m)}^2\right) ^{1/2} \ \mathrm{d}t \left\| u \right\| _{\mathcal H_2^{(0,\infty )}({\mathbb {R}}^n)} \\&\le 2\left\| \Delta (H) \right\| _{{\text {TC}}}\left\| u \right\| _{\mathcal H_2^{(0,\infty )}({\mathbb {R}}^n)}. \end{aligned} \end{aligned}$$

Thus, it suffices to verify (5.11). Let \(Z:= L^2\left( \Omega ,\mathbb {R}^{m}\right) \otimes L^2 \left( \Omega ,\mathbb {R}^{n}\right) \). The independence assumption in the theorem has been introduced for

$$\begin{aligned} \left\| \Delta (h((s,\bullet ),(t,\bullet ')))\right\| _{Z} = \left\| \Delta ( \overline{h}(s+t,\bullet ))\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, {\mathbb {R}}^m))}\, \end{aligned}$$

to hold. To see this, we consider the auxiliary functions \(\xi _{i}(x_1,x_2):=\big ( \langle \widehat{e}_i,Cx_1-{\widetilde{C}}x_2 \rangle _{{\mathbb {R}}^m} \big )^2\), where C and \({\widetilde{C}}\) are the observation operators of the two systems. By the independence assumption, there is again a Markov semigroup \((\mathbf P(t) )_{t \ge 0}\) associated with the time-homogeneous Markov process determined by the vector-valued flow \(({\varvec{\Phi }}(t))_{t \ge 0}:=(\Phi (t),{\widetilde{\Phi }}(t))_{t \ge 0}\) such that \((\mathbf P(t) \xi _i)(x_1,x_2):={\mathbb {E}}(\xi _i(\Phi (s+t,s)x_1, {\widetilde{\Phi }}(s+t,s)x_2)).\) Let \((\psi _j)_{j \in \left\{ 1,\ldots ,n \right\} }, ({\widetilde{\psi }}_j)_{j \in \left\{ 1,\ldots ,n \right\} }\) be the vectors in X comprising the control operators B and \(\widetilde{B}\), respectively. The semigroup property of \((\mathbf{P(t)})_{t \ge 0}\) then implies

$$\begin{aligned} \begin{aligned}&\left\| \Delta (h((s,\bullet ),(t,\bullet ')))\right\| ^2_{Z} \\&\quad = \sum _{i=1}^m \sum _{j=1}^n\int _{\Omega \times \Omega } \xi _i \left( {\varvec{\Phi }}(\mathbf{{s}},\omega ){\varvec{\Phi }}(\mathbf{{t}},\omega ')(\psi _j,\widetilde{\psi _j}) \right) \ \mathrm{d} {\mathbb {P}}(\omega ) \ \mathrm{d} {\mathbb {P}}(\omega ')\\&\quad = \sum _{i=1}^m \sum _{j=1}^n{\mathbb {E}}\left( \mathbf{P(s)} \xi _i\left( {\varvec{\Phi }}(\mathbf{{t}})(\psi _j,\widetilde{\psi _j})\right) \right) \\&\quad = \sum _{i=1}^m \sum _{j=1}^n (\mathbf{P(t)P(s)}\xi _{i})(\psi _j,{\widetilde{\psi }}_j) = \sum _{i=1}^m \sum _{j=1}^n (\mathbf{P(s+t)}\xi _{i})(\psi _j,{\widetilde{\psi }}_j) \\&\quad = \left\| \Delta (\overline{h}(s+t,\bullet ))\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, {\mathbb {R}}^m))}^2. \end{aligned} \end{aligned}$$
(5.12)

Let M be large enough such that \(\frac{1}{2} \int _{(2M,\infty )} \left\| \Delta ( \overline{h}(v,\bullet )) \right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))} \ \mathrm{d}v \le \varepsilon .\) Then, consider the integral function defined for \(0<\alpha /2<x\)

$$\begin{aligned} G(x,\alpha ):=\frac{1}{\alpha } \int _{x-\alpha /2}^{x+\alpha /2} \Delta ( \overline{h}(2s,\bullet )) \ \mathrm{d}s. \end{aligned}$$

By Lebesgue’s differentiation theorem for Bochner integrals, this function converges for \(x \in (0,M)\) pointwise on a set \(I \subset (0,M)\) of full measure to its integrand evaluated at \(s=x\) as \(\alpha \downarrow 0.\) In particular, for any \(x \in I\) there is \(\delta _x<\text {min}(x,M-x)\) such that if \(0 < \alpha /2 \le \delta _x\) then

$$\begin{aligned} \begin{aligned}&\left|\frac{1}{\alpha } \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Delta ( \overline{h}(2s,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))} \ \mathrm{d}s- \left\| \Delta ( \overline{h}(2x,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))} \right|\\&\quad \le \frac{1}{\alpha } \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Delta ( \overline{h}(2s,\bullet ) - \overline{h}(2x,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, {\mathbb {R}}^m))} \ \mathrm{d}s \le \varepsilon /M. \end{aligned} \end{aligned}$$
(5.13)

Since \(\Delta (h((s,\bullet ),(t,\bullet ')))\) contains the product of two flows, the function \(\Delta (h((x,\bullet ),(x,\bullet ')))\) is a.e. well defined on the diagonal. Then, there is a set J of full measure such that every \(x \in J \subset (0,M)\) is a Lebesgue point of the Volterra kernel on the diagonal. Thus, as for the compressed Volterra kernel above, there is also for the full Volterra kernel some \(0<\gamma _x<\text {min}(x,M-x)\) such that if \(0 < \alpha /2 \le \gamma _x\) then

$$\begin{aligned} \frac{1}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Delta (h((s,\bullet ),(t,\bullet ')))- \Delta (h((x,\bullet ),(x,\bullet ')))\right\| _Z \ \mathrm{d}s \ \mathrm{d}t \le \varepsilon /M. \end{aligned}$$
(5.14)

This is due to Lebesgue’s differentiation theorem for Banach space-valued integrands applied to the flows \(\Phi ,{\widetilde{\Phi }}\) and the following estimate

$$\begin{aligned} \begin{aligned}&\frac{1}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Delta \left( h((s,\bullet ),(t,\bullet '))\right) - \Delta \left( h((x,\bullet ),(x,\bullet '))\right) \right\| _Z \ \mathrm{d}s \ \mathrm{d}t \\&\quad \le \frac{1}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Delta \left( h((s,\bullet ),(t,\bullet '))\right) - \Delta \left( h((s,\bullet ),(x,\bullet '))\right) \right\| _Z \ \mathrm{d}s \ \mathrm{d}t \\&\qquad + \frac{1}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Delta \left( h((s,\bullet ),(x,\bullet '))\right) - \Delta \left( h((x,\bullet ),(x,\bullet '))\right) \right\| _Z \ \mathrm{d}s \ \mathrm{d}t \\&\quad \le \frac{\left\| C \right\| \left\| B \right\| _{{\text {HS}}}}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Phi (s)\right\| _{L^2(\Omega , \mathcal {L}(X))}\left\| \Phi (t)-\Phi (x) \right\| _{L^2(\Omega , \mathcal {L}(X))} \ \mathrm{d}s \ \mathrm{d}t \\&\qquad + \frac{\left\| C \right\| \left\| B \right\| _{{\text {HS}}}}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \Phi (s)-\Phi (x) \right\| _{L^2(\Omega , \mathcal {L}(X))} \left\| \Phi (x) \right\| _{L^2(\Omega , \mathcal {L}(X))} \ \mathrm{d}s \ \mathrm{d}t \\&\qquad + \frac{\left\| \widetilde{C} \right\| \left\| \widetilde{B} \right\| _{{\text {HS}}}}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \widetilde{\Phi (s)} \right\| _{L^2(\Omega , \mathcal {L}(X))}\left\| \widetilde{\Phi (t)}-\widetilde{\Phi (x)} \right\| _{L^2(\Omega , \mathcal {L}(X))} \ \mathrm{d}s \ \mathrm{d}t \\&\qquad + \frac{\left\| \widetilde{C} \right\| \left\| \widetilde{B} \right\| _{{\text {HS}}}}{\alpha ^2} \int _{x-\alpha /2}^{x+\alpha /2} \int _{x-\alpha /2}^{x+\alpha /2} \left\| \widetilde{\Phi (s)}-\widetilde{\Phi (x)} \right\| _{L^2(\Omega , \mathcal {L}(X))} \left\| \widetilde{\Phi (x)}\right\| _{L^2(\Omega , \mathcal {L}(X))} \ \mathrm{d}s \ \mathrm{d}t. \end{aligned} \end{aligned}$$

Consider then the family of intervals \(I_x:=[x-{\text {min}} \left( \delta _x, \gamma _x\right) ,x+{\text {min}} \left( \delta _x, \gamma _x\right) ]\) for \(x \in I\cap J.\) Lebesgue’s covering theorem [26, Theorem 26] states that, after possibly shrinking the diameter of the sets \(I_x\) first, there exists an at most countably infinite family of disjoint sets \((I_{x_i})_{i \in {\mathbb {N}}}\) covering \(I \cap J\) such that the Lebesgue measure of \(I \cap J \cap \left( \bigcup _{i \in {\mathbb {N}}} I_{x_i} \right) ^{C}\) is zero. The additivity of the Lebesgue measure implies that, for every \(\varepsilon >0\), there are finitely many points \(x_1,\ldots ,x_n \in I \cap J\) such that the set \(I\cap J \cap \left( \bigcup _{i=1}^n I_{x_i} \right) ^{C}\) has Lebesgue measure at most \(\varepsilon \). Thus, we have obtained finitely many disjoint sets \(I_{x_i}\) of total measure at least \(M-\varepsilon \) such that for \(0<\alpha _i /2 \le {\text {diam}}(I_{x_i})/2\) both estimates (5.13) and (5.14) hold at \(x=x_i\) where \(x_i\) is the midpoint of \(I_{x_i}.\)

For every \(i \in \left\{ 1,\ldots ,n \right\} \) fixed, we introduce the family of sesquilinear forms \((L_{i})\)

$$\begin{aligned} \begin{aligned}&L_{i}: L^2\left( \Omega ,\mathbb {R}^{m}\right) \oplus L^2 \left( \Omega ,\mathbb {R}^{n}\right) \rightarrow \mathbb {R} \\&(f,g) \mapsto \int _{\Omega ^2} \left\langle f(\omega ),\Delta (h((x_i,\omega ),(x_i,\omega '))) g(\omega ') \right\rangle _{\mathbb {R}^{m}} \ \mathrm{d}{\mathbb {P}}(\omega ) \ \mathrm{d}\mathbb P(\omega ') \end{aligned} \end{aligned}$$

and for \(Z:= L^2\left( \Omega ,\mathbb {R}^{m}\right) \otimes L^2 \left( \Omega ,\mathbb {R}^{n}\right) \) we can define a Hilbert–Schmidt operator of unit \({\text {HS}}\)-norm given by \(Q_i : L^2 \left( \Omega ,\mathbb {R}^{n}\right) \rightarrow L^2\left( \Omega ,\mathbb {R}^{m}\right) \)

$$\begin{aligned} \begin{aligned}&(Q_i\varphi )(\omega ):=\int _{\Omega } \tfrac{\Delta (h((x_i,\omega ),(x_i,\omega ')))}{\left\| \Delta (h((x_i,\bullet ),(x_i,\bullet ')))\right\| _{Z}} \varphi (\omega ') \ \mathrm{d}{\mathbb {P}}(\omega '). \end{aligned} \end{aligned}$$

The singular value decomposition of \(Q_i\) yields orthonormal systems \(f_{k,i} \in L^2\left( \Omega ,\mathbb {R}^{m}\right) , \ g_{k,i} \in L^2 \left( \Omega ,\mathbb {R}^{n}\right) \) as well as singular values \(\sigma _{k,i} \in [0,1]\) parameterized by \(k \in {\mathbb {N}}.\) For any given \(\delta >0\), there is \(N(\delta )\) large enough such that

$$\begin{aligned} \left\| \tfrac{\Delta (h((x_i,\bullet ),(x_i,\bullet ')))}{\left\| \Delta (h((x_i,\bullet ),(x_i,\bullet ')))\right\| _{Z}} - \sum _{k=1}^{N(\delta )} \sigma _{k,i} ( f_{k,i} \otimes g_{k,i} )\right\| _{Z}<\delta . \end{aligned}$$

Thus, there are also orthonormal systems \(f_{k,i}\in L^2\left( \Omega ,\mathbb {R}^{m}\right) \) and \(g_{k,i} \in L^2 \left( \Omega ,\mathbb {R}^{n}\right) \), some \(N_i \in {\mathbb {N}},\) and \(\sigma _{k,i} \in [0,1]\) such that

$$\begin{aligned} \begin{aligned}&\left|\left\langle \tfrac{\Delta (h((x_i,\bullet ),(x_i,\bullet ')))}{\left\| \Delta (h((x_i,\bullet ),(x_i,\bullet ')))\right\| _{Z}} -\sum _{k=1}^{N_i} \sigma _{k,i} ( f_{k,i} \otimes g_{k,i} ), \Delta (h((x_i,\bullet ),(x_i,\bullet '))) \right\rangle _{Z} \right|\\&=\left|\left\| \Delta (h((x_i,\bullet ),(x_i,\bullet ')))\right\| _{Z} -\sum _{k=1}^{N_i} \sigma _{k,i} L_{i}(f_{k,i},g_{k,i}) \right|<\varepsilon / M. \end{aligned} \end{aligned}$$
(5.15)

Then, \(s_{k,i}(s,\omega ):= \sqrt{\left|I_{x_i} \right|}^{-1} g_{k,i}(\omega )\mathbb {1}_{I_{x_i}}(s)\) and \(t_{k,i}(s,\omega ):= \sqrt{\left|I_{x_i} \right|}^{-1}f_{k,i}(\omega ) \mathbb {1}_{I_{x_i}}(s)\) form orthonormal systems in \(L^2\left( \Omega _{(0,\infty )},\mathbb {R}^n\right) \) and \(L^2\left( \Omega _{(0,\infty )},\mathbb {R}^{m}\right) \), respectively, both in k and i, such that for \(\mathcal I_i:=\Omega _{I_{x_i}}\times \Omega _{I_{x_i}}\) it follows that

$$\begin{aligned} \begin{aligned}&\langle t_{k,i}, \Delta ( H) s_{k,i} \rangle _{L^2\left( \Omega _{(0,\infty )},\mathbb {R}^{m}\right) } \\&\quad =\tfrac{1}{\left|I_{x_i} \right|}\int _{{\mathcal {I}}_{i}} \left\langle f_{k,i}(\omega ),\Delta (h((s,\omega ),(t,\omega ')))g_{k,i}(\omega ')\right\rangle _{\mathbb {R}^{m}} \ \mathrm{d}t \ \mathrm{d}s \ \mathrm{d}{\mathbb {P}}(\omega ) \ \mathrm{d} {\mathbb {P}}(\omega '). \end{aligned} \end{aligned}$$
(5.16)

Hence, we get

$$\begin{aligned} \begin{aligned}&\left|\sum _{i=1}^{n } \left( \sum _{k=1}^{N_i} \sigma _{k,i} \langle t_{k,i}, \Delta ( H) s_{k,i} \rangle _{L^2\left( \Omega _{(0,\infty )},\mathbb {R}^{m}\right) } - \int _{I_{x_i}^2} \tfrac{\left\| \Delta ( \overline{h}(2x_i,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, {\mathbb {R}}^m))}}{\left|I_{x_i} \right|}\ \mathrm{d}s \ \mathrm{d}t\right) \right|\\&\quad \le \sum _{i=1}^{n} \tfrac{1}{\left|I_{x_i} \right|} \int _{I_{x_i}^2} \left( \left|\sum _{k=1}^{N_i} \sigma _{k,i} \left\langle g_{k,i} \otimes f_{k,i}, \left( \Delta \left( h((s,\bullet ),(t,\bullet '))\right) -\Delta ({h}((x_i,\bullet ),(x_i,\bullet ')))\right) \right\rangle _Z \right|\right. \\&\qquad \left. +\left|\sum _{k=1}^{N_i} \sigma _{k,i} L_{i}(f_{k,i},g_{k,i})-\left\| \Delta (h((x_i,\bullet ),(x_i,\bullet '))) \right\| _{Z} \right|\right. \\&\qquad \left. +\left|\left\| \Delta (h((x_i,\bullet ),(x_i,\bullet '))) \right\| _{Z}- \left\| \Delta ( \overline{h}(2x_i,\bullet )) \right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))}\right|\right) \ \mathrm{d}s \ \mathrm{d}t \ \lesssim \varepsilon . \end{aligned} \end{aligned}$$

The bound on the first term follows from (5.14) and \(\left\| \sum _{k=1}^{N_i} \sigma _{k,i} g_{k,i} \otimes f_{k,i} \right\| _{Z} \le 1.\) The bound on the second term follows from (5.15), and the third term vanishes by (5.12). We then compute further that

$$\begin{aligned} \begin{aligned}&\left|\sum _{i=1}^{n } \left( \int _{I_{x_i}^2} \tfrac{\left\| \Delta ( \overline{h}(2x_i,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))}}{\left|I_{x_i}\right|} \ \mathrm{d}s \ \mathrm{d}t - \int _{2I_{x_i} } \tfrac{ \left\| \Delta ( \overline{h}(v,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))}}{2} \ \mathrm{d}v \right) \right|\\&\quad \le \left|\sum _{i=1}^{n } \left( \int _{I_{x_i}^2} \tfrac{\left\| \Delta ( \overline{h}(2x_i,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))}}{\left|I_{x_i}\right|} \ \mathrm{d}s \ \mathrm{d}t - \left|I_{x_i} \right|\left\| \Delta ( \overline{h}(2x_i,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, {\mathbb {R}}^m))} \right) \right|\\&\qquad +\left|\sum _{i=1}^{n } \left( \left|I_{x_i} \right|\left\| \Delta ( \overline{h}(2x_i,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, \mathbb R^m))} - \int _{2I_{x_i}} \tfrac{\left\| \Delta ( \overline{h}(v,\bullet ) )\right\| _{L^2(\Omega , {\text {HS}}({\mathbb {R}}^n, {\mathbb {R}}^m))}}{2} \ \mathrm{d}v \right) \right|\lesssim \varepsilon \end{aligned} \end{aligned}$$

where we used (5.13) to obtain the second estimate. Combining the two preceding estimates, the theorem follows from the characterization of the trace norm given in (1.6). \(\square \)

Next, we study conditions under which convergence of flows implies convergence of stochastic Hankel operators. Let \((\Phi _i)\) be a sequence of flows converging in \(L^2(\Omega _{(0,\infty )},\mathcal L(X))\) to \(\Phi \) and \(W_i\), \(R_i\) the observability and reachability maps derived from \(\Phi _i\) as in (5.6). For the observability map, this yields convergence in operator norm

$$\begin{aligned} \left\| W-W_i \right\| ^2={\mathbb {E}}\int _{(0,\infty )} \left\| C(\Phi -\Phi _i)(t) \right\| _{{\mathcal {L}}(X,\mathcal H)}^2 \ \mathrm{d}t \xrightarrow [i \rightarrow \infty ]{} 0. \end{aligned}$$

If \({\mathcal {H}} \simeq \mathbb {R}^m\), then it follows by an analogous estimate that \(W_i\) converges to W in Hilbert–Schmidt norm, too [20, Theorem 6.12(iii)].

For the reachability map, we choose an ONB \((e_k)_{k \in \mathbb {N}}\) of \(L^2(\Omega _{(0,\infty )},{\mathbb {R}})\) which we extend by tensorization \(e_k^j:=e_k \otimes \widehat{e}_j\) for \(j \in \left\{ 1,\ldots ,n \right\} \) to an ONB of \(L^2(\Omega _{(0,\infty )},{\mathbb {R}}^n).\) Using this basis and an orthonormal basis \((f_l)_{l \in {\mathbb {N}}}\) of X, it follows that

$$\begin{aligned} \begin{aligned}&\left\| R_i-R \right\| ^2_{{\text {HS}}(L^2(\Omega _{(0,\infty )},\mathbb R^n), X)}\\&\quad =\sum _{l \in {\mathbb {N}}}\sum _{k \in \mathbb {N}} \sum _{j=1}^n \left|\int _{\Omega _{(0,\infty )}}\left\langle f_l, ( \Phi - \Phi _i)(t)(\omega )\psi _j \right\rangle _X e_k(t)(\omega ) \ \mathrm{d}t \ \mathrm{d}{\mathbb {P}}(\omega ) \right|^2 \\&\quad =\sum _{l \in {\mathbb {N}}} \sum _{j=1}^n \int _{\Omega _{(0,\infty )}} \left|\left\langle f_l, ( \Phi - \Phi _i)(t)(\omega )\psi _j\right\rangle _X \right|^2 \ \mathrm{d}t \ \mathrm{d}{\mathbb {P}}(\omega )\\&\quad =\sum _{j=1}^n \int _{\Omega _{(0,\infty )}} \left\| ( \Phi - \Phi _i)(t)(\omega )\psi _j \right\| ^2_X \ \mathrm{d}t \ \mathrm{d}\mathbb P(\omega ) \xrightarrow [i \rightarrow \infty ]{} 0. \end{aligned} \end{aligned}$$

As in the bilinear case, we obtain from this a convergence result for stochastic Hankel operators:

Corollary 5.5

Let \(H_i\) denote the Hankel operators associated with flows \(\Phi _i\) converging in \(L^2(\Omega _{(0,\infty )},{\mathcal {L}}(X))\) to \(\Phi .\) Then, the \(H_i\) converge in Hilbert–Schmidt norm to H

$$\begin{aligned} \left\| H_i - H \right\| _{{\text {HS}}} \le \left\| W_i-W \right\| \left\| R_i \right\| _{{\text {HS}}} + \left\| W \right\| \left\| R_i- R \right\| _{{\text {HS}}}\xrightarrow [i \rightarrow \infty ]{} 0, \end{aligned}$$

and if \({\mathcal {H}} \simeq \mathbb {R}^m\), then the convergence is also in the sense of trace class operators

$$\begin{aligned} \left\| H_i - H \right\| _{{\text {TC}}} \le \left\| W_i-W \right\| _{{\text {HS}}} \left\| R_i \right\| _{{\text {HS}}} + \left\| W \right\| _{{\text {HS}}} \left\| R_i- R \right\| _{{\text {HS}}}\xrightarrow [i \rightarrow \infty ]{} 0. \end{aligned}$$

In particular, all singular values of \(H_i\) converge to the singular values of H [21, Corollary 2.3] and, if the respective singular values are non-degenerate, then the singular vectors converge in norm as well (see the proof of Lemma 3.6).
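The convergence of singular values used here rests on their Lipschitz dependence on the operator, \(\left|\sigma _k(H_i)-\sigma _k(H) \right|\le \left\| H_i-H \right\| \). The following minimal sketch checks this Weyl-type inequality numerically on random matrices; the matrices and perturbation sizes are illustrative assumptions that merely stand in for the Hankel operators.

```python
# Minimal sketch (generic perturbation check, illustrative matrices): Weyl's inequality
# |sigma_k(H + E) - sigma_k(H)| <= ||E||, the finite-dimensional counterpart of the
# singular value convergence used above.
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((8, 8))
for eps in (1e-1, 1e-2, 1e-3):
    E = eps * rng.standard_normal((8, 8))
    gap = np.max(np.abs(np.linalg.svd(H + E, compute_uv=False)
                        - np.linalg.svd(H, compute_uv=False)))
    print(eps, gap, bool(gap <= np.linalg.norm(E, 2) + 1e-12))
```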

To exhibit the connection between the model reduction methods for SPDEs and bilinear systems, we finally state a weak version of the stochastic Lyapunov equations for real-valued Lévy noise as stated for finite-dimensional systems in [19, Eq. (14), (22)]. Let \((L_t)\) be a square-integrable scalar Lévy process; then \(M_t:=L_t-t {\mathbb {E}}(L_1)\) is a square-integrable centred martingale [19, Theorem 2.7]. Its quadratic variation measure satisfies \(\mathrm{d}\langle M,M \rangle _t= \mathbb E\left( M_1^2\right) \ \mathrm{d}t\). Let \((X_s)\) be an X-valued, predictable process with \(\int _0^T \left\| X_s \right\| _X^2 \mathrm{d}\langle M,M \rangle _s< \infty \); then the stochastic integral is defined by the unconditionally convergent series \(\int _0^t X_s \ \mathrm{d}M_s:=\sum _{k \in {\mathbb {N}}} \int _0^t \langle X_s, e_k \rangle \ \mathrm{d}M_s \ e_k \), where \((e_k)\) is any ONB of X and \(t \in [0,T]\), and the isometry formula

$$\begin{aligned} {\mathbb {E}} \left\| \int _0^t X_s \ \mathrm{d}M_s \right\| _X^2 = \mathbb E \int _0^t \left\| X_s \right\| _X^2 \ \mathrm{d}\langle M,M \rangle _s \end{aligned}$$

holds [27, Def.6 and Prop.8]. Moreover, from the series representation it follows from one-dimensional theory [19, Theorem 2.11] that \(\int _0^T X_s \ \mathrm{d}M_s\) is a martingale and \({\mathbb {E}}\int _0^T X_s \ \mathrm{d}M_s=0.\)
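As a concrete illustration, the following minimal sketch checks the isometry formula by Monte Carlo for the compensated Poisson martingale \(M_t = L_t - \lambda t\) (so that \({\mathbb {E}}(M_1^2)=\lambda \)) and the deterministic scalar integrand \(X_s=\sin (s)\); the rate, horizon, and sample size are illustrative assumptions.

```python
# Minimal sketch (illustrative parameters): Monte Carlo check of the isometry formula for
# the compensated Poisson martingale M_t = L_t - lambda*t (so E(M_1^2) = lambda) and the
# deterministic integrand X_s = sin(s):
#   E (int_0^T X dM)^2  =  lambda * int_0^T X(s)^2 ds.
import numpy as np

rng = np.random.default_rng(4)
lam, T, samples = 2.0, 3.0, 100_000
acc = 0.0
for _ in range(samples):
    jump_times = rng.uniform(0.0, T, size=rng.poisson(lam * T))
    # int_0^T X dM = sum of X over the jump times  -  lambda * int_0^T X(s) ds
    stoch_int = np.sin(jump_times).sum() - lam * (1.0 - np.cos(T))
    acc += stoch_int ** 2
lhs = acc / samples
rhs = lam * 0.5 * (T - np.sin(T) * np.cos(T))         # lambda * int_0^T sin(s)^2 ds
print(lhs, rhs)                                       # agree up to Monte Carlo error
```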

Consider n independent copies of such martingales \((M_t^{(j)})_{j \in \left\{ 1,\ldots ,n \right\} }\) and the control operator B as before. We then study the stochastic evolution equation

$$\begin{aligned} \begin{aligned} \mathrm{d}Z_t&= (AZ_t + Bu) \ \mathrm{d}t +\sum _{j=1}^{n} N_j Z_t \ \mathrm{d}M^{(j)}_t, \quad t >0 \\ Z_{0}&=\xi \end{aligned} \end{aligned}$$
(5.17)

for \(\xi \in L^2(\Omega ,{\mathcal {F}}_0,{\mathbb {P}},X)\), A the generator of a \(C_0\)-semigroup (T(t)), and \(N_j \in \mathcal L(X)\). Then, the homogeneous part of (5.17), i.e. without the control term Bu, defines a unique predictable process \(Z_t^{\text {hom}}:=\Phi (t)\xi \in {\mathcal {H}}_{2}^{(0,T)}\) [28, Def. 9.11, Theorem 9.15, Theorem 9.29] with flow \(\Phi \) that satisfies the homogeneous Markov property [28, Prop. 9.31 and 9.32] and

$$\begin{aligned} Z_t^{\text {hom}}=T(t)\xi + \sum _{j=1}^{n} \int _{0}^t T(t-s)N_j Z_s^{\text {hom}} \ \mathrm{d}M^{(j)}_s. \end{aligned}$$
(5.18)

The adjoint equation to (5.17) shall be defined with initial condition \(Y_0 = \xi \) as

$$\begin{aligned} \begin{aligned} \mathrm{d}Y_t= (A^*Y_t+ Bu) \ \mathrm{d}t +\sum _{j=1}^{n} N_j^* Y_t \ \mathrm{d}M^{(j)}_t , \quad t >0, \end{aligned} \end{aligned}$$

and the mild solution to the homogeneous part of this equation is

$$\begin{aligned} Y_t^{\text {hom}}=T(t)^*\xi + \sum _{j=1}^{n} \int _{0}^t T(t-s)^*N_j^* Y_s^{\text {hom}} \ \mathrm{d}M^{(j)}_s. \end{aligned}$$
(5.19)

Let \(\Psi \) be the flow of the adjoint equation, i.e. \(Y_t^{\text {hom}}=\Psi (t)\xi \); then the X-adjoint of \(\Psi \) satisfies the variation of constants formula

$$\begin{aligned} \Psi (t)^*\xi = T(t)\xi + \sum _{j=1}^{n} \int _{0}^t \Psi (s)^*N_j T(t-s)\xi \ \mathrm{d}M^{(j)}_s. \end{aligned}$$

For \(\Phi \) an exponentially stable flow in m.s.s. for (5.18), we then define another observability gramian for (5.17) by

$$\begin{aligned} \langle x,{\mathscr {O}}^{\text {L}\acute{\text {e}}\text {vy}} y \rangle :=\int _0^{\infty } \langle C \Psi (t)^* x,C \Psi (t)^*y \rangle \ \mathrm{d}t. \end{aligned}$$

To see that \({\mathscr {O}}^{\text {L}\acute{\text {e}}\text {vy}}\) coincides with the standard stochastic observability gramian (5.7) \(\mathscr {O}\), it suffices to show that for all \(x \in X:\) \({\mathbb {E}} \left\| C \Phi (t)x \right\| _{{\mathcal {H}}}^2 = \mathbb E\left\| C \Psi (t)^*x \right\| _{{\mathcal {H}}}^2.\) Applying Itō’s isometry, we obtain from (5.18) using sets \(\Delta _k(t):=\{(s_1,\ldots ,s_k) \in \mathbb {R}^k; 0 \le s_k \le \cdots \le s_1 \le t\}\)

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \left\| C \Phi (t)x \right\| _{{\mathcal {H}}}^2&= \left\| CT(t) x \right\| _{{\mathcal {H}}}^2+ \sum _{i=1}^n {\mathbb {E}}\left( M^{(i)}(1)^2\right) \ {\mathbb {E}} \int _0^t \left\| C T(t-s_1)N_{i} \Phi (s_1) x \right\| _{{\mathcal {H}}}^2 \mathrm{d}s_1 \\&=\left\| CT(t) x \right\| _{{\mathcal {H}}}^2 + \sum _{k=1}^{\infty } \sum _{i_1,\ldots ,i_k=1}^n \prod _{j=1}^k \mathbb E\left( M^{(i_j)}(1)^2\right) \cdot \\&\quad \cdot \int _{\Delta _k(t)} \left\| C T(t-s_1) \prod _{j=1}^{k-1}\left( N_{i_j} T(s_j-s_{j+1}) \right) N_{i_k}T(s_{k}) x \right\| _{{\mathcal {H}}}^2 \mathrm{d}s, \end{aligned} \end{aligned}$$

whereas it follows from (5.19)

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \left\| C \Psi (t)^*x \right\| _{{\mathcal {H}}}^2&= \left\| CT(t) x \right\| _{{\mathcal {H}}}^2+ \sum _{i=1}^n {\mathbb {E}} \left( M^{(i)}(1)^2\right) \ {\mathbb {E}} \int _0^t \left\| C \Psi (s_1)^*N_i T(t-s_1) x \right\| _{{\mathcal {H}}}^2 \mathrm{d}s_1 \\&=\left\| CT(t) x \right\| _{{\mathcal {H}}}^2 + \sum _{k=1}^{\infty } \sum _{i_1,\ldots ,i_k=1}^n \prod _{j=1}^k \mathbb E\left( M^{(i_j)}(1)^2\right) \\&\quad \cdot \int _{\Delta _k(t)} \left\| C T(s_{k}) \prod _{j=k-1}^{1} \left( N_{i_{j+1}} T(s_{j}-s_{j+1}) \right) N_{i_1}T(t-s_1) x \right\| _{{\mathcal {H}}}^2 \mathrm{d}s. \end{aligned} \end{aligned}$$

A reflection of the integration domain (reversing the order of the integration variables) then shows that both expressions, and hence the gramians, coincide.

Finally, the gramians satisfy the following Lyapunov equations for scalar Lévy-type noise (cf. [19] for the finite-dimensional analogue):

Lemma 5.6

Let \(\Phi \) be an exponentially stable flow in m.s.s. for (5.18) such that both gramians exist. Let \(x_1,y_1 \in D(A^*)\) and \(x_2,y_2 \in D(A)\); then

$$\begin{aligned} \begin{aligned}&\langle x_1, BB^* y_1 \rangle + \langle A^*x_1, {\mathscr {P}} y_1 \rangle + \langle x_1, {\mathscr {P}} A^*y_1 \rangle +\sum _{j=1}^n \langle N_j^*x_1, {\mathscr {P}} N_j^*y_1 \rangle \ \mathbb {E}(M^{(j)}(1)^2) =0 \text { and } \\&\langle x_2, C^*C y_2 \rangle + \langle Ax_2, {\mathscr {O}} y_2 \rangle + \langle x_2, {\mathscr {O}} Ay_2 \rangle +\sum _{j=1}^n \langle N_jx_2, {\mathscr {O}} N_jy_2 \rangle \ \mathbb {E}(M^{(j)}(1)^2) =0. \end{aligned} \end{aligned}$$

Proof

For every \(i \in \left\{ 1,\ldots ,n \right\} \), there is a weak formulation of the homogeneous solution to (5.17) [28, Theorem 9.15]

$$\begin{aligned} \begin{aligned} \langle \Phi (t)\psi _i,x_1 \rangle&= \langle \psi _i,x_1 \rangle + \int _0^t \langle \Phi (s) \psi _i,A^*x_1 \rangle \ \mathrm{d}s + \sum _{j=1}^n \int _0^t \langle \Phi (s)\psi _i, N_j^*x_1 \rangle \ \mathrm{d}M^{(j)}_s. \end{aligned} \end{aligned}$$

Stochastic integration by parts yields after summing over \(i \in \left\{ 1,\ldots ,n \right\} \)

$$\begin{aligned} \begin{aligned}&\langle \Phi (t)^*x_1,BB^* \Phi (t)^*y_1 \rangle = \langle x_1, BB^*y_1 \rangle + \sum _{i=1}^n \int _0^t \langle \Phi (s) \psi _i,x_1 \rangle _{-} \ \mathrm{d}\langle \Phi (s) \psi _i,y_1 \rangle \\&\quad + \sum _{i=1}^n \int _0^t \langle \Phi (s) \psi _i,y_1 \rangle _{-} \ \mathrm{d}\langle \Phi (s)\psi _i,x_1 \rangle + \sum _{i=1}^n \langle \langle x_1,\Phi (t) \psi _i \rangle , \langle \Phi (t)\psi _i ,y_1 \rangle \rangle _t \end{aligned} \end{aligned}$$

where the subscript − indicates left limits.

From the quadratic variation process [19, Eq.(8)]

$$\begin{aligned} \sum _{i=1}^{n}{\mathbb {E}} \langle \langle x_1,\Phi (t) \psi _i \rangle \rangle _t=\sum _{j=1}^{n}{\mathbb {E}} \left( \int _0^t \langle \Phi (s)^* N_j^*x_1,BB^*\Phi (s)^* N_j^*x_1 \rangle \ \mathrm{d}s \right) \mathbb E(M^{(j)}(1)^2), \end{aligned}$$

we obtain, together with the martingale property of the stochastic integral and polarization of the quadratic variation,

$$\begin{aligned} \begin{aligned} \mathbb {E} \left( \langle \Phi (t)^*x_1,BB^* \Phi (t)^*y_1 \rangle \right)&= \langle x_1, BB^*y_1 \rangle +{\mathbb {E}} \left( \int _0^t \langle \Phi (s)^* A^*x_1,BB^*\Phi (s)^* y_1 \rangle \ \mathrm{d}s \right) \\&\quad + {\mathbb {E}} \left( \int _0^t \langle \Phi (s)^* x_1,BB^*\Phi (s)^* A^*y_1 \rangle \ \mathrm{d}s \right) \\&\quad +\sum _{j=1}^{n}{\mathbb {E}} \left( \int _0^t \langle \Phi (s)^* N_j^*x_1,BB^*\Phi (s)^* N_j^*y_1 \rangle \ \mathrm{d}s \right) \mathbb E\left( M^{(j)}(1)^2\right) . \end{aligned} \end{aligned}$$

Letting t tend to infinity, we obtain the first Lyapunov equation since, by exponential stability, \(\lim _{t \rightarrow \infty }\mathbb {E} \left( \left\langle x_1, \Phi (t) \psi _i \right\rangle \left\langle \Phi (t)\psi _i,y_1 \right\rangle \right) =0.\)

The second Lyapunov equation can be obtained by an analogous calculation: Let \(x_0 \in X\) be arbitrary; we then study, in the weak sense, the evolution of the adjoint flow with initial condition \(\sqrt{C^*C}x_0\)

$$\begin{aligned} \begin{aligned} \left\langle \Psi (t) \sqrt{C^*C}x_0,x_2 \right\rangle&= \left\langle \sqrt{C^*C}x_0,x_2 \right\rangle + \int _0^t \left\langle \Psi (s) \sqrt{C^*C}x_0, Ax_2 \right\rangle \ \mathrm{d}s \\&\quad +\sum _{j=1}^n \int _0^t \left\langle \Psi (s) \sqrt{C^*C}x_0, N_jx_2 \right\rangle \ \mathrm{d}M^{(j)}_s. \end{aligned} \end{aligned}$$

Proceeding as before, stochastic integration by parts yields

$$\begin{aligned} \begin{aligned}&\mathbb {E} \left( \left\langle x_2, \Psi (t) \sqrt{C^*C}x_0 \right\rangle \left\langle \Psi (t) \sqrt{C^*C}x_0 ,y_2 \right\rangle \right) = \left\langle x_2, \sqrt{C^*C}x_0\right\rangle \left\langle \sqrt{C^*C} x_0,y_2 \right\rangle \\&\quad +{\mathbb {E}} \left( \int _0^t \left\langle \sqrt{C^*C}\Psi (s)^*Ax_2,(x_0 \otimes x_0) \sqrt{C^*C}\Psi (s)^*y_2 \right\rangle \mathrm{d}s \right) \\&\quad + {\mathbb {E}} \left( \int _0^t \left\langle \sqrt{C^*C}\Psi (s)^* x_2,(x_0 \otimes x_0)\sqrt{C^*C}\Psi (s)^*Ay_2 \right\rangle \mathrm{d}s \right) \\&\quad +\sum _{j=1}^n {\mathbb {E}} \left( \int _0^t \left\langle \sqrt{C^*C}\Psi (s)^* N_jx_2,(x_0 \otimes x_0)\sqrt{C^*C}\Psi (s)^* N_jy_2 \right\rangle \mathrm{d}s \right) {\mathbb {E}}(M^{(j)}(1)^2). \end{aligned} \end{aligned}$$

Using Parseval’s identity, i.e. summing over an orthonormal basis in place of \(x_0\), and taking the limit \(t \rightarrow \infty \) yields the second Lyapunov equation. \(\square \)
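In finite dimensions, the Lyapunov equations of Lemma 5.6 read \(AP+PA^{\top }+\sum _j {\mathbb {E}}(M^{(j)}(1)^2)\,N_jPN_j^{\top }+BB^{\top }=0\) and \(A^{\top }\mathscr {O}+\mathscr {O}A+\sum _j {\mathbb {E}}(M^{(j)}(1)^2)\,N_j^{\top }\mathscr {O}N_j+C^{\top }C=0\), and they can be solved by Kronecker-product vectorization. The following minimal sketch does this for small illustrative matrices (all data and noise intensities are assumptions, not taken from the article); the resulting gramians could then be fed into the square-root balancing sketch given after Definition 5.3.

```python
# Minimal sketch (finite-dimensional analogue of Lemma 5.6, illustrative data): solve
#   A X + X A^T + sum_j s2_j N_j X N_j^T + RHS = 0,   s2_j = E(M^(j)(1)^2),
# by Kronecker-product vectorization; P uses RHS = B B^T, while O is obtained from the
# same template with A -> A^T, N_j -> N_j^T and RHS = C^T C.
import numpy as np

def generalized_lyapunov(A, Ns, s2, RHS):
    d = A.shape[0]
    L = np.kron(np.eye(d), A) + np.kron(A, np.eye(d))   # vec(AX) + vec(X A^T)
    for Nj, s in zip(Ns, s2):
        L += s * np.kron(Nj, Nj)                        # vec(N_j X N_j^T)
    x = np.linalg.solve(L, -RHS.flatten(order="F"))     # column-stacking vectorization
    return x.reshape((d, d), order="F")

rng = np.random.default_rng(5)
d, n, m = 4, 2, 2
A = -2.0 * np.eye(d) + 0.2 * rng.standard_normal((d, d))
B, C = rng.standard_normal((d, n)), rng.standard_normal((m, d))
Ns, s2 = [0.3 * rng.standard_normal((d, d))], [1.0]     # assumed noise data

P = generalized_lyapunov(A, Ns, s2, B @ B.T)
O = generalized_lyapunov(A.T, [Nj.T for Nj in Ns], s2, C.T @ C)
residual = A @ P + P @ A.T + sum(s * Nj @ P @ Nj.T for Nj, s in zip(Ns, s2)) + B @ B.T
print(np.linalg.norm(residual))                          # close to machine precision
print(np.sort(np.sqrt(np.abs(np.linalg.eigvals(P @ O))))[::-1])  # Hankel singular values
```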