1 Introduction

Many phenomena in real life can be described by partial differential equations (PDEs). For an accurate mathematical modeling of these real-world applications, it is often required to take random effects into account. Uncertainties in a PDE model can, for example, be represented by an additional noise term leading to stochastic PDEs (SPDEs) [11, 15, 28, 29].

It is often necessary to approximate time-dependent SPDEs numerically since analytic solutions do not exist in general. A first step is to discretize in space, for example by spectral Galerkin [17, 19, 20] or finite element methods [2, 21, 22]. This usually leads to large-scale SDEs, and solving such complex SDE systems incurs a high computational cost. In this context, model order reduction (MOR) is used to save computational time by replacing high-dimensional systems with systems of low order that capture the main information of the original system.

1.1 Literature review

Balancing related MOR schemes were developed for deterministic linear systems first. Famous representatives of this class of methods are balanced truncation (BT) [3, 26, 27] and singular perturbation approximation (SPA) [14, 23].

BT was extended in [5, 8] and SPA was generalized in [33] to stochastic linear systems. With this first extension, however, no \(L^2\)-error bound can be achieved [6, 12]. Therefore, an alternative approach based on a different reachability Gramian was studied for stochastic linear systems leading to an \(L^2\)-error bound for BT [12] and for SPA [32].

BT [1, 5] and SPA [18] were also generalized to bilinear systems; we refer to this as the standard approach for these systems. Although bilinear terms are only weak nonlinearities, bilinear systems can be seen as a bridge between linear and nonlinear systems, because many nonlinear systems can be represented by bilinear systems using a so-called Carleman linearization. Applications of these equations can be found in various fields [10, 25, 34]. A drawback of the standard approach for bilinear systems is that no \(L^2\)-error bound has been shown so far. A first error bound for the standard ansatz was recently proved in [4], where an output error bound in \(L^\infty \) was formulated for infinite dimensional bilinear systems. Based on the alternative choice of Gramians in [12], a new type of BT for bilinear systems was considered in [31], providing an \(L^2\)-error bound under the assumption of a possibly small bound on the controls.

A more general setting, extending both the stochastic linear and the deterministic bilinear case, is investigated in [30]. There, BT was studied and an \(L^2\)-error bound was proved that overcomes the restriction to bounded controls in [31]. In this paper, we consider SPA for the same setting as in [30] in order to generalize the work in [18]. Moreover, we modify the reduced order model (ROM) in comparison to [18] and show an \(L^2\)-error bound, which closes this gap in the theory.

For further extensions of balancing related MOR techniques to other nonlinear systems, we refer to [7, 35].

1.2 Outline of the paper

This work on SPA for stochastic bilinear systems, see (1), can be interpreted as a generalization of the deterministic bilinear case [18]. This extension builds a bridge between stochastic linear systems and stochastic nonlinear systems such that SPA can possibly be used for many more stochastic equations and applications.

In Sect. 2, the procedure of SPA is described and the ROM is stated. With this, we provide an alternative to [30], where BT was studied for the same kind of systems. We also extend the work of [18], combining it with a modification of the ROM and the choice of a new Gramian, compare (3). Based on this, we obtain an error bound in Sect. 3 that was not available even for the deterministic bilinear case. Its proof requires new techniques that cannot be found in the literature so far; this error bound is the main result of this paper. The corresponding proofs are given in Sect. 4. The efficiency of the error bound is shown in Sect. 5, where the proposed version of SPA is compared with the one in [18] and with BT [30].

2 Setting and ROM

Let every stochastic process appearing in this paper be defined on a filtered probability space \(\left( \varOmega , {\mathcal {F}}, \left( {\mathcal {F}}_t\right) _{t\ge 0}, \mathbb P\right) \). Suppose that \(M=\left( M_1, \ldots , M_{v}\right) ^T\) is an \(\left( {\mathcal {F}}_t\right) _{t\ge 0}\)-adapted and \({\mathbb {R}}^{v}\)-valued mean zero Lévy process with \({\mathbb {E}} \left\| M(t)\right\| ^2_2={\mathbb {E}}\left[ M^T(t)M(t)\right] <\infty \) for all \(t\ge 0\). Moreover, we assume that for all \(t, h\ge 0\) the random variable \(M\left( t+h\right) -M\left( t\right) \) is independent of \({\mathcal {F}}_t\).

We consider a large-scale stochastic control system with bilinear drift that can be interpreted as a spatially discretized SPDE. We investigate the system

$$\begin{aligned} \mathrm{d}x(t)&=[A x(t)+ Bu(t) + \sum _{k=1}^m N_k x(t) u_k(t)]\mathrm{d}t+ \sum _{i=1}^v H_i x(t-) \mathrm{d}M_i(t), \end{aligned}$$
(1a)
$$\begin{aligned} y(t)&= {C} x(t),\;\;\;t\ge 0. \end{aligned}$$
(1b)

We assume that \(A, N_k, H_i\in {\mathbb {R}}^{n\times n}\) (\(k\in \left\{ 1, \ldots , m\right\} \) and \(i\in \left\{ 1, \ldots , v\right\} \)), \(B\in {\mathbb {R}}^{n\times m}\) and \(C\in {\mathbb {R}}^{p\times n}\). Moreover, we define \(x(t-):=\lim _{s\uparrow t} x(s)\). The control \(u=\left( u_1, \ldots , u_{m}\right) ^T\) is assumed to be deterministic and square integrable, i.e.,

$$\begin{aligned} \left\| u\right\| _{L^2_T}^2:=\int _0^T \left\| u(t)\right\| _2^2 \mathrm{d}t<\infty \end{aligned}$$

for every \(T>0\). By [28, Theorem 4.44] there is a matrix \(K=\left( k_{ij}\right) _{i, j=1, \ldots , v}\) such that \({\mathbb {E}}[M(t)M^T(t)]=K t\). The matrix K is called the covariance matrix of M.

In this paper, we study SPA to obtain a ROM. SPA is a balancing related method and relies on defining a reachability Gramian P and an observability Gramian Q. These two matrices are selected such that P characterizes the states that barely contribute to the dynamics in (1a) and Q identifies the less important states in (1b), see [30] for estimates on the reachability and observability energy functionals. The estimates in [30] are global, whereas the standard choice of Gramians leads to results being valid in a small neighborhood of zero only [5, 16].

In order to ensure the existence of these Gramians, throughout the paper it is assumed that

$$\begin{aligned} \lambda \left( A\otimes I+I\otimes A+\sum _{k=1}^m N_k\otimes N_k+\sum _{i, j=1}^v H_i\otimes H_j k_{ij}\right) \subset {\mathbb {C}}_-. \end{aligned}$$
(2)

Here, \(\lambda \left( \cdot \right) \) denotes the spectrum of a matrix. The reachability Gramian P and the observability Gramian Q are, according to [30], defined as the solutions to

$$\begin{aligned} A^T P^{-1}+P^{-1}A+\sum _{k=1}^m N^T_k P^{-1} N_k + \sum _{i, j=1}^v H_i^T P^{-1} H_j k_{i j}&\le -P^{-1}BB^T P^{-1}, \end{aligned}$$
(3)
$$\begin{aligned} A^T Q+Q A+\sum _{k=1}^m N_k^T Q N_k +\sum _{i, j=1}^v H_i^T Q H_j k_{ij}&\le -C^T C, \end{aligned}$$
(4)

where the existence of a positive definite solution to (3) goes back to [12, 32] and is ensured if (2) holds.
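
As an illustration of how such a Gramian can be obtained in practice (compare also Remark 1 below), Q can be computed from (4) with equality by vectorization, and condition (2) can be checked via the spectrum of the Kronecker operator above. The following NumPy sketch assumes small, dense coefficient matrices; the function names and the representation of the data (lists N_list, H_list of the matrices \(N_k\), \(H_i\) and the array K for the covariance matrix) are illustrative only and not taken from the implementation used later.

import numpy as np

def condition_2_holds(A, N_list, H_list, K):
    # Spectrum of A (x) I + I (x) A + sum_k N_k (x) N_k + sum_{i,j} k_ij H_i (x) H_j, cf. (2)
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(A, I) + np.kron(I, A)
    L += sum(np.kron(Nk, Nk) for Nk in N_list)
    L += sum(K[i, j] * np.kron(H_list[i], H_list[j])
             for i in range(len(H_list)) for j in range(len(H_list)))
    return np.max(np.linalg.eigvals(L).real) < 0

def observability_gramian(A, C, N_list, H_list, K):
    # Solve (4) with equality: A^T Q + Q A + sum_k N_k^T Q N_k + sum_{i,j} k_ij H_i^T Q H_j = -C^T C,
    # using vec(AXB) = (B^T kron A) vec(X) with column-stacking vec; an n^2 x n^2 system is formed.
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(I, A.T) + np.kron(A.T, I)
    L += sum(np.kron(Nk.T, Nk.T) for Nk in N_list)
    L += sum(K[i, j] * np.kron(H_list[j].T, H_list[i].T)
             for i in range(len(H_list)) for j in range(len(H_list)))
    rhs = -(C.T @ C).reshape(-1, order="F")
    Q = np.linalg.solve(L, rhs).reshape((n, n), order="F")
    return 0.5 * (Q + Q.T)  # symmetrize against round-off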

We approximate the large-scale system (1) by a system which has a much smaller state dimension \(r\ll n\). This ROM is supposed to be chosen such that the corresponding output \(y_r\) is close to the original one, i.e., \(y_r\approx y\) in some metric. In order to be able to remove the unimportant states in (1a) and (1b) simultaneously, the first step of SPA is a state space transformation

$$\begin{aligned} (A, B, C, H_i, N_k)\mapsto ({\tilde{A}}, {\tilde{B}}, {\tilde{C}}, {\tilde{H}}_i, {\tilde{N}}_k):=(SAS^{-1}, SB, CS^{-1}, SH_iS^{-1}, SN_kS^{-1}), \end{aligned}$$

where \(S=\varSigma ^{-\tfrac{1}{2}} X^T L_Q^T \) and \(S^{-1}=L_PY\varSigma ^{-\tfrac{1}{2}}\). The ingredients of the balancing transformation are computed by the Cholesky factorizations \(P=L_PL_P^T\), \(Q=L_QL_Q^T\), and the singular value decomposition \(X\varSigma Y^T=L_Q^TL_P\). This transformation does not change the output y of the system, but it guarantees that the new Gramians are diagonal and equal, i.e., \(S P S^T=S^{-T}Q S^{-1}=\varSigma ={\text {diag}}(\sigma _1,\ldots , \sigma _n)\) with \(\sigma _1\ge \ldots \ge \sigma _n\) being the Hankel singular values (HSVs) of the system.
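
In code, this balancing step amounts to two Cholesky factorizations and one SVD. A minimal NumPy sketch, assuming the Gramians P and Q from (3) and (4) are available as positive definite arrays (names are illustrative):

import numpy as np

def balancing_transformation(P, Q):
    # P = L_P L_P^T, Q = L_Q L_Q^T, L_Q^T L_P = X Sigma Y^T
    L_P = np.linalg.cholesky(P)
    L_Q = np.linalg.cholesky(Q)
    X, sv, Yt = np.linalg.svd(L_Q.T @ L_P)
    Sig_inv_sqrt = np.diag(sv ** -0.5)
    S = Sig_inv_sqrt @ X.T @ L_Q.T      # S = Sigma^{-1/2} X^T L_Q^T
    S_inv = L_P @ Yt.T @ Sig_inv_sqrt   # S^{-1} = L_P Y Sigma^{-1/2}
    return S, S_inv, sv                 # sv contains the HSVs sigma_1 >= ... >= sigma_n

The balanced coefficients are then obtained as \(SAS^{-1}\), \(SB\), \(CS^{-1}\), \(SH_iS^{-1}\) and \(SN_kS^{-1}\), as stated above.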

We partition the balanced coefficients of (1) as follows:

$$\begin{aligned} {\tilde{A}}=\left[ {\begin{matrix}{A}_{11}&{}{A}_{12}\\ {A}_{21}&{}{A}_{22}\end{matrix}}\right] ,\;{\tilde{B}}=\left[ {\begin{matrix}{B}_1 \\ B_2\end{matrix}}\right] ,\; {\tilde{N}}_k=\left[ {\begin{matrix}{N}_{k, 11}&{}{N}_{k, 12}\\ {N}_{k, 21}&{}{N}_{k, 22}\end{matrix}}\right] ,\;{\tilde{H}}_i=\left[ {\begin{matrix}{H}_{i, 11}&{}{H}_{i, 12}\\ {H}_{i, 21}&{}{H}_{i, 22}\end{matrix}}\right] ,\;{\tilde{C}}= \left[ {\begin{matrix}{C}_1&C_2\end{matrix}}\right] , \end{aligned}$$
(5)

where \(A_{11}, N_{k, 11}, H_{i, 11}\in {\mathbb {R}}^{r\times r}\) (\(k\in \left\{ 1, \ldots , m\right\} \) and \(i\in \left\{ 1, \ldots , v\right\} \)), \(B_1\in {\mathbb {R}}^{r\times m}\) and \(C_1\in {\mathbb {R}}^{p\times r}\) etc. Furthermore, we partition the state variable \({\tilde{x}}\) of the balanced system and the diagonal matrix of HSVs

$$\begin{aligned} {\tilde{x}}=\left[ \begin{array}{c} x_1 \\ x_2\end{array}\right] \text { and }\varSigma =\left[ \begin{array}{cc} \varSigma _1&{} \\ &{} \varSigma _2\end{array}\right] , \end{aligned}$$
(6)

where \(x_1\) takes values in \({\mathbb {R}}^r\) (\(x_2\) accordingly), \(\varSigma _1\) is the diagonal matrix of large HSVs and \(\varSigma _2\) contains the small ones.

Remark 1

The balancing procedure requires the computation of the Gramians from (3) and (4). In practice, one always computes the solution of the equation corresponding to (4). An inequality is stated there because the proof of the error bound in Theorem 3 does not need equality in (4). However, it is essential that we consider an inequality in (3): in contrast to the corresponding equation, which may not have a solution, the inequality always has a solution under the given assumptions, but some regularization is needed to enforce uniqueness. In particular, one solves an optimization problem like, e.g., minimize \({\text {tr}}(P)\) subject to (3). The trace is minimized in order to achieve small HSVs, because this ensures a small error according to Theorem 3. We refer to Sect. 5 for more details on the computation of P.
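
One possible realization of this optimization problem is a semidefinite program in the variable \(X=P^{-1}\): by the Schur complement, (3) becomes an LMI in X (compare (33)), and \({\text {tr}}(P)={\text {tr}}(X^{-1})\) can be minimized through an epigraph variable Z with \(\left[ {\begin{matrix}Z&{}I\\ I&{}X\end{matrix}}\right] \ge 0\), i.e., \(Z\ge X^{-1}\). The following CVXPY sketch illustrates this reformulation under these assumptions; it is only one possible regularization and not necessarily the implementation behind the experiments in Sect. 5.

import cvxpy as cp
import numpy as np

def reachability_gramian(A, B, N_list, H_list, K, eps=1e-8):
    # Compute P from (3) via X = P^{-1}: the Schur complement turns (3) into an LMI in X,
    # and tr(P) is minimized through the epigraph variable Z with [[Z, I], [I, X]] >= 0.
    n, m = B.shape
    X = cp.Variable((n, n), symmetric=True)
    Z = cp.Variable((n, n), symmetric=True)
    lyap = A.T @ X + X @ A
    lyap = lyap + sum(Nk.T @ X @ Nk for Nk in N_list)
    lyap = lyap + sum(K[i, j] * H_list[i].T @ X @ H_list[j]
                      for i in range(len(H_list)) for j in range(len(H_list)))
    lmi = cp.bmat([[lyap, X @ B], [B.T @ X, -np.eye(m)]])
    constraints = [X >> eps * np.eye(n),
                   lmi << 0,
                   cp.bmat([[Z, np.eye(n)], [np.eye(n), X]]) >> 0]
    cp.Problem(cp.Minimize(cp.trace(Z)), constraints).solve(solver=cp.SCS)
    return np.linalg.inv(X.value)  # P = X^{-1}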

Based on the balanced full model (1) with matrices as in (5), the ROM is obtained by neglecting the state variables \(x_2\) corresponding to the small HSVs. The ROM using SPA is obtained by setting \(\mathrm{d}x_2(t)=0\) and furthermore neglecting the diffusion and the bilinear term in the equation related to \(x_2\). Note that the condition \(\mathrm{d}x_2(t)=0\) is almost surely false. However, we enforce it since it leads to a ROM with remarkable properties. Once \(\mathrm{d}x_2(t)=0\) is set, dropping the diffusion is no additional simplification, since it is automatically zero within the resulting algebraic constraint due to the considerations in [33, Section 2]. Setting the bilinear term in this equation to zero is needed so that the matrices of the ROM below do not depend on the control u, a dependence that is not desired. With this simplification, one can solve the algebraic constraint for \(x_2\), which leads to \(x_2(t)=-A_{22}^{-1}(A_{21} x_1(t)+B_2 u(t))\). Inserting this expression into the equation for \(x_1\) and into the output equation, the reduced system is

$$\begin{aligned} \mathrm{d}{\bar{x}}&=\left[ {\bar{A}} {\bar{x}}+{\bar{B}} u+\sum _{k=1}^m ({\bar{N}}_{k} {\bar{x}} + {\bar{E}}_{k} u)u_k\right] \mathrm{d}t+\sum _{i=1}^v ({\bar{H}}_{i} {\bar{x}}+ {\bar{F}}_{i} u) \mathrm{d}M_i, \end{aligned}$$
(7a)
$$\begin{aligned} {\bar{y}}(t)&={\bar{C}}{\bar{x}}(t)+ {\bar{D}} u(t), \;\;\;t\ge 0, \end{aligned}$$
(7b)

with matrices defined by

$$\begin{aligned} {\bar{A}}&:=A_{11}- A_{12} A_{22}^{-1} A_{21},\;\;\;{\bar{B}}:=B_1-A_{12}A_{22}^{-1} B_2,\;\;\;{\bar{C}}:=C_1-C_2 A_{22}^{-1} A_{21},\\ {\bar{D}}&:=-C_2 A_{22}^{-1} B_2,\;\;\; \quad \quad \;\,{\bar{E}}_{k}:=-N_{k, 12} A_{22}^{-1} B_2,\;\;\quad {\bar{F}}_{i}:=-H_{i, 12} A_{22}^{-1} B_2,\\ {\bar{H}}_i&:=H_{i, 11}-H_{i, 12} A_{22}^{-1} A_{21}, \;\;\;{\bar{N}}_k:=N_{k, 11}-N_{k, 12} A_{22}^{-1} A_{21}, \end{aligned}$$

where \({\bar{x}}(0)=0\) and the time dependence in (7a) is omitted to shorten the notation. This straightforward ansatz is based on observations from the deterministic case (\(N_k=H_i=0\)), where \(x_2\) represents the fast variables, i.e., \(\dot{x}_2(t) \approx 0\) after a short time, see [23].
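
For completeness, the reduced matrices above can be assembled directly from the partition (5). A small NumPy sketch, assuming the balanced coefficients and the reduced dimension r are given (all names are illustrative):

import numpy as np

def spa_reduced_matrices(A, B, C, H_list, N_list, r):
    # Build the coefficients of the ROM (7) from the balanced, partitioned system (5).
    A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    B1, B2 = B[:r, :], B[r:, :]
    C1, C2 = C[:, :r], C[:, r:]
    A22_inv = np.linalg.inv(A22)  # A_22 is invertible by Remark 2
    A_bar = A11 - A12 @ A22_inv @ A21
    B_bar = B1 - A12 @ A22_inv @ B2
    C_bar = C1 - C2 @ A22_inv @ A21
    D_bar = -C2 @ A22_inv @ B2
    N_bar = [Nk[:r, :r] - Nk[:r, r:] @ A22_inv @ A21 for Nk in N_list]
    E_bar = [-Nk[:r, r:] @ A22_inv @ B2 for Nk in N_list]
    H_bar = [Hi[:r, :r] - Hi[:r, r:] @ A22_inv @ A21 for Hi in H_list]
    F_bar = [-Hi[:r, r:] @ A22_inv @ B2 for Hi in H_list]
    return A_bar, B_bar, C_bar, D_bar, N_bar, E_bar, H_bar, F_bar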

For stochastic systems, however, this ansatz might fail no matter how small the HSVs corresponding to \(x_2\) are. Even though the motivation therefore rests on a possibly less convincing argument, it leads to a viable MOR method for which an error bound can be proved. An averaging principle would be a mathematically well-founded alternative to this naive approach. Averaging principles for stochastic systems have for example been investigated in [36, 37]. A further strategy to derive a ROM in this context can be found in [9].

Moreover, notice that system (7) is no longer a bilinear system due to the quadratic term in the control u. This is an essential difference to the ROM proposed in [18]. A structure-preserving version could be obtained by setting \(B_2=0\) in (7), which would lead to a generalized variant of the ROMs considered in [18, 33]. This simplified method is not studied here because the error bound in Theorem 3 could not be achieved for it. We refer to the further discussion below Theorem 3 and to Sect. 5, where (7) is compared numerically with the version obtained by choosing \(B_2=0\).

Remark 2

Notice that if \(\sigma _r\ne \sigma _{r+1}\), then (2) implies

$$\begin{aligned} \lambda \left( A_{ll}\otimes I+I\otimes A_{ll}+\sum _{k=1}^m N_{k, ll}\otimes N_{k, ll}+\sum _{i, j=1}^v H_{i, ll}\otimes H_{j, ll} k_{ij}\right) \subset {\mathbb {C}}_- \end{aligned}$$

for \(l=1, 2\) due to considerations in [6]. This implies \(\lambda \left( A_{ll}\right) \subset {\mathbb {C}}_-\) and hence guarantees the existence of \(A^{-1}_{ll}\) for \(l=1, 2\).

3 \(L^2\)-error bound for SPA

The proof of the main result (Theorem 3) is divided into two parts. We first investigate the error that we encounter by removing the smallest HSV from the system in Sect. 3.1. In this reduction step, the structure changes from the full model (1) to the ROM (7). Therefore, when removing the other HSVs from the system, another case needs to be studied in Sect. 3.2. There, an error bound between two neighboring ROMs is achieved, i.e., the larger ROM has exactly one HSV more than the smaller one. The results of Sects. 3.1 and 3.2 are then combined in Sect. 3.3 in order to prove the general error bound.

For simplicity, let us from now on assume that system (1) is already balanced and has a zero initial condition (\(x_0=0\)). Thus, (3) and (4) become

$$\begin{aligned} A^T \varSigma ^{-1}+\varSigma ^{-1}A+\sum _{k=1}^m N_k^T \varSigma ^{-1} N_k + \sum _{i, j=1}^v H_i^T \varSigma ^{-1} H_j k_{i j}&\le -\varSigma ^{-1}BB^T \varSigma ^{-1}, \end{aligned}$$
(8)
$$\begin{aligned} A^T \varSigma +\varSigma A+\sum _{k=1}^m N_k^T \varSigma N_k +\sum _{i, j=1}^v H_i^T \varSigma H_j k_{ij}&\le -C^T C, \end{aligned}$$
(9)

i.e., \(P=Q=\varSigma ={\text {diag}}(\sigma _1, \ldots , \sigma _n)>0\).

3.1 Error bound of removing the smallest HSV

We consider the balanced system (1) with partitions as in (5) and (6). As mentioned before, the reduced state equation is obtained by approximating \(x_2\) through \({\bar{x}}_2:= - A_{22}^{-1}(A_{21} {\bar{x}} + B_2u)\) in the differential equation for \(x_1\) and within the output Eq. (1b). The ROM (7) can hence be rewritten as

$$\begin{aligned} \mathrm{d}{\bar{x}}&=\left[ A_1\left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] + B_1 u+\sum _{k=1}^m N_{k, 1} \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] u_k\right] \mathrm{d}t +\sum _{i=1}^v H_{i, 1} \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] \mathrm{d}M_i, \end{aligned}$$
(10a)
$$\begin{aligned} {\bar{y}}(t)&= C \left[ {\begin{matrix}{{\bar{x}}}(t) \\ {{\bar{x}}_2}(t)\end{matrix}}\right] , \;\;\;t\ge 0, \end{aligned}$$
(10b)

where we define

$$\begin{aligned} A_1=\left[ {\begin{matrix}{A}_{11}&{A}_{12} \end{matrix}}\right] ,\; H_{i, 1}=\left[ {\begin{matrix}{H}_{i, 11}&{H}_{i, 12}\end{matrix}}\right] ,\; N_{k, 1}=\left[ {\begin{matrix}{N}_{k, 11}&{N}_{k, 12}\end{matrix}}\right] . \end{aligned}$$

We want to be able to subtract the ROM (10) from (1). Therefore, we add the following zero line to (10a)

$$\begin{aligned} d 0&=\left[ \left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] + B_{2} u - c_0+\sum _{k=1}^m \left[ {\begin{matrix}{N}_{k, 21}&N_{k, 22} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] u_k \right] \mathrm{d}t\\&\quad +\,\sum _{i=1}^v \left[ \left[ {\begin{matrix}{H}_{i, 21}&{H}_{i, 22}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] - c_i \right] \mathrm{d}M_i\nonumber \end{aligned}$$
(11)

with the compensation terms \(c_0(t):=\sum _{k=1}^m \left[ {\begin{matrix}{N}_{k, 21}&N_{k, 22} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}(t)} \\ {{\bar{x}}_2(t)}\end{matrix}}\right] u_k(t)\) and \(c_i(t):=\left[ {\begin{matrix}{H}_{i, 21}&{H}_{i, 22}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}(t)} \\ {{\bar{x}}_2(t)}\end{matrix}}\right] \) for \(i=1, \ldots , v\). Subtracting (10) from (1) together with (11) yields the following error system

$$\begin{aligned} \mathrm{d}x_-&=\left[ A \mathbf{x }_{-}+ \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] +\sum _{k=1}^m N_{k} \mathbf{x }_{-} u_k\right] \mathrm{d}t +\sum _{i=1}^v \left[ H_{i} \mathbf{x }_{-} + \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right] \mathrm{d}M_i, \end{aligned}$$
(12a)
$$\begin{aligned} y_{-}(t)&=C \mathbf{x }_{-}(t) = y(t)-{\bar{y}}(t), \;\;\;t\ge 0, \end{aligned}$$
(12b)

where we introduce the variables \(\mathbf{x }_{-} =\left[ {\begin{matrix}{x}_1-{\bar{x}} \\ x_2-{\bar{x}}_2\end{matrix}}\right] \) and \(x_-=\left[ {\begin{matrix}{x}_1-{\bar{x}} \\ x_2\end{matrix}}\right] \). Moreover, adding (1a) and (10a) combined with (11) leads to

$$\begin{aligned} \mathrm{d}x_+=\left[ A\mathbf{x }_{+}+2 B u- \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] +\sum _{k=1}^m N_{k} \mathbf{x }_{+} u_k\right] \mathrm{d}t +\sum _{i=1}^v \left[ H_{i} \mathbf{x }_{+} - \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right] \mathrm{d}M_i, \end{aligned}$$
(13)

setting \(\mathbf{x }_{+}=\left[ {\begin{matrix}{x}_1+{\bar{x}} \\ x_2+{\bar{x}}_2\end{matrix}}\right] \) and \(x_+=\left[ {\begin{matrix}{x}_1+{\bar{x}} \\ x_2\end{matrix}}\right] \).

Before we prove an error bound based on (12a) and (13), we need to introduce the vector \(u^0\) of control components with a nonzero \(N_k\). This is given by

$$\begin{aligned} u^0=(u^0_1, \dots , u_m^0)^T,\quad \text {where}\quad u_k^0 \equiv {\left\{ \begin{array}{ll} 0 &{} \text {if }N_k = 0,\\ u_k &{} \text {else}. \end{array}\right. } \end{aligned}$$
(14)

The proof of the error bound when only one HSV is removed can be reduced to the task of finding suitable estimates for \({\mathbb {E}}[x_-^T(t) \varSigma x_-(t)]\) and \({\mathbb {E}}[x_+^T(t) \varSigma ^{-1} x_+(t)]\). In particular, the rough idea is to apply Ito’s lemma to \({\mathbb {E}}[x_-^T(t) \varSigma x_-(t)]\) and subsequently Gronwall’s lemma. Then, a first estimate for (12b) is obtained, i.e.,

$$\begin{aligned} {\mathbb {E}}[x_-^T(t) \varSigma x_-(t)]\le -{\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2 + \sigma ^2 f(t), \end{aligned}$$

where f depends on the control u, the state x, the compensation terms \(c_0, \ldots , c_v\) and \(\varSigma _2^{-1}\) assuming \(\varSigma _2= \sigma I\). Of course, the dependence of the above estimate on the state and the compensation terms is not desired. That is why another inequality is derived by using Ito’s and Gronwall’s lemma for \({\mathbb {E}}[x_+^T(t) \varSigma ^{-1} x_+(t)]\) which yields

$$\begin{aligned} {\mathbb {E}}[x_+^T(t) \varSigma ^{-1} x_+(t)] \le -f(t) + 4\left\| u\right\| _{L^2_t}^2 \exp \left( \int _0^t \left\| u^0(s)\right\| _2^2\mathrm{d}s\right) . \end{aligned}$$

Combining both inequalities, the result of the next theorem is obtained. A similar idea was also used to determine an error bound for BT [30]. However, the proof for SPA requires different techniques to find the estimates sketched above.
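
To make this combination explicit: since \(\varSigma \) and \(\varSigma ^{-1}\) are positive definite, both left-hand sides above are nonnegative, so the two sketched estimates give

$$\begin{aligned} {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2\le \sigma ^2 f(t)\quad \text {and}\quad f(t)\le 4\left\| u\right\| _{L^2_t}^2 \exp \left( \int _0^t \left\| u^0(s)\right\| _2^2\mathrm{d}s\right) , \end{aligned}$$

and hence \({\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2\le 4\sigma ^2 \left\| u\right\| _{L^2_t}^2 \exp \left( \left\| u^0\right\| _{L^2_t}^2\right) \). Taking square roots yields exactly the bound stated in Theorem 1 below.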

Theorem 1

Let y be the output of the full model (1) with \(x(0)=0\), \({\bar{y}}\) be the output of the ROM (7) with \({\bar{x}}(0)=0\) and \(\varSigma _2=\sigma I\), \(\sigma >0\), in (6). Then, the following holds:

$$\begin{aligned} \left( {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{T}}^2\right) ^{\frac{1}{2}}\le 2 \sigma \left\| u\right\| _{L^2_T}\exp \left( 0.5 \left\| u^0\right\| _{L^2_T}^2\right) . \end{aligned}$$

Proof

In order to improve the readability of this paper, the proof is given in Sect. 4.1. \(\square \)

We proceed with the study of an error bound between two neighboring ROMs.

3.2 Error bound for neighboring ROMs

In this section, we investigate the output error between two ROMs, in which the larger ROM has exactly one more HSV than the smaller one. This concept of neighboring ROMs was first introduced in [32] but in the much simpler stochastic linear setting.

The reader might wonder why a second case is considered besides the one in Sect. 3.1 since one might just start with a full model that has the same structure as the ROM (7). The reason is that it is not clear how the Gramians need to be chosen for (7). In order to investigate the error between two ROMs by SPA, a finer partition than the one in (5) is required. We partition the matrices of the balanced full system (1) as follows:

$$\begin{aligned} A&=\left[ {\begin{matrix}{A}_{11}&{}{A}_{12}&{}A_{13}\\ {A}_{21}&{}{A}_{22}&{}A_{23}\\ {A}_{31}&{}{A}_{32}&{}A_{33}\end{matrix}}\right] ,\quad B=\left[ {\begin{matrix}{B}_1 \\ B_2\\ B_3\end{matrix}}\right] ,\quad C= \left[ {\begin{matrix}{C}_1&C_2&C_3\end{matrix}}\right] , \end{aligned}$$
(15a)
$$\begin{aligned} H_i&=\left[ {\begin{matrix}{H}_{i, 11}&{}{H}_{i, 12}&{}{H}_{i, 13}\\ {H}_{i, 21}&{}{H}_{i, 22}&{}{H}_{i, 23}\\ {H}_{i, 31}&{}{H}_{i, 32}&{}{H}_{i, 33}\end{matrix}}\right] ,\quad N_k=\left[ {\begin{matrix}{N}_{k, 11}&{}{N}_{k, 12}&{}{N}_{k, 13}\\ {N}_{k, 21}&{}{N}_{k, 22}&{}{N}_{k, 23}\\ {N}_{k, 31}&{}{N}_{k, 32}&{}{N}_{k, 33}\end{matrix}}\right] . \end{aligned}$$
(15b)

The partitioned balanced solution to (1a) and the Gramians are then of the form

$$\begin{aligned} x=\left[ {\begin{matrix}{x}_{1}\\ x_{2}\\ x_{3}\end{matrix}}\right] \; \text {and} \; \varSigma =\left[ {\begin{matrix}{\varSigma }_{1}&{} &{} \\ &{}\varSigma _{2}&{} \\ &{} &{}\varSigma _{3}\end{matrix}}\right] . \end{aligned}$$
(16)

We first introduce the ROM obtained by truncating \(\varSigma _3\). According to the procedure described in Sect. 2, the reduced system is obtained by setting \(\mathrm{d}x_3\) equal to zero and neglecting the bilinear and the diffusion terms in this equation. The solution \({\bar{x}}_3\) of the resulting algebraic constraint is an approximation of \(x_3\); solving for this approximating variable gives \({\bar{x}}_3=-A_{33}^{-1}(A_{31}x_1+A_{32}x_2+B_3u)\). Inserting this result for \(x_3\) into the equations for \(x_1\), \(x_2\) and into the output Eq. (1b) leads to

$$\begin{aligned} \mathrm{d}\left[ {\begin{matrix}{x}_1 \\ x_2\end{matrix}}\right]&=\left[ {\hat{A}}\left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] +{\hat{B}} u+\sum _{k=1}^m {\hat{N}}_k \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] u_k\right] \mathrm{d}t +\sum _{i=1}^v {\hat{H}}_i \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] \mathrm{d}M_i, \end{aligned}$$
(17a)
$$\begin{aligned} {\bar{y}}(t)&= C \left[ {\begin{matrix}{x}_1(t) \\ x_2(t)\\ {\bar{x}}_3(t)\end{matrix}}\right] , \;\;\;t\ge 0, \end{aligned}$$
(17b)

where \(\left[ {\begin{matrix}{x}_1(0) \\ x_2(0)\end{matrix}}\right] =\left[ {\begin{matrix}{0} \\ 0\end{matrix}}\right] \) and

$$\begin{aligned} {\hat{A}}=\left[ {\begin{matrix}{A}_{11}&{}{A}_{12}&{}A_{13}\\ {A}_{21}&{}{A}_{22}&{}A_{23}\end{matrix}}\right] ,\; {\hat{B}}=\left[ {\begin{matrix}{B}_1 \\ B_2\end{matrix}}\right] , \; {\hat{H}}_i=\left[ {\begin{matrix}{H}_{i, 11}&{}{H}_{i, 12}&{}{H}_{i, 13}\\ {H}_{i, 21}&{}{H}_{i, 22}&{}{H}_{i, 23}\end{matrix}}\right] ,\; {\hat{N}}_k=\left[ {\begin{matrix}{N}_{k, 11}&{}{N}_{k, 12}&{}{N}_{k, 13}\\ {N}_{k, 21}&{}{N}_{k, 22}&{}{N}_{k, 23}\end{matrix}}\right] . \end{aligned}$$

We aim to determine the error between this ROM and the reduced system obtained by neglecting both \(\varSigma _2\) and \(\varSigma _3\), which is

$$\begin{aligned} \mathrm{d}{\bar{x}}_r&=\left[ {\hat{A}}_r \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] +B_1 u+\sum _{k=1}^m {\hat{N}}_{r, k} \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] u_k\right] \mathrm{d}t + \sum _{i=1}^v {\hat{H}}_{r, i} \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \mathrm{d}M_i, \end{aligned}$$
(18a)
$$\begin{aligned} {\bar{y}}_r(t)&=\left[ {\begin{matrix}{C}_1&C_2&C_3 \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r(t)} \\ h_1(t)\\ h_2(t)\end{matrix}}\right] , \;\;\;t\ge 0, \end{aligned}$$
(18b)

where \({\bar{x}}_r(0)=0\),

$$\begin{aligned} {\hat{A}}_r=\left[ {\begin{matrix}{A}_{11}&{A}_{12}&A_{13}\end{matrix}}\right] ,\; {\hat{H}}_{r, i}=\left[ {\begin{matrix}{H}_{i, 11}&{H}_{i, 12}&{H}_{i, 13}\end{matrix}}\right] ,\; {\hat{N}}_{r, k}=\left[ {\begin{matrix}{N}_{k, 11}&{N}_{k, 12}&{N}_{k, 13}\end{matrix}}\right] \end{aligned}$$

and we define

$$\begin{aligned} h(t)=\left[ {\begin{matrix}{h}_1(t) \\ h_2(t)\end{matrix}}\right] = -\left[ {\begin{matrix}{A}_{22}&{} A_{23}\\ A_{32}&{} A_{33}\end{matrix}}\right] ^{-1} \left( \left[ {\begin{matrix}{A}_{21}\\ {A}_{31}\end{matrix}}\right] {\bar{x}}_r(t)+\left[ {\begin{matrix}{B}_{2}\\ {B}_{3}\end{matrix}}\right] u(t)\right) . \end{aligned}$$
(19)

In order to find a bound for the error between (17b) and (18b), state variables analogous to \(\mathbf{x }_-\) and \(\mathbf{x }_+\) in Sect. 3.1 are constructed in the following and corresponding equations are derived. For simplicity, we use a similar notation again and define

$$\begin{aligned} \hat{\mathbf{x }}_-=\left[ {\begin{matrix}{x}_1-{\bar{x}}_r \\ x_2-h_1\\ {\bar{x}}_3 - h_2\end{matrix}}\right] \; \text {and}\; \hat{\mathbf{x }}_+=\left[ {\begin{matrix}{x}_1+{\bar{x}}_r \\ x_2 + h_1\\ {\bar{x}}_3 + h_2\end{matrix}}\right] . \end{aligned}$$

We now derive the differential equations for \(\hat{\mathbf{x }}_-\) and \(\hat{\mathbf{x }}_+\). Using (19), we find that

$$\begin{aligned} \left[ {\begin{matrix}{A}_{21}&{}{A}_{22}&{} A_{23}\\ {A}_{31}&{}{A}_{32}&{} A_{33} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right]&= \left[ {\begin{matrix}{A}_{21}\\ {A}_{31}\end{matrix}}\right] {\bar{x}}_r + \left[ {\begin{matrix}{A}_{22}&{} A_{23}\\ {A}_{32}&{} A_{33} \end{matrix}}\right] h\nonumber \\&= \left[ {\begin{matrix}{A}_{21}\\ {A}_{31}\end{matrix}}\right] {\bar{x}}_r -\left[ {\begin{matrix}{A}_{22}&{} A_{23}\\ {A}_{32}&{} A_{33} \end{matrix}}\right] \left[ {\begin{matrix}{A}_{22}&{} A_{23}\\ {A}_{32}&{} A_{33} \end{matrix}}\right] ^{-1} \left( \left[ {\begin{matrix}{A}_{21}\\ {A}_{31}\end{matrix}}\right] {{\bar{x}}_r}+\left[ {\begin{matrix}{B}_{2}\\ {B}_{3}\end{matrix}}\right] u\right) \nonumber \\&= -\left[ {\begin{matrix}{B}_{2}\\ {B}_{3}\end{matrix}}\right] u. \end{aligned}$$
(20)

Applying the first line of (20), we obtain the following equation

$$\begin{aligned} d 0&=\left[ \left[ {\begin{matrix}{A}_{21}&{A}_{22}&A_{23}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] + {B}_{2} u - {\hat{c}}_0+\sum _{k=1}^m \left[ {\begin{matrix}{N}_{k, 21}&{N}_{k, 22}&{N}_{k, 23} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] u_k\right] \mathrm{d}t\nonumber \\&\quad +\sum _{i=1}^v \left[ \left[ {\begin{matrix}{H}_{i, 21}&{H}_{i, 22}&{H}_{i, 23} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] - {\hat{c}}_i\right] \mathrm{d}M_i, \end{aligned}$$
(21)

where \({\hat{c}}_0=\sum _{k=1}^m \left[ {\begin{matrix}{N}_{k, 21}&{N}_{k, 22}&{N}_{k, 23} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] u_k\) and \({\hat{c}}_i=\left[ {\begin{matrix}{H}_{i, 21}&{H}_{i, 22}&{H}_{i, 23} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \) for \(i=1, \ldots , v\). We add the zero line (21) to the state Eq. (18a) and subtract the resulting system from (17). Hence, we obtain

$$\begin{aligned} \mathrm{d}{\hat{x}}_-&=\left[ {\hat{A}} \hat{\mathbf{x }}_- + \left[ {\begin{matrix}{0}\\ {\hat{c}}_0\end{matrix}}\right] +\sum _{k=1}^m \hat{N}_{k} \hat{\mathbf{x }}_- u_k\right] \mathrm{d}t +\sum _{i=1}^v \left[ \hat{H}_{i} \hat{\mathbf{x }}_- + \left[ {\begin{matrix}{0}\\ {\hat{c}}_i\end{matrix}}\right] \right] \mathrm{d}M_i, \end{aligned}$$
(22a)
$$\begin{aligned} {\hat{y}}_-(t)&= C\hat{\mathbf{x }}_-(t)= {\bar{y}}(t) - {\bar{y}}_r(t), \quad t\ge 0, \end{aligned}$$
(22b)

where \({\hat{x}}_- = \left[ {\begin{matrix}{x}_1 - {\bar{x}}_r \\ x_2\end{matrix}}\right] \). One can see that the output of (22) is the output error that we aim to analyze. The sum of (17a) and (18a) together with (21) leads to

$$\begin{aligned} \mathrm{d}{\hat{x}}_+&=\left[ {\hat{A}} \hat{\mathbf{x }}_++ 2 {\hat{B}} u - \left[ {\begin{matrix}{0}\\ {\hat{c}}_0\end{matrix}}\right] +\sum _{k=1}^m \hat{N}_{k} \hat{\mathbf{x }}_+ u_k\right] \mathrm{d}t +\sum _{i=1}^v \left[ \hat{H}_{i} \hat{\mathbf{x }}_+ - \left[ {\begin{matrix}{0}\\ {\hat{c}}_i\end{matrix}}\right] \right] \mathrm{d}M_i, \end{aligned}$$
(23)

where \({\hat{x}}_+ = \left[ {\begin{matrix}{x}_1 + {\bar{x}}_r \\ x_2\end{matrix}}\right] \). We now state the output error between the systems (17) and (18) for the case that the ROMs are neighboring, i.e., the larger model has exactly one HSV more than the smaller one. As in Theorem 1, the proof relies on finding suitable estimates for \({\mathbb {E}}\left[ {\hat{x}}_-^T(t) {{\hat{\varSigma }}}{\hat{x}}_-(t)\right] \) and \({\mathbb {E}}\left[ {\hat{x}}_+^T(t) {{\hat{\varSigma }}} {\hat{x}}_+(t)\right] \), where \({\hat{\varSigma }}= \left[ {\begin{matrix}{\varSigma }_1&{} \\ &{} \varSigma _2\end{matrix}}\right] \) is a submatrix of \(\varSigma \) in (16).

Theorem 2

Let \({\bar{y}}\) be the output of the ROM (17), \({\bar{y}}_r\) be the output of the ROM (18) and \(\varSigma _2=\sigma I\), \(\sigma >0\), in (16). Then, it holds that

$$\begin{aligned} \left( {\mathbb {E}}\left\| {\bar{y}}-{\bar{y}}_r\right\| _{L^2_{T}}^2\right) ^{\frac{1}{2}}\le 2 \sigma \left\| u\right\| _{L^2_T}\exp \left( 0.5 \left\| u^0\right\| _{L^2_T}^2\right) . \end{aligned}$$

Proof

In order to improve the readability of this paper, the proof is presented later, see Sect. 4.2. \(\square \)

3.3 Main result

In the following, the main result of this paper is formulated. It is a consequence of Theorems 1 and 2.

Theorem 3

Let y be the output of the full model (1) with \(x(0)=0\) and \({\bar{y}}\) be the output of the ROM (7) with zero initial state. Then, for all \(T>0\), it holds that

$$\begin{aligned} \left( {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{T}}^2\right) ^{\frac{1}{2}}\le 2 (\sigma _{r+1}+\sigma _{r+2}+\ldots + \sigma _n) \left\| u\right\| _{L^2_T}\exp \left( 0.5 \left\| u^0\right\| _{L^2_T}^2\right) , \end{aligned}$$

where \(\sigma _{r+1}, \sigma _{r+2}, \ldots , \sigma _n\) are the diagonal entries of \(\varSigma _2\) and \(u^0=(u^0_1, \dots , u_m^0)^T\) is the control vector with components defined by \(u_k^0 \equiv {\left\{ \begin{array}{ll} 0 &{} \text {if }N_k = 0,\\ u_k &{} \text {else}. \end{array}\right. }\)

Proof

We apply the results in Theorems 1 and 2. We remove the HSVs step by step and exploit the triangle inequality in order to bound the error between the outputs y and \({\bar{y}}\). We introduce \({\bar{y}}_\ell \) as the output of ROM (7) with state space dimension \(\ell = r, r+1, \ldots , n-1\). Notice that \({\bar{y}}_r\) coincides with \({\bar{y}}\). Moreover, we set \({\bar{y}}_n := y\). We then have

$$\begin{aligned} \left( {\mathbb {E}} \left\| y-{\bar{y}}\right\| ^2_{L^2_T}\right) ^{\frac{1}{2}}\le \sum _{\ell =r+1}^n \left( {\mathbb {E}}\left\| {\bar{y}}_{\ell }-{\bar{y}}_{\ell -1} \right\| ^2_{L^2_T}\right) ^{\frac{1}{2}}. \end{aligned}$$

In the reduction step from y to \({\bar{y}}_{n-1}\) only the smallest HSV \(\sigma _n\) is removed from the system. Hence, by Theorem 1, we have

$$\begin{aligned} \left( {\mathbb {E}} \left\| y-{\bar{y}}_{n-1}\right\| ^2_{L^2_T}\right) ^{\frac{1}{2}}\le 2 \sigma _n \left\| u\right\| _{L^2_T}\exp \left( 0.5 \left\| u^0\right\| _{L^2_T}^2\right) . \end{aligned}$$

The ROMs of the outputs \({\bar{y}}_{\ell }\) and \({\bar{y}}_{\ell -1}\) are neighboring according to Sect. 3.2, i.e., only the HSV \(\sigma _{\ell }\) is removed in the reduction step. By Theorem 2, we obtain

$$\begin{aligned} \left( {\mathbb {E}} \left\| {\bar{y}}_{\ell }-{\bar{y}}_{\ell -1}\right\| ^2_{L^2_T}\right) ^{\frac{1}{2}}\le 2 \sigma _{\ell } \left\| u\right\| _{L^2_T} \exp \left( 0.5 \left\| u^0\right\| _{L^2_T}^2\right) \end{aligned}$$

for \(\ell =r+1, \ldots , n-1\). This provides the claimed result. \(\square \)

The result in Theorem 3 tells us that the ROM (7) yields a very good approximation if the truncated HSVs (diagonal entries of \(\varSigma _2\)) are small and the vector \(u^0\) of control components with a nonzero \(N_k\) is not too large. The exponential factor in the error bound indicates that SPA may perform badly if \(u^0\) is very large.
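
For a sampled control, the right-hand side of the bound in Theorem 3 can be evaluated numerically, for instance as in the following sketch (trapezoidal quadrature for the \(L^2\)-norms; u is assumed to be given columnwise on a time grid t, sigma2 contains the truncated HSVs, and all names are illustrative):

import numpy as np

def spa_error_bound(sigma2, u, t, N_list):
    # Evaluate 2 * (sigma_{r+1} + ... + sigma_n) * ||u||_{L^2_T} * exp(0.5 * ||u^0||_{L^2_T}^2),
    # where u^0 keeps only the components u_k with N_k != 0, cf. (14).
    mask = np.array([np.any(Nk != 0) for Nk in N_list], dtype=float)
    u0 = u * mask[:, None]
    norm_u_sq = np.trapz(np.sum(u ** 2, axis=0), t)    # ||u||_{L^2_T}^2
    norm_u0_sq = np.trapz(np.sum(u0 ** 2, axis=0), t)  # ||u^0||_{L^2_T}^2
    return 2.0 * np.sum(sigma2) * np.sqrt(norm_u_sq) * np.exp(0.5 * norm_u0_sq)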

We conclude this section with a discussion of the result in Theorem 3. It is important to notice that the dependence of the error bound on the system matrices and the covariance matrix of M is hidden in the HSVs, which are given by \(\sigma _\ell = \sqrt{\lambda _\ell (P Q)}\). We can see from (3) and (4) that the Gramians depend on \((A, B, C, H_i, N_k, K)\) and hence \(\sigma _\ell = \sigma _\ell (A, B, C, H_i, N_k, K)\). Consequently, changing the system matrices or the covariance matrix will change the HSVs too. Now, if K is large, the terms related to \(H_i\) in (3) and (4) become more dominant, which results in larger HSVs. According to Theorem 3, a worse approximation can then be expected. We also observe that the exponential term in Theorem 3 is related to the bilinearity in the drift. Setting \(N_k=0\) for all \(k=1, \ldots , m\), the exponential factor becomes one. This results in the bound that is known from the stochastic linear case [32]. Choosing \(H_i=0\) for all \(i=1, \ldots , v\) yields a bound for the deterministic bilinear case. Notice that in this case, the state variables of the full and reduced system are not random anymore, so that the expected value is redundant and can hence be omitted. Finally, considering \(H_i = N_k = 0\) leads to the bound obtained for the deterministic linear case [23].

Since the ROM (7) has a different structure than (1), it is an important question why we do not use a structure-preserving generalization of SPA as considered in [18, 33]. This variant of SPA is obtained by setting \(B_2=0\) in (7). Conducting the proof of the error bound for this simplified method, one would obtain an additional term in the error bound that depends on \(\varSigma _1\), a possibly large matrix. This indicates that SPA with \(B_2=0\) probably performs worse than the version stated in (7). We refer to Sect. 5, where both versions of SPA are compared numerically.

4 Proofs of Theorems 1 and 2

In this section, we present the pending proofs of Theorems 1 and 2.

4.1 Proof of Theorem 1

We first derive a suitable upper bound for \({\mathbb {E}}[x_-^T(t) \varSigma x_-(t)]\) by applying Ito’s formula. Hence, Lemma 1 (“Appendix”) and Eq. (12a) yield

$$\begin{aligned} {\mathbb {E}}\left[ x_-^T(t)\varSigma x_-(t)\right]&=2 \int _0^t{\mathbb {E}}\left[ x_-^T\varSigma \left( A\mathbf{x }_- +\sum _{k=1}^m (N_{k} \mathbf{x }_- u_k) + \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] \right) \right] \mathrm{d}s\nonumber \\&\quad + \int _0^t\sum _{i, j=1}^v {\mathbb {E}}\left[ \left( H_{i} \mathbf{x }_{-} + \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma \left( H_{j} \mathbf{x }_{-} + \left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right] \right) \right] k_{ij} \mathrm{d}s. \end{aligned}$$
(24)

We find an estimate for the terms related to \(N_k\), that is

$$\begin{aligned} \sum _{k=1}^m 2 x_-^T(s)\varSigma N_{k} \mathbf{x }_{-}(s) u_k(s)&=\sum _{k=1}^m 2\left\langle \varSigma ^{\frac{1}{2}} x_-(s)u_k(s), \varSigma ^{\frac{1}{2}} N_k \mathbf{x }_{-}(s) \right\rangle _2 \nonumber \\&\le \sum _{k=1}^m \left\| \varSigma ^{\frac{1}{2}} x_-(s)u^0_k(s)\right\| _2^2 + \left\| \varSigma ^{\frac{1}{2}} N_k \mathbf{x }_{-}(s)\right\| _2^2\nonumber \\&=x_-^T(s) \varSigma x_-(s) \left\| u^0(s)\right\| _{2}^2 +\sum _{k=1}^m \mathbf{x }_{-}^T(s) N_k^T\varSigma N_{k} \mathbf{x }_{-}(s), \end{aligned}$$
(25)

where \(u^0\) is defined in (14). Moreover, adding a zero, we rewrite

$$\begin{aligned} 2 x_-^T(s) \varSigma A\mathbf{x }_{-}(s)&=2 \mathbf{x }_{-}^T(s) \varSigma A\mathbf{x }_{-}(s) + 2 \left[ {\begin{matrix}{0} \\ {\bar{x}}_2(s)\end{matrix}}\right] ^T \varSigma A\mathbf{x }_{-}(s) \nonumber \\&= \mathbf{x }_{-}^T(s) (A^T\varSigma + \varSigma A) \mathbf{x }_{-}(s) + 2 \left[ {\begin{matrix}{0} \\ {\bar{x}}_2(s)\end{matrix}}\right] ^T \varSigma A\mathbf{x }_{-}(s). \end{aligned}$$
(26)

With (25) and (26), (24) becomes

$$\begin{aligned} {\mathbb {E}}\left[ x_-^T(t)\varSigma x_-(t)\right]&\le {\mathbb {E}}\int _0^t \mathbf{x }_{-}^T\left( A^T \varSigma {+}\varSigma A{+}\sum _{k=1}^m N_k^T\varSigma N_{k}{+}\sum _{i, j=1}^v H^T_{i}\varSigma H_{j}k_{ij}\right) \mathbf{x }_{-} \mathrm{d}s\nonumber \\&\quad +{\mathbb {E}}\int _0^t 2 x_-^T\varSigma \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] + \sum _{i, j=1}^v \left( 2 H_{i} \mathbf{x }_{-} + \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma \left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right] k_{ij} \mathrm{d}s\nonumber \\&\quad +\int _0^t {\mathbb {E}}\left[ x_-^T \varSigma x_-\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s + {\mathbb {E}} \int _0^t 2 \left[ {\begin{matrix}{0} \\ {\bar{x}}_2\end{matrix}}\right] ^T \varSigma A\mathbf{x }_{-}\mathrm{d}s. \end{aligned}$$
(27)

Taking the partitions of \(x_-\) and \(\varSigma \) into account, we see that \(x_-^T\varSigma \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] =x_2^T \varSigma _2 c_0\). Furthermore, the partitions of \(\mathbf{x }_{-}\) and \(H_i\) yield

$$\begin{aligned} \left( 2 H_{i} \mathbf{x }_{-} + \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma \left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right]&=\left( 2 H_{i} \mathbf{x }_{-} + \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\left[ {\begin{matrix}{0} \\ \varSigma _2 c_j\end{matrix}}\right] \nonumber \\&=\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] \left( x - \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] \right) + c_i\right) ^T \varSigma _2 c_j \nonumber \\&= \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] x - c_i\right) ^T \varSigma _2 c_j, \end{aligned}$$
(28)

since \(\left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] =c_i\). Using the partition of A, it holds that

$$\begin{aligned} 2 \left[ {\begin{matrix}{0} \\ {{\bar{x}}_2}\end{matrix}}\right] ^T \varSigma A\mathbf{x }_{-}&=2 \left[ {\begin{matrix}{0}&{\bar{x}}_2^T\varSigma _2 \end{matrix}}\right] A \mathbf{x }_{-}=2 {\bar{x}}_2^T\varSigma _2 \left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] \left( x - \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] \right) \nonumber \\&=2 {\bar{x}}_2^T\varSigma _2 (\left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] x + B_2u), \end{aligned}$$
(29)

because \(\left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] =-B_2u\). We insert (9) and (12b) into inequality (27) and exploit the relations in (28) and (29). Hence,

$$\begin{aligned} {\mathbb {E}}\left[ x_-^T(t)\varSigma x_-(t)\right]&\le - {\mathbb {E}}\left\| y-{\bar{y}}\right\| ^2_{L^2_{t}}+\int _0^t {\mathbb {E}}\left[ x_-^T \varSigma x_-\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s\\&\quad +\,{\mathbb {E}}\int _0^t 2 x_2^T \varSigma _2 c_0 + \sum _{i, j=1}^v \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] x - c_i\right) ^T \varSigma _2 c_j k_{ij} \mathrm{d}s\\&\quad +\, {\mathbb {E}} \int _0^t 2 {\bar{x}}_2^T\varSigma _2 (\left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] x + B_2u) \mathrm{d}s. \end{aligned}$$

We define the function \(\alpha _-(t):={\mathbb {E}}\int _0^t 2 x_2^T \varSigma _2 c_0 + \sum _{i, j=1}^v \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] x - c_i\right) ^T \varSigma _2 c_j k_{ij} \mathrm{d}s + {\mathbb {E}} \int _0^t 2 {\bar{x}}_2^T\varSigma _2 (\left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] x + B_2u) \mathrm{d}s\) and apply Lemma 3 (‘Appendix’) implying

$$\begin{aligned} {\mathbb {E}}\left[ x_-^T(t)\varSigma x_-(t)\right]&\le \alpha _-(t)- {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2\\&\quad +\int _0^t (\alpha _-(s) - {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{s}}^2) \left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s. \end{aligned}$$

Since \(\varSigma \) is positive definite, we obtain an upper bound for the output error by

$$\begin{aligned} {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2\le \alpha _-(t) +\int _0^t \alpha _-(s) \left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s. \end{aligned}$$

Defining the term \(\alpha _+(t):={\mathbb {E}}\int _0^t 2 x_2^T \varSigma _2^{-1} c_0 + \sum _{i, j=1}^v \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] x - c_i\right) ^T \varSigma _2^{-1} c_j k_{ij} \mathrm{d}s + {\mathbb {E}} \int _0^t 2 {\bar{x}}_2^T\varSigma _2^{-1} (\left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] x + B_2u) \mathrm{d}s\) and exploiting the assumption that \(\varSigma _2=\sigma I\), leads to

$$\begin{aligned} {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2\le \sigma ^2\left[ \alpha _+(t) +\int _0^t \alpha _+(s) \left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s\right] . \end{aligned}$$
(30)

The remaining step is to find a bound for the right side of (30) that does not depend on \(\alpha _+\) anymore. For that reason, a bound for the expression \({\mathbb {E}}[x_+^T(t) \varSigma ^{-1} x_+(t)]\) is derived next using Ito’s lemma again. From (13) and Lemma 1 (‘Appendix’), we obtain

$$\begin{aligned} {\mathbb {E}}\left[ x_+^T(t)\varSigma ^{-1} x_+(t)\right]&=2 \int _0^t{\mathbb {E}}\left[ x_+^T\varSigma ^{-1}\left( A \mathbf{x }_{+}+2 Bu+\sum _{k=1}^m (N_{k} \mathbf{x }_{+} u_k) - \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] \right) \right] \mathrm{d}s\nonumber \\&\quad + \int _0^t\sum _{i, j=1}^v {\mathbb {E}}\left[ \left( H_{i} \mathbf{x }_{+} - \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma ^{-1}\left( H_{j} \mathbf{x }_{+} - \left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right] \right) \right] k_{ij} \mathrm{d}s. \end{aligned}$$
(31)

Analogously to (25), it holds that

$$\begin{aligned} \sum _{k=1}^m 2 x_+^T(s)\varSigma ^{-1} N_{k} \mathbf{x }_{+}(s) u_k(s) \le x_+^T(s) \varSigma ^{-1} x_+(s) \left\| u^0(s)\right\| _{2}^2 +\sum _{k=1}^m \mathbf{x }_{+}^T(s) N_k^T\varSigma ^{-1} N_{k} \mathbf{x }_{+}(s). \end{aligned}$$

Additionally, we rearrange the term related to A as follows

$$\begin{aligned} 2x_+^T(s) \varSigma ^{-1} A \mathbf{x }_{+}(s)&= 2\mathbf{x }_{+}^T(s) \varSigma ^{-1} A\mathbf{x }_{+}(s) - 2\left[ {\begin{matrix}{0} \\ {\bar{x}}_2(s)\end{matrix}}\right] ^T \varSigma ^{-1} A \mathbf{x }_{+}(s) \\&= \mathbf{x }_{+}^T(s)(A^T \varSigma ^{-1} + \varSigma ^{-1} A)\mathbf{x }_{+}(s) - 2\left[ {\begin{matrix}{0} \\ {\bar{x}}_2(s)\end{matrix}}\right] ^T \varSigma ^{-1} A \mathbf{x }_{+}(s). \end{aligned}$$

Moreover, we have

$$\begin{aligned} 4 x_+^T(s)\varSigma ^{-1} Bu(s) = 4 \mathbf{x }_{+}^T(s)\varSigma ^{-1} Bu(s) - 4\left[ {\begin{matrix}{0} \\ {\bar{x}}_2(s)\end{matrix}}\right] ^T \varSigma ^{-1} Bu(s). \end{aligned}$$

We plug the above results into (31), which gives us

$$\begin{aligned}&{\mathbb {E}}\left[ x_+^T(t)\varSigma ^{-1} x_+(t)\right] \nonumber \\&\quad \le {\mathbb {E}}\int _0^t \mathbf{x }_{+}^T\left( A^T \varSigma ^{-1}+\varSigma ^{-1} A+\sum _{k=1}^m N_k^T\varSigma ^{-1} N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma ^{-1} H_{j}k_{ij}\right) \mathbf{x }_{+} \mathrm{d}s\nonumber \\&\qquad -{\mathbb {E}}\int _0^t 2 x_+^T\varSigma ^{-1} \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] + \sum _{i, j=1}^v \left( 2 H_{i} \mathbf{x }_{+} - \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma ^{-1}\left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right] k_{ij} \mathrm{d}s\nonumber \\&\qquad - {\mathbb {E}} \int _0^t 2\left[ {\begin{matrix}{0} \\ {{\bar{x}}_2}\end{matrix}}\right] ^T \varSigma ^{-1} (A \mathbf{x }_{+}+2 Bu) \mathrm{d}s + {\mathbb {E}} \int _0^t 4 \mathbf{x }_{+}^T\varSigma ^{-1} Bu \mathrm{d}s\nonumber \\&\qquad +\int _0^t {\mathbb {E}}\left[ x_+^T \varSigma ^{-1} x_+\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s. \end{aligned}$$
(32)

From inequality (8) and the Schur complement condition on definiteness, it follows that

$$\begin{aligned} \left[ \begin{array}{cc} A^T \varSigma ^{-1}+\varSigma ^{-1}A+\sum _{k=1}^m N_k^T \varSigma ^{-1} N_k + \sum _{i, j=1}^v H_i^T \varSigma ^{-1} H_j k_{i j} &{} \varSigma ^{-1}B\\ B^T \varSigma ^{-1}&{} -I\end{array}\right] \le 0. \end{aligned}$$
(33)

We multiply (33) with \(\left[ {\begin{matrix}{\mathbf{x }_{+}} \\ 2u\end{matrix}}\right] ^T\) from the left and with \(\left[ {\begin{matrix}{\mathbf{x }_{+}}\\ 2u\end{matrix}}\right] \) from the right. Hence,

$$\begin{aligned} 4 \left\| u\right\| _{2}^2&\ge \mathbf{x }_{+}^T\left( A^T \varSigma ^{-1}+\varSigma ^{-1} A+\sum _{k=1}^m N_k^T\varSigma ^{-1} N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma ^{-1} H_{j}k_{ij}\right) \mathbf{x }_{+}\nonumber \\ {}&\quad +4\mathbf{x }_{+}^T\varSigma ^{-1} Bu. \end{aligned}$$
(34)
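
Written out, this multiplication gives

$$\begin{aligned} 0&\ge \mathbf{x }_{+}^T\left( A^T \varSigma ^{-1}+\varSigma ^{-1} A+\sum _{k=1}^m N_k^T\varSigma ^{-1} N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma ^{-1} H_{j}k_{ij}\right) \mathbf{x }_{+}\\&\quad + 2\mathbf{x }_{+}^T\varSigma ^{-1}B(2u)+(2u)^T(-I)(2u), \end{aligned}$$

where the middle term equals \(4\mathbf{x }_{+}^T\varSigma ^{-1}Bu\) and the last term equals \(-4\left\| u\right\| _2^2\); rearranging yields (34).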

Applying this result to (32) yields

$$\begin{aligned} {\mathbb {E}}\left[ x_+^T(t)\varSigma ^{-1} x_+(t)\right]&\le 4 \left\| u\right\| _{L^2_t}^2+\int _0^t {\mathbb {E}}\left[ x_+^T \varSigma ^{-1} x_+\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s\nonumber \\&\quad -{\mathbb {E}} \int _0^t 2\left[ {\begin{matrix}{0} \\ {{\bar{x}}_2}\end{matrix}}\right] ^T \varSigma ^{-1} (A \mathbf{x }_{+}+2 Bu) \mathrm{d}s\nonumber \\&\quad -{\mathbb {E}}\int _0^t 2 x_+^T\varSigma ^{-1} \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] {+} \sum _{i, j=1}^v \left( 2 H_{i} \mathbf{x }_{+} {-} \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma ^{-1}\left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right] k_{ij} \mathrm{d}s. \end{aligned}$$
(35)

We first of all see that \(x_+^T\varSigma ^{-1} \left[ {\begin{matrix}{0} \\ c_0\end{matrix}}\right] =x_2^T\varSigma _2^{-1}c_0\) using the partitions of \(x_+\) and \(\varSigma \). With the partition of \(H_i\), we moreover have

$$\begin{aligned} \left( 2 H_{i} \mathbf{x }_{+} - \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\varSigma ^{-1}\left[ {\begin{matrix}{0} \\ c_j\end{matrix}}\right]&=\left( 2 H_{i} \mathbf{x }_{+} - \left[ {\begin{matrix}{0} \\ c_i\end{matrix}}\right] \right) ^T\left[ {\begin{matrix}{0} \\ \varSigma _2^{-1} c_j\end{matrix}}\right] \\&=\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] \left( x + \left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] \right) - c_i\right) ^T \varSigma _2^{-1} c_j \\&= \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] x + c_i\right) ^T \varSigma _2^{-1} c_j. \end{aligned}$$

In addition, it holds that

$$\begin{aligned} -2\left[ {\begin{matrix}{0} \\ {{\bar{x}}_2}\end{matrix}}\right] ^T \varSigma ^{-1} (A\mathbf{x }_{+}+2 Bu)&= -2\left[ {\begin{matrix}{0}&{{\bar{x}}_2}^T\varSigma _2^{-1}\end{matrix}}\right] (A\mathbf{x }_{+}+2 Bu)\\&= -2 {\bar{x}}_2^T\varSigma _2^{-1} \left( \left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] \left( x+\left[ {\begin{matrix}{{\bar{x}}} \\ {{\bar{x}}_2}\end{matrix}}\right] \right) +2 B_2u\right) \\&= -2 {\bar{x}}_2^T\varSigma _2^{-1} \left( \left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] x+ B_2u\right) . \end{aligned}$$

Plugging the above relations into (35) leads to

$$\begin{aligned} {\mathbb {E}}\left[ x_+^T(t)\varSigma ^{-1} x_+(t)\right]&\le 4\left\| u\right\| _{L^2_t}^2+\int _0^t {\mathbb {E}}\left[ x_+^T \varSigma ^{-1} x_+\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s\nonumber \\&\quad -{\mathbb {E}}\int _0^t 2 {\bar{x}}_2^T\varSigma _2^{-1} (\left[ {\begin{matrix}{A}_{21}&A_{22}\end{matrix}}\right] x+ B_2u) \mathrm{d}s\nonumber \\&\quad -{\mathbb {E}}\int _0^t 2 x_2^T\varSigma _2^{-1} c_0 {+} \sum _{i, j=1}^v \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}\end{matrix}}\right] x {+} c_i\right) ^T\varSigma _2^{-1} c_j k_{ij} \mathrm{d}s. \end{aligned}$$
(36)

We add \(2{\mathbb {E}}\int _0^t \sum _{i, j=1}^v c_i^T\varSigma _2^{-1} c_j k_{ij} \mathrm{d}s\) to the right side of (36) and preserve the inequality since this term is nonnegative due to Lemma 2 (‘Appendix’). This results in

$$\begin{aligned} {\mathbb {E}}\left[ x_+^T(t)\varSigma ^{-1} x_+(t)\right] \le 4 \left\| u\right\| _{L^2_t}^2-\alpha _+(t)+\int _0^t {\mathbb {E}}\left[ x_+^T(s) \varSigma ^{-1} x_+(s)\right] \left\| u^0(s)\right\| _{2}^2 \mathrm{d}s. \end{aligned}$$

Gronwall’s inequality in Lemma 3 (“Appendix”) yields

$$\begin{aligned}&{\mathbb {E}}\left[ x_+^T(t)\varSigma ^{-1} x_+(t)\right] \nonumber \\&\quad \le 4\left\| u\right\| _{L^2_t}^2-\alpha _+(t) +\int _0^t (4 \left\| u\right\| _{L^2_s}^2-\alpha _+(s)) \left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s. \end{aligned}$$
(37)

We find an estimate for the following expression:

$$\begin{aligned}&\int _0^t \left\| u\right\| _{L^2_s}^2 \left\| u^0(s)\right\| _2^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _2^2\mathrm{d}w\right) \mathrm{d}s\nonumber \\&\quad \le \left\| u\right\| _{L^2_t}^2 \left[ -\exp \left( \int _s^t \left\| u^0(w)\right\| _2^2\mathrm{d}w\right) \right] _{s=0}^t\nonumber \\&\quad =\left\| u\right\| _{L^2_t}^2\left( \exp \left( \int _0^t \left\| u^0(s)\right\| _2^2\mathrm{d}s\right) -1\right) . \end{aligned}$$
(38)
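
The last step in (38) uses that the integrand is an exact derivative in s, namely

$$\begin{aligned} \left\| u^0(s)\right\| _2^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _2^2\mathrm{d}w\right) = -\frac{\mathrm{d}}{\mathrm{d}s}\exp \left( \int _s^t \left\| u^0(w)\right\| _2^2\mathrm{d}w\right) , \end{aligned}$$

while the preceding inequality only uses \(\left\| u\right\| _{L^2_s}^2\le \left\| u\right\| _{L^2_t}^2\) for \(s\le t\).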

Combining (37) with (38), we obtain

$$\begin{aligned}&\alpha _+(t)+\int _0^t \alpha _+(s) \left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s\nonumber \\&\quad \le 4\left\| u\right\| _{L^2_t}^2 \exp \left( \int _0^t \left\| u^0(s)\right\| _2^2ds\right) . \end{aligned}$$
(39)

Comparing this result with (30) implies

$$\begin{aligned} \left( {\mathbb {E}}\left\| y-{\bar{y}}\right\| _{L^2_{t}}^2\right) ^{\frac{1}{2}}\le 2\sigma \left\| u\right\| _{L^2_t} \exp \left( 0.5 \left\| u^0\right\| _{L^2_t}^2\right) . \end{aligned}$$
(40)

4.2 Proof of Theorem 2

We make use of Eqs. (22a) and (23) in order to prove this bound. We set \({\hat{\varSigma }}= \left[ {\begin{matrix}{\varSigma }_1&{} \\ &{} \varSigma _2\end{matrix}}\right] \) as a submatrix of \(\varSigma \) in (16). Lemma 1 (‘Appendix’) now yields

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_-^T(t){\hat{\varSigma }} {\hat{x}}_-(t)\right]&=2 \int _0^t{\mathbb {E}}\left[ {\hat{x}}_-^T{\hat{\varSigma }}\left( {\hat{A}}\hat{\mathbf{x }}_{-}+\sum _{k=1}^m ({\hat{N}}_{k} \hat{\mathbf{x }}_{-} u_k) + \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] \right) \right] \mathrm{d}s\nonumber \\&\quad + \int _0^t\sum _{i, j=1}^v {\mathbb {E}}\left[ \left( {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}\left( {\hat{H}}_{j} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] \right) \right] k_{ij} \mathrm{d}s. \end{aligned}$$
(41)

We see that the right side of (41) contains the submatrices \({\hat{A}}\), \({\hat{B}}\), \({\hat{H}}_i\), \({\hat{N}}_k\) and \({\hat{\varSigma }}\). In order to be able to refer to the full matrix inequality (9), we find upper bounds in the following for certain terms involving the full matrices \(A\), \(B\), \(H_i\), \(N_k\) and \(\varSigma \). With the same estimate as in (25) and the control vector \(u^0\) defined in (14), we have

$$\begin{aligned} \sum _{k=1}^m 2 {\hat{x}}_-^T(s){\hat{\varSigma }} {\hat{N}}_{k} \hat{\mathbf{x }}_{-}(s) u_k(s)\le {\hat{x}}_-^T(s) {\hat{\varSigma }} {\hat{x}}_-(s) \left\| u^0(s)\right\| _{2}^2 +\sum _{k=1}^m \hat{\mathbf{x }}_{-}^T(s) {\hat{N}}_k^T{\hat{\varSigma }} {\hat{N}}_{k} \hat{\mathbf{x }}_{-}(s). \end{aligned}$$

Adding the term \(\sum _{k=1}^m \left( \left[ {\begin{matrix}{N}_{k, 31}&{N}_{k, 32}&{N}_{k, 33} \end{matrix}}\right] \hat{\mathbf{x }}_{-}(s)\right) ^T \varSigma _3 \left[ {\begin{matrix}{N}_{k, 31}&{N}_{k, 32}&{N}_{k, 33} \end{matrix}}\right] \hat{\mathbf{x }}_{-}(s)\) to the right side of this inequality results in

$$\begin{aligned} \sum _{k=1}^m 2 {\hat{x}}_-^T(s){\hat{\varSigma }} {\hat{N}}_{k} \hat{\mathbf{x }}_{-}(s) u_k(s)\le {\hat{x}}_-^T(s) {\hat{\varSigma }} {\hat{x}}_-(s) \left\| u^0(s)\right\| _{2}^2 +\sum _{k=1}^m \hat{\mathbf{x }}_{-}^T(s) N_k^T \varSigma N_{k} \hat{\mathbf{x }}_{-}(s). \end{aligned}$$
(42)

Moreover, it holds that

$$\begin{aligned} \hat{\mathbf{x }}_{-}^T (A^T\varSigma +\varSigma A)\hat{\mathbf{x }}_{-}&= 2 \hat{\mathbf{x }}_{-}^T \varSigma A\hat{\mathbf{x }}_{-} \\&= 2 \left[ {\begin{matrix}{x}_1 - {\bar{x}}_r\\ x_2 - h_1 \end{matrix}}\right] ^T {\hat{\varSigma }} {\hat{A}}\hat{\mathbf{x }}_{-} + 2 ({\bar{x}}_3 - h_2)^T \varSigma _3\left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \hat{\mathbf{x }}_{-}. \end{aligned}$$

We derive \(\left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \left[ {\begin{matrix}{x}_{1}\\ x_{2}\\ {\bar{x}}_{3}\end{matrix}}\right] =-B_3u\) by the definition of \({\bar{x}}_3\). Moreover, it can be seen from the second line of (20) that \(\left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \hat{\mathbf{x }}_{-}=0\). Hence,

$$\begin{aligned} \hat{\mathbf{x }}_{-}^T (A^T\varSigma +\varSigma A)\hat{\mathbf{x }}_{-} = 2 {\hat{x}}_-^T {\hat{\varSigma }} {\hat{A}}\hat{\mathbf{x }}_{-} - 2\left[ {\begin{matrix}{0}\\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }} {\hat{A}}\hat{\mathbf{x }}_{-}. \end{aligned}$$
(43)

It remains to find a suitable upper bound related to the expression depending on \({\hat{H}}_i\). We first of all see that

$$\begin{aligned}&\sum _{i, j=1}^v \left( {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}\left( {\hat{H}}_{j} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] \right) k_{ij}\\&\quad = \hat{\mathbf{x }}_{-}^T\sum _{i, j=1}^v {\hat{H}}^T_{i}{\hat{\varSigma }} {\hat{H}}_{j}k_{ij} \hat{\mathbf{x }}_{-} + \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij}. \end{aligned}$$

The term \(\sum _{i, j=1}^v \left( \left[ {\begin{matrix}{H}_{i, 31}&{H}_{i, 32}&{H}_{i, 33} \end{matrix}}\right] \hat{\mathbf{x }}_{-}(s)\right) ^T \varSigma _3 \left[ {\begin{matrix}{H}_{j, 31}&{H}_{j, 32}&{H}_{j, 33} \end{matrix}}\right] \hat{\mathbf{x }}_{-}(s) k_{ij}\) is nonnegative by Lemma 2 (‘Appendix’). Adding this term to the right side of the above equation yields

$$\begin{aligned}&\sum _{i, j=1}^v \left( {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}\left( {\hat{H}}_{j} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] \right) k_{ij}\nonumber \\&\quad \le \hat{\mathbf{x }}_{-}^T\sum _{i, j=1}^v H^T_{i} \varSigma H_{j}k_{ij} \hat{\mathbf{x }}_{-} + \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij}. \end{aligned}$$
(44)

Applying (42), (43) and (44) to (41) results in

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_-^T(t){\hat{\varSigma }} {\hat{x}}_-(t)\right]&\le {\mathbb {E}}\int _0^t \hat{\mathbf{x }}_{-}^T\left( A^T \varSigma +\varSigma A+\sum _{k=1}^m N_k^T\varSigma N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma H_{j}k_{ij}\right) \hat{\mathbf{x }}_{-} \mathrm{d}s\nonumber \\&\quad +{\mathbb {E}}\int _0^t 2 {\hat{x}}_-^T{\hat{\varSigma }} \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] + \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T {\hat{\varSigma }}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij} \mathrm{d}s\nonumber \\&\quad +\int _0^t {\mathbb {E}}\left[ {\hat{x}}_-^T {\hat{\varSigma }} {\hat{x}}_-\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s + {\mathbb {E}} \int _0^t 2 \left[ {\begin{matrix}{0} \\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }} {\hat{A}} \hat{\mathbf{x }}_{-}\mathrm{d}s. \end{aligned}$$
(45)

Using that \({\hat{c}}_i=\left[ {\begin{matrix}{H}_{i, 21}&{H}_{i, 22}&{H}_{i, 23} \end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \), we have

$$\begin{aligned} \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right]&=\left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{-} + \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T\left[ {\begin{matrix}{0} \\ \varSigma _2 {\hat{c}}_j\end{matrix}}\right] \nonumber \\&=\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left( \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] - \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \right) + {\hat{c}}_i\right) ^T \varSigma _2 {\hat{c}}_j\nonumber \\&= \left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] - {\hat{c}}_i\right) ^T \varSigma _2 {\hat{c}}_j. \end{aligned}$$
(46)

It can be seen further that

$$\begin{aligned} 2\left[ {\begin{matrix}{0}\\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }} {\hat{A}}\hat{\mathbf{x }}_{-}&=2 \left[ {\begin{matrix}{0}&h_1^T\varSigma _2 \end{matrix}}\right] {\hat{A}}\hat{\mathbf{x }}_{-}=2 h_1^T\varSigma _2 \left[ {\begin{matrix}{A}_{21}&A_{22}&A_{23}\end{matrix}}\right] \left( \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] -\left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \right) \nonumber \\&=2 h_1^T\varSigma _2 \left( \left[ {\begin{matrix}{A}_{21}&A_{22}&A_{23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] + B_2u\right) \end{aligned}$$
(47)

taking the first line of (20) into account. Inserting (46) and (47) into (45) and using the fact that \(2 {\hat{x}}_-^T{\hat{\varSigma }} \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] = 2 x_2^T \varSigma _2 {\hat{c}}_0 \) leads to

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_-^T(t){\hat{\varSigma }} {\hat{x}}_-(t)\right]&\le \int _0^t {\mathbb {E}}\left[ {\hat{x}}_-^T {\hat{\varSigma }} {\hat{x}}_-\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s +{\hat{\alpha }}_-(t)\nonumber \\&\quad +{\mathbb {E}}\int _0^t \hat{\mathbf{x }}_{-}^T\left( A^T \varSigma +\varSigma A+\sum _{k=1}^m N_k^T\varSigma N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma H_{j}k_{ij}\right) \hat{\mathbf{x }}_{-} \mathrm{d}s, \end{aligned}$$
(48)

where we set \({\hat{\alpha }}_-(t):={\mathbb {E}}\int _0^t 2 x_2^T \varSigma _2 {\hat{c}}_0 + \sum _{i, j=1}^v\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] - {\hat{c}}_i\right) ^T \varSigma _2 {\hat{c}}_j k_{ij} \mathrm{d}s +{\mathbb {E}}\int _0^t 2 h_1^T\varSigma _2 \left( \left[ {\begin{matrix}{A}_{21}&A_{22}&A_{23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] + B_2u\right) \mathrm{d}s\). With (9) and (22b), we obtain

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_-^T(t){\hat{\varSigma }} {\hat{x}}_-(t)\right]&\le \int _0^t {\mathbb {E}}\left[ {\hat{x}}_-^T {\hat{\varSigma }} {\hat{x}}_-\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s +{\hat{\alpha }}_-(t)- {\mathbb {E}}\left\| {\bar{y}}-{\bar{y}}_r\right\| _{L^2_{t}}^2. \end{aligned}$$

Applying Lemma 3 (‘Appendix’) to this inequality yields

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_-^T(t){\hat{\varSigma }} {\hat{x}}_-(t)\right]&\le {\hat{\alpha }}_-(t)- {\mathbb {E}}\left\| {\bar{y}}-{\bar{y}}_r\right\| _{L^2_{t}}^2\\&\quad +\int _0^t {\hat{\alpha }}_-(s)\left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s. \end{aligned}$$

Since the left side of the above inequality is nonnegative, we obtain

$$\begin{aligned}&{\mathbb {E}}\left\| {\bar{y}} - {\bar{y}}_r\right\| _{L^2_{t}}^2\\&\quad \le {\hat{\alpha }}_-(t) +\int _0^t {\hat{\alpha }}_-(s)\left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s. \end{aligned}$$

We exploit that \(\varSigma _2=\sigma I\), so that \(\varSigma _2=\sigma ^2\varSigma _2^{-1}\) and hence \({\hat{\alpha }}_-(t)=\sigma ^2{\hat{\alpha }}_+(t)\), with \({\hat{\alpha }}_+\) defined below. Therefore, we have

$$\begin{aligned}&{\mathbb {E}}\left\| {\bar{y}} - {\bar{y}}_r\right\| _{L^2_{t}}^2\nonumber \\&\quad \le \sigma ^2\left( {\hat{\alpha }}_+(t) +\int _0^t {\hat{\alpha }}_+(s)\left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s\right) , \end{aligned}$$
(49)

where we set \({\hat{\alpha }}_+(t):={\mathbb {E}}\int _0^t 2 x_2^T \varSigma _2^{-1} {\hat{c}}_0 + \sum _{i, j=1}^v\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] - {\hat{c}}_i\right) ^T \varSigma _2^{-1} {\hat{c}}_j k_{ij} \mathrm{d}s +{\mathbb {E}}\int _0^t 2 h_1^T\varSigma _2^{-1} \left( \left[ {\begin{matrix}{A}_{21}&A_{22}&A_{23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1\\ x_2 \\ {\bar{x}}_3\end{matrix}}\right] + B_2u\right) \mathrm{d}s\). In order to find a suitable bound for the right side of (49), Itô’s lemma is applied to \({\mathbb {E}}[{\hat{x}}_+^T(t) {\hat{\varSigma }}^{-1}{\hat{x}}_+(t)]\). Due to (23) and Lemma 1 (‘Appendix’), we obtain

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_+^T(t){\hat{\varSigma }}^{-1} {\hat{x}}_+(t)\right]&=2 \int _0^t{\mathbb {E}}\left[ {\hat{x}}_+^T{\hat{\varSigma }}^{-1}\left( {\hat{A}}\hat{\mathbf{x }}_{+}+2 {\hat{B}}u+\sum _{k=1}^m ({\hat{N}}_{k} \hat{\mathbf{x }}_{+} u_k) - \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] \right) \right] \mathrm{d}s\nonumber \\&\quad + \int _0^t\sum _{i, j=1}^v {\mathbb {E}}\left[ \left( {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}^{-1}\left( {\hat{H}}_{j} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] \right) \right] k_{ij} \mathrm{d}s. \end{aligned}$$
(50)

Analogously to (42), it holds that

$$\begin{aligned}&\sum _{k=1}^m 2 {\hat{x}}_+^T(s){\hat{\varSigma }}^{-1} {\hat{N}}_{k} \hat{\mathbf{x }}_{+}(s) u_k(s) \nonumber \\&\quad \le {\hat{x}}_+^T(s) {\hat{\varSigma }}^{-1} {\hat{x}}_+(s) \left\| u^0(s)\right\| _{2}^2 +\sum _{k=1}^m \hat{\mathbf{x }}_{+}^T(s) {\hat{N}}_k^T{\hat{\varSigma }}^{-1} {\hat{N}}_{k} \hat{\mathbf{x }}_{+}(s)\nonumber \\&\quad \le {\hat{x}}_+^T(s) {\hat{\varSigma }}^{-1} {\hat{x}}_+(s) \left\| u^0(s)\right\| _{2}^2 +\sum _{k=1}^m \hat{\mathbf{x }}_{+}^T(s) N_k^T \varSigma ^{-1} N_{k} \hat{\mathbf{x }}_{+}(s). \end{aligned}$$
(51)

Furthermore, we see that

$$\begin{aligned}&\hat{\mathbf{x }}_{+}^T (A^T\varSigma ^{-1}+\varSigma ^{-1} A)\hat{\mathbf{x }}_{+}+4\hat{\mathbf{x }}_{+}^T \varSigma ^{-1} Bu = 2 \hat{\mathbf{x }}_{+}^T \varSigma ^{-1} (A\hat{\mathbf{x }}_{+}+2Bu)\\&\quad = 2 \left[ {\begin{matrix}{x}_1 + {\bar{x}}_r\\ x_2 + h_1 \end{matrix}}\right] ^T {\hat{\varSigma }}^{-1} ({\hat{A}}\hat{\mathbf{x }}_{+}+2{\hat{B}} u) + 2 ({\bar{x}}_3 + h_2)^T \varSigma _3^{-1}\left( \left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \hat{\mathbf{x }}_{+}+2B_3u\right) . \end{aligned}$$

Since \(\left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \left[ {\begin{matrix}{x}_{1}\\ x_{2}\\ {\bar{x}}_{3}\end{matrix}}\right] = \left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \left[ {\begin{matrix}{{\bar{x}}_r}\\ h_1\\ h_2\end{matrix}}\right] =-B_3u\) by the definition of \({\bar{x}}_3\) and the second line of (20), we obtain \(\left[ {\begin{matrix}{A}_{31}&{A}_{32}&A_{33}\end{matrix}}\right] \hat{\mathbf{x }}_{+}=-2B_3 u\). Thus,

$$\begin{aligned}&\hat{\mathbf{x }}_{+}^T (A^T\varSigma ^{-1}+\varSigma ^{-1} A)\hat{\mathbf{x }}_{+}+4\hat{\mathbf{x }}_{+}^T \varSigma ^{-1} Bu \nonumber \\&\quad = 2 {\hat{x}}_+^T {\hat{\varSigma }}^{-1} ({\hat{A}}\hat{\mathbf{x }}_{+}+2{\hat{B}} u) + 2\left[ {\begin{matrix}{0}\\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }}^{-1} ({\hat{A}}\hat{\mathbf{x }}_{+}+2{\hat{B}} u). \end{aligned}$$
(52)

Finally, we see that

$$\begin{aligned}&\sum _{i, j=1}^v \left( {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}^{-1}\left( {\hat{H}}_{j} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] \right) k_{ij}\nonumber \\&\quad = \hat{\mathbf{x }}_{+}^T\sum _{i, j=1}^v {\hat{H}}^T_{i}{\hat{\varSigma }}^{-1} {\hat{H}}_{j}k_{ij} \hat{\mathbf{x }}_{+} - \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}^{-1}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij}\nonumber \\&\quad \le \hat{\mathbf{x }}_{+}^T\sum _{i, j=1}^v H^T_{i} \varSigma ^{-1} H_{j}k_{ij} \hat{\mathbf{x }}_{+} - \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}^{-1}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij} \end{aligned}$$
(53)

applying Lemma 2 (‘Appendix’). With (51), (52) and (53), identity (50) becomes

$$\begin{aligned}&{\mathbb {E}}\left[ {\hat{x}}_+^T(t){\hat{\varSigma }}^{-1} {\hat{x}}_+(t)\right] \nonumber \\&\quad \le {\mathbb {E}}\int _0^t \hat{\mathbf{x }}_{+}^T\left( A^T \varSigma ^{-1}+\varSigma ^{-1} A+\sum _{k=1}^m N_k^T\varSigma ^{-1} N_{k} +\sum _{i, j=1}^v H^T_{i}\varSigma ^{-1} H_{j}k_{ij}\right) \hat{\mathbf{x }}_{+} \mathrm{d}s\nonumber \\&\qquad -{\mathbb {E}}\int _0^t 2 {\hat{x}}_+^T{\hat{\varSigma }}^{-1} \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] + \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T {\hat{\varSigma }}^{-1}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij} \mathrm{d}s\nonumber \\&\qquad - {\mathbb {E}} \int _0^t 2\left[ {\begin{matrix}{0} \\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }}^{-1} ({\hat{A}}\hat{\mathbf{x }}_{+}+2 {\hat{B}}u) \mathrm{d}s + {\mathbb {E}} \int _0^t 4 \hat{\mathbf{x }}_{+}^T\varSigma ^{-1} Bu \mathrm{d}s\nonumber \\&\qquad +\int _0^t {\mathbb {E}}\left[ {\hat{x}}_+^T {\hat{\varSigma }}^{-1} {\hat{x}}_+\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s. \end{aligned}$$
(54)

Similarly to (34), we obtain

$$\begin{aligned} 4 \left\| u\right\| _{2}^2 \ge \hat{\mathbf{x }}_{+}^T\left( A^T \varSigma ^{-1}+\varSigma ^{-1} A+\sum _{k=1}^m N_k^T\varSigma ^{-1} N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma ^{-1} H_{j}k_{ij}\right) \hat{\mathbf{x }}_{+} +4 \hat{\mathbf{x }}_{+}^T\varSigma ^{-1} Bu. \end{aligned}$$
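For orientation, this inequality is obtained, as in (34), from the Gramian inequality of type (3) written for \(\varSigma ^{-1}\), which we assume here to read \(A^T \varSigma ^{-1}+\varSigma ^{-1} A+\sum _{k=1}^m N_k^T\varSigma ^{-1} N_{k}+\sum _{i, j=1}^v H^T_{i}\varSigma ^{-1} H_{j}k_{ij}\le -\varSigma ^{-1}BB^T\varSigma ^{-1}\), by completing the square:

$$\begin{aligned} -\left\| B^T\varSigma ^{-1}\hat{\mathbf{x }}_{+}\right\| _2^2+4 \hat{\mathbf{x }}_{+}^T\varSigma ^{-1} Bu = 4\left\| u\right\| _{2}^2-\left\| B^T\varSigma ^{-1}\hat{\mathbf{x }}_{+}-2u\right\| _2^2\le 4\left\| u\right\| _{2}^2, \end{aligned}$$

which is the inequality just stated.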

This leads to

$$\begin{aligned}&{\mathbb {E}}\left[ {\hat{x}}_+^T(t){\hat{\varSigma }}^{-1} {\hat{x}}_+(t)\right] \nonumber \\&\quad \le 4 \left\| u\right\| _{L^2_t}^2 +\int _0^t {\mathbb {E}}\left[ {\hat{x}}_+^T {\hat{\varSigma }}^{-1} {\hat{x}}_+\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s\nonumber \\&\qquad -{\mathbb {E}}\int _0^t 2 {\hat{x}}_+^T{\hat{\varSigma }}^{-1} \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] + \sum _{i, j=1}^v \left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T {\hat{\varSigma }}^{-1}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij} \mathrm{d}s\nonumber \\&\qquad - {\mathbb {E}} \int _0^t 2\left[ {\begin{matrix}{0} \\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }}^{-1} ({\hat{A}}\hat{\mathbf{x }}_{+}+2 {\hat{B}}u) \mathrm{d}s. \end{aligned}$$
(55)

In the following, (55) is rewritten in terms of \(\varSigma _2\). Exploiting the partitions of \({\hat{x}}_+\) and \({\hat{\varSigma }}\), we obtain \({\hat{x}}_+^T{\hat{\varSigma }}^{-1} \left[ {\begin{matrix}{0} \\ {\hat{c}}_0\end{matrix}}\right] =x_2^T\varSigma _2^{-1} {\hat{c}}_0\). The terms depending on \({\hat{H}}_i\) become

$$\begin{aligned}&-\sum _{i, j=1}^v\left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T{\hat{\varSigma }}^{-1}\left[ {\begin{matrix}{0} \\ {\hat{c}}_j\end{matrix}}\right] k_{ij}\nonumber \\&\quad =-\sum _{i, j=1}^v\left( 2 {\hat{H}}_{i} \hat{\mathbf{x }}_{+} - \left[ {\begin{matrix}{0} \\ {\hat{c}}_i\end{matrix}}\right] \right) ^T\left[ {\begin{matrix}{0} \\ \varSigma _2^{-1} {\hat{c}}_j\end{matrix}}\right] k_{ij}\nonumber \\&\quad =-\sum _{i, j=1}^v\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left( \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] + \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \right) - {\hat{c}}_i\right) ^T \varSigma _2^{-1} {\hat{c}}_j k_{ij}\nonumber \\&\quad = -\sum _{i, j=1}^v\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] + {\hat{c}}_i\right) ^T \varSigma _2^{-1} {\hat{c}}_j k_{ij}\nonumber \\&\quad \le -\sum _{i, j=1}^v\left( 2 \left[ {\begin{matrix}{H}_{i, 21}&H_{i, 22}&H_{i, 23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] - {\hat{c}}_i\right) ^T \varSigma _2^{-1} {\hat{c}}_j k_{ij} \end{aligned}$$
(56)

where the final inequality follows by adding \(2\sum _{i, j=1}^v {\hat{c}}_i^T\varSigma _2^{-1} {\hat{c}}_j k_{ij}\), which is nonnegative by Lemma 2 (‘Appendix’). Furthermore, using the first line of (20), it holds that

$$\begin{aligned} -2\left[ {\begin{matrix}{0} \\ h_1\end{matrix}}\right] ^T {\hat{\varSigma }}^{-1} ({\hat{A}}\hat{\mathbf{x }}_{+}+2 {\hat{B}}u)&= -2\left[ {\begin{matrix}{0}&h_1^T\varSigma _2^{-1}\end{matrix}}\right] ({\hat{A}}\hat{\mathbf{x }}_{+}+2 {\hat{B}}u)\nonumber \\&= -2 h_1^T\varSigma _2^{-1} \left( \left[ {\begin{matrix}{A}_{21}&A_{22}&A_{23}\end{matrix}}\right] \left( \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] + \left[ {\begin{matrix}{{\bar{x}}_r} \\ h_1\\ h_2\end{matrix}}\right] \right) +2 B_2u\right) \nonumber \\&= -2 h_1^T\varSigma _2^{-1} \left( \left[ {\begin{matrix}{A}_{21}&A_{22}&A_{23}\end{matrix}}\right] \left[ {\begin{matrix}{x}_1 \\ x_2\\ {\bar{x}}_3\end{matrix}}\right] + B_2u\right) . \end{aligned}$$
(57)

We insert (56) and (57) into (55) and obtain

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{x}}_+^T(t){\hat{\varSigma }}^{-1} {\hat{x}}_+(t)\right] \le 4 \left\| u\right\| _{L^2_t}^2 +\int _0^t {\mathbb {E}}\left[ {\hat{x}}_+^T {\hat{\varSigma }}^{-1} {\hat{x}}_+\right] \left\| u^0\right\| _{2}^2 \mathrm{d}s-{\hat{\alpha }}_+(t). \end{aligned}$$

With Lemma 3 (‘Appendix’), analogously to (39), we find

$$\begin{aligned}&{\hat{\alpha }}_+(t)+\int _0^t {\hat{\alpha }}_+(s) \left\| u^0(s)\right\| _{2}^2 \exp \left( \int _s^t \left\| u^0(w)\right\| _{2}^2 \mathrm{d}w\right) \mathrm{d}s\nonumber \\&\quad \le 4\left\| u\right\| _{L^2_t}^2 \exp \left( \int _0^t \left\| u^0(s)\right\| _2^2ds\right) . \end{aligned}$$
(58)

The relations (49) and (58) yield the claim.
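In more detail, inserting (58) into the right side of (49) gives

$$\begin{aligned} {\mathbb {E}}\left\| {\bar{y}} - {\bar{y}}_r\right\| _{L^2_{t}}^2 \le 4\sigma ^2\left\| u\right\| _{L^2_t}^2 \exp \left( \int _0^t \left\| u^0(s)\right\| _2^2 \mathrm{d}s\right) , \end{aligned}$$

which is the estimate from which the claimed bound follows.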

5 Numerical experiments

We conduct a numerical experiment in order to compare several MOR techniques and to check the performance of the error bound in Theorem 3. We determine three different ROMs. The first is obtained by SPA as stated in (7). The corresponding output is denoted by \({\bar{y}}_{SPA}\). Moreover, we study a structure-preserving version of SPA that is obtained by setting \(B_2=0\) in (7), i.e., \(({\bar{B}}, {\bar{D}}, {\bar{E}}_k, {\bar{F}}_i)= (B_1, 0, 0, 0)\). This technique is denoted by SPA2 and its output is written as \({\bar{y}}_{SPA2}\). Notice that this method is a generalization of the one in [18]. Finally, we deal with BT [30], another structure-preserving scheme. The respective output is \({\bar{y}}_{BT}\).

In particular, we apply the different MOR variants to a heat transfer problem that was proposed in [30]. We consider a heat equation on \([0, 1]^2\):

$$\begin{aligned} \frac{\partial }{\partial t} X(t,\zeta )=\varDelta X(t, \zeta ),\quad \zeta \in [0, 1]^2,\quad t\in [0, T], \end{aligned}$$

with Dirichlet and noisy Robin boundary conditions

$$\begin{aligned} X(t,\zeta )&= u(t)\quad \text {on}\quad \varGamma _L:=\left\{ 0\right\} \times (0, 1), \\ \frac{\partial }{\partial {\mathbf {n}}} X(t,\zeta )&= \frac{1}{\sqrt{2}}\left( u(t)+\dot{w}(t)\right) X(t, \zeta )\quad \text {on}\quad \varGamma _R:=\left\{ 1\right\} \times [0, 1], \\ X(t,\zeta )&= 0 \quad \text {on}\quad \partial [0, 1]^2\setminus (\varGamma _L\cup \varGamma _R),\quad t\in [0, T], \end{aligned}$$

where \(u\in L^2_T\) is a scalar deterministic input and \(w\) denotes a scalar standard Wiener process. We discretize the heat equation with a finite difference scheme on an equidistant \({\tilde{n}} \times {\tilde{n}}\)-mesh. This leads to an \(n={\tilde{n}}^2\)-dimensional stochastic bilinear system

$$\begin{aligned} \begin{aligned} \mathrm{d}x(t)&=\left[ Ax(t)+Bu(t)+ \frac{1}{\sqrt{2}} N x(t)u(t)\right] \mathrm{d}t + \frac{1}{\sqrt{2}} N x(t) \mathrm{d}w(t),\\ y(t)&=Cx(t), \;\;\;t\in [0, T], \end{aligned} \end{aligned}$$
(59)

where \(C= \frac{1}{n} [\begin{matrix}1&1&\dots&1\end{matrix}]\), i.e., the average temperature is considered. We refer to [5] or [12] for more details on the matrices \(A\), \(B\) and \(N\). There, a similar example was investigated for deterministic bilinear systems and linear stochastic systems, respectively.
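Since the matrices \(A\), \(B\) and \(N\) are only referenced here, the following Python sketch merely indicates one possible finite-difference realization of (59); the grid ordering, the boundary scalings and the elimination of the ghost points are our own assumptions and may differ from the assembly used in [5, 12, 30].

```python
import numpy as np

def heat_bilinear_matrices(n_tilde=10):
    """Illustrative finite-difference assembly of (59) on [0, 1]^2.

    Sketch under stated assumptions: five-point stencil on the interior
    grid, unknowns ordered column by column so that the first n_tilde
    entries lie next to Gamma_L and the last n_tilde entries next to
    Gamma_R.
    """
    h = 1.0 / (n_tilde + 1)            # mesh width
    n = n_tilde ** 2                   # state dimension

    # 1D second-difference operator with homogeneous Dirichlet conditions
    T = (np.diag(-2.0 * np.ones(n_tilde))
         + np.diag(np.ones(n_tilde - 1), 1)
         + np.diag(np.ones(n_tilde - 1), -1)) / h ** 2
    I = np.eye(n_tilde)
    A = np.kron(I, T) + np.kron(T, I)  # discrete Laplacian on the square

    # Dirichlet control X = u on Gamma_L enters the stencil of the first column
    B = np.zeros((n, 1))
    B[:n_tilde, 0] = 1.0 / h ** 2

    # Robin condition on Gamma_R: eliminating the ghost points gives a
    # deterministic correction to A and the bilinear/noise matrix N; the
    # factor 1/sqrt(2) is kept outside N, as in (59)
    right = np.arange(n - n_tilde, n)
    A[right, right] += 1.0 / h ** 2
    N = np.zeros((n, n))
    N[right, right] = 1.0 / h

    C = np.full((1, n), 1.0 / n)       # output: average temperature
    return A, B, N, C
```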

According to (3) and (4), the associated Gramians are the solutions to

$$\begin{aligned} A^T P^{-1}+P^{-1}A+N^T P^{-1} N&\le -P^{-1}BB^T P^{-1},\nonumber \\ A^T Q+Q A+ N^T Q N&= -C^T C. \end{aligned}$$
(60)

We multiply the first inequality in (60) with P from the left and the right, which yields \(AP+PA^T+BB^T+PN^T P^{-1} N P\le 0\). Applying the Schur complement condition on definiteness, this can equivalently be written as the following linear matrix inequality:

$$\begin{aligned} \left[ \begin{array}{cc} A P+PA^T+B B^T &{} P N^T \\ N P &{} -P\end{array}\right] \le 0, \end{aligned}$$
(61)

see also [12, Remark III.2]. The matrix inequality (61) is now solved using YALMIP [24], minimizing \({\text {tr}}(P)\). LMI solvers are generally not suitable in a large-scale setting. Therefore, we choose \({\tilde{n}}=10\), implying \(n=100\).
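The computation just described relies on the MATLAB toolbox YALMIP. Purely as an illustration, a rough Python analogue (using cvxpy with the SCS solver for the LMI and a dense Kronecker solve for the generalized Lyapunov equation defining Q; both are our substitutions, not the setup used for the experiments) might look as follows.

```python
import numpy as np
import cvxpy as cp

def gramians(A, B, N, C):
    """Sketch of the Gramian computation (60)/(61).

    Assumptions: cvxpy/SCS stands in for the YALMIP/MATLAB setup of the
    paper, and Q is obtained from the generalized Lyapunov equation via a
    dense Kronecker solve (feasible for the moderate n used here only).
    """
    n = A.shape[0]

    # Reachability-type Gramian P: minimize tr(P) subject to the LMI (61)
    P = cp.Variable((n, n), symmetric=True)
    lmi = cp.bmat([[A @ P + P @ A.T + B @ B.T, P @ N.T],
                   [N @ P, -P]])
    lmi = 0.5 * (lmi + lmi.T)          # explicit symmetrization for the PSD constraint
    constraints = [lmi << 0, P >> 1e-8 * np.eye(n)]
    cp.Problem(cp.Minimize(cp.trace(P)), constraints).solve(solver=cp.SCS)

    # Observability-type Gramian Q: A^T Q + Q A + N^T Q N = -C^T C,
    # vectorized via vec(AXB) = (B^T kron A) vec(X) (column-major vec)
    I = np.eye(n)
    K = np.kron(I, A.T) + np.kron(A.T, I) + np.kron(N.T, N.T)
    q = np.linalg.solve(K, -(C.T @ C).reshape(-1, order='F'))
    Q = q.reshape((n, n), order='F')
    return P.value, Q
```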

As in [30], we set \(T=2\) and choose two different controls \(u(t)={\tilde{u}}(t), {\hat{u}}(t)\), where \({\tilde{u}}(t)=\cos (\pi t)\) and \({\hat{u}}(t)=\frac{1}{\sqrt{2}}\), \(t\in [0, T]\). We derive the ROMs using SPA, the modified structure-preserving version SPA2 and BT, all based on Q and P. We determine the reduced systems for \(r=3, 6, 9\). We have an error bound for SPA (Theorem 3) and BT [30] but none for SPA2. The bound for BT and SPA is \({{\mathcal {E}}}{{\mathcal {B}}}_r:=2\left( \sum _{i=r+1}^{100}\sigma _i\right) \left\| u\right\| _{L^2_T}\exp \left( 0.5 \left\| u\right\| _{L^2_T}^2\right) \). Notice that \(u^0\equiv u\) in this example. We compute \(\sqrt{{\mathbb {E}}\left\| y-{\bar{y}}_{l}\right\| _{L^2_{T}}^2}\) for \(l=SPA, BT, SPA2\) in Tables 1, 2 and 3.
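For orientation, the bound \({{\mathcal {E}}}{{\mathcal {B}}}_r\) is cheap to evaluate once \(P\) and \(Q\) are available. The following sketch (all function names are hypothetical) computes the singular values \(\sigma _i\) as the square roots of the eigenvalues of \(PQ\), as is standard for balancing related methods, and exploits that both controls have unit \(L^2_T\)-norm, since \(\int _0^2\cos ^2(\pi t)\,\mathrm{d}t=1=\int _0^2\tfrac{1}{2}\,\mathrm{d}t\), so that the bound coincides for \({\tilde{u}}\) and \({\hat{u}}\).

```python
import numpy as np

def singular_values(P, Q):
    """sigma_i as square roots of the eigenvalues of P Q (sketch only)."""
    eigs = np.linalg.eigvals(P @ Q).real
    return np.sort(np.sqrt(np.clip(eigs, 0.0, None)))[::-1]

def error_bound(sigma, r, u_norm):
    """EB_r = 2 * sum_{i>r} sigma_i * ||u||_{L^2_T} * exp(0.5 * ||u||_{L^2_T}^2)."""
    return 2.0 * np.sum(sigma[r:]) * u_norm * np.exp(0.5 * u_norm ** 2)

# Both controls used here satisfy ||u||_{L^2_2} = 1.
# sigma = singular_values(P, Q)    # P, Q from the Gramian computation above
# for r in (3, 6, 9):
#     print(r, error_bound(sigma, r, u_norm=1.0))
```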

Table 1 \({L^2_T}\)-error of SPA with ROM defined in (7) and the respective error bound for different reduced order dimensions r, \(T=2\) and controls \(u={\tilde{u}}, {\hat{u}}\)
Table 2 \({L^2_T}\)-error of BT studied in [30] and the respective error bound for different reduced order dimensions r, \(T=2\) and controls \(u={\tilde{u}}, {\hat{u}}\)
Table 3 \({L^2_T}\)-error of SPA2 with ROM that is obtained by setting \(B_2=0\) in (7) for different reduced order dimensions r, \(T=2\) and controls \(u={\tilde{u}}, {\hat{u}}\)

We can see by looking at Tables 1 and 3 that SPA performs clearly better than the structure-preserving variant SPA2. This tells us that it is worthwhile to allow a change in structure, since this can lead to better approximations. We can also see that the error bound for SPA is relatively tight. It is even tighter for BT, compare with Table 2. However, since the bound is the same for both methods, this also means that BT performs worse than SPA. Consequently, SPA is the best choice for the example considered here.

6 Conclusions

In this paper, we investigated a large-scale stochastic bilinear system. In order to reduce the state space dimension, a model order reduction technique called singular perturbation approximation was extended to this setting. This method is based on Gramians proposed in [30] that characterize how much a state contributes to the system dynamics or to the output of the system. This choice of Gramians, as well as the structure of the reduced system, differs from that in [18]. With this modification, we provided a new \(L^2\)-error bound that can be used to identify the cases in which the reduced order model obtained by singular perturbation approximation delivers a good approximation to the original model. This error bound is new even for deterministic bilinear systems. Its quality was tested in a numerical experiment.