In this section, we discuss a low-rank approximation method proposed by Kressner and Sirković in [25]. We show that several results can be generalized from the case of the standard Lyapunov equation to the more general form (1). Moreover, we show that in the symmetric case the method allows for an interpretation in terms of \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal model reduction for bilinear control systems. With this in mind, we assume that we have a symmetric generalized Lyapunov equation (7). If additionally \(A \prec 0\) and \(\rho ({{\,\mathrm{{\mathscr {L}}}\,}}^{-1}\varPi )<1\), then the operator \({\mathscr {M}}(X):=-{{\,\mathrm{{\mathscr {L}}}\,}}(X)-\varPi (X)\) is positive definite and allows us to define a weighted inner product via
$$\begin{aligned} \begin{aligned} \langle \cdot ,\cdot \rangle _{{\mathscr {M}}}&:{\mathbb {R}}^{n\times n} \times {\mathbb {R}}^{n \times n} \rightarrow {\mathbb {R}}\\ \langle X,Y \rangle _{{\mathscr {M}}}&= \langle X,{\mathscr {M}}(Y) \rangle = {{\,\mathrm{trace}\,}}\left( X^T {\mathscr {M}}(Y) \right) , \end{aligned} \end{aligned}$$
with a corresponding induced \({\mathscr {M}}\)-norm, also known as energy norm,
$$\begin{aligned} \Vert X\Vert _{{\mathscr {M}}}^2 = \langle X,X \rangle _{{\mathscr {M}}}. \end{aligned}$$
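For illustration, the following minimal Python sketch sets up a small symmetric example with \(m=1\) and evaluates the operator \({\mathscr {M}}\) and the weighted inner product. The dimensions, the scaling of N, and the helper names are ad hoc choices (not taken from [25]), made only so that \({\mathscr {M}}\) is safely positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n))
A = -(G @ G.T + n * np.eye(n))      # symmetric, A < 0
N = 0.1 * (G + G.T)                 # symmetric, small enough that M stays positive definite
B = rng.standard_normal((n, 2))

def M_op(X):
    """M(X) = -L(X) - Pi(X) = -(A X + X A) - N X N  (symmetric case, m = 1)."""
    return -(A @ X + X @ A) - N @ X @ N

def M_inner(X, Y):
    """Weighted inner product <X, Y>_M = trace(X^T M(Y))."""
    return np.trace(X.T @ M_op(Y))

def M_norm(X):
    return np.sqrt(M_inner(X, X))

# symmetry and positivity of the weighted inner product on random test matrices
X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
print(np.isclose(M_inner(X, Y), M_inner(Y, X)), M_inner(X, X) > 0)
```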
ALS for the generalized Lyapunov equation
In [25], it is suggested to construct iterative approximations \({\hat{X}}_k\) by rank-1 updates that are locally optimal with respect to the \({\mathscr {M}}\)-norm. To be more precise, assume that X is a solution to the symmetric generalized Lyapunov equation (7), i.e., \( AX + XA + \sum _{i=1}^m N_i X N_i + BB^T=0. \) Given an approximation \({\hat{X}}_k\), we consider the minimization problem
$$\begin{aligned} \min _{v,w\in {\mathbb {R}}^n} \Vert X-{\hat{X}}_k - v w^T\Vert _{{\mathscr {M}}}^2&= \langle X-{\hat{X}}_k - v w^T, X-{\hat{X}}_k - v w^T \rangle _{{\mathscr {M}}}. \end{aligned}$$
Since the term \(\Vert X-{\hat{X}}_k\Vert ^2_{{\mathscr {M}}}\) is constant with respect to v and w, it suffices to focus on
$$\begin{aligned} J(v,w):= \langle vw^T , vw^T \rangle _{{\mathscr {M}}} - 2 {{\,\mathrm{trace}\,}}\left( wv^T {\mathscr {R}}_k\right) , \end{aligned}$$
(10)
where \({\mathscr {R}}_k\) is the current residual, i.e., (5). Locally optimal vectors \(v_k\) and \(w_k\) are then (approximately) determined via an alternating linear scheme (ALS). The main step is to fix one of the two vectors, say v, and then minimize the resulting strictly convex objective function to obtain an update for w. Pseudocode is given in Algorithm 1.
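Algorithm 1 is not reproduced here; the following Python sketch, continuing the example above, implements the alternating scheme as just described, where each inner minimization amounts to the linear solve obtained from the first-order optimality condition of J (cf. the gradient formula and (12) below). The function name, the inner iteration limit, and the stopping test are illustrative choices and need not coincide with Algorithm 1.

```python
def als_step(v, w, R, A, Ns, iters=100, tol=1e-12):
    """One ALS run for J(v, w) = <v w^T, v w^T>_M - 2 trace(w v^T R):
    alternately fix one vector and minimize the strictly convex objective in the other.
    Each inner minimization is the linear solve from the first-order condition
    (cf. the gradient formula and (12) below)."""
    I = np.eye(A.shape[0])
    for _ in range(iters):
        v_old = v.copy()
        # minimize over v with w fixed
        K = -(w @ w) * A - (w @ A @ w) * I - sum((w @ Ni @ w) * Ni for Ni in Ns)
        v = np.linalg.solve(K, R @ w)
        # minimize over w with v fixed
        K = -(v @ v) * A - (v @ A @ v) * I - sum((v @ Ni @ v) * Ni for Ni in Ns)
        w = np.linalg.solve(K, R.T @ v)
        if np.linalg.norm(v - v_old) <= tol * np.linalg.norm(v):
            break
    return v, w

# a locally optimal pair for the initial residual R_0 = B B^T
v0 = rng.standard_normal(n)
v1, w1 = als_step(v0, v0.copy(), B @ B.T, A, [N])
```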
In view of Proposition 3, the ALS-based approach for computing new subspace extensions can be seen as searching for an approximation to \(X_k^\text {e}\) of the form \(v_kw_k^T\) by iterating \(({{\,\mathrm{{\mathscr {L}}}\,}}(v_kw_k^T) + \varPi (v_kw_k^T) + {\mathscr {R}}_k)w_k = 0\) when determining \(v_k\), and \(v_k^T({{\,\mathrm{{\mathscr {L}}}\,}}(v_kw_k^T) + \varPi (v_kw_k^T) + {\mathscr {R}}_k) = 0\) when determining \(w_k\). That is, the error is approximated by a rank-1 matrix, and at convergence this would result in the new residual, \({\mathscr {R}}_{k+1}\), being left-orthogonal to \(v_{k}\) and right-orthogonal to \(w_{k}\). In the symmetric case, local minimizers of (10) are necessarily symmetric positive semidefinite. This yields the following extension of [25, Lemma 2.3].
Lemma 9
Consider the symmetric generalized Lyapunov equation (7) and assume that \(A\prec 0\), \(\rho ({{\,\mathrm{{\mathscr {L}}}\,}}^{-1}\varPi )<1\), and \({\mathscr {R}}_k={\mathscr {R}}_k^T\succeq 0\). Let J be as in (10). Then every local minimum \((v_*,w_*)\) of J is such that \(v_*w_*^T\) is symmetric positive semidefinite.
Proof
The proof follows along the lines of [25, Lemma 2.3]; hence, without loss of generality, we assume that \(v_*\ne 0\), \(w_*\ne 0\), and \(\Vert v_*\Vert =\Vert w_*\Vert \). Thus \(v_*w_*^T\) is positive semidefinite if and only if \(v_*=w_*\). The proof is by contradiction, so assume that \(v_*\ne w_*\). Then, since \(J(v_*,w)\) is strictly convex in w and \(J(v,w_*)\) is strictly convex in v, it follows that
$$\begin{aligned} 2 J(v_*,w_*)&< J(v_*,v_*) + J(w_*,w_*). \end{aligned}$$
Simplifying the left-hand side, we get
$$\begin{aligned} 2 J(v_*,w_*) = -2 v_*^T{{\,\mathrm{{\mathscr {L}}}\,}}(v_*w_*^T)w_* -2 v_*^T\varPi (v_*w_*^T)w_* - 4 v_*^T{\mathscr {R}}_k w_*, \end{aligned}$$
and similarly the right-hand side gives
$$\begin{aligned} J(v_*,v_*) + J(w_*,w_*) =&- v_*^T{{\,\mathrm{{\mathscr {L}}}\,}}(v_*v_*^T)v_* - v_*^T\varPi (v_*v_*^T)v_* - 2 v_*^T{\mathscr {R}}_k v_* \\&- w_*^T{{\,\mathrm{{\mathscr {L}}}\,}}(w_*w_*^T)w_* - w_*^T\varPi (w_*w_*^T)w_* - 2 w_*^T{\mathscr {R}}_k w_*. \end{aligned}$$
Collecting the terms involving the \({{\,\mathrm{{\mathscr {L}}}\,}}\)-operator we observe that
$$\begin{aligned}&-2 v_*^T{{\,\mathrm{{\mathscr {L}}}\,}}(v_*w_*^T)w_* + v_*^T{{\,\mathrm{{\mathscr {L}}}\,}}(v_*v_*^T)v_* + w_*^T{{\,\mathrm{{\mathscr {L}}}\,}}(w_*w_*^T)w_* \\&\quad =2(v_*^Tv_*)(v_*^TAv_* - w_*^TAw_*) + 2w_*^Tw_*(w_*^TAw_* - v_*^TAv_*) = 0. \end{aligned}$$
Thus, by collecting the terms involving the \(\varPi \)-operator on the left and the residual terms on the right, the inequality reduces to
$$\begin{aligned} -2 v_*^T\varPi (v_*w_*^T)w_* + v_*^T\varPi (v_*v_*^T)v_* + w_*^T\varPi (w_*w_*^T)w_* < -2 (v_*-w_*)^T {\mathscr {R}}_k (v_*-w_*). \end{aligned}$$
The argument is now concluded by showing that
$$\begin{aligned} -2 v_*^T\varPi (v_*w_*^T)w_* + v_*^T\varPi (v_*v_*^T)v_* + w_*^T\varPi (w_*w_*^T)w_* \ge 0, \end{aligned}$$
since this implies that \((v_*-w_*)^T {\mathscr {R}}_k (v_*-w_*)<0\), contradicting the positive semidefiniteness of \({\mathscr {R}}_k\). We can, without loss of generality, consider \(m=1\), i.e., only one N-matrix, since the following argument applies to each term of the sum independently. We observe that
$$\begin{aligned} -2 v_*^T N v_*w_*^T N w_* + v_*^TNv_*v_*^TNv_* + w_*^TNw_*w_*^TNw_* = (v_*^TNv_* - w_*^TNw_*)^2 \ge 0, \end{aligned}$$
which shows the desired inequality and thus concludes the proof. \(\square \)
Algorithm 1 and the argument in Lemma 9 are formulated in terms of a residual. Note, however, that if \({{\hat{X}}}_k = 0\), then \({\mathscr {R}}_k = BB^T\), and hence the result applies directly to any symmetric generalized Lyapunov equation. The focus on the residual is natural since it leads to the following extension of [25, Theorem 2.4] to the case of the symmetric generalized Lyapunov equation.
Theorem 10
Consider the symmetric generalized Lyapunov equation (7) with the additional assumptions that \(A\prec 0\) and \(\rho ({{\,\mathrm{{\mathscr {L}}}\,}}^{-1}\varPi )<1\). Moreover, consider the sequence of approximations constructed as
$$\begin{aligned} \begin{aligned} {{\hat{X}}}_0&= 0\\ {{\hat{X}}}_{k+1}&= {{\hat{X}}}_k + v_{k+1}v_{k+1}^T, \qquad k = 0,1,\dots , \end{aligned} \end{aligned}$$
(11)
where \(v_{k+1}\) is a locally optimal vector computed with ALS (Algorithm 1). Then \({\mathscr {R}}_{k+1}={\mathscr {R}}_{k+1}^T \succeq 0\) for all \(k\ge -1\).
Proof
We show the assertion by induction. It clearly holds that \({\mathscr {R}}_{0}={\mathscr {R}}_{0}^T\succeq 0\). Now assume that this is the case for some k. By Lemma 9 the local minimizers of (10) are symmetric, and hence \({{\hat{X}}}_{k+1}\) is well defined in (11). Moreover, since \({{\hat{X}}}_{k+1}\) and the operators in (1) are symmetric, it follows that \({\mathscr {R}}_{k+1}\) is symmetric. It thus remains to show that \({\mathscr {R}}_{k+1} \succeq 0\), which holds if and only if \(y^T{\mathscr {R}}_{k+1}y\ge 0\) for all \(y\in {\mathbb {R}}^{n}\). Hence take an arbitrary \(y\in {\mathbb {R}}^{n}\) and consider \(y^T{\mathscr {R}}_{k+1}y\). We derive properties similar to [25, equations (12)–(14)]:
Since \((v_{k+1},v_{k+1})\) is a (local) minimizer of J(v, w), it also follows that \(v_{k+1}\) is a (global) minimizer of the (convex) cost function
$$\begin{aligned} J_w(v):=J(v,w)=\langle vw^T,vw^T\rangle _{{\mathscr {M}}} - 2{{\,\mathrm{trace}\,}}(wv^T{\mathscr {R}}_k), \end{aligned}$$
where \(w=v_{k+1}\). Note that the gradient \(\nabla _vJ_w\) of \(J_w\) with respect to v is given by
$$\begin{aligned} (\nabla _vJ_w)_i =2\langle e_iw^T,vw^T \rangle _{{\mathscr {M}}}- 2e_i^T {\mathscr {R}}_k w . \end{aligned}$$
Due to the optimality of \(v_{k+1}\) with respect to \(J_{v_{k+1}}\), the first-order optimality conditions then imply that
$$\begin{aligned} -Av_{k+1}v_{k+1}^Tv_{k+1}-v_{k+1}v_{k+1}^TAv_{k+1}-\sum _{i=1}^mN_iv_{k+1}v_{k+1}^TN_iv_{k+1}={\mathscr {R}}_k v_{k+1} . \end{aligned}$$
(12)
Multiplying this equality from the left by \(v_{k+1}^T\) yields
$$\begin{aligned} 2 v_{k+1}^TA v_{k+1} \Vert v_{k+1}\Vert ^2 = - v_{k+1}^T {\mathscr {R}}_{k} v_{k+1} - \sum _{i=1}^m ( v_{k+1}^T N_i v_{k+1})^2 . \end{aligned}$$
(13)
Based on (12) and its transpose, and by exploiting the symmetry of the involved matrices, we can write the residual as
$$\begin{aligned}&y^T{\mathscr {R}}_{k+1}y = y^T{\mathscr {R}}_{k} y + y^T\left( Av_{k+1}v_{k+1}^T + v_{k+1}v_{k+1}^TA + \sum _{i=1}^m N_iv_{k+1}v_{k+1}^T N_i\right) y \\&\quad = y^T{\mathscr {R}}_{k} y + \sum _{i=1}^m y^TN_iv_{k+1}v_{k+1}^T N_i y + \frac{1}{\Vert v_{k+1}\Vert ^2}y^T( U_{k+1} + U_{k+1}^T)y, \end{aligned}$$
with \(U_{k+1} := -{\mathscr {R}}_{k} v_{k+1}v_{k+1}^T - ( v_{k+1}^TA v_{k+1}) v_{k+1}v_{k+1}^T - \sum _{i=1}^m N_iv_{k+1}z_{i,k+1}^T\), and where \(z_{i,k+1}:=(v_{k+1}^T N_iv_{k+1})v_{k+1}\). We rearrange, identify the term \(-2( v_{k+1}^T A v_{k+1}) v_{k+1} v_{k+1}^T\) and insert (13) to get
$$\begin{aligned}&y^T{\mathscr {R}}_{k+1}y = y^T{\mathscr {R}}_{k} y \\&\quad + \frac{1}{\Vert v_{k+1}\Vert ^2}y^T\left( -{\mathscr {R}}_{k} v_{k+1}v_{k+1}^T -v_{k+1}v_{k+1}^T{\mathscr {R}}_{k} + \frac{1}{\Vert v_{k+1}\Vert ^2}v_{k+1}^T{\mathscr {R}}_{k}v_{k+1} v_{k+1} v_{k+1}^T \right) y \\&\quad +\frac{1}{\Vert v_{k+1}\Vert ^2} y^T \left( \sum _{i=1}^m N_i v_{k+1}v_{k+1}^TN_i \Vert v_{k+1}\Vert ^2 + \frac{1}{\Vert v_{k+1}\Vert ^2}\sum _{i=1}^m z_{i,k+1}z_{i,k+1}^T\right) y \\&\quad +\frac{1}{\Vert v_{k+1}\Vert ^2} y^T\left( - \sum _{i=1}^m N_iv_{k+1}z_{i,k+1}^T - \sum _{i=1}^m z_{i,k+1}v_{k+1}^T N_i\right) y \\&= y^T{\mathscr {R}}_{k} y + \frac{1}{\Vert v_{k+1}\Vert ^2} \left( -2(y^T{\mathscr {R}}_{k} v_{k+1})(v_{k+1}^Ty) + \frac{1}{\Vert v_{k+1}\Vert ^2}( v_{k+1}^T{\mathscr {R}}_{k}v_{k+1})( v_{k+1}^Ty)^2 \right) \\&\quad + \frac{1}{\Vert v_{k+1}\Vert ^2}\Big (\sum _{i=1}^m (y^TN_iv_{k+1})^2\Vert v_{k+1}\Vert ^2 + \frac{1}{\Vert v_{k+1}\Vert ^2} ( z_{i,k+1}^Ty)^2 - 2 (y^TN_i v_{k+1})( z_{i,k+1}^T y ) \Big ) \\&= (y - v_{k+1} \frac{v_{k+1}^Ty}{\Vert v_{k+1}\Vert ^2} )^T{\mathscr {R}}_{k}(y - v_{k+1} \frac{v_{k+1}^Ty}{\Vert v_{k+1}\Vert ^2} ) \\&\quad + \frac{1}{\Vert v_{k+1}\Vert ^2}\sum _{i=1}^m\left( \Vert v_{k+1}\Vert (y^TN_i v_{k+1}) - \frac{1}{\Vert v_{k+1}\Vert }( z_{i,k+1}^T y)\right) ^2 \ge 0. \end{aligned}$$
This establishes the inductive step and hence concludes the proof. \(\square \)
Corollary 11
The iteration (11) produces an increasing sequence of approximations \(0={{\hat{X}}}_0 \preceq {{\hat{X}}}_1\preceq \cdots \preceq X\).
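Continuing the sketches above, the greedy iteration (11) can be illustrated as follows. Starting ALS from a symmetric pair keeps \(v=w\) (cf. Lemma 18 below), so each update is a symmetric rank-1 term, and printing the smallest eigenvalue of the residual illustrates Theorem 10; the fixed number of terms is an arbitrary choice for this toy example.

```python
def greedy_rank_one(A, Ns, B, num_terms=4):
    """Greedy iteration (11): X_0 = 0, X_{k+1} = X_k + v_{k+1} v_{k+1}^T,
    with v_{k+1} a (locally optimal) ALS vector for the current residual."""
    n = A.shape[0]
    X_hat = np.zeros((n, n))
    R = B @ B.T                                   # residual for X_hat_0 = 0
    for _ in range(num_terms):
        v0 = rng.standard_normal(n)
        v, w = als_step(v0, v0.copy(), R, A, Ns)  # symmetric start => v = w (Lemma 18)
        X_hat = X_hat + np.outer(v, w)
        # R_{k+1} = L(X_hat) + Pi(X_hat) + B B^T
        R = A @ X_hat + X_hat @ A + sum(Ni @ X_hat @ Ni for Ni in Ns) + B @ B.T
        # Theorem 10: the residual stays positive semidefinite (up to round-off)
        print("lambda_min(R_k) =", np.linalg.eigvalsh((R + R.T) / 2).min())
    return X_hat

X_greedy = greedy_rank_one(A, [N], B)
```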
\({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal model reduction for symmetric state space systems
For the standard Lyapunov equation it has been shown in [7] that minimizing the energy norm induced by the Lyapunov operator (see [40]) is related to \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal model reduction for linear control systems. We show that a similar conclusion can be drawn for the minimization of the cost functional (10) and \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal model reduction for symmetric bilinear control systems. In this regard, let us briefly summarize the most important concepts from bilinear model reduction. Given a bilinear system \(\varSigma \) as in (8) with \(\mathrm {dim}(\varSigma )=n,\) the goal of model reduction is to construct a surrogate model \({\widehat{\varSigma }}\) of the form
$$\begin{aligned} {\widehat{\varSigma }}:\left\{ \begin{aligned} \dot{\phantom {1^{-}}{\widehat{x}}}(t)&= {\hat{A}}{\widehat{x}}(t) + \sum _{i=1}^m {\hat{N}}_i {\widehat{x}}(t) w_i(t) + {\hat{B}}u(t)\\ {\widehat{y}}(t)&= {\hat{C}} {\widehat{x}}(t), \end{aligned}\right. \end{aligned}$$
(14)
with \({\hat{A}},{\hat{N}}_i \in {\mathbb {R}}^{k\times k}, {\hat{B}}\in {\mathbb {R}}^{k\times r}, {\hat{C}}\in {\mathbb {R}}^{r\times k}\) and control inputs \(u(t)\in {\mathbb {R}}^{r}\) and \(w(t)\in {\mathbb {R}}^{m}\). In particular, the reduced system should satisfy \(k\ll n\) and \({\widehat{y}}(t)\approx y(t)\) in an appropriate norm. In [5, 19] the authors suggested an algorithm, BIRKA, that iteratively aims at computing a reduced model satisfying first-order necessary conditions for \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimality with respect to the bilinear \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-norm from Definition 8. Corresponding pseudocode is given in Algorithm 2.
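The reduced matrices used throughout this subsection are obtained by Galerkin projection onto the range of V. A small sketch continuing the running example (the particular V below is an arbitrary orthonormal basis chosen only for illustration; BIRKA itself, Algorithm 2, is not reproduced):

```python
def project_bilinear(A, Ns, B, C, V):
    """Galerkin projection of (8) onto range(V): A_hat = V^T A V, N_hat_i = V^T N_i V,
    B_hat = V^T B, C_hat = C V, cf. (14)."""
    return V.T @ A @ V, [V.T @ Ni @ V for Ni in Ns], V.T @ B, C @ V

# an arbitrary orthonormal basis of dimension k = 2, only for illustration
V, _ = np.linalg.qr(rng.standard_normal((n, 2)))
A_hat, Ns_hat, B_hat, C_hat = project_bilinear(A, [N], B, B.T, V)
```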
To establish the connection we introduce the following generalizations of the operator \({\mathscr {M}}\):
$$\begin{aligned} \widetilde{{\mathscr {M}}}&:{\mathbb {R}}^{n\times k} \rightarrow {\mathbb {R}}^{n\times k}, \quad \widetilde{{\mathscr {M}}}(X) := - A X - X {{\hat{A}}^T} -\sum _{i=1}^mN_i X {{\hat{N}}_i^T}, \\ \widehat{{\mathscr {M}}}&:{\mathbb {R}}^{k\times k} \rightarrow {\mathbb {R}}^{k\times k}, \quad \widehat{{\mathscr {M}}}(X) := - {\hat{A}} X - X {{\hat{A}}^T} -\sum _{i=1}^m {\hat{N}}_i X {{\hat{N}}_i^T}, \end{aligned}$$
where \({\hat{A}}=V^T A V, {\hat{N}}_i=V^TN_i V\) for \(i=1,\dots ,m\), and \(V\in {\mathbb {R}}^{n\times k}\) is orthogonal. Our first result concerns the invertibility of the operators \(\widetilde{{\mathscr {M}}}\) and \(\widehat{{\mathscr {M}}}\).
Proposition 12
If \(\sigma ({\mathscr {M}})=-\sigma ({{\,\mathrm{{\mathscr {L}}}\,}}+\varPi )\subset {\mathbb {C}}_+\) then \(\sigma (\widetilde{{\mathscr {M}}})\subset {\mathbb {C}}_+\) and \(\sigma (\widehat{{\mathscr {M}}})\subset {\mathbb {C}}_+\).
Proof
Note that \(\sigma (\widetilde{{\mathscr {M}}})\) is determined by the eigenvalues of the matrix
$$\begin{aligned} \widetilde{{\mathbf {M}}} := -I{{\,\mathrm{\otimes }\,}}A - {\hat{A}} {{\,\mathrm{\otimes }\,}}I - \sum _{i=1}^m {\hat{N}}_i {{\,\mathrm{\otimes }\,}}N_i . \end{aligned}$$
(15)
Similarly, we obtain \(\sigma ({\mathscr {M}})\) by computing the eigenvalues of the matrix
$$\begin{aligned} {\mathbf {M}} := -I{{\,\mathrm{\otimes }\,}}A - A {{\,\mathrm{\otimes }\,}}I - \sum _{i=1}^m N_i{{\,\mathrm{\otimes }\,}}N_i . \end{aligned}$$
(16)
Since A and \(N_i\) are symmetric, \({\mathbf {M}}={\mathbf {M}}^T\), and together with the assumption \(\sigma ({\mathscr {M}})\subset {\mathbb {C}}_+\) this yields \({\mathbf {M}}\succ 0\). Let us then define the orthogonal matrix \({\mathbf {V}}= { V{{\,\mathrm{\otimes }\,}}I}\). It follows that \(\widetilde{{\mathbf {M}}}={\mathbf {V}}^T {\mathbf {M}} {\mathbf {V}}\) and, consequently, \(\widetilde{{\mathbf {M}}} =\widetilde{{\mathbf {M}}}^T\succ 0\). A similar argument with \({\mathbf {V}}=V{{\,\mathrm{\otimes }\,}}V\) shows the second assertion. \(\square \)
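The Kronecker representations (15) and (16), and the congruence \(\widetilde{{\mathbf {M}}}={\mathbf {V}}^T {\mathbf {M}} {\mathbf {V}}\) used in the proof, can be checked directly on the running example (column-stacking vec convention; the function names are ours):

```python
def kron_M(A, Ns):
    """Matrix representation (16) of M (column-stacking vec convention)."""
    I = np.eye(A.shape[0])
    return -np.kron(I, A) - np.kron(A, I) - sum(np.kron(Ni, Ni) for Ni in Ns)

def kron_M_tilde(A, Ns, A_hat, Ns_hat):
    """Matrix representation (15) of the mixed operator M_tilde."""
    In, Ik = np.eye(A.shape[0]), np.eye(A_hat.shape[0])
    return (-np.kron(Ik, A) - np.kron(A_hat, In)
            - sum(np.kron(Nh, Ni) for Nh, Ni in zip(Ns_hat, Ns)))

M_mat = kron_M(A, [N])
Mt_mat = kron_M_tilde(A, [N], A_hat, Ns_hat)
V_kron = np.kron(V, np.eye(n))                         # V (x) I
print(np.allclose(Mt_mat, V_kron.T @ M_mat @ V_kron))  # congruence used in the proof
print(np.linalg.eigvalsh(M_mat).min() > 0, np.linalg.eigvalsh(Mt_mat).min() > 0)
```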
Given a reduced bilinear system, we naturally obtain an approximate solution to the generalized Lyapunov equation. Moreover, the error with respect to the \({\mathscr {M}}\)-norm is given by the difference of the \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-norms of the original and the reduced system.
Proposition 13
Let \(\varSigma \) denote a bilinear system (8) and let \(A=A^T\prec 0,N_i=N_i^T\) for \(i=1,\dots ,m\), and \(B=C^T\). Assume that \(\rho ({{\,\mathrm{{\mathscr {L}}}\,}}^{-1}\varPi )<1\). Given an orthogonal \(V\in {\mathbb {R}}^{n \times k},k< n,\) define \({\widehat{\varSigma }}\), the reduced bilinear system (14), via \({\hat{A}}=V^T A V, {\hat{N}}_i=V^TN_i V\) and \({\hat{B}}=V^TB={\hat{C}}^T.\) Let X be the solution to \({\mathscr {M}}( X) = BB^T\), and let \({\hat{X}}\) be the solution to \(\widehat{{\mathscr {M}}}({{\hat{X}}}) = {\hat{B}}{\hat{B}}^T\). Then
$$\begin{aligned} \Vert X-V{\hat{X}}V^T \Vert _{{\mathscr {M}}}^2 = \Vert \varSigma \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 - \Vert {\widehat{\varSigma }} \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2. \end{aligned}$$
Proof
By assumption it holds that \({\mathscr {M}}\) and \(\widehat{{\mathscr {M}}}\) are invertible and the controllability Gramians X and \({{\hat{X}}}\) exist. We observe that \(\Vert X\Vert _{{\mathscr {M}}}^2 = {{\,\mathrm{trace}\,}}(XBB^T)=\Vert \varSigma \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2\) and that \(\langle V{{\hat{X}}} V^T, X \rangle _{{\mathscr {M}}} = {{\,\mathrm{trace}\,}}(V {{\hat{X}}} V^T BB^T)=\Vert {\widehat{\varSigma }}\Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2\). Moreover, for the reduced system we obtain
$$\begin{aligned} \widehat{{\mathscr {M}}}({\hat{X}})&= -V^T ( AV{\hat{X}}V^T + V{\hat{X}}V^T A + \sum _{i=1}^m N_i V{\hat{X}}V^T N_i ) V = V^T {\mathscr {M}}(V{\hat{X}}V^T)V , \end{aligned}$$
which implies that \(\Vert V{{\hat{X}}} V^T\Vert _{{\mathscr {M}}}^2 = {{\,\mathrm{trace}\,}}({\hat{X}} \widehat{{\mathscr {M}}}({\hat{X}}) ) = \Vert {\widehat{\varSigma }}\Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2\). Hence, we obtain
$$\begin{aligned} \Vert X-V{\hat{X}}V^T \Vert _{{\mathscr {M}}}^2&= \Vert X\Vert _{{\mathscr {M}}}^2 + \Vert V{{\hat{X}}} V^T\Vert _{{\mathscr {M}}}^2 - 2 \langle V{{\hat{X}}} V^T, X \rangle _{{\mathscr {M}}} = \Vert \varSigma \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 - \Vert {\widehat{\varSigma }}\Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2. \end{aligned}$$
\(\square \)
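Proposition 13 can be verified numerically on the running example by solving both generalized Lyapunov equations through their Kronecker forms, which is viable only for small dimensions; the helper below is an illustration, not a recommended solver.

```python
def solve_gen_lyap(A, Ns, B):
    """Solve M(X) = B B^T via the Kronecker form (16); feasible only for small n."""
    n = A.shape[0]
    x = np.linalg.solve(kron_M(A, Ns), (B @ B.T).reshape(-1, order="F"))
    return x.reshape(n, n, order="F")

X_full = solve_gen_lyap(A, [N], B)               # Gramian of Sigma:      M(X) = B B^T
X_red = solve_gen_lyap(A_hat, Ns_hat, B_hat)     # Gramian of Sigma_hat:  M_hat(X_hat) = B_hat B_hat^T
h2_full = np.trace(B.T @ X_full @ B)             # ||Sigma||_{H2}^2     = trace(X B B^T)
h2_red = np.trace(B_hat.T @ X_red @ B_hat)       # ||Sigma_hat||_{H2}^2
err = X_full - V @ X_red @ V.T
print(np.isclose(M_inner(err, err), h2_full - h2_red))   # identity of Proposition 13
```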
Extending the results from [7], we obtain that the \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-norm of the error system \(\varSigma -{\widehat{\varSigma }}\) is a lower bound for the quantities in Proposition 13.
Proposition 14
Let \(\varSigma \) denote a bilinear system (8) and let \(A=A^T\prec 0,N_i=N_i^T\) for \(i=1,\dots ,m\), and \(B=C^T\). Assume that \(\rho ({{\,\mathrm{{\mathscr {L}}}\,}}^{-1}\varPi )<1\). Given an orthogonal \(V\in {\mathbb {R}}^{n \times k},k< n,\) define \({\widehat{\varSigma }}\), the reduced bilinear system (14), via \({\hat{A}}=V^T A V, {\hat{N}}_i=V^TN_i V\) and \({\hat{B}}=V^TB={\hat{C}}^T.\) Then, it holds
$$\begin{aligned} \Vert \varSigma - {\widehat{\varSigma }} \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 \le \Vert \varSigma \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 - \Vert {\widehat{\varSigma }} \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2, \end{aligned}$$
with equality if \({\widehat{\varSigma }}\) is locally \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal.
Proof
The proof follows by arguments similar to those used in [7, Lemma 3.1]. By definition of the \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-norm for bilinear systems
$$\begin{aligned} \Vert \varSigma - {\widehat{\varSigma }}\Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 = {{\,\mathrm{trace}\,}}\left( \begin{bmatrix} B^T&-{\hat{B}}^T \end{bmatrix} X_e \begin{bmatrix} B \\ -{\hat{B}} \end{bmatrix}\right) , \end{aligned}$$
where \(X_e= \begin{bmatrix} X&Y \\ Y^T&{\hat{X}}\end{bmatrix}\) is the solution of
$$\begin{aligned} \begin{bmatrix} A&0\\ 0&{\hat{A}} \end{bmatrix} X_e + X_e \begin{bmatrix} A&0\\ 0&{\hat{A}} \end{bmatrix} + \sum _{i=1}^m \begin{bmatrix} N_i&0\\ 0&{\hat{N}}_i \end{bmatrix} X_e \begin{bmatrix} N_i&0\\ 0&{\hat{N}}_i \end{bmatrix} + \begin{bmatrix} B \\ {\hat{B}} \end{bmatrix}\begin{bmatrix} B^T&{\hat{B}}^T \end{bmatrix}=0. \end{aligned}$$
Analyzing the block structure of \(X_e\) and adding and subtracting \(\Vert {\widehat{\varSigma }}\Vert ^2_{{{\,\mathrm{{\mathscr {H}}_2}\,}}} = {{\,\mathrm{trace}\,}}({{\hat{B}}}^T {{\hat{X}}} {{\hat{B}}})\), we find the equivalent expression
$$\begin{aligned} \Vert \varSigma - {\widehat{\varSigma }}\Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 = \Vert \varSigma \Vert _{{{\,\mathrm{{\mathscr {H}}_2}\,}}}^2 - \Vert {\widehat{\varSigma }}\Vert ^2_{{{\,\mathrm{{\mathscr {H}}_2}\,}}}-2\left( {{\,\mathrm{trace}\,}}(B^T Y {\hat{B}}) - {{\,\mathrm{trace}\,}}({\hat{B}}^T {\hat{X}}{\hat{B}})\right) . \end{aligned}$$
We claim that \({{\,\mathrm{trace}\,}}(B^T Y {\hat{B}}) - {{\,\mathrm{trace}\,}}({\hat{B}}^T {\hat{X}}{\hat{B}})\ge 0\) which then shows the first assertion. In fact, Y and \({\hat{X}}\) are the solutions of \(\widetilde{{\mathscr {M}}}(Y)=B{\hat{B}}^T\) and \(\widehat{{\mathscr {M}}}({\hat{X}})={\hat{B}}{\hat{B}}^T\). With the operators introduced in (15) and (16), we obtain
$$\begin{aligned} {{\,\mathrm{trace}\,}}(B^T Y {\hat{B}}) - {{\,\mathrm{trace}\,}}({\hat{B}}^T {\hat{X}}{\hat{B}})&= \widetilde{{\mathbf {b}}}^T {{\,\mathrm{vec}\,}}(Y) - \widehat{{\mathbf {b}}}^T {{\,\mathrm{vec}\,}}({\hat{X}}) = \widetilde{{\mathbf {b}}}^T \widetilde{{\mathbf {M}}}^{-1} \widetilde{{\mathbf {b}}} - \widehat{{\mathbf {b}}}^T \widehat{{\mathbf {M}}}^{-1}\widehat{{\mathbf {b}}} \\&= \widetilde{{\mathbf {b}}}^T\left( \widetilde{{\mathbf {M}}}^{-1} - {\mathbf {V}} ( {\mathbf {V}}^T \widetilde{{\mathbf {M}}} {\mathbf {V}})^{-1} {\mathbf {V}}^T \right) \widetilde{{\mathbf {b}}}, \end{aligned}$$
where \(\widetilde{{\mathbf {b}}} = {{\,\mathrm{vec}\,}}(B{{\hat{B}}}^T)\) and \(\widehat{{\mathbf {b}}} = {{\,\mathrm{vec}\,}}({{\hat{B}}}{{\hat{B}}}^T)\). As in [7, Lemma 3.1], the matrix in the previous expression is the Schur complement of the block \({\mathbf {V}}^T \widetilde{{\mathbf {M}}} {\mathbf {V}}\) in \({\mathbf {S}}= \begin{bmatrix}{\mathbf {V}}^T \widetilde{{\mathbf {M}}} {\mathbf {V}}&{\mathbf {V}}^T \\ {\mathbf {V}}&\widetilde{{\mathbf {M}}}^{-1} \end{bmatrix} \), and \({\mathbf {S}}\) can be shown to be positive semidefinite, which yields nonnegativity of the expression. We omit the details and refer to [7].
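The resulting inequality \(\widetilde{{\mathbf {b}}}^T\big ( \widetilde{{\mathbf {M}}}^{-1} - {\mathbf {V}} ( {\mathbf {V}}^T \widetilde{{\mathbf {M}}} {\mathbf {V}})^{-1} {\mathbf {V}}^T \big ) \widetilde{{\mathbf {b}}}\ge 0\) can be checked on the running example. The text defers the choice of \({\mathbf {V}}\) to [7]; the sketch below assumes \({\mathbf {V}}=I\otimes V\), and the two allclose tests confirm that this choice reproduces \(\widehat{{\mathbf {M}}}\) and \(\widehat{{\mathbf {b}}}\).

```python
# Kronecker-form check of the inequality used in the proof.  The choice V_bold = I (x) V
# is an assumption (the text defers the details to [7]); the two allclose tests below
# confirm that it reproduces M_hat and b_hat.
k = A_hat.shape[0]
M_hat_mat = kron_M(A_hat, Ns_hat)
V_bold = np.kron(np.eye(k), V)                               # nk x k^2, orthonormal columns
b_tilde = (B @ B_hat.T).reshape(-1, order="F")               # vec(B B_hat^T)
b_hat = (B_hat @ B_hat.T).reshape(-1, order="F")             # vec(B_hat B_hat^T)
print(np.allclose(V_bold.T @ Mt_mat @ V_bold, M_hat_mat),
      np.allclose(V_bold.T @ b_tilde, b_hat))
gap = (b_tilde @ np.linalg.solve(Mt_mat, b_tilde)            # trace(B^T Y B_hat)
       - b_hat @ np.linalg.solve(M_hat_mat, b_hat))          # trace(B_hat^T X_hat B_hat)
print(gap >= -1e-10)                                         # nonnegative, as claimed
```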
Assume now that \({\widehat{\varSigma }}\) is locally \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal. From [41], we have the following first-order necessary optimality conditions
$$\begin{aligned} Y^T Z + {\hat{X}} {\hat{Z}}&= 0, \quad Z^T N_iY + {\hat{X}}{\hat{N}}_i{\hat{Z}}=0, \ \ i=1,\dots ,m, \\ Z^TB + {\hat{Z}}{\hat{B}}&=0, \quad CY+{\hat{C}}{\hat{X}} =0, \end{aligned}$$
where \(Y,{\hat{X}}\) are as before and \(Z,{\hat{Z}}\) satisfy
$$\begin{aligned} A^T Z + Z{\hat{A}} + \sum _{i=1}^m N_i^T Z {\hat{N}}_i -C^T {\hat{C}}&=0, \quad {\hat{A}}^T {\hat{Z}} + {\hat{Z}}{\hat{A}} + \sum _{i=1}^m {\hat{N}}^T_i {\hat{Z}} {\hat{N}}_i + {\hat{C}}^T {\hat{C}} =0. \end{aligned}$$
From the symmetry of \(A,{\hat{A}},N_i\) and \({\hat{N}}_i\) as well as the fact that \(B=C^T\) and \({\hat{B}}={\hat{C}}^T\), we conclude that \({\hat{Z}}={\hat{X}}\) and \(Z=-Y\). Hence, from the optimality conditions, we obtain
$$\begin{aligned} 0=Z^TB+{\hat{Z}}{\hat{B}}=-Y^TB+{\hat{X}}{\hat{B}} \end{aligned}$$
which in particular implies that
$$\begin{aligned} {{\,\mathrm{trace}\,}}(B^T Y {\hat{B}}) - {{\,\mathrm{trace}\,}}({\hat{B}}^T {\hat{X}}{\hat{B}}) = {{\,\mathrm{trace}\,}}( {\hat{B}}^T(Y^T B - {\hat{X}}{\hat{B}})) = 0. \end{aligned}$$
This shows the second assertion. \(\square \)
As a consequence of Propositions 13 and 14, we obtain the following result.
Theorem 15
Let \(\varSigma \) denote a bilinear system (8) and let \(A=A^T\prec 0,N_i=N_i^T\) for \(i=1,\dots ,m\) and \(B=C^T\). Assume that \(\rho ({{\,\mathrm{{\mathscr {L}}}\,}}^{-1}\varPi )<1\). Given an orthogonal \(V\in {\mathbb {R}}^{n \times k},k< n,\) define \({\widehat{\varSigma }}\), the reduced bilinear system (14), via \({\hat{A}}=V^T A V, {\hat{N}}_i=V^TN_i V\) and \({\hat{B}}=V^TB={\hat{C}}^T.\) Assume that \({\hat{X}}\) solves \(\widehat{{\mathscr {M}}}({{\hat{X}}}) = {\hat{B}}{\hat{B}}^T\). If \({\widehat{\varSigma }}\) is locally \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal, then \(V{\hat{X}}V^T\) is locally optimal, with respect to the \({\mathscr {M}}\)-norm, as an approximation of the solution X of \({\mathscr {M}}(X)=BB^T\).
Equivalence of ALS and rank-1 BIRKA
So far we have shown that a subspace yielding a locally \({{\,\mathrm{{\mathscr {H}}_2}\,}}\)-optimal reduced model is also a subspace for which the Galerkin approximation is locally optimal in the \({\mathscr {M}}\)-norm. In this part we establish an algorithmic equivalence between BIRKA and ALS. More precisely, for the symmetric case the equivalence is between BIRKA applied to (8) with a target model reduction subspace of dimension 1, and ALS applied to (1). The proof is based on the following lemmas.
Lemma 16
Consider using BIRKA (Algorithm 2) with \(k=1\), i.e., both the initial guesses and the output are vectors. Then \(\tilde{A}\in {\mathbb {R}}\) is a scalar and hence we can take \({{\tilde{\varLambda }}} = \tilde{A}\) and \(R=1\) in Step 2. Thus \({{\hat{B}}} = \tilde{B}\), \({{\hat{C}}} = {{\tilde{C}}}\), \({{\hat{N}}}_1 = {{\tilde{N}}}_1, \dots , {{\hat{N}}}_m = {{\tilde{N}}}_m\), and hence Steps 2–3 are redundant. Moreover, since \({{\tilde{V}}}\) and \({{\tilde{W}}}\) are vectors, Step 6 is redundant.
Proof
The result follows from direct computation. \(\square \)
When speaking about redundant steps and operations we mean that the entities assigned in that step are exactly equal to an already existing entity. In such a situation the algorithm can be rewritten, by simply changing the notation, in a way that skips the redundant step and still produces the same result.
Lemma 17
Consider a symmetric generalized Lyapunov equation (7) and let \(v,w\in {\mathbb {R}}^{n}\) be two given vectors. Let \(v_\textsc {birka},w_\textsc {birka}\in {\mathbb {R}}^{n}\) be the approximations obtained by applying BIRKA (Algorithm 2) to (1) with \(C = B^T\) and initial guesses v and w. If \(v=w\), then \(v_\textsc {birka} = w_\textsc {birka}\).
Proof
The proof is by induction, and it suffices to show that if \({{\tilde{V}}} = {{\tilde{W}}}\) at the beginning of a loop, the same holds at the end of the loop. Thus assume \({{\tilde{V}}} = {{\tilde{W}}}\). Then \({{\tilde{N}}}_i = ({{\tilde{W}}}^T {{\tilde{V}}})^{-1} {{\tilde{W}}}^T N_i {{\tilde{V}}} = {{\tilde{V}}}^T N_i {{\tilde{V}}}/\Vert {{\tilde{V}}}\Vert ^2 = {{\tilde{V}}}^T N_i^T {{\tilde{V}}}/\Vert {{\tilde{V}}}\Vert ^2 = {{\hat{N}}}_i^T\) for \(i=1,\dots ,m\), and \({{\tilde{C}}} = C {{\tilde{V}}} = B^T {{\tilde{W}}} = {{\tilde{B}}}^T\). By Lemma 16 we do not need to consider Steps 2–3. We can now conclude that Step 4 and Step 5 are equal, and thus at the end of the iteration we still have \({{\tilde{V}}} = {{\tilde{W}}}\). \(\square \)
Lemma 18
Consider a symmetric generalized Lyapunov equation (7) and let \(v,w\in {\mathbb {R}}^{n}\) be two given vectors. Let \(v_\textsc {als},w_\textsc {als}\in {\mathbb {R}}^{n}\) be the approximations obtained by applying the ALS algorithm (Algorithm 1) to (1) with initial guesses v and w. If \(v=w\), then \(v_\textsc {als} = w_\textsc {als}\).
Proof
Similar to the proof of Lemma 17 it is enough to show that if \(v = w\) at the beginning of a loop then it also holds at the end of the loop. Hence we assume that \(v = w\). Then \({{\hat{A}}}_1 = {{\hat{A}}}_2\) follows by direct calculations. Moreover, by assumption \({\mathscr {R}}_k={\mathscr {R}}_k^T\). Thus Step 3 and Step 6 are equal, and hence at the end of the iteration we still have that \(v=w\). \(\square \)
Theorem 19
Consider a symmetric generalized Lyapunov equation (7) and let \(v\in {\mathbb {R}}^{n}\) be a given vector. Let \(v_\textsc {birka}\in {\mathbb {R}}^{n}\) be the approximation obtained by applying BIRKA (Algorithm 2) to (1) with \(C= B^T\) and initial guess v. Moreover, let \(v_\textsc {als}\in {\mathbb {R}}^n\) be the approximation obtained by applying the ALS algorithm (Algorithm 1) to (1) with initial guess v. Then \(v_\textsc {birka}= v_\textsc {als}\).
Proof
First, Lemmas 17 and 18 make it reasonable to consider both algorithms with only a single initial guess as well as a single output. Moreover, Step 5 in BIRKA as well as Steps 2–4 in ALS are redundant. Furthermore, it follows from Lemma 16 that in this situation Steps 2, 3, and 6 of BIRKA are also redundant. Hence we need to compare the procedure consisting of Steps 1 and 4 from BIRKA with the procedure consisting of Steps 1, 5, and 6 from ALS. It can be observed that the computations are equivalent, and thus the asserted equality holds if the algorithms stop after an equal number of iterations. We hence consider the stopping criteria and note that they are the same, since \((v^TA^Tv + v^TAv)/2\Vert v\Vert ^2 = v^TAv/\Vert v\Vert ^2 = {{\tilde{A}}} \in {\mathbb {R}}\). \(\square \)
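The algebraic core of this argument can be made concrete on the running example: from the same input vector, the ALS update for v (with \({\mathscr {R}}_k=BB^T\)) and the same solve written in terms of the reduced quantities \({{\tilde{A}}}=v^TAv/\Vert v\Vert ^2\), \({{\tilde{N}}}_i\), \({{\tilde{B}}}\) produce the same vector. Since Algorithm 2 is not reproduced here, the sketch below is an illustration of the equivalence argued in the proof, not a transcription of either algorithm.

```python
# From the same input vector w, the ALS v-update (with R_k = B B^T) and the update in
# reduced-quantity ("rank-1 BIRKA") form coincide.
w = rng.standard_normal(n)
I = np.eye(n)
K = -(w @ w) * A - (w @ A @ w) * I - (w @ N @ w) * N      # ALS: first-order condition (12)
v_als = np.linalg.solve(K, (B @ B.T) @ w)
a_tilde = (w @ A @ w) / (w @ w)                           # A_tilde, cf. the stopping-criterion identity
n_tilde = (w @ N @ w) / (w @ w)                           # N_tilde_1
v_red = np.linalg.solve(-A - a_tilde * I - n_tilde * N,   # reduced-quantity form of the same solve
                        B @ (B.T @ w) / (w @ w))
print(np.allclose(v_als, v_red))
```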
Corollary 20
Theorem 10 is applicable with ALS changed to BIRKA, using subspaces of dimension 1.
Remark 21
Note that ALS can be generalized such that the optimization computes rank-\(\ell \) corrections, see [25, Remark 2.2]. With arguments similar to the ones above, one can show that for symmetric systems this can equivalently be achieved by BIRKA. From a theoretical point of view, this will yield more accurate approximations. However, the computational complexity increases quickly, since each ALS or BIRKA step then requires solving a generalized Sylvester equation of dimension \(n\times {\ell }\).