1 Introduction

The low-rank alternating direction implicit (LR-ADI) [42, 54] method is one of the state-of-the-art methods for the numerical solution of large-scale Lyapunov equations [19, 65]. This linear matrix equation arises in many applications: control and system theory [34, 66], especially certain model reduction techniques for dynamical systems [3, 15], but also the discretization of certain partial differential equations (PDEs) [71], and many more.

We consider Lyapunov equations of the form

$$AXE^{\mathsf{T}}+EXA^{\mathsf{T}}+BB^{\mathsf{T}}=0,$$
(1)

where \(A, E\in \mathbb {R}^{n\times n}\) and \(B\in \mathbb {R}^{n\times q}\), \(q\ll n\). Moreover, E is assumed to be symmetric positive definite (SPD) and the matrix pencil (A,E) asymptotically stable, i.e., its spectrum is contained in the open left half plane \(\mathbb {C}_{-}\). This guarantees that a unique solution X exists and that it is symmetric positive semidefinite [53].

A special case of (1) is attained whenever E = I, namely the equation of interest is

$$AX+XA^{\mathsf{T}}+BB^{\mathsf{T}}=0.$$
(2)

Oftentimes the coefficient matrix E possesses a structured sparsity pattern. For instance, it is (block) diagonal when the matrices stem from a finite element discretization that uses mass-lumping. In this case, we can easily transform (1) and obtain an equation of the form (2). This can, for example, be achieved by simply pre- and post-multiplying (1) by \(E^{-\frac {1}{2}}\), which also preserves a possible symmetry of A. For the sake of simplicity, we thus focus on (2) in the following.
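For a diagonal E, the transformation above can be sketched in a few lines. The snippet below uses small synthetic test data (a stable nonsymmetric A with negative definite symmetric part, and a diagonal, lumped-mass-like SPD E); SciPy's `solve_continuous_lyapunov` merely serves as a dense Lyapunov solver for the transformed equation of the form (2).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, q = 60, 2

G = rng.standard_normal((n, n))
A = 0.5 * (G - G.T) - 2.0 * np.eye(n)      # A + A^T = -4I < 0: (A, E) stable
E = np.diag(rng.uniform(1.0, 2.0, n))      # diagonal (lumped-mass-like) SPD E
B = rng.standard_normal((n, q))

# pre- and post-multiply by E^{-1/2}; trivial here since E is diagonal
Einv_half = np.diag(1.0 / np.sqrt(np.diag(E)))
A2 = Einv_half @ A @ Einv_half
B2 = Einv_half @ B

# solve the transformed equation of the form (2): A2 Y + Y A2^T + B2 B2^T = 0
Y = solve_continuous_lyapunov(A2, -B2 @ B2.T)
X = Einv_half @ Y @ Einv_half              # recover the solution of (1)

res = A @ X @ E.T + E @ X @ A.T + B @ B.T  # residual of the original equation
print(np.linalg.norm(res) / np.linalg.norm(B @ B.T))
```

The recovered X solves the original generalized equation (1) up to the accuracy of the dense solver.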

In case of very large problem dimensions, the solution X cannot be stored since this matrix is, in general, dense. However, it is well known that its singular values quickly decay to zero under suitable assumptions, see, e.g., [5, 13, 33, 55], so that accurate low-rank approximations \(ZZ^{\mathsf{T}}\approx X\), \(Z\in \mathbb {R}^{n\times t}\), \(t\ll n\), can be constructed. The efficient computation of the low-rank factor Z is the task of LR-ADI and of all other low-rank methods (see, e.g., the survey papers [19, 65] for further details on different low-rank methods for linear matrix equations).
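The singular value decay is easy to observe numerically. The sketch below (illustrative data: a 1D Laplacian-type stable matrix and a single random right-hand side column) solves a small Lyapunov equation densely and inspects the spectrum of X.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n = 200
# 1D Laplacian: symmetric negative definite, a typical PDE-type test matrix
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
rng = np.random.default_rng(1)
B = rng.standard_normal((n, 1))

# dense solution of AX + XA^T + BB^T = 0
X = solve_continuous_lyapunov(A, -B @ B.T)
s = np.linalg.svd(X, compute_uv=False)
print(s[20] / s[0])   # rapid decay: a low-rank factor Z captures X accurately
```

For this symmetric example the decay is very fast, so a factor Z with a few dozen columns already approximates X to high accuracy.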

It is well known that the convergence rate of the LR-ADI method is strictly connected to the selection of some parameters \({\{p_{i}\}}_{i=1,\ldots ,j}\subset \mathbb {C}_{-}\) called shifts. The computation of effective shifts is a highly non-trivial task and has been a rather active research topic over the last decades. Many strategies are available in the literature and these can be divided into two categories: offline routines [54, 60, 73], where the shifts are computed a priori, before LR-ADI starts, and then, potentially, cyclically reused, and online schemes [12, 37], where the shifts are computed on the fly within the iterative procedure. The name shifts for the values pj comes from the fact that in each LR-ADI iteration we need to solve a shifted linear system with a coefficient matrix of the form \(A + p_{j}E\) or \(A + p_{j}I\) in case of (1) or (2), respectively. Notice that since (A,E) (or A in case of (2)) is asymptotically stable and \(\{p_{j}\}\subset \mathbb {C}_{-}\), all the linear systems involved in the LR-ADI scheme are well defined.

In Algorithm 1, we report an implementation of the LR-ADI scheme for the solution of (1). Notice that Algorithm 1 is designed to drastically reduce the amount of complex arithmetic calculations. Indeed, even though A and B in (2) are real, the shifts pj are often complex if A is nonsymmetric, so that complex arithmetic may occur (see [11], [36, Chapter 4], and references therein for details and derivations).

One of the most computationally expensive steps of Algorithm 1 is the solution of the shifted linear systems with q right-hand sides in line 3. This task has to be carried out at each LR-ADI iteration. In this contribution, we propose to employ state-of-the-art block Krylov subspace methods for this task. In particular, for (2), we illustrate how to efficiently reuse the approximation space employed at the jth LR-ADI iteration and utilize it also in the next one. To this end, it is crucial that the right-hand side of the linear system we need to solve at the (j +1)-st iteration can be represented in terms of the basis of the subspace employed in the previous iteration. This simple but critical observation lets us design a novel, efficient procedure that can lead to noticeable savings in the running time for the solution of (2). Indeed, all the LR-ADI steps can be completely merged into the Krylov routine so that the LR-ADI iteration is only implicitly performed. Moreover, the LR-ADI shift computation can also be incorporated into the framework proposed in this paper.

Algorithm 1

LR-ADI for Lyapunov equations
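To fix ideas, the following is a minimal dense-matrix sketch of the LR-ADI iteration for (2), restricted to real negative shifts (so none of the complex-arithmetic-avoiding machinery of Algorithm 1 is needed). The test matrix, the logarithmically spaced shifts, and the tolerances are purely illustrative choices, not the shift strategies discussed later in the paper.

```python
import numpy as np

def lr_adi(A, B, shifts, tol=1e-8, maxit=200):
    """Basic LR-ADI for AX + XA^T + BB^T = 0 with real negative shifts.

    Returns Z with Z @ Z.T ~= X; the shifts are cycled if more
    iterations than shifts are needed.
    """
    n = A.shape[0]
    W = B.copy()                       # residual factor: R_j = W_j W_j^T
    Z = np.zeros((n, 0))
    nrm0 = np.linalg.norm(B.T @ B)
    for j in range(maxit):
        p = shifts[j % len(shifts)]
        S = np.linalg.solve(A + p * np.eye(n), W)   # shifted solve (line 3)
        Z = np.hstack([Z, np.sqrt(-2.0 * p) * S])
        W = W - 2.0 * p * S                         # W_j = W_{j-1} - 2 Re(p_j) S_j
        if np.linalg.norm(W.T @ W) <= tol * nrm0:   # cheap Lyapunov residual
            break
    return Z

n = 100
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # stable, symmetric
rng = np.random.default_rng(2)
B = rng.standard_normal((n, 2))
# log-spaced real shifts covering the spectral interval of A (illustrative)
shifts = -np.logspace(np.log10(4.0), np.log10(1e-3), 16)

Z = lr_adi(A, B, shifts)
res = A @ Z @ Z.T + Z @ Z.T @ A.T + B @ B.T
print(np.linalg.norm(res) / np.linalg.norm(B @ B.T))
```

Note that the stopping criterion only touches the small matrix \(W^{\mathsf{T}}W\), exploiting the fact that the Lyapunov residual of LR-ADI is exactly \(W_jW_j^{\mathsf{T}}\).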

The following is a synopsis of the paper. Section 2 is devoted to recalling the general (block) Krylov subspace framework for shifted linear systems. In particular, some details about the extended Krylov subspace method presented in [64] are given in Section 2.1. In Section 3, we present the main contribution of the paper and we show how to fully merge the LR-ADI iteration into the projection method adopted for the linear system solution. The selection of effective shifts is crucial for attaining a fast convergence in terms of number of LR-ADI iterations, and numerous strategies have been proposed in the literature to accomplish this task (see, e.g., [12, 16, 37, 54, 58, 60, 71, 73]). In Section 4, we illustrate how many of these routines can be integrated into our novel framework with no additional cost. The potential of our strategy is illustrated in Section 5, where several numerical results are reported. We close the paper with our conclusions in Section 6.

Throughout the paper, we adopt the following notation. The matrix inner product is defined as \(\langle X, Y\rangle _{F}:= \text{trace}(Y^{\mathsf{T}}X)\) so that the induced norm is \(\|X\|_{F}^{2}= \langle X, X\rangle _{F}\). The Kronecker product is denoted by ⊗ whereas In and On×m denote the identity matrix of order n and the n × m zero matrix, respectively. Only one subscript is used for a square zero matrix, i.e., On×n = On, and the subscript is omitted whenever the dimension of I and O is clear from the context. Moreover, ei is the ith basis vector of the canonical basis of \(\mathbb {R}^{n}\). The brackets [⋅] are used to concatenate matrices of conformal dimensions. In particular, a MATLAB-like notation is adopted and [M,N] denotes the matrix obtained by putting M on the left of N whereas [M;N] the one obtained by putting M on top of N, i.e., \([M;N] = [M^{\mathsf{T}},N^{\mathsf{T}}]^{\mathsf{T}}\). If \(w\in \mathbb {R}^{n}\), diag(w) denotes the n × n diagonal matrix whose ith diagonal entry corresponds to the ith component of w. Given \(X\in \mathbb {C}^{n\times m}\), we write X = Re(X) + ı Im(X), where Re(X) and Im(X) are its real and imaginary parts, respectively, and ı is the imaginary unit. The complex conjugate of X is denoted by \(\overline {X}= \text {Re}(X)-\imath \text {Im}(X)\).

2 Block Krylov methods for shifted linear systems

The literature about the numerical solution of shifted linear systems by Krylov subspace methods is rather vast. Indeed, sequences of shifted linear systems arise in many applications belonging to different research areas like control theory [23, 41], wave propagation problems [8], mechanical systems [27], and quantum chromodynamics [32].

This algebraic problem is trickier than it looks and many researchers have contributed to its understanding providing important insights on its properties and designing efficient, robust algorithms for its solution. Here is an incomplete list of contributions on numerical schemes for sequences of shifted linear systems and their analysis [7, 28, 29, 48, 62, 67, 68, 70].

In this section, we consider sequences of shifted linear systems of the form

$$(A+p_{j}I)S_{j}=W,\quad W\in\mathbb{R}^{n\times q},$$
(3)

where the right-hand side W does not depend on the index j, even though, in line 3 of Algorithm 1, Wj−1 does change at every LR-ADI iteration. In Section 3, we show how to adapt the machinery presented here to the case of linear systems of the form \((A + p_{j}I)S_{j} = W_{j-1}\), arising within the LR-ADI scheme.

Any Krylov routine for (3) computes a numerical solution of the form \(S_{m}^{(j)}=S_{0}+V_{m}Y_{m}^{(j)}\approx S_{j}\), \(V_{m}=[\mathcal {V}_{1},\ldots ,\mathcal {V}_{m}]\in \mathbb {R}^{n\times m\ell q}\), \(\ell \geqslant 1\), \(\mathcal {V}_{i}\in \mathbb {R}^{n\times \ell q}\), i =1,…,m, \(Y_{m}^{(j)}\in \mathbb {C}^{m\ell q\times q}\), where the orthonormal columns of Vm span a suitable subspace \(\mathcal {K}_{m}\), namely, \(\text{Range} (V_{m})=\mathcal {K}_{m}\), S0 is an initial guess, and the matrix \(Y_{m}^{(j)}\) can be computed by imposing different conditions. In particular, \(Y_{m}^{(j)}\) is often computed by either imposing a Galerkin condition on the residual or minimizing the residual norm. For the sake of simplicity, we consider S0 = O in the following.

One of the most common choices for the approximation space \(\mathcal {K}_{m}\) is the block Krylov subspace

$$\mathbf{K}_{m}^{\square}(A,W)=\text{Range} ([W,AW,\ldots,A^{m-1}W]).$$
(4)

See, e.g., [30, 50, 61] and the references therein for further details on the block polynomial Krylov subspace \(\mathbf {K}_{m}^{\square }(A,W)\) and related methods.

However, Simoncini showed in [64] that the extended Krylov subspace [24]

$$\mathbf{E}\mathbf{K}_{m}^{\square}(A,W)=\text{Range} ([W,A^{-1}W,AW,A^{-2}W,\ldots,A^{m-1}W,A^{-m}W]),$$
(5)

can be a powerful alternative for the solution of (3) in many cases, for instance, when A is large and real while the pj’s are complex (see also Section 2.1).

The basis Vm of both the polynomial and extended Krylov subspace can be constructed by means of the (extended) Arnoldi process and the following Arnoldi relation is fulfilled

$$AV_{m}=V_{m}T_{m}+\mathcal{V}_{m+1}E_{m+1}^{\mathsf{T}}\underline{T}_{m},$$
(6)

where \(\underline {T}_{m}=V_{m+1}^{\mathsf {T}}AV_{m}\in \mathbb {R}^{(m+1)\ell q\times m\ell q}\), Tm is its principal square submatrix, and \(E_{m+1}=e_{m+1}\otimes I_{\ell q}\) (see, e.g., [56, 63]).
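The Arnoldi relation (6) is easy to verify numerically. The following sketch implements a plain block Arnoldi process for the polynomial space (4) (i.e., ℓ = 1; the extended case is analogous but also requires solves with A) on random illustrative data, and checks \(AV_m = V_{m+1}\underline{T}_m\).

```python
import numpy as np

def block_arnoldi(A, W, m):
    """Block Arnoldi: returns V_{m+1} (orthonormal columns) and underline{T}_m
    such that A V_m = V_{m+1} underline{T}_m (no breakdown assumed)."""
    n, q = W.shape
    V0, _ = np.linalg.qr(W)
    blocks = [V0]
    T = np.zeros(((m + 1) * q, m * q))
    for k in range(m):
        U = A @ blocks[k]
        for i, Vi in enumerate(blocks):          # block modified Gram-Schmidt
            H = Vi.T @ U
            T[i*q:(i+1)*q, k*q:(k+1)*q] = H
            U = U - Vi @ H
        Q, R = np.linalg.qr(U)                   # normalize the new block
        T[(k+1)*q:(k+2)*q, k*q:(k+1)*q] = R
        blocks.append(Q)
    return np.hstack(blocks), T

rng = np.random.default_rng(3)
n, q, m = 60, 2, 5
A = rng.standard_normal((n, n))
W = rng.standard_normal((n, q))

V, T = block_arnoldi(A, W, m)
Vm = V[:, :m*q]
# Arnoldi relation (6): A V_m = V_m T_m + V_{m+1} E_{m+1}^T underline{T}_m
print(np.linalg.norm(A @ Vm - V @ T))
```

Here \(V_mT_m + \mathcal{V}_{m+1}E_{m+1}^{\mathsf{T}}\underline{T}_m\) is written compactly as \(V_{m+1}\underline{T}_m\), which is the form the code checks.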

The Arnoldi relation (6) is one of the most crucial tools in the solution of (3) by Krylov methods. Indeed, it can be used to show the fundamental shift-invariance property of the Krylov subspaces (4) and (5), and the following relation holds true

$$(A+p_{j}I_{n})V_{m}=V_{m}(T_{m}+p_{j}I_{m\ell q})+\mathcal{V}_{m+1}E_{m+1}^{\mathsf{T}}\underline{T}_{m}.$$
(7)

See, e.g., [62, Equation (2.1)], [64, Equation (3.1)].

Equation (7) says that we can compute only one approximation space for solving (3). In particular, the space constructed using A, i.e., \(\mathbf {K}_{m}^{\square }(A,W)\) or \(\mathbf {EK}_{m}^{\square }(A,W)\), can be employed, by possibly being expanded, to solve all the shifted linear systems in the sequence (3).

Polynomial Krylov subspace methods often need many iterations to achieve the prescribed accuracy, so that a large subspace is constructed. This increases both the storage demand and the computational effort of the selected solution procedure. Different strategies have been developed to avoid the construction of an excessively large subspace.

With the goal of achieving a fast convergence in terms of number of iterations, the linear system (3) can be preconditioned, namely, it is transformed into an equivalent problem with better spectral properties. However, designing effective preconditioning operators for a sequence of shifted linear systems is a difficult task and often highly problem dependent. Very sophisticated schemes have been proposed in the literature (see, e.g., [4, 9, 20, 21, 46]).

Restarted routines are an alternative solution. In this framework, the approximation space \(\mathcal {K}_{m}\) is expanded until it reaches a prescribed maximum dimension. If the desired level of accuracy is not achieved, the last computed basis block \(\mathcal {V}_{m+1}\) is employed as initial block in the construction of a new subspace \(\mathcal {K}_{m}^{\prime }\). This procedure is iterated until a stopping criterion is fulfilled (see, e.g., [29, 62] and [30, Section 3.2.1]). However, in our framework, the LR-ADI shifts pj’s are often computed on the fly and, thus, are not all available at the same time. Therefore, to fully take advantage of the computational efforts needed to solve the linear system \((A + p_{j-1}I)S_{j-1} = W\), we would have to store all the bases computed during the employed restarted Krylov procedure and use them to solve the jth linear system as well. Unfortunately, this would destroy all the benefits in terms of storage complexity gained from the restart paradigm.

In [64], Simoncini showed that the employment of the extended Krylov subspace (5), in place of (4), often leads to a faster convergence in terms of iterations, to the point that the constructed subspace is usually smaller than the polynomial counterpart needed to reach the same level of accuracy. We thus decide to use such an approximation space for the solution of the shifted linear systems within the LR-ADI method, and in the next section we recall some details of the extended Krylov subspace method.

Notice that the faster convergence of the extended Krylov subspace (5) comes at a price. Indeed, at each iteration, a linear system with A has to be solved during the basis construction. Nevertheless, the increase in the overall workload of the solution process can generally be limited. If a direct solver is used, for instance, the LU factors of A can be computed once and for all before the LR-ADI scheme starts. On the other hand, if an iterative procedure is employed, a single preconditioner for A analogously has to be designed only once.
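The "factor once, solve many times" idea can be sketched with SciPy's sparse LU interface; the Laplacian test matrix and block sizes are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 1000
# sparse stable A (1D Laplacian), stored in CSC format as required by splu
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csc")
lu = splu(A)                       # LU factors computed once and for all

rng = np.random.default_rng(4)
W = rng.standard_normal((n, 3))
U1 = lu.solve(W)                   # A^{-1} W for the extended-Krylov basis
U2 = lu.solve(U1)                  # A^{-2} W: same factors reused, no refactorization
print(np.linalg.norm(A @ U1 - W))
```

Every solve with A during the basis construction then costs only two sparse triangular substitutions.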

As already mentioned, in the formulation (3), the right-hand side W is fixed, namely it does not depend on the shift index j. However, in line 3 of Algorithm 1, the linear systems we need to solve are of the form

$$(A+p_{j}I)S_{j}=W_{j-1}.$$

At first glance, having a nonconstant right-hand side does not allow for the employment of the shifted Krylov framework we briefly described above. A larger class of solvers, the so-called recycling Krylov methods, seems more appropriate (see, e.g., [31, 52, 67, 69, 70] for general sequences of shifted linear systems, and [1, 2, 26] for some recycling Krylov techniques applied in a model reduction context). However, in Section 3, we show that, in the LR-ADI context for j > 1, the residual factor Wj−1 belongs to the subspace \(\mathcal {K}_{m}\) employed in the solution of the (j −1)-st linear system \((A + p_{j-1}I)S_{j-1} = W_{j-2}\). Along with the shift-invariance property of the Krylov subspace, this observation allows us to utilize only one subspace for the solution of all the shifted linear systems within the LR-ADI method. In turn, as shown in Section 5, we can notably reduce the computational effort of the overall procedure.

2.1 The extended Krylov subspace method for shifted linear systems

In this section, we recall the extended Krylov subspace method for shifted linear systems presented in [64].

Given the sequence of shifted linear systems (3), the extended Krylov subspace method computes a solution of the form \(S_{m}^{(j)}=V_{m}Y_{m}^{(j)}\), where the 2mq orthonormal columns of Vm span the extended Krylov subspace (5), whereas the 2mq × q matrix \(Y_{m}^{(j)}\) can be computed in different manners.

For instance, \(Y_{m}^{(j)}\) can be computed by imposing a Galerkin condition on the residual \(R_{m}^{(j)}=(A+p_{j}I)V_{m}Y_{m}^{(j)}-W\), namely by imposing \(V_{m}^{\mathsf {T}}R_{m}^{(j)}=0\). Thanks to the shifted Arnoldi relation (7), it is easy to show that such a Galerkin condition is equivalent to solving the projected linear systems

$$(T_{m}+p_{j}I)Y_{m}^{(j)}=E_{1}\gamma,$$
(8)

where \(E_{1} = e_{1}\otimes I_{2q}\), and \(\gamma \in \mathbb {R}^{2q\times q}\) is such that W = V1γ.

With \(Y_{m}^{(j)}\) at hand, the Frobenius norm of the residual \(\|R_{m}^{(j)}\|_{F}\) can be computed at low cost, as

$$\|R_{m}^{(j)}\|_{F}=\|E_{m+1}^{\mathsf{T}}\underline{T}_{m}Y_{m}^{(j)}\|_{F},$$
(9)

following [64, Equation (3.2)].
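The cheap residual formula (9) is a direct consequence of the Arnoldi relation. The sketch below checks it for the single-vector case q = 1 with the polynomial Krylov space, where the identities (8) and (9) hold verbatim; the stable test matrix and the shift are illustrative.

```python
import numpy as np

def arnoldi(A, w, m):
    """Arnoldi process (q = 1 for brevity): returns V_{m+1} and underline{T}_m."""
    n = w.size
    V = np.zeros((n, m + 1))
    T = np.zeros((m + 1, m))
    V[:, 0] = w / np.linalg.norm(w)
    for k in range(m):
        u = A @ V[:, k]
        for i in range(k + 1):                 # modified Gram-Schmidt
            T[i, k] = V[:, i] @ u
            u = u - T[i, k] * V[:, i]
        T[k + 1, k] = np.linalg.norm(u)
        V[:, k + 1] = u / T[k + 1, k]
    return V, T

rng = np.random.default_rng(5)
n, m = 120, 30
G = rng.standard_normal((n, n))
A = 0.5 * (G - G.T) - 2.0 * np.eye(n)          # stable test matrix
w = rng.standard_normal(n)

V, T = arnoldi(A, w, m)
gamma = np.linalg.norm(w)                      # w = V_1 * gamma

p = -1.0                                       # a shift in C_-
e1 = np.zeros(m); e1[0] = 1.0
Y = np.linalg.solve(T[:m, :m] + p * np.eye(m), gamma * e1)   # Galerkin system (8)
S = V[:, :m] @ Y                               # approximate solution of (3)

cheap = abs(T[m, m - 1] * Y[m - 1])            # residual formula (9)
true = np.linalg.norm((A + p * np.eye(n)) @ S - w)
print(cheap, true)
```

The two numbers agree to roundoff, so the residual can be monitored without ever forming the large vector \((A+p_jI)S-w\).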

Alternatively, following the discussion in [68, Section 4.1], the matrix \(Y_{m}^{(j)}\) can be computed also by minimizing the residual norm, i.e.,

$$Y_m^{(j)}=\underset{Y\in\mathbb{R}^{2mq\times q}}{\text{argmin}}\;{\left\|(A+p_jI)V_mY-W\right\|}_F.$$
(10)

Once again, thanks to the shifted Arnoldi relation (7), the minimization problem in (10) simplifies, and we can compute \(Y_{m}^{(j)}\) as

$$Y_m^{(j)}=\underset{Y\in\mathbb{R}^{2mq\times q}}{\text{argmin}}\;{\left\|({\underline T}_m+p_j\lbrack I_{2mq};O_{2q\times2mq}\rbrack)Y-E_1\gamma\right\|}_F.$$
(11)

Note the abuse of notation: in (11), \(E_{1}\in \mathbb {R}^{2(m+1)q\times 2q}\), whereas \(E_{1}\in \mathbb {R}^{2mq\times 2q}\) in (8).

If \(QP=\underline {T}_{m}+p_{j}[I_{2mq};O_{2q\times 2mq}]\) denotes the QR factorization of \(\underline {T}_{m}+p_{j}[I_{2mq};O_{2q\times 2mq}]\), and we consider the following partition

$$Q=[Q_{1},Q_{2}],\quad Q_{1}\in\mathbb{R}^{2(m+1)q\times 2mq},\quad Q_{2}\in\mathbb{R}^{2(m+1)q\times 2q},\quad P=\left[\begin{array}{c} P_{1}\\ O_{2q\times 2mq} \end{array}\right],\quad P_{1}\in\mathbb{R}^{2mq\times 2mq},$$

then the matrix \(Y_{m}^{(j)}\) in (11) can be computed as

$$Y_{m}^{(j)}=P_{1}^{-1}Q_{1}^{\mathsf{T}}E_{1}\gamma,$$
(12)

and the residual norm is given by

$$\|R_{m}^{(j)}\|_{F}=\|Q_{2}^{\mathsf{T}}E_{1}\gamma\|_{F}.$$
(13)

The overall procedure is summarized in Algorithm 2, where Σ contains the indices of all yet unsolved systems, whereas \({\Sigma }^{C}\) contains the indices of all the systems that have already been solved. The basis block \(\mathcal V_{m+1}\) can be computed by following [63]. This operation involves both matrix-vector products and linear system solves with A. Moreover, the basis Vm is real whenever A and W are so. Complex arithmetic may occur in the computation of \(Y_{m}^{(j)}\), if Im(pj)≠0.

Algorithm 2

Extended Krylov subspace method for shifted linear systems

Notice that as soon as the jth linear system has converged, namely the related relative residual norm is sufficiently small, we stop solving the jth projected problem. Once all the linear systems have converged, we terminate the iterative process.

To conclude, we would like to point out that, to the best of our knowledge, this is the first time the minimal residual condition (11) is proposed within the extended Krylov subspace method for shifted linear systems.

3 Merging the two iterative procedures

In this section, we show how the LR-ADI iteration and the extended Krylov subspace method for shifted linear systems can be merged together into a novel, efficient iterative procedure for the solution of (2).

As already mentioned, in the sequence of shifted linear systems in line 3 of Algorithm 1, also the right-hand side Wj−1 depends on the current LR-ADI iteration j. Therefore, at first glance, we seemingly have to build a new subspace at each iteration j, by employing the current Wj−1 as initial block. However, in the following theorem, we show that Wj−1 belongs to the subspace constructed to solve the (j −1)-st linear system so that such a space can be used, by being possibly expanded, also in the solution of the subsequent linear system.

Theorem 3.1

Let \(S_{j}=V_{m_{j}}Y_{m_{j}}\), \(j\geqslant 1\), \(\text{Range} (V_{m_{j}})=\mathbf {EK}_{m_{j}}^{\square }(A,B)\) for some \(m_{j}\geqslant 1\). Then

$$\text{Range} (W_{j})\subseteq\mathbf{EK}_{m_{j}}^{\square}(A,B).$$

Proof

We are going to show the statement by induction on j.

The first linear system to be solved within the LR-ADI method is \((A + p_{1}I)S_{1} = B\) and the extended Krylov subspace \(\mathbf {EK}_{m_{1}}^{\square }(A,B)\) can be employed to this end. The computed solution is of the form \(S_{1}=V_{m_{1}}Y_{m_{1}}\), m1 >0, where \(\text{Range} (V_{m_{1}})=\mathbf {EK}_{m_{1}}^{\square }(A,B)\) and \(Y_{m_{1}}\in \mathbb {C}^{2m_{1}q\times q}\). It is thus easy to show that \(W_{1}=B-2\text {Re}(p_{1})S_{1}=V_{m_{1}}(E_{1}\gamma -2\text {Re}(p_{1})Y_{m_{1}})\) is such that \(\text{Range} (W_{1})\subseteq \mathbf {EK}_{m_{1}}^{\square }(A,B)\).

We now assume the statement holds for a certain \(j-1\geqslant 1\), and we show it holds for j as well. Since \(S_{j}=V_{m_{j}}Y_{m_{j}}\) by assumption and \(\text{Range} (W_{j-1})\subseteq \mathbf {EK}_{m_{j-1}}^{\square }(A,B)\) by inductive hypothesis, namely we can write \(W_{j-1}=V_{m_{j-1}}{\Upsilon }_{j-1}\) for a certain \({\Upsilon }_{j-1}\in \mathbb {R}^{2m_{j-1}q\times q}\), we have

$$\begin{array}{@{}rcl@{}} W_{j}&=&W_{j-1}-2\text{Re}(p_{j})S_{j}=V_{m_{j-1}}{\Upsilon}_{j-1}-2\text{Re}(p_{j})V_{m_{j}}Y_{m_{j}}\\ &=&V_{m_{j}}\left([{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1})q\times q}]-2\text{Re}(p_{j})Y_{m_{j}}\right). \end{array}$$

Therefore, \(\text{Range} (W_{j})\subseteq \mathbf {EK}_{m_{j}}^{\square }(A,B)\). □
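The mechanism of the proof is simple to reproduce numerically: whenever B lies in the subspace and the approximate solution is drawn from it, the updated residual factor stays inside as well. The sketch below uses a crude three-block space containing B (a stand-in for \(\mathbf{EK}_1^{\square}(A,B)\)), a Galerkin solve, and a real shift; all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, q = 80, 2
G = rng.standard_normal((n, n))
A = 0.5 * (G - G.T) - 2.0 * np.eye(n)     # stable test matrix
B = rng.standard_normal((n, q))
p1 = -1.0                                 # real shift in C_-

# orthonormal basis of a small space containing B (EK_1-like: B, AB, A^{-1}B)
K = np.hstack([B, A @ B, np.linalg.solve(A, B)])
V, _ = np.linalg.qr(K)

# Galerkin solution of (A + p1 I) S1 = B in Range(V)
T = V.T @ A @ V
Y = np.linalg.solve(T + p1 * np.eye(T.shape[0]), V.T @ B)
S1 = V @ Y

W1 = B - 2.0 * p1 * S1                    # next LR-ADI residual factor
# Theorem 3.1: W1 has no component outside the subspace
print(np.linalg.norm(W1 - V @ (V.T @ W1)))
```

The orthogonal projection leaves W1 unchanged up to roundoff, which is precisely the inclusion \(\text{Range}(W_1)\subseteq\text{Range}(V)\) used in the induction.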

Theorem 3.1 shows that Wj is exactly represented in \(\mathbf {EK}_{m_{j}}^{\square }(A,B)\). This means that the latter subspace can still be employed for the computation of Sj+1 by being possibly expanded. Indeed, no components of Wj are annihilated when either the Galerkin or the minimal residual condition is imposed. In the following corollary, we show how to easily write down the projected problems (8) and (11) along with the corresponding residual norm computation.

Corollary 3.1

Assume the prerequisites of Theorem 3.1 hold. If a Galerkin condition is imposed for the computation of \(S_{j}=V_{m_{j}}Y_{m_{j}}\), then the matrix \(Y_{m_{j}}\) amounts to the solution of the projected linear system

$$(T_{m_{j}}+p_{j}I_{2m_{j}q})Y_{m_{j}}= [{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1})q\times q}],$$
(14)

where \({\Upsilon }_{j-1}\in \mathbb {R}^{2m_{j-1}q\times q}\) is such that \(W_{j-1}=V_{m_{j-1}}{\Upsilon }_{j-1}\), \(m_{j-1}\leqslant m_{j}\). The related residual norm can be computed by

$$\|R_{m_{j}}\|_{F}=\|E_{m_{j}+1}^{\mathsf{T}}\underline{T}_{m_{j}}Y_{m_{j}}\|_{F}.$$
(15)

Similarly, if a minimal residual norm condition is imposed, we have

$$Y_{m_{j}}=\underset{Y\in\mathbb{C}^{2m_{j}q\times q}}{\text{argmin}}\|(\underline{T}_{m_{j}}+p_{j}[I_{2m_{j}q};O_{2q\times 2m_{j}q}] )Y-[{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1}+1)q\times q}]\|_{F},$$
(16)

so that

$$\|R_{m_{j}}\|_{F}=\|Q_{2}^{\mathsf{T}}[{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1}+1)q\times q}]\|_{F},$$
(17)

where the 2q orthonormal columns of Q2 are a basis of the kernel of \(\underline {T}_{m_{j}}+p_{j}[I_{2m_{j}q};O_{2q\times 2m_{j}q}]\).

Proof

Since \(W_{j-1}=V_{m_{j-1}}{\Upsilon }_{j-1}\) and we look for a solution \(S_{j}=V_{m_{j}}Y_{m_{j}}\) to \((A + p_{j}I)S_{j} = W_{j-1}\), we can write

$$\begin{array}{lll} R_{m_{j}}&=&(A+p_{j}I)S_{j}-W_{j-1}= (A+p_{j}I)V_{m_{j}}Y_{m_{j}}-V_{m_{j-1}}{\Upsilon}_{j-1}\\ &=& V_{m_{j}}\left((T_{m_{j}}+p_{j}I_{2m_{j}q})Y_{m_{j}}- [{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1})q\times q}]\right)+\mathcal{V}_{m_{j}+1}E_{m_{j}+1}^{\mathsf{T}}\underline{T}_{m_{j}}Y_{m_{j}}\\ &=&V_{m_{j}+1}\left((\underline{T}_{m_{j}}+p_{j} [I_{2m_{j}q};O_{2q\times 2m_{j}q}] )Y_{m_{j}}- [{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1}+1)q\times q}]\right). \end{array}$$

If a Galerkin condition is imposed, namely \(V_{m_{j}}^{\mathsf {T}}R_{m_{j}}=0\), then \(Y_{m_{j}}\) is the solution of the linear system in (14) and the related residual norm \(\|R_{m_{j}}\|_{F}\) can be computed as in (15).

Similarly, if a minimal residual condition is imposed, \(Y_{m_{j}}\) solves the minimization problem (16) and \(\|R_{m_{j}}\|_{F}\) fulfills (17). □

Once \(S_{j}=V_{m_{j}}Y_{m_{j}}\) is computed, namely the related residual norm \(\|R_{m_{j}}\|_{F}\) is sufficiently small, we proceed with the remaining LR-ADI operations.

We would like to point out that the expression of Wj, i.e., \(W_{j}=V_{m_{j}}{\Upsilon }_{j}\), can be exploited for the Lyapunov residual norm as well. Indeed,

$$\|W_{j}^{*}W_{j}\|_{F}=\|{\Upsilon}_{j}^{*}{\Upsilon}_{j}\|_{F}.$$
(18)

This means that the Lyapunov residual norm can also be computed by manipulating small matrices of dimension 2mjq × q. Similarly, the solution Zj can be assembled at the very end of the LR-ADI procedure once the residual norm in (18) is sufficiently small. Indeed,

$$\begin{aligned}Z_{j}&=[Z_{j-1},\sqrt{-2\text{Re}(p_{j})}S_{j}]=[\sqrt{-2\text{Re}(p_{1})}S_{1},\sqrt{-2\text{Re}(p_{2})}S_{2},\ldots,\sqrt{-2\text{Re}(p_{j})}S_{j}]\\ \\ &=[\sqrt{-2\text{Re}(p_{1})}V_{m_{1}}Y_{m_{1}},\sqrt{-2\text{Re}(p_{2})}V_{m_{2}}Y_{m_{2}},\ldots,\sqrt{-2\text{Re}(p_{j})}V_{m_{j}}Y_{m_{j}}]\\ \\ &=V_{m_{j}}[[Y_{m_{1}};O_{2(m_{j}-m_{1})q\times q}],[Y_{m_{2}};O_{2(m_{j}-m_{2})q\times q}],\ldots,Y_{m_{j}}]\\ &\quad\cdot(\sqrt{-2\text{diag}(\text{Re}(p_{1}),\ldots,\text{Re}(p_{j}))}\otimes I_{q}). \end{aligned}$$
(19)
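The identity (18) only uses the orthonormality of the basis, so it can be checked with generic stand-ins for \(V_{m_j}\) and \({\Upsilon}_j\):

```python
import numpy as np

rng = np.random.default_rng(8)
n, mq, q = 500, 24, 2
# orthonormal V (stand-in for V_{m_j}) and a small coefficient matrix Upsilon
V, _ = np.linalg.qr(rng.standard_normal((n, mq)))
Ups = rng.standard_normal((mq, q))
W = V @ Ups                          # W_j = V_{m_j} Upsilon_j

# (18): the large-scale residual norm from a small q x q computation
full = np.linalg.norm(W.T @ W)       # = ||W_j W_j^T||_F as well
small = np.linalg.norm(Ups.T @ Ups)
print(full, small)
```

Since \(W^{\mathsf{T}}W={\Upsilon}^{\mathsf{T}}V^{\mathsf{T}}V{\Upsilon}={\Upsilon}^{\mathsf{T}}{\Upsilon}\), the two norms agree to roundoff, and the n-dimensional factor W never has to be formed for the stopping criterion.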

The overall procedure combining the LR-ADI iteration with the extended Krylov subspace method for shifted linear systems is depicted in Algorithm 3.

Algorithm 3

LR-ADI-EKSM for Lyapunov equations

As in Algorithm 1, if Im(pj)≠0, in lines 15 to 19 we set \(p_{j+1}=\overline p_{j}\), and we follow the implementation suggested in [11, 36] to reduce the amount of complex arithmetic. In particular, \(Y_{m_{j+1}}\) can be obtained from \(Y_{m_{j}}\) without solving (14) or (16). Moreover, the adopted scheme results in a real Zj (see [11] and [36, Algorithm 4.3] for further details).

Remark 4

Theorem 3.1 shows that \(\text {Range}(W_{j})\subseteq \mathbf {EK}_{m_{j}}^{\square }(A,B)\) whenever Wj is updated as Wj = Wj−1 −2Re(pj)Sj, namely whenever all the employed shifts are real. In case of shifts with nonzero imaginary part, the LR-ADI implementation we adopt sets

$$W_{j+1}=W_{j-1}-4\text{Re}(p_{j})(\text{Re}(S_{j})+\beta\text{Im}(S_{j})).$$

Therefore, we need to show that Wj+1 defined as above is still such that \(\text {Range}(W_{j+1})\subseteq \mathbf {EK}_{m_{j}}^{\square }(A,B)\). This can be done by applying the same exact arguments used in the proof of Theorem 3.1. In particular, the result follows by noticing that the basis Vm is real, as we assumed A and B to be real matrices, and that we can write

$$\begin{array}{@{}rcl@{}} W_{j+1}&=&W_{j-1}-4\text{Re}(p_{j})(\text{Re}(S_{j})+\beta\text{Im}(S_{j}))\\ &=&V_{m_{j}}\left([{\Upsilon}_{j-1};O_{2(m_{j}-m_{j-1})q\times q}]-4\text{Re}(p_{j})(\text{Re}(Y_{m_{j}})+\beta\text{Im}(Y_{m_{j}}))\right). \end{array}$$

Notice that two tolerances \(\varepsilon _{\mathtt {inn}}^{(j)}\), and ε are employed in Algorithm 3. In particular, ε is used to assess the accuracy of the computed solution in terms of the Lyapunov residual norm, whereas \(\varepsilon _{\mathtt {inn}}^{(j)}\) is employed to determine whether the solution of the current linear system is sufficiently accurate. In principle, the user can provide a fixed value for the inner tolerance, i.e., \(\varepsilon _{\mathtt {inn}}^{(j)}\equiv \overline \varepsilon _{\mathtt {inn}}\) for all j. However, the theory developed in [38] can be used to adaptively compute \(\varepsilon _{\mathtt {inn}}^{(j)}\) as the LR-ADI iterations proceed. The relaxation strategy presented in [38, Section 3] allows us to increase \(\varepsilon _{\mathtt {inn}}^{(j)}\) as j grows. Therefore, especially when \(\varepsilon _{\mathtt {inn}}^{(j)}\) is rather large, there is no need to expand the current extended Krylov subspace in general. In all the results reported in Section 5, we employ such a strategy and \(\varepsilon _{\mathtt {inn}}^{(j)}\) is computed according to [38, Equation (3.18b)] (see also [44] for similar results in case of Sylvester equations).

We would like to point out that lines 23 to 28 in Algorithm 3 and the use of the flag flag_noexpand are crucial to reduce the computational cost of the overall procedure. Indeed, those lines check whether the current subspace already contains enough spectral information to solve the current linear system. If this is the case, we do not expand the current space, thus avoiding unnecessary growth in memory requirements and computational effort.

If \(\mathbf {Y}:= [[Y_{m_{1}};O_{2(m_{j}-m_{1})q\times q}],[Y_{m_{2}};O_{2(m_{j}-m_{2})q\times q}],\ldots ,Y_{m_{j}}]\), (19) shows that the numerical solution computed by the proposed LR-ADI implementation is of the form

$$Z_{j}Z_{j}^{\mathsf{T}}=-2V_{m_{j}}(\mathbf{Y}(\text{diag}(\text{Re}(p_{1}),\ldots,\text{Re}(p_{j}))\otimes I_{q}) \mathbf{Y}^{\mathsf{T}})V_{m_{j}}^{\mathsf{T}}.$$
(20)

The right-hand side in (20) has the typical form of an approximate solution computed by a projection method applied to (2). In particular, if the extended Krylov subspace method (K-PIK) presented in [63] is applied to solve (2), the computed approximation is of the form \(X_{m}=V_{m}L_{m} V_{m}^{\mathsf {T}}\), where the orthonormal columns of Vm are a basis of \(\mathbf {EK}_{m}^{\square }(A,B)\) and Lm is computed by imposing a Galerkin condition on the residual matrix \(AV_{m}L_{m}V_{m}^{\mathsf {T}}+V_{m}L_{m}V_{m}^{\mathsf {T}}A^{\mathsf {T}}+BB^{\mathsf {T}}\). Therefore, the proposed LR-ADI implementation can be seen as a novel projection method where the coefficient matrix of the approximate solution in terms of the basis vectors, namely \(\mathbf {Y}(\text {diag}(\text {Re}(p_{1}),\ldots ,\text {Re}(p_{j}))\otimes I_{q}) \mathbf {Y}^{\mathsf {T}}\), is computed as outlined above and not by imposing a Galerkin condition on the residual matrix. This perspective may provide new insights on the relation between LR-ADI and K-PIK. However, this is beyond the scope of this paper. Similar investigations, relating LR-ADI and rational Krylov subspace methods, have been reported in [25, 74, 75].

The expression (20) resembles the LDLT-form of the LR-ADI solution. This formulation, while being more natural for projection-based solvers, also turned out to be advantageous when LR-ADI is employed as linear solver for differential matrix equations (see [39]).

4 Shift computation

Many of the procedures available in the literature for the ADI shift computation need the explicit construction of a basis of Range(Zj) or a subspace thereof. For instance, in [12], the authors suggest using, as shifts pj, a subset of the Ritz values of A with respect to \(\mathcal {Z}_{j} = \text{Range} (\widetilde Z_{j})\), where \(\widetilde Z_{j}\in \mathbb {R}^{n\times h}\) consists of the last h >0 columns of Zj that have been orthogonalized with respect to each other. However, (19) shows that Algorithm 3 provides us with a matrix Zj such that \(\text{Range} (Z_{j})\subseteq \mathbf {EK}_{m_{j}}^{\square }(A,B)\) so that the Ritz values of A with respect to \(\mathbf {EK}_{m_{j}}^{\square }(A,B)\) can be employed as shifts. Moreover, in standard LR-ADI implementations, one has to explicitly compute the projection of A onto \(\mathcal {Z}_{j}\), increasing the computational cost of the overall procedure. In our approach, the projection of A onto \(\mathbf {EK}_{m_{j}}^{\square }(A,B)\) comes for free, as it amounts to \(T_{m_{j}}\), and no additional operations are required.

The observation above can be applied to many schemes for the shift computation. In the following, we give some details for the residual-Hamiltonian-based shifts and the residual norm-minimizing shifts presented in [37].

In [37, Section 2.1.3], at the  jth LR-ADI iteration, the Hamiltonian matrix \({\mathscr{H}}_{j}=\left [\begin {array}{ll} A^{\mathsf {T}} & O \\ W_{j}W_{j}^{\mathsf {T}} & -A \end {array}\right ]\) is considered and its projection onto \(\mathcal {Z}_{j}\), namely \(\widetilde{\mathcal {H}}_{j}=\left [\begin {array}{ll} {(\widetilde Z_{j}^{\mathsf {T}}A\widetilde Z_{j})}^{\mathsf {T}} & O \\ \widetilde Z_{j}^{\mathsf {T}}W_{j}W_{j}^{\mathsf {T}}\widetilde Z_{j} & -\widetilde Z_{j}^{\mathsf {T}}A\widetilde Z_{j} \end {array}\right ]\), is constructed. In our case, we can easily construct the projection of \({\mathscr{H}}_{j}\) onto \(\mathbf {EK}_{m_{j}}^{\square }(A,B)\) and this is given by

$$\widetilde{\mathcal{H}}_{j}=\left[\begin{array}{ll} T_{m_{j}}^{\mathsf{T}} & O \\ {\Upsilon}_{j}^{\mathsf{T}}{\Upsilon}_{j} & -T_{m_{j}} \end{array}\right]\in\mathbb{R}^{4m_{j}q\times 4m_{j}q}.$$
(21)

With (21) at hand, we compute its stable eigenpairs \(\left (\lambda _{k},\left [\begin {array}{ll}s_{k}\\ t_{k} \end {array}\right ]\right )\), Re(λk) < 0, \(s_{k},t_{k}\in \mathbb {R}^{2m_{j}q}\), and the (j + 1)st residual-Hamiltonian-based shift pj+1 is selected as the eigenvalue \(\lambda _{\widehat k}\) with \(\widehat k=\operatorname{argmax}_{k}\{\|t_{k}\|\}\).
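As a small illustration, this selection rule can be sketched in NumPy. The matrices T and Ups below are random stand-ins for \(T_{m_{j}}\) and Υj (not data from the paper); the function builds the projected Hamiltonian of (21), keeps the stable eigenpairs, and returns the eigenvalue whose lower eigenvector block has maximal norm.

```python
import numpy as np

def hamiltonian_shift(T, Ups):
    """Pick the next residual-Hamiltonian-based shift from the projected
    Hamiltonian (21): among its stable eigenvalues, return the one whose
    eigenvector has the lower-block component t_k of maximal norm."""
    k = T.shape[0]
    H = np.block([[T.T, np.zeros((k, k))],
                  [Ups.T @ Ups, -T]])
    vals, vecs = np.linalg.eig(H)
    stable = vals.real < 0                  # keep Re(lambda_k) < 0 only
    vals, vecs = vals[stable], vecs[:, stable]
    t_norms = np.linalg.norm(vecs[k:, :], axis=0)  # norms of the t_k blocks
    return vals[np.argmax(t_norms)]

# small illustrative data (random stand-ins, hypothetical sizes)
rng = np.random.default_rng(0)
T = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))
Ups = rng.standard_normal((2, 4))
p_next = hamiltonian_shift(T, Ups)
```

Note that H is block lower triangular, so its spectrum is the union of the spectra of \(T^{\mathsf{T}}\) and \(-T\); for a stable T, the stable eigenvalue set is nonempty.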

For the computation of residual-norm-minimizing shifts, in [37, Section 3], a rather involved optimization procedure is presented. In particular, the real and imaginary parts of pj+1 = 𝜃j+1 + ıξj+1 are computed by solving the following minimization problem

$$\lbrack\theta_{j+1},\xi_{j+1}\rbrack = \underset{\theta\in{\mathbb{R}}_-,\xi\in\mathbb{R}}{\operatorname{argmin}}\;\left\|W_j-2\theta{(A+(\theta+\imath\xi)I)}^{-1}W_j\right\|^2.$$
(22)

The objective function in (22) is expensive to evaluate, often making the shift computation more expensive than a single LR-ADI iteration. To overcome this issue, Kürschner proposes employing smaller matrices \(\widetilde A\) and \(\widetilde W_{j}\) in place of A and Wj. Once again, \(\widetilde A\) and \(\widetilde W_{j}\) are the projections of A and Wj onto a suitable subspace. This subspace is chosen to be \(\mathbf {EK}_{\ell }^{\square }(A,B)\cup \text{Range} (Z_{j})\) for a certain, usually small, ℓ > 0. In our setting, \(\mathbf {EK}_{\ell }^{\square }(A,B)\cup \text{Range} (Z_{j})\subseteq \mathbf {EK}_{m_{j}}^{\square }(A,B)\) whenever \(\ell \leqslant m_{j}\). Therefore, we can set \(\widetilde A=T_{m_{j}}\) and \(\widetilde W_{j}={\Upsilon }_{j}\) to approximate [𝜃j+1, ξj+1] in (22).
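A minimal sketch of this projected minimization is given below, assuming T and W stand in for the projected \(\widetilde A\) and \(\widetilde W_{j}\). It is not Kürschner's actual optimization procedure: we simply hand the objective of (22) to a generic bound-constrained quasi-Newton solver, with the bound on θ enforcing θ < 0.

```python
import numpy as np
from scipy.optimize import minimize

def norm_min_shift(T, W):
    """Approximate the residual-norm-minimizing shift by solving (22)
    on projected data: T plays the role of A, W that of W_j."""
    n = T.shape[0]
    I = np.eye(n)

    def objective(x):
        theta, xi = x
        # ||W - 2*theta*(T + (theta + i*xi)*I)^{-1} W||^2
        R = W - 2.0 * theta * np.linalg.solve(T + (theta + 1j * xi) * I,
                                              W.astype(complex))
        return np.linalg.norm(R) ** 2

    res = minimize(objective, x0=[-1.0, 0.0], method="L-BFGS-B",
                   bounds=[(None, -1e-12), (None, None)])  # keep theta < 0
    theta, xi = res.x
    return theta + 1j * xi

# random stand-ins for the projected quantities (hypothetical sizes)
rng = np.random.default_rng(1)
T = -np.eye(5) + 0.05 * rng.standard_normal((5, 5))
W = rng.standard_normal((5, 1))
p = norm_min_shift(T, W)
```

Since the projected matrices are small, each evaluation of the objective costs only a dense solve of size \(2m_{j}q\), which is the point of the projection.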

5 Numerical examples

In this section, we illustrate the potential of the scheme proposed in this paper. The two variants of the LR-ADI-EKSM method illustrated in Section 3 are denoted by LR-ADI-EKSM(G) and LR-ADI-EKSM(MR). In particular, in LR-ADI-EKSM(G), we solve the linear systems by imposing a Galerkin condition, i.e., the matrix Y is computed by solving the reduced problem (14). In LR-ADI-EKSM(MR), Y solves the least squares problem (16).

We test Algorithm 3 on different instances of (2) coming from the discretization of certain PDEs, and we study how the computational cost of the main steps of Algorithm 3 depends on the problem dimension n and on the rank q of the right-hand side.

The results achieved by Algorithm 3 are also compared to the ones obtained by running a standard implementation of the LR-ADI method. In particular, we employed the MATLAB function mess_lradi available in the M-M.E.S.S. package [59]. Notice that mess_lradi is intended to be a black-box routine, so many checks and inspections are performed before the actual solution process starts. This may increase the overall running time of mess_lradi. Therefore, to ensure fair comparisons, we also report the results obtained by running a standard implementation of LR-ADI where the overhead cost mentioned above is not present. Such a routine is simply denoted by lradi in the tables that follow.

For a better understanding, in Table 1, we summarize the linear system solver adopted within each of the tested routines for the numerical experiments that follow. Similarly, in Table 1, we indicate whether a given scheme is equipped with the relaxation strategy from [38] for the selection of \(\varepsilon^{(j)}_{\mathtt{inn}}\).

Table 1 Solver: solver employed for solving the linear systems with A in LR-ADI-EKSM and K-PIK and with \(A + p_{j}I\) in lradi and mess_lradi. In the column Relaxation, we indicate whether a certain scheme is equipped with the relaxation strategy proposed in [38]

For all experiments, the tolerance ε for the relative residual norm is set to \(10^{-8}\). Moreover, except for Experiment 3, we always employ the residual-Hamiltonian-based shifts presented in [37] and computed as illustrated in the previous section.

All results were obtained by running MATLAB R2020b [47] on a standard node of the Linux cluster mechthild hosted at the Max Planck Institute for Dynamics of Complex Technical Systems in Magdeburg, Germany.

Experiment 1

In the first experiment, we consider a Lyapunov equation where

$$A=I_{h}\otimes D_{h}+D_{h}\otimes I_{h},\quad D_{h}=\text{tridiag}(1,-2,1)\in\mathbb{R}^{h\times h}.$$

Therefore, \(A\in \mathbb {R}^{n\times n}\), \(n = h^{2}\), is symmetric and stable. We first consider a matrix \(B\in \mathbb {R}^{n\times q}\) with random entries and unit norm, and in Table 2, we report how the overall solution time is distributed among the main steps of our algorithm for different values of n and q.
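The test matrix above is straightforward to assemble with sparse Kronecker products; a minimal SciPy sketch (the helper name build_A is ours):

```python
import numpy as np
import scipy.sparse as sp

def build_A(h):
    """Assemble A = I_h (x) D_h + D_h (x) I_h,
    with D_h = tridiag(1, -2, 1) of size h x h (Experiment 1)."""
    e = np.ones(h - 1)
    D = sp.diags([e, -2.0 * np.ones(h), e], offsets=[-1, 0, 1], format="csr")
    I = sp.identity(h, format="csr")
    return (sp.kron(I, D) + sp.kron(D, I)).tocsr()

A = build_A(10)  # n = h^2 = 100
```

Since D_h is symmetric negative definite, A inherits symmetry and stability, matching the assumptions on (2).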

Table 2 Experiment 1. Computational timings devoted to the different main steps of LR-ADI-EKSM(G) for different values of the problem size n and rank of the right-hand side q

In both LR-ADI-EKSM(G) and LR-ADI-EKSM(MR), the linear systems with A required for the basis construction are solved by means of the MATLAB sparse direct solver “backslash.” In particular, A is factorized once and for all before the iterative procedures start so that only triangular systems are actually solved during the basis construction. The computational time for the factorization of A is always included in the results that follow.

In this experiment, LR-ADI-EKSM(G) and LR-ADI-EKSM(MR) perform very similarly. We thus report only the results achieved by the former.

As expected, the time devoted to the basis construction represents the majority of the overall computational effort, as is usual in Krylov projection algorithms. This cost increases as q grows. Indeed, a larger subspace is computed, making the basis construction, and in particular the orthogonalization step, rather demanding. Having a large-dimensional approximation space also leads to a more expensive shift computation.

In Fig. 1 (left y-axis), we illustrate how the dimension of the computed extended Krylov subspace grows in terms of j for n =360 000 and different values of q.

Fig. 1

Experiment 1. Dimension of the constructed extended Krylov subspace and computed normalized residual norms as j grows, i.e., the ADI progresses, for problem size n =360 000

In this experiment, we can notice that the subspace constructed to solve the second shifted linear system, namely \((A + p_{2}I)S_{2} = W_{1}\), is a very rich approximation space in terms of spectral information. Indeed, we need to only slightly expand it to solve the subsequent linear systems without compromising the decrease in the Lyapunov residual norm (see Fig. 1 (right y-axis)). This means that the majority of the computational efforts are dedicated to solving the second linear system, and we can capitalize on them for j > 2, reducing the overall workload of the solution process. We would like to mention that such a phenomenon is partially due to the adaptive selection of the inner tolerance \(\varepsilon _{\mathtt {inn}}^{(j)}\) coming from [38].

We now compare LR-ADI-EKSM(G) with the function mess_lradi of the M-M.E.S.S. package [59], an abstract function handle-based implementation of the LR-ADI, and lradi, a plain matrix-based implementation of the same algorithm.

To this end, we let \(B\in \mathbb {R}^{n}\) be the normalized vector of all ones. To ensure a fair comparison, we employ the shifts computed by LR-ADI-EKSM(G) in all the different implementations. This leads to a very similar trend in the relative residual norms achieved by the routines, even though the shifted linear systems in mess_lradi and lradi are solved to very high accuracy, whereas the relaxation strategy of [38] is implemented in LR-ADI-EKSM(G). In Fig. 2, we report the relative difference between the relative residual norms computed by LR-ADI-EKSM(G) and mess_lradi throughout all the necessary iterations j for different problem dimensions n, along with the values of \(\varepsilon ^{(j)}_{\mathtt {inn}}\) we employed. In agreement with the results presented in [38], we can notice that the distance between the computed relative residual norms is always rather moderate and smaller than \(\varepsilon ^{(j)}_{\texttt {inn}}\). Very similar results are obtained by comparing with the residual norms attained by lradi in place of mess_lradi.

Fig. 2

Experiment 1. Relative gap \(\dfrac{|r_j^{LR-ADI-EKSM}-r_j^{\mathtt{mess\_lradi}}|}{r_j^{\mathtt{mess\_lradi}}}\) between the residual norms \(r_{j}^{\text {LR-ADI-EKSM}}\) and \(r_{j}^{\mathtt {mess\_lradi}}\) computed by LR-ADI-EKSM(G) and mess_lradi, respectively, as j grows, i.e., ADI converges, and different problem sizes n, together with the corresponding inner inexact solver tolerance \(\varepsilon^{(j)}\), denoted \(\varepsilon_{n}\) to relate the problem sizes

We also compare the routines in terms of computation time. The results are collected in Table 3. Since we employ the shifts computed within LR-ADI-EKSM(G) also for mess_lradi and lradi, we do not consider the time devoted to the shift computation when reporting the performances of LR-ADI-EKSM(G) in Table 3.

Table 3 Experiment 1. Computational timings achieved by LR-ADI-EKSM(G), lradi, and mess_lradi for different problem sizes n

The results in Table 3 show that, for this experiment, our proposed scheme combined with the relaxation strategy presented in [38] leads to a remarkable speed-up of the solution process, up to 50%, when compared to a standard implementation of the LR-ADI method.

Experiment 2

In the second experiment, we consider a problem similar to [51, Example 6]. In particular, the matrix A comes from the centered finite difference discretization of the 3-dimensional convection-diffusion operator

$$\mathcal{L}(u)=-\zeta{\Delta} u+\mathbf{w}\cdot\nabla u,$$

on the unit cube with zero Dirichlet boundary conditions. The convection vector w is given by \(\mathbf{w} = (\phi_1(x)\psi_1(y)\pi_1(z),0,\pi_3(z)) = ((1-x^{2})yz,0,e^{z})\), and ζ > 0 denotes the diffusivity. By employing h nodes in each direction, the discretization phase leads to a matrix A that can be written as

$$A=(D_{h}+{\Pi}_{3}N^{\mathsf{T}})\otimes I_{h}\otimes I_{h}+I_{h}\otimes D_{h}\otimes I_{h}+I_{h}\otimes I_{h}\otimes D_{h}+{\Pi}_{1}\otimes{\Psi}_{1}\otimes{\Phi}_{1}N,$$

where \(D_{h}=\zeta {(h-1)}^{2}\cdot \text {tridiag}(-1,2,-1)\in \mathbb {R}^{h\times h}\), \(N=-\frac {(h-1)}{2}\cdot \text {tridiag}(-1,0,1)\in \mathbb {R}^{h\times h}\), and Φi, Ψi, and Πi are diagonal matrices whose diagonal entries correspond to the nodal values of the corresponding functions ϕi, ψi, and πi (see [51] for further details). \(B\in \mathbb {R}^{n}\), \(n = h^{3}\), is a vector with random entries.
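The Kronecker structure above can again be assembled directly with SciPy. The sketch below follows the formula term by term; the placement of the h nodes (equispaced interior points of (0, 1)) and the helper name are our assumptions, so the nodal diagonal matrices are only illustrative.

```python
import numpy as np
import scipy.sparse as sp

def build_convdiff(h, zeta):
    """Assemble the 3D convection-diffusion matrix of Experiment 2
    from its Kronecker form (node placement is an assumption)."""
    x = np.linspace(0.0, 1.0, h + 2)[1:-1]           # assumed interior nodes
    e = np.ones(h - 1)
    D = zeta * (h - 1) ** 2 * sp.diags([-e, 2.0 * np.ones(h), -e], [-1, 0, 1])
    N = -(h - 1) / 2.0 * sp.diags([-e, e], [-1, 1])  # scaled tridiag(-1, 0, 1)
    I = sp.identity(h)
    Phi1, Psi1 = sp.diags(1.0 - x**2), sp.diags(x)   # nodal values of phi_1, psi_1
    Pi1, Pi3 = sp.diags(x), sp.diags(np.exp(x))      # nodal values of pi_1, pi_3
    A = (sp.kron(D + Pi3 @ N.T, sp.kron(I, I))
         + sp.kron(I, sp.kron(D, I))
         + sp.kron(I, sp.kron(I, D))
         + sp.kron(Pi1, sp.kron(Psi1, Phi1 @ N)))
    return sp.csr_matrix(A)

A = build_convdiff(h=8, zeta=0.05)  # n = h^3 = 512
```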

Due to the 3D nature of the problem, the nonsymmetric linear systems with A involved in the basis construction in LR-ADI-EKSM are solved by GMRES [57]. In particular, we employ the GMRES implementation written by Lund et al. [35], namely the function bgmres in [45]. GMRES is stopped whenever the computed relative residual norm gets smaller than \(10^{-10}\).

It is well known that (polynomial) Krylov methods for linear systems need to be preconditioned to achieve a fast convergence in terms of number of iterations. To this end, as suggested in [51], we employ the following preconditioning operator when solving the linear systems with A,

$$\mathcal{P}=(D_{h}+{\Pi}_{3}N^{\mathsf{T}})\otimes I_{h}\otimes I_{h}+I_{h}\otimes D_{h}\otimes I_{h}+I_{h}\otimes I_{h}\otimes D_{h}+\overline\pi_{1}I_{h}\otimes{\Psi}_{1}\otimes{\Phi}_{1}N,$$

where \(\overline \pi _{1}\) is the mean value of the function π1 in [0,1]. At each GMRES iteration, we thus have to invert \(\mathcal {P}\), namely we have to compute \(\overline v=\mathcal {P}^{-1}v\) for \(v\in \mathbb {R}^{n}\). This operation is performed by solving the Sylvester equation

$$(D_{h}\otimes I_{h}+I_{h}\otimes D_{h}+\overline\pi_{1}{\Psi}_{1}\otimes{\Phi}_{1}N) \mathbf{\overline V}+\mathbf{\overline V}{(D_{h}+{\Pi}_{3}N^{\mathsf{T}})}^{\mathsf{T}}=\mathbf{V},$$

where \(\mathbf {\overline V},\mathbf {V}\in \mathbb {R}^{h^{2}\times h}\) are such that \(\text {vec}(\mathbf {\overline V})=\overline v\) and vec(V) = v. Since the coefficient matrices in the equation above have moderate dimensions, the Bartels-Stewart method [6] is employed for its solution and the Schur decompositions of the coefficient matrices are computed once and for all before the iterative procedure starts. We always employ a right preconditioning scheme in order to easily have access to the actual residual norm.
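This preconditioner application can be sketched with SciPy's Bartels-Stewart solver solve_sylvester. The small matrices below are random stand-ins for the two coefficient matrices of the Sylvester equation above, not the actual Kronecker factors; note also that, unlike the paper's implementation, solve_sylvester recomputes the Schur decompositions at every call.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
h = 6
# random stand-ins: one coefficient of size h^2 x h^2, one of size h x h
A1 = np.eye(h * h) + 0.01 * rng.standard_normal((h * h, h * h))
A2 = np.eye(h) + 0.01 * rng.standard_normal((h, h))

v = rng.standard_normal(h ** 3)
V = v.reshape(h * h, h, order="F")       # vec(V) = v (column-major stacking)
Vbar = solve_sylvester(A1, A2.T, V)      # solves A1 Vbar + Vbar A2^T = V
vbar = Vbar.flatten(order="F")           # vbar = P^{-1} v
```

The column-major reshape mirrors the vec(·) convention used in the paper, so applying the preconditioner to a vector of length \(h^{3}\) costs only a dense Sylvester solve with coefficients of sizes \(h^{2}\) and h.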

Also, for the shifted linear systems with \(A + p_{j}I\), within mess_lradi and lradi, we employ preconditioned GMRES equipped with the preconditioning operator \(\mathcal {P}+p_{j}I\). Once again, this preconditioner is applied by solving the Sylvester equation

$$(D_{h}\otimes I_{h}+I_{h}\otimes D_{h}+\overline\pi_{1}{\Psi}_{1}\otimes{\Phi}_{1}N) \mathbf{\overline V}+\mathbf{\overline V}{(D_{h}+{\Pi}_{3}N^{\mathsf{T}}+p_{j}I_{h})}^{\mathsf{T}}=\mathbf{V}.$$

Even though this is in general a better preconditioner for \(A + p_{j}I\) than \(\mathcal {P}\), its application involves complex arithmetic whenever Im(pj)≠0, with a consequent increase in the computational effort devoted to the preconditioning step.

For this experiment, lradi is equipped with the relaxation strategy presented in [38].

Also for this experiment, LR-ADI-EKSM(G) and LR-ADI-EKSM(MR) perform very similarly, with LR-ADI-EKSM(MR) achieving slightly better results in terms of computational time. We thus report only the performance of LR-ADI-EKSM(MR).

The results are collected in Table 4 for different values of n and ζ. In Table 4, we also report the number of shifts with nonzero imaginary part.

Table 4 Experiment 2. Computational timings achieved by LR-ADI-EKSM(MR), lradi, and mess_lradi for different problem sizes n and diffusivities ζ

We would like to mention that we ran some experiments with mess_lradi where the shifted linear systems were solved by means of the MATLAB sparse direct solver “backslash” in place of preconditioned GMRES. However, for this example, the potentially higher accuracy of the direct solves did not benefit the computation and the execution times we achieved with “backslash” could not keep up with the ones reported for GMRES in Table 4. We, thus, decided to omit them here.

From the results in Table 4, we can see that LR-ADI-EKSM(MR) is very competitive and always achieves computational timings that are significantly smaller than the ones required by mess_lradi. Thanks to the relaxation procedure coming from [38], lradi performs better than mess_lradi.

The performance of all the tested routines is strictly related to the number of complex shifts needed to converge. When this is sizable with respect to the total number of iterations, many of the n × n linear systems \(A + p_{j}I\) within mess_lradi and lradi involve complex arithmetic, whereas this is needed only in the solution of the small dimensional least squares problem for the computation of Y in LR-ADI-EKSM(MR).

We notice that, for a fixed n, the computational time of LR-ADI-EKSM(MR) generally decreases as ζ is reduced, even though the number of LR-ADI iterations that are implicitly performed increases. This is due to the computational effort required by the solution of the linear systems with A during the basis construction. Indeed, for ζ =0.05, many more GMRES iterations are required than are necessary for ζ =0.005. In Fig. 3, we report the number of GMRES iterations needed to solve the linear system with A at each m, namely every time a new basis vector of the adopted extended Krylov subspace needs to be computed.

Fig. 3

Experiment 2. Number of GMRES iterations needed to solve the linear systems with A during the basis construction in LR-ADI-EKSM(MR) for different values of the diffusivity ζ and n =125 000

A rather large number of GMRES iterations is required for solving the linear systems with A in the case ζ =0.05, making the construction of the basis of \(\mathbf {E}\mathbf {K}^{\square }_{m}(A,B)\) more demanding. On the other hand, few GMRES iterations suffice to meet the prescribed accuracy for ζ =0.005, and the overall solution procedure turns out to be very successful.

Experiment 3

In this experiment, we compare LR-ADI-EKSM also with K-PIK [63], since the two routines construct the same subspace. We consider the thermal part of the thermo-elastic modeling of a building block of an experimental machine tool, given by the following heat equation

$$\left\{\begin{array}{ll} c_{p}\rho\frac{\partial T}{\partial t}&=\lambda{\Delta} T,\quad \text{in }{\Omega},\\ \lambda\frac{\partial T}{\partial \textbf{n}}&=f,\quad \text{on }{\Gamma}_{c}\subset\partial{\Omega}, \\ \lambda\frac{\partial T}{\partial \textbf{n}}&=\alpha(T_{ext}-T),\quad \text{on }{\Gamma}_{ext}\subset\partial{\Omega}, \\ T(0)&=0. \end{array}\right.$$
(23)

The discretization in space using the finite element method (here applying the proprietary tool ANSYS) on the three-dimensional domain, given by the machine frame indicated in Fig. 4, leads to the LTI system

$$E\dot T=\left(A-\sum\limits_{i=1}^{t} \alpha_{i}F_{i}\right) T+Bu(t).$$
(24)

where A represents the discretized Laplacian together with the Robin boundary contributions from Γext, which are represented by the Fi, while B results from the external control inputs (heat fluxes, e.g., induced by the drive motors) on Γc. Note that the elastic part of the thermo-elastic model can be encoded entirely in the output equation of the corresponding dynamical system and is, thus, not relevant here [40]. The algebraic problem resulting from this system amounts to a Lyapunov equation of the form (1). However, due to mass lumping in ANSYS, the mass matrix E is diagonal and SPD. We can, thus, easily invert its square root and consider the Lyapunov equation

Fig. 4

Experiment 3. Finite element grid of the machine frame indicated on the CAD model of the full machine. (Source: DFG CRC/TR-96 (https://transregio96.de))

$$E^{-\frac{1}{2}}\left(A-\sum\limits_{i=1}^{t} \alpha_{i}F_{i}\right)E^{-\frac{1}{2}}\widetilde X + \widetilde XE^{-\frac{1}{2}}{\left(A-\sum\limits_{i=1}^{t} \alpha_{i}F_{i}\right)}^{\mathsf{T}}E^{-\frac{1}{2}}+E^{-\frac{1}{2}}BB^{\mathsf{T}}E^{-\frac{1}{2}}=0,\quad \widetilde X=E^{\frac{1}{2}}XE^{\frac{1}{2}}.$$

So, again, we can efficiently reduce the problem to the form (2). Once a low-rank approximation \(\widetilde Z\widetilde Z^{\mathsf {T}}\) of \(\widetilde X\) is computed, the low-rank factor Z such that \(ZZ^{\mathsf{T}}\approx X\) can be retrieved via \(Z=E^{-\frac {1}{2}}\widetilde Z\).
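For a diagonal SPD mass matrix, this transformation amounts to row and column scalings and can be sketched in a few lines of NumPy (the function name and the dense A are illustrative; in practice A is sparse and only its nonzeros are scaled):

```python
import numpy as np

def to_standard_form(A, B, e_diag):
    """For a diagonal SPD mass matrix E (stored as its diagonal), return
    the standard-form coefficients E^{-1/2} A E^{-1/2} and E^{-1/2} B,
    plus a map recovering Z = E^{-1/2} Ztilde from a computed factor."""
    s = 1.0 / np.sqrt(e_diag)                 # diagonal of E^{-1/2}
    A_std = s[:, None] * A * s[None, :]       # E^{-1/2} A E^{-1/2}
    B_std = s[:, None] * B                    # E^{-1/2} B
    recover = lambda Ztilde: s[:, None] * Ztilde
    return A_std, B_std, recover

# small random stand-ins for A, B, and diag(E)
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 2))
e = rng.uniform(1.0, 2.0, size=5)
A_std, B_std, recover = to_standard_form(A, B, e)
```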

The actual machine frame in Fig. 4 consists of several parts itself, which are discretized separately. This leads to differently sized models of the structure in (24), reflected by the rows of Table 5. Accordingly, we solve the Lyapunov equation considering different configurations of the PDE (23) and, correspondingly, of the LTI system (24). In particular, this allows us to vary the number of degrees of freedom employed in the discretization phase, leading to different problem dimensions n, to modify the Neumann boundary conditions, obtaining diverse matrices Fi, and to consider different values for the rank q of B. Moreover, we set αi =10 for all i =1,…,t.

Table 5 Experiment 3. Computational timings achieved by LR-ADI-EKSM(G), K-PIK, and mess_lradi for different values of problem size n, number of Robin boundary conditions t, and rank of the right-hand side q

The results are collected in Table 5. It turns out that the Wachspress ADI shifts [42, 73] are particularly effective for this experiment, since A as well as all the Fi, and thus \(E^{-\frac{1}{2}}(A-\sum_{i=1}^{t}\alpha_{i}F_{i})E^{-\frac{1}{2}}\), are symmetric, i.e., the spectrum is real. These are the ideal circumstances for Wachspress shifts. We, thus, employ those shifts in LR-ADI-EKSM(G) and mess_lradi.

For this experiment, the LR-ADI method, either based on our new formulation or on a standard scheme as the one in mess_lradi, turns out to be more efficient in terms of computational time than K-PIK. Indeed, in spite of the smaller number of iterations K-PIK needs to converge, the large dimension of the extended Krylov subspace it constructs leads to a rather costly solution of the projected equations. LR-ADI-EKSM(G) also requires the construction of an extended Krylov subspace whose dimension is similar to the one computed by K-PIK. However, if \({\dim }\left (\mathbf {E}\mathbf {K}^{\square }_{m}(E^{-\frac {1}{2}}\left (A-{\sum }_{i=1}^{t} \alpha _{i}F_{i}\right )E^{-\frac {1}{2}},E^{-\frac {1}{2}}B)\right )=2mq\), the computational cost of solving the inner problems within LR-ADI-EKSM(G) is \(\mathcal {O}(4m^{2}q^{2})\) floating-point operations (FLOPs), whereas it amounts to \(\mathcal {O}(8m^{3}q^{3})\) FLOPs for K-PIK.

We conclude by mentioning that in this experiment, we relied on the ease of computing \(E^{-\frac {1}{2}}\). However, it may happen that the mass matrix E cannot be easily manipulated, e.g., it may be singular, so that the routine presented in this paper cannot be readily applied as we have done in this experiment. We plan to extend the LR-ADI-EKSM framework to this more challenging class of equations in the near future.

Experiment 4

In the last experiment, we show that the proposed framework still needs some further improvements to efficiently deal with generalized Lyapunov equations of the form (1) where the mass matrix E is not diagonal. To this end, we consider the Steel Profile data set [18, 49] from the MORwiki repository [72].

We compute the observability Gramian of the system, namely the solution X to the equation

$$A^{\mathsf{T}}XE+E^{\mathsf{T}}XA+C^{\mathsf{T}}C=0,$$
(25)

where \(A\in \mathbb {R}^{n\times n}\) is symmetric negative definite, \(C\in \mathbb {R}^{q\times n}\), q =6, and \(E\in \mathbb {R}^{n\times n}\) is SPD but not diagonal (see [17] for further details on the model).

If E = LLT denotes the Cholesky factorization of E, we consider the transformed equation

$$(L^{-1}A^{\mathsf{T}}L^{-\mathsf{T}})\widetilde X+\widetilde X(L^{-1}AL^{-\mathsf{T}})+L^{-1}C^{\mathsf{T}}CL^{-\mathsf{T}}=0,\quad \widetilde X=L^{\mathsf{T}}XL,$$
(26)

and, due to the symmetry of A, employ the extended Krylov subspace \(\mathbf {EK}_{m}^{\square }(L^{-1}AL^{-\mathsf {T}},L^{-1}C^{\mathsf {T}})\) as approximation space. Notice that the matrix \(L^{-1}AL^{-\mathsf{T}}\) does not need to be explicitly constructed (see, e.g., [63, Example 5.4]). As before, once \(\widetilde Z\widetilde Z^{\mathsf {T}}\approx \widetilde X\) is computed, we obtain a low-rank approximation to the original X via \(Z=L^{-\mathsf {T}}\widetilde Z\).
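Avoiding the explicit construction of \(L^{-1}AL^{-\mathsf{T}}\) amounts to two triangular solves per application; a minimal sketch with random SPD stand-ins for E and A (the helper name is ours):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def apply_transformed(A, L, V):
    """Apply L^{-1} A L^{-T} to a block V without forming the matrix
    explicitly, where E = L L^T is the Cholesky factorization of E."""
    W = solve_triangular(L, V, lower=True, trans="T")   # W = L^{-T} V
    return solve_triangular(L, A @ W, lower=True)       # L^{-1} (A W)

# small random stand-ins (hypothetical data, not the Steel Profile model)
rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
E = M @ M.T + n * np.eye(n)                  # SPD stand-in for the mass matrix
A = -(M + M.T) - n * np.eye(n)               # symmetric stand-in for A
L = cholesky(E, lower=True)
V = rng.standard_normal((n, 2))
Y = apply_transformed(A, L, V)
```

When A is sparse, each application thus costs one sparse matrix-block product plus two sparse triangular solves, so the transformed operator stays as cheap as the original one.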

In Table 6, we report the results achieved by LR-ADI-EKSM(G) and \({\tt mess\_lradi}\) for different values of n.

Table 6 Experiment 4. Computational timings achieved by LR-ADI-EKSM(G) and mess_lradi for different values of the problem size n

From the results in Table 6, we can readily see that the standard scheme of the LR-ADI method implemented in mess_lradi is much faster than LR-ADI-EKSM(G). This is due to the fact that the latter algorithm needs to construct a rather large subspace to achieve the prescribed accuracy, with a consequent increase in the computational effort of the overall procedure.

We also mention that the rank of the approximate solution computed by LR-ADI-EKSM(G) is much lower than the dimension of the constructed subspace. We believe that the transformation we performed in (26), and thus the use of \(\mathbf {E}\mathbf {K}_{m}^{\square }(L^{-1}AL^{-\mathsf {T}},L^{-1}C^{\mathsf{T}})\), may lead to some spectral redundancy in the adopted approximation subspace and to a slower convergence of the method. On the other hand, mess_lradi is able to deal with the original formulation (25) of the problem.

To address generalized equations of the form (1), we plan to study the employment of different techniques within the Krylov LR-ADI framework we presented in this paper. In particular, the use of nonstandard inner products and (extended) generalized Krylov subspaces [43] will be explored.

6 Conclusions

A new formulation of the LR-ADI algorithm for large-scale standard Lyapunov equations has been proposed. The computational core of the LR-ADI scheme consists of the solution of a shifted linear system at each iteration. We showed that the extended Krylov subspace method is a valid candidate for this task. In particular, we described how only one extended Krylov subspace needs to be constructed to solve all the linear systems required by the LR-ADI method. The LR-ADI iteration has been completely merged into the extended Krylov subspace method for shifted linear systems, resulting in a novel, efficient solution procedure. We also showed that many state-of-the-art algorithms for the shift computation can be easily integrated into our new scheme. Numerical results demonstrate the potential of our novel algorithm, especially when it is equipped with the relaxation strategy proposed in [38] and many complex shifts are needed to converge.

In future work, we will consider more involved Lyapunov equations of the form (1) that cannot be easily transformed into (2). While standard implementations of the LR-ADI method naturally address such a scenario by solving linear systems of the form \(A + p_{j}E\), further care has to be taken to employ the scheme we presented in this paper. Indeed, the shifted Arnoldi relation (7) can no longer be exploited. The use of non-standard inner products and generalized Krylov subspace methods [43] will be investigated.

The framework presented in this paper can be generalized to enhance other LR-ADI-like algorithms for matrix equations. For instance, the LR-ADI method for Sylvester equations [14] or LR-RADI schemes for Riccati equations [10, 22] can be equipped with a procedure similar to the one we proposed here.