RADI: a low-rank ADI-type algorithm for large-scale algebraic Riccati equations
Abstract
This paper introduces a new algorithm for solving large-scale continuous-time algebraic Riccati equations (CARE). The advantage of the new algorithm lies in its immediate and efficient low-rank formulation, which is a generalization of the Cholesky-factored variant of the Lyapunov ADI method. We discuss important implementation aspects of the algorithm, such as reducing the use of complex arithmetic and shift selection strategies. We show that there is a very tight relation between the new algorithm and three other algorithms for CARE previously known in the literature: all of these seemingly different methods in fact produce exactly the same iterates when used with the same parameters; they are algorithmically different descriptions of the same approximation sequence to the Riccati solution.
Mathematics Subject Classification
15A24 · 15A18 · 65F15 · 65F30

1 Introduction
The alternating directions implicit (ADI) method [35] is a well-established iterative approach for computing solutions of Lyapunov and other linear matrix equations. There exists an array of ADI methods [7, 8, 21, 22, 27, 30, 35, 36], covering both the ordinary and the generalized case. All of these methods have simple statements and efficient implementations [27]. One particular advantage of ADI methods is that they are very well suited for large-scale problems: the default formulation, which works with full-size dense matrices, can be transformed into a series of iteratively built approximations to the solution. Such approximations are represented in factored form, each factor having a very small rank compared to the dimensions of the input matrices. This makes ADI methods very suitable for large-scale applications.
Recently, Wong and Balakrishnan [37, 38] suggested a so-called quadratic ADI method (qADI) for solving the algebraic Riccati equation (1). Their method is a direct generalization of the Lyapunov ADI method, but only when considering the formulation working with full-size dense matrices. However, in the setting of the qADI algorithm, it appears impossible to apply the so-called “Li–White trick” [22], which is the usual method for obtaining a low-rank formulation of an ADI method. Wong and Balakrishnan do provide a low-rank variant of their algorithm, but this variant has an important drawback: in each step, all the low-rank factors have to be rebuilt from scratch. This has a large negative impact on the performance of the algorithm.
Apart from the qADI method, there are several other methods for solving the largescale Riccati equation that have appeared in the literature recently. Amodei and Buchot [1] derive an approximation of the solution by computing smalldimensional invariant subspaces of the associated Hamiltonian matrix (2). Lin and Simoncini [23] also consider the Hamiltonian matrix, and construct the solution by running subspace iterations on its Cayley transforms. Massoudi et al. [24] have shown that the latter method can be obtained from the control theory point of view as well.
In this paper, we introduce a new ADI-type iteration for Riccati equations, RADI. The derivation of RADI is not related to qADI, and it immediately yields a low-rank algorithm which overcomes the drawback of [37, 38]. In the new algorithm, the low-rank factors are built incrementally: in each step, each factor is expanded by several columns and/or rows, while the elements from the previous steps remain intact. By setting the quadratic coefficient B in (1) to zero, our method reduces to the low-rank formulation of the Lyapunov ADI method; see, e.g., [4, 7, 22, 27].
A surprising result is that, despite their completely different derivations, all of the Riccati methods we have mentioned so far are equivalent: the approximations they produce in each step are the same. This was already shown in [3] for the qADI algorithm and the algorithm of Amodei and Buchot. In this paper we extend this equivalence to our new low-rank RADI method and the method of Lin and Simoncini. Among all these different formulations of the same approximation sequence, RADI offers a compact and efficient implementation, and is very well suited for effective computation.
This paper is organized as follows: in Sect. 2, we recall the statements of the Lyapunov ADI method and the various Riccati methods, and introduce the new low-rank RADI algorithm. The equivalence of all aforementioned methods is shown in Sect. 3. In Sect. 4 we discuss important implementation issues, and in particular, various strategies for choosing shift parameters. Finally, Sect. 5 compares the effect of different options for the algorithm on its performance via several numerical experiments. We compare RADI with other algorithms for computing low-rank approximate solutions of (1) as well: the extended [16] and rational Krylov subspace methods [34], and the low-rank Newton–Kleinman ADI iteration [7, 9, 15, 29].
The following notation is used in this paper: \(\mathbb {C}_-\) and \(\mathbb {C}_+\) are the open left and right half plane, respectively, while \({\text {Re}}\left( z\right) ,~{\text {Im}}\left( z\right) \), \(\overline{z}={\text {Re}}\left( z\right) - \mathsf {i}\,{\text {Im}}\left( z\right) \), and \(|z|\) denote the real part, imaginary part, complex conjugate, and absolute value of a complex quantity z. For the matrix A, we use \(A^*\) and \(A^{-1}\) for the complex conjugate transpose and the inverse, respectively. In most situations, expressions of the form \(x=A^{-1}b\) are to be understood as solving the linear system \(Ax=b\) of equations for x. The relations \(A>(\ge )0\), \(A<(\le )0\) stand for the matrix A being positive or negative (semi)definite. Likewise, \(A\ge (\le ) B\) refers to \(A-B\ge (\le )0\). If not stated otherwise, \(\Vert \cdot \Vert \) is the Euclidean vector or subordinate matrix norm, and \(\kappa (\cdot )\) is the associated condition number.
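As a concrete illustration of the convention that \(x=A^{-1}b\) denotes a linear solve rather than forming the inverse, a minimal NumPy sketch (data arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # shifted to be well-conditioned
b = rng.standard_normal(5)

# x = A^{-1} b is computed as a linear solve, never by forming inv(A):
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
```

Solving is both cheaper and numerically more accurate than multiplying by an explicitly computed inverse.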
2 A new low-rank factored iteration
2.1 Derivation of the algorithm
The common way of converting ADI iterations into their low-rank variants is to perform a procedure similar to the one originally done by Li and White [22] for the Lyapunov ADI method. A crucial assumption for this procedure to succeed is that the matrices participating in the linear systems in each of the half-steps mutually commute for all k. For the Lyapunov ADI method this obviously holds true for the matrices \(A^*+ \sigma _{k+1} I\). However, in the case of the quadratic ADI iteration (6), the matrices \(A^*+ \sigma _{k+1} I - X^{\mathsf {adi}}_{k} G\) do not commute in general for all k, and neither do \(A^*+ \sigma _{k+1} I - X^{\mathsf {adi}}_{k+1/2} G\).
Theorem 1
 (a)
Let \(X=\varXi + {\tilde{X}}\) be an exact solution of (1). Then \({\tilde{X}}\) is a solution to the residual equation
$$\begin{aligned} \tilde{A}^*{\tilde{X}} + {\tilde{X}} \tilde{A} + \tilde{Q} - {\tilde{X}}G{\tilde{X}} = 0, \end{aligned}$$
(10)
where \(\tilde{A} = A - G \varXi \) and \(\tilde{Q} = \mathscr {R}(\varXi )\).
 (b)
Conversely, if \({\tilde{X}}\) is a solution to (10), then \(X = \varXi + {\tilde{X}}\) is a solution to the original Riccati equation (1). Moreover, if \(\varXi \ge 0 \) and \({\tilde{X}}\) is a stabilizing solution to (10), then \(X = \varXi + {\tilde{X}}\) is the stabilizing solution to (1).
 (c)
If \(\varXi \ge 0\) and \(\mathscr {R}(\varXi ) \ge 0\), then the residual equation (10) has a unique stabilizing solution.
 (d)
If \(\varXi \ge 0\) and \(\mathscr {R}(\varXi ) \ge 0\), then \(\varXi \le X\), where X is the stabilizing solution of (1).
Proof
 (a)
This is a straightforward computation which follows by inserting \({\tilde{X}} = X-\varXi \) and the formula for the residual of \(\varXi \) into (10), see also [2, 25].
 (b)
The first part follows as in (a). If \(\varXi \ge 0\) and \({\tilde{X}}\) is a stabilizing solution to (10), then \(X = \varXi + {\tilde{X}} \ge 0\) and \(A-GX = \tilde{A} - G{\tilde{X}}\) is stable, which makes X the stabilizing solution to (1).
 (c), (d)
The claims follow directly from (a) and [19, Theorem 9.1.1].
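Part (a) is easy to check numerically. The sketch below (sizes and data are arbitrary; it assumes SciPy's `solve_continuous_are` for the stabilizing solution of (1)) verifies that \({\tilde{X}} = X - \varXi \) satisfies the residual equation (10) for an arbitrary symmetric \(\varXi \):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
n, m, p = 6, 2, 2
A = rng.standard_normal((n, n)) - 3 * np.eye(n)   # shifted toward stability
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
G, Q = B @ B.T, C.T @ C

def riccati_residual(X):
    """Riccati operator R(X) for equation (1)."""
    return A.T @ X + X @ A + Q - X @ G @ X

# Stabilizing solution X of (1); solve_continuous_are solves
# A^T X + X A - X B R^{-1} B^T X + Q = 0, here with R = I so G = B B^T.
X = solve_continuous_are(A, B, Q, np.eye(m))

S = rng.standard_normal((n, n))
Xi = 0.1 * (S @ S.T)                   # arbitrary symmetric Xi
Xt = X - Xi                            # candidate solution of (10)
At = A - G @ Xi                        # A-tilde
Qt = riccati_residual(Xi)              # Q-tilde = R(Xi)

# X - Xi solves the residual equation (10):
res = At.T @ Xt + Xt @ At + Qt - Xt @ G @ Xt
assert np.linalg.norm(res) < 1e-8 * max(1.0, np.linalg.norm(X))
```

The identity holds for any symmetric \(\varXi \); the definiteness assumptions in (b)-(d) are only needed for the stabilizing-solution statements.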
Proposition 1
 (a)
\({\tilde{X}}_1 \ge 0\), i.e. \(\varXi + {\tilde{X}}_1 \ge 0\).
 (b)
\(\mathscr {R}(\varXi + {\tilde{X}}_1) = \hat{C}^*\hat{C} \ge 0\), where \(\hat{C}^*= {\tilde{C}}^*+ \sqrt{-2{\text {Re}}\left( \lambda \right) } \cdot {\tilde{V}}_1 {\tilde{Y}}_1^{-1}\).
Proof
 (a)
Positive definiteness of \({\tilde{Y}}_1\) (and then the semidefiniteness of \({\tilde{X}}_1\) as well) follows directly from \({\text {Re}}\left( \lambda \right) < 0\).
 (b)
Note that \({\tilde{A}}^*{\tilde{V}}_1 = \sqrt{-2{\text {Re}}\left( \lambda \right) } \cdot {\tilde{C}}^*- \lambda {\tilde{V}}_1\), and \( ({\tilde{V}}_1^*B)({\tilde{V}}_1^*B)^*= 2{\text {Re}}\left( \lambda \right) I - 2{\text {Re}}\left( \lambda \right) {\tilde{Y}}_1\). We use these expressions to obtain:$$\begin{aligned} \mathscr {R}(\varXi + {\tilde{X}}_1)&={\tilde{A}}^*{\tilde{X}}_1 + {\tilde{X}}_1 {\tilde{A}} + \mathscr {R}(\varXi ) - {\tilde{X}}_1 BB^*{\tilde{X}}_1 \\&= \left( {\tilde{A}}^*{\tilde{V}}_1\right) {\tilde{Y}}_1^{-1} {\tilde{V}}_1^*+ {\tilde{V}}_1 {\tilde{Y}}_1^{-1} \left( {\tilde{A}}^*{\tilde{V}}_1\right) ^*+ {\tilde{C}}^*{\tilde{C}} \\&\quad - {\tilde{V}}_1 {\tilde{Y}}_1^{-1} \left( {\tilde{V}}_1^*B\right) \left( {\tilde{V}}_1^*B\right) ^*{\tilde{Y}}_1^{-1} {\tilde{V}}_1^*\\&= \sqrt{-2{\text {Re}}\left( \lambda \right) } \cdot {\tilde{C}}^*{\tilde{Y}}_1^{-1} {\tilde{V}}_1^*+ \sqrt{-2{\text {Re}}\left( \lambda \right) } \cdot {\tilde{V}}_1 {\tilde{Y}}_1^{-1} {\tilde{C}} + {\tilde{C}}^*{\tilde{C}} \\&\quad - 2{\text {Re}}\left( \lambda \right) {\tilde{V}}_1 {\tilde{Y}}_1^{-1} {\tilde{Y}}_1^{-1} {\tilde{V}}_1^*\\&= \left( {\tilde{C}}^*+ \sqrt{-2{\text {Re}}\left( \lambda \right) }\cdot {\tilde{V}}_1 {\tilde{Y}}_1^{-1}\right) \cdot \left( {\tilde{C}}^*+ \sqrt{-2{\text {Re}}\left( \lambda \right) }\cdot {\tilde{V}}_1 {\tilde{Y}}_1^{-1}\right) ^*. \end{aligned}$$
When \(p=1\) and all the shifts are chosen as eigenvalues of the Hamiltonian matrix associated with the initial Riccati equation (1), the update described in Proposition 1 reduces to [3, Theorem 5]. Thus in that case, the RADI algorithm reduces to the invariant subspace approach (8).
Furthermore, iteration (12) clearly reduces to the low-rank Lyapunov ADI method (5) when \(B=0\); in that case \(Y_k = I\). The relation to the original qADI iteration (6) is not clear unless \(p=1\) and the shifts are chosen as eigenvalues of \(\mathscr {H}\), in which case both of these methods coincide with the invariant subspace approach. We discuss this further in the following section.
3 Equivalences with other Riccati methods
In this section we prove that all Riccati solvers introduced in Sect. 2 in fact compute exactly the same iterates, which we will jointly refer to as the Riccati ADI iterations in the remaining text. This result is collected in Theorem 2; we begin with a simple technical lemma that provides different representations of the residual factor.
Lemma 1
Proof
Theorem 2
Proof
We first use induction to show that \(X_k = X^{\mathsf {adi}}_{k}\), for all k.
In the case of \({\text {rank}}\, C=1\) and shifts equal to the eigenvalues of \(\mathscr {H}\), the equality \(X^{\mathsf {inv}}_{k}= X^{\mathsf {adi}}_{k}\) is already shown in [3]. Equality among the iterates generated by the other methods is a special case of what we have proved above. \(\square \)
It is interesting to observe that [23] also provides a low-rank variant of the Cayley subspace iteration algorithm: there, formulas for updating the factors of \(X^{\mathsf {cay}}_{k}= Z^{\mathsf {cay}}_{k}(Y^{\mathsf {cay}}_{k})^{-1} (Z^{\mathsf {cay}}_{k})^*\), where \(Z^{\mathsf {cay}}_{k}\in {\mathbb {C}}^{n \times pk}\) and \(Y^{\mathsf {cay}}_{k}\in {\mathbb {C}}^{pk \times pk}\), are given. The contribution of [24] was to show that the same formulas can be derived from a control-theory point of view as well. The main difference in comparison to our RADI variant of the low-rank Riccati ADI iterations is that, in order to compute \(Z^{\mathsf {cay}}_{k}\), one uses the matrix \((A^*+{\sigma _k}I)^{-1}\) instead of \((A^*-X_{k-1}G+{\sigma _k}I)^{-1}\) when computing \(Z_k\). This way, the need for using the Sherman–Morrison–Woodbury formula is avoided. However, as a consequence, the matrix \(Y^{\mathsf {cay}}_{k}\) loses the block-diagonal structure, and its update formula becomes much more involved. Also, it is very difficult to derive a version of the algorithm that would use real arithmetic. Another disadvantage is the computation of the residual: along with \(Z^{\mathsf {cay}}_{k}\) and \(Y^{\mathsf {cay}}_{k}\), one needs to maintain a QR factorization of the matrix \([C^*\;\; A^*Z^{\mathsf {cay}}_{k}\;\; Z^{\mathsf {cay}}_{k}]\), which adds significant computational complexity to the algorithm.
Proposition 2
4 Implementation aspects of the RADI algorithm
There are several issues with the iteration (12), stated as is, that should be addressed when designing an efficient computational routine: how to decide when the iterates \(X_k\) have converged, how to solve linear systems with the matrices \(A^*- X_{k-1}G + \sigma _k I\), and how to minimize the usage of complex arithmetic. In this section we also discuss the various shift selection strategies.
4.1 Computing the residual and the stopping criterion
Tracking the progress of the algorithm and deciding when the iterates have converged is very simple, and can be computed cheaply thanks to the expression \(\Vert \mathscr {R}(X_k)\Vert = \Vert R_k R_k^*\Vert = \Vert R_k^*R_k\Vert .\) This is an advantage compared to the Cayley subspace iteration, where computing \(\Vert \mathscr {R}(X_k)\Vert \) is more expensive because a low-rank factorization along the lines of Proposition 1 (b) is currently not known. The RADI iteration is stopped once the residual norm has decreased sufficiently relative to the initial residual norm \(\Vert C C^*\Vert \) of the approximation \(X_0=0\).
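The norm identity is easy to verify with a random low-rank factor; a minimal NumPy sketch (sizes arbitrary), exploiting that the small \(p \times p\) Gram matrix has the same spectral norm as the full \(n \times n\) residual:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
R = rng.standard_normal((n, p))        # stand-in for the residual factor R_k

# ||R R^*|| computed from the small p x p matrix instead of the n x n product;
# both equal the square of the largest singular value of R.
norm_small = np.linalg.norm(R.T @ R, 2)
norm_large = np.linalg.norm(R @ R.T, 2)
assert np.isclose(norm_small, norm_large)
```

The cost of the stopping criterion is thus \(O(np^2)\) per iteration instead of \(O(n^2 p)\) or worse.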
4.2 Solving linear systems in RADI
The RADI algorithm, implementing the techniques described above, is listed in Algorithm 1. Note that, if only the feedback matrix K is of interest, e.g. if the CARE arises from an optimal control problem, there is no need to store the whole low-rank factors Z, Y since Algorithm 1 requires only the latest blocks to continue. This is again similar to the low-rank Newton-ADI solver [7], and not possible in the current version of the Cayley subspace iteration.
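Each RADI step solves systems whose coefficient matrix is a (typically sparse) matrix plus a low-rank correction, since \(X_{k-1}G = (Z_{k-1}Y_{k-1}^{-1}Z_{k-1}^*B)B^*\) has rank at most m. A hedged sketch of how the Sherman-Morrison-Woodbury formula handles such solves (the function name and data are illustrative, not taken from the paper):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def smw_solve(M, U, V, b):
    """Solve (M - U V^T) x = b via Sherman-Morrison-Woodbury:
    only solves with the sparse matrix M itself are needed."""
    solve = spla.factorized(sp.csc_matrix(M))            # factor M once, reuse
    Mib = solve(b)
    MiU = np.column_stack([solve(U[:, j]) for j in range(U.shape[1])])
    S = np.eye(U.shape[1]) - V.T @ MiU                   # small capacitance matrix
    return Mib + MiU @ np.linalg.solve(S, V.T @ Mib)

rng = np.random.default_rng(0)
n, k = 200, 3
M = 6.0 * sp.eye(n) + sp.random(n, n, density=0.02, random_state=0)
U = 0.3 * rng.standard_normal((n, k))    # low-rank correction factors
V = 0.3 * rng.standard_normal((n, k))
b = rng.standard_normal(n)
x = smw_solve(M, U, V, b)
assert np.allclose(M @ x - U @ (V.T @ x), b)
```

The sparse factorization of M can be reused for all right-hand sides of the same step; only an additional small dense system has to be solved for the correction.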
4.3 Reducing the use of complex arithmetic
To increase the efficiency of Algorithm 1, we reduce the use of complex arithmetic. We do so by taking the shift \(\sigma _{k+1} = \overline{\sigma _k}\) immediately after the shift \(\sigma _k \in {\mathbb {C}}\setminus {\mathbb {R}}\) has been used, and by merging these two consecutive RADI steps into a single one. This entire procedure will have only one operation involving complex matrices: a linear solve with the matrix \(A^*- X_{k-1}G + \sigma _k I\) to compute \(V_k\). There are two key observations to be made here. First, by modifying the iteration slightly, one can ensure that the matrices \(K_{k+1}\), \(R_{k+1}\), and \(Y_{k+1}\) are real and can be computed by using real arithmetic only, as shown in the upcoming technical proposition. Second, there is no need to compute \(V_{k+1}\) at all to proceed with the iteration: the next matrix \(V_{k+2}\) will once again be computed by using the residual \(R_{k+1}\), the same way as in Algorithm 1.
Proposition 3
Proof
The basic idea of the proof is similar to [5, Theorem 1]; however, there is a major complication involving the matrix Y, which in the Lyapunov case is simply equal to the identity. Due to the technical complexity of the proof, we only display key intermediate results. To simplify notation, we use indices 0, 1, 2 instead of \(k-1\), k, \(k+1\), respectively.
4.4 RADI iteration for the generalized Riccati equation
4.5 Shift selection
The problem of choosing the shifts in order to accelerate the convergence of the Riccati ADI iteration is very similar to the one for the Lyapunov ADI method. Thus we apply and discuss the techniques presented in [3, 6, 27, 30] in the context of the Riccati equation, and compare them in several numerical experiments. It appears natural to employ the heuristic Penzl shifts [27]. There, a small number of approximate eigenvalues of A are generated. From this set, the values which lead to the smallest magnitude of the rational function associated with the ADI iteration are selected in a heuristic manner. Lin and Simoncini [23] have shown that the convergence of the Riccati ADI iteration is related to a rational function built from the stable eigenvalues of \(\mathscr {H}\). This suggests carrying out the Penzl approach, but using approximate eigenvalues of the Hamiltonian matrix \(\mathscr {H}\) instead of A. Note that, due to the low rank of Q and G, we can expect most of the eigenvalues of A to be close to the eigenvalues of \(\mathscr {H}\); see the discussion in [3]. Thus in many cases the Penzl shifts generated by A should suffice as well. Penzl shifts require significant preprocessing computation: in order to approximate the eigenvalues of \(M=A\) or \(M=\mathscr {H}\), one has to build Krylov subspaces with the matrices M and \(M^{-1}\). All the shifts are computed in this preprocessing stage, and then simply cycled during the RADI iteration. Here, we will mainly focus on alternative approaches that generate each shift just before it is used. This way we hope to compute a shift which better adapts to the current stage of the algorithm.
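The Krylov-based preprocessing can be sketched as a plain Arnoldi run that harvests Ritz values as eigenvalue approximations. The sketch below (function name and data are ours) covers only this approximation step; the heuristic selection from [27], and the companion run with \(M^{-1}\) whose Ritz values are inverted to approximate the eigenvalues closest to the origin, are omitted:

```python
import numpy as np

def ritz_values(A, k, rng):
    """Approximate eigenvalues of A from a k-step Arnoldi process."""
    n = A.shape[0]
    Qb = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    q = rng.standard_normal(n)
    Qb[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = A @ Qb[:, j]
        for i in range(j + 1):             # modified Gram-Schmidt orthogonalization
            H[i, j] = Qb[:, i] @ w
            w = w - H[i, j] * Qb[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Qb[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:k, :k])    # Ritz values of the projection

rng = np.random.default_rng(0)
A = np.diag(-np.arange(1.0, 11.0))         # toy stable matrix, spectrum [-10, -1]
rv = ritz_values(A, 4, rng)
# For symmetric A the Ritz values lie inside the spectral interval:
assert np.all(rv.real <= -1 + 1e-8) and np.all(rv.real >= -10 - 1e-8)
```

In practice only matrix-vector products with A (and solves with A for the inverse run) are needed, so the preprocessing scales to large sparse problems.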
4.5.1 Residual Hamiltonian shifts
We still have to define the subspace \({\text {span}}\{ U \}\). One option is to use \(V_k\) (or, equivalently, the last p columns of the matrix \(Z_k\)), which works very well in practice unless \(p=1\). When \(p=1\), all the generated shifts are real, which can make the convergence slow in some cases. Then it is better to choose the last \(\ell \) columns of the matrix \(Z_k\); usually \(\ell =2\) or \(\ell =5\) or a small multiple of p will already suffice. A final option is to use the entire \(Z_k\), which we denote by \(\ell =\infty \). This is obviously more computationally demanding, but it provides fast convergence in all cases we tested.
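A hedged sketch of generating one such shift: orthonormalize U, project the Hamiltonian matrix blockwise onto \({\text {span}}\{U\}\), and take a stable eigenvalue of the small projected matrix. In the actual iteration one would project the Hamiltonian of the current residual equation (with \(\tilde{A}\) and \(R_kR_k^*\) in place of A and Q); here the plain data (A, G, Q) and the selection rule (largest magnitude) are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def residual_hamiltonian_shift(A, G, Q, U):
    """Sketch: one candidate shift from the projected Hamiltonian matrix.
    Selection rule (stable eigenvalue of largest magnitude) is an assumption."""
    U, _ = np.linalg.qr(U)                       # orthonormal basis of span{U}
    Hp = np.block([[U.T @ A @ U, -U.T @ G @ U],
                   [-U.T @ Q @ U, -U.T @ A.T @ U]])
    ev = np.linalg.eigvals(Hp)
    stable = ev[ev.real < 0]                     # Hamiltonian spectrum is symmetric
    return stable[np.argmax(np.abs(stable))]

rng = np.random.default_rng(0)
n, k = 30, 3
A = rng.standard_normal((n, n)) - 4 * np.eye(n)
B = rng.standard_normal((n, 2))
C = rng.standard_normal((2, n))
G, Q = B @ B.T, C.T @ C
U = rng.standard_normal((n, k))                  # e.g. the last columns of Z_k
shift = residual_hamiltonian_shift(A, G, Q, U)
assert shift.real < 0
```

Since the projected matrix has size \(2\ell \times 2\ell \), its eigenvalue decomposition is cheap for small \(\ell \), which is the trade-off discussed above.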
4.5.2 Residual minimizing shifts
For \(p>1\), a similar formula can be derived, but one should note that the function f is not necessarily differentiable at every point \(\sigma \); see, e.g., [26]. Thus, a numerically more reliable heuristic [18] is to artificially reduce the problem once again to the case \(p=1\). This can be done in the following way: let v denote the right singular vector corresponding to the largest singular value of the matrix \(R_k\). Then \(R_k\in {\mathbb {C}}^{n \times p}\) in (28) is replaced by the vector \(R_k v \in {\mathbb {C}}^{n}\). Since numerical optimization algorithms usually require a starting point, the two shift generation approaches may be combined: the residual Hamiltonian shift can be used as the starting point of the optimization in the second approach. However, from our numerical experience we conclude that the additional computational effort invested in the post-optimization of the residual Hamiltonian shifts often does not contribute to the convergence. The main difficulty is the choice of an adequate subspace U such that the projected objective function approximates (28) well enough. This issue requires further investigation. The rationale is given in the following example.
Example 1
Figure 1a shows a region of the complex plane; stars are at locations of the stable eigenvalues of the projected Hamiltonian matrix (27). The one eigenvalue chosen as the residual Hamiltonian shift \(\sigma _{\mathsf {ham}}\) is shown as ‘x’. The residual minimizing shift \(\sigma _{\mathsf {opt}}\) is shown as ‘o’. Each point \(\sigma \) of the complex plane is colored according to the ratio \(\rho ^{\mathsf {proj}}(\sigma )=\Vert R^{\mathsf {proj}}_{14}(\sigma )\Vert / \Vert R^{\mathsf {proj}}_{13}\Vert \), where \(R^{\mathsf {proj}}\) is the residual for the projected Riccati equation. The ratio \(\rho ^{\mathsf {proj}}(\sigma _{\mathsf {ham}}) \approx 0.54297\) is not far from the optimal ratio \(\rho ^{\mathsf {proj}}(\sigma _{\mathsf {opt}}) \approx 0.53926\).
On the other hand, Fig. 1b shows the complex plane colored according to ratios for the original system of order 10648, \(\rho (\sigma ) = \Vert R_{14}(\sigma )\Vert / \Vert R_{13}\Vert \). Neither of the values \(\rho (\sigma _{\mathsf {ham}}) \approx 0.71510\) and \(\rho (\sigma _{\mathsf {opt}}) \approx 0.71981\) is optimal, but they both offer a reasonable reduction of the residual norm in the next step. In this case, \(\sigma _{\mathsf {ham}}\) turns out even to give a slightly better residual reduction for the original equation than \(\sigma _{\mathsf {opt}}\), making the extra effort in running the numerical optimization algorithm futile.
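The SVD-based reduction to the \(p=1\) case described in Sect. 4.5.2 can be sketched as follows (the helper name is ours):

```python
import numpy as np

def leading_direction(R):
    """Replace the n x p residual factor R by the single column R @ v,
    where v is the right singular vector of R's largest singular value."""
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return R @ Vt[0]

rng = np.random.default_rng(0)
R = rng.standard_normal((100, 4))      # stand-in for the residual factor R_k
r = leading_direction(R)
# The reduced column captures the dominant singular direction of R:
assert np.isclose(np.linalg.norm(r), np.linalg.svd(R, compute_uv=False)[0])
```

The scalar objective built from this single column can then be handed to a standard one-dimensional numerical optimizer to search for the residual minimizing shift.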
5 Numerical experiments
In this section we show a number of numerical examples, with several objectives in mind. First, our goal is to compare different low-rank implementations of the Riccati ADI algorithm mentioned in this paper: the low-rank qADI proposed in [37, 38], the Cayley transformed subspace iteration [23, 24], and the complex and real variants of the RADI iteration (12). Second, we compare the performance of the RADI approach against other methods for solving large-scale Riccati equations, namely the rational Krylov subspace method (RKSM) [34], the extended block Arnoldi (EBA) method [16, 32], and the Newton-ADI algorithm [7, 9, 15].
Finally, we discuss various shift strategies for the RADI iteration described in the previous section.
The numerical experiments are run on a desktop computer with a fourcore Intel Core i54690K processor and 16GB RAM. All algorithms and testing routines are implemented and executed in MATLAB R2014a, running on Microsoft Windows 8.1.
Example 2
Consider again the Riccati benchmark CUBE from Example 1. We use three versions of this example: the previous setting with \(n=10648\), \(m=p=1\) and \(m=p=10\), and later on a finer discretization with \(n=74088\) and \(m=10\), \(p=1\).
Clearly, the real variant of iteration (12), implemented as in Algorithm 2, outperforms all the others.^{1} Thus we use this implementation in the remaining numerical experiments. The RADI algorithms mostly gain their advantage over the Cayley subspace iteration from the cheap computation of the residual norm. In the latter algorithm, costly orthogonalization procedures are required for this task, and after some point these outweigh the computational gains from the easier linear systems (cf. Sect. 4.2). Also, the times shown in the table for the algorithm of Wong and Balakrishnan do not include the (very costly) computation of the residuals at all, so the actual execution times are even higher.

- 20 precomputed Penzl shifts (“RADI—Penzl”), generated by using Krylov subspaces of dimension 40 with the matrices A and \(A^{-1}\);
- residual Hamiltonian shifts (“RADI—Ham”), with \(\ell =2p\), \(\ell =6p\), and \(\ell =\infty \);
- residual minimizing shifts (“RADI—Ham + Opt”), with \(\ell =2p\), \(\ell =6p\), and \(\ell =\infty \).
Table 2 Times spent in different subtasks in the RADI iteration and RKSM for CUBE with \(m=p=10\)
Method  Subtask  Time 

RADI—Penzl: 135 iterations  Precompute shifts  5.31 
Solve linear systems  43.24  
Total  49.19  
RADI—Ham, \(\ell =2p\): 139 iterations  Solve linear systems  46.42 
Compute shifts dynamically  1.76  
Total  48.79  
RADI—Ham, \(\ell =6p\): 100 iterations  Solve linear systems  32.73 
Compute shifts dynamically  2.75  
Total  35.92  
RADI—Ham, \(\ell =\infty \): 74 iterations  Solve linear systems  24.74 
Compute shifts dynamically  32.44  
Total  57.51  
RKSM—adaptive: 79 iterations  Solve linear systems  18.45 
Orthogonalization  4.14  
Compute shifts dynamically  12.02  
Solve projected equations  15.98  
Total  53.82 
Table 3 Results of the numerical experiments
Example  Method  No. iterations  Final subspace dim.  Time 

CUBE \(n=10648, \; m=p=1\)  RADI—Penzl  97  97  18.96 
RADI—Ham, \(\ell =2p\)  119  119  17.10  
RADI—Ham, \(\ell =6p\)  99  99  14.15  
RADI—Ham, \(\ell =\infty \)  75  75  11.60  
RADI—Ham + Opt, \(\ell =2p\)  122  122  17.87  
RADI—Ham + Opt, \(\ell =6p\)  103  103  16.70  
RADI—Ham + Opt, \(\ell =\infty \)  108  108  18.44  
RKSM—adaptive  83  83  14.80  
EBA  111  222  6.23  
NewtonADI  2 outer, 296 inner  192  42.11  
CUBE \(n=10648, \; m=p=10\)  RADI—Penzl  135  1350  49.19 
RADI—Ham, \(\ell =2p\)  139  1390  48.79  
RADI—Ham, \(\ell =6p\)  100  1000  35.92  
RADI—Ham, \(\ell =\infty \)  74  740  57.51  
RADI—Ham + Opt, \(\ell =2p\)  87  870  30.59  
RADI—Ham + Opt, \(\ell =6p\)  90  900  33.20  
RADI—Ham + Opt, \(\ell =\infty \)  90  900  119.55  
RKSM—adaptive  79  790  53.82  
EBA  91  1820  230.57  
NewtonADI  2 outer, 202 inner  1960  75.60  
CUBE \(n=74088, \; m=10, \; p=1\)  RADI—Penzl  139  139  1048.60 
RADI—Ham, \(\ell =2p\)  97  97  617.62  
RADI—Ham, \(\ell =6p\)  81  81  506.37  
RADI—Ham, \(\ell =\infty \)  72  72  446.64  
RADI—Ham + Opt, \(\ell =2p\)  101  101  621.38  
RADI—Ham + Opt, \(\ell =6p\)  93  93  571.34  
RADI—Ham + Opt, \(\ell =\infty \)  63  63  387.43  
RKSM—adaptive  73  73  338.78  
EBA  81  162  30.45  
NewtonADI  2 outer, 288 inner  968  1546.29  
CHIP \(n=20082, \; m=1, \; p=5\)  RADI—Penzl  33  165  51.57 
RADI—Ham, \(\ell =2p\)  36  180  30.32  
RADI—Ham, \(\ell =6p\)  29  145  24.36  
RADI—Ham, \(\ell =\infty \)  26  130  22.64  
RADI—Ham + Opt, \(\ell =2p\)  29  145  23.97  
RADI—Ham + Opt, \(\ell =6p\)  26  130  22.26  
RADI—Ham + Opt, \(\ell =\infty \)  25  125  22.33  
RKSM—adaptive  26  130  23.33  
EBA  26  260  6.69  
NewtonADI  2 outer, 64 inner  204  54.04  
IFISS \(n=66049, \; m=p=5\)  RADI—Penzl  >50  >250  
RADI—Ham, \(\ell =2p\)  22  110  17.21  
RADI—Ham, \(\ell =6p\)  19  95  15.37  
RADI—Ham, \(\ell =\infty \)  20  100  17.46  
RADI—Ham + Opt, \(\ell =2p\)  27  135  21.12  
RADI—Ham + Opt, \(\ell =6p\)  Did not converge  
RADI—Ham + Opt, \(\ell =\infty \)  Did not converge  
RKSM—adaptive  26  130  22.28  
EBA  11  110  9.26  
NewtonADI  2 outer, 46 inner  250  38.05  
RAIL \(n=317377, \; m=7, \; p=6\)  RADI—Penzl  66  396  182.60 
RADI—Ham, \(\ell =2p\)  49  294  131.34  
RADI—Ham, \(\ell =6p\)  43  258  127.11  
RADI—Ham, \(\ell =\infty \)  46  276  197.06  
RADI—Ham + Opt, \(\ell =2p\)  46  276  124.13  
RADI—Ham + Opt, \(\ell =6p\)  40  240  120.04  
RADI—Ham + Opt, \(\ell =\infty \)  39  234  158.89  
RKSM—adaptive  41  246  188.60  
EBA  91  1092  916.21  
NewtonADI  1 outer, 62 inner  372  279.90  
LUNG \(n=109460, \; m=p=10\)  RADI—Penzl  Did not converge  
RADI—Ham, \(\ell =2p\)  31  310  30.03  
RADI—Ham, \(\ell =6p\)  28  280  30.22  
RADI—Ham, \(\ell =\infty \)  26  260  34.83  
RADI—Ham + Opt, \(\ell =2p\)  25  250  22.33  
RADI—Ham + Opt, \(\ell =6p\)  17  170  17.74  
RADI—Ham + Opt, \(\ell =\infty \)  17  170  19.02  
RKSM—adaptive  61  610  114.22  
EBA  Did not converge  
NewtonADI  Did not converge 
For all methods, convergence is declared once the relative residual drops below \(tol=10^{-11}\). A summary of the results for all the different methods and strategies is shown in Table 3. The column “final subspace dimension” displays the number of columns of the matrix Z, where \(X\approx ZZ^*\) is the final computed approximation. Dividing this number by p (for EBA, by 2p), we obtain the number of iterations used in a particular method. For the sake of completeness, we have also included a variant of the Newton-ADI algorithm [7] with Galerkin projection [9]. Without the Galerkin projection, the Newton-ADI algorithm could not compete with the other methods. The recent developments from [15], which make the Newton-ADI algorithm more competitive, are beyond the scope of this study.
It is interesting to analyze the timing breakdown for the RADI and RKSM methods. These timings are listed in Table 2 for the CUBE example with \(m=p=10\), where a significant amount of time is spent on tasks other than solving linear systems.
As \(\ell \) increases, the cost of computing shifts in RADI increases as well: the projection subspace gets larger, and more effort is needed to orthogonalize its basis and to compute the eigenvalue decomposition of the projected Hamiltonian matrix. In the CUBE benchmark, this effort is rewarded by a decrease in the number of iterations. However, there is a trade-off: for sufficiently large \(\ell \), the extra computation outweighs the savings from the reduced number of iterations. The convergence history for CUBE is plotted in Fig. 2; to reduce clutter, only a selected few methods are shown.
The fact that in each step RADI solves linear systems with \(p+m\) right-hand side vectors, compared to only p vectors in RKSM, may become noticeable when m is larger than p. This effect is shown in Table 3 for CUBE with \(m=10\) and \(p=1\). Unlike these two methods, EBA can precompute the LU factorization of A, and wins by a large margin in this test case.
Example 3
Next, we run the Riccati solvers for the well-known benchmark example CHIP. All coefficient matrices for the Riccati equation are taken as they are found in the Oberwolfach Model Reduction Benchmark Collection [17]. Here we solve the generalized Riccati equation (25).
The cost of precomputing shifts is very high in the case of CHIP. One fact not shown in the table is that all algorithms which compute shifts dynamically have already solved the Riccati equation before “RADI—Penzl” has even started.
Example 4
We use the IFISS 3.2 finite-element package [31] to generate the coefficient matrices for a generalized Riccati equation. We choose the provided example TCD 2, which represents a finite element discretization of a two-dimensional convection-diffusion equation on a square domain. The leading dimension is \(n=66049\), with E symmetric positive definite and A nonsymmetric. The matrix B consists of \(m=5\) randomly generated columns, and \(C=[C_1, \; 0]\) with random \(C_1\in {\mathbb {R}}^{5 \times 5}\) (\(p=5\)).
In this example, the RADI iteration with Penzl shifts converges very slowly. The RADI iterations with dynamically generated shifts are quite fast, and their final subspace dimensions are the smallest among all methods. On the other hand, the version with residual minimizing shifts does not converge for \(\ell =6p, \infty \): it quickly reaches a relative residual of about \(10^{-7}\), and then gets stuck by continually using shifts very close to zero. Figure 3 shows the convergence history for some of the shift strategies used.
Example 5
The example RAIL is a larger version of the steel profile cooling model from the Oberwolfach Model Reduction Benchmark Collection [17]. A finer finite element discretization was used for the heat equation, resulting in a generalized CARE with \(n=317377\), \(m=7\), \(p=6\), and E and A symmetric positive and negative definite, respectively. Once again, there is a trade-off between (questionably) better shifts with larger \(\ell \) and faster computation with smaller \(\ell \).
Example 6
The final example, LUNG, from the UF Sparse Matrix Collection [12], models temperature and water vapor transport in the human lung. It provides matrices with leading dimension \(n=109460\), \(E=I\), and A nonsymmetric; B and C are generated as random matrices with \(m=p=10\). This example shows the importance of proper shift generation: precomputed shifts are completely useless, while dynamically generated ones show different rates of success. The projection-based methods (RKSM, EBA) encountered problems in the numerical solution of the projected ARE: either the complete algorithm broke down, or the convergence speed was reduced. Similar issues were encountered at the Galerkin acceleration stage of the Newton-ADI method.
6 Conclusion
In this paper, we have presented a new low-rank RADI algorithm for computing solutions of large-scale Riccati equations. We have shown that this algorithm produces exactly the same iterates as three previously known methods (for which we suggest the common name “Riccati ADI methods”), but it does so in a computationally far more efficient way. As with other Riccati solvers, the performance heavily depends on the choice of shift parameters. We have suggested several strategies for how this may be done; some of them show very promising results, making the RADI algorithm competitive with the fastest large-scale Riccati solvers.
Footnotes
1. Note that the generated Penzl shifts come in complex conjugate pairs.
Acknowledgements
Open access funding provided by the Max Planck Society. Part of this work was done while the second author was a postdoctoral researcher at the Max Planck Institute Magdeburg, Germany. The second author would also like to acknowledge the support of the Croatian Science Foundation under Grant HRZZ-9345.
References
1. Amodei, L., Buchot, J.M.: An invariant subspace method for large-scale algebraic Riccati equation. Appl. Numer. Math. 60(11), 1067–1082 (2010). doi: 10.1016/j.apnum.2009.09.006
2. Benner, P.: Theory and numerical solution of differential and algebraic Riccati equations. In: Benner, P., Bollhöfer, M., Kressner, D., Mehl, C., Stykel, T. (eds.) Numerical Algebra, Matrix Theory, Differential-Algebraic Equations and Control Theory, pp. 67–105. Springer, Berlin (2015). doi: 10.1007/978-3-319-15260-8_4
3. Benner, P., Bujanović, Z.: On the solution of large-scale algebraic Riccati equations by using low-dimensional invariant subspaces. Linear Algebra Appl. 488, 430–459 (2016). doi: 10.1016/j.laa.2015.09.027
4. Benner, P., Kürschner, P., Saak, J.: An improved numerical method for balanced truncation for symmetric second order systems. Math. Comput. Model. Dyn. Syst. 19(6), 593–615 (2013). doi: 10.1080/13873954.2013.794363
5. Benner, P., Kürschner, P., Saak, J.: Efficient handling of complex shift parameters in the low-rank Cholesky factor ADI method. Numer. Algorithms 62(2), 225–251 (2013). doi: 10.1007/s11075-012-9569-7
6. Benner, P., Kürschner, P., Saak, J.: Self-generating and efficient shift parameters in ADI methods for large Lyapunov and Sylvester equations. Electr. Trans. Num. Anal. 43, 142–162 (2014)
7. Benner, P., Li, J.R., Penzl, T.: Numerical solution of large Lyapunov equations, Riccati equations, and linear-quadratic control problems. Numer. Linear Algebra Appl. 15(9), 755–777 (2008)
8. Benner, P., Li, R.C., Truhar, N.: On the ADI method for Sylvester equations. J. Comput. Appl. Math. 233(4), 1035–1045 (2009)
9. Benner, P., Saak, J.: A Galerkin-Newton-ADI method for solving large-scale algebraic Riccati equations. Preprint SPP1253-090, SPP1253 (2010). http://www.am.uni-erlangen.de/home/spp1253/wiki/index.php/Preprints
10. Benner, P., Saak, J.: Numerical solution of large and sparse continuous time algebraic matrix Riccati and Lyapunov equations: a state of the art survey. GAMM Mitt. 36(1), 32–52 (2013). doi: 10.1002/gamm.201310003
11. Bini, D., Iannazzo, B., Meini, B.: Numerical Solution of Algebraic Riccati Equations. Fundamentals of Algorithms. SIAM, New Delhi (2012)
12. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011). doi: 10.1145/2049662.2049663
13. Druskin, V., Simoncini, V.: Adaptive rational Krylov subspaces for large-scale dynamical systems. Syst. Control Lett. 60(8), 546–560 (2011). doi: 10.1016/j.sysconle.2011.04.013
14. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
15. Heinkenschloss, M., Weichelt, H.K., Benner, P., Saak, J.: An inexact low-rank Newton-ADI method for large-scale algebraic Riccati equations. Appl. Numer. Math. 108, 125–142 (2016)
16. Heyouni, M., Jbilou, K.: An extended block Arnoldi algorithm for large-scale solutions of the continuous-time algebraic Riccati equation. Electr. Trans. Num. Anal. 33, 53–62 (2009)
17. Korvink, J.G., Rudnyi, E.B.: Oberwolfach benchmark collection. In: Benner, P., Sorensen, D.C., Mehrmann, V. (eds.) Dimension Reduction of Large-Scale Systems, Lecture Notes in Computational Science and Engineering, vol. 45, pp. 311–315. Springer, Berlin (2005). doi: 10.1007/3-540-27909-1_11
18. Kürschner, P.: Efficient low-rank solution of large-scale matrix equations. Ph.D. thesis, Otto-von-Guericke-Universität Magdeburg (2016)
19. Lancaster, P., Rodman, L.: The Algebraic Riccati Equation. Oxford University Press, Oxford (1995)
20. Laub, A.J.: A Schur method for solving algebraic Riccati equations. IEEE Trans. Autom. Control AC-24, 913–921 (1979)
21. Levenberg, N., Reichel, L.: A generalized ADI iterative method. Numer. Math. 66(1), 215–233 (1993). doi: 10.1007/BF01385695
22. Li, J.R., White, J.: Low rank solution of Lyapunov equations. SIAM J. Matrix Anal. Appl. 24(1), 260–280 (2002)
23. Lin, Y., Simoncini, V.: A new subspace iteration method for the algebraic Riccati equation. Numer. Linear Algebra Appl. 22(1), 26–47 (2015). doi: 10.1002/nla.1936
24. Massoudi, A., Opmeer, M.R., Reis, T.: Analysis of an iteration method for the algebraic Riccati equation. SIAM J. Matrix Anal. Appl. 37(2), 624–648 (2016). doi: 10.1137/140985792
25. Mehrmann, V., Tan, E.: Defect correction methods for the solution of algebraic Riccati equations. IEEE Trans. Autom. Control 33, 695–698 (1988)
26. Overton, M.L.: Large-scale optimization of eigenvalues. SIAM J. Optim. 2(1), 88–120 (1992). doi: 10.1137/0802007
27. Penzl, T.: Lyapack Users Guide. Technical Report SFB393/00-33, Sonderforschungsbereich 393 Numerische Simulation auf massiv parallelen Rechnern, TU Chemnitz, 09107 Chemnitz, Germany (2000). http://www.tu-chemnitz.de/sfb393/sfb00pr.html
28. Ruhe, A.: The rational Krylov algorithm for nonsymmetric eigenvalue problems. III: complex shifts for real matrices. BIT 34, 165–176 (1994)
29. Saak, J.: Efficient numerical solution of large scale algebraic matrix equations in PDE control and model order reduction. Ph.D. thesis, TU Chemnitz (2009). http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901642
30. Sabino, J.: Solution of large-scale Lyapunov equations via the block modified Smith method. Ph.D. thesis, Rice University, Houston, Texas (2007). http://www.caam.rice.edu/tech_reports/2006/TR0608.pdf
31. Silvester, D., Elman, H., Ramage, A.: Incompressible Flow and Iterative Solver Software (IFISS) version 3.2 (2012). http://www.manchester.ac.uk/ifiss
32. Simoncini, V.: A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput. 29(3), 1268–1288 (2007). doi: 10.1137/06066120X
33. Simoncini, V.: Computational methods for linear matrix equations. SIAM Rev. 38(3), 377–441 (2016)
34. Simoncini, V., Szyld, D., Monsalve, M.: On two numerical methods for the solution of large-scale algebraic Riccati equations. IMA J. Numer. Anal. 34(3), 904–920 (2014)
35. Wachspress, E.: Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett. 1(1), 87–90 (1988)
36. Wachspress, E.: The ADI Model Problem. Springer, New York (2013). doi: 10.1007/978-1-4614-5122-8
37. Wong, N., Balakrishnan, V.: Quadratic alternating direction implicit iteration for the fast solution of algebraic Riccati equations. In: Proceedings of International Symposium on Intelligent Signal Processing and Communication Systems, pp. 373–376 (2005)
38. Wong, N., Balakrishnan, V.: Fast positive-real balanced truncation via quadratic alternating direction implicit iteration. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 26(9), 1725–1731 (2007)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.