Consider a preconditioner M based on an incomplete LU (or Cholesky) factorization of a matrix A. M −1, which represents an approximation of A −1, is applied by performing forward and back substitution steps; this can present a computational bottleneck. An alternative strategy is to approximate A −1 directly by explicitly computing M −1. Preconditioners of this kind are called sparse approximate inverse preconditioners. They constitute an important class of algebraic preconditioners that are complementary to the approaches discussed in the previous chapter. They can be attractive because, when used with an iterative solver, they can require fewer iterations than standard incomplete factorization preconditioners that contain a similar number of entries, while offering significantly greater potential for parallel computation.

From Theorem 7.3, the sparsity pattern of the inverse of an irreducible matrix A is dense, even when A is sparse. Therefore, if A is large, the exact computation of its inverse is not an option, and aggressive dropping is needed to obtain a sufficiently sparse approximation to A −1 that can be used as a preconditioner. Fortunately, for a wide class of problems of practical interest, many of the entries of A −1 are small in absolute value, so that approximating the inverse with a sparse M −1 may be feasible, although capturing the large (important) values of A −1 is a nontrivial task. Importantly, the computed M −1 can have nonzeros at positions that cannot be obtained by either a complete or an incomplete factorization, and this can be beneficial. Furthermore, although A −1 is fully dense, the following result shows this is not the case for the factors of factorized inverses.

Theorem 11.1 (Bridson & Tang 1999; Benzi & Tůma 2000)

Assume the matrix A is SPD, and let L be its Cholesky factor. Then \(\mathcal {S}\{L^{-1}\}\) is the set of all entries (i, j) such that i is an ancestor of j in the elimination tree \(\mathcal {T}(A)\).

A consequence of this result is that L −1 need not be fully dense. Considering this implication algorithmically, if A is SPD, it may be advantageous to preorder A to limit the number of ancestors that the vertices in \(\mathcal {T}(A)\) have. For example, nested dissection may be applied to \(\mathcal {S}\{A\}\) (Section 8.4). If \(\mathcal {S}\{A\}\) is nonsymmetric, then it may be possible to reduce fill-in in the factors of A −1 by applying nested dissection to \(\mathcal {S}\{A+A^T\}\).

11.1 Basic Approaches

An obvious way to obtain an approximate inverse of A in factorized form is to compute an incomplete LU factorization of A and then perform an approximate inversion of the incomplete factors. For example, if incomplete factors \(\widetilde L\) and \(\widetilde U\) are available, approximate inverse factors can be found by solving the 2n triangular linear systems

$$\displaystyle \begin{aligned} \widetilde Lx_i = e_i , \quad \widetilde U y_i = e_i ,\quad 1 \le i \le n ,\end{aligned}$$

where e i is the i-th column of the identity matrix. These systems can all be solved independently, and hence, there is the potential for significant parallelism. To reduce costs and to preserve sparsity in the approximate inverse factors, these systems need not be solved accurately. A disadvantage is that the computation of the preconditioner involves two levels of incompleteness, and because information from the incomplete factorization of A is passed into the second step, the loss of information can be excessive.
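As a minimal illustration of this first approach, the sketch below inverts a triangular factor column by column and drops small entries. Dense NumPy arithmetic is used purely for brevity, and exact Cholesky factors stand in for incomplete ones; the test matrix, the drop tolerance tau, and the function name approx_inverse_factor are illustrative assumptions.

```python
import numpy as np

def approx_inverse_factor(T, tau=0.05):
    """Approximately invert a triangular factor T by solving T x_i = e_i for
    each column e_i of the identity and dropping entries of x_i that are small
    relative to its largest entry.  The n solves are mutually independent and
    could be performed in parallel."""
    n = T.shape[0]
    Tinv = np.zeros((n, n))
    for i in range(n):
        x = np.linalg.solve(T, np.eye(n)[:, i])       # forward/back substitution
        x[np.abs(x) < tau * np.abs(x).max()] = 0.0     # keep only the larger entries
        Tinv[:, i] = x
    return Tinv

# Exact Cholesky factors stand in for incomplete ones in this small example.
A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
L = np.linalg.cholesky(A)                              # A = L L^T
Linv = approx_inverse_factor(L)                        # approximates L^{-1}
Minv = Linv.T @ Linv                                   # M^{-1} approximates A^{-1} = L^{-T} L^{-1}
```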

Another straightforward approach is based on bordering. Let A j denote the principal leading submatrix of A of order j (A j = A 1:j,1:j), and assume that its inverse factorization

$$\displaystyle \begin{aligned} A_j^{-1} = W_j D_j^{-1} Z_j^T\end{aligned}$$

is known. Here W j and Z j are unit upper triangular matrices, and D j is a diagonal matrix. Consider the following scheme:

$$\displaystyle \begin{aligned} \begin{pmatrix} Z_j^T & 0 \\[0.3cm] z_{j+1}^T & 1 \end{pmatrix} \begin{pmatrix} {A}_j & A_{1:j,j+1} \\[0.3cm] A_{j+1,1:j} & a_{j+1,j+1} \end{pmatrix} \begin{pmatrix} {W}_j & w_{j+1} \\[0.3cm] 0 & 1 \end{pmatrix} = \begin{pmatrix} D_j & 0 \\[0.3cm] 0 & d_{j+1,j+1} \end{pmatrix} , \end{aligned}$$

where for 1 ≤ j < n

$$\displaystyle \begin{aligned} & w_{j+1} = -W_jD_j^{-1}Z_j^TA_{1:j,j+1}, \\[0.2cm] & z_{j+1} = -Z_jD_j^{-1}W_j^TA_{j+1,1:j}^T, \\[0.2cm] & d_{j+1,j+1} = a_{j+1,j+1} + z_{j+1}^TA_jw_{j+1} + A_{j+1,1:j}w_{j+1} + z_{j+1}^TA_{1:j,j+1}. \end{aligned} $$

Starting from j = 1, this suggests a procedure for computing the inverse factors of A. Sparsity can be preserved by dropping some entries from the vectors w j+1 and z j+1 once they have been computed. Sparsity and the quality of the preconditioner can be influenced by preordering A.

If A is symmetric, W = Z and the required work is halved. Furthermore, if A is SPD, then it can be shown that, in exact arithmetic, d jj > 0 for all j and the process does not break down. In the general case, diagonal modifications may be required, which can limit the effectiveness of the resulting preconditioner.

Observe that the computations of Z and W are tightly coupled, restricting the potential to exploit parallelism. At each step j, besides a matrix–vector product with A j, four sparse matrix–vector products involving W j, Z j and their transposes are needed; these account for most of the work. The implementation is simplified if access to the triangular factors is available by columns as well as by rows.
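A minimal sketch of the bordering recurrence is as follows; dense NumPy arithmetic and a simple absolute drop tolerance tau are illustrative simplifications (with tau = 0 the exact inverse factors are recovered).

```python
import numpy as np

def bordered_inverse_factors(A, tau=0.0):
    """Factorized approximate inverse A^{-1} ~ W D^{-1} Z^T by bordering.

    W and Z are unit upper triangular and D is diagonal.  Entries of the new
    columns w_{j+1} and z_{j+1} that are smaller than tau are dropped to
    preserve sparsity; tau = 0 gives the exact inverse factors."""
    n = A.shape[0]
    W = np.eye(n); Z = np.eye(n)
    d = np.zeros(n); d[0] = A[0, 0]
    for j in range(1, n):                       # j is the (0-based) new column
        Wj, Zj, Dj = W[:j, :j], Z[:j, :j], d[:j]
        a_col, a_row = A[:j, j], A[j, :j]
        w = -Wj @ ((Zj.T @ a_col) / Dj)         # -W_j D_j^{-1} Z_j^T A_{1:j,j+1}
        z = -Zj @ ((Wj.T @ a_row) / Dj)         # -Z_j D_j^{-1} W_j^T A_{j+1,1:j}^T
        w[np.abs(w) < tau] = 0.0                # sparsification by dropping
        z[np.abs(z) < tau] = 0.0
        d[j] = A[j, j] + z @ (A[:j, :j] @ w) + a_row @ w + z @ a_col
        W[:j, j] = w; Z[:j, j] = z
    return W, np.diag(d), Z

A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
W, D, Z = bordered_inverse_factors(A)           # tau = 0: exact factors
assert np.allclose(W @ np.linalg.inv(D) @ Z.T, np.linalg.inv(A))
```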

11.2 Approximate Inverses Based on Frobenius Norm Minimization

It is clear from the above discussion that alternative techniques for constructing sparse approximate inverse preconditioners are needed. We start by looking at schemes based on Frobenius norm minimization. Historically, these were the first to be proposed and offer the greatest potential for parallelism because both the construction of the preconditioner and its subsequent application can be performed in parallel.

11.2.1 SPAI Preconditioner

To describe the sparse approximate inverse (SPAI) preconditioner, it is convenient to use the notation K = M −1. The basic idea is to compute K ≈ A −1, with columns denoted by k j, as the solution of the problem of minimizing

$$\displaystyle \begin{aligned} \| I - A M^{-1} \|{}_F^2 = \| I - A K \|{}_F^2 = \sum_{j=1}^{n} \| e_j - A k_j \|{}_2^2, \end{aligned} $$
(11.1)

over all K with pattern \(\mathcal {S}\). This produces a right approximate inverse. A left approximate inverse can be computed by solving a minimization problem for ∥I − KAF = ∥I − A T K TF. This amounts to computing a right approximate inverse for A T and taking the transpose of the resulting matrix. For nonsymmetric matrices, the distinction between left and right approximate inverses can be important. Indeed, there are situations where it is difficult to compute a good right approximate inverse but easy to find a good left approximate inverse (or vice versa). In the following discussion, we assume that a right approximate inverse is being computed.

The Frobenius norm is generally used because the minimization problem then reduces to least squares problems for the columns of K that can be computed independently and, if required, in parallel. Further, these least squares problems are all of small dimension when \(\mathcal {S}\) is chosen to ensure K is sparse. Let \(\mathcal {J} = \{i \, | \, k_j(i) \ne 0\}\) be the set of indices of the nonzero entries in column k j. The set of indices of rows of A that can affect a product with column k j is \(\mathcal {I} = \{m\, | \, {A}_{m,\mathcal {J}} \neq 0 \}\). Let \(|\mathcal {I}|\) and \(|\mathcal {J}|\) denote the number of entries in \(\mathcal {I}\) and \(\mathcal {J}\), respectively, and let \(\widehat e_j = e_j(\mathcal {I})\) be the vector of length \(|\mathcal {I}|\) that is obtained by taking the entries of e j with row indices belonging to \(\mathcal {I}\). To solve (11.1) for k j, construct the \(|\mathcal {I}| \times |\mathcal {J}|\) matrix \(\widehat A = A_{\mathcal {I},\mathcal {J}}\) and solve the small unconstrained least squares problem

$$\displaystyle \begin{aligned} \min_{\widehat k_j} \| \widehat e_j - \widehat A \,\widehat k_j \|{}_2^2 . \end{aligned} $$
(11.2)

This can be done using a dense QR factorization of \(\widehat A\). Extending \(\widehat k_j \) to have length n by setting entries that are not in \(\mathcal {J}\) to zero gives k j.

A straightforward way to construct \(\mathcal {S}\) that does not depend on a sophisticated initial choice (the initial pattern could, for example, be that of the identity or \(\mathcal {S}\{A\}\)) proceeds as follows. Starting with a chosen column sparsity pattern \(\mathcal {J}\) for k j, construct \(\widehat A\), solve (11.2) for \(\widehat k_j \), set \(k_j(\mathcal {J}) = \widehat k_j \), and define the residual vector

$$\displaystyle \begin{aligned} r_j = e_j - A_{1:n,\mathcal{J}} {\widehat k_j}.\end{aligned}$$

If ∥r j2 ≠ 0, then k j is not equal to the j-th column of A −1, and a better approximation can be derived by augmenting \(\mathcal {J}\). To do this, let \(\mathcal {L} = \{l \, |\, r_j(l) \neq 0\}\) and define

$$\displaystyle \begin{aligned} \widetilde{\mathcal{J}} = \{i \, | \, A_{\mathcal{L}, i} \neq 0\} \setminus \mathcal{J} . \end{aligned} $$
(11.3)

These are candidate indices that can be added to \(\mathcal {J}\), but as there may be many of them, a subset should be selected that most effectively reduces ∥r j∥2. A possible heuristic is to solve for each \(i \in \widetilde {\mathcal {J}}\) the minimization problem

$$\displaystyle \begin{aligned} \min_{\mu_i}|| r_j - \mu_i Ae_i \|{}_2^2. \end{aligned}$$

This has the solution \(\mu _i ={r_j^T Ae_i}/{\|Ae_i\|{}_2^2} \) with residual \(\|r_j\|{}_2^2 - (r_j^T Ae_i)^2 / \|Ae_i\|{}_2^2 \). Indices \(i \in \widetilde {\mathcal {J}}\) for which this residual is small are appended to \(\mathcal {J}\). The process can be repeated until either the required accuracy is attained or the maximum number of allowed entries in \(\mathcal {J}\) is reached.
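The sketch below carries out these steps for a single column of a SPAI right approximate inverse: solve the small problem (11.2), form the residual, and augment \(\mathcal {J}\) using the one-dimensional minimization heuristic. Dense NumPy and a from-scratch solve of each small problem are used purely for brevity (a practical code updates the QR factorization instead, as described next); the parameters nzmax, tol, and keep are illustrative.

```python
import numpy as np

def spai_column(A, j, nzmax=5, tol=1e-2, keep=2):
    """Compute column j of a SPAI right approximate inverse of A.

    Starts from the pattern {j}, solves the small least squares problem
    restricted to rows I and columns J of A, and appends the `keep` candidate
    indices that leave the smallest one-dimensional residual."""
    n = A.shape[0]
    e_j = np.zeros(n); e_j[j] = 1.0
    J = [j]                                           # initial column pattern
    while True:
        I = np.where(np.abs(A[:, J]).sum(axis=1) > 0)[0]
        Ahat = A[np.ix_(I, J)]
        khat, *_ = np.linalg.lstsq(Ahat, e_j[I], rcond=None)
        r = e_j - A[:, J] @ khat                      # residual of length n
        if np.linalg.norm(r) <= tol or len(J) >= nzmax:
            break
        # candidate indices (11.3): columns touching rows with nonzero residual
        Lset = np.where(r != 0)[0]
        cand = [i for i in np.where(np.abs(A[Lset, :]).sum(axis=0) > 0)[0]
                if i not in J]
        if not cand:
            break
        # residual left by the one-dimensional minimization for each candidate
        score = [np.linalg.norm(r) ** 2 - (r @ A[:, i]) ** 2 / (A[:, i] @ A[:, i])
                 for i in cand]
        J += [cand[i] for i in np.argsort(score)[:keep]]
    k_j = np.zeros(n); k_j[J] = khat
    return k_j

A = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])
K = np.column_stack([spai_column(A, j) for j in range(4)])   # K ~ A^{-1}
```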

Solving the unconstrained least squares problem (11.2) after extending \(\widehat A\) to \( A_{\mathcal {I} \cup \mathcal {I}^{\prime },\,\mathcal {J}\cup \mathcal {J}^{\prime }}\), where \(\mathcal {J}^{\prime }\) is the set of newly added column indices and \(\mathcal {I}^{\prime }\) is the set of corresponding additional row indices, is typically performed using updating. Assume the QR factorization of \(\widehat A\) is

$$\displaystyle \begin{aligned} \widehat A = A_{\mathcal{I},\mathcal{J}} = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix},\end{aligned}$$

where Q 1 is \(|\mathcal {I}| \times |\mathcal {J}|\). Here Q is an orthogonal matrix and R is an upper triangular matrix. The QR factorization of the extended matrix is

$$\displaystyle \begin{aligned} \begin{array}{rcl} A_{\mathcal{I} \cup \mathcal{I}^{\prime},\,\mathcal{J}\cup \mathcal{J}^{\prime}} & =&\displaystyle \begin{pmatrix} \widehat A &\displaystyle A_{\mathcal{I},\, \mathcal{J}^{\prime}} \\ & A_{\mathcal{I}^{\prime},\, \mathcal{J}^{\prime}} \end{pmatrix} = \begin{pmatrix}Q \\ & I \end{pmatrix} \begin{pmatrix} R &\displaystyle Q_1^T A_{\mathcal{I},\, \mathcal{J}^{\prime}} \\ & Q_2^T A_{\mathcal{I},\, \mathcal{J}^{\prime}} \\ & A_{\mathcal{I}^{\prime},\, \mathcal{J}^{\prime}} \end{pmatrix} \\ & =&\displaystyle \begin{pmatrix}Q \\ & I \end{pmatrix} \begin{pmatrix} I \\ & Q^{\prime} \end{pmatrix} \begin{pmatrix} R &\displaystyle Q_1^T A_{\mathcal{I},\, \mathcal{J}^{\prime}} \\ & R^{\prime} \\ & 0 \end{pmatrix}, \end{array} \end{aligned} $$

where \(Q^{\prime }\) and \(R^{\prime }\) are from the QR factorization of the \((|\mathcal {I}^{\prime }|+|\mathcal {I}| - |\mathcal {J}|) \times |\mathcal {J}^{\prime }|\) matrix

$$\displaystyle \begin{aligned} \begin{pmatrix} Q_2^T A_{\mathcal{I},\, \mathcal{J}^{\prime}} \\ A_{\mathcal{I}^{\prime},\, \mathcal{J}^{\prime}} \end{pmatrix}. \end{aligned}$$

Factorizing this matrix and updating the trailing QR factorization to get the new \(\widehat k_j\) is much more efficient than computing the QR factorization of the extended matrix from scratch.
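The following short NumPy check illustrates the block identity displayed above: a QR factorization of the reduced block supplies \(Q^{\prime }\) and \(R^{\prime }\), and assembling the factors reproduces the extended matrix. The block sizes and the random data are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import block_diag
rng = np.random.default_rng(0)

nI, nJ, nIp, nJp = 6, 3, 2, 2                  # |I|, |J|, |I'|, |J'|
Ahat = rng.standard_normal((nI, nJ))           # A_{I,J}
B = rng.standard_normal((nI, nJp))             # A_{I,J'}
C = rng.standard_normal((nIp, nJp))            # A_{I',J'}
A_ext = np.block([[Ahat, B], [np.zeros((nIp, nJ)), C]])

Q, R = np.linalg.qr(Ahat, mode='complete')     # Q = [Q1 Q2]
Q1, Q2 = Q[:, :nJ], Q[:, nJ:]
R = R[:nJ]                                     # the |J| x |J| triangular factor

# QR of the (|I| - |J| + |I'|) x |J'| block that actually has to be factorized.
Qp, Rp_full = np.linalg.qr(np.vstack([Q2.T @ B, C]), mode='complete')

# Assemble the right-hand side of the displayed identity and compare.
R_ext = np.block([[R, Q1.T @ B],
                  [np.zeros((nI - nJ + nIp, nJ)), Rp_full]])
Q_ext = block_diag(Q, np.eye(nIp)) @ block_diag(np.eye(nJ), Qp)
assert np.allclose(Q_ext @ R_ext, A_ext)       # factorization reproduces A_ext
```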

Construction of the SPAI preconditioner is summarized in Algorithm 11.1. The maximum number of entries nz j that is permitted in k j must be at least as large as the number of entries in the initial sparsity pattern \(\mathcal {J}_j\). Updating can be used to compute a new \({\widehat k_j}\) for each pass through the while loop; the number of passes is typically small (for example, if a good initial sparsity pattern is available, a single pass may be sufficient).

Algorithm 11.1 SPAI preconditioner (right-looking approach)

The example in Figure 11.1 illustrates Algorithm 11.1. Starting with a tridiagonal matrix, it considers the computation of the first column k 1 of the inverse matrix K. The algorithm starts with \(\mathcal {J}_1 = \{1,2\}\).

Fig. 11.1

An illustration of computing the first column of a sparse approximate inverse using the SPAI algorithm with nz 1 = 3. On the top line is the initial tridiagonal matrix A followed by the matrix \(\widehat A\) and the vectors \(\widehat k_1\) and r 1 on the first pass of the while loop of Algorithm 11.1. The bottom line presents the updated matrix \(\widehat A\) that is obtained on the second pass by adding the third row and column of A, together with the corresponding vectors \(\widehat k_1\) and r 1 and, finally, k 1. Here the numerical values have been appropriately rounded.

When A is symmetric, there is no guarantee that the computed K will be symmetric. One possibility is to use (K + K T)∕2 to approximate A −1. The SPAI preconditioner is not sensitive to the ordering of A. This has the advantage that A can be partitioned and preordered in whatever way is convenient, for instance, to better suit the needs of a distributed implementation, without worrying about the impact on the subsequent convergence rate of the solver. The disadvantage is that orderings cannot be used to reduce fill-in and/or improve the quality of this approximate inverse. For instance, if A −1 has no small entries, SPAI will not find a sparse K, and because the inverse of a permutation of A is just a permutation of A −1, no permutation of A will change this.

11.2.2 FSAI Preconditioner: SPD Case

We next consider a class of preconditioners based on an incomplete inverse factorization of A −1. The factorized sparse approximate inverse (FSAI) preconditioner for an SPD matrix A is defined as the product

$$\displaystyle \begin{aligned} M^{-1} = G^TG, \end{aligned}$$

where the sparse lower triangular matrix G is an approximation of the inverse of the (complete) Cholesky factor L of A. Theoretically, a FSAI preconditioner is computed by choosing a lower triangular sparsity pattern \(\mathcal {S}_L\) and minimizing

$$\displaystyle \begin{aligned} \|I - GL\|{}^2_F = tr\left[({I} - GL)^T ({I} - GL) \right], \end{aligned} $$
(11.4)

over all G with sparsity pattern \(\mathcal {S}_L\). Here tr denotes the matrix trace operator (that is, the sum of the entries on the diagonal). The computation of G can be performed without knowing L explicitly. Differentiating (11.4) with respect to the entries of G and setting to zero yields

$$\displaystyle \begin{aligned} (GLL^T)_{ij} = (GA)_{ij} = ({L}^T)_{ij} \quad \mbox{for all} \quad (i,j) \in \mathcal{S}_L. \end{aligned} $$
(11.5)

Because L T is an upper triangular matrix while \(\mathcal {S}_L\) is a lower triangular pattern, the matrix equation (11.5) can be rewritten as

$$\displaystyle \begin{aligned} (GA)_{ij} = \begin{cases} 0 \quad \ i \ne j, \quad (i,j) \in \mathcal{S}_L,\\ l_{ii} \quad i = j. \end{cases} \end{aligned} $$
(11.6)

G is not available from (11.6) because L is unknown. Instead, \(\overline G\) is computed such that

$$\displaystyle \begin{aligned} (\overline G A)_{ij} = \delta_{i,j} \quad \mbox{for all} \quad (i,j) \in \mathcal{S}_L, \end{aligned} $$
(11.7)

where δ i,j is the Kronecker delta function (δ i,j = 1 if i = j and δ i,j = 0 otherwise). The FSAI factor G is then obtained by setting

$$\displaystyle \begin{aligned} G = D\overline G,\end{aligned}$$

where D is a diagonal scaling matrix. An appropriate choice for D is

$$\displaystyle \begin{aligned} D = [diag (\overline G)]^{-1/2}, \end{aligned} $$
(11.8)

so that

$$\displaystyle \begin{aligned} (GA G^T)_{ii} = 1, \quad 1 \le i \le n.\end{aligned}$$

The following result shows that the FSAI preconditioner exists for any nonzero pattern \(\mathcal {S}_L\) that includes the main diagonal of A.

Theorem 11.2 (Kolotilina & Yeremin 1993)

Assume A is SPD. If the lower triangular sparsity pattern \(\mathcal {S}_L\) includes all diagonal positions, then G exists and is unique.

Proof

Set \(\mathcal {I}_i = \{ j \ | \ (i,j) \in \mathcal {S}_L \}\), and let \(A_{\mathcal {I}_i,\, \mathcal {I}_i}\) denote the submatrix of order \(nz_i=|\mathcal {I}_i|\) of entries a kl such that \(k,l \in \mathcal {I}_i\). Let \(\bar g_i\) and g i be dense vectors containing the nonzero coefficients in row i of \(\overline G\) and G, respectively. Using this notation, solving (11.7) decouples into solving n independent SPD linear systems

$$\displaystyle \begin{aligned} A_{\mathcal{I}_i,\, \mathcal{I}_i}\, \bar g_i = e_{nz_i}, \quad 1 \le i \le n, \end{aligned}$$

where the unit vectors are of length nz i. Moreover,

$$\displaystyle \begin{aligned} {\displaystyle (\overline G A \overline G^T)_{ii} = \sum_{ j \in \mathcal{I}_i } \delta_{i,j} \overline G_{ij} = \overline G_{ii} = ( A^{-1}_{\mathcal{I}_i, \mathcal{I}_i})_{ii}. } \end{aligned}$$

This implies that the diagonal entries of \(\overline G\) are positive, so D given by (11.8) is well defined and nonsingular. Consequently, the computed rows of G exist and provide a unique solution. □

The procedure for computing a FSAI preconditioner is summarized in Algorithm 11.2. The computation of each row of G can be performed independently; thus, the algorithm is inherently parallel. Moreover, unlike an incomplete factorization preconditioner, applying M −1 = G T G requires only sparse matrix–vector products with G and G T; no triangular solves are needed.
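A minimal dense sketch of this row-by-row construction is as follows; taking \(\mathcal {S}_L\) to be the lower triangular part of \(\mathcal {S}\{A\}\) (plus the diagonal) and using dense arithmetic are illustrative simplifications.

```python
import numpy as np

def fsai_spd(A, pattern=None):
    """FSAI factor G for SPD A so that M^{-1} = G^T G ~ A^{-1}.

    `pattern` is a Boolean lower triangular matrix giving S_L; by default the
    lower triangle of S{A} plus the diagonal is used.  Each row of G comes
    from a small SPD system, as in the proof of Theorem 11.2; the row solves
    are independent and could be done in parallel."""
    n = A.shape[0]
    if pattern is None:
        pattern = np.tril(A != 0) | np.eye(n, dtype=bool)
    Gbar = np.zeros((n, n))
    for i in range(n):
        Ii = np.where(pattern[i, :i + 1])[0]          # indices j with (i,j) in S_L
        e = np.zeros(len(Ii)); e[-1] = 1.0            # unit vector, 1 in position of i
        Gbar[i, Ii] = np.linalg.solve(A[np.ix_(Ii, Ii)], e)
    D = np.diag(1.0 / np.sqrt(np.diag(Gbar)))         # scaling (11.8)
    return D @ Gbar                                   # G = D Gbar

A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
G = fsai_spd(A)
assert np.allclose(np.diag(G @ A @ G.T), 1.0)         # scaled product has unit diagonal
```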

Algorithm 11.2 FSAI preconditioner

The following theorem states that G computed using Algorithm 11.2 is in some sense optimal.

Theorem 11.3 (Kolotilina et al. 2000)

Let L be the Cholesky factor of the SPD matrix A. Given a lower triangular sparsity pattern \(\mathcal {S}_L\) that includes all diagonal positions, let G be the FSAI preconditioner computed using Algorithm 11.2. Then any lower triangular matrix G 1 with its sparsity pattern contained in \(\mathcal {S}_L\) and \((G_1AG_1^T)_{ii} = 1\) (1 ≤ i ≤ n) satisfies

$$\displaystyle \begin{aligned} ||I - G L ||{}_F \le ||{I} - G_1L||{}_F. \end{aligned}$$

The performance of the FSAI preconditioner is highly dependent on the choice of \(\mathcal {S}_L\). If entries are added to the pattern, then, as the following result shows, the preconditioner is more accurate, but it is also more expensive.

Theorem 11.4 (Kolotilina et al. 2000)

Let L be the Cholesky factor of the SPD matrix A. Given the lower triangular sparsity patterns \(\mathcal {S}_{L1}\) and \(\mathcal {S}_{L2}\) that include all diagonal positions, let the corresponding FSAI preconditioners computed using Algorithm 11.2 be G 1 and G 2 , respectively. If \(\mathcal {S}_{L1} \subseteq \mathcal {S}_{L2}\) , then

$$\displaystyle \begin{aligned} ||{I} - G_2 {L} ||{}_F \le ||{I} - G_1 {L}||{}_F. \end{aligned}$$

11.2.3 FSAI Preconditioner: General Case

The FSAI algorithm can be extended to a general matrix A. Two input sparsity patterns are required: a lower triangular sparsity pattern \(\mathcal {S}_L\) and an upper triangular sparsity pattern \(\mathcal {S}_U\), both containing the diagonal positions. First, lower and upper triangular matrices \(\overline G_L\) and \(\overline G_U\) are computed such that

$$\displaystyle \begin{aligned} (\overline G_L A)_{ij} = \delta_{i,j} \quad \mbox{for all} \quad (i,j) \in \mathcal{S}_L, \end{aligned}$$
$$\displaystyle \begin{aligned} (A \overline G_U)_{ij} = \delta_{i,j} \quad \mbox{for all} \quad (i,j) \in \mathcal{S}_U. \end{aligned}$$

Then D is obtained as the inverse of the diagonal of the matrix \( \overline G_L A \overline G_U ,\) and the final nonsymmetric FSAI factors are given by \(G_L = \overline G_L \) and \( G_U = \overline G_U D. \) The computation of the two approximate factors can be performed independently.
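A corresponding dense sketch for the general case follows; taking \(\mathcal {S}_L\) and \(\mathcal {S}_U\) as the triangular parts of \(\mathcal {S}\{A\}\) is purely illustrative, and the small systems are assumed to be nonsingular (see the remarks on breakdown below).

```python
import numpy as np

def fsai_nonsym(A):
    """Nonsymmetric FSAI factors G_L (lower) and G_U (upper) so that
    M^{-1} = G_U G_L ~ A^{-1}, with S_L and S_U taken as the triangular parts
    of S{A} (an illustrative choice)."""
    n = A.shape[0]
    GL = np.zeros((n, n)); GU = np.zeros((n, n))
    patL = np.tril(A != 0) | np.eye(n, dtype=bool)
    patU = np.triu(A != 0) | np.eye(n, dtype=bool)
    for i in range(n):                       # rows of G_L: (G_L A)_{ij} = delta_ij
        Ii = np.where(patL[i, :i + 1])[0]
        e = np.zeros(len(Ii)); e[-1] = 1.0
        GL[i, Ii] = np.linalg.solve(A[np.ix_(Ii, Ii)].T, e)
    for j in range(n):                       # columns of G_U: (A G_U)_{ij} = delta_ij
        Jj = np.where(patU[:j + 1, j])[0]
        e = np.zeros(len(Jj)); e[-1] = 1.0
        GU[Jj, j] = np.linalg.solve(A[np.ix_(Jj, Jj)], e)
    D = np.diag(1.0 / np.diag(GL @ A @ GU))  # inverse of the diagonal of Gbar_L A Gbar_U
    return GL, GU @ D                        # G_L = Gbar_L, G_U = Gbar_U D

A = np.array([[3., 1., 0.], [-2., 4., 1.], [0., -1., 2.]])
GL, GU = fsai_nonsym(A)
assert np.allclose(np.diag(GL @ A @ GU), 1.0)   # scaled product has unit diagonal
```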

This generalization is well defined if, for example, A is nonsymmetric positive definite. There is also theory that extends existence to special classes of matrices, including M- and H-matrices. In more general cases, solutions to the reduced systems may not exist, and modifications (such as perturbing the diagonal entries) are needed to circumvent breakdown.

11.2.4 Determining a Good Sparsity Pattern

The role of the input pattern is to preserve sparsity by filtering out entries of A −1 that contribute little to the quality of the preconditioner. For instance, it might be appropriate to ignore entries with a small absolute value, while retaining the largest ones. Unfortunately, the locations of large entries in A −1 are generally unknown, and this makes the a priori sparsity choice difficult. A possible exception is when A is a banded SPD matrix. In this case, the entries of A −1 are bounded in an exponentially decaying manner along each row or column. Specifically, there exist 0 < ρ < 1 and a constant c such that for all i, j

$$\displaystyle \begin{aligned} |(A^{-1})_{ij}| \le c \rho^{|i-j|}.\end{aligned}$$

The scalars ρ and c depend on the bandwidth and the condition number of A. For matrices with a large bandwidth and/or a high condition number, c can be very large and ρ close to one, indicating extremely slow decay. However, if the entries of A −1 can be shown to decay rapidly, then a banded M −1 should be a good approximation to A −1. In this case, \(\mathcal {S}_L\) can be chosen to correspond to a matrix with a prescribed bandwidth.
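This decay is easy to observe numerically; the small tridiagonal example below is an illustrative choice.

```python
import numpy as np

# Tridiagonal SPD matrix: the entries of A^{-1} decay away from the diagonal.
n = 10
A = np.diag(np.full(n, 4.0)) + np.diag(np.full(n - 1, -1.0), 1) \
    + np.diag(np.full(n - 1, -1.0), -1)
Ainv = np.linalg.inv(A)

# Largest magnitude of (A^{-1})_{ij} for each off-diagonal distance |i-j|.
for k in range(5):
    print(k, np.abs(np.diag(Ainv, k)).max())
# The printed values shrink roughly geometrically, so a banded M^{-1}
# (for example, S_L with a small prescribed bandwidth) captures the large entries.
```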

A common choice for a general A is \(\mathcal {S}_L + \mathcal {S}_U = \mathcal {S}\{A\}\), motivated by the empirical observation that entries in A −1 that correspond to nonzero positions in A tend to be relatively large. However, this simple choice is not robust because entries of A −1 that lie outside \(\mathcal {S}\{A\}\) can also be large. An alternative strategy based on the Neumann series expansion of A −1 is to use the pattern of a small power of A, i.e., \(\mathcal {S}\{A^2\}\) or \(\mathcal {S}\{A^3\}\). By starting from the lower and upper triangular parts of A, this approach can be used to determine candidates \(\mathcal {S}_L\) and \(\mathcal {S}_U\). While approximate inverses based on higher powers of A are often better than those corresponding to A, there is no guarantee they will result in good preconditioners. Furthermore, even small powers of A can be very dense, thus slowing down the construction and application of the preconditioner. A possible remedy is to use the power of a sparsified A. Alternatively, the pattern can be chosen dynamically by retaining the largest terms in each row of the preconditioner as it is computed, which is what the SPAI algorithm does. Another possibility is to implicitly determine \(\mathcal {S}_L + \mathcal {S}_U\) as follows. Starting with a simple sparsity pattern, compute the corresponding FSAI preconditioner G 1. Then choose a pattern based on \(G_1AG_1^T\) and apply the FSAI algorithm to \(G_1AG_1^T\) to obtain G 2. Finally, set the preconditioner to G 2 G 1. Despite running the FSAI algorithm twice, this approach can be worthwhile. Unfortunately, the choice of the best technique for generating a FSAI preconditioner and its sparsity pattern is highly problem dependent.
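A sketch of the pattern construction based on a power of a sparsified A is given below; SciPy sparse matrices are used, and the drop tolerance and the power are illustrative parameters.

```python
import numpy as np
import scipy.sparse as sp

def pattern_from_power(A, power=2, tau=0.0):
    """Candidate pattern S{(sparsified A)^power} as a Boolean CSR matrix.

    Entries of A smaller than tau * max|a_ij| are dropped before the power is
    taken, which helps to keep the pattern from becoming too dense."""
    A = sp.csr_matrix(A, dtype=float).copy()
    A.data[np.abs(A.data) < tau * np.abs(A.data).max()] = 0.0
    A.eliminate_zeros()
    S = sp.csr_matrix((np.ones_like(A.data), A.indices, A.indptr), shape=A.shape)
    P = S.copy()
    for _ in range(power - 1):
        P = P @ S                      # structural product: pattern of A^power
    return P.astype(bool)

# Lower and upper triangular candidate patterns, e.g. for a nonsymmetric FSAI:
# S_L = sp.tril(pattern_from_power(A, power=2, tau=0.1), format='csr')
# S_U = sp.triu(pattern_from_power(A, power=2, tau=0.1), format='csr')
```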

11.3 Factorized Approximate Inverses Based on Incomplete Conjugation

An alternative way to obtain a factorized approximate inverse is based on incomplete conjugation (A-orthogonalization) in the SPD case and on incomplete A-biconjugation in the general case. For SPD matrices, the approach represents an approximate Gram–Schmidt orthogonalization that uses the A-inner product 〈., .〉A. An important attraction is that the sparsity patterns of the approximate inverse factors need not be specified in advance; instead, they are determined dynamically as the preconditioner is computed.

11.3.1 AINV Preconditioner: SPD Case

When A is an SPD matrix, the AINV preconditioner is defined by an approximate inverse factorization of the form

$$\displaystyle \begin{aligned} A^{-1} \approx M^{-1} = ZD^{-1}Z^T,\end{aligned}$$

where the matrix Z is unit upper triangular and D is a diagonal matrix with positive entries. The factor Z is a sparse approximation of the inverse of the L T factor in the square root-free factorization of A. Z and D are computed directly from A using an incomplete A-orthogonalization process applied to the columns of the identity matrix. If entries are not dropped, then a complete factorization of A −1 is computed and Z is significantly denser than L T. To preserve sparsity, at each step of the computation, entries are discarded (for example, using a prescribed threshold, or according to the positions of the entries, or by retaining a chosen number of the largest entries in each column), resulting in an approximate factorization of A −1.

There are several variants. Algorithms 11.3 and 11.4 outline left-looking and right-looking approaches, respectively. Practical implementations need to employ sparse matrix techniques. The left-looking scheme computes the j-th column z j of Z as a sparse linear combination of the previous columns z 1, …, z j−1. The key is determining which multipliers (the α’s in Steps 4 and 5 of the two algorithms, respectively) are nonzero and need to be computed. This can be achieved very efficiently by having access to both the rows and columns of A (although the algorithm does not require that A is explicitly stored—only the capability of forming inner products involving the rows of A is required). For the right-looking approach, the crucial part for each j is the update of the sparse submatrix of Z composed of the columns j + 1 to n that are not yet fully computed. Here, only one row of A is used in the outer loop of the algorithm. Therefore, A can be generated on-the-fly by rows. The DS format can be used to store the partially computed Z (Section 1.3.2). As with complete factorizations, the efficiency of the computation and application of AINV preconditioners can benefit from incorporating blocking.
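A minimal dense sketch of the right-looking variant (cf. Algorithm 11.4) for SPD A is given below; the test matrix and the drop tolerance tau are illustrative, and with tau = 0 the exact inverse factors are obtained.

```python
import numpy as np

def ainv_spd(A, tau=0.1):
    """Right-looking AINV for SPD A: A^{-1} ~ Z diag(d)^{-1} Z^T.

    The columns of the identity are A-orthogonalized; after step j, entries of
    the not-yet-finished columns that are below tau are dropped to preserve
    sparsity."""
    n = A.shape[0]
    Z = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        r = A[j, :] @ Z[:, j:]                 # row j of A times columns j,...,n
        d[j] = r[0]                            # pivot d_jj
        Z[:, j + 1:] -= np.outer(Z[:, j], r[1:] / d[j])   # A-orthogonalize
        Z[:, j + 1:][np.abs(Z[:, j + 1:]) < tau] = 0.0    # drop small entries
    return Z, d

A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
Z, d = ainv_spd(A, tau=0.0)                    # no dropping: exact factorization
assert np.allclose(Z @ np.diag(1.0 / d) @ Z.T, np.linalg.inv(A))
```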

Algorithm 11.3 AINV preconditioner (left-looking approach)

11.3.2 AINV Preconditioner: General Case

In the general case, the AINV preconditioner is given by an approximate inverse factorization of the form

$$\displaystyle \begin{aligned} A^{-1} \approx M^{-1} = WD^{-1}Z^T,\end{aligned}$$

where Z and W are unit upper triangular matrices and D is a diagonal matrix. Z and W are sparse approximations of the inverses of the L T and U factors in the LDU factorization of A, respectively. Starting from the columns of the identity matrix, A-biconjugation is used to compute the factors. Algorithm 11.5 outlines the right-looking approach. Note it offers two possibilities for computing the entries d jj of D that are equivalent in exact arithmetic if the factorization is breakdown-free. The left-looking variant given in Algorithm 11.3 can be generalized in a similar way.
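The sketch below performs the biconjugation for a general matrix so that the computed factors satisfy A −1 ≈ W D −1 Z T as above. Purely for clarity, the coefficients are written as explicit inner products with A rather than via the row and column shortcuts of Algorithm 11.5 (the two are equivalent in exact arithmetic); the test matrix and the drop tolerance are illustrative.

```python
import numpy as np

def ainv_nonsym(A, tau=0.1):
    """A-biconjugation sketch: W, d, Z with A^{-1} ~ W diag(d)^{-1} Z^T.

    Two copies of the identity are biconjugated so that z_i^T A w_k ~ 0 for
    i != k; entries below tau are dropped from the unfinished columns, and
    tau = 0 gives the exact inverse factors."""
    n = A.shape[0]
    W = np.eye(n); Z = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = Z[:, j] @ A @ W[:, j]
        cw = (Z[:, j] @ A @ W[:, j + 1:]) / d[j]   # zero out z_j^T A w_k, k > j
        cz = (A @ W[:, j] @ Z[:, j + 1:]) / d[j]   # zero out z_k^T A w_j, k > j
        W[:, j + 1:] -= np.outer(W[:, j], cw)
        Z[:, j + 1:] -= np.outer(Z[:, j], cz)
        W[:, j + 1:][np.abs(W[:, j + 1:]) < tau] = 0.0   # sparsification
        Z[:, j + 1:][np.abs(Z[:, j + 1:]) < tau] = 0.0
    return W, d, Z

A = np.array([[3., 1., 0.], [-2., 4., 1.], [0., -1., 2.]])
W, d, Z = ainv_nonsym(A, tau=0.0)                  # no dropping: exact factors
assert np.allclose(W @ np.diag(1.0 / d) @ Z.T, np.linalg.inv(A))
```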

Figure 11.2 illustrates the sparsity patterns of the AINV factors for a matrix arising in circuit simulation. \(\mathcal {S}\{A\}\) is symmetric, but the values of the entries of A are nonsymmetric. The sparsity pattern \(\mathcal {S}\{W+Z^T\}\) is given, where W and Z are computed using Algorithm 11.5 with sparsification based on a dropping tolerance of 0.5. Also given are the patterns \(\mathcal {S}\{\widetilde L+\widetilde U\}\) and \(\mathcal {S}\{\widetilde L^{-1}+\widetilde U^{-1}\}\) for the incomplete factors \(\widetilde L\) and \(\widetilde U\) computed using Algorithm 10.2 (see Section 10.2) with a dropping tolerance of 0.1 and at most 10 entries in each row of \(\widetilde L+\widetilde U\). Note that this dual dropping strategy is one of the most popular ways of employing Algorithm 10.2; it is often denoted as ILUT(p, τ), where p is the maximum number of entries allowed in each row and τ is the dropping tolerance. In this example, the parameters were chosen so that the number of entries in both W + Z T and \(\widetilde L+\widetilde U\) is approximately equal, but the resulting sparsity patterns are clearly different. In particular, potentially important information is lost from \(\mathcal {S}\{\widetilde L^{-1}+\widetilde U^{-1}\}\).

Fig. 11.2

An example to illustrate the difference between the sparsity patterns of the AINV factors and those of the inverse of the ILU factors. The sparsity pattern \({\mathcal S}\{A\}\) of the matrix A is given (top left) together with the patterns of the factorized approximate inverse factors \(\mathcal {S}\{ W+Z^T\}\) (top right), the ILU factors \(\mathcal {S}\{\widetilde L+\widetilde U\}\) (bottom left), and their inverses \(\mathcal {S}\{\widetilde L^{-1}+\widetilde U^{-1}\}\) (bottom right).

11.3.3 SAINV: Stabilization of the AINV Method

The following result is analogous to Theorem 9.4.

Theorem 11.5 (Benzi et al. 1996)

If A is a nonsingular M- or H-matrix, then the AINV factorization of A does not break down.

For more general matrices, breakdown can happen because of the occurrence of a zero d jj or, in the SPD case, negative d jj. In practice, exact zeros are unlikely but very small d jj can occur (near breakdown), which may lead to uncontrolled growth in the size of entries in the incomplete factors and, because such entries are not dropped when using a threshold parameter, a large amount of fill-in. The next theorem indicates how breakdown can be prevented when A is SPD through reformulating the A-orthogonalization.

Algorithm 11.4 AINV preconditioner (right-looking approach)

Algorithm 11.5 Nonsymmetric AINV preconditioner (right-looking approach)

Theorem 11.6 (Benzi et al. 2000; Kopal et al. 2012)

Consider Algorithm 11.4 with no sparsification (Step 7 is removed). The following identity holds

$$\displaystyle \begin{aligned} A_{j,1:n}\, z_k^{(j-1)} \equiv e_j^T A z_k^{(j-1)} = \langle z_j^{(j-1)}, z_k^{(j-1)}\rangle_A, \ \ 1 \le j \le k \le n. \end{aligned}$$

Proof

Because \(AZ = Z^{-T}D\) and \(Z^{-T}D\) is lower triangular, entries 1 to j − 1 of the vector \(A{z}_k^{(j-1)}\) are equal to zero. Z is unit upper triangular so entries j + 1 to n of its j-th column \({z}_j^{(j-1)}\) are also equal to zero. Thus, \({z}_j^{(j-1)}\) can be written as the sum z + e j, where entries j to n of the vector z are zero. The result follows. □

This suggests using alternative computations within the AINV approach based on the whole of A instead of on its rows. The reformulation, which is called the stabilized AINV algorithm (SAINV), is outlined in Algorithm 11.6. It is breakdown-free for any SPD matrix A because the diagonal entries are \(d_{jj} = \langle z_j^{(j-1)},\, z_j^{(j-1)}\rangle _A >0.\) Practical experience shows that, while slightly more costly to compute, the SAINV algorithm gives higher quality preconditioners than the AINV algorithm. However, the computed diagonal entries can still be very small and may need to be modified.
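A sketch of the stabilized variant follows; it differs from the AINV sketch in Section 11.3.1 only in how the pivots and coefficients are formed (A-inner products involving the whole of A), so that the pivots are positive for any SPD A. Dense arithmetic and the drop tolerance are again illustrative simplifications.

```python
import numpy as np

def sainv_spd(A, tau=0.1):
    """Stabilized AINV (SAINV) for SPD A: A^{-1} ~ Z diag(d)^{-1} Z^T.

    The pivot and the orthogonalization coefficients are the A-inner products
    z_j^T A z_k computed with the whole of A, so d_jj = <z_j, z_j>_A > 0 and
    the process cannot break down."""
    n = A.shape[0]
    Z = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        v = A @ Z[:, j]                         # A z_j
        d[j] = v @ Z[:, j]                      # <z_j, z_j>_A > 0
        coeffs = (v @ Z[:, j + 1:]) / d[j]      # <z_j, z_k>_A / d_jj for k > j
        Z[:, j + 1:] -= np.outer(Z[:, j], coeffs)
        Z[:, j + 1:][np.abs(Z[:, j + 1:]) < tau] = 0.0   # sparsification
    return Z, d
```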

Algorithm 11.6 SAINV preconditioner (right-looking approach)

The factors Z and D obtained with no sparsification can be used to compute the square root-free Cholesky factorization of A. The L factor of A and the inverse factor Z computed using Algorithm 11.6 without sparsification satisfy

$$\displaystyle \begin{aligned} AZ=LD \quad \mathrm{or} \quad L=AZD^{-1}.\end{aligned}$$

Using \(d_{jj} = \langle z_j^{(j-1)}, \,z_j^{(j-1)}\rangle _A\), and equating corresponding entries of AZD −1 and L, gives

$$\displaystyle \begin{aligned} l_{ij} = \frac {\langle z_j^{(j-1)}, \,z_i^{(j-1)} \rangle_A}{\langle z_j^{(j-1)}, \,z_j^{(j-1)}\rangle_A}, \quad 1 \le j \le i \le n. \end{aligned}$$

Thus, the SAINV algorithm generates the L factor of the square root-free Cholesky factorization of A as a by-product of orthogonalization in the inner product 〈. , .〉A at no extra cost and without breakdown.
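This by-product relationship is easy to check numerically using the SAINV sketch above with no dropping; the test matrix is illustrative.

```python
import numpy as np

# Uses sainv_spd from the SAINV sketch above (tau = 0: no dropping).
A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
Z, d = sainv_spd(A, tau=0.0)
L = A @ Z @ np.diag(1.0 / d)                    # L = A Z D^{-1}
assert np.allclose(L, np.tril(L)) and np.allclose(np.diag(L), 1.0)
assert np.allclose(L @ np.diag(d) @ L.T, A)     # square root-free Cholesky of A
```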

The stabilization strategy can be extended to the nonsymmetric AINV algorithm using the following result.

Theorem 11.7 (Benzi & Tůma 1998; Bollhöfer & Saad 2002)

Consider Algorithm 11.5 with no sparsification (Steps 7 and 10 removed). The following identities hold:

$$\displaystyle \begin{aligned} A_{j,1:n}\, z_k^{(j-1)} = e_{j}^T A z_k^{(j-1)} = \langle w_j^{(j-1)},\, z_k^{(j-1)} \rangle_A, \end{aligned}$$
$$\displaystyle \begin{aligned} (A_{1:n,j})^T w_k^{(j-1)} = e_j^T A^T w_k^{(j-1)} = \langle z_j^{(j-1)}, \,w_k^{(j-1)} \rangle_A, \quad 1 \le j \le k \le n. \end{aligned}$$

The nonsymmetric SAINV algorithm obtained using this reformulation can improve the preconditioner quality, but it is not guaranteed to be breakdown-free.

11.4 Notes and References

Benzi & Tůma (1999) present an early comparative study that puts preconditioning by approximate inverses into the context of alternative preconditioning techniques; see also Bollhöfer & Saad (2002, 2006), Benzi & Tůma (2003), and Bru et al. (2008, 2010). The inverse by bordering method mentioned in Section 11.1 is from Saad (2003b).

The first use of approximate inverses based on Frobenius norm minimization is given by Benson (1973). A SPAI approach that can exploit a dynamically changing sparsity pattern \(\mathcal S\) is introduced in Cosgrove et al. (1992); an independent and enhanced description is given in the influential paper by Grote & Huckle (1997). Later developments are presented in Holland et al. (2005), Jia & Zhang (2013), and Jia & Kang (2019). A comprehensive discussion on the choice of the sparsity pattern \(\mathcal S\) can be found in Huckle (1999). Huckle & Kallischko (2007) consider modifying the SPAI method by probing or symmetrizing the approximate inverse and Bröker et al. (2001) look at using approximate inverses based on Frobenius norm minimization as smoothers for multigrid methods. Choosing sparsity patterns for a related approximate inverse with a particular emphasis on parallel computing is described by Chow (2000).

For nonsymmetric matrices, MI12 within the HSL mathematical software library computes SPAI preconditioners (see Gould & Scott, 1998 for details and a discussion of the merits and limitations of the approach). An early parallel implementation is given by Barnard et al. (1999). Dehnavi et al. (2013) present an efficient parallel implementation that uses GPUs and include comparisons with ParaSails (Chow, 2001). The latter handles SPD problems using a factored sparse approximate inverse and general problems with an unfactored sparse approximate inverse. A priori techniques determine \(\mathcal S\) as a power of a sparsified matrix.

Original work on the FSAI preconditioner is by Kolotilina & Yeremin (1986, 1993). Its use in solving systems on massively parallel computers is presented in Kolotilina et al. (1992), while an interesting iterative construction can be found in Kolotilina et al. (2000). A parallel variant called ISAI preconditioning that combines a Frobenius norm-based approach with traditional ILU preconditioning is proposed by Anzt et al. (2018). FSAI preconditioning has attracted significant theoretical and practical attention. Recent contributions discuss not only its efficacy but also parallel computation, the use of blocks, supernodes, and multilevel implementations (Ferronato et al., 2012, 2014; Janna & Ferronato, 2011; Janna et al., 2010, 2013, 2015; Ferronato & Pini, 2018; Magri et al., 2018). Many of these enhancements are exploited in the FSAIPACK software of Janna et al. (2015).

The AINV preconditioner for SPD and nonsymmetric systems is introduced in Benzi et al. (1996) and Benzi & Tůma (1998), respectively; see also Benzi et al. (1999) for a parallel implementation. However, the development of this type of preconditioner follows much earlier interest in factorized matrix inverses (for example, Morris, 1946 and Fox et al., 1948). For the SAINV algorithm, see Benzi et al. (2000) and Kharchenko et al. (2001). Theoretical and practical properties of the AINV and SAINV factorizations are studied in a series of papers by Kopal et al. (2012, 2016, 2020).