Abstract
Consider a preconditioner M based on an incomplete LU (or Cholesky) factorization of a matrix A. M −1, which represents an approximation of A −1, is applied by performing forward and back substitution steps; this can present a computational bottleneck. An alternative strategy is to directly approximate A −1 by explicitly computing M −1. Preconditioners of this kind are called sparse approximate inverse preconditioners. They constitute an important class of algebraic preconditioners that are complementary to the approaches discussed in the previous chapter. They can be attractive because when used with an iterative solver, they can require fewer iterations than standard incomplete factorization preconditioners that contain a similar number of entries while offering significantly greater potential for parallel computations.
While it is recognized that preconditioning the system often improves the convergence of a particular method, this is not always so. In particular, a successful preconditioner for one class of problems may prove ineffective on another class. – Gould & Scott (1998).
There is, of course, no such concept as a best preconditioner ... However, every practitioner knows when they have a good preconditioner which enables feasible computation and solution of problems. In this sense, preconditioning will always be an art rather than a science. – Wathen (2015).
From Theorem 7.3, the sparsity pattern of the inverse of an irreducible matrix A is dense, even when A is sparse. Therefore, if A is large, the exact computation of its inverse is not an option, and aggressive dropping is needed to obtain a sufficiently sparse approximation to A −1 that can be used as a preconditioner. Fortunately, for a wide class of problems of practical interest, many of the entries of A −1 are small in absolute value, so that approximating the inverse with a sparse M −1 may be feasible, although capturing the large (important) values of A −1 is a nontrivial task. Importantly, the computed M −1 can have nonzeros at positions that cannot be obtained by either a complete or an incomplete factorization, and this can be beneficial. Furthermore, although A −1 is fully dense, the following result shows this is not the case for the factors of factorized inverses.
Theorem 11.1
Assume the matrix A is SPD, and let L be its Cholesky factor. Then \(\mathcal {S}\{L^{-1}\}\) is the union of all entries (i, j) such that i is an ancestor of j in the elimination tree \(\mathcal {T}(A)\).
A consequence of this result is that L −1 need not be fully dense. Considering this implication algorithmically, if A is SPD, it may be advantageous to preorder A to limit the number of ancestors that the vertices in \(\mathcal {T}(A)\) have. For example, nested dissection may be applied to \(\mathcal {S}\{A\}\) (Section 8.4). If \(\mathcal {S}\{A\}\) is nonsymmetric, then it may be possible to reduce fill-in in the factors of A −1 by applying nested dissection to \(\mathcal {S}\{A+A^T\}\).
11.1 Basic Approaches
An obvious way to obtain an approximate inverse of A in factorized form is to compute an incomplete LU factorization of A and then perform an approximate inversion of the incomplete factors. For example, if incomplete factors \(\widetilde L\) and \(\widetilde U\) are available, approximate inverse factors can be found by solving the 2n triangular linear systems
\[ \widetilde L\, x_i = e_i, \qquad \widetilde U\, y_i = e_i, \qquad i = 1, \ldots, n, \]
where e i is the i-th column of the identity matrix. These systems can all be solved independently, and hence, there is the potential for significant parallelism. To reduce costs and to preserve sparsity in the approximate inverse factors, the systems need not be solved accurately. A disadvantage is that the computation of the preconditioner involves two levels of incompleteness, and because information from the incomplete factorization of A is passed into the second step, the loss of information can be excessive.
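As an illustration, the following Python sketch (NumPy/SciPy) inverts a sparse lower triangular factor column by column, dropping relatively small entries from each computed column. The function name is illustrative, and for simplicity the exact Cholesky factor of a small SPD matrix stands in for an incomplete factor; the analogous computation with \(\widetilde U\) is omitted.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def approx_inverse_factor(L_inc, drop_tol):
    """Approximately invert a sparse lower triangular factor by solving
    L x_i = e_i for each column i and dropping relatively small entries."""
    n = L_inc.shape[0]
    Lr = sp.csr_matrix(L_inc)          # spsolve_triangular expects CSR
    cols = []
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        x = spla.spsolve_triangular(Lr, e, lower=True)
        x[np.abs(x) < drop_tol * np.abs(x).max()] = 0.0   # sparsify column i
        cols.append(x)
    return sp.csc_matrix(np.column_stack(cols))

# Tridiagonal SPD example; the exact Cholesky factor stands in for an
# incomplete factor here, and the tiny tolerance makes the inverse near-exact.
n = 6
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()
L = sp.csc_matrix(np.linalg.cholesky(A.toarray()))
Linv = approx_inverse_factor(L, drop_tol=1e-8)
```

With a larger `drop_tol`, the returned factor becomes sparser at the cost of `Linv @ L` deviating further from the identity.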
Another straightforward approach is based on bordering. Let A j denote the principal leading submatrix of A of order j (A j = A 1:j,1:j), and assume that its inverse factorization
\[ W_j^T A_j Z_j = D_j \]
is known. Here W j and Z j are unit upper triangular matrices, and D j is a diagonal matrix. Consider the following scheme:
\[ W_{j+1} = \begin{pmatrix} W_j & w_{j+1} \\ 0 & 1 \end{pmatrix}, \qquad Z_{j+1} = \begin{pmatrix} Z_j & z_{j+1} \\ 0 & 1 \end{pmatrix}, \qquad D_{j+1} = \begin{pmatrix} D_j & 0 \\ 0 & d_{j+1,j+1} \end{pmatrix}, \]
where for 1 ≤ j < n
\[ z_{j+1} = -Z_j D_j^{-1} W_j^T A_{1:j,\,j+1}, \qquad w_{j+1} = -W_j D_j^{-1} Z_j^T \big(A_{j+1,\,1:j}\big)^T, \qquad d_{j+1,j+1} = a_{j+1,j+1} + A_{j+1,\,1:j}\, z_{j+1}. \]
Starting from j = 1, this suggests a procedure for computing the inverse factors of A. Sparsity can be preserved by dropping some entries from the vectors w j+1 and z j+1 once they have been computed. Sparsity and the quality of the preconditioner can be influenced by preordering A.
If A is symmetric, W = Z and the required work is halved. Furthermore, if A is SPD, then it can be shown that, in exact arithmetic, d jj > 0 for all j and the process does not break down. In the general case, diagonal modifications may be required, which can limit the effectiveness of the resulting preconditioner.
Observe that the computations of Z and W are tightly coupled, restricting the potential to exploit parallelism. At each step j, besides a matrix–vector product with A j, four sparse matrix–vector products involving W j, Z j and their transposes are needed; these account for most of the work. The implementation is simplified if access to the triangular factors is available by columns as well as by rows.
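The bordering recurrences can be sketched in dense form as follows (Python/NumPy). The function name is illustrative, and no dropping is performed, so the computed factorization is exact; a sparse implementation would discard small entries of w and z after each step.

```python
import numpy as np

def inverse_by_bordering(A):
    """Compute unit upper triangular W, Z and diagonal D with W^T A Z = D
    by bordering.  Dropping small entries of w and z at each step would
    give the sparse approximate version."""
    n = A.shape[0]
    W = np.eye(n); Z = np.eye(n); d = np.empty(n)
    d[0] = A[0, 0]
    for j in range(n - 1):
        Wj, Zj, Dj = W[:j+1, :j+1], Z[:j+1, :j+1], np.diag(d[:j+1])
        v = A[:j+1, j+1]                     # new column of A
        h = A[j+1, :j+1]                     # new row of A
        z = -Zj @ np.linalg.solve(Dj, Wj.T @ v)
        w = -Wj @ np.linalg.solve(Dj, Zj.T @ h)
        Z[:j+1, j+1] = z
        W[:j+1, j+1] = w
        d[j+1] = A[j+1, j+1] + h @ z         # new diagonal entry
    return W, Z, np.diag(d)

A = np.array([[4., 1., 0.],
              [1., 4., 1.],
              [0., 1., 4.]])
W, Z, D = inverse_by_bordering(A)
# W^T A Z equals the diagonal matrix D, so A^{-1} = Z D^{-1} W^T.
```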
11.2 Approximate Inverses Based on Frobenius Norm Minimization
It is clear from the above discussion that alternative techniques for constructing sparse approximate inverse preconditioners are needed. We start by looking at schemes based on Frobenius norm minimization. Historically, these were the first to be proposed and offer the greatest potential for parallelism because both the construction of the preconditioner and its subsequent application can be performed in parallel.
11.2.1 SPAI Preconditioner
To describe the sparse approximate inverse (SPAI) preconditioner, it is convenient to use the notation K = M −1. The basic idea is to compute K ≈ A −1 with its columns denoted by k j as the solution of the problem of minimizing
\[ \|I - AK\|_F^2 = \sum_{j=1}^n \|e_j - A k_j\|_2^2 \qquad (11.1) \]
over all K with pattern \(\mathcal {S}\). This produces a right approximate inverse. A left approximate inverse can be computed by solving a minimization problem for ∥I − KA∥F = ∥I − A T K T∥F. This amounts to computing a right approximate inverse for A T and taking the transpose of the resulting matrix. For nonsymmetric matrices, the distinction between left and right approximate inverses can be important. Indeed, there are situations where it is difficult to compute a good right approximate inverse but easy to find a good left approximate inverse (or vice versa). In the following discussion, we assume that a right approximate inverse is being computed.
The Frobenius norm is generally used because the minimization problem then reduces to least squares problems for the columns of K that can be computed independently and, if required, in parallel. Further, these least squares problems are all of small dimension when \(\mathcal {S}\) is chosen to ensure K is sparse. Let \(\mathcal {J} = \{i \, | \, k_j(i) \ne 0\}\) be the set of indices of the nonzero entries in column k j. The set of indices of rows of A that can affect a product with column k j is \(\mathcal {I} = \{m\, | \, {A}_{m,\mathcal {J}} \neq 0 \}\). Let \(|\mathcal {I}|\) and \(|\mathcal {J}|\) denote the number of entries in \(\mathcal {I}\) and \(\mathcal {J}\), respectively, and let \(\widehat e_j = e_j(\mathcal {I})\) be the vector of length \(|\mathcal {I}|\) that is obtained by taking the entries of e j with row indices belonging to \(\mathcal {I}\). To solve (11.1) for k j, construct the \(|\mathcal {I}| \times |\mathcal {J}|\) matrix \(\widehat A = A_{\mathcal {I},\mathcal {J}}\) and solve the small unconstrained least squares problem
\[ \min_{\widehat k_j} \| \widehat e_j - \widehat A\, \widehat k_j \|_2. \qquad (11.2) \]
This can be done using a dense QR factorization of \(\widehat A\). Extending \(\widehat k_j \) to have length n by setting entries that are not in \(\mathcal {J}\) to zero gives k j.
A straightforward way to construct \(\mathcal {S}\) adaptively, starting from a simple initial choice (for example, the pattern of the identity or \(\mathcal {S}\{A\}\)), proceeds as follows. Starting with a chosen column sparsity pattern \(\mathcal {J}\) for k j, construct \(\widehat A\), solve (11.2) for \(\widehat k_j \), set \(k_j(\mathcal {J}) = \widehat k_j \), and define the residual vector
\[ r_j = e_j - A k_j. \]
If ∥r j∥2 ≠ 0, then k j is not equal to the j-th column of A −1, and a better approximation can be derived by augmenting \(\mathcal {J}\). To do this, let \(\mathcal {L} = \{l \, |\, r_j(l) \neq 0\}\) and define
\[ \widetilde{\mathcal J} = \{\, i \;|\; A_{\mathcal L,\, i} \neq 0, \ i \notin \mathcal J \,\}. \]
These are candidate indices that can be added to \(\mathcal {J}\), but as there may be many of them, they need to be chosen to most effectively reduce ∥r j∥2. A possible heuristic is to solve for each \(i \in \widetilde {\mathcal {J}}\) the minimization problem
\[ \min_{\mu_i} \| r_j - \mu_i A e_i \|_2. \]
This has the solution \(\mu _i ={r_j^T Ae_i}/{\|Ae_i\|_2^2} \), with corresponding squared residual norm \(\|r_j\|_2^2 - (r_j^T Ae_i)^2 / \|Ae_i\|_2^2. \) Indices \(i \in \widetilde {\mathcal {J}}\) for which this is small are appended to \(\mathcal {J}\). The process can be repeated until either the required accuracy is attained or the maximum number of allowed entries in \(\mathcal {J}\) is reached.
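The adaptive loop for a single column can be sketched as follows (Python/NumPy). For clarity the sketch solves each least squares problem from scratch with dense `numpy.linalg.lstsq` instead of QR updating, and the function name and parameters are illustrative.

```python
import numpy as np

def spai_column(A, j, max_nz=5, tol=1e-10, add_per_pass=1):
    """Compute column j of a right approximate inverse K ~ A^{-1} by least
    squares over an adaptively grown sparsity pattern (the SPAI idea)."""
    n = A.shape[0]
    J = [j]                                               # initial pattern
    while True:
        I = np.where(np.abs(A[:, J]).sum(axis=1) != 0)[0]  # affected rows
        Ahat = A[np.ix_(I, J)]
        ehat = (I == j).astype(float)                      # e_j restricted to I
        khat, *_ = np.linalg.lstsq(Ahat, ehat, rcond=None)
        k = np.zeros(n); k[J] = khat
        r = np.zeros(n); r[j] = 1.0; r -= A @ k            # r = e_j - A k
        if np.linalg.norm(r) <= tol or len(J) >= max_nz:
            return k
        # candidate columns touching rows with nonzero residual
        L = np.where(r != 0)[0]
        cand = [i for i in range(n) if i not in J and np.any(A[L, i] != 0)]
        if not cand:
            return k
        # 1-D minimization: keep the candidates that most reduce ||r||_2
        score = [(np.dot(r, A[:, i])**2 / np.dot(A[:, i], A[:, i]), i)
                 for i in cand]
        score.sort(reverse=True)
        J += [i for _, i in score[:add_per_pass]]

# Tridiagonal example: grow the pattern of the first column of K.
n = 6
A = np.diag(2.0*np.ones(n)) + np.diag(-1.0*np.ones(n-1), 1) \
    + np.diag(-1.0*np.ones(n-1), -1)
k1 = spai_column(A, 0, max_nz=3)
```

Each pass enlarges the pattern, so the least squares residual is nonincreasing; the loop stops once the column has `max_nz` entries.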
Solving the unconstrained least squares problem (11.2) after extending \(\widehat A\) to \( A_{\mathcal {I} \cup \mathcal {I}^{\prime },\,\mathcal {J}\cup \mathcal {J}^{\prime }}\) is typically performed using updating. Assume the QR factorization of \(\widehat A\) is
\[ \widehat A = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix}, \]
where Q 1 is \(|\mathcal {I}| \times |\mathcal {J}|\). Here Q is an orthogonal matrix and R is an upper triangular matrix. The QR factorization of the extended matrix is
\[ A_{\mathcal I \cup \mathcal I',\, \mathcal J \cup \mathcal J'} = \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & Q' \end{pmatrix} \begin{pmatrix} R & Q_1^T A_{\mathcal I,\, \mathcal J'} \\ 0 & R' \\ 0 & 0 \end{pmatrix}, \]
where Q ′ and R ′ are from the QR factorization of the \((|\mathcal {I}^{\prime }|+|\mathcal {I}| - |\mathcal {J}|) \times |\mathcal {J}^{\prime }|\) matrix
\[ \begin{pmatrix} Q_2^T A_{\mathcal I,\, \mathcal J'} \\ A_{\mathcal I',\, \mathcal J'} \end{pmatrix}. \]
Factorizing this matrix and updating the trailing QR factorization to get the new \(\widehat k_j\) is much more efficient than computing the QR factorization of the extended matrix from scratch.
Construction of the SPAI preconditioner is summarized in Algorithm 11.1. The maximum number of entries nz j that is permitted in k j must be at least as large as the number of entries in the initial sparsity pattern \(\mathcal {J}_j\). Updating can be used to compute a new \({\widehat k_j}\) for each pass through the while loop; the number of passes is typically small (for example, if a good initial sparsity pattern is available, a single pass may be sufficient).
Algorithm 11.1 SPAI preconditioner (right-looking approach)
![](http://media.springernature.com/lw554/springer-static/image/chp%3A10.1007%2F978-3-031-25820-6_11/MediaObjects/526491_1_En_11_Figaaa_HTML.png)
The example in Figure 11.1 illustrates Algorithm 11.1. Starting with a tridiagonal matrix, it considers the computation of the first column k 1 of the inverse matrix K. The algorithm starts with \(\mathcal {J}_1 = \{1,2\}\).
An illustration of computing the first column of a sparse approximate inverse using the SPAI algorithm with nz 1 = 3. On the top line is the initial tridiagonal matrix A followed by the matrix \(\widehat A\) and the vectors \(\widehat k_1\) and r 1 on the first loop of Algorithm 11.1. The bottom line presents the updated matrix \(\widehat A\) that is obtained on the second loop by adding the third row and column of A and the corresponding vectors \(\widehat k_1\) and r 1 and, finally, k 1. Here the numerical values have been appropriately rounded.
When A is symmetric, there is no guarantee that the computed K will be symmetric. One possibility is to use (K + K T)∕2 to approximate A −1. The SPAI preconditioner is not sensitive to the ordering of A. This has the advantage that A can be partitioned and preordered in whatever way is convenient, for instance, to better suit the needs of a distributed implementation, without worrying about the impact on the subsequent convergence rate of the solver. The disadvantage is that orderings cannot be used to reduce fill-in and/or improve the quality of this approximate inverse. For instance, if A −1 has no small entries, SPAI will not find a sparse K, and because the inverse of a permutation of A is just a permutation of A −1, no permutation of A will change this.
11.2.2 FSAI Preconditioner: SPD Case
We next consider a class of preconditioners based on an incomplete inverse factorization of A −1. The factorized sparse approximate inverse (FSAI) preconditioner for an SPD matrix A is defined as the product
\[ M^{-1} = G^T G, \qquad (11.3) \]
where the sparse lower triangular matrix G is an approximation of the inverse of the (complete) Cholesky factor L of A. Theoretically, a FSAI preconditioner is computed by choosing a lower triangular sparsity pattern \(\mathcal {S}_L\) and minimizing
\[ \|I - GL\|_F^2 = \mathrm{tr}\big( (I - GL)(I - GL)^T \big) \qquad (11.4) \]
over all G with sparsity pattern \(\mathcal {S}_L\). Here tr denotes the matrix trace operator (that is, the sum of the entries on the diagonal). The computation of G can be performed without knowing L explicitly. Differentiating (11.4) with respect to the entries of G and setting to zero yields
\[ (GA)_{ij} = (L^T)_{ij}, \qquad (i,j) \in \mathcal S_L. \qquad (11.5) \]
Because L T is an upper triangular matrix while \(\mathcal {S}_L\) is a lower triangular pattern, the matrix equation (11.5) can be rewritten as
\[ (GA)_{ij} = l_{ii}\, \delta_{i,j}, \qquad (i,j) \in \mathcal S_L. \qquad (11.6) \]
G is not available from (11.6) because L is unknown. Instead, \(\overline G\) is computed such that
\[ (\overline G A)_{ij} = \delta_{i,j}, \qquad (i,j) \in \mathcal S_L, \qquad (11.7) \]
where δ i,j is the Kronecker delta function (δ i,j = 1 if i = j and is equal to 0, otherwise). The FSAI factor G is then obtained by setting
\[ G = D^{1/2}\, \overline G, \]
where D is a diagonal scaling matrix. An appropriate choice for D is
\[ D = \big( \mathrm{diag}(\overline G) \big)^{-1}, \qquad (11.8) \]
so that
\[ (G A G^T)_{ii} = 1, \qquad 1 \le i \le n. \]
The following result shows that the FSAI preconditioner exists for any nonzero pattern \(\mathcal {S}_L\) that includes the main diagonal of A.
Theorem 11.2 (Kolotilina & Yeremin 1993)
Assume A is SPD. If the lower triangular sparsity pattern \(\mathcal {S}_L\) includes all diagonal positions, then G exists and is unique.
Proof
Set \(\mathcal {I}_i = \{ j \ | \ (i,j) \in \mathcal {S}_L \}\), and let \(A_{\mathcal {I}_i,\, \mathcal {I}_i}\) denote the submatrix of order \(nz_i=|\mathcal {I}_i|\) of entries a kl such that \(k,l \in \mathcal {I}_i\). Let \(\bar g_i\) and g i be dense vectors containing the nonzero coefficients in row i of \(\overline G\) and G, respectively. Using this notation, solving (11.7) decouples into solving n independent SPD linear systems
\[ A_{\mathcal I_i,\, \mathcal I_i}\, \bar g_i = \widehat e_i, \qquad i = 1, \ldots, n, \]
where the unit vectors are of length nz i. Moreover,
\[ \bar g_i(i) = \widehat e_i^{\,T} A_{\mathcal I_i,\, \mathcal I_i}^{-1}\, \widehat e_i > 0. \]
This implies that the diagonal entries of D given by (11.8) are nonzero. Consequently, the computed rows of G exist and provide a unique solution. □
The procedure for computing a FSAI preconditioner is summarized in Algorithm 11.2. The computation of each row of G can be performed independently; thus, the algorithm is inherently parallel. Moreover, applying the preconditioner requires only matrix–vector products with the sparse triangular factors G and G T, not triangular solves.
Algorithm 11.2 FSAI preconditioner
![](http://media.springernature.com/lw554/springer-static/image/chp%3A10.1007%2F978-3-031-25820-6_11/MediaObjects/526491_1_En_11_Figaab_HTML.png)
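A dense sketch of the row-wise construction might look as follows (Python/NumPy). The `fsai` function and the dict representation of the pattern are illustrative; a practical implementation would exploit sparsity throughout.

```python
import numpy as np

def fsai(A, pattern):
    """FSAI sketch: for each row i, solve the local SPD system
    A[J,J] g = e (local unit vector for index i), then scale the rows so
    that diag(G A G^T) = 1.  pattern[i] lists the column indices allowed
    in row i (it must contain i and only indices j <= i)."""
    n = A.shape[0]
    Gbar = np.zeros((n, n))
    for i in range(n):
        J = sorted(pattern[i])
        Asub = A[np.ix_(J, J)]
        e = np.zeros(len(J)); e[J.index(i)] = 1.0
        Gbar[i, J] = np.linalg.solve(Asub, e)   # row i of Gbar
    D = np.diag(1.0 / np.diag(Gbar))            # D = (diag Gbar)^{-1}
    G = np.sqrt(D) @ Gbar                       # G = D^{1/2} Gbar
    return G

# Tridiagonal SPD example with the pattern of the lower triangle of A.
n = 5
A = np.diag(2.0*np.ones(n)) + np.diag(-1.0*np.ones(n-1), 1) \
    + np.diag(-1.0*np.ones(n-1), -1)
pattern = {i: [j for j in range(n) if j <= i and A[i, j] != 0]
           for i in range(n)}
G = fsai(A, pattern)
# G is lower triangular and G A G^T has unit diagonal by construction.
```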
The following theorem states that G computed using Algorithm 11.2 is in some sense optimal.
Theorem 11.3 (Kolotilina et al. 2000)
Let L be the Cholesky factor of the SPD matrix A. Given a lower triangular sparsity pattern \(\mathcal {S}_L\) that includes all diagonal positions, let G be the FSAI preconditioner computed using Algorithm 11.2 . Then any lower triangular matrix G 1 with its sparsity pattern contained in \(\mathcal {S}_L\) and \((G_1AG_1^T)_{ii} = 1\) (1 ≤ i ≤ n) satisfies
\[ \| I - GL \|_F \le \| I - G_1 L \|_F. \]
The performance of the FSAI preconditioner is highly dependent on the choice of \(\mathcal {S}_L\). If entries are added to the pattern, then, as the following result shows, the preconditioner is more accurate, but it is also more expensive.
Theorem 11.4 (Kolotilina et al. 2000)
Let L be the Cholesky factor of the SPD matrix A. Given the lower triangular sparsity patterns \(\mathcal {S}_{L1}\) and \(\mathcal {S}_{L2}\) that include all diagonal positions, let the corresponding FSAI preconditioners computed using Algorithm 11.2 be G 1 and G 2 , respectively. If \(\mathcal {S}_{L1} \subseteq \mathcal {S}_{L2}\) , then
\[ \| I - G_2 L \|_F \le \| I - G_1 L \|_F. \]
11.2.3 FSAI Preconditioner: General Case
The FSAI algorithm can be extended to a general matrix A. Two input sparsity patterns are required: a lower triangular sparsity pattern \(\mathcal {S}_L\) and an upper triangular sparsity pattern \(\mathcal {S}_U\), both containing the diagonal positions. First, lower and upper triangular matrices \(\overline G_L\) and \(\overline G_U\) are computed such that
\[ (\overline G_L A)_{ij} = \delta_{i,j} \ \ \text{for } (i,j) \in \mathcal S_L, \qquad (A\, \overline G_U)_{ij} = \delta_{i,j} \ \ \text{for } (i,j) \in \mathcal S_U. \]
Then D is obtained as the inverse of the diagonal of the matrix \( \overline G_L A \overline G_U ,\) and the final nonsymmetric FSAI factors are given by \(G_L = \overline G_L \) and \( G_U = \overline G_U D. \) The computation of the two approximate factors can be performed independently.
This generalization is well defined if, for example, A is nonsymmetric positive definite. There is also theory that extends existence to special classes of matrices, including M- and H-matrices. In more general cases, solutions to the reduced systems may not exist, and modifications (such as perturbing the diagonal entries) are needed to circumvent breakdown.
11.2.4 Determining a Good Sparsity Pattern
The role of the input pattern is to preserve sparsity by filtering out entries of A −1 that contribute little to the quality of the preconditioner. For instance, it might be appropriate to ignore entries with a small absolute value, while retaining the largest ones. Unfortunately, the locations of large entries in A −1 are generally unknown, and this makes the a priori sparsity choice difficult. A possible exception is when A is a banded SPD matrix. In this case, the entries of A −1 are bounded in an exponentially decaying manner along each row or column. Specifically, there exist 0 < ρ < 1 and a constant c such that for all i, j
\[ \big| (A^{-1})_{ij} \big| \le c\, \rho^{\,|i-j|}. \]
The scalars ρ and c depend on the bandwidth and the condition number of A. For matrices with a large bandwidth and/or a high condition number, c can be very large and ρ close to one, indicating extremely slow decay. However, if the entries of A −1 can be shown to decay rapidly, then a banded M −1 should be a good approximation to A −1. In this case, \(\mathcal {S}_L\) can be chosen to correspond to a matrix with a prescribed bandwidth.
A common choice for a general A is \(\mathcal {S}_L + \mathcal {S}_U = \mathcal {S}\{A\}\), motivated by the empirical observation that entries in A −1 that correspond to nonzero positions in A tend to be relatively large. However, this simple choice is not robust because entries of A −1 that lie outside \(\mathcal {S}\{A\}\) can also be large. An alternative strategy based on the Neumann series expansion of A −1 is to use the pattern of a small power of A, i.e., \(\mathcal {S}\{A^2\}\) or \(\mathcal {S}\{A^3\}\). By starting from the lower and upper triangular parts of A, this approach can be used to determine candidates \(\mathcal {S}_L\) and \(\mathcal {S}_U\). While approximate inverses based on higher powers of A are often better than those corresponding to A, there is no guarantee they will result in good preconditioners. Furthermore, even small powers of A can be very dense, thus slowing down the construction and application of the preconditioner. A possible remedy is to use the power of a sparsified A. Alternatively, the pattern can be chosen dynamically by retaining the largest terms in each row of the preconditioner as it is computed, which is what the SPAI algorithm does. Another possibility is to implicitly determine \(\mathcal {S}_L + \mathcal {S}_U\) as follows. Starting with a simple sparsity pattern, compute the corresponding FSAI preconditioner G 1. Then choose a pattern based on \(G_1AG_1^T\) and apply the FSAI algorithm to \(G_1AG_1^T\) to obtain G 2. Finally, set the preconditioner to G 2 G 1. Despite running the FSAI algorithm twice, this approach can be worthwhile. Unfortunately, the choice of the best technique for generating a FSAI preconditioner and its sparsity pattern is highly problem dependent.
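The pattern-from-a-power idea can be sketched as follows (Python/NumPy; the function name and thresholding rule are illustrative). Sparsifying A before squaring keeps the candidate pattern from becoming too dense.

```python
import numpy as np

def power_pattern(A, rel_tol, power=2):
    """Candidate sparsity pattern S{B^power}, where B is A sparsified by
    discarding entries below rel_tol * max|a_ij| (a sparsified A)."""
    B = (np.abs(A) >= rel_tol * np.abs(A).max()).astype(int)
    P = B.copy()
    for _ in range(power - 1):
        P = ((P @ B) > 0).astype(int)   # Boolean matrix product
    return P

# For a tridiagonal A with no numerical cancellation, the pattern of A^2
# is pentadiagonal.
n = 6
A = np.diag(2.0*np.ones(n)) + np.diag(-1.0*np.ones(n-1), 1) \
    + np.diag(-1.0*np.ones(n-1), -1)
P2 = power_pattern(A, rel_tol=1e-8, power=2)
```

Raising `rel_tol` shrinks the sparsified pattern and hence the candidate pattern, trading preconditioner quality for sparsity.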
11.3 Factorized Approximate Inverses Based on Incomplete Conjugation
An alternative way to obtain a factorized approximate inverse is based on incomplete conjugation (A-orthogonalization) in the SPD case and on incomplete A-biconjugation in the general case. For SPD matrices, the approach represents an approximate Gram–Schmidt orthogonalization that uses the A-inner product 〈., .〉A. An important attraction is that the sparsity patterns of the approximate inverse factors need not be specified in advance; instead, they are determined dynamically as the preconditioner is computed.
11.3.1 AINV Preconditioner: SPD Case
When A is an SPD matrix, the AINV preconditioner is defined by an approximate inverse factorization of the form
\[ M^{-1} = Z D^{-1} Z^T \approx A^{-1}, \]
where the matrix Z is unit upper triangular and D is a diagonal matrix with positive entries. The factor Z is a sparse approximation of the inverse of the L T factor in the square root-free factorization of A. Z and D are computed directly from A using an incomplete A-orthogonalization process applied to the columns of the identity matrix. If entries are not dropped, then a complete factorization of A −1 is computed and Z is significantly denser than L T. To preserve sparsity, at each step of the computation, entries are discarded (for example, using a prescribed threshold, or according to the positions of the entries, or by retaining a chosen number of the largest entries in each column), resulting in an approximate factorization of A −1.
There are several variants. Algorithms 11.3 and 11.4 outline left-looking and right-looking approaches, respectively. Practical implementations need to employ sparse matrix techniques. The left-looking scheme computes the j-th column z j of Z as a sparse linear combination of the previous columns z 1, …, z j−1. The key is determining which multipliers (the α’s in Steps 4 and 5 of the two algorithms, respectively) are nonzero and need to be computed. This can be achieved very efficiently by having access to both the rows and columns of A (although the algorithm does not require that A is explicitly stored—only the capability of forming inner products involving the rows of A is required). For the right-looking approach, the crucial part for each j is the update of the sparse submatrix of Z composed of the columns j + 1 to n that are not yet fully computed. Here, only one row of A is used in the outer loop of the algorithm. Therefore, A can be generated on-the-fly by rows. The DS format can be used to store the partially computed Z (Section 1.3.2). As with complete factorizations, the efficiency of the computation and application of AINV preconditioners can benefit from incorporating blocking.
Algorithm 11.3 AINV preconditioner (left-looking approach)
![](http://media.springernature.com/lw554/springer-static/image/chp%3A10.1007%2F978-3-031-25820-6_11/MediaObjects/526491_1_En_11_Figaac_HTML.png)
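A dense sketch of the right-looking A-orthogonalization is given below (Python/NumPy; the function name and the simple per-entry drop rule are illustrative, and a practical code would work with sparse columns). With no dropping the process is exact, so \(Z^T A Z = D\) and \(Z D^{-1} Z^T = A^{-1}\).

```python
import numpy as np

def ainv_spd(A, drop_tol=0.0):
    """Right-looking AINV sketch for an SPD matrix: A-orthogonalize the
    columns of the identity.  With drop_tol > 0, small entries are
    discarded, giving a sparse approximation Z D^{-1} Z^T ~ A^{-1}."""
    n = A.shape[0]
    Z = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j] @ Z[:, j]                  # only row j of A is needed
        for i in range(j + 1, n):
            alpha = (A[j] @ Z[:, i]) / d[j]
            Z[:, i] -= alpha * Z[:, j]
            if drop_tol > 0.0:                 # sparsification step
                small = np.abs(Z[:, i]) < drop_tol
                small[i] = False               # keep the unit diagonal
                Z[small, i] = 0.0
    return Z, np.diag(d)

n = 5
A = np.diag(2.0*np.ones(n)) + np.diag(-1.0*np.ones(n-1), 1) \
    + np.diag(-1.0*np.ones(n-1), -1)
Z, D = ainv_spd(A)           # no dropping: exact inverse factorization
```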
11.3.2 AINV Preconditioner: General Case
In the general case, the AINV preconditioner is given by an approximate inverse factorization of the form
where Z and W are unit upper triangular matrices and D is a diagonal matrix. Z and W are sparse approximations of the inverses of the L T and U factors in the LDU factorization of A, respectively. Starting from the columns of the identity matrix, A-biconjugation is used to compute the factors. Algorithm 11.5 outlines the right-looking approach. Note it offers two possibilities for computing the entries d jj of D that are equivalent in exact arithmetic if the factorization is breakdown-free. The left-looking variant given in Algorithm 11.3 can be generalized in a similar way.
Figure 11.2 illustrates the sparsity patterns of the AINV factors for a matrix arising in circuit simulation. \(\mathcal {S}\{A\}\) is symmetric, but the values of the entries of A are nonsymmetric. The sparsity pattern \(\mathcal {S}\{W+Z^T\}\) is given, where W and Z are computed using Algorithm 11.5 with sparsification based on a dropping tolerance of 0.5. Also given are the patterns \(\mathcal {S}\{\widetilde L+\widetilde U\}\) and \(\mathcal {S}\{\widetilde L^{-1}+\widetilde U^{-1}\}\) for the incomplete factors \(\widetilde L\) and \(\widetilde U\) computed using Algorithm 10.2 (see Section 10.2) with a dropping tolerance of 0.1 and at most 10 entries in each row of \(\widetilde L+\widetilde U\). Note that this dual dropping strategy is one of the most popular ways of employing Algorithm 10.2; it is often denoted as ILUT(p, τ), where p is the maximum number of entries allowed in each row and τ is the dropping tolerance. In this example, the parameters were chosen so that the number of entries in both W + Z T and \(\widetilde L+\widetilde U\) is approximately equal, but the resulting sparsity patterns are clearly different. In particular, potentially important information is lost from \(\mathcal {S}\{\widetilde L^{-1}+\widetilde U^{-1}\}\).
An example to illustrate the difference between the sparsity patterns of the AINV factors and those of the inverse of the ILU factors. The sparsity pattern \({\mathcal S}\{A\}\) of the matrix A is given (top left) together with the patterns of the factorized approximate inverse factors \(\mathcal {S}\{ W+Z^T\}\) (top right), the ILU factors \(\mathcal {S}\{\widetilde L+\widetilde U\}\) (bottom left), and their inverses \(\mathcal {S}\{\widetilde L^{-1}+\widetilde U^{-1}\}\) (bottom right).
11.3.3 SAINV: Stabilization of the AINV Method
The following result is analogous to Theorem 9.4.
Theorem 11.5 (Benzi et al. 1996)
If A is a nonsingular M- or H-matrix, then the AINV factorization of A does not break down.
For more general matrices, breakdown can happen because of the occurrence of a zero d jj or, in the SPD case, negative d jj. In practice, exact zeros are unlikely but very small d jj can occur (near breakdown), which may lead to uncontrolled growth in the size of entries in the incomplete factors and, because such entries are not dropped when using a threshold parameter, a large amount of fill-in. The next theorem indicates how breakdown can be prevented when A is SPD through reformulating the A-orthogonalization.
Algorithm 11.4 AINV preconditioner (right-looking approach)
![](http://media.springernature.com/lw554/springer-static/image/chp%3A10.1007%2F978-3-031-25820-6_11/MediaObjects/526491_1_En_11_Figaad_HTML.png)
Algorithm 11.5 Nonsymmetric AINV preconditioner (right-looking approach)
![](http://media.springernature.com/lw554/springer-static/image/chp%3A10.1007%2F978-3-031-25820-6_11/MediaObjects/526491_1_En_11_Figaae_HTML.png)
Theorem 11.6 (Benzi et al. 2000; Kopal et al. 2012)
Consider Algorithm 11.4 with no sparsification (Step 7 is removed). The following identity holds
\[ e_j^T A\, z_j^{(j-1)} = \big(z_j^{(j-1)}\big)^T A\, z_j^{(j-1)} = \langle z_j^{(j-1)},\, z_j^{(j-1)} \rangle_A. \]
Proof
Because AZ = Z −T D and Z −T D is lower triangular, entries 1 to j − 1 of the vector \(A{z}_j^{(j-1)}\) are equal to zero. Z is unit upper triangular so entries j + 1 to n of its j-th column \({z}_j^{(j-1)}\) are also equal to zero. Thus, \({z}_j^{(j-1)}\) can be written as the sum z + e j, where entries j to n of the vector z are zero. The result follows. □
This suggests using alternative computations within the AINV approach based on the whole of A instead of on its rows. The reformulation, which is called the stabilized AINV algorithm (SAINV), is outlined in Algorithm 11.6. It is breakdown-free for any SPD matrix A because the diagonal entries are \(d_{jj} = \langle z_j^{(j-1)},\, z_j^{(j-1)}\rangle _A >0.\) Practical experience shows that, while slightly more costly to compute, the SAINV algorithm gives higher quality preconditioners than the AINV algorithm. However, the computed diagonal entries can still be very small and may need to be modified.
Algorithm 11.6 SAINV preconditioner (right-looking approach)
![](http://media.springernature.com/lw554/springer-static/image/chp%3A10.1007%2F978-3-031-25820-6_11/MediaObjects/526491_1_En_11_Figaaf_HTML.png)
The factors Z and D obtained with no sparsification can be used to compute the square root-free Cholesky factorization of A. The L factor of A and the inverse factor Z computed using Algorithm 11.6 without sparsification satisfy
\[ L = A Z D^{-1}. \]
Using \(d_{jj} = \langle z_j^{(j-1)}, \,z_j^{(j-1)}\rangle _A\), and equating corresponding entries of AZD −1 and L, gives
\[ l_{ij} = \frac{e_i^T A\, z_j^{(j-1)}}{d_{jj}} = \frac{\langle e_i,\, z_j^{(j-1)} \rangle_A}{\langle z_j^{(j-1)},\, z_j^{(j-1)} \rangle_A}, \qquad i \ge j. \]
Thus, the SAINV algorithm generates the L factor of the square root-free Cholesky factorization of A as a by-product of orthogonalization in the inner product 〈. , .〉A at no extra cost and without breakdown.
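A dense SAINV sketch is given below (Python/NumPy; the function name is illustrative and, as before, a practical code would be fully sparse). It differs from the AINV sketch only in computing the coefficients with the full A-inner product, which guarantees d jj > 0 for SPD A; without dropping, L = AZD −1 is recovered as the unit lower triangular factor of A = LDL T.

```python
import numpy as np

def sainv_spd(A, drop_tol=0.0):
    """SAINV sketch: as AINV, but coefficients use the full A-inner
    product <u, v>_A = u^T A v, so d_jj = <z_j, z_j>_A > 0 for any SPD A
    and the process cannot break down."""
    n = A.shape[0]
    Z = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = Z[:, j] @ A @ Z[:, j]           # <z_j, z_j>_A > 0
        for i in range(j + 1, n):
            alpha = (Z[:, j] @ A @ Z[:, i]) / d[j]
            Z[:, i] -= alpha * Z[:, j]
            if drop_tol > 0.0:                 # sparsification step
                small = np.abs(Z[:, i]) < drop_tol
                small[i] = False               # keep the unit diagonal
                Z[small, i] = 0.0
    return Z, np.diag(d)

n = 5
A = np.diag(2.0*np.ones(n)) + np.diag(-1.0*np.ones(n-1), 1) \
    + np.diag(-1.0*np.ones(n-1), -1)
Z, D = sainv_spd(A)                             # no dropping: exact
# By-product: the unit lower triangular factor of A = L D L^T.
L = A @ Z @ np.linalg.inv(D)
```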
The stabilization strategy can be extended to the nonsymmetric AINV algorithm using the following result.
Theorem 11.7 (Benzi & Tůma 1998; Bollhöfer & Saad 2002)
Consider Algorithm 11.5 with no sparsification (Steps 7 and 10 removed). The following identities hold:
\[ d_{jj} = e_j^T A\, z_j^{(j-1)} = \big(w_j^{(j-1)}\big)^T A\, z_j^{(j-1)} = \big(w_j^{(j-1)}\big)^T A\, e_j. \]
The nonsymmetric SAINV algorithm obtained using this reformulation can improve the preconditioner quality, but it is not guaranteed to be breakdown-free.
11.4 Notes and References
Benzi & Tůma (1999) present an early comparative study that puts preconditioning by approximate inverses into the context of alternative preconditioning techniques; see also Bollhöfer & Saad (2002, 2006), Benzi & Tůma (2003), and Bru et al. (2008, 2010). The inverse by bordering method mentioned in Section 11.1 is from Saad (2003b).
The first use of approximate inverses based on Frobenius norm minimization is given by Benson (1973). A SPAI approach that can exploit a dynamically changing sparsity pattern \(\mathcal S\) is introduced in Cosgrove et al. (1992); an independent and enhanced description is given in the influential paper by Grote & Huckle (1997). Later developments are presented in Holland et al. (2005), Jia & Zhang (2013), and Jia & Kang (2019). A comprehensive discussion on the choice of the sparsity pattern \(\mathcal S\) can be found in Huckle (1999). Huckle & Kallischko (2007) consider modifying the SPAI method by probing or symmetrizing the approximate inverse and Bröker et al. (2001) look at using approximate inverses based on Frobenius norm minimization as smoothers for multigrid methods. Choosing sparsity patterns for a related approximate inverse with a particular emphasis on parallel computing is described by Chow (2000).
For nonsymmetric matrices, MI12 within the HSL mathematical software library computes SPAI preconditioners (see Gould & Scott, 1998 for details and a discussion of the merits and limitations of the approach). An early parallel implementation is given by Barnard et al. (1999). Dehnavi et al. (2013) present an efficient parallel implementation that uses GPUs and include comparisons with ParaSails (Chow, 2001). The latter handles SPD problems using a factored sparse approximate inverse and general problems with an unfactored sparse approximate inverse. A priori techniques determine \(\mathcal S\) as a power of a sparsified matrix.
Original work on the FSAI preconditioner is by Kolotilina & Yeremin (1986, 1993). Its use in solving systems on massively parallel computers is presented in Kolotilina et al. (1992), while an interesting iterative construction can be found in Kolotilina et al. (2000). A parallel variant called ISAI preconditioning that combines a Frobenius norm-based approach with traditional ILU preconditioning is proposed by Anzt et al. (2018). FSAI preconditioning has attracted significant theoretical and practical attention. Recent contributions discuss not only its efficacy but also parallel computation, the use of blocks, supernodes, and multilevel implementations (Ferronato et al., 2012, 2014; Janna & Ferronato, 2011; Janna et al., 2010, 2013, 2015; Ferronato & Pini, 2018; Magri et al., 2018). Many of these enhancements are exploited in the FSAIPACK software of Janna et al. (2015).
The AINV preconditioner for SPD and nonsymmetric systems is introduced in Benzi et al. (1996) and Benzi & Tůma (1998), respectively; see also Benzi et al. (1999) for a parallel implementation. However, the development of this type of preconditioner follows much earlier interest in factorized matrix inverses (for example, Morris, 1946; Fox et al., 1948). For the SAINV algorithm, see Benzi et al. (2000) and Kharchenko et al. (2001). Theoretical and practical properties of the AINV and SAINV factorizations are studied in a series of papers by Kopal et al. (2012, 2016, 2020).
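The core of AINV in the SPD case is an \(A\)-biconjugation process: a Gram–Schmidt-like sweep that makes the columns of a unit upper triangular \(Z\) mutually \(A\)-conjugate, with dropping applied to keep \(Z\) sparse, so that \(A^{-1}\approx Z D^{-1} Z^T\). The following dense sketch (the name `ainv` and the simple absolute drop tolerance are illustrative; SAINV drops differently to ensure robustness) shows the idea:

```python
import numpy as np

def ainv(A, tol=0.0):
    # A-biconjugation (SPD case): make the columns of Z A-conjugate,
    # with optional dropping to keep Z sparse.
    n = A.shape[0]
    Z = np.eye(n)                      # z_j starts as the j-th unit vector
    d = np.zeros(n)
    for i in range(n):
        w = A @ Z[:, i]
        d[i] = Z[:, i] @ w             # pivot d_i = z_i^T A z_i
        for j in range(i + 1, n):
            Z[:, j] -= ((w @ Z[:, j]) / d[i]) * Z[:, i]
            Z[np.abs(Z[:, j]) < tol, j] = 0.0   # drop small entries
    return Z, d

# with no dropping the process is exact: Z^T A Z = diag(d), A^{-1} = Z diag(d)^{-1} Z^T
A = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
Z, d = ainv(A)
```

With `tol > 0` the factors stay sparse at the cost of only approximating \(A^{-1}\); applying the preconditioner then requires just two sparse matrix–vector products and a diagonal scaling, with no triangular solves.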
References
Anzt, H., Huckle, T. K., Bräckle, J., & Dongarra, J. (2018). Incomplete sparse approximate inverses for parallel preconditioning. Parallel Computing, 71, 1–22.
Barnard, S. T., Clay, R. L., & Simon, H. D. (1999). An MPI implementation of the SPAI preconditioner on the T3E. International Journal of High Performance Computing Applications, 13(2), 107–123.
Benson, M. W. (1973). Iterative solution of large scale linear systems. Master’s thesis, Lakehead University, Thunder Bay, Canada.
Benzi, M., Cullum, J. K., & Tůma, M. (2000). Robust approximate inverse preconditioning for the conjugate gradient method. SIAM Journal on Scientific Computing, 22(4), 1318–1332.
Benzi, M., Marín, J., & Tůma, M. (1999). Parallel preconditioning with factorized sparse approximate inverses. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing. Philadelphia, PA: SIAM.
Benzi, M., Meyer, C. D., & Tůma, M. (1996). A sparse approximate inverse preconditioner for the conjugate gradient method. SIAM Journal on Scientific Computing, 17(5), 1135–1149.
Benzi, M. & Tůma, M. (1998). A sparse approximate inverse preconditioner for nonsymmetric linear systems. SIAM Journal on Scientific Computing, 19(3), 968–994.
Benzi, M. & Tůma, M. (1999). A comparative study of sparse approximate inverse preconditioners. Applied Numerical Mathematics, 30(2-3), 305–340.
Benzi, M. & Tůma, M. (2000). Orderings for factorized sparse approximate inverse preconditioners. SIAM Journal on Scientific Computing, 21(5), 1851–1868.
Benzi, M. & Tůma, M. (2003). A robust incomplete factorization preconditioner for positive definite matrices. Numerical Linear Algebra with Applications, 10(5-6), 385–400.
Bollhöfer, M. & Saad, Y. (2002). On the relations between ILUs and factored approximate inverses. SIAM Journal on Matrix Analysis and Applications, 24(1), 219–237.
Bollhöfer, M. & Saad, Y. (2006). Multilevel preconditioners constructed from inverse-based ILUs. SIAM Journal on Scientific Computing, 27(5), 1627–1650.
Bridson, R. & Tang, W.-P. (1999). Ordering, anisotropy, and factored sparse approximate inverses. SIAM Journal on Scientific Computing, 21(3), 867–882.
Bröker, O., Grote, M. J., Mayer, C., & Reusken, A. (2001). Robust parallel smoothing for multigrid via sparse approximate inverses. SIAM Journal on Scientific Computing, 23(4), 1396–1417.
Bru, R., Marín, J., Mas, J., & Tůma, M. (2008). Balanced incomplete factorization. SIAM Journal on Scientific Computing, 30(5), 2302–2318.
Bru, R., Marín, J., Mas, J., & Tůma, M. (2010). Improved balanced incomplete factorization. SIAM Journal on Matrix Analysis and Applications, 31(5), 2431–2452.
Chow, E. (2000). A priori sparsity patterns for parallel sparse approximate inverse preconditioners. SIAM Journal on Scientific Computing, 21(5), 1804–1822.
Chow, E. (2001). Parallel implementation and practical use of sparse approximate inverse preconditioners with a priori sparsity patterns. International Journal of High Performance Computing Applications, 15(1), 56–74.
Cosgrove, J. D. F., Díaz, J. C., & Griewank, A. (1992). Approximate inverse preconditioning for sparse linear systems. International Journal of Computer Mathematics, 44, 91–110.
Dehnavi, M. M., Fernández, D. M., Gaudiot, J.-L., & Giannacopoulos, D. D. (2013). Parallel sparse approximate inverse preconditioning on graphic processing units. IEEE Transactions on Parallel and Distributed Systems, 24(9), 1852–1862.
Ferronato, M., Janna, C., & Pini, G. (2012). Shifted FSAI preconditioners for the efficient parallel solution of non-linear groundwater flow models. International Journal for Numerical Methods in Engineering, 89(13), 1707–1719.
Ferronato, M., Janna, C., & Pini, G. (2014). A generalized Block FSAI preconditioner for nonsymmetric linear systems. Journal of Computational and Applied Mathematics, 256, 230–241.
Ferronato, M. & Pini, G. (2018). A supernodal block factorized sparse approximate inverse for non-symmetric linear systems. Numerical Algorithms, 78(1), 333–354.
Fox, L., Huskey, H. D., & Wilkinson, J. H. (1948). Notes on the solution of algebraic linear simultaneous equations. The Quarterly Journal of Mechanics and Applied Mathematics, 1, 149–173.
Gould, N. I. M. & Scott, J. A. (1998). On approximate inverse preconditioners. SIAM Journal on Scientific Computing, 19(2), 605–625.
Grote, M. J. & Huckle, T. (1997). Parallel preconditioning with sparse approximate inverses. SIAM Journal on Scientific Computing, 18(3), 838–853.
Holland, R. M., Wathen, A. J., & Shaw, G. J. (2005). Sparse approximate inverses and target matrices. SIAM Journal on Scientific Computing, 26(3), 1000–1011.
Huckle, T. (1999). Approximate sparsity patterns for the inverse of a matrix and preconditioning. Applied Numerical Mathematics, 30(2-3), 291–303.
Huckle, T. & Kallischko, A. (2007). Frobenius norm minimization and probing for preconditioning. International Journal of Computer Mathematics, 84(8), 1225–1248.
Janna, C. & Ferronato, M. (2011). Adaptive pattern research for block FSAI preconditioning. SIAM Journal on Scientific Computing, 33(6), 3357–3380.
Janna, C., Ferronato, M., & Gambolati, G. (2010). A block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems. SIAM Journal on Scientific Computing, 32(5), 2468–2484.
Janna, C., Ferronato, M., & Gambolati, G. (2013). Enhanced block FSAI preconditioning using domain decomposition techniques. SIAM Journal on Scientific Computing, 35(5), S229–S249.
Janna, C., Ferronato, M., & Gambolati, G. (2015). The use of supernodes in factored sparse approximate inverse preconditioning. SIAM Journal on Scientific Computing, 37(1), C72–C94.
Janna, C., Ferronato, M., Sartoretto, F., & Gambolati, G. (2015). FSAIPACK: a software package for high-performance factored sparse approximate inverse preconditioning. ACM Transactions on Mathematical Software, 41(2), Art. 10, 1–26.
Jia, Z. & Kang, W. (2019). A transformation approach that makes SPAI, PSAI and RSAI procedures efficient for large double irregular nonsymmetric sparse linear systems. Journal of Computational and Applied Mathematics, 348, 200–213.
Jia, Z. & Zhang, Q. (2013). An approach to making SPAI and PSAI preconditioning effective for large irregular sparse linear systems. SIAM Journal on Scientific Computing, 35(4), A1903–A1927.
Kharchenko, S. A., Kolotilina, L. Y., Nikishin, A. A., & Yeremin, A. Y. (2001). A robust AINV-type method for constructing sparse approximate inverse preconditioners in factored form. Numerical Linear Algebra with Applications, 8(3), 165–179.
Kolotilina, L. Y., Nikishin, A. A., & Yeremin, A. Y. (1992). Factorized sparse approximate inverse (FSAI) preconditionings for solving 3D FE systems on massively parallel computers. II. In Iterative Methods in Linear Algebra (pp. 311–312). Amsterdam: North-Holland.
Kolotilina, L. Y. & Yeremin, A. Y. (1986). On a family of two-level preconditionings of the incomplete block factorization type. Soviet Journal of Numerical Analysis and Mathematical Modelling, 1(4), 293–320.
Kolotilina, L. Y. & Yeremin, A. Y. (1993). Factorized sparse approximate inverse preconditionings. I. Theory. SIAM Journal on Matrix Analysis and Applications, 14(1), 45–58.
Kolotilina, L. Y., Yeremin, A. Y., & Nikishin, A. A. (2000). Factorized sparse approximate inverse preconditionings. III: Iterative construction of preconditioners. Journal of Mathematical Sciences, 101, 3237–3254.
Kopal, J., Rozložník, M., & Tůma, M. (2016). Factorized approximate inverses with adaptive dropping. SIAM Journal on Scientific Computing, 38(3), A1807–A1820.
Kopal, J., Rozložník, M., & Tůma, M. (2020). A note on adaptivity in factorized approximate inverse preconditioning. Analele Universitatii “Ovidius” Constanta-Seria Matematica, 28(2), 149–159.
Kopal, J., Rozložník, M., Smoktunowicz, A., & Tůma, M. (2012). Rounding error analysis of orthogonalization with a non-standard inner product. BIT, 52, 1035–1058.
Magri, V. A. P., Franceschini, A., Ferronato, M., & Janna, C. (2018). Multilevel approaches for FSAI preconditioning. Numerical Linear Algebra with Applications, 25(5), e2183.
Morris, J. (1946). An escalator process for the solution of linear simultaneous equations. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 37(265), 106–120.
Saad, Y. (2003b). Iterative Methods for Sparse Linear Systems (2nd ed.). Philadelphia, PA: SIAM.
Wathen, A. J. (2015). Preconditioning. Acta Numerica, 24, 329–376.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
© 2023 The Author(s)
Scott, J., & Tůma, M. (2023). Sparse Approximate Inverse Preconditioners. In: Algorithms for Sparse Linear Systems. Nečas Center Series. Birkhäuser, Cham. https://doi.org/10.1007/978-3-031-25820-6_11
Print ISBN: 978-3-031-25819-0. Online ISBN: 978-3-031-25820-6.