1 Introduction

This paper is concerned with the computation of robust preconditioners by using standard pivoting techniques for solving ill-conditioned sparse nonsingular linear systems of equations of the form

$$\begin{aligned} Ax=b, \quad A\in {\mathbb {R}}^{n\times n}, \quad b \in {\mathbb {R}}^{n}, \end{aligned}$$
(1)

using Krylov subspace methods. Devising robust preconditioning algorithms for (1), such that it can be solved efficiently by means of iterative methods, remains one of the most active research areas in numerical linear algebra. Preconditioned Krylov methods have traditionally been linked to the solution of large and sparse linear systems due to the relatively small amount of memory and computation time needed to obtain an approximate solution compared with direct methods. There exist different techniques that can be used successfully to compute preconditioners, such as incomplete LU factorizations, approximate inverses, algebraic methods, etc. (see [2] and the references therein). In recent years they have also been employed in the context of mixed precision techniques for solving dense linear systems: the accuracy of an initial solution obtained with an LU factorization computed in single precision is improved by iterative refinement that uses the LU factorization as preconditioner [10, 15].

Ill-conditioned nonsingular linear systems arise in many areas of scientific and engineering applications, and computing numerically stable LU factorizations for these linear systems becomes a challenge [1, 13]. Pivoting was originally used to compute good LU factorizations for such problems [14], but there has also been some work on incomplete factorizations [6, 18, 21]. MATLAB has also incorporated this possibility into its function ilu, which computes the incomplete LU factorization of a matrix [17].

There are different pivoting techniques, with partial, rook and complete pivoting being the most important ones [14, 19]. Basically, at a given step of Gaussian elimination, pivoting looks for an element sufficiently large in magnitude in the remaining submatrix, the Schur complement, to use it as the next pivot. These techniques involve row and possibly column permutations of the matrix, which entail a computational overhead. In this sense, partial pivoting is the cheapest pivoting technique, since it looks only in the first column of the Schur complement. Close behind is rook pivoting [20], which selects a pivot of maximum absolute value in both its row and its column: it moves first to the entry of largest magnitude in the first column, then to the largest entry in the corresponding row, then again in the column, and so on until the requirement is fulfilled. Finally, complete pivoting is the most expensive one, since the pivot is the entry of largest magnitude in the whole Schur complement, but in exchange it guarantees the largest possible pivot at every stage.
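To make the three searches concrete, the following MATLAB sketch returns the position of the pivot selected in a given Schur complement S; the function find_pivot and its interface are ours and are only meant to illustrate the strategies described above.

function [p, q] = find_pivot(S, strategy)
% Illustrative pivot search: returns row p and column q of the pivot
% chosen in the Schur complement S under the given strategy.
switch strategy
    case 'partial'        % largest magnitude in the first column
        [~, p] = max(abs(S(:, 1)));
        q = 1;
    case 'complete'       % largest magnitude in the whole Schur complement
        [~, idx] = max(abs(S(:)));
        [p, q] = ind2sub(size(S), idx);
    case 'rook'           % alternate column/row searches until the entry
        q = 1;            % dominates both its row and its column
        [m, p] = max(abs(S(:, 1)));
        while true
            [mr, qq] = max(abs(S(p, :)));
            if mr <= m, break; end      % already maximal in its row
            q = qq; m = mr;
            [mc, pp] = max(abs(S(:, q)));
            if mc <= m, break; end      % already maximal in its column
            p = pp; m = mc;
        end
    otherwise
        error('unknown pivoting strategy');
end
end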

In this work we study pivoting techniques for the balanced incomplete factorization preconditioner, BIF. BIF preconditioning is based on the incomplete Sherman-Morrison decomposition, ISM. The ISM decomposition uses recursion formulas derived from the Sherman-Morrison formula and was introduced in [7] as a method for computing approximate inverse preconditioners. In [8] the authors show that, by applying the ISM algorithm to a symmetric and positive definite matrix A, it is possible to compute an incomplete Cholesky factorization; later, in [9], they showed that by applying the ISM algorithm to A and \(A^T\) it is possible to compute an incomplete LDU factorization. Moreover, in both cases the inverse factors are also available and they influence the computation of the Cholesky or LDU factors, and vice versa. In addition, the availability of the direct and inverse factors is exploited to implement norm-based dropping rules [5]. The numerical results show that BIF is a robust algorithm comparable to other techniques such as ILU(\(\tau \)) [3], ILUT (Threshold Incomplete LU) [22] and RIF (Robust Incomplete Factorization) [4]. Nevertheless, as mentioned above, computing stable (incomplete) factorizations for general ill-conditioned problems still requires the application of pivoting techniques, except for some kinds of matrices that can be solved with high accuracy [16]. In this paper we show that with a slight modification of the ISM recursion formulas it is possible to incorporate pivoting into BIF, obtaining an algorithm that can be implemented efficiently and performs similarly to the well known incomplete LU factorization with pivoting (ILUP). Thus, this study completes our previous work.

The paper is organized as follows. In Sect. 2 an overview of the ISM decomposition is presented. In Sect. 3 we introduce and analyze the right looking ISM decomposition. It is shown that the Schur complement computed with Gaussian elimination is available at each step of the modified algorithm and, therefore, it is possible to incorporate any standard pivoting technique. In Sect. 4 the BIF algorithm with pivoting is presented and the results for several ill-conditioned matrices are reported. Our experiments show that BIF with pivoting is able to solve such challenging problems and is comparable to ILUT with partial pivoting. Finally, the main conclusions are presented in Sect. 5.

2 The ISM decomposition

The ISM decomposition was introduced in [7] as an algorithm to compute approximate inverse preconditioners, since it obtains a factorization of the (shifted) inverse of A, as

$$\begin{aligned} s^{-1}I - {A}^{-1} = s^{-2} Z D_s^{-1} V_s^T, \end{aligned}$$
(2)

where \(s > 0\) is a given scalar and the columns of the matrices Z and \(V_s\) are computed using the recursion formulas

$$\begin{aligned} z_k=e_k-\sum _{i=1}^{k-1}\frac{v_i^T e_k}{sr_i}z_i \quad \text {and} \quad v_k=y_k-\sum _{i=1}^{k-1}\frac{y^T_k z_i}{sr_i}v_i, \end{aligned}$$
(3)

for \(k=1,2,\ldots ,n\). In (3) the vector \(e_k\) (\(e^k\)) denotes the \(k\)-th column (row) of the identity matrix, \(y_k = (a^k - se^k)^T\), where \(a^k\) denotes the \(k\)-th row of A, and

$$\begin{aligned} r_k=1+y^T_k z_k/s=1+v^T_k e_k/s, \end{aligned}$$
(4)

are the entries of the diagonal matrix \(D_s\).
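As an illustration, the recursion (3)-(4) can be transcribed directly into dense MATLAB code; the function name ism and the loop organization are ours, and this is only a sketch of the exact (full) decomposition, not the incomplete algorithm of [7].

function [Z, V, r] = ism(A, s)
% Dense transcription of the ISM recursion (3)-(4); illustrative only.
n = size(A, 1);
Y = A' - s*eye(n);            % column k holds y_k = (a^k - s e^k)^T
Z = zeros(n); V = zeros(n); r = zeros(n, 1);
for k = 1:n
    z = zeros(n, 1); z(k) = 1;                 % z_k starts as e_k
    v = Y(:, k);                               % v_k starts as y_k
    for i = 1:k-1
        z = z - (V(k, i)/(s*r(i)))*Z(:, i);    % v_i^T e_k = V(k,i)
        v = v - ((Y(:, k)'*Z(:, i))/(s*r(i)))*V(:, i);
    end
    r(k) = 1 + Y(:, k)'*z/s;                   % (4)
    Z(:, k) = z; V(:, k) = v;
end
end

With s = 1, relation (2) can then be checked numerically: for a well conditioned nonsingular A, norm(inv(A) - (eye(n) - Z*diag(1./r)*V')) should be of the order of the machine precision.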

It was proved in [8] that for symmetric matrices the factorization \(A=LDL^T\) and the decomposition (2) satisfy

$$\begin{aligned} D=sD_s, \quad Z=L^{-T}, \quad V_s= L D - sL^{-T}. \end{aligned}$$

The algorithm that computes the decomposition of A makes explicit use of the computed factors of \(A^{-1}\); that is, \(A^{-1}\) is implicitly factorized at the same time. The following result is proved in [9].

Theorem 1

(Theorem 2.1 of [9]) Let \(A= {L} {D} {U}\) be the LDU decomposition of A, and let \(s^{-1}I - {A}^{-1} = s^{-2} Z D_s^{-1} V_s^T\) be the ISM decomposition (2). Then

$$\begin{aligned} Z = U^{-1}, \quad \text {and} \quad V_s = {U}^T {D} - s {L}^{-T}. \end{aligned}$$
(5)

Observe that the factor L does not appear explicitly in (5). Therefore, to get the LU factorization of a general matrix it is necessary to compute also the ISM decomposition of \(A^T\), which yields

$$\begin{aligned} {\tilde{Z}} = L^{-T}, \quad \text {and} \quad {\tilde{V}}_s = {L} {D} - s {U}^{-1}, \end{aligned}$$

where the factors of the ISM decomposition of \(A^T\) are denoted with a tilde.

It is well known that a nonsingular matrix A has an LU factorization if there exist a unit lower triangular matrix L and an upper triangular matrix U such that \(A=LU\). The LDU factorization is obtained from the LU factorization by taking D as the diagonal matrix whose entries are the diagonal entries of U, and replacing U by \(D^{-1}U\). Both factorizations are closely related to Gaussian elimination. Note that not all nonsingular matrices have an LU factorization, since a zero pivot may be encountered during the Gaussian elimination process. However, it is always possible to permute some rows, and possibly some columns, of the matrix in such a way that the permuted matrix PAQ has an LU factorization. Here P and Q are permutation matrices acting on the rows and columns of A, respectively.
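Before turning to permutations, the LU/LDU relation is easy to check in MATLAB (the test matrix below is an arbitrary choice):

A = gallery('lehmer', 5);   % any nonsingular test matrix
[L, U, P] = lu(A);          % P*A = L*U with partial pivoting
D  = diag(diag(U));         % diagonal matrix of pivots
U1 = D \ U;                 % unit upper triangular factor
norm(P*A - L*D*U1)          % ~ 0, that is, P*A = L*D*U1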

The idea is that it is possible to find permutation matrices P and Q such that at the k-th step of the Gaussian elimination process one obtains the matrix

$$\begin{aligned} (PAQ)^{(k)} = \left[ \begin{array}{cc} L_{11} &{} O \\ L_{21} &{} I \end{array}\right] \left[ \begin{array}{cc} U_{11} &{} U_{12} \\ O &{} S^{(k)} \end{array}\right] , \end{aligned}$$

where the Schur complement \(S^{(k)} = A_{22} - A_{21}A_{11}^{-1}A_{12}\) is nonsingular and its first diagonal element is nonzero. Then, the permuted matrix PAQ is factorized as

$$\begin{aligned} PAQ = \left[ \begin{array}{cc} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{array}\right] = \left[ \begin{array}{cc} L_{11} &{} 0 \\ L_{21} &{} L_{22} \end{array}\right] \left[ \begin{array}{cc} U_{11} &{} U_{12} \\ 0 &{} U_{22} \end{array}\right] . \end{aligned}$$
(6)

Here, \(A_{11}\), \(A_{12}\), \(A_{21}\) and \(A_{22}\) represent the submatrices of the reordered matrix PAQ, and the size of \(A_{11}\) is \(k\times k\).

Note that in practice, the permutation matrices P and Q are not known in advance and therefore LU factorization algorithms determine which rows and columns must be interchanged during the elimination process. In the next section we show that it is possible to obtain the factorization (6) from the ISM decomposition by modifying its recursion formulas.

3 Right looking ISM algorithm

To implement pivoting in the ISM decomposition it is necessary to know the Schur complement of the LU factorization. To accomplish that, the vectors \(z_k\) and \(v_k\) must be computed in a different way. Instead of computing only one pair of vectors at the k-th step of the algorithm according to equations (3), the modification consists of updating also the remaining vectors, from \(k+1\) to n. That is, the right parts of the matrices Z and V are updated at each step. Algorithm 1 implements the new right looking version of ISM in MATLAB.

Algorithm 1 Right looking ISM decomposition
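In dense MATLAB form the recursion just described reads as follows; this is a sketch with our own variable names (the actual implementation works with sparse data structures and dropping), in which at step k the finished pair \((z_k, v_k)\) updates all the trailing columns:

function [Z, V, r] = ism_rl(A, s)
% Right looking ISM: dense illustrative version of Algorithm 1.
n = size(A, 1);
Y = A' - s*eye(n);             % column j holds y_j = (a^j - s e^j)^T
Z = eye(n);                    % column k starts as e_k
V = Y;                         % column k starts as y_k
r = zeros(n, 1);
for k = 1:n
    r(k) = 1 + V(k, k)/s;      % (4): column k of V is already finished
    cz = V(k+1:n, k).'/(s*r(k));            % (v_k^T e_j)/(s r_k), j > k
    cv = (Z(:, k).'*Y(:, k+1:n))/(s*r(k));  % (y_j^T z_k)/(s r_k), j > k
    Z(:, k+1:n) = Z(:, k+1:n) - Z(:, k)*cz;
    V(:, k+1:n) = V(:, k+1:n) - V(:, k)*cv;
end
end

Note that the coefficients of the v-updates use the original vectors \(y_j\), stored in Y, and not the partially updated columns of V, in agreement with (3).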

Next we show that the Schur complement \(S^{(k)}\) is available from the matrix V. As usual, we denote by \(l_{ik}\) the entry in row i and column k of the matrix L, by \(d_k\) the k-th diagonal entry of D and, with a slight abuse of notation, by \(u_{kj}\) the entry in row k and column j of the upper triangular factor DU of \(A=LU\). Observe that after the k-th step the first k columns of Z and V are computed. From \(A=LDU\) and Theorem 1 we have \(AZ=LD\), and hence after the k-th step

$$\begin{aligned} \mathbf {a}^i \mathbf {z}_k = l_{ik} d_k, \quad i > k, \end{aligned}$$
(7)

and the j-th element of \(\mathbf {v}_k\) is

$$\begin{aligned} v_{jk} = u_{kj}, \quad j > k. \end{aligned}$$
(8)

From (7), and considering that \(\mathbf {z}_k\) has zero entries below row k, an important equality for the proof of our main result is

$$\begin{aligned} \mathbf {y}_i^T \mathbf {z}_k = \mathbf {a}^i \mathbf {z}_k - s\,\mathbf {e}^i \mathbf {z}_k = \mathbf {a}^i \mathbf {z}_k = l_{ik} d_k, \quad i > k . \end{aligned}$$
(9)

We denote by \(V_{22}^{(k)}\) the \((n-k)\times (n-k)\) submatrix of V in Algorithm 1 after step k, whose row and column indices lie in \(\{k+1,\ldots , n\}\). Then we have the following result, in whose proof we denote the entry in row i and column j of a matrix M by \(m_{ij}\).

Theorem 2

If A is a nonsingular matrix and \(s=1\), then at the k-th step of the right looking ISM Algorithm 1

$$\begin{aligned} V_{22}^{(k)} = S^{{(k)}^T} - I. \end{aligned}$$
(10)

Proof

We are going to prove equation (10) by induction on the steps.

Since \(s=1\), the initialization of V is \(V = A^T - I\), so we can write \(V^{(0)} = S^{{(0)}^T} - I\).

For \(k=1\), let us consider the element \(a_{ij}^{(1)}\) of the Schur complement \(S^{(1)}\), that is, entries with \(i,j>1\). It is well known that

$$\begin{aligned} a_{ij}^{(1)} = a_{ij}^{(0)} - \frac{a_{i1}^{(0)} a_{1j}^{(0)}}{a_{11}^{(0)}} = a_{ij} - l_{i1} u_{1j}. \end{aligned}$$

In the right looking ISM, the (j, i) entry of the matrix \(V^{(1)}\), for \(i,j>1\), \(i\ne j\), is

$$\begin{aligned} v_{ji}^{(1)} = v_{ji}^{(0)} - \frac{\mathbf {y}_i^{T} \mathbf {z}_1^{(0)}}{r_1} \mathbf {v}_1^{(0)}(j) = a_{ij} - \frac{a_{i1}}{a_{11}} a_{1j} = a_{ij} - l_{i1} u_{1j}. \end{aligned}$$

where we have used Eqs. (8) and (9).

Working in the same way when \(i=j\), we have

$$\begin{aligned} v_{ii}^{(1)} = v_{ii}^{(0)} - \frac{\mathbf {y}_i^{T} \mathbf {z}_1^{(0)}}{r_1} \mathbf {v}_1^{(0)}(i) = (a_{ii} - 1) - \frac{a_{i1}}{a_{11}} a_{1i} = a_{ii} - l_{i1} u_{1i} - 1. \end{aligned}$$

Then

$$\begin{aligned} V_{22}^{(1)} = S^{{(1)}^T} - I. \end{aligned}$$

Assume now that

$$\begin{aligned} V_{22}^{(k-1)} = S^{{(k-1)}^T} - I. \end{aligned}$$

Let us prove the equality for the k-th step. Consider the entries of the Schur complement \(S^{(k)}\), that is, \(a_{ij}^{(k)}\) for \(i,j>k\):

$$\begin{aligned} a_{ij}^{(k)} = a_{ij}^{(k-1)} - \frac{a_{ik}^{(k-1)} a_{kj}^{(k-1)}}{a_{kk}^{(k-1)}} = a_{ij}^{(k-1)} - l_{ik} u_{kj}. \end{aligned}$$

Again, in the right looking ISM, the (j, i) entry of the matrix \(V^{(k)}\), for \(i,j>k\), \(i\ne j\), is

$$\begin{aligned} v_{ji}^{(k)} = v_{ji}^{(k-1)} - \frac{\mathbf {y}_i^{T} \mathbf {z}_k^{(k-1)}}{r_k} \mathbf {v}_k^{(k-1)}(j) = a_{ij}^{(k-1)} - \frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}} a_{kj}^{(k-1)} = a_{ij}^{(k-1)} - l_{ik} u_{kj}, \end{aligned}$$

where we have used Eqs. (8) and (9).

Working in the same way, when \(i=j\) we have

$$\begin{aligned} v_{ii}^{(k)} = v_{ii}^{(k-1)} - \frac{\mathbf {y}_i^{T} \mathbf {z}_k^{(k-1)}}{r_k} \mathbf {v}_k^{(k-1)}(i) = \left( a_{ii}^{(k-1)} - 1\right) - \frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}} a_{ki}^{(k-1)} = a_{ii}^{(k-1)} - l_{ik} u_{ki} - 1. \end{aligned}$$

Then

$$\begin{aligned} V_{22}^{(k)} = S^{{(k)}^T} - I . \end{aligned}$$

\(\square \)
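Theorem 2 is also easy to verify numerically. With s = 1 and a matrix for which no pivoting is needed, k steps of the right looking recursion reproduce the transposed Schur complement up to the identity shift (a sketch; all names and the test matrix are ours):

n = 6; k = 3; s = 1;
rng(1); A = randn(n) + n*eye(n);    % shifted to avoid small pivots
Y = A' - s*eye(n); Z = eye(n); V = Y; r = zeros(n, 1);
for j = 1:k                         % k steps of the right looking recursion
    r(j) = 1 + V(j, j)/s;
    V(:, j+1:n) = V(:, j+1:n) - V(:, j)*((Z(:, j).'*Y(:, j+1:n))/(s*r(j)));
    Z(:, j+1:n) = Z(:, j+1:n) - Z(:, j)*(V(j+1:n, j).'/(s*r(j)));
end
S = A(k+1:n, k+1:n) - A(k+1:n, 1:k)*(A(1:k, 1:k)\A(1:k, k+1:n));
norm(V(k+1:n, k+1:n) - (S.' - eye(n-k)))    % ~ 0, as stated in (10)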

Since the factorization of \(A^T\) must also be computed, we note the following consequence.

Corollary 1

If the right looking algorithm is applied to \(A^T\) then

$$\begin{aligned} {\tilde{V}}_{22}^{(k)} = S^{{(k)}} - I. \end{aligned}$$

Figure 1 shows a graphical representation of the above results. To introduce pivoting strategies, the relation

$$\begin{aligned} V_{22}^{(k)} = S^{{(k)}^T} - I, \end{aligned}$$

should be taken into account. The new pivot is sought in the submatrix \(V_{22}^{(k)} + I\), which corresponds to the transpose of the same submatrix of \(A^{(k)}\) in Gaussian elimination. Thus, in partial pivoting, if columns k and \(p>k\) are permuted at step k in the matrix \(V^{(k)}\), then rows k and p must be permuted in A.

Also note that the pivoting strategy should be decided by looking into the Schur complement contained in \(V_s\), or into the one contained in \(\tilde{V_s}\), but not both. The exception is complete pivoting, for which \(V_s\) and \(\tilde{V_s}\) produce the same pivot in exact arithmetic, so either or both may be used. A dense sketch of the partial pivoting step is given below.
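In a dense setting, and with s = 1, the partial pivoting step then reads as follows (a sketch with our own names; note that the identity shift in (10) must be undone on the two affected diagonal entries before permuting and restored afterwards):

% partial pivoting at step k: row k of V22 + I holds the first
% column of the Schur complement S^(k-1)
row = V(k, k:n); row(1) = row(1) + 1;
[~, j] = max(abs(row)); p = k + j - 1;
V(k, k) = V(k, k) + 1; V(p, p) = V(p, p) + 1;  % undo the -I shift
V(:, [k p]) = V(:, [p k]);                     % columns of V ...
V(k, k) = V(k, k) - 1; V(p, p) = V(p, p) - 1;  % ... restore the shift
A([k p], :) = A([p k], :);                     % ... are rows of A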

Fig. 1 Matrices \(V_s\) and \(\tilde{V_s}\) at the k-th step of the right looking ISM algorithm

4 Numerical experiments

In this section we report the results of some numerical experiments with a set of matrices from the SuiteSparse Matrix Collection [11] and the Harwell-Boeing collection [12]. The matrices are listed in Table 1, where their size, number of nonzeros, condition number and application are indicated. They correspond to very ill-conditioned and highly indefinite problems for which Gaussian elimination without pivoting fails to compute good quality L and U factors, so the same is expected for incomplete LU factorizations (see [5, 10]). Partial, rook and complete pivoting techniques have been tested. The experiments have been implemented and run in MATLAB R2022a. As iterative solvers, the MATLAB implementations of full GMRES [23] and BiCGStab [24] were used. The right hand side vector was computed such that the solution was the vector of all ones, and the initial guess was the zero vector. The iterations were stopped when the initial residual was reduced by 8 orders of magnitude, with a maximum of 1,000 iterations. To compare the results obtained with BIF, the problems were also solved with MATLAB's incomplete LU preconditioner with partial pivoting, ILUTP.
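For reference, the baseline runs follow the usual MATLAB pattern; the droptol value below is only an example, and for BIFP the computed factors play the role of L and U:

setup = struct('type', 'ilutp', 'droptol', 1e-6);
[L, U, P] = ilu(A, setup);                  % MATLAB's ILUTP: L*U ~ P*A
b = A*ones(size(A, 1), 1);                  % exact solution of all ones
tol = 1e-8; maxit = 1000;
x1 = gmres(P*A, P*b, [], tol, maxit, L, U); % full GMRES (no restart)
x2 = bicgstab(P*A, P*b, tol, maxit, L, U);  % BiCGStab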

Table 1 Test problems

The implementation of the BIF preconditioner is based on the algorithm described in [9], but with the right looking modification described in Sect. 3. In [9] the authors show that in the ISM factorization the computations of the direct and inverse LU factors are interleaved and mutually influence each other. This characteristic allows for the use of advanced dropping rules, see [5]. We will not discuss these rules in detail, but we recall that their application requires the estimation of the norms of the columns of the LU factors and their inverses. Since approximations of these factors are explicitly available, the application of this kind of dropping rules is straightforward.

For simplicity, all the experiments have been done with the parameter s of the ISM decomposition equal to one. The algorithm is implemented such that the ISM decompositions of A and \(A^T\) are computed at the same time. Therefore, simultaneous access to A and \(A^T\) is needed. The pivot is chosen from the Schur complement contained in \(V_s\) rather than in \({\tilde{V}}_s\). We note that for complete pivoting the same pivot could be obtained working either with \(V_s\) or \({\tilde{V}}_s\), but we choose to work with \(V_s\) for simplicity. Thus, if at the k-th step of the algorithm the pivoting strategy determines that the new pivot is in row q and column p of \(V_s\), then, since \(V_s\) stores the transpose of the Schur complement, columns p and k must be interchanged in \(V_s\) and \(A^T\), whereas in \({\tilde{V}}_s\), \({\tilde{Z}}\) and A it is rows p and k that must be interchanged. For the same reason, rows q and k must be interchanged in \(V_s\), Z and \(A^T\), and the corresponding columns in \({\tilde{V}}_s\) and A. Also, other vectors whose elements depend on the column and row ordering, for instance the vectors storing the norms of the columns needed for the dropping rule, must be reordered accordingly. Algorithm 2 sketches the pivoted version of the BIF algorithm just described.

Algorithm 2 BIF preconditioner with pivoting (BIFP)
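In dense MATLAB form, a much simplified version of this process can be sketched as follows (our own names; partial pivoting only, without dropping rules, and with L recovered from relation (7) instead of from the ISM decomposition of \(A^T\)):

function [L, U, P] = bifp_sketch(A, s)
% Dense, one sided illustration of Algorithm 2 (no dropping).
n = size(A, 1);
B = A; P = eye(n);                  % B holds the row permuted matrix
Y = B.' - s*eye(n); Z = eye(n); V = Y; r = zeros(n, 1);
for k = 1:n
    % 1) pivot search: row k of V22 + sI is column 1 of the Schur compl.
    col1 = V(k, k:n); col1(1) = col1(1) + s;
    [~, j] = max(abs(col1)); p = k + j - 1;
    if p ~= k
        % 2) permutations: columns p<->k of V (with the diagonal shift
        % fixed up), rows p<->k of B and P; Z is not affected
        V(k, k) = V(k, k) + s; V(p, p) = V(p, p) + s;
        V(:, [k p]) = V(:, [p k]);
        V(k, k) = V(k, k) - s; V(p, p) = V(p, p) - s;
        B([k p], :) = B([p k], :); P([k p], :) = P([p k], :);
        Y = B.' - s*eye(n);         % refresh the y_j for the new ordering
    end
    % 3) right looking update of the trailing columns, as in Algorithm 1
    r(k) = 1 + V(k, k)/s;
    V(:, k+1:n) = V(:, k+1:n) - V(:, k)*((Z(:, k).'*Y(:, k+1:n))/(s*r(k)));
    Z(:, k+1:n) = Z(:, k+1:n) - Z(:, k)*(V(k+1:n, k).'/(s*r(k)));
end
% 4) extraction: Z = U1^{-1} with U1 unit upper triangular, d_k = s r_k,
% and L = (B*Z)*D^{-1} by (7), so that P*A = L*U with U = D*U1
d = s*r;
U = diag(d)/Z;
L = (B*Z)*diag(1./d);
end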
Table 2 Test results for the University of Florida matrices
Table 3 Harwell-Boeing group

The algorithm first determines the pivot position according to the pivoting strategy applied: complete, rook or partial. Then, the permutations of the matrices and vectors are performed. After that, the norms of the columns of the LU factors and their inverses are updated and the dropping rule is applied. Next, the recursion formulas of the factorization are applied. Finally, after n steps the LU factors are extracted from the matrices V and \({\tilde{V}}\).

In Tables 2 and 3 the pivoting strategy is indicated with C, P and R for the complete, partial and rook pivoting strategies, respectively. Density is the ratio between the number of nonzeros of the preconditioner and the number of nonzeros of the matrix, that is, \(\frac{{\text {nnz}}(L)+{\text {nnz}}(U)}{{\text {nnz}}(A)}\). Column iter shows the number of iterations of the solver, and droptol is the tolerance used to drop elements in BIFP and ILUTP. The other columns are self-explanatory. To reduce the amount of numbers in the tables, a blank space means that the value coincides with the one appearing in the previous rows. For instance, in Table 2 the droptol value for BIFP was always \(10^{-6}\) and therefore it appears only in the first row. The same holds for the preconditioner densities, which are the same for GMRES and BiCGStab and therefore are indicated only once.

Next, we comment on the results. We note that the matrices tested cannot be solved without pivoting with either the BIFP or the ILUTP preconditioner. Thus, pivoting is an essential tool to gain robustness for these factorizations. Starting with the University of Florida test matrices, Table 2, we observe for the adder group that there are no big differences between the pivoting strategies for BIFP. Density is small, except for adder_dcop_06. The same can be said for the number of iterations spent by both iterative solvers. For the rest of the matrices, one can see that BIFP with complete pivoting computes sparser preconditioners than partial and rook pivoting. The iteration counts do not present remarkable differences, except for the oscil_dcop_01 matrix, for which GMRES with partial pivoting, despite a larger nonzero density, needs twice as many iterations.

For the Harwell-Boeing matrices reported in Table 3, the first thing to observe is that the preconditioners are quite dense, with the exception of partial pivoting for the matrix orani678. Complete pivoting still produces less fill-in in the preconditioner, and the number of iterations is similar to that of the other pivoting strategies. Note, however, that partial pivoting performs extraordinarily well for the matrix orani678, since it is able to converge in the same number of iterations with a very sparse preconditioner.

Finally, comparing the performance of BIFP with ILUTP, we did not observe significant differences, especially with the preconditioned BiCGStab method. We recall that ILUTP uses partial pivoting, and we observe that BIFP with this pivoting strategy performed comparably in most cases.

5 Conclusions

In this paper we have presented an improved version of the BIF preconditioner that incorporates pivoting. The algorithm relies on a modification of the recursion formulas such that the Schur complement of standard Gaussian elimination is available at each step of the factorization. Thus, different pivoting techniques, such as partial, rook and complete pivoting, can be applied in a straightforward manner. Incorporating pivoting turns out to be an important step towards our initial goal of obtaining a more robust preconditioner, since it allows solving very ill-conditioned and indefinite problems that may not be solvable otherwise. The results of the numerical experiments with several matrices arising in different applications confirm that BIF with pivoting is a robust algorithm. Partial, rook and complete pivoting have been tested. Although complete pivoting very often produces sparser preconditioners with a competitive iteration count, rook and partial pivoting also perform quite well. Taking into account that partial and rook pivoting are less expensive from a computational point of view, since they need fewer comparisons to determine the pivot, these two techniques may be preferable as defaults. Also, a comparison with incomplete LU with partial pivoting (ILUTP) has been done, and one can see that the results are fairly close. As a final note on future work, since with the ISM decomposition one can also compute incomplete approximate inverse preconditioners, it may be worthwhile to explore their application to ill-conditioned problems, as is done for instance in [15].