1 Introduction

Consider the linear programming (LP) problem in the standard primal-dual formulation

$$\begin{aligned} \min _{\varvec{x}} \varvec{c}^{\mathsf {T}}\varvec{x} \qquad&\text{ subject } \text{ to } \qquad A\varvec{x}=\varvec{b}, \;\; \varvec{x}\ge \varvec{0}, \end{aligned}$$
(1a)
$$\begin{aligned} \max _{\varvec{y},\varvec{s}} \varvec{b}^{\mathsf {T}}\varvec{y} \qquad&\text{ subject } \text{ to } \qquad A^{\mathsf {T}}\varvec{y} + \varvec{s}=\varvec{c},\;\; \varvec{s}\ge \varvec{0}, \end{aligned}$$
(1b)

where \(A \in {\mathbb {R}}^{m\times n}\), \(m\le n\), and we assume the existence of an optimal solution. In this paper, we describe an implementation of the interior-point method for LP based on iterative solvers. The main computational task in one iteration of the interior-point method is the solution of a system of linear equations to compute the search direction.

Direct solvers are usually used for this task, but some implementations employ iterative solvers instead. Iterative solvers are advantageous when the systems are large and sparse, or even when they are large and dense but the product of the coefficient matrix and a vector can be approximated cheaply, as in [11, 65]. The difficulty with iterative solvers is that the linear system becomes notoriously ill-conditioned towards the end of the interior-point iterations. One approach is to precondition the mathematically equivalent indefinite augmented system [as in Eq. (5)], as in HOPDM (Higher Order Primal-Dual Method) [29] and also [2, 3, 6, 7, 12, 26, 27, 33, 58, 61]. The other approach is to precondition the equivalent normal equations [as in Eq. (6)] [9, 14, 28, 40, 42, 44, 45, 48, 60, 70].

In this paper, we treat the normal equations and apply novel inner-iteration preconditioned Krylov subspace methods to them. The inner-iteration preconditioners recently proposed by Morikuni and Hayami [54, 55] enable us to deal with the severe ill-conditioning of the normal equations. Furthermore, the proposed Krylov subspace methods do not break down in the presence of singularity, and therefore no preprocessing is necessary even if A is rank-deficient.

The main contribution of the present paper is to show that inner-iteration preconditioning enables the efficient interior-point solution of a wide range of LP problems. We further propose combining a row-scaling scheme with the inner-outer iteration methods, exploiting the fact that the row norms appear in the successive overrelaxation (SOR) inner iterations, to improve the conditioning of the system at each interior-point step. The linear systems are solved with a gradually tightened stopping tolerance. We also propose a new recurrence that omits one matrix-vector product at each interior-point step. These techniques reduce the CPU time.

Extensive numerical experiments were conducted on 127 diverse LP problems taken from the standard Netlib, QAPLIB, and Mittelmann benchmark collections. The largest problem has 434,580 unknowns. The proposed interior-point method is entirely based on iterative solvers and yet succeeds in solving a fairly large number of standard LP instances from the benchmark libraries with standard stopping criteria. We are not aware of any analogous result in which LP instances of this scale were solved by relying solely on iterative solvers.

We compared our interior-point LP solvers based on AB-GMRES (right-preconditioned generalized minimal residual method) [37, 55], CGNE, and MRNE (preconditioned CG and MINRES applied to the normal equations of the second kind) [13, 55] with the following well-known interior-point LP solvers:

  1. SeDuMi (Self-Dual Minimization) [66] (public-domain, direct solver),

  2. SDPT3 (Semidefinite Programming Toh-Todd-Tütüncü) [68, 69] (public-domain, direct solver),

  3. PDCO (Primal-Dual Barrier Method for Convex Objectives) [65],

     (a) PDCO-Direct (public-domain, direct solver),

     (b) PDCO-LSMR (public-domain, LSMR iterative solver),

  4. MOSEK [57] (commercial, direct solver).

SeDuMi and SDPT3 are solvers for conic linear programming, including semidefinite programming (SDP) and second-order cone programming (SOCP). PDCO is for LP and convex quadratic programming (QP) and has the option of solving the system of linear equations with the Krylov subspace iterative method LSMR in addition to a direct method. MOSEK is considered one of the state-of-the-art solvers for LP.

As summarized in Table 1, our implementation was able to solve most instances, clearly more than SeDuMi, SDPT3, PDCO-Direct, and PDCO-LSMR, with comparable computation time, though it is still slower than MOSEK.

We also tested our solvers on problems arising in basis pursuit [11], where the coefficient matrix is much denser than in the aforementioned standard benchmark problems.

We emphasize that there are many interesting topics to be further worked out based on this paper. There is still room for improvement regarding the iterative solvers as well as using more sophisticated methods for the interior-point iterations.

In the following, we introduce the interior-point method and review the iterative solvers previously used. We employ an infeasible primal-dual predictor-corrector interior-point method, one of the methods that evolved from the original primal-dual interior-point method [41, 49, 67, 71] incorporating several innovative ideas, e.g., [45, 73].

An optimal solution \(\varvec{x}, \varvec{y}, \varvec{s}\) to problem (1) must satisfy the Karush-Kuhn-Tucker (KKT) conditions

$$\begin{aligned} A^{\mathsf {T}} \varvec{y} + \varvec{s}&= \varvec{c}, \end{aligned}$$
(2a)
$$\begin{aligned} A \varvec{x}&= \varvec{b}, \end{aligned}$$
(2b)
$$\begin{aligned} X S \varvec{e}&= \varvec{0}, \end{aligned}$$
(2c)
$$\begin{aligned} \varvec{x} \ge \varvec{0}, \quad \varvec{s}&\ge \varvec{0}, \end{aligned}$$
(2d)

where \(X :=\mathrm {diag}(x_1, x_2, \dots , x_n)\), \(S:=\mathrm {diag}(s_1, s_2, \dots , s_n)\), and \(\varvec{e} := [1,1,\dots , 1]^{\mathsf {T}}\). The complementarity condition (2c) implies that at an optimal solution, at least one of the elements \(x_i\) and \(s_i\) must be zero for each \(i = 1, 2, \dots , n\).

The following system is obtained by relaxing (2c) to \(XS{{\varvec{e}}}= \mu {{\varvec{e}}}\) with \(\mu >0\):

$$\begin{aligned} XS\varvec{e} = \mu \varvec{e}, \ \ A\varvec{x} = \varvec{b}, \ \ A^{\mathsf {T}} \varvec{y}+\varvec{s}=\varvec{c}, \ \ \varvec{x}\ge \varvec{0},\ \ \varvec{s} \ge \varvec{0}. \end{aligned}$$
(3)

The interior-point method solves the problem (1) by generating solutions to (3), with \(\mu \) decreasing towards zero, so that (2) is satisfied within some tolerance level at the solution point. The search direction at each infeasible interior-point step is obtained by solving the Newton equations

$$\begin{aligned} \begin{bmatrix} \varvec{0}&A^{\mathsf {T}}&I \\ A&\varvec{0}&\varvec{0} \\ S&\varvec{0}&X \end{bmatrix} \left[ \begin{array}{c} \varDelta \varvec{x} \\ \varDelta \varvec{y} \\ \varDelta \varvec{s} \end{array} \right] = \left[ \begin{array}{c} \varvec{r}_\mathrm {d} \\ \varvec{r}_\mathrm {p} \\ \varvec{r}_\mathrm {c} \end{array} \right] , \end{aligned}$$
(4)

where \(\varvec{r}_\mathrm {d} := \varvec{c} - A^{\mathsf {T}} \varvec{y} - \varvec{s} \in {\mathbb {R}}^n\) is the residual of the dual problem, \(\varvec{r}_\mathrm {p} := \varvec{b} - A\varvec{x} \in {\mathbb {R}}^m\) is the residual of the primal problem, \(\varvec{r}_\mathrm {c} := -XS\varvec{e} + \sigma \mu \varvec{e}\) , \(\mu := {\varvec{x}^{\mathsf {T}}\varvec{s}}/{n}\) is the duality measure, and \(\sigma \in [0, 1)\) is the centering parameter, which is dynamically chosen to govern the progress of the interior-point method. Once the kth iterate \((\varvec{x}^{(k)}, \varvec{y}^{(k)}, \varvec{s}^{(k)})\) is given and (4) is solved, we define the next iterate as \((\varvec{x}^{(k+1)}, \varvec{y}^{(k+1)}, \varvec{s}^{(k+1)}) := (\varvec{x}^{(k)}, \varvec{y}^{(k)}, \varvec{s}^{(k)}) + \alpha (\varDelta \varvec{x}, \varDelta \varvec{y}, \varDelta \varvec{s})\), where \(\alpha \in (0, 1]\) is a step length to ensure the positivity of \(\varvec{x}\) and \(\varvec{s}\), and then reduce \(\mu \) to \(\sigma \mu \) before solving (4) again.

At each iteration, the solution of (4) dominates the total CPU time. The choice of linear solver depends on how the matrix of (4) is arranged. Aside from solving the \((m+2n)\times (m+2n)\) system (4), one can solve its reduced equivalent form of size \((m+n)\times (m+n)\)

$$\begin{aligned} \begin{bmatrix} A&0\\ S&-XA^{\mathsf {T}} \end{bmatrix} \left[ \begin{array}{c} \varDelta \varvec{x} \\ \varDelta \varvec{y} \end{array}\right] = \left[ \begin{array}{c} \varvec{r}_\mathrm {p} \\ \varvec{r}_\mathrm {c} - X\varvec{r}_\mathrm {d}\end{array} \right] , \end{aligned}$$
(5)

or a more condensed equivalent form of size \(m\times m\)

$$\begin{aligned} AXS^{-1}A^{\mathsf {T}}\varDelta \varvec{y} = \varvec{r}_\mathrm {p} - AS^{-1}(\varvec{r}_\mathrm {c} - X\varvec{r}_\mathrm {d}), \end{aligned}$$
(6)

both of which are obtained by performing block Gaussian eliminations on (4). We are concerned in this paper with solving the third equivalent form (6).
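To make the structure of (6) concrete, the following minimal sketch (in Python with NumPy, dense matrices, purely for illustration; the iterative solvers in this paper access the matrix only through matrix-vector products and never form \(AXS^{-1}A^{\mathsf {T}}\) explicitly) assembles the matrix and right-hand side of the normal equations from A, x, s, and the residuals.

```python
import numpy as np

def normal_equations(A, x, s, rp, rd, rc):
    """Assemble the condensed system (6): (A X S^{-1} A^T) dy = rp - A S^{-1}(rc - X rd).

    Dense NumPy version for illustration only."""
    d2 = x / s                              # diagonal of X S^{-1}
    M = (A * d2) @ A.T                      # A X S^{-1} A^T
    f = rp - A @ ((rc - x * rd) / s)        # right-hand side of (6)
    return M, f
```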

It is known that the matrix of (6) can become positive semidefinite and singular, or nearly so, in the following cases. First, when A is rank-deficient, system (6) is singular. Presolving techniques exist that address this problem; see, e.g., [4, 31]. However, they are not guaranteed to detect all dependent rows in A. Second, in the late interior-point iterations, the diagonal matrix \(XS^{-1}\) has both very tiny and very large diagonal values as a result of convergence. Thus, the matrix may become numerically positive semidefinite. In particular, the situation becomes severe when primal degeneracy occurs at an optimal solution. One can refer to [34, 74] for more detailed explanations.

Thus, when direct methods such as Cholesky decomposition are applied to (6), some diagonal pivots encountered during decomposition can be zero or negative, causing the algorithm to break down. Many direct methods adopt a strategy of replacing the problematic pivot with a very large number. See, e.g., [74] for the Cholesky-Infinity factorization, which is specially designed to solve (6) when it is positive semidefinite but not definite. Numerical experience [1, 5, 17, 25, 43, 44, 72] indicates that direct methods provide sufficiently accurate solutions for interior-point methods to converge regardless of the ill-conditioning of the matrix. However, as the LP problems become larger, the significant fill-ins in decompositions make direct methods prohibitively expensive. It is stated in [32] that the fill-ins are observed even for very sparse matrices. Moreover, the matrix can be dense, as in QP in support vector machine training [24] or linear programming in basis pursuit [11], and even when A is sparse, \(AXS^{-1}A^{\mathsf {T}}\) can be dense or have a pattern of nonzero elements that renders the system difficult for direct methods. The expensive solution of the KKT systems is a usual disadvantage of second-order methods including interior-point methods.

These drawbacks of direct methods and the progress in preconditioning techniques motivate researchers to develop stable iterative methods for solving (6) or alternatively (5). The major problem is that as the interior-point iterations proceed, the condition number of the term \(XS^{-1}\) increases, making the system of linear equations intractable. One way to deal with this is to employ suitable preconditioners. Since our main focus is on solving (6), we explain preconditioners for (6) in detail in the following. We mention [2, 3, 6, 7, 12, 26, 27, 58, 61] as literature related to preconditioners for (5).

For the iterative solution of (6), the conjugate gradient (CG) method [38] has been applied with diagonal scaling preconditioners [9, 42, 60] or incomplete Cholesky preconditioners [12, 40, 45, 48]. LSQR with a preconditioner was used in [28]. A matrix-free method of using CG for least squares (CGLS) preconditioned by a partial Cholesky decomposition was proposed in [33]. In [14], a preconditioner based on Greville’s method [15] for generalized minimal residual (GMRES) method was applied. Suitable preconditioners were also introduced for particular fields such as the minimum-cost network flow problem in [39, 50, 51, 62]. One may refer to [18] for a review on the application of numerical linear algebra algorithms to the solutions of KKT systems in the optimization context.

In this paper, we propose to solve (6) using Krylov subspace methods preconditioned by stationary inner-iterations recently proposed for least squares problems in [37, 54, 55]. In Sect. 2, we briefly describe the framework of Mehrotra’s predictor-corrector interior-point algorithm we implemented and the normal equations arising from this algorithm. In Sect. 3, we specify the application of our method to the normal equations. In Sect. 4, we present numerical results comparing our method with a modified sparse Cholesky method, three direct solvers in CVX, a major public package for specifying and solving convex programs [35, 36], and direct and iterative solvers in PDCO [65]. The testing problems include the typical LP problems from the Netlib, Qaplib and Mittelmann collections in [20] and basis pursuit problems from the package Atomizer [10]. In Sect. 5, we conclude the paper.

Throughout, we use bold lower case letters for column vectors. We denote quantities related to the kth interior-point iteration by using a superscript with round brackets, e.g., \(\varvec{x}^{(k)}\), the kth iteration of Krylov subspace methods by using a subscript without brackets, e.g., \(\varvec{x}_{k}\), and the kth inner iteration by using a superscript with angle brackets, e.g., \(\varvec{x}^{\langle k\rangle }\). \({\mathcal {R}}(A)\) denotes the range space of a matrix A. \(\kappa (A)\) denotes the condition number \(\kappa (A) = \sigma _1(A)/\sigma _r(A)\), where \(\sigma _1(A)\) and \(\sigma _r(A)\) denote the maximum and minimum nonzero singular values of A, respectively. \({\mathcal {K}}_k (A, \varvec{b}) = \mathrm {span} \lbrace \varvec{b}, A \varvec{b}, \dots , A^{k-1} \varvec{b} \rbrace \) denotes the Krylov subspace of order k.

2 Interior-point algorithm and the normal equations

We implement an infeasible version of Mehrotra’s predictor-corrector method [46], which has been established as a standard in this area [43, 44, 47, 71]. Note that our method can be applied to other interior-point methods (see, e.g., [71] for more interior-point methods) whose directions are computed via the normal equations (6).

2.1 Mehrotra’s predictor-corrector algorithm

In this method, the centering parameter \(\sigma \) is determined by dividing each step into two stages.

In the first stage, we solve for the affine direction \((\varDelta \varvec{x}_\mathrm {af}, \varDelta \varvec{y}_\mathrm {af}, \varDelta \varvec{s}_\mathrm {af})\)

$$\begin{aligned} \begin{bmatrix} \varvec{0}&A^{\mathsf {T}}&I \\ A&\varvec{0}&\varvec{0} \\ S&\varvec{0}&X \end{bmatrix} \left[ \begin{array}{c} \varDelta \varvec{x}_\mathrm {af} \\ \varDelta \varvec{y}_\mathrm {af} \\ \varDelta \varvec{s}_\mathrm {af} \end{array} \right] = \left[ \begin{array}{c} \varvec{r}_\mathrm {d} \\ \varvec{r}_\mathrm {p} \\ -XS\varvec{e} \end{array} \right] , \end{aligned}$$
(7)

and measure its progress in reducing \(\mu \). If the affine direction makes large enough progress without violating the nonnegative boundary (2d), then \(\sigma \) is assigned a small value. Otherwise, \(\sigma \) is assigned a larger value to steer the iterate to be more centered in the strictly positive region.

In the second stage, we solve for the corrector direction \((\varDelta \varvec{x}_\mathrm {cc}, \varDelta \varvec{y}_\mathrm {cc}, \varDelta \varvec{s}_\mathrm {cc})\)

$$\begin{aligned} \begin{bmatrix}\varvec{0}&A^{\mathsf {T}}&I \\ A&\varvec{0}&\varvec{0} \\ S&\varvec{0}&X \end{bmatrix} \left[ \begin{array}{c} \varDelta \varvec{x}_\mathrm {cc} \\ \varDelta \varvec{y}_\mathrm {cc} \\ \varDelta \varvec{s}_\mathrm {cc} \end{array} \right] = \left[ \begin{array}{c} \varvec{0} \\ \varvec{0} \\ -\varDelta X_\mathrm {af}\varDelta S_\mathrm {af}\varvec{e}+\sigma \mu _\mathrm {af} \varvec{e}\end{array} \right] , \end{aligned}$$
(8)

where \(\varDelta X_\mathrm {af} = \mathrm {diag}(\varDelta \varvec{x}_\mathrm {af})\), \(\varDelta S_\mathrm {af} = \mathrm {diag}(\varDelta \varvec{s}_\mathrm {af})\) and \(\sigma \) is determined according to the solution in the first stage. Finally, we update the current iterate along the linear combination of the two directions.

In our implementation of the interior-point method, we adopt Mehrotra’s predictor-corrector algorithm as follows.

[Algorithm 1 (Mehrotra's predictor-corrector interior-point method) is presented as a figure in the original.]

In line 5 of Algorithm 1, the step lengths \(\alpha _\mathrm {p}, \; \alpha _\mathrm {d}\) are computed by

$$\begin{aligned} \alpha _\mathrm {p} = \min { \left( 1,\eta \min _{i: \varDelta x_i<0}\left( -\frac{ x_i}{\varDelta x_i}\right) \right) },\ \ \alpha _\mathrm {d} = \min { \left( 1, \eta \min _{i: \varDelta s_i<0}\left( -\frac{ s_i}{\varDelta s_i}\right) \right) }, \end{aligned}$$
(9)

where \((\varDelta \varvec{x}, \varDelta \varvec{s}) = (\varDelta \varvec{x}_\mathrm {af}, \varDelta \varvec{s}_\mathrm {af})\) and \(\eta \in [0.9, 1)\).
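A minimal sketch of the ratio test (9), assuming dense NumPy vectors; the damping value eta = 0.9995 is only an illustrative choice within the stated range [0.9, 1).

```python
import numpy as np

def step_lengths(x, s, dx, ds, eta=0.9995):
    """Ratio test (9): largest alpha in (0, 1] keeping x + alpha*dx and s + alpha*ds
    positive, damped by eta."""
    def ratio(v, dv):
        neg = dv < 0
        return np.min(-v[neg] / dv[neg]) if np.any(neg) else np.inf
    alpha_p = min(1.0, eta * ratio(x, dx))
    alpha_d = min(1.0, eta * ratio(s, ds))
    return alpha_p, alpha_d
```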

In line 9, the quantity \(\mu _\mathrm {af}\) is computed by

$$\begin{aligned} \mu _\mathrm {af} = (\varvec{x}^{(k)} + \alpha _\mathrm {p}\varDelta \varvec{x}_\mathrm {af})^{\mathsf {T}} (\varvec{s}^{(k)} + \alpha _\mathrm {d}\varDelta \varvec{s}_\mathrm {af})/n. \end{aligned}$$

In the same line, the parameter \(\sigma \) is chosen as \(\sigma = \min { (0.208, (\mu _\mathrm {af}/\mu ^{(k)})^2)}\) in the early phase of the interior-point iterations. The value 0.208 and the range [0.9, 1) for \(\eta \) are adopted from the LIPSOL package [74]. In the late phase of the interior-point iterations, \(\sigma \) is chosen as approximately 10 times the error measure \(\varGamma ^{(k)}\), which is defined as

$$\begin{aligned} \quad \varGamma ^{(k)} := \max \left\{ \mu ^{(k)}, \frac{\Vert \varvec{b}-A\varvec{x}^{(k)}\Vert _2}{\max \left\{ \Vert \varvec{b}\Vert _2,1\right\} }, \frac{\Vert \varvec{c}-\varvec{s}^{(k)}-A^{\mathsf {T}}\varvec{y}^{(k)}\Vert _2}{\max \left\{ \Vert \varvec{c}\Vert _2,1\right\} }\right\} . \end{aligned}$$
(10)

Here, the early and late phases are distinguished by whether \(\varGamma ^{(k)}\) is greater than or less than \(10^{-3}\).
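For concreteness, the error measure (10) can be evaluated as in the following sketch (NumPy, dense data, illustration only).

```python
import numpy as np

def error_measure(A, b, c, x, y, s, mu):
    """Error measure Gamma^{(k)} of (10): the maximum of the duality measure and
    the scaled primal and dual residual norms."""
    primal = np.linalg.norm(b - A @ x) / max(np.linalg.norm(b), 1.0)
    dual = np.linalg.norm(c - s - A.T @ y) / max(np.linalg.norm(c), 1.0)
    return max(mu, primal, dual)
```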

In line 13, we first compute trial step lengths \(\alpha _\mathrm {p}, \alpha _\mathrm {d}\) using equations (9) with \((\varDelta \varvec{x}, \varDelta \varvec{s}) = (\varDelta \varvec{x}^{(k)}, \varDelta \varvec{s}^{(k)})\). Then, we gradually reduce \(\alpha _\mathrm {p}, \alpha _\mathrm {d}\) to find the largest step lengths that can ensure the centrality of the updated iterates, i.e., to find the maximum \({\hat{\alpha }}_\mathrm {p}, \; {\hat{\alpha }}_\mathrm {d}\) that satisfy

$$\begin{aligned} \min _{i}( x_i + {\hat{\alpha }}_\mathrm {p} \varDelta x_i)( s_i + {\hat{\alpha }}_\mathrm {d} \varDelta s_i) \ge \phi (\varvec{x} + {\hat{\alpha }}_\mathrm {p} \varDelta \varvec{x})^{\mathsf {T}}(\varvec{s} + {\hat{\alpha }}_\mathrm {d} \varDelta \varvec{s})/n, \end{aligned}$$

where \(\phi \) is typically chosen as \(10^{-5}\).
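The backtracking described above can be sketched as follows; the shrinking factor and the lower bound on the step lengths are illustrative assumptions, since the text only states that \(\alpha _\mathrm {p}, \alpha _\mathrm {d}\) are reduced gradually.

```python
import numpy as np

def centered_step_lengths(x, s, dx, ds, alpha_p, alpha_d,
                          phi=1e-5, shrink=0.95, min_alpha=1e-8):
    """Reduce the trial step lengths until the centrality condition above holds."""
    while alpha_p > min_alpha and alpha_d > min_alpha:
        xn = x + alpha_p * dx
        sn = s + alpha_d * ds
        if (xn * sn).min() >= phi * (xn @ sn) / x.size:
            break
        alpha_p *= shrink
        alpha_d *= shrink
    return alpha_p, alpha_d
```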

2.2 The normal equations in the interior-point algorithm

We consider modifying Algorithm 1 so that it is not necessary to update \(\varvec{y}^{(k)}\). Since we assume the existence of an optimal solution to problem (1), we have \(\varvec{b} \in {\mathcal {R}}(A)\). Let \(D:=S^{-1/2} X^{1/2}\) and \({\mathcal {A}} := AD\). Problem (6) with \(\varDelta \varvec{w} = {\mathcal {A}}^{\mathsf {T}}\varDelta \varvec{y}\) (the normal equations of the second kind) is equivalent to

$$\begin{aligned} \min \Vert \varDelta \varvec{w} \Vert _2 \quad \text{ subject } \text{ to } \quad {\mathcal {A}} \varDelta \varvec{w} = \varvec{f}, \end{aligned}$$
(11)

where \(\varvec{f}:=\varvec{r}_\mathrm {p} - AS^{-1}(\varvec{r}_\mathrm {c} - X\varvec{r}_\mathrm {d})\).

In the predictor stage, the problem (7) is equivalent to first solving (11) for \(\varDelta \varvec{w}_\mathrm {af}\) with \(\varDelta \varvec{w} = \varDelta \varvec{w}_\mathrm {af},\;\varvec{f} = \varvec{f}_\mathrm {af} := \varvec{b}+AS^{-1}X\varvec{r}_\mathrm {d}\), and then updating the remaining unknowns by

$$\begin{aligned} \varDelta \varvec{s}_\mathrm {af}&= \varvec{r}_\mathrm {d} - D^{-1} \varDelta \varvec{w}_\mathrm {af}, \end{aligned}$$
(12a)
$$\begin{aligned} \varDelta \varvec{x}_\mathrm {af}&= -D^2\varDelta \varvec{s}_\mathrm {af} - \varvec{x}. \end{aligned}$$
(12b)

In the corrector stage, the problem (8) is equivalent to first solving (11) for \(\varDelta \varvec{w}_\mathrm {cc}\) with \(\varDelta \varvec{w} = \varDelta \varvec{w}_\mathrm {cc},\;\varvec{f} = \varvec{f}_\mathrm {cc} := AS^{-1}\varDelta X_\mathrm {af}\varDelta S_\mathrm {af}\varvec{e} - \sigma \mu AS^{-1}\varvec{e}\), and then updating the remaining unknowns by

$$\begin{aligned} \varDelta \varvec{s}_\mathrm {cc}&= - D^{-1} \varDelta \varvec{w}_\mathrm {cc}, \end{aligned}$$
(13a)
$$\begin{aligned} \varDelta \varvec{x}_\mathrm {cc}&= -D^2\varDelta \varvec{s}_\mathrm {cc} -S^{-1}\varDelta X_\mathrm {af}\varDelta \varvec{s}_\mathrm {af} + \sigma \mu S^{-1}\varvec{e}. \end{aligned}$$
(13b)

By solving (11) for \(\varDelta \varvec{w}\) instead of solving (6) for \(\varDelta \varvec{y}\), we can compute \(\varDelta \varvec{s}_\mathrm {af}\), \(\varDelta \varvec{ x}_\mathrm {af}\), \(\varDelta \varvec{ s}_\mathrm {cc}\), and \(\varDelta \varvec{x}_\mathrm {cc}\), and can save 1MV in (12a) and another in (13a) when one predictor and one corrector step are performed per interior-point iteration. Here, MV denotes the computational cost required for one matrix-vector multiplication.

Remark 1

For solving an interior-point step from the condensed step equation (6) using a suitable Krylov subspace method, updating \((\varvec{x},\varvec{w},\varvec{s})\) rather than \((\varvec{x},\varvec{y},\varvec{s})\) can save 1MV in each interior-point iteration.

Note that in the predictor and corrector stages, problem (11) has the same matrix but different right-hand sides. We introduce methods for solving it in the next section.
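The recovery formulas (12) and (13) amount to a few elementwise operations once \(\varDelta \varvec{w}\) is available, as in the following sketch (NumPy, dense vectors, illustration only).

```python
import numpy as np

def recover_predictor(dw_af, x, s, rd):
    """Recover the affine directions from Delta w_af via (12a)-(12b);
    d = sqrt(x/s) is the diagonal of D = S^{-1/2} X^{1/2}."""
    d = np.sqrt(x / s)
    ds_af = rd - dw_af / d                  # (12a)
    dx_af = -d**2 * ds_af - x               # (12b)
    return dx_af, ds_af

def recover_corrector(dw_cc, x, s, dx_af, ds_af, sigma, mu):
    """Recover the corrector directions from Delta w_cc via (13a)-(13b)."""
    d = np.sqrt(x / s)
    ds_cc = -dw_cc / d                                          # (13a)
    dx_cc = -d**2 * ds_cc - dx_af * ds_af / s + sigma * mu / s  # (13b)
    return dx_cc, ds_cc
```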

3 Application of inner-iteration preconditioned Krylov subspace methods

In lines 4 and 10 of Algorithm 1, the linear system (11) needs to be solved, with its matrix becoming increasingly ill-conditioned as the interior-point iterations proceed. In this section, we focus on applying inner-iteration preconditioned Krylov subspace methods to (11) because they are advantageous in dealing with ill-conditioned sparse matrices. The methods to be discussed are the preconditioned CG and MINRES methods [38, 59] applied to the normal equations of the second kind ((P)CGNE and (P)MRNE, respectively) [13, 55], and the right-preconditioned generalized minimal residual method (AB-GMRES) [37, 55].

Consider solving the linear system \({\mathbf {A}} {\mathbf {x}} = {\mathbf {b}}\), where \({\mathbf {A}} \in {\mathbb {R}}^{n \times n}\). First, the conjugate gradient (CG) method [38] is an iterative method for such problems when \({\mathbf {A}}\) is a symmetric and positive (semi)definite matrix and \({\mathbf {b}} \in {\mathcal {R}}({\mathbf {A}})\). CG starts with an initial approximate solution \({\mathbf {x}}_0 \in {\mathbb {R}}^n\) and determines the kth iterate \({\mathbf {x}}_k \in {\mathbb {R}}^n\) by minimizing \(\Vert {\mathbf {x}}_k - {\mathbf {x}}_* \Vert ^2_{\mathbf {A}}\) over the space \({\mathbf {x}}_0 + {\mathcal {K}}_k ({\mathbf {A}}, {\mathbf {r}}_0)\), where \({\mathbf {r}}_0 = {\mathbf {b}} - {\mathbf {A}} {\mathbf {x}}_0\), \({\mathbf {x}}_*\) is a solution of \({\mathbf {A}} {\mathbf {x}} = {\mathbf {b}}\), and \(\Vert {\mathbf {x}}_k - {\mathbf {x}}_* \Vert ^2_{\mathbf {A}} := ({\mathbf {x}}_k - {\mathbf {x}}_*)^{\mathsf {T}}{\mathbf {A}}({\mathbf {x}}_k - {\mathbf {x}}_*)\).

MINRES [59] is another iterative method for solving such problems but only requires \({\mathbf {A}}\) to be symmetric. MINRES with \({\mathbf {x}}_0\) determines the kth iterate \({\mathbf {x}}_k\) by minimizing \(\Vert {\mathbf {b}} - {\mathbf {A}} {\mathbf {x}} \Vert _2\) over the same space as CG.

Third, the generalized minimal residual (GMRES) method [64] only requires \({\mathbf {A}}\) to be square. GMRES with \({\mathbf {x}}_0\) determines the kth iterate \({\mathbf {x}}_k\) by minimizing \(\Vert {\mathbf {b}} - {\mathbf {A}} {\mathbf {x}} \Vert _2\) over \({\mathbf {x}}_0 + {\mathcal {K}}_k ({\mathbf {A}}, {\mathbf {r}}_0)\).

3.1 Application of inner-iteration preconditioned CGNE and MRNE methods

We first introduce CGNE and MRNE. Let \({\mathbf {A}} = {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}}\), \({\mathbf {x}} = \varDelta \varvec{y}_\mathrm {af}\), \({\mathbf {b}} = \varvec{f}_\mathrm {af}\), and \(\varDelta \varvec{w}_\mathrm {af} = {\mathcal {A}}^{\mathsf {T}} \varDelta \varvec{y}_\mathrm {af}\) for the predictor stage, and similarly, let \({\mathbf {A}} = {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}}\), \({\mathbf {x}} = \varDelta \varvec{y}_\mathrm {cc}\), \({\mathbf {b}} = \varvec{f}_\mathrm {cc}\), and \(\varDelta \varvec{w}_\mathrm {cc} = {\mathcal {A}}^{\mathsf {T}} \varDelta \varvec{y}_\mathrm {cc}\) for the corrector stage. CG and MINRES applied to the system \({\mathbf {A}} {\mathbf {x}} = {\mathbf {b}}\) are called CGNE and MRNE, respectively. With these settings, let the initial solution satisfy \(\varDelta \varvec{w}_0 \in {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) in both stages, and denote the initial residual by \(\varvec{g}_0:=\varvec{f}-{\mathcal {A}}\varDelta \varvec{w}_0\). CGNE and MRNE can solve (11) without forming \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}}\) explicitly.

Concretely, CGNE gives the kth iterate \(\varDelta \varvec{w}_k\) such that \(\Vert \varDelta \varvec{w}_k - \varDelta \varvec{w}_* \Vert _2 = \min _{\varDelta \varvec{w} \in \varDelta \varvec{w}_0 + {\mathcal {K}}_k ({\mathcal {A}}^{\mathsf {T}} {\mathcal {A}}, {\mathcal {A}}^{\mathsf {T}} \varvec{g}_0)} \Vert \varDelta \varvec{w} - \varDelta \varvec{w}_* \Vert _2\), where \(\varDelta \varvec{w}_*\) is the minimum-norm solution of \({\mathcal {A}} \varDelta \varvec{w} = \varvec{f}\) for \(\varDelta \varvec{w}_0 \in {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) and \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\). MRNE gives the kth iterate \(\varDelta \varvec{w}_k\) such that \(\Vert \varvec{f} - {\mathcal {A}} \varDelta \varvec{w}_k \Vert _2 = \min _{\varDelta \varvec{w} \in \varDelta \varvec{w}_0 + {\mathcal {K}}_k ({\mathcal {A}}^{\mathsf {T}} {\mathcal {A}}, {\mathcal {A}}^{\mathsf {T}} \varvec{g}_0)} \Vert \varvec{f} - {\mathcal {A}} \varDelta \varvec{w} \Vert _2\).

We use inner-iteration preconditioning for the CGNE and MRNE methods. The following is a brief summary of the part of [55] where the inner-outer iteration method is analyzed. We give the expressions for the inner-iteration preconditioning and preconditioned matrices to state the conditions under which the former is SPD. Let M be a symmetric nonsingular splitting matrix of \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}}\) such that \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} = M - N\). Denote the inner-iteration matrix by \(H = M^{-1} N\). The inner-iteration preconditioning and preconditioned matrices are \(C^{\langle \ell \rangle } = \sum _{i = 0}^{\ell - 1} H^i M^{-1}\) and \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle } = M\sum _{i=0}^{\ell -1} (I- H) H^iM^{-1} = M(I - H^\ell )M^{-1}\), respectively. If \(C^{\langle \ell \rangle }\) is nonsingular, then \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle } \varvec{u} = \varvec{f}\), \(\varvec{z} = C^{\langle \ell \rangle } \varvec{u}\) is equivalent to \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} \varvec{z} = \varvec{f}\) for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\). For \(\ell \) odd, \(C^{\langle \ell \rangle }\) is symmetric and positive definite (SPD) if and only if the inner-iteration splitting matrix M is SPD; for \(\ell \) even, \(C^{\langle \ell \rangle }\) is SPD if and only if \(M + N\) is SPD [52, 53, Theorem 2.8]. We give Algorithms 2, 3 for CGNE and MRNE preconditioned by inner iterations [55, Algorithms E.3, E.4].

[Algorithms 2 and 3 (CGNE and MRNE preconditioned by inner iterations) are presented as figures in the original.]
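The relations above can be checked numerically. The following sketch (NumPy, dense matrices, with a Jacobi-type splitting chosen only for illustration) verifies that \(\ell \) steps of the stationary inner iteration started from zero apply \(C^{\langle \ell \rangle }\), and that \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle } = M(I - H^\ell )M^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
AAT = A @ A.T
M = np.diag(np.diag(AAT))            # Jacobi-type symmetric splitting, illustration only
H = np.linalg.solve(M, M - AAT)      # inner-iteration matrix H = M^{-1} N
ell, f = 3, rng.standard_normal(5)

# ell inner iterations z <- z + M^{-1} (f - A A^T z), started from z = 0 ...
z = np.zeros(5)
for _ in range(ell):
    z = z + np.linalg.solve(M, f - AAT @ z)

# ... apply C^{<ell>} = sum_{i<ell} H^i M^{-1}, and A A^T C^{<ell>} = M (I - H^ell) M^{-1}
C = sum(np.linalg.matrix_power(H, i) for i in range(ell)) @ np.linalg.inv(M)
assert np.allclose(z, C @ f)
assert np.allclose(AAT @ C, M @ (np.eye(5) - np.linalg.matrix_power(H, ell)) @ np.linalg.inv(M))
```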

3.2 Application of inner-iteration preconditioned AB-GMRES method

Next, we introduce AB-GMRES. GMRES can solve a square linear system transformed from the rectangular system \({\mathcal {A}} \varDelta \varvec{w}_\mathrm {af} = \varvec{f}_\mathrm {af}\) in the predictor stage and \({\mathcal {A}} \varDelta \varvec{w}_\mathrm {cc} = \varvec{f}_\mathrm {cc}\) in the corrector stage by using a rectangular right-preconditioning matrix that does not necessarily have to be \({\mathcal {A}}^{\mathsf {T}}\). Let \({\mathcal {B}} \in {\mathbb {R}}^{n \times m}\) be a preconditioning matrix for \({\mathcal {A}}\). Then, AB-GMRES corresponds to GMRES [64] applied to

$$\begin{aligned} {\mathcal {A}} {\mathcal {B}} \varvec{z} = \varvec{f}, \quad \varDelta \varvec{w} = {\mathcal {B}} \varvec{z}, \end{aligned}$$

whose solution \(\varDelta \varvec{w} = {\mathcal {B}} \varvec{z}\) is the minimum-norm solution of problem (11) for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\) if \({\mathcal {R}}({\mathcal {B}}) = {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) [55, Theorem 5.2], where \(\varDelta \varvec{w} = \varDelta \varvec{w}_\mathrm {af}\) or \(\varDelta \varvec{w}_\mathrm {cc}\) and \(\varvec{f} = \varvec{f}_\mathrm {af}\) or \(\varvec{f}_\mathrm {cc}\), respectively. AB-GMRES gives the kth iterate \(\varDelta \varvec{w}_k = {\mathcal {B}} \varvec{z}_k\) such that \(\varvec{z}_k=\arg \min _{\varvec{z} \in \varvec{z}_0 + {\mathcal {K}}_k ({\mathcal {A}} {\mathcal {B}}, \varvec{g}_0)} \Vert \varvec{f} - {\mathcal {A}} {\mathcal {B}} \varvec{z} \Vert _2\), where \(\varvec{z}_0\) is the initial iterate and \(\varvec{g}_0 = \varvec{f} - {\mathcal {A}} {\mathcal {B}} \varvec{z}_0\).

Specifically, we apply AB-GMRES preconditioned by inner iterations [54, 55] to (11). This method was shown to outperform previous methods on ill-conditioned and rank-deficient problems. We give expressions for the inner-iteration preconditioning and preconditioned matrices. Let M be a nonsingular splitting matrix such that \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} = M - N\). Denote the inner-iteration matrix by \(H = M^{-1} N\). With \(C^{\langle \ell \rangle } = \sum _{i = 0}^{\ell - 1} H^i M^{-1}\), the inner-iteration preconditioning and preconditioned matrices are \({\mathcal {B}}^{\langle \ell \rangle } = {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle }\) and \({\mathcal {A}} {\mathcal {B}}^{\langle \ell \rangle } = \sum _{i=0}^{\ell -1} (I- H) H^i = M(I - H^\ell )M^{-1}\), respectively. If the inner-iteration matrix H is semiconvergent, i.e., \(\lim _{i \rightarrow \infty } H^i\) exists, then AB-GMRES preconditioned by the inner-iterations determines the minimum-norm solution of \({\mathcal {A}} \varDelta \varvec{w} = \varvec{f}\) without breakdown for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\) and for all \(\varDelta \varvec{w}_0 \in {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) [55, Theorem 5.5]. The inner-iteration preconditioning matrix \({\mathcal {B}}^{\langle \ell \rangle }\) works on \({\mathcal {A}}\) in AB-GMRES as in Algorithm 4 [55, Algorithm 5.1].

[Algorithm 4 (AB-GMRES preconditioned by inner iterations) is presented as a figure in the original.]

Here, \(\varvec{v}_1, \varvec{v}_2, \dots , \varvec{v}_k\) are orthonormal, \(\varvec{e}_1\) is the first column of the identity matrix, and \({\bar{H}}_k = \lbrace h_{i, j} \rbrace \in {\mathbb {R}}^{(k+1) \times k}\).
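For orientation, the following is a rough sketch of the outer AB-GMRES loop in the spirit of Algorithm 4, written in Python with NumPy and without restarts. The callables matvec_A and apply_B are hypothetical stand-ins for a sparse matrix-vector product with \({\mathcal {A}}\) and for the inner-iteration preconditioner \({\mathcal {B}}^{\langle \ell \rangle }\) (e.g., NE-SOR); the paper's actual implementation is in C/MEX and is organized differently.

```python
import numpy as np

def ab_gmres(matvec_A, apply_B, f, maxit=50, tol=1e-10):
    """Sketch of right-preconditioned (AB-)GMRES for A B z = f, Delta w = B z.

    matvec_A(x): returns A @ x for an n-vector x (A is m x n).
    apply_B(v):  returns B @ v for an m-vector v, e.g. one call to an
                 NE-SOR inner iteration; it must act as a fixed linear operator.
    """
    m = f.shape[0]
    beta = np.linalg.norm(f)                    # initial iterate z_0 = 0, so g_0 = f
    V = np.zeros((m, maxit + 1))
    H = np.zeros((maxit + 1, maxit))
    V[:, 0] = f / beta
    for k in range(maxit):
        w = matvec_A(apply_B(V[:, k]))          # w = A B v_k
        for i in range(k + 1):                  # modified Gram-Schmidt
            H[i, k] = V[:, i] @ w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2)
        e1[0] = beta
        # least squares problem min || beta e_1 - Hbar_k y ||_2
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        if np.linalg.norm(e1 - H[:k + 2, :k + 1] @ y) <= tol * beta or H[k + 1, k] == 0.0:
            break
        V[:, k + 1] = w / H[k + 1, k]
    z = V[:, :k + 1] @ y                        # z_k = V_k y
    return apply_B(z)                           # Delta w_k = B z_k
```

Since the inner iteration is started from zero at every call, apply_B acts as a fixed linear operator, so returning B(V_k y) at the end is equivalent to accumulating the vectors B v_i during the iteration.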

Note that the left-preconditioned generalized minimal residual method (BA-GMRES) [37, 54, 55] can be applied to solve the corrector stage problem, which can be written as the normal equations of the first kind

$$\begin{aligned} {\mathcal {A}}{\mathcal {A}}^{\mathsf {T}}\varDelta \varvec{y}_\mathrm {cc} = {\mathcal {A}}(SX)^{-1/2}\left( \varDelta X_\mathrm {af}\varDelta S_\mathrm {af}\varvec{e} - \sigma \mu \varvec{e}\right) , \end{aligned}$$

or equivalently

$$\begin{aligned} \min _{\varDelta \varvec{y}_\mathrm {cc}} \Vert {\mathcal {A}}^{\mathsf {T}} \varDelta \varvec{y}_\mathrm {cc} -(SX)^{-1/2}\left( \varDelta X_\mathrm {af}\varDelta S_\mathrm {af}\varvec{e} - \sigma \mu \varvec{e}\right) \Vert _2. \end{aligned}$$
(14)

In fact, this formulation was adopted in [32] and solved by the CGLS method preconditioned by a partial Cholesky decomposition that works in m-dimensional space. BA-GMRES also works in m-dimensional space.

The advantage of the inner-iteration preconditioning methods is that we can avoid explicitly computing and storing the preconditioning matrices for \({\mathcal {A}}\) in (11). We present efficient algorithms for specific inner iterations in the next section.

3.3 SSOR inner iterations for preconditioning the CGNE and MRNE methods

The inner-iteration preconditioned CGNE and MRNE methods require a symmetric preconditioning matrix. This is achieved by SSOR inner-iteration preconditioning, which works on the normal equations of the second kind \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} \varvec{z} = \varvec{g}\), \(\varvec{u} = {\mathcal {A}}^{\mathsf {T}} \varvec{z}\), and whose preconditioning matrix \(C^{\langle \ell \rangle }\) is SPD for \(\ell \) odd and \(\omega \in (0, 2)\) [52, 53, Theorem 2.8]. This method forms a symmetric splitting matrix by combining the forward updates \(i = 1, 2, \dots , m\) in lines 3–6 of Algorithm 6 with the reverse updates \(i = m, m-1, \dots , 1\), and can be efficiently implemented as the NE-SSOR method [63, 55, Algorithm D.8]. See [8], where SSOR preconditioning for CGNE with \(\ell = 1\) is proposed. Let \(\varvec{\alpha }_i^{\mathsf {T}}\) be the ith row vector of \({\mathcal {A}}\). Algorithm 5 shows the NE-SSOR method.

[Algorithm 5 (NE-SSOR method) is presented as a figure in the original.]

When Algorithm 5 is applied to lines 2 and 6 of Algorithm 2 and lines 2 and 6 of Algorithm 3, the normal equations of the second kind are solved approximately.
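As an illustration of the row-action form behind Algorithms 5 and 6, the following NumPy sketch performs \(\ell \) NE-SSOR inner iterations on \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} \varvec{z} = \varvec{g}\) while maintaining \(\varvec{u} = {\mathcal {A}}^{\mathsf {T}} \varvec{z}\), so that \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}}\) is never formed. It uses a dense matrix for simplicity, whereas the actual implementation works on the rows of a sparse, row-wise stored matrix; the forward sweep alone corresponds to the NE-SOR method of Sect. 3.4.

```python
import numpy as np

def ne_ssor(A, g, omega=1.0, ell=2):
    """Sketch of ell NE-SSOR inner iterations for A A^T z = g, u = A^T z.

    Each inner iteration is one forward sweep (i = 1..m) followed by one
    reverse sweep (i = m..1) over the rows alpha_i of A; omega in (0, 2).
    Dropping the reverse sweep gives the NE-SOR update used in Sect. 3.4.
    """
    m, n = A.shape
    row_norm2 = np.einsum('ij,ij->i', A, A)     # ||alpha_i||_2^2
    z = np.zeros(m)
    u = np.zeros(n)
    for _ in range(ell):
        for i in list(range(m)) + list(range(m - 1, -1, -1)):
            delta = omega * (g[i] - A[i] @ u) / row_norm2[i]
            z[i] += delta
            u += delta * A[i]
    return z, u       # z approximates the solution of A A^T z = g; u = A^T z
```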

3.4 SOR inner iterations for preconditioning the AB-GMRES method

Next, we introduce the SOR method applied to the normal equations of the second kind \({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} \varvec{p} = \varvec{g}\), \(\varvec{z} = {\mathcal {A}}^{\mathsf {T}} \varvec{p}\) with \(\varvec{g} = \varvec{v}_k\) or \(\varvec{q}_k\) as used in Algorithm 4. If the relaxation parameter \(\omega \) satisfies \(\omega \in (0, 2)\), then the iteration matrix H of this method is semiconvergent, i.e., \(\lim _{i \rightarrow \infty } H^i\) exists [21]. An efficient algorithm for this method is called NE-SOR and is given as follows [63, 55, Algorithm D.7].

[Algorithm 6 (NE-SOR method) is presented as a figure in the original.]

When Algorithm 6 is applied to lines 4 and 12 of Algorithm 4, the normal equations of the second kind are solved approximately.

Since the rows of A are required in the NE-(S)SOR iterations, it would be more efficient if A is stored row-wise.

3.5 Row-scaling of \({\mathcal {A}}\)

Let \({\mathcal {D}}\) be a diagonal matrix whose diagonal elements are positive. Then, problem (11) is equivalent to

$$\begin{aligned} \min \Vert \varDelta \varvec{w}\Vert _2 \quad \text{ subject } \text{ to } \quad {\mathcal {D}}^{-1}{\mathcal {A}}\varDelta \varvec{w} = {\mathcal {D}}^{-1}\varvec{f}. \end{aligned}$$
(15)

Denote \(\hat{{\mathcal {A}}}:={\mathcal {D}}^{-1}{\mathcal {A}}\) and \({\hat{f}}:={\mathcal {D}}^{-1}\varvec{f}\). Then, the scaled problem (15) is

$$\begin{aligned} \min \Vert \varDelta \varvec{w}\Vert _2 \quad \text{ subject } \text{ to } \quad \hat{{\mathcal {A}}}\varDelta \varvec{w} = \hat{\varvec{f}}. \end{aligned}$$
(16)

If \(\hat{{\mathcal {B}}}\in {\mathbb {R}}^{n\times m}\) satisfies \({\mathcal {R}}(\hat{{\mathcal {B}}}) = {\mathcal {R}}(\hat{{\mathcal {A}}}^{\mathsf {T}})\), then (16) is equivalent to

$$\begin{aligned} \hat{{\mathcal {A}}}\hat{{\mathcal {B}}}\hat{\varvec{z}} = \hat{\varvec{f}}, \qquad \varDelta \varvec{w} = \hat{{\mathcal {B}}}\hat{\varvec{z}} \end{aligned}$$
(17)

for all \(\hat{\varvec{f}}\in {\mathcal {R}}(\hat{{\mathcal {A}}})\). The methods discussed earlier can be applied to (17). In the NE-(S)SOR inner iterations, one has to compute \(\Vert \hat{\varvec{\alpha }}_i\Vert _2\), the norm of the ith row of \(\hat{{\mathcal {A}}}\). However, this can be omitted if the ith diagonal element of \({\mathcal {D}}\) is chosen as the norm of the ith row of \({\mathcal {A}}\), that is, \({\mathcal {D}}(i,i):=\Vert \varvec{\alpha }_i\Vert _2, \; i = 1, \dots , m\). With this choice, the matrix \(\hat{{\mathcal {A}}}\) has unit row norm \(\Vert \hat{\varvec{\alpha }}_i\Vert _2 = 1, \; i = 1, \dots , m\). Hence, we do not have to compute the norms \(\Vert \hat{\varvec{\alpha }}_i\Vert _2\) inside the NE-(S)SOR inner iterations if we compute the norms \(\Vert \varvec{\alpha }_i\Vert _2\) for the construction of the scaling matrix \({\mathcal {D}}\). The row-scaling scheme does not incur extra CPU time. We observe in the numerical results that this scheme improves the convergence of the Krylov subspace methods.
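A minimal sketch of this row scaling (dense NumPy matrix, illustration only; the paper's matrices are sparse and stored row-wise):

```python
import numpy as np

def row_scale(A_cal, f):
    """Row scaling of Sect. 3.5: D(i,i) = ||alpha_i||_2, so that D^{-1} A_cal has
    unit row norms and the NE-(S)SOR sweeps need no further norm computations."""
    d = np.linalg.norm(A_cal, axis=1)     # row norms; assumed nonzero
    return A_cal / d[:, None], f / d      # scaled matrix and right-hand side
```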

CGNE and MRNE preconditioned by inner iterations applied to a scaled linear system \({\mathcal {D}}^{-1} {\mathcal {A}} \varDelta \varvec{w} = {\mathcal {D}}^{-1} \varvec{f}\) are equivalent to CG and MINRES applied to \({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle } {\mathcal {D}} \varvec{v} = \varvec{f}\), \(\varDelta \varvec{w} = {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle } {\mathcal {D}} \varvec{v}\), respectively, and hence determine the minimum-norm solution of \({\mathcal {A}} \varDelta \varvec{w} = \varvec{f}\) for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\) and for all \(\varDelta \varvec{w}_0 \in {\mathbb {R}}^n\) if \(C^{\langle \ell \rangle }\) is SPD. Now we give conditions under which AB-GMRES preconditioned by inner iterations applied to a scaled linear system \({\mathcal {D}}^{-1} {\mathcal {A}} \varDelta \varvec{w} = {\mathcal {D}}^{-1} \varvec{f}\) determines the minimum-norm solution of the unscaled one \({\mathcal {A}} \varDelta \varvec{w} = \varvec{f}\).

Lemma 1

If \({\mathcal {R}}({\mathcal {B}}) = {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) and \({\mathcal {D}} \in {\mathbb {R}}^{m \times m}\) is nonsingular, then AB-GMRES applied to \({\mathcal {D}}^{-1} {\mathcal {A}} \varDelta \varvec{w} = {\mathcal {D}}^{-1} \varvec{f}\) determines the solution of \(\min \Vert \varDelta \varvec{w} \Vert _2\), subject to \({\mathcal {A}} \varDelta \varvec{w} = \varvec{f}\) without breakdown for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\) and for all \(\varDelta \varvec{w}_0 \in {\mathbb {R}}^n\) if and only if \({\mathcal {N}}({\mathcal {B}}) \cap {\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}}) = \lbrace \varvec{0} \rbrace \).

Proof

Since \({\mathcal {R}}({\mathcal {B}}) = {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) gives \({\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) = {\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}}) = {\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}})\), the equality \(\min _{\varvec{u} \in {\mathbb {R}}^m} \Vert {\mathcal {D}}^{-1} (\varvec{f} - {\mathcal {A}} {\mathcal {B}} \varvec{u}) \Vert _2 = \min _{\varDelta \varvec{w} \in {\mathbb {R}}^n} \Vert {\mathcal {D}}^{-1} (\varvec{f} - {\mathcal {A}} \varDelta \varvec{w}) \Vert _2\) holds for all \(\varvec{f} \in {\mathbb {R}}^m\) [37, Theorem 3.1]. AB-GMRES applied to \({\mathcal {D}}^{-1} {\mathcal {A}} \varDelta \varvec{w} = {\mathcal {D}}^{-1} \varvec{f}\) determines the kth iterate \(\varDelta \varvec{w}_k\) by minimizing \(\Vert {\mathcal {D}}^{-1} (\varvec{f} - {\mathcal {A}} \varDelta \varvec{w}) \Vert _2\) over the space \(\varDelta \varvec{w}_0 + {\mathcal {K}}_k ({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}, {\mathcal {D}}^{-1} \varvec{g}_0)\), and thus determines the solution of \(\min \Vert \varDelta \varvec{w} \Vert _2\), subject to \({\mathcal {D}}^{-1} {\mathcal {A}} \varDelta \varvec{w} = {\mathcal {D}}^{-1} \varvec{f}\) without breakdown for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\) and for all \(\varDelta \varvec{w}_0 \in {\mathbb {R}}^n\) if and only if \({\mathcal {N}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) \cap {\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) = \lbrace \varvec{0} \rbrace \) [55, Theorem 5.2], which reduces to \({\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}}) \cap {\mathcal {N}}({\mathcal {B}}) = \lbrace \varvec{0} \rbrace \) from \({\mathcal {N}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) = {\mathcal {R}}({\mathcal {B}}^{\mathsf {T}} {\mathcal {A}}^{\mathsf {T}} {\mathcal {D}}^{-{\mathsf {T}}})^\perp = {\mathcal {R}}({\mathcal {B}}^{\mathsf {T}} {\mathcal {A}}^{\mathsf {T}})^\perp = {\mathcal {R}}({\mathcal {B}}^{\mathsf {T}} {\mathcal {B}})^\perp = {\mathcal {R}}({\mathcal {B}}^{\mathsf {T}})^\perp = {\mathcal {N}}({\mathcal {B}})\). \(\square \)

Theorem 1

If \({\mathcal {D}} \in {\mathbb {R}}^{m \times m}\) is nonsingular and the inner-iteration matrix is semiconvergent, then AB-GMRES preconditioned by the inner iterations applied to \({\mathcal {D}}^{-1} {\mathcal {A}} \varDelta \varvec{w} = {\mathcal {D}}^{-1} \varvec{f}\) determines the solution of \(\min \Vert \varDelta \varvec{w} \Vert _2\), subject to \({\mathcal {A}} \varDelta \varvec{w} = \varvec{f}\) without breakdown for all \(\varvec{f} \in {\mathcal {R}}({\mathcal {A}})\) and for all \(\varDelta \varvec{w}_0 \in {\mathbb {R}}^n\).

Proof

From Lemma 1, it is sufficient to show that \({\mathcal {R}}({\mathcal {B}}) ={\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) and \({\mathcal {N}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) \cap {\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) = \lbrace \varvec{0} \rbrace \). Since \({\mathcal {D}}^{-1} M {\mathcal {D}}^{-{\mathsf {T}}} = {\mathcal {D}}^{-1} ({\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} - N) {\mathcal {D}}^{-{\mathsf {T}}}\) is the splitting matrix of \({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} {\mathcal {D}}^{-{\mathsf {T}}}\) for the inner iterations, the inner-iteration matrix is \({\mathcal {D}}^{\mathsf {T}} H {\mathcal {D}}^{-{\mathsf {T}}}\). Hence, the inner-iteration preconditioning matrix \({\mathcal {B}} = {\mathcal {A}}^{\mathsf {T}} C^{\langle \ell \rangle } {\mathcal {D}}\) satisfies \({\mathcal {R}}({\mathcal {B}}) = {\mathcal {R}}({\mathcal {A}}^{\mathsf {T}})\) [55, Lemma 4.5]. On the other hand, \({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}} = {\mathcal {D}}^{-1} M (I - H^{\ell }) ({\mathcal {D}}^{-1} M)^{-1}\) satisfies \({\mathcal {N}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) \cap {\mathcal {R}}({\mathcal {D}}^{-1} {\mathcal {A}} {\mathcal {B}}) = \lbrace \varvec{0} \rbrace \) [55, Lemmas 4.3, 4.4]. \(\square \)

4 Numerical experiments

In this section, we compare the performance of the interior-point method based on the iterative solvers with the standard interior-point programs. We also developed an efficient direct solver coded in C to compare with the iterative solvers. For the sake of completeness, we briefly describe our direct solver first.

4.1 Direct solver for the normal equations

To deal with rank-deficiency, we used a strategy similar to the Cholesky-Infinity modification scheme introduced in the LIPSOL solver [74]. However, instead of penalizing the pivot elements that are close to zero, we removed them and solved the reduced system. We implemented this modification via an LDLT decomposition. We used the Matlab built-in function chol to detect whether the matrix is symmetric positive definite. We used ldlchol from the CSparse package version 3.1.0 [19] when the matrix was symmetric positive definite, and we turned to the Matlab built-in solver ldl, which uses MA57 [23], for the semidefinite cases.

We explain the implementation by an example where \({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}} \in {\mathbb {R}}^{3\times 3}\). For matrix \({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}}\), LDLT decomposition gives

$$\begin{aligned} {\mathcal {A}}{\mathcal {A}}^{\mathsf {T}} = L G L^{\mathsf {T}} =\begin{bmatrix} 1&0&0\\ l_{21}&1&0 \\ l_{31}&l_{32}&1 \end{bmatrix} \begin{bmatrix} g_1&0&0 \\0&g_2&0\\0&0&g_3 \end{bmatrix} \begin{bmatrix} 1&l_{21}&l_{31}\\0&1&l_{32} \\ 0&0&1\end{bmatrix}. \end{aligned}$$

Correspondingly, we partition \(\varDelta \varvec{y}= [ \varDelta y_1,\varDelta y_2,\varDelta y_3 ]^{\mathsf {T}}\) and \(\varvec{f}=[ f_1, f_2, f_3]^{\mathsf {T}}\). Assuming that the diagonal element \(g_2\) is close to zero, we let \({\tilde{L}}:= \bigl [{\begin{matrix} 1 &{} 0\\ l_{31} &{} 1 \end{matrix}}\bigr ]\), \({\tilde{G}}:= \bigl [{\begin{matrix} g_1&{} 0\\ 0 &{} g_3 \end{matrix}}\bigr ]\), \(\tilde{\varvec{f}} = [f_1, f_3]^{\mathsf {T}}\), \(\tilde{\varDelta \varvec{y}} = [\varDelta y_1, \varDelta y_3]^{\mathsf {T}}\), and solve

$$\begin{aligned} {\tilde{L}}{\tilde{G}}^{1/2}\left( ({\tilde{L}}{\tilde{G}}^{1/2})^{\mathsf {T}} \tilde{\varDelta \varvec{y}}\right) = \tilde{\varvec{f}}, \end{aligned}$$

using forward and backward substitutions. The solution is then given by \(\varDelta \varvec{y} = [\varDelta y_1, 0, \varDelta y_3]^{\mathsf {T}}\).
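The dropping strategy of this example generalizes directly. The sketch below (NumPy, dense, unpivoted, assuming \({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}}\) is positive semidefinite) illustrates the idea: factorize, skip near-zero pivots, and solve the reduced triangular systems. The actual implementation uses chol, ldlchol, and ldl in Matlab and exploits sparsity.

```python
import numpy as np

def modified_ldlt_solve(K, f, droptol=1e-16):
    """Illustrative modified LDL^T solve of K dy = f with near-zero pivots dropped.

    K is assumed symmetric positive semidefinite (e.g. K = A A^T).
    Rows/columns whose pivot magnitude is <= droptol are removed, and the
    reduced system is solved with L~ G~^{1/2} as in the 3 x 3 example above."""
    m = K.shape[0]
    L = np.eye(m)
    g = np.zeros(m)
    for j in range(m):                                   # unpivoted LDL^T
        g[j] = K[j, j] - L[j, :j] ** 2 @ g[:j]
        if abs(g[j]) > droptol:
            L[j + 1:, j] = (K[j + 1:, j] - L[j + 1:, :j] @ (g[:j] * L[j, :j])) / g[j]
    keep = np.abs(g) > droptol                           # indices of retained pivots
    Lk = L[np.ix_(keep, keep)] * np.sqrt(g[keep])        # L~ G~^{1/2}
    dy = np.zeros(m)
    dy[keep] = np.linalg.solve(Lk.T, np.linalg.solve(Lk, f[keep]))
    return dy                                            # dropped components set to zero
```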

4.2 Implementation specifications

In this section, we describe our numerical experiments.

The initial solution for the interior-point method was set using the method described in the LIPSOL solver [74]. The initial solutions for the Krylov subspace iterations and the inner iterations were set to zero.

We set the maximum number of interior-point iterations to 99 and the stopping criterion for the error measure as

$$\begin{aligned} \varGamma ^{(k)}\le \epsilon _\mathrm {out} = 10^{-8}, \end{aligned}$$
(18)

where \(\varGamma ^{(k)}\) is defined by (10).

For the iterative solution of the linear system (11), we set the maximum number of iterations for CGNE, MRNE, and AB-GMRES to m, and relaxed it to 40,000 for CGNE and MRNE on some difficult problems. We set the stopping criterion for the scaled residual as

$$\begin{aligned} \Vert \hat{\varvec{f}}-\hat{{\mathcal {A}}}\varDelta \varvec{w}^{(k)}\Vert _2 \le \epsilon _\mathrm {in}\Vert \hat{\varvec{f}}\Vert _2, \end{aligned}$$

where \(\epsilon _\mathrm {in} \) is initially \(10^{-6}\) and is kept in the range \([10^{-14}, 10^{-4}]\) during the process. We adjusted \(\epsilon _\mathrm {in} \) according to the progress of the interior-point iterations. We truncated the iterative solving prematurely in the early interior-point iterations, and pursued a more precise direction as the LP solution was approached. The progress was measured by the error measure \(\varGamma ^{(k)}\). Concretely, we adjusted \(\epsilon _\mathrm {in} \) as

$$\begin{aligned} \epsilon _\mathrm {in}^{(k)} = {\left\{ \begin{array}{ll} \epsilon _\mathrm {in}^{(k-1)} \times 0.75 &{} \text{ if } \log _{10}{\varGamma ^{(k)}}\in (-3, 1],\\ \epsilon _\mathrm {in}^{(k-1)} \times 0.375 &{} \text{ if } \log _{10}{\varGamma ^{(k)}} \in (-\infty ,-3].\end{array}\right. } \end{aligned}$$

For steps where the iterative solvers failed to converge within the maximum number of iterations, we adopted the iterate with the minimum residual norm and slightly increased the value of \(\epsilon _\mathrm {in}\) by multiplying it by 1.5; the increased value was used in the next interior-point step.

Note that preliminary experiments were conducted with the tolerance fixed for all the problems. However, further experiments showed that adjusting the parameter \(\epsilon _\mathrm {in}\) according to the progress towards an optimal solution worked better. This is another advantage of iterative solvers over direct solvers.
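One way to combine the rules above is sketched below; treating values of \(\log _{10}\varGamma ^{(k)}\) above 1 like the early phase is an assumption, since that case is not spelled out in the text.

```python
import numpy as np

def update_inner_tolerance(eps_in, gamma, solver_converged=True):
    """Adapt eps_in for the next interior-point step (Sect. 4.2 rules).

    Tighten eps_in as the error measure gamma = Gamma^{(k)} decreases,
    relax it by 1.5 after an inner-solver failure, and clip it to [1e-14, 1e-4]."""
    if not solver_converged:
        eps_in *= 1.5
    elif np.log10(gamma) > -3:      # early phase (log10 Gamma in (-3, 1] in the text)
        eps_in *= 0.75
    else:                           # late phase (log10 Gamma <= -3)
        eps_in *= 0.375
    return min(max(eps_in, 1e-14), 1e-4)
```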

We adopted the implementation of AB-GMRES preconditioned by NE-SOR inner iterations [56] with the additional row-scaling scheme (Sect. 3.5). No restarts were used for the AB-GMRES method. The non-breakdown conditions discussed in Sects. 3.1 and 3.2 are satisfied.

For the direct solver, the tolerance for dropping pivot elements close to zero was \(10^{-16}\) for most of the problems; for some problems, this tolerance had to be increased to \(10^{-6}\) to overcome breakdown.

The experiment was conducted on a MacBook Pro with a 2.6 GHz Intel Core i5 processor with 8 GB of random-access memory, OS X El Capitan version 10.11.2. The interior-point method was coded in Matlab R2014b and the iterative solvers including AB-GMRES (NE-SOR), CGNE (NE-SSOR), and MRNE (NE-SSOR), were coded in C and compiled as Matlab Executable (MEX) files accelerated with Basic Linear Algebra Subprograms (BLAS).

We compared our implementation with PDCO version 2013 [65] and three solvers available in CVX [35, 36]: SDPT3 version 4.0 [68, 69], SeDuMi version 1.34 [66], and MOSEK version 7.1.0.12 [57], with the default interior-point stopping criterion (18). Note that SDPT3, SeDuMi, and PDCO are non-commercial public-domain solvers, whereas MOSEK is a commercial solver regarded as one of the state-of-the-art solvers. PDCO provides several choices of solver for the interior-point steps, among which we chose the direct (Cholesky) method and the LSMR method. Although MINRES is another iterative solver available in PDCO, its homepage [65] suggests that LSMR performs better in general; thus, we tested with LSMR. For the PDCO parameters, we chose to suppress scaling of the original problem. The other solvers were run through the CVX Matlab interface, and we recorded the CPU time reported in the screen output of each solver. However, it usually took longer for CVX to finish the whole process, and the larger the problem, the more apparent this extra CPU time became. For example, for problem ken_18, the CPU time in the screen output of SeDuMi was 765.3 s while the total processing time was 7615.2 s.

We tested on two classes of LP problems: 127 typical problems from the benchmark libraries and 13 problems arising from basis pursuit. The results are described in Sects. 4.3 and 4.4, respectively.

4.3 Typical LP problems: sparse and ill-conditioned problems

We tested 127 typical LP problems from the Netlib, Qaplib, and Mittelmann collections in [20]. Most of the problems have a sparse, full-rank constraint matrix A (the exceptions are bore3d and cycle). Problems with bounds \(\varvec{l}\le \varvec{x}\le \varvec{u},\;\varvec{l}\ne \varvec{0},\;\varvec{u}\ne \varvec{\infty }\) were transformed using the approach in LIPSOL [74].

The overall summary of the numerical experiments on the 127 typical problems is given in Table 1. The counts in column “Failed” include cases where a problem was solved only at a relaxed tolerance (phrased as “inaccurately solved” in CVX). Column “Expensive” refers to cases where the interior-point iterations exceeded the time limit of 20 hours.

Table 1 Overall performance of the solvers on 127 testing problems

MOSEK was the most stable in the sense that it solved all 127 problems, and MRNE (NE-SSOR) came next with only two failures, on the Netlib problems greenbea and greenbeb. The CGNE (NE-SSOR) method solved almost all the problems that MRNE (NE-SSOR) solved, except for the largest Qaplib problem, which was solved to a slightly larger tolerance level of \(10^{-7}\). AB-GMRES (NE-SOR) was also very stable and solved the problems accurately enough. However, it took longer than 20 hours for two problems that have 105,127 and 16,675 equations, respectively, although it succeeded in solving larger problems such as pds-80. The other solvers were less stable. The modified Cholesky solver and PDCO (Direct) solved \(92\%\) and \(87\%\) of the problems, respectively, although they were faster than the other solvers on the problems that they solved successfully. PDCO (LSMR) solved \(69\%\) of the problems and was slower than the proposed solvers; a possible reason is that it does not use a preconditioner. SDPT3 solved \(60\%\) and SeDuMi \(82\%\) of the problems. Here we should mention that SeDuMi and SDPT3 are designed for LP, SDP, and SOCP, while our code is (currently) tuned solely for LP.

Note that MOSEK solver uses a multi-corrector interior-point method [30] while our implementation is a single corrector (i.e., predictor-corrector) method. This led to different numbers of interior-point iterations as shown in the tables. Thus, there is still room for improvement in the efficiency of our solver based on iterative solvers if a more elaborately tuned interior-point framework such as the one in MOSEK is adopted.

Fig. 1 Dolan–Moré profiles comparing the CPU time costs for the proposed solvers and the public-domain and commercial solvers [figure omitted]

Fig. 2 Dolan–Moré profiles comparing the CPU time costs for the proposed solvers and the public-domain solvers [figure omitted]

In order to show the trends of performance, we use the Dolan–Moré performance profiles [22] in Figs. 1 and 2, with \(\pi (\tau ):=P(\log _2 r_{ps}\le \tau )\) the proportion of problems for which the \(\log _2\)-scaled performance ratio is at most \(\tau \), where \(r_{ps}:=t_{ps}/t^*_\mathrm {p}\), \(t_{ps}\) is the CPU time for solver s to solve problem p, and \(t^*_\mathrm {p}\) is the minimal CPU time for problem p. Figure 1 includes the commercial solver MOSEK while Fig. 2 does not. Note that Fig. 2 is not generated by simply removing the MOSEK curve from Fig. 1; rather, MOSEK is removed from the comparison dataset, which changes the minimum CPU time cost for each problem. The comparison indicates that the iterative solvers, although slower than the commercial solver MOSEK in some cases, were often able to solve the problems to the designated accuracy.
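For reference, the profile \(\pi (\tau )\) can be computed from a table of CPU times as in the following sketch (NumPy; encoding failures as np.inf is an assumption for illustration).

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More profile: T[p, s] is the CPU time of solver s on problem p
    (np.inf for a failure); returns pi_s(tau) for every solver and every tau."""
    ratios = T / T.min(axis=1, keepdims=True)   # r_ps = t_ps / t*_p
    log_r = np.log2(ratios)
    return np.array([[np.mean(log_r[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])
```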

In Tables 2, 3, and 4, we give the following information:

  1. the name of the problem and the size (m, n) of the constraint matrix,

  2. the number of interior-point iterations required for convergence,

  3. the CPU time for the entire computation in seconds. For cases shorter than 3000 s, the CPU time is taken as an average over 10 measurements. In each row, we indicate in boldface and italic the fastest and second fastest solvers in CPU time, respectively.

Besides the statistics, we also use the following notation:

  • †inaccurately solved, i.e., the value of \(\epsilon _\mathrm {out}\) was relaxed to a larger level. In the column “Iter”, we provide extra information †\(_a\) at the stopping point: for our solvers, \(a=\lfloor \log _{10}\varGamma ^{(k)}\rfloor \), where \(\lfloor \cdot \rfloor \) is the floor function; for CVX solvers, \(a=\lfloor \log _{10}\mu \rfloor \) as provided in the CVX output; PDCO solvers do not provide this information, thus they are not given;

  • \(\mathtt {f}\) the interior-point iterations diverged;

  • \(\mathtt {t}\) the iterations took longer than 20 hours.

Note that all zero rows and columns of the constraint matrix A were removed beforehand. The problems marked with \({\#}\) have rank-deficient A even after this preprocessing. For these problems, we put \(\mathrm {rank}(A)\) in brackets after m; it is computed using the Matlab function \(\mathtt {sprank}\).

In order to give an idea of the typical differences between methods, we present the interior-point convergence curves for problem ken_13. The problem has a constraint matrix \(A\in {\mathbb {R}}^{28,632\times 42,659}\) with full row rank and 97,246 nonzero elements.

Different aspects of the performance of the four solvers are displayed in Fig. 3. The red dotted line with diamond markers represents the quantity related to AB-GMRES (NE-SOR), the blue with downward-pointing triangle CGNE (NE-SSOR), the yellow with asterisk MRNE (NE-SSOR), and the dark green with plus sign the modified Cholesky solver. Note that for this problem ken_13, the modified Cholesky solver became numerically inaccurate at the last step and it broke down if the default dropping tolerance was used. Thus, we increased it to \(10^{-6}\).

Figure 3a shows \(\kappa ({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}})\) in \(\log _{10}\) scale. It verifies the claim that the least squares problem becomes increasingly ill-conditioned in the final steps of the interior-point process: \(\kappa ({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}})\) started from around \(10^{20}\) and increased to \(10^{80}\) in the last 3–5 steps. Figure 3b shows the convergence curve of the duality measure \(\mu \) in \(\log _{10}\) scale. The value of \(\mu \) drops below the tolerance and the stopping criterion is satisfied. Although it is not shown in the figure, we found that the interior-point method with modified Cholesky and the default dropping tolerance \(10^{-16}\) stagnated at \(\mu \simeq 10^{-4}\). Comparing with Fig. 3a, it is observed that the solvers started to behave differently as \(\kappa ({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}})\) increased sharply.

Table 2 Experiments on Netlib problems
Table 3 Experiments on Qaplib problems
Table 4 Experiments on Mittelmann problems
Fig. 3 Numerical results for problem ken_13

Figure 3c, d show the relative residual norms \(\Vert \varvec{f}_\mathrm {af} - {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} \varDelta \varvec{y}_\mathrm {af} \Vert _2 / \Vert \varvec{f}_\mathrm {af}\Vert _2\) in the predictor stage and \(\Vert \varvec{f}_\mathrm {cc} - {\mathcal {A}} {\mathcal {A}}^{\mathsf {T}} \varDelta \varvec{y}_\mathrm {cc} \Vert _2/ \Vert \varvec{f}_\mathrm {cc}\Vert _2\) in the corrector stage, respectively, both in \(\log _{10}\) scale. The relative residual norm for the modified Cholesky solver tended to grow with the interior-point iterations and increased sharply in the final phase, when the solver lost accuracy in computing the search directions from the normal equations. We observed similar trends for other test problems and, in the worst cases, the inaccuracy in the solutions prevented interior-point convergence. Among the iterative solvers, AB-GMRES (NE-SOR) and MRNE (NE-SSOR) were the most stable in maintaining the accuracy of the solutions to the normal equations; CGNE (NE-SSOR) performed similarly but lost numerical accuracy in the last few interior-point steps.
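
For concreteness, these relative residuals can be monitored without forming \({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}}\) explicitly; the following short Matlab sketch (our names, not the paper's code) applies to either stage.

    % Minimal sketch: relative residual of a normal-equations solve
    % Acal*Acal'*dy = f, evaluated with two matrix-vector products instead of
    % forming Acal*Acal'.
    function relres = ne_relres(Acal, f, dy)
      relres = norm(f - Acal * (Acal' * dy)) / norm(f);
    end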

Figure 3e, f show the CPU time and the number of Krylov iterations for each interior-point step, respectively. The CPU time of the modified Cholesky solver was distributed fairly evenly over the whole process, while that of the iterative solvers tended to be smaller in the beginning and ending phases. In the final stage, AB-GMRES (NE-SOR) required the fewest iterations but cost much more CPU time than the other two iterative solvers. This can be explained as follows: AB-GMRES (NE-SOR) requires increasingly more CPU time and memory as the number of iterations grows, because it has to store the orthonormal vectors from the modified Gram-Schmidt process as well as the Hessenberg matrix, whereas the CGNE (NE-SSOR)- and MRNE (NE-SSOR)-based methods require constant memory. CGNE (NE-SSOR) took more iterations and CPU time than MRNE (NE-SSOR). Apart from \({\mathcal {A}}\) and the preconditioner, the memory required for k iterations of AB-GMRES is \({\mathcal {O}}(k^2 + km + n)\), while that for CGNE and MRNE is \({\mathcal {O}}(m+n)\) [37, 55]. This explains why AB-GMRES (NE-SOR), although requiring fewer iterations, usually takes longer to obtain the solution at each interior-point step. We also experimented with restarted AB-GMRES on a few problems; however, its performance was not competitive with the non-restarted version.

On the other hand, the motivation for using AB-GMRES (NE-SOR) is that GMRES is more robust for ill-conditioned problems than the symmetric solvers CG and MINRES: GMRES orthogonalizes the basis vectors explicitly via the modified Gram-Schmidt process, whereas CG and MINRES rely on short recurrences, in which orthogonality may be lost due to rounding errors. Moreover, GMRES allows non-symmetric preconditioning, while the symmetric solvers require symmetric preconditioning. For example, one SOR inner iteration is cheaper than one SSOR inner iteration because the latter sweeps both forwards and backwards: SOR requires 2MV + 3m operations per inner iteration, while SSOR requires 4MV + 6m, where MV denotes the cost of a matrix-vector product. In this sense, GMRES offers more freedom in the choice of preconditioner.
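
As an illustration of the row-action form of such an inner iteration, the following Matlab sketch applies a few SOR sweeps to the normal equations of the second kind \({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}}\varvec{u}=\varvec{f}\) without forming \({\mathcal {A}}{\mathcal {A}}^{\mathsf {T}}\). It is our illustration of the general NE-SOR idea under a zero initial guess, not the authors' implementation, and the names (nesor, omega, nsweeps) are ours.

    % Minimal sketch of an NE-SOR inner iteration: a few SOR sweeps applied to
    % A*A'*u = f in row-action form, where A stands for the (scaled)
    % coefficient matrix of the normal equations.
    function u = nesor(A, f, omega, nsweeps)
      [m, n] = size(A);
      rownorm2 = full(sum(A.^2, 2));   % ||a_i||^2, the diagonal of A*A'
      u = zeros(m, 1);                 % zero initial guess (preconditioner use)
      x = zeros(n, 1);                 % maintains x = A'*u throughout
      for sweep = 1:nsweeps
        for i = 1:m
          if rownorm2(i) > 0
            % residual of row i: f_i - (A*A'*u)_i = f_i - a_i'*x
            delta = omega * (f(i) - A(i, :) * x) / rownorm2(i);
            u(i) = u(i) + delta;
            x = x + delta * A(i, :)';
          end
        end
      end
    end

Each sweep touches every row of the matrix twice (once in the inner product and once in the update of x), which corresponds to the 2MV count mentioned above; an SSOR sweep repeats the loop in reverse order and thus roughly doubles the cost.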

From Fig. 3, we may draw a few conclusions. For most problems, the direct solver gave the most efficient results in terms of CPU time. However, for some problems the direct solver tended to lose accuracy as the interior-point iterations proceeded and, in the worst cases, this inhibited convergence. For problems where the direct method broke down, the proposed inner-iteration preconditioned Krylov subspace methods worked until convergence. With the iterative solvers, it suffices to solve (7) and (8) to a moderate level of accuracy in the early phase of the interior-point iterations and to tighten the accuracy in the late phase.
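
A purely hypothetical sketch of such a loosening/tightening strategy is given below; it ties the inner tolerance to the current duality measure \(\mu \). The specific formula is ours for illustration only and is not the schedule used in the paper.

    % Hypothetical inner-tolerance schedule: loose in the early interior-point
    % steps, tight near convergence.  Illustrative only; not the paper's rule.
    function tol_inner = inner_tol(mu)
      tol_max = 1e-2;                                   % early-phase tolerance
      tol_min = 1e-10;                                  % final-phase tolerance
      tol_inner = min(tol_max, max(tol_min, 0.1 * mu)); % shrinks as mu -> 0
    end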

4.4 Basis pursuit problems

Most of the problems tested in the previous section have a sparse constraint matrix A: the average nonzero density is \(2.55\%\), \(0.62\%\), and \(0.45\%\) for the problems from Netlib, Qaplib, and Mittelmann, respectively. However, the constraint matrix can be large and dense for problems such as QP in support vector machine training and LP in basis pursuit [11]. The package Atomizer [10] provides such matrices.

In this section, we enrich the experiments by adding problems arising from basis pursuit [11]. We reproduced the \(\ell _1\)-norm optimization problems from the package Atomizer [10] and reformulated them as LP problems in standard form; the connection between basis pursuit and LP can be found therein. The problems tested in this section have constraint matrices with an average nonzero density of \(48.33\%\) and are usually very well-conditioned, with condition numbers in the range (1, 18.54]. The results are shown in Table 5.
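
For reference, the reformulation we refer to is the standard splitting of the variable into nonnegative parts (as in [11]). Writing the basis pursuit problem with dictionary \(\varPhi \) and observation \(\varvec{b}\) (symbols chosen here only for illustration),

$$\min _{\varvec{x}}\Vert \varvec{x}\Vert _1 \;\; \text{ subject } \text{ to } \;\; \varPhi \varvec{x}=\varvec{b} \qquad \Longleftrightarrow \qquad \min _{\varvec{u},\varvec{v}\ge \varvec{0}} \varvec{1}^{\mathsf {T}}(\varvec{u}+\varvec{v}) \;\; \text{ subject } \text{ to } \;\; \left[ \varPhi ,\,-\varPhi \right] \begin{bmatrix}\varvec{u}\\ \varvec{v}\end{bmatrix}=\varvec{b},$$

with \(\varvec{x}=\varvec{u}-\varvec{v}\). This is an LP in standard form with constraint matrix \([\varPhi ,\,-\varPhi ]\); the splitting doubles the number of columns but preserves the nonzero density of \(\varPhi \).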

Table 5 Experiments on basis pursuit problems

The notation has the same meaning as in the previous section. Although PDCO's direct solver may be fast for the problems in Table 5, when the constraint matrices are not given explicitly one has to use its iterative-solver (e.g., LSMR) version. The results show that only AB-GMRES (NE-SOR) and MOSEK succeeded in solving all the problems; of these two, AB-GMRES (NE-SOR) was faster than MOSEK for the problems bpfig22, bpfig23, bpfig31, and bpfig51.

5 Conclusions

We proposed a new way of preconditioning the normal equations of the second kind arising within interior-point methods for LP problems (11). The resulting interior-point solver is composed of three nested iteration schemes. The outer-most layer is the predictor-corrector interior-point method; the middle layer is the Krylov subspace method for least squares problems, where we may use AB-GMRES, CGNE, or MRNE, combined with a row-scaling scheme that incurs no extra CPU time but helps improve the condition of the system at each interior-point step; the inner-most layer, serving as a preconditioner for the middle layer, consists of stationary inner iterations. Among the three layers, only the outer-most one is run to the required accuracy; the other two are terminated prematurely, with the linear systems solved under a gradually tightened stopping tolerance. We also proposed a new recurrence in terms of \(\varDelta \varvec{w}\) in place of \(\varDelta \varvec{y}\) to omit one matrix-vector product at each interior-point step. We showed that the use of inner-iteration preconditioners in combination with these techniques enables the efficient interior-point solution of wide-ranging LP problems. We also presented a fairly extensive benchmark comparison with several well-known solvers, both direct and iterative.

The advantage of our method is that it does not break down even when the matrices become ill-conditioned or (nearly) singular. The method is competitive for large and sparse problems and may also be well suited to problems whose matrices are too large and dense for direct approaches to work. Extensive numerical experiments showed that our method outperforms the open-source solvers SDPT3, SeDuMi, and PDCO in terms of stability and efficiency.

There are several aspects of our method that could be improved. The current implementation of the interior-point method does not use a preprocessing step other than eliminating empty rows and columns. Its efficiency may be improved by adopting an existing preprocessing procedure, such as a presolve step that detects and removes linear dependencies among the rows and columns of the constraint matrix. The proposed method could also be used in conjunction with more advanced interior-point frameworks such as the multi-corrector interior-point method. Regarding the linear solver, future work includes trying reorthogonalization for CG and MINRES and Householder orthogonalization for GMRES. It is also important to develop preconditioners that require only the action of the operator on a vector, as needed for huge basis pursuit problems.

It would also be worthwhile to extend our method to problems such as convex QP and SDP.