Block preconditioners for linear systems in interior point methods for convex constrained optimization

In this paper, we address the preconditioned iterative solution of the saddle-point linear systems arising from the (regularized) Interior Point method applied to linear and quadratic convex programming problems, typically of large scale. Starting from the well-studied Constraint Preconditioner, we review a number of inexact variants aimed at reducing the computational cost of applying the preconditioner within the Krylov subspace solver of choice. In all cases we present a spectral analysis showing the conditions under which a good clustering of the eigenvalues of the preconditioned matrix can be obtained, which foreshadows (at least when the PCG or MINRES Krylov solvers are used) fast convergence of the iterative method. Results on a set of large-scale optimization problems confirm that the inexact variants of the Constraint Preconditioner can yield efficient solution methods.


Introduction
In this paper, we consider linear and convex quadratic programming (LP and QP) problems of the following form: where c, x ∈ R n , b ∈ R m , A ∈ R m×n . For quadratic programming problems we assume that Q ∈ R n×n is symmetric positive semidefinite, while for linear programming we have Q = 0. The problem (1.1) is often referred to as the primal form of the quadratic programming problem; the dual form of the problem is given by where z ∈ R n , y ∈ R m . Problems of linear or quadratic programming form are fundamental in optimization, and arise in a wide range of scientific applications. A variety of optimization methods exist for solving the problem (1.1). Two popular and successful approaches are interior point methods (IPMs) and proximal methods of multipliers (PMMs). Within an IPM, a Lagrangian is constructed involving the objective function and the equality constraints of (1.1), to which a logarithmic barrier function is then added in place of the inequality constraints. Hence, a logarithmic barrier sub-problem is solved at each iteration of the algorithm (see [36] for a survey on IPMs). The key feature of a PMM is that, at each iteration, one seeks the minimum of the problem (1.1) as stated, but one adds to the objective function a penalty term involving the norm of the difference between x and the previously computed estimate. Then, an augmented Lagrangian method is applied to approximately solve each such sub-problem (see [55,63] for a review of proximal point methods, and [16,41,61,62] for a review of augmented Lagrangian methods).
Upon applying either a plain IPM or an IP-PMM, the vast majority of the computational effort arises from the solution of the resulting linear systems of equations at each iteration. These linear equations can be tackled in the form of an augmented system, or the reduced normal equations: we focus much of our attention on the augmented system, as unless Q has some convenient structure it is highly undesirable to form the normal equations or apply the resulting matrix within a solver. Within the linear algebra community, direct methods are popular for solving such systems due to their generality; however, if the matrix system becomes sufficiently large, the storage and/or operation costs can rapidly become excessive, depending on the computer architecture used. The application of iterative methods, for instance those based on Krylov subspace methods such as the Conjugate Gradient method (CG) [42] or MINRES [54], is an attractive alternative, but if one cannot construct suitable preconditioners which can be applied within such solvers, then convergence can be prohibitively slow, and indeed convergence may not be achieved at all. The development of powerful preconditioners is therefore crucial.
A range of general preconditioners have been proposed for augmented systems arising from optimization problems, see [14,17,24,29,53,66] for instance. However, as is the case within the field of preconditioning in general, these are typically sensitive to changes in the structure of the matrices involved, and can have substantial memory requirements. Preconditioners have also been successfully devised for specific classes of programming problems solved using similar optimization methods: applications include those arising from multicommodity network flow problems [22], stochastic programming problems [21], formulations within which the constraint matrix has primal block-angular structure [23], and PDE-constrained optimization problems [56,57], among various other applications [26]. However, such preconditioners exploit particular structures arising from specific applications; unless there exists such a structure which hints at the appropriate way to develop a solver, the design of bespoke preconditioners remains a challenge. We finally remark that the selection of a proper stopping criterion for the inner linear solver is also crucial for devising an efficient outer-inner iteration (see [74] for a recent study).
The block structure of the linear system we will address is the well-known saddle-point system

H u = d,    H = [ G   A T ;  A   0 ],    (1.3)

where G ∈ R n×n is an SPD matrix, while A ∈ R m×n is a rectangular matrix with full row rank, usually with m < n. The exact constraint preconditioner (CP) is defined as

M = [ D   A T ;  A   0 ],    (1.4)

with D a suitable (cheaper) approximation of G. Application of M −1 to a vector, required at each iteration of a Krylov solver, rests on the efficient computation of the square-root free Cholesky factorization of the negative Schur complement AD −1 A T of D in M, which allows for the following factorization:

M = [ I   0 ;  AD −1   I ] [ D   0 ;  0   −AD −1 A T ] [ I   D −1 A T ;  0   I ].    (1.5)

The aim of this paper is to review the block preconditioners proposed mainly for solving system (1.3), with focus on the approximate construction (or factorization) of the Schur complement matrix, a key issue for the development of efficient iterative solvers. This paper is structured as follows. In Sect. 2 we describe the structure of the linear system to be solved at each IP iteration in the case of quadratic (linear) and nonlinear problems. In Sect. 3 we recall the main properties of the exact constraint preconditioner (see e.g. [44]), whereas in Sects. 4, 5 and 6 we describe three variants of this approach, which make the cost of the preconditioner application affordable also when addressing large-scale problems. Namely, in Sect. 4 we describe the inexact constraint preconditioner obtained by sparsifying the matrix A [12,13]; in Sect. 5 we approximate directly the Schur complement matrix AD −1 A T by neglecting small terms in D −1 , as proposed in [11]; in Sect. 6 we review an approach which applies the exact CP at selected IP iterations while simply updating it (by a BFGS-like formula) in the subsequent iterations [8]. Some numerical results are shown in Sect. 7 and, finally, in Sect. 8 we give some concluding remarks. Notation: Given an arbitrary square (or rectangular) matrix A, λ max (A) and λ min (A) (or σ max (A) and σ min (A)) denote the largest and smallest eigenvalues (or singular values) of A, respectively.
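As an illustration, the blockwise application of M −1 implied by this factorization can be sketched as follows. This is a dense toy example with data and a diagonal D of our own choosing; a practical implementation would replace the dense Cholesky of AD −1 A T with a sparse square-root free one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
D = np.diag(rng.uniform(1.0, 2.0, n))   # diagonal SPD approximation of G
A = rng.standard_normal((m, n))         # full row rank (w.h.p.)

# Negative Schur complement S = A D^{-1} A^T (SPD), factorized once
Dinv = np.diag(1.0 / np.diag(D))
S = A @ Dinv @ A.T
L = np.linalg.cholesky(S)

def apply_cp_inverse(r1, r2):
    """Solve M [x; y] = [r1; r2] with M = [[D, A^T], [A, 0]] via (1.5)."""
    y = np.linalg.solve(L.T, np.linalg.solve(L, A @ Dinv @ r1 - r2))
    x = Dinv @ (r1 - A.T @ y)
    return x, y

# check against a direct solve with the assembled preconditioner matrix
M = np.block([[D, A.T], [A, np.zeros((m, m))]])
r = rng.standard_normal(n + m)
x, y = apply_cp_inverse(r[:n], r[n:])
assert np.allclose(M @ np.concatenate([x, y]), r)
```

The point of the factorization is that only a factorization of the m × m Schur complement is needed, besides trivial diagonal solves.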
In all the following sections we will indicate with the symbol H the coefficient matrix, with M the preconditioner in its "direct" form (i.e. M ≈ H ) and with P the inverse preconditioner (satisfying P ≈ H −1 ).

Linear algebra in interior point methods
Interior point methods for linear, quadratic and nonlinear optimization obviously differ in many details, but they rely on the same linear algebra kernel. We briefly discuss the quadratic and nonlinear programming cases, following the presentation in [14].

Quadratic programming
Consider the convex quadratic programming problem where Q ∈ R n×n is a positive semidefinite matrix, A ∈ R m×n is the full rank matrix of linear constraints and the vectors x, c and b have appropriate dimensions. The usual transformation in interior point methods consists in replacing the inequality constraints with logarithmic barriers, to get

min c T x + (1/2) x T Qx − μ Σ j=1..n ln x j   s.t. Ax = b,

where μ ≥ 0 is a barrier parameter. The Lagrangian associated with this problem has the form

L(x, y, μ) = c T x + (1/2) x T Qx − y T (Ax − b) − μ Σ j=1..n ln x j ,

and, introducing s = μX −1 e, where X = diag{x 1 , . . . , x n }, S = diag{s 1 , s 2 , . . . , s n } and e = (1, 1, . . . , 1) T , the first order optimality conditions (for the barrier problem) are:

Ax = b,   A T y + s − Qx = c,   X Se = μe,   (x, s) ≥ 0.

The interior point algorithm for quadratic programming [72] applies the Newton method to solve this system of nonlinear equations and gradually reduces the barrier parameter μ to guarantee convergence to the optimal solution of the original problem. The Newton direction is obtained by solving the system of linear equations:

[ A   0   0 ;  −Q   A T   I ;  S   0   X ] [ Δx ; Δy ; Δs ] = [ ξ p ; ξ d ; ξ μ ].

By eliminating Δs = X −1 (ξ μ − SΔx) from the second equation we get the symmetric indefinite augmented system of linear equations

[ −Q − Θ −1   A T ;  A   0 ] [ Δx ; Δy ] = [ ξ d − X −1 ξ μ ; ξ p ],   Θ = X S −1 .   (2.3)

For a generic large and sparse matrix Q it is generally inefficient to form explicitly the normal equation matrix H N E , so this kind of problem is usually tackled in its augmented form (2.3).
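The reduction to the augmented form can be verified on a toy instance; the sketch below (random data, sizes and right-hand sides are our own choices) assembles the full Newton system and checks that eliminating Δs reproduces the same (Δx, Δy):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
B = rng.standard_normal((n, n))
Q = B @ B.T                                        # SPD for simplicity
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 1.5, n)
s = rng.uniform(0.5, 1.5, n)
X, S = np.diag(x), np.diag(s)
xi_p, xi_d, xi_mu = (rng.standard_normal(m),
                     rng.standard_normal(n),
                     rng.standard_normal(n))

# Full Newton system in (dx, dy, ds)
K = np.block([[-Q, A.T, np.eye(n)],
              [A, np.zeros((m, m)), np.zeros((m, n))],
              [S, np.zeros((n, m)), X]])
sol = np.linalg.solve(K, np.concatenate([xi_d, xi_p, xi_mu]))
dx_full, dy_full = sol[:n], sol[n:n + m]

# Augmented system after eliminating ds = X^{-1}(xi_mu - S dx)
Theta_inv = np.diag(s / x)                         # Theta^{-1} = X^{-1} S
Aug = np.block([[-(Q + Theta_inv), A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([xi_d - xi_mu / x, xi_p])
sol2 = np.linalg.solve(Aug, rhs)
assert np.allclose(sol2[:n], dx_full) and np.allclose(sol2[n:], dy_full)
```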

Special cases: linear programming, separable problems
Simplified versions of (2.3) are obtained for linear programming by simply setting Q ≡ 0. This problem is usually solved by resorting to the normal equation system, which now reads H N E y = b L P with H N E = AΘ A T . Differently from the quadratic case, the normal equation matrix can be explicitly formed. As H N E is symmetric positive definite, the system H N E y = b L P can be solved by the Preconditioned Conjugate Gradient (PCG) iterative method. However, as observed by many authors, the condition number of H N E grows rapidly as the Interior Point method approaches the solution of the optimization problem, and finding suitable preconditioners for this problem is an active research topic. It will be discussed in Sect. 5. Separable problems are characterized by the fact that Q is a diagonal matrix and hence easily invertible. As in the linear case, H N E can be formed, allowing one to address the normal equation system.
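A minimal sketch of the normal equations route follows, with a textbook PCG loop and a simple Jacobi preconditioner as a stand-in; all data and the preconditioner choice are illustrative only, not the preconditioners developed later in this paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 40, 10
A = rng.standard_normal((m, n))
theta = 10.0 ** rng.uniform(-2, 2, n)       # Theta entries spread out as the IP run proceeds
H_NE = A @ np.diag(theta) @ A.T             # SPD normal equations matrix
b = rng.standard_normal(m)

def pcg(H, b, apply_Minv, tol=1e-10, maxit=200):
    """Textbook preconditioned CG for an SPD matrix H."""
    x = np.zeros_like(b)
    r = b - H @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for it in range(maxit):
        Hp = H @ p
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it + 1
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

d = np.diag(H_NE)
y, its = pcg(H_NE, b, lambda r: r / d)      # Jacobi preconditioner as a stand-in
assert np.allclose(H_NE @ y, b, atol=1e-6 * np.linalg.norm(b))
```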

Nonlinear programming
Consider the convex nonlinear optimization problem min f (x) s.t. g(x) ≤ 0, where x ∈ R n , and f : R n → R and g : R n → R m are convex and twice differentiable. Having replaced the inequality constraints with the equality g(x) + z = 0, where z ∈ R m is a nonnegative slack variable, we can formulate the associated barrier problem and write the Lagrangian for it. The first order optimality conditions (for the barrier problem) then take the following form, where Y = diag{y 1 , y 2 , · · · , y m }. The interior point algorithm for nonlinear programming [72] applies the Newton method to solve this system of equations and gradually reduces the barrier parameter μ to guarantee convergence to the optimal solution of the original problem. The Newton direction is obtained by solving a system of linear equations; using the third equation we eliminate Δz from the second equation and get (2.7). The matrix involved in this set of linear equations is symmetric and indefinite. For a convex optimization problem (when f and g are convex), the matrix Q is positive semidefinite, and if f is strictly convex, Q is positive definite and the matrix in (2.7) is quasidefinite. Similarly to the case of quadratic programming, by eliminating Δx from the first equation we can reduce this system further to the form of normal equations. The two systems (2.3) and (2.7) have many similarities. The main difference is that in (2.3) only the diagonal scaling matrix Θ 1 changes from iteration to iteration, while in the case of nonlinear programming not only the matrix Θ 2 = Z −1 Y but also the matrices Q(x, y) and A(x) in (2.7) change at every iteration. Both these systems are indefinite. However, to avoid the need of 2 × 2 pivots in their factorization, we transform them into quasidefinite ones by means of primal and dual regularization [1]. Our analysis in the following sections is concerned with quadratic optimization problems, and hence A and Q are constant matrices.
However, the major conclusions can be generalized to the nonlinear optimization case.

Exact constraint preconditioner
In this section we shall discuss the properties of the preconditioned matrices involved in (2.3). For ease of presentation we shall focus on the quadratic programming case with linear equality constraints; hence we will assume that Θ 2 −1 = 0 (we also drop the subscript in Θ 1 ). In [14] these results are extended to the nonlinear programming case.
Due to the presence of the matrix Θ −1 , the augmented system is very ill-conditioned. Indeed, some elements of Θ tend to zero while others tend to infinity as the optimal solution of the problem is approached. The performance of any iterative method critically depends on the quality of the preconditioner in this case. Block-type preconditioners are widely used in linear systems obtained from a discretization of PDEs [10,15,34,58,73]. The preconditioners for the augmented system have also been used in the context of linear programming [33,53] and in the context of nonlinear programming [29,44,47,48,64]. As was shown in [53], preconditioners for the indefinite augmented system offer more freedom than those for the normal equations. Moreover, the factorization of the augmented system is sometimes much easier than that of the normal equations [2] (this is the case, for example, when A contains dense columns). Hence even in the case of linear programming (in which the normal equations are a viable approach) the augmented system offers important advantages. For quadratic and nonlinear programming the use of the augmented system is a must, and so we deal with augmented system preconditioners in this paper. On the other hand, we realize that for some specially structured problems, such as multicommodity network flows, very efficient preconditioners for the normal equations [22,43] can also be designed.

Spectral analysis
We consider the augmented system. CPs for the matrix H have the following form: where D is some symmetric approximation of G. Any preconditioner of this type can be regarded as the coefficient matrix of a Karush-Kuhn-Tucker (KKT) system associated with an optimization problem with the same constraints as the original problem, thus motivating the name of the preconditioner. We note that D should be chosen so that M is nonsingular and is "easier to factorize" than H; furthermore, it must involve Θ −1 in order to capture the key numerical properties of H. A common choice is a diagonal approximation of G, as in (3.4); a different approach consists in implicitly defining D by using a factorization of the form M = UCU T , where U and C are specially chosen matrices [28]. Here we consider (3.4), which is SPD. The spectral properties of the preconditioned matrix M −1 H and the application of CG with preconditioner M to the KKT linear system have been deeply investigated. For the sake of completeness, in the next theorem we summarize some theoretical results about CPs, given in e.g. [8,14,44,47].

Theorem 3.1 Let Z ∈ R n×(n−m) be a matrix whose columns span the nullspace of A.
Assume also that D is SPD. The following properties hold.

If CG is applied to system (3.2) with preconditioner M and a suitable starting guess u 0 ,
then the corresponding iterates x j are the same as those generated by CG applied to (3.7) with preconditioner Z T D Z . Thus, the component x * of the solution u * of system (3.2) is obtained in at most n − m iterations and the following inequality holds: The directions p j and the residuals r j generated by applying CG with preconditioner M to system (3.2), with the same starting guess as in item 4, take the following form: where p̄ j,1 and r̄ j,1 are the direction and the residual, respectively, at the j-th iteration of CG applied to (3.7) with preconditioner Z T D Z . From the previous theorem it follows that the preconditioned matrix has 2m unit eigenvalues independently of the particular choice of D; on the other hand, properties 2 and 3 show that the better D approximates G, the more the remaining n − m eigenvalues of M −1 H are clustered around 1. Furthermore, the application of CG to the KKT system (3.2) with preconditioner M is closely related to the application of CG to system (3.7) with preconditioner Z T D Z . We note that property 4 does not guarantee that y j = y * after at most n − m iterations; actually, a breakdown may occur at the (n − m + 1)-st iteration. However, this is a "lucky breakdown", in the sense that y * can be easily obtained from the last computed approximation of it, as shown in [47]. More generally, since the 2-norm of the PCG residual may not decrease as fast as the H-norm of the PCG error, a suitable scaling of the KKT system matrix can be used to prevent this situation [64]. We mention that other constraint-preconditioned Krylov solvers, such as MINRES, SYMMLQ and GMRES, have been analyzed and implemented in [27].
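The unit-eigenvalue count stated above is easy to observe numerically; in the sketch below (random data of our own choosing, with D = diag(G)) at least 2m eigenvalues of M −1 H equal one and all eigenvalues are real:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10, 4
B = rng.standard_normal((n, n))
G = B @ B.T + n * np.eye(n)        # SPD (1,1) block
A = rng.standard_normal((m, n))    # full row rank (w.h.p.)
D = np.diag(np.diag(G))            # diagonal approximation of G

H = np.block([[G, A.T], [A, np.zeros((m, m))]])
M = np.block([[D, A.T], [A, np.zeros((m, m))]])
eigs = np.linalg.eigvals(np.linalg.solve(M, H))

# at least 2m eigenvalues equal to one; the remaining ones are real
n_unit = np.sum(np.abs(eigs - 1.0) < 1e-4)
assert n_unit >= 2 * m
assert np.max(np.abs(eigs.imag)) < 1e-4
```

The nonunit eigenvalues coincide with those of the pencil (Z T G Z , Z T D Z), so they cluster around one as D approaches G.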

Remark 3.2
The excellent spectral properties of the constraint-preconditioned matrix, and the possibility of employing a suitable Conjugate Gradient method, make this preconditioner a natural choice for solving the IP linear systems. However, we note that the effectiveness of CPs may be undermined by the computational cost of factorizing their Schur complements, thus reducing the efficiency of the overall IP procedure. Sections 4, 5 and 6 will be devoted to addressing this issue. In particular we will focus on:
1. Constructing an inexact constraint preconditioner, thus allowing a sparser L-factor for the normal equation matrix AD −1 A T .
2. Dropping small elements in D −1 , in this way producing a far sparser Schur complement matrix.
3. Avoiding the factorization of the Schur complement matrix at each IP iteration by modifying, with a low-rank matrix, the preconditioner computed at one of the previous iterations.

Inexact Jacobian constraint preconditioner
We consider the preconditioning of the KKT system Hu = d by the inexact Jacobian constraint preconditioner M I J , where, once again, D is a diagonal approximation of G and Ã is a sparsification of A, chosen so as to have a sparse matrix Ã D −1 Ã T , which is needed at each application of the preconditioner M I J . The eigenvalue distribution of M I J −1 H has been investigated in [12,13].
Following [12] we define E = A − Ã, with rank(E) = p. Here σ̃ 1 is the smallest singular value of Ã D −1/2 . We introduce two error terms, which measure the errors of the approximations to the (1, 1) block and to the matrix of the constraints, respectively. The distance |ε| of the complex eigenvalues from one, with ε = λ − 1, will be bounded in terms of these two quantities.
The efficiency of the inexact Jacobian constraint preconditioner relies on (1) the clustering of the eigenvalues around one, stated by Theorem 4.1, and (2) the ability to exactly (and economically) factorize Ã D −1 Ã T =: L D 0 L T .
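The trade-off can be sketched numerically: dropping small entries of A yields a sparser preconditioner whose spectrum remains clustered around one. The toy data and thresholds below are our own choices, and D is taken close to G so that the clustering is visible.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 12, 4
A = rng.standard_normal((m, n))
A[np.abs(A) < 0.3] *= 1e-3                     # plant many small entries
D = np.diag(rng.uniform(0.5, 2.0, n))          # diagonal (1,1)-block approximation
G = D + 0.1 * np.eye(n)                        # true (1,1) block, close to D

H = np.block([[G, A.T], [A, np.zeros((m, m))]])
A_tilde = np.where(np.abs(A) >= 1e-2, A, 0.0)  # sparsify: keep sizable entries only
M_IJ = np.block([[D, A_tilde.T], [A_tilde, np.zeros((m, m))]])
eigs = np.linalg.eigvals(np.linalg.solve(M_IJ, H))

# the sparsified preconditioner is cheaper to factorize ...
assert np.count_nonzero(A_tilde) < np.count_nonzero(A)
# ... while the (now possibly complex) eigenvalues stay clustered near one
assert np.max(np.abs(eigs - 1.0)) < 0.5
```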

An inexact normal equations preconditioner
Following [11] we consider the following augmented system after regularization.
In (5.1) two positive and decreasing sequences ρ k and δ k have been introduced in order to obtain a regularized interior point method. In the sequel we will assume that both ρ k and δ k are O(μ k ), i.e. they go to zero at the same rate as the barrier parameter. We recall the important feature of the matrix Θ k : as the method approaches an optimal solution, this positive diagonal matrix has some entries that (numerically) grow like μ k −1 , while others approach zero. By observing the matrix in (5.1), we can immediately see the benefits of using regularization in IPMs. On one hand, the dual regularization parameter δ k ensures that the system matrix in (5.1) is invertible, even if A is rank-deficient. On the other hand, the primal regularization parameter ρ k controls the worst-case conditioning of the (1, 1) block of (5.1), improving the numerical stability of the method (and hence its robustness). We refer the reader to [1,59,60] for a review of the benefits of regularization in the context of IPMs.
The normal equations, at a generic step k of the Interior Point method, read as follows: In order to employ preconditioned MINRES or CG to solve (5.1) or (5.2), respectively, we must find an approximation for the coefficient matrix in (5.2). To do so, we employ a symmetric and positive definite block-diagonal preconditioner for the saddle-point system (5.1), involving approximations for the negative of the (1,1) block, as well as the Schur complement H N E . See [45,51,68] for motivation of such saddle-point preconditioners. In light of this, we approximate Q in the (1,1) block by its diagonal, i.e. Q̃ = diag(Q), and define the diagonal matrix G̃ k as Then, we define the diagonal matrix E k with entries (dropping the index k in E k and G̃ k ) The dropping threshold in (5.4) guarantees that a coefficient in the diagonal matrix (Θ −1 + Q̃ + ρ k I n ) −1 is set to zero only if it is below a constant times the barrier parameter μ k . As a consequence, fewer outer products of columns of A contribute to the normal equations, and the resulting preconditioner M N E is expected to be much sparser than H N E . This choice is also crucial to guarantee that the eigenvalues of the preconditioned normal equations matrix are independent of μ. Before discussing the role of the constant C k , let us first address the preconditioning of the augmented system matrix in (5.1). The matrix M N E acts as a preconditioner for CG applied to the normal equations. In order to construct a preconditioner for the augmented system matrix in (5.1), we employ a block-diagonal preconditioner of the form: with G̃ k and M N E defined in (5.3) and (5.5), respectively. Note that MINRES requires a symmetric positive definite preconditioner, and hence many other block preconditioners for (5.1) are not applicable.
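A toy rendering of this construction follows (the sizes, sparsity and the choice C k = 10 are ours): diagonal entries of the inverted (1,1) block below C k μ k are dropped, the preconditioner becomes sparser than H N E , and the generalized eigenvalues remain bounded independently of μ k .

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh

n, m = 200, 60
A = sp.random(m, n, density=0.05, random_state=5, format="csr")
mu = 1e-6                                   # barrier parameter, late IP iteration
rng = np.random.default_rng(5)
# diagonal of Theta^{-1}: some entries grow like 1/mu, others vanish like mu
theta_inv = np.where(rng.random(n) < 0.5, 1.0 / mu, mu)
g = theta_inv + mu                          # diag of G_k in the LP case (rho_k = mu)

C_k = 10.0                                  # dropping constant
e = np.where(1.0 / g >= C_k * mu, 1.0 / g, 0.0)   # rule (5.4): drop tiny entries

delta = mu                                  # dual regularization, O(mu)
H_NE = (A @ sp.diags(1.0 / g) @ A.T).toarray() + delta * np.eye(m)
M_NE = (A @ sp.diags(e) @ A.T).toarray() + delta * np.eye(m)

# fewer outer products contribute: the preconditioner is sparser ...
assert np.count_nonzero(M_NE) < np.count_nonzero(H_NE)
# ... and the generalized eigenvalues stay in [1, 1 + sigma_max(A)^2]
lam = eigh(H_NE, M_NE, eigvals_only=True)
smax = np.linalg.norm(A.toarray(), 2)
assert lam.min() > 0.99 and lam.max() < 1 + smax ** 2
```

Note that the upper bound on the eigenvalues above does not involve μ, which is the μ-independence property established by Theorem 5.1.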
As we will show in the sequel, following the developments in [60], forcing the regularization parameters δ k , ρ k to decrease at the same rate as μ k is numerically beneficial, as it makes the spectrum of the preconditioned normal equations independent of μ k ; a very desirable property for preconditioned systems arising from IPMs.

Spectral analysis: LP or separable QP cases
In this section we provide a spectral analysis of the preconditioned normal equations in the LP or separable QP case, assuming that (5.5) is used as the preconditioner. Although this is a specialized setting, we may make use of the following result in our analysis of the augmented system arising from the general QP case.
Let us define the normal equations matrix H̃ N E using the definition of G̃ k in (5.3). The following theorem provides lower and upper bounds on the eigenvalues of M N E −1 H̃ N E at an arbitrary iteration k of Algorithm IP-PMM.
Proof The eigenvalues of M N E −1 H̃ N E must satisfy (5.8). Multiplying (5.8) on the left by u T and setting z = A T u yields For every vector u in the nullspace of A T we have z = 0 and λ = 1. The fact that both E k and G̃ k −1 − E k are positive semidefinite (from the definition of E k ) implies the lower bound. To prove the upper bound we first observe that, in view of (5.4), Remark 1 Following the discussion at the end of the previous section, we know that μ k /δ k = O(1), since IP-PMM forces δ k to decrease at the same rate as μ k . Combining this with the result of Theorem 5.1 implies that the condition number of the preconditioned normal equations is asymptotically independent of μ k .

Remark 2
In the LP case (Q = 0), or the separable QP case (Q diagonal), Theorem 5.1 characterizes the eigenvalues of the preconditioned matrix within the CG method.

BFGS-like low-rank update of the M NE preconditioner
Given a rectangular (tall) matrix V ∈ R m×p with maximum column rank, it is possible to define a generalized block-tuned preconditioner M satisfying the property that the columns of V become eigenvectors of the preconditioned matrix corresponding to the eigenvalue ν. A way to construct M (or its explicit inverse) is suggested by the BFGS-based preconditioners used e.g. in [7] for accelerating Newton linear systems, or analyzed in [50] for general sequences of linear systems. We notice that BFGS-like updates for sequences of linear systems have also been investigated e.g. in [31,38]. Note also that if the columns of V were chosen as e.g. the p exact rightmost eigenvectors of M N E −1 H̃ N E (corresponding to the p largest eigenvalues), then all the other eigenpairs would be left unchanged. Finally, by computing the rightmost eigenvectors only approximately, we would expect a slight perturbation of λ 1 , . . . , λ m−p , depending on the accuracy of this approximation. For a detailed perturbation analysis see e.g. [70].
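A sketch of the block-tuned BFGS-like update follows (our rendering of the kind of update used in [7,50]; the matrices are random stand-ins for H̃ N E and M N E −1 ): after the update, the columns of V are eigenvectors of the preconditioned matrix with eigenvalue ν.

```python
import numpy as np

rng = np.random.default_rng(6)
m, p = 30, 4
B = rng.standard_normal((m, m))
Hne = B @ B.T + m * np.eye(m)            # stand-in for the SPD matrix H_NE
P0 = np.diag(1.0 / np.diag(Hne))         # stand-in for the initial M_NE^{-1}
V = rng.standard_normal((m, p))          # e.g. approximate rightmost eigenvectors
nu = 1.0                                 # target eigenvalue for the V-directions

# BFGS-like (inverse) update: exact on the subspace spanned by V,
# unchanged (up to projection) on its H-orthogonal complement
Sm = np.linalg.inv(V.T @ Hne @ V)
P = (nu * V @ Sm @ V.T
     + (np.eye(m) - V @ Sm @ V.T @ Hne) @ P0 @ (np.eye(m) - Hne @ V @ Sm @ V.T))

# columns of V are now eigenvectors of P H with eigenvalue nu
assert np.allclose(P @ (Hne @ V), nu * V)
```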
Updating a given preconditioner and reusing it, in the framework of the solution of sequences of linear systems, has been analyzed in various papers. Among others, we quote [71], in which an adaptive automated procedure is devised for determining whether to use a direct or iterative solver, whether to reinitialize or update the preconditioner, and how many updates to apply. In [69] the update involves an inexact LU factorization of a given matrix in the sequence, while [46] works in the context of slightly varying coefficient matrices in the sequence (which is not the case in IP optimization). More related to the Interior Point iteration are [3,4], where the LDL T factorization of the saddle point matrix is updated at each IP iteration, and cheap approximations of the initial Schur complement matrix are provided.

Spectral analysis: QP case
In the MINRES solution of QP instances the system matrix is that of (5.1), while the preconditioner is (5.6). The following theorem characterizes the eigenvalues of M AS −1 H AS in terms of the extremal eigenvalues of the preconditioned (1,1) block of (5.1), F̃ k −1 F k , and of M N E −1 H̃ N E as described by Theorem 5.1. We will work with (symmetric positive definite) similarity transformations of these matrices, and represent an arbitrary element in the range of the Rayleigh quotient of each of them; similarly, an arbitrary element of q(M N E ) is denoted accordingly.

Theorem 5.2 Let k be an arbitrary iteration of IP-PMM. Then, the eigenvalues of M −1 AS H AS lie in the union of the following intervals:
Proof The proof of this theorem, following an idea in [5], can be found in [11].

Remark 5.3
It is well known that a pessimistic bound on the convergence rate of MINRES is obtained when the sizes of I − and I + are roughly the same [40]. In our case, as usually β F ≪ β N E , we can assume that the length of both intervals is roughly √β N E . As a heuristic we may therefore use [30, Theorem 4.14], which predicts the reduction of the residual in the M AS −1 -norm in the case where both intervals have exactly equal length. This then implies that

Multiple BFGS-like updates of the constraint preconditioner
In order to avoid the factorization (1.5) at a certain IP iteration k, we construct a preconditioner for the KKT system at that iteration by updating a CP computed at an iteration i < k. We extend to KKT systems and CPs the preconditioner updating technique for SPD matrices presented in [39], which in turn exploits ideas from [52,67]. Let us consider, as before, S 1 ∈ R n×q such that rank(S 1 ) = q and AS 1 = 0, leading to the update (6.5), which is analogous to the BFGS update discussed in [39] for SPD matrices. It is easy to see that this rank-2q update allows the preconditioned matrix to have at least q eigenvalues equal to 1, by directly verifying that

P upd = S(S T H S) −1 S T + (I − S(S T H S) −1 S T H )P(I − H S(S T H S) −1 S T ),
We also prove the most important property of M upd , i.e. that the rank-2q update (6.4) (or, equivalently, (6.5)) produces a CP. To this aim, we use the lower triangular Cholesky factor L S of the matrix S T H S.

Theorem 6.1 The matrix M upd given in (6.4) is a CP for the matrix H in (3.2).
Proof We show that the update (6.4) involves only the (1, 1) block of M and that M upd is nonsingular; hence the thesis holds. Let us split S into two blocks; from (6.3) it follows that AS 1 = 0. It then follows that the update affects only the (1, 1) block, which becomes D upd = D + Δ, where Δ denotes the rank-2q correction. In order to prove that M upd is nonsingular, we consider a matrix Z ∈ R n×(n−m) whose columns span the nullspace of A and prove that Z T D upd Z is SPD (see, e.g., [25]). Indeed, by using the Sherman–Morrison–Woodbury formula, we obtain that D upd is SPD. This concludes the proof.
The unit eigenvalues of the preconditioned matrix P upd H , in view of Theorem 3.1, are those of

(Z T G Z )w = λ(Z T D upd Z )w,   (6.11)

where Z ∈ R n×(n−m) spans the nullspace of A. Following [8], the following results on the eigenvalue distribution of P upd H can be proved: Theorem 6.2 Let P upd be the matrix in (6.5). Then P upd H has an eigenvalue at 1 with multiplicity at least 2m + q.
The next theorem shows that the nonunit extremal eigenvalues of D upd −1 G are bounded by the extremal eigenvalues of D −1 G. Thus, by Theorem 3.1, we expect the application of P upd to H to yield better spectral properties than the application of P. Theorem 6.3 Let D upd be as in (6.9); then any eigenvalue of D upd −1 G satisfies Proof See [8].
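The construction can be checked on a toy KKT system (random data of our own choosing; S is built from q nullspace vectors of A, i.e. with its second block equal to zero): the rank-2q update leaves the columns of S as unit eigenvectors and raises the multiplicity of the unit eigenvalue to at least 2m + q.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, q = 10, 3, 2
B = rng.standard_normal((n, n))
G = B @ B.T + n * np.eye(n)
A = rng.standard_normal((m, n))
D = np.diag(np.diag(G))

H = np.block([[G, A.T], [A, np.zeros((m, m))]])
M = np.block([[D, A.T], [A, np.zeros((m, m))]])
P = np.linalg.inv(M)                       # CP computed at an earlier iteration

# S = [S1; 0] with the q columns of S1 taken from the nullspace of A
Z = np.linalg.svd(A)[2][m:].T              # n x (n - m) nullspace basis
S = np.vstack([Z[:, :q], np.zeros((m, q))])

SHSi = np.linalg.inv(S.T @ H @ S)
I = np.eye(n + m)
P_upd = (S @ SHSi @ S.T
         + (I - S @ SHSi @ S.T @ H) @ P @ (I - H @ S @ SHSi @ S.T))

# the columns of S become eigenvectors of P_upd H with eigenvalue 1 ...
assert np.allclose(P_upd @ (H @ S), S)
# ... and the unit eigenvalue has multiplicity at least 2m + q
eigs = np.linalg.eigvals(P_upd @ H)
assert np.sum(np.abs(eigs - 1.0) < 1e-3) >= 2 * m + q
```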
We defer a discussion on the choice of the columns of the matrix S to Sect. 7.3.

Numerical results
Extensive numerical results regarding all the proposed inexact block preconditioners can be found in [8,11-13]. We briefly report some examples in which the advantages of the proposed inexact CP variants are evident. The problems we consider, whose characteristics are reported in Table 1, are taken from the CUTEst [37] collection or are modifications of problems therein.

Inexact Jacobian
In the definition of the preconditioner M I J we used the following dropping rule to determine the matrix E, where A j denotes the jth column of A.
In other words, we drop an element from the matrix A if it is below a prescribed tolerance and lies outside a fixed band. The first requirement prevents E from becoming too large, which would drive the eigenvalues away from unity (see the bounds in Theorem 4.1). The second requirement attempts to control the fill-in of Ã Ã T and hence of its Cholesky factor L.
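One possible rendering of this dropping rule is sketched below; the relative-tolerance test and the band |i − j| ≤ nband are one concrete reading of the rule, and the paper's actual band definition may differ.

```python
import numpy as np

def sparsify(A, drop=0.1, nband=5):
    """Drop A[i, j] when it is relatively small AND lies outside a band
    around the diagonal (an illustrative reading of the dropping rule)."""
    m, n = A.shape
    col_norms = np.linalg.norm(A, axis=0)
    i, j = np.indices((m, n))
    small = np.abs(A) < drop * col_norms      # relative dropping test
    outside_band = np.abs(i - j) > nband      # band test (one possible choice)
    return np.where(small & outside_band, 0.0, A)

rng = np.random.default_rng(8)
A = rng.standard_normal((20, 50))
A_tilde = sparsify(A)

# many small entries are removed ...
assert np.count_nonzero(A_tilde) < np.count_nonzero(A)
# ... but entries inside the band are always kept
i, j = np.indices(A.shape)
band = np.abs(i - j) <= 5
assert np.all(A_tilde[band] == A[band])
```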
The following results refer to runs, on an Intel Xeon PC at 2.80 GHz with 2 GB RAM, of a pure FORTRAN version of the HOPDM code [35], compiled with g77 and the -O4 option. We solved the SQP2500_2 problem both with the exact CP (with PCG as the inner solver) and with the inexact Jacobian constraint preconditioner (IJCP, in combination with QMRs [32]).
The results are reported in Table 2, which shows the impressive gain in CPU time provided by the inexact preconditioner. The number of inner iterations is (clearly) larger using IJCP; however, the extreme sparsity of the preconditioner considerably reduces the cost of a single iteration.
To show how the drop and nband values may affect the distribution of the eigenvalues of M I J −1 H , we have plotted in Fig. 1 all the eigenvalues of the preconditioned matrix on the complex plane. In the "no-drop" case (exact constraint preconditioner), the eigenvalues are all real (the unit eigenvalues are 4007 > 2m = 4000, as expected). The effect of dropping is also shown in Table 3, where we report the number of unit eigenvalues, the maximum distance from unity (|ε|, see Theorem 4.1), and the smallest real part of all the eigenvalues.

Inexact NE preconditioner
The experiments reported in this section were conducted on a PC with a 2.2 GHz Intel Core i7 processor (hexa-core) and 16 GB RAM; the MATLAB version used was R2019a. We set the tolerance on the PCG/MINRES relative residual to 10 −4 and the maximum number of PCG (MINRES) iterations to 100 (300).

Low-rank updates and dynamic refinement
At each IP iteration we check the number of non-zeros of the preconditioner used in the previous iteration. If this number exceeds some predefined constant (depending on the number of constraints m), we perform certain low-rank updates of the preconditioner, to ensure that its quality is improved without using too much memory. In such a case, the tasks sketched in Algorithm LRU-0 are performed. Then, at every Krylov iteration, the computation of the preconditioned residual r̃ = M −1 r requires the steps outlined in Algorithm LRU-1.
Algorithm LRU-0 (Low-Rank Updates-0: before the Krylov solver iteration)
Algorithm LRU-1 (Low-Rank Updates-1: computation of the preconditioned residual)
In our implementation, the first step of Algorithm LRU-0 is performed using the restarted Lanczos method through the built-in MATLAB function eigs, requesting 1-digit accurate eigenpairs. We finally remark that a good approximation of the largest eigenvalues of the preconditioned matrix could be extracted for free [9] during the iterative solution of the corrector linear system, and used to accelerate the predictor linear system by the low-rank correction. This approach would save the cost of computing the eigenpairs, but would provide acceleration in the second linear system only.
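In Python, the first step of Algorithm LRU-0 could be mimicked with SciPy's eigsh (the analogue of MATLAB's eigs used in the text); the matrices below are random stand-ins for H N E and M N E :

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

m, p = 100, 5
R = sp.random(m, m, density=0.05, random_state=9)
H = (R @ R.T + 10 * sp.eye(m)).tocsc()     # stand-in for the normal equations matrix
Mp = sp.diags(H.diagonal()).tocsc()        # stand-in for M_NE (diagonal here)

# rightmost eigenpairs of Mp^{-1} H via the generalized problem H v = lambda Mp v,
# requested with 1-digit accuracy as in the text
vals, V = eigsh(H, k=p, M=Mp, which='LM', tol=1e-1)
assert vals.shape == (p,) and V.shape == (m, p)
assert np.all(vals > 0)                    # SPD pencil: Ritz values stay positive
```

The loose tolerance keeps the Lanczos cost low, which matters because the eigenpairs are only used to build a coarse low-rank correction.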
The quality of both preconditioners in (5.6) and (5.5) depends heavily on the quality of the approximation of the normal equations. If the Krylov method converged quickly at the previous IP-PMM iteration (relative to the maximum allowed number of Krylov iterations) while requiring a substantial amount of memory, then the preconditioner quality is lowered (i.e., C_{k+1} > C_k). Similarly, if the Krylov method converged slowly, the preconditioner quality is increased (i.e., C_{k+1} < C_k). If the number of non-zeros of the preconditioner exceeds a predefined large constant (depending on the available memory) and the preconditioner is still not good enough, we further increase the preconditioner's quality (i.e., we decrease C_k), but at a very slow rate. This behavior is observed in practice very close to IP convergence, especially on large-scale problems.
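A minimal sketch of such a dynamic tuning rule follows; the thresholds, factors, and parameter names are hypothetical, chosen only to illustrate the logic described above (a smaller C_k yields a denser, more accurate preconditioner).

```python
def update_quality(C_k, krylov_iters, max_iters,
                   nnz_prec, nnz_budget, nnz_hard_cap):
    """Hypothetical rule mimicking the dynamic refinement of C_k.

    C_k controls the preconditioner accuracy (smaller = denser, better);
    nnz_budget/nnz_hard_cap are illustrative memory thresholds."""
    if krylov_iters < 0.3 * max_iters and nnz_prec > nnz_budget:
        # Fast convergence but heavy memory use: lower the quality.
        return C_k * 2.0
    if krylov_iters > 0.8 * max_iters:
        if nnz_prec > nnz_hard_cap:
            # Slow convergence but near the memory limit: refine slowly.
            return C_k / 1.1
        # Slow convergence with memory to spare: refine aggressively.
        return C_k / 2.0
    return C_k
```

The slow-refinement branch corresponds to the behavior observed close to IP convergence on large-scale problems, where the preconditioner is already near the memory limit.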
To show the efficiency of this variant, we first present in Table 4 the results of the M_NE preconditioner for the LP-NUG-20 and LP-NUG-30 problems.
From this table we can see that the M_NE preconditioner (plus low-rank correction) is able to keep the number of inner iterations very low in both problems. Note that the largest instance cannot be solved by an exact Constraint Preconditioner. In order to clarify the benefit of the low-rank (LR) updates, in Table 5 we report the results of solving these linear systems with the low-rank strategy and different accuracies/numbers of eigenpairs (LR(p, tol) meaning that we approximate p eigenpairs with eigs with tolerance tol). The best choice, p = 10 with accuracy 0.1, improves the M_NE preconditioner both in terms of linear iterations and total CPU time.
To summarize the comparison of the two approaches, we include Fig. 2. It contains the performance profiles of the two methods over the 26 largest linear programming problems of the QAPLIB, Kennington, Mittelmann, and Netlib libraries, for which at least one of the two methods terminated successfully. In particular, Fig. 2a reports the performance profiles with respect to CPU time. As one can observe, IP-PMM with factorization was able to solve only 84.6% of these problems, due to excessive memory requirements (namely, problems LP-OSA-60, LP-PDS-100, RAIL4284, and LP-NUG-30 were not solved for lack of memory). As expected, however, it converges in fewer iterations on most problems that are solved successfully by both methods. Moreover, IP-PMM with PCG is able to solve every problem that is successfully solved by IP-PMM with factorization, and it manages to do so in significantly less time, as can be observed in Fig. 2a. Notice that we restrict the comparison to large-scale problems only, since this is the case of interest.
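The profiles in Fig. 2 follow the standard Dolan–Moré construction. A minimal sketch of how such profiles are computed (illustrative, not the scripts used for the figure):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-Moré performance profile for a solvers-by-problems cost matrix.

    T[i, j] is the cost (e.g. CPU time) of solver i on problem j, with
    np.inf marking a failure. Returns, for each solver and each tau in
    taus, the fraction of problems solved within a factor tau of the
    best solver on that problem."""
    best = np.min(T, axis=0)            # best cost per problem
    ratios = T / best                   # performance ratios r_{i,j}
    return np.array([[np.mean(ratios[i] <= tau) for tau in taus]
                     for i in range(T.shape[0])])

# Toy example: solver 0 fails on the third problem (inf cost).
T = np.array([[1.0, 2.0, np.inf],
              [2.0, 1.0, 1.0]])
prof = performance_profile(T, [1.0, 2.0])
```

The value at tau = 1 gives the fraction of problems on which a solver is the fastest; the limit for large tau gives the fraction it solves at all, which is how the 84.6% figure above is read off the profile.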
Finally, with regard to convex quadratic problems, the IP-PMM method with MINRES as the inner iterative solver and M_AS as the preconditioner was able to solve more than 99% of the Maros-Mészáros test set [49], which comprises 127 convex quadratic programming problems. See [11] for details.

Low-rank update of an exact CP
To experimentally analyze this CP variant, we first focus on the choice of the matrix S needed for building P_upd for the k-th KKT system in the sequence under consideration, by suggesting a number of choices. BFGS-P. Matrix S used to build P_upd for the KKT system at the k-th iteration is obtained by setting its columns equal to the first q H_{k−1}-orthonormal directions constructed during the PCG solution at IP step k − 1. Since we have experimentally verified that the PCG directions can rapidly lose the property of being mutually H_{k−1}-orthogonal when H_{k−1} is highly ill-conditioned, we perform a selective reorthogonalization of the directions forming S during their computation. More precisely, p_i is H_{k−1}-reorthogonalized against p_l, with l < i, whenever needed. We now present a different choice of S, which deserves a detailed explanation. BFGS-C. Matrix S used to build P_upd for the KKT system at the k-th iteration is obtained by setting its columns equal to the first q H_k-orthonormal directions constructed during the PCG solution of the current IP step k. The suffix -C in BFGS-C refers to the fact that PCG directions from the current system are used to define S. As for BFGS-P, a selective H_k-reorthogonalization of the directions is performed in the first q PCG iterations.
It is easy to show [8] that in exact arithmetic the BFGS-C procedure is equivalent to PCG with preconditioner P. Thus, BFGS-C may appear completely useless. Nevertheless, it is useful in finite-precision arithmetic. Freezing the preconditioner P computed at a certain IP iteration and using it in subsequent IP iterations yields directions which rapidly and dramatically lose orthogonality. BFGS-C appears to mitigate this behavior, improving the performance of PCG.
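The BFGS-like update underlying these procedures can be sketched as follows. If S has H-orthonormal columns (S^T H S = I), the updated inverse preconditioner P_upd^{-1} = (I − S S^T H) P^{-1} (I − H S S^T) + S S^T satisfies P_upd^{-1} H s = s for every column s of S, i.e., the preconditioned operator has eigenvalue 1 on span(S). The dense sketch below illustrates this under that assumption; it is not the authors' code, and the matrices are synthetic stand-ins.

```python
import numpy as np

def h_orthonormalize(S0, H):
    # Rescale the columns of S0 so that S^T H S = I (Cholesky of the Gram matrix).
    L = np.linalg.cholesky(S0.T @ H @ S0)
    return S0 @ np.linalg.inv(L).T

def apply_updated_prec(P_inv_apply, S, H, r):
    """Apply P_upd^{-1} = (I - S S^T H) P^{-1} (I - H S S^T) + S S^T,
    assuming S^T H S = I; P_inv_apply applies the seed preconditioner."""
    t = r - H @ (S @ (S.T @ r))
    z = P_inv_apply(t)
    return z - S @ (S.T @ (H @ z)) + S @ (S.T @ r)

rng = np.random.default_rng(1)
n, q = 40, 5
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)            # SPD stand-in for the system matrix
S = h_orthonormalize(rng.standard_normal((n, q)), H)

# Eigenvalue-1 property on span(S), with identity seed preconditioner:
s = S[:, 0]
out = apply_updated_prec(lambda x: x, S, H, H @ s)
```

In BFGS-P the columns of S come from the PCG run at step k − 1, in BFGS-C from the current step k; the update formula itself is the same, which is why preserving the H-orthonormality of S (via selective reorthogonalization) matters so much in practice.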
In order to illustrate this situation, we discuss some numerical results obtained with one of the 35 KKT systems arising in the solution, by an IP procedure, of the CVXQP3 convex QP problem, with dimensions n = 20000 and m = 15000 (see Sect. 7 for the details).
We considered PCG with the following preconditioning procedures:
• BFGS-C with q = 50 and the seed CP recomputed every sixth IP iteration;
• the CP recomputed from scratch every sixth IP iteration and frozen in the subsequent five iterations (henceforth referred to as the FIXED preconditioning procedure).
In order to make a fair comparison, we performed a selective reorthogonalization of the first 50 directions during the execution of PCG with FIXED preconditioning. We also ran PCG with FIXED preconditioning without any reorthogonalization. We focus on the KKT system at the 24-th IP iteration; hence the seed preconditioner comes from the 19-th IP iteration. Figure 3 shows the normalized scalar products (p_j^T H p_l) / (‖p_j‖_H ‖p_l‖_H), l ≤ 50, j > l, (7.1) for BFGS-C and both versions of the FIXED procedure. In the latter case, we observe a quick loss of orthogonality with respect to the first q directions, even when the reorthogonalization procedure is applied. Conversely, BFGS-C appears to better preserve orthogonality. As a consequence, the number of PCG iterations for BFGS-C is smaller than in the other cases (122 iterations for BFGS-C vs 176 and 196 for FIXED with and without reorthogonalization, respectively). Applying the FIXED preconditioning approach with a selective reorthogonalization of each PCG direction with respect to the first 50 ones, we obtained a worse PCG behavior than with the BFGS-C strategy. This observation led us to consider a further preconditioning procedure. DOUBLE. We apply q PCG iterations to the current system H_k u = d_k by using a CP, say P^(0)_upd, built with the BFGS-P procedure, i.e., by updating a seed preconditioner P with the first q normalized PCG directions obtained at the (k − 1)-st IP iteration. Then we restart PCG from the last computed iterate u_q, with a preconditioner obtained by updating P^(0)_upd with S_C, where S_C contains the normalized directions computed in the first q PCG iterations. As for the previous preconditioning procedures, a selective reorthogonalization is applied to the PCG directions used to build P^(0)_upd and to those used for P_upd.
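The loss-of-orthogonality measure (7.1) plotted in Fig. 3 can be sketched as follows; the matrix and directions below are synthetic stand-ins for H and the PCG directions.

```python
import numpy as np

def max_h_orthogonality_loss(P, H, l_max=50):
    """Largest normalized product |p_j^T H p_l| / (||p_j||_H ||p_l||_H)
    over l <= l_max, j > l, with the PCG directions as columns of P;
    values near 0 mean mutual H-orthogonality is preserved."""
    HP = H @ P
    norms = np.sqrt(np.einsum('ij,ij->j', P, HP))   # ||p_j||_H per column
    G = np.abs(P.T @ HP) / np.outer(norms, norms)   # normalized Gram matrix
    q = min(l_max, P.shape[1])
    return max(G[j, l] for l in range(q) for j in range(l + 1, P.shape[1]))

rng = np.random.default_rng(2)
n, k = 60, 12
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)                         # SPD stand-in

# H-orthonormal directions (via Cholesky of the Gram matrix) vs random ones.
S0 = rng.standard_normal((n, k))
L = np.linalg.cholesky(S0.T @ H @ S0)
loss_orth = max_h_orthogonality_loss(S0 @ np.linalg.inv(L).T, H)
loss_rand = max_h_orthogonality_loss(rng.standard_normal((n, k)), H)
```

For exactly H-orthonormal columns the measure is at rounding level, while generic directions give O(1) values; the FIXED procedure drifts from the former regime toward the latter as the preconditioner ages.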
The sequences of KKT systems have been obtained by running the Fortran 95 PRQP code, which implements an infeasible inexact potential-reduction IP method [18,20,25], using 10^{-6} and 10^{-7} as the tolerances on the relative duality gap and on the relative infeasibilities, respectively. Within PRQP, the PCG iterations have been stopped as soon as the residual satisfies a relative reduction governed by a tolerance τ_1, which depends on the duality gap value at the current IP iteration (see [19] for the details). A maximum of 600 PCG iterations has been allowed. The preconditioning procedures FIXED, BFGS-P, BFGS-C and DOUBLE have been applied with different values of s and q, while the PCG solver has also been applied with the CP recomputed from scratch for each KKT system (RECOM).
The code has been implemented in Fortran 95 and run on an Intel Core i7 CPU (2.67 GHz) with 6 GB of RAM and 8 MB of cache. The factorization of the Schur complement has been performed by the MA57 routine from the HSL Mathematical Software Library (http://www.hsl.rl.ac.uk).
In Table 6 we report some results concerning the application of the preconditioning procedures, including the FIXED one, to CVXQP3: the cumulative number of PCG iterations (PCG iters) and the relevant CPU times (in seconds), namely for the factorization of the Schur complement (Tf-Schur), for the application of the seed preconditioner P within PCG (Ta-seed), for the preconditioner updates and the reorthogonalization steps (Tupd), and the total time (Ttot). The factorization of the Schur complement is rather expensive, and the recomputation of the CP from scratch produces by far the largest execution time, even though it yields a much smaller number of PCG iterations than the other preconditioning procedures. Furthermore, the updating procedures generally produce a significant reduction of the number of iterations with respect to the FIXED one, and hence smaller execution times. BFGS-C and DOUBLE turn out to be the most efficient procedures.
The PCG convergence histories for the KKT system at the 24-th IP iteration, with the updating procedures and the FIXED one for s = 8 and q_max = 50, clearly show how each procedure compares with the others in terms of PCG iterations (see Fig. 4): the best preconditioning procedure is DOUBLE, followed by BFGS-P, BFGS-C and then FIXED. This is the general behavior, although sporadic failures have been observed with BFGS-P and DOUBLE in cases where BFGS-C and FIXED work.

Concluding remarks
We have presented three inexact variants of the Constraint Preconditioner for solving convex linear and quadratic programming problems of very large size. Each of these procedures aims at reducing the cost of the exact factorization of the CP, which is often prohibitive for realistic problems. We list below the pros and cons of these approaches, to help drive the reader to the best choice for the problem at hand.
• Inexact Jacobian. This preconditioner is suggested when the coefficients of the constraint matrix display variations in magnitude and/or the matrix displays some band structure. Solvers like PCG or MINRES cannot be used, and theoretical convergence estimates are problematic due to the presence of complex eigenvalues in the preconditioned matrix.
• Inexact NE preconditioner. This is a promising variant which controls the sparsity of the Cholesky factor of the (approximate) Schur complement matrix by means of a sequence {C_k}. At each IP iteration k, C_k is defined dynamically as a function of the fill-in and the number of Krylov iterations at the previous iteration k − 1. Krylov solvers such as PCG (linear and quadratic separable problems) or MINRES (quadratic problems) are allowed.
• Low-rank updates of an exact CP. This method can be employed whenever the exact CP factorization is expensive but applicable. It considerably reduces the main cost of the overall algorithm, which is the factorization of the Schur complement. PCG is allowed also for the augmented system, since updated CPs are still CPs (see Theorem 3.1).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.