Dynamic Non-diagonal Regularization in Interior Point Methods for Linear and Convex Quadratic Programming

In this paper, we present a dynamic non-diagonal regularization for interior point methods. The non-diagonal aspect of this regularization is implicit, since all the off-diagonal elements of the regularization matrices are cancelled out by those elements present in the Newton system, which do not contribute important information in the computation of the Newton direction. Such a regularization has multiple goals. The obvious one is to improve the spectral properties of the Newton system solved at each iteration of the interior point method. On the other hand, the regularization matrices introduce sparsity to the aforementioned linear system, allowing for more efficient factorizations. We also propose a rule for tuning the regularization dynamically based on the properties of the problem, such that sufficiently large eigenvalues of the non-regularized system are perturbed insignificantly. This alleviates the need of finding specific regularization values through experimentation, which is the most common approach in the literature. We provide perturbation bounds for the eigenvalues of the non-regularized system matrix and then discuss the spectral properties of the regularized matrix. Finally, we demonstrate the efficiency of the method applied to solve standard small- and medium-scale linear and convex quadratic programming test problems.


Introduction
In this paper, we are concerned with finding the solution of linear and convex quadratic programming problems, using an infeasible primal-dual interior point method.Such methods are called infeasible due to the fact that they allow intermediate iterates, produced by the algorithm, to be infeasible for the problem under consideration.They are called primal-dual, because they operate on both the primal and the dual space.Interior Point Methods (IPMs) deal with the inequality constraints of the problem by introducing logarithmic barriers in the objective, which penalize when any of the inequality constraints is close to being violated.At each iteration, the optimality conditions of the barrier problems are formed and one (or a few) steps of Newton method are applied to them.There is vast available literature on interior point methods and we refer the interested reader to [12] for an extended literature review.
Most implementations transform the Newton system into a symmetric indefinite system of linear equations, which when solved, determines the Newton direction.The latter constitutes the main computational effort and challenge for IPMs.At every iteration of the method, the system matrix as well as the right hand side change.There are three main reasons indicating why solving such a system can be challenging.The most obvious one, is that the dimension of such systems can be very large, which makes the task of solving them expensive in terms of processing time and memory requirements.A second important challenge, inherent in interior point methods, is that as the algorithm approaches optimality, the systems that we have to solve become increasingly ill-conditioned.Finally, a rank deficient constraint matrix can result in a singular Newton system matrix.It is well known that the latter two difficulties can be addressed by the use of some regularization technique, at the expense of solving a perturbed problem, [2].
Such regularization techniques, embedded in the interior-point framework for solving linear and convex quadratic programming problems, have been previously proposed in the literature.For example, in [1], a dynamic primal-dual regularization for interior point methods was derived.The authors solve a slightly altered symmetric indefinite system, to which a diagonal perturbation (regularization) has been introduced.This perturbation transforms the symmetric indefinite matrix into a quasi-definite one.It is proved in [28], that such matrices are strongly factorizable.Hence, the regularized system can be factorized efficiently.The authors interpreted these regularization matrices as adding proximal terms to the primal and dual objective functions.The values of these perturbations are chosen dynamically during the factorization of the system matrix, where potentially unstable pivots are regularized stronger (using some pre-specified "large" regularization value), while safer ones are almost not regularized at all.In [11], based on this proximal point interpretation given in [1], the authors proposed a primal-dual pair of regularized models, where the duality correspondence arises by setting the regularization variables as proximal terms.They observed that for specific parameter values, this primal-dual regularized model is exact, that is it yields an optimal solution which is also an optimal solution of the respective non-regularized primal-dual pair.There, the authors introduced two uniform diagonal regularization matrices whose values were tuned experimentally over a variety of problems.A similar regularization was also used in [25].It is worth mentioning that similar ideas have also been applied in IPMs suitable for general non-linear optimization problems (see [4,5]).
In this paper, we are taking a different approach.We observe that when an IPM progresses and approaches optimality, significant part of the primal-dual variables approaches zero fast and hence becomes negligible.Yet it is not straightforward how the algorithm might exploit this feature.The proposed method attempts to do so.The method dynamically chooses a suitable regularization for the symmetric indefinite system and effectively "annihilates" the effects of those parts of it, which do not contribute important information to the computation of the Newton direction.The proposed technique involves non-diagonal regularization matrices.However, their non-diagonal terms are only implicit; they do not need to be computed because they are immediately cancelled by other terms present in the linear system.Hence, the effect of adding such non-diagonal regularization is making the Newton system more sparse and therefore easier.In contrast to other previously developed approaches, this regularization is dynamically tuned based on the problem properties.We develop an approach which attempts to capture the needs of an arbitrary problem and regularize its system matrix accordingly.This alleviates the problem of finding specific regularization values that work well over a variety of problems.In general, the proposed approach is very conservative and regularizes the system as little as possible, while ensuring numerical stability.
The rest of the paper is organized as follows.In Section 2, we summarize our notation and present the adopted model, based on which, we define our regularization matrices, firstly for linear and then for convex quadratic programming problems.For both cases, we provide arguments indicating why the proposed dynamic tuning of the regularization matrices is expected to introduce a controlled perturbation to the problem.In Section 3, we provide a spectral analysis, which shows the effect of the proposed regularization and gives specific bounds for the eigenvalues of the regularized system matrix.In Section 4, we provide the algorithmic scheme along with some implementation details and numerical results, and finally in Section 5 we derive our conclusions.

Notation
Given an arbitrary symmetric square matrix Q, we denote positive semi-definiteness (positive definiteness) by Q 0 (Q 0).We denote the Euclidean norm (2-norm) as • .Any other norm, will be specified by a subscript.For example, the ∞-norm, is denoted as • ∞ .We denote by e the column vector of ones of appropriate dimension.Given a set of indices, say B, e B denotes the vector of ones with dimension equal to the cardinality of B, that is: e B ∈ R |B| .For an arbitrary matrix, say A, A B denotes the sub-matrix whose columns and rows are indicated from the set of indices B. Similarly, A BN contains rows of A that belong in B and columns of A that belong in N .Iterates of the algorithm are denoted as w k = (x k , r k , s k , y k , z k ), where k ∈ N is the iteration counter.An optimal solution of the problem, is denoted as w * = (x * , r * , s * , y * , z * ).Given a vector x ∈ R n , we denote by X ∈ R n×n the diagonal matrix that contains x in its diagonal.To simplify the notation, if a matrix, say X k , depends on the iteration k, we will omit the sub-script and state the dependence whenever it is not obvious.When an arbitrary function, say f , depends on some parameter, say η, we denote this relation as: f η (•).Given an arbitrary square matrix B, off(B) denotes the square matrix that has the same off-diagonal elements as B and has zeros in its diagonal.Similarly, diag(B) = B − off(B).The j-th diagonal element of a square matrix B will be denoted as: (B) jj .B H denotes the conjugate (Hermitian) transpose of matrix B. We denote the smallest (largest) eigenvalue of an arbitrary matrix B, by λ min (B) (λ max (B)).Similarly, the smallest (largest) singular value of an arbitrary matrix B is denoted by σ min (B) (σ max (B)).Finally, the set of all eigenvalues (spectrum) of an arbitrary matrix B, is denoted as λ(B).

Problem Formulation
We consider the following primal-dual pair of convex quadratic programming problems in the standard form: Without loss of generality, we assume that m ≤ n.Note that if Q = 0, (P)-(D) is a primal-dual pair of linear programming problems.If the problems under consideration are feasible, it can easily be verified that there exists an optimal primal-dual triple (x, y, z) satisfying the Karush-Kuhn-Tucker (KKT) optimality conditions for this primal-dual pair (see for example Prop.2.3.4 in [6]).
Our model is based on the developments in [11], [1] and [25].More specifically, by applying a generalized primal-dual proximal point method on (P), as in ( [11,5]), one can get the following pair of primal-dual regularized problems: where s ∈ R n , r ∈ R m are auxiliary variables introduced from the primal-dual application of the proximal point method and, R p 0 ∈ R n×n , R d 0 ∈ R m×m are the primal and dual regularization matrices respectively, that will be specified later.The duality correspondence follows after taking r = y − ỹ and s = x − x, where ỹ and x are estimates of the dual and primal solutions y * , x * respectively.Of course R p = 0, R d = 0 recovers the initial pair (P)-(D).In [11], the authors observe that this pair of regularized problems is exact under some conditions on the estimates x, ỹ.In such a case, an optimal solution of (P r )-(D r ) is also an optimal solution of (P)-(D).For more information about exactness of regularization, we refer the interested reader to [10].
In [11,4,5], models similar to (P r )-(D r ) are used, restricted however in the case where R p = ρI and R d = δI, for some positive values δ, ρ.It is a well known fact, proved for the first time in [23], that these regularization schemes can be interpreted as the primal-dual application of the standard proximal point method.However, our model does not specify the structure of the regularization matrices R p , R d .The only requirement is that these matrices are positive definite.As we commented previously, this model can be interpreted as the application of a generalized primal-dual proximal point method.Such methods, instead of adding the typical 2-norm in the objective function, make use of the so called D-functions.In fact, one could easily verify that any elliptic norm (defined by an arbitrary positive definite matrix) satisfies the conditions, given in [7,16], for being a D-function.
In other words, our algorithm adds an elliptic norm in the objective, instead of the typical 2-norm.The focus of the paper however, prevents us from going deeper into these matters.For more about proximal point methods, we refer the reader to [22,13,23,16,7], and the references therein.

The Newton System
In order to solve the problems presented in the previous sub-section, using interior point methods, we proceed by replacing the non-negativity constraints with logarithmic barriers in the objective.In view of the previous, we obtain the following primal-dual regularized barrier problems: in which non-negativity constraints x > 0 and z > 0 are implicit.
Forming the Lagrangian of the primal barrier problem, we get: ln(x j ). (2.3) Now, we can form the first order optimality conditions of the problems by taking the gradient of (2. 3) and equating it to zero, giving us the following block equations: By looking at the optimality conditions of the dual barrier problem, we see that the final two conditions are: We write the optimality conditions in the form of a function, F x,ỹ,µ (w) : R 3n+2m → R 3n+2m and we want to solve: at each IPM iteration, where w = (x, r, s, y, z), µ > 0 is the barrier parameter and σ ∈ ]0, 1[ is a centring parameter (which determines how fast µ is forced to decrease).We want to force µ → 0, since then, the solution of this system leads to the solution of (P r )-(D r ).Notice that (P r )-(D r ) is parametrized by the estimates x and ỹ.As observed in [11], if these estimates are close enough to some optimal solution of (P)-(D), then an optimal solution of (P r )-(D r ) is also an optimal solution of (P)-(D).At the beginning of the k-th iteration of the IPM, we have available the iterate and we choose a value for the centring parameter σ k ∈ ]0, 1[.Following the developments in [1,11,4,5], for proximal point methods, we update the estimates of x * , y * as x = x k , ỹ = y k .Next, Newton method is applied to the mildly nonlinear system (2.4).After evaluating the Jacobian of F x,ỹ,µ (w), the Newton direction is determined at each IPM iteration by solving a system of the following form: Notice that the matrices X, Z, R p and R d all depend on the iteration k of the algorithm.Once the Newton direction ∆w = (∆x, ∆r, ∆s, ∆y, ∆z) is computed, the algorithm chooses a step-length a k ∈ ]0, 1] and sets the new iterate to w k+1 = w k + a k ∆w.In order to compute the Newton direction efficiently, we want to eliminate some variables of (2.5).Since we set ỹ = y k , the second block equation of (2.5) gives: and if R d 0, we get the following relation: Similarly, by looking at the third block equation of (2.5) and substituting x = x k , we get: and if R p 0 we have that: ∆x = s k + ∆s. (2.7) Note that we always use either R d 0 or R d = 0 and similarly, either R p 0 or R p = 0. Hence, the previous two relations are either well-defined or absent.Using (2.6) and (2.7) to eliminate ∆r and ∆s, we can reduce (2.5) to the following system: Next, we proceed by eliminating ∆z.For that purpose, we have from the third row of (2.8) that: Substituting (2.9) into the first row of (2.8), we get the following reduced symmetric system (so called Augmented System): where Θ = XZ −1 .In the case of linear programming (Q = 0) or when solving quadratic separable problems (in which case Q is diagonal), it may be beneficial to further eliminate ∆x from (2.10), which will end up at the so called normal equations.However, one should note that this is not a good idea when it comes to general convex quadratic programming problems, since pivoting on the (1,1) block of (2.10) could result in a dense system, even in cases where both A and Q are sparse.
Having said that, we can eliminate ∆x by looking at the first block equation of (2.10), which gives: and by substituting (2.11) into the second row of (2.10), we get the normal equations: where in which the system matrix is symmetric and positive definite.
The proposed model differs from the one derived in [11] in that it allows the use of general positive definite regularization matrices.For example, if R p , R d are non-diagonal matrices, then this would amount to the primal and dual application of a generalized proximal point method that adds an elliptic norm in the objective, instead of the typical 2-norm that is employed in standard proximal point methods.Notice that at every iteration of the algorithm, R p , R d , x and ỹ are updated.In other words, (P r )-(D r ) represents a sequence of sub-problems.At every such sub-problem, we apply a single iteration of the interior point method.How R p and R d are updated will be presented in the following sub-section.

The Regularization Matrices
As IPM approaches optimality, the diagonal matrix Θ contains elements that converge to zero and others that diverge to infinity.This is because µ k → 0 and we force the complementarity conditions to be approximately satisfied (XZe ≈ σ k µ k e) .As a consequence, the matrices in (2.10) and (2.12) become extremely ill-conditioned.On top of that, it is often the case due to modelling choices, that the constraint matrix A is not of full row rank, which makes the system matrices singular.It is well known, as shown by Armand and Benoist [2], that both these problems can be addressed with the use of regularization.The most common approach in the literature, is the addition of two diagonal regularization matrices, say R p , R d , whose values are tuned experimentally over a variety of problems ( [25,1,11,2]).
Roughly speaking, the goals of a regularization method for IPMs are ([2, 1, 1. to improve the spectral properties of the matrices in (2.10) and (2.12), 2. without significantly perturbing the previous systems, 3. while preserving the sparsity of the problem and the computational efficiency of the method.
To the best of our knowledge, most of the regularization methods in literature manage to achieve the first and the third regularization goals, failing however to achieve the second goal with certainty.This is the case since these regularization methods are tuned experimentally.Hence, they do not rely on the properties of the problem itself, and as a consequence, such regularization values can only be good for some problems and poor for others.The proposed method takes a different approach, by introducing two non-diagonal regularization matrices R p and R d , which are tuned based on the properties of the problem.Of course one could argue that this may disturb the sparsity and as a consequence the computational efficiency of the method, however, these non-diagonal matrices are created implicitly.As we will show later, not only the sparsity is preserved, but in fact it is improved.
As we already mentioned, as IPM approaches optimality, the matrix Θ contains some very large and some very small elements.The proposed regularization exploits this inherent feature of the method and splits the columns of the problem matrix in two sets, say N and B such that: where |N | = n 1 and |B| = n 2 , with n 1 + n 2 = n.Notice that the previous splitting captures all the columns only if the method converges to a strictly complementary solution (that is the limit point satisfies: xT ẑ = 0 and xj + ẑj > 0, ∀ j).In the quadratic programming case, a strictly complementary solution may not exist.Hence, there might exist some indices j ⊆ {1, • • • , n} for which: x j → 0 and z j → 0. In such a case, it is unknown whether the value of Θ jj will be small or large.We can assume, without loss of generality, that any such indices will be classified as elements of B (although in practice this would depend on the value of Θ jj , as we will show later).Of course for the case of linear programming (Q = 0), it is a well-known fact (see for example [29]) that a strictly complementary solution always exists, if the problems are feasible.Moreover, as shown in [18,14], primal-dual IPMs converge to such an optimal solution.If a strictly complementary solution exists for the quadratic programming case, it is shown in [15], that an infeasible primal-dual IPM which reduces the constraints violation at the same rate as µ is reduced, produces iterates that converge to a strictly complementary solution.
In what follows, we present the construction of the regularization for the case of linear programming and then we suggest an extension for convex quadratic programming.

Linear Programming
For the case of linear programming we employ a dual regularization, that is, in (2.5) we set R p = 0 and only use R d 0 to improve the spectral properties of the problem.Given this set-up, and by permuting the columns so that the first n 1 of them correspond to indices in N while the remaining correspond to indices in B, the augmented system in (2.10) takes the form: where A N ∈ R m×n 1 and A B ∈ R m×n 2 .Pivoting on the first n 1 columns of (2.13), gives the partially reduced augmented system: Since we know that Θ N → 0, we expect that the magnitude of A N Θ N A T N will be small when the method approaches optimality.Intuitively, our goal is to create a regularization matrix that will implicitly absorb the off-diagonal elements of A N Θ N A T N (promoting sparsity) and regularize the system with values having a slightly larger magnitude to that of the elements which were absorbed.For this class of problems, we will focus on solving the normal equations.Given (2.14), we can form the normal equations by eliminating ∆x B , which gives the following system: We choose the following dual regularization matrix: where ∆ d is a diagonal matrix chosen such that R d 0 and diagonally dominant, that is: For computational efficiency and numerical stability, we choose ∆ d = δ d,k I m , with: Observe that the regularization matrix given in (2.15), strongly depends on the properties of the problem as well as on the iteration k of the IPM.In order to control which elements enter the set N , at every iteration k, we enforce the following condition: where reg thr,k is set to 1 at the beginning of the optimization (k = 0), and is decreased at the same rate as µ k (i.e.reg thr,k = O(µ k )).Once reg thr,K becomes smaller than a predefined value, say > 0, for some large K ≥ 1, we fix it to this value (reg thr,k = , ∀ k ≥ K).The choice of will be specified later.Note that (2.17) ensures that δ d,k < reg thr,k , at every iteration.In order to show that sparsity is improved, we form again the normal equations' matrix using the definition of R d to get: From the previous one can easily observe that the sparsity of the normal equations is improved, since some off-diagonal elements of the matrix have been absorbed by the regularization.
Since reg thr,k is not allowed to go to zero as µ k → 0, we would like to know how much we perturb the Newton system, by having it fixed to some value > 0, when the method is close to optimality.
In the rest of this subsection, we compute some perturbation bounds, which depend on the value of reg thr .

Motivation
Now that we have defined the regularization matrix for the case of linear programming problems, let us provide a motivation for this choice.Firstly, note that the proposed regularization has multiple objectives.On the one hand, we want to find a good criterion for tuning a uniform dual regularization matrix δ d,k I based on the properties of the problem, such that the non-regularized problem matrix is not perturbed significantly while its spectral properties are improved.On the other hand, we use this uniform dual regularization value as a cut-off point, for dropping the smallest off-diagonal elements in the normal equations matrix, improving the computational efficiency of the method.In what follows we will provide an analysis indicating why the uniform dual regularization that we introduce is expected not to perturb the problem significantly.Then, we will show that further dropping the off-diagonal elements introduces a controlled perturbation.
Based on the previous, let us assume for now that R d = δ d,k I, where δ d,k is defined as in (2.16).For simplicity of notation, we omit the iteration subscript in δ d and we let: We want to analyse the difference in the eigenvalues of the matrices M and M + E. For the rest of this sub-section, let λ i denote the i-th smallest eigenvalue of M , λi the i-th smallest eigenvalue of M +E, and λ i (t) the i-th smallest eigenvalue of M +tE, with t ∈ [0, 1].The smallest eigenvalues of M (in the absolute value sense) will be increased after the addition of E and this is of course desirable, since this was the main motivation for introducing the regularization.The following analysis provides perturbation bounds only for eigenvalues of M that satisfy |λ i | > 2 E .We will assume also that the eigenvalues that we analyse are simple (i.e.their algebraic multiplicity is 1).The analysis can be extended to multiple eigenvalues, however it gets unnecessarily complicated.Such an analysis is derived in the appendix of [20].Let us now state a lemma derived in [27].
Lemma 2.1.Let M , E be square Hermitian matrices.Denote by λ i (t) the i-th smallest eigenvalue of M + tE and consider the eigenvector function x(t) such that: As observed in [20], if the eigenvector x(t) has small components in the positions corresponding to the dominant elements of E, then ∂λ i (t) ∂t is expected to be small.Let us now provide the following lemma, based on the developments in [8].
Proof The proof follows exactly the developments in [8], but we provide it here for completeness.
From the second block equation of M x = λ i x, we have: where the latter is well defined since we have assumed that λ i = 0.By taking norms on both sides in the previous equation, we get: . Hence, we have: By solving the previous inequality, we get: which completes the proof.
The following lemma will be a useful tool for the analysis.We omit its trivial proof.
, where a > 0.Then, f (x) is a monotone increasing function for x > 0.
Let us now bound the second block of the eigenvector function x 2 (t) based on the developments in [8].
Lemma 2.4.Assume that λ i = 0 is the i-th smallest eigenvalue of M .Consider the eigenvector function x(t) such that: H ] H and assuming that |λ i | > 2 E , we have that: Proof We omit the proof which follows from Lemma 2.3 combined with the previous developments.
The interested reader can view [8], Lemma 2.8, for a detailed derivation which can directly be applied in our context.
Let us now derive the following theorem which bounds the difference between the i-th smallest eigenvalues of the matrices M and M + E respectively.Theorem 2.1.Let λ i and λi be the respective i-th smallest eigenvalues of M and M + E and define For every i such that |λ i | > 2 E we have that: Proof From Lemma 2.1 and Lemma 2.4 it follows that: The proof is complete.Note that, since φ i < 1, the latter is a tighter bound than the general bound provided by Weyl's inequality, given that the eigenvalue under consideration is larger than 2 E .From the previous results we can draw several useful observations.As we already stated, the smaller the components of x 2 (t) are, the smaller ∂λ i (t) ∂t is expected to be.But x 2 (t) is bounded by φ i .Hence, the smaller φ i is, the more insensitive the eigenvalue λ i is to the perturbation E = δ d .In fact, in the previous theorem we proved that the error in the eigenvalue is bounded by E φ 2 i .
Let us now examine the nature of φ i .Firstly, one can see that it depends on the norm of the constraint matrix A, and from Lemma 2.3 we can observe that it is monotone increasing with respect to the norm of A. What this tells us, is that the smaller the norm of the constraint matrix A is, the more insensitive the eigenvalues of matrix M are to the perturbation E. Of course the latter holds only for eigenvalues that are sufficiently larger than 2 E .On the other hand, from the definition of φ i , we can see that it is beneficial to have a small E , since then, most of the eigenvalues of M are expected to satisfy: We now shift our attention to the proposed tuning of the regularization parameters.From (2.17), the set of indices N is such that: max j (Θ N ) jj AA T ∞ ≤ reg thr .Also, from (2.16), we have that . By combining the previous, we get: Observe that, if AA T ∞ is large, we allow few columns to enter the partition N .In this case, φ i is expected to be close to 1 for most of the eigenvalues λ(M ).On the other hand, |N | is increased if the infinity norm of AA T is small, and in such a case, φ i is expected to be small for many eigenvalues of the system matrix M .A more sophisticated choice for the regularization value based on the derived bounds is possible, however, the proposed regularization has two goals, that is not to perturb the system significantly while introducing sparsity to the problem, and hence the definition of δ d is computationally advantageous for that.Note that taking advantage of the previously presented bounds indicates that the sufficiently large (in the absolute value sense) eigenvalues of the system matrix ( 2δ d ) will be perturbed almost insignificantly.If some eigenvalues of the matrix are very small, the previous arguments break down.We will derive lower bounds for these eigenvalues in the next section.
Having introduced the diagonal uniform regularization δ d I, let us examine the effect of further dropping the off-diagonal elements off(A N Θ N A T N ) from the normal equations (2.12).For that, we define K = AΘA T + δ d I and R = off(A N Θ N A T N ) and consider the following generalized eigenvalue problem: The previous is well defined since K 0. We will analyse the eigenvalues of Then from (2.18) and for some eigenvector u corresponding to the maximum eigenvalue, we would have: By adding u T diag(A N Θ N A T N )u to both sides of the previous inequality, we get: which is a contradiction.Hence λ max (K − 1 2 RK − 1 2 ) < 1.On the other hand, if we assume by contradiction that λ min (K − 1 2 RK − 1 2 ) ≤ −1, from (2.18) and for an eigenvector u corresponding to the minimum eigenvalue, we would get: However, using (2.16), we get δ d + R 0, hence −δ d u T u < u T Ru, which contradicts the previous inequality.Hence, λ min (K − 1 2 RK − 1 2 ) > −1.Now, one can easily observe that: where ρ(•) is the spectral radius, and hence the eigenvalues of K −1 (K − R) are clustered around 1.This supports the claim that further dropping the off-diagonal elements of the part of the normal equations corresponding to indices in N , after adding a uniform dual regularization, introduces a controlled perturbation.

Quadratic Programming
Unlike the case of linear programming, for the case of quadratic programming we employ a primaldual regularization, that is, we use both R p 0 and R d 0, as shown in (2.5), to improve the spectral properties of the problem.For this case, we modify the condition for allowing a column to enter the set N , and at each iteration k, in place of (2.17), we require: where reg thr,k is updated as indicated in the linear programming case (sub-section 2.4.1).As before, by permuting the columns so that the first n 1 correspond to indices in N while the remaining ones correspond to indices in B, the augmented system in (2.10) takes the form: where and the permuted matrix Q is: being the respective blocks of the matrix Q, while R pN ∈ R n 1 ×n 1 and R pB ∈ R n 2 ×n 2 are the only two non-zero blocks of the block-diagonal primal regularization matrix R p .As we mentioned earlier, when we solve general convex quadratic programming problems, it is dangerous to eliminate the (1,1) block of (2.10) and solve the problem using (2.12), since the latter system may become dense.However in the linear programming case, our regularization matrix was tuned based on the properties of the normal equations.In order to overcome this problem, we introduce a primal regularization that can absorb the non-diagonal elements of the (1,1) block of the permuted augmented system (2.20).This allows us to safely (from the sparsity and computational point of view) pivot on this block and perform the analysis in a similar manner as in the linear programming case.Hence, we define: with where ∆ pN ∈ R n Pivoting on the (1,1) block of (2.20) results in the following partially reduced augmented system: where: Using a similar reasoning as before, we will tune the matrix R pB so that sparsity is promoted.By looking at the (1,1) block of (2.23), one can see that an obvious choice for this matrix would be: with where ∆ pB ∈ R n 2 ×n 2 is a uniform diagonal matrix, which ensures that R pB 0 and diagonally dominant.Finally, by looking at the (2,2) block of (2.23), we can define R d in a similar manner as in the linear programming case as: with where again ∆ d ∈ R m×m is a uniform diagonal matrix, which ensures that R d 0 and diagonally dominant.Note that condition (2.19), which defines columns qualified to enter N , ensures that the positive elements of the diagonal matrices ∆ pB , ∆ d will be strictly less than reg thr,k , at every iteration k of the algorithm.

Motivation
As in the linear programming case, let us provide the motivation for the previously presented regularization scheme.We will derive some useful bounds that extend those provided in the motivation paragraph for the linear programming regularization.All the bounds stated here are direct applications of the results obtained in [8] and for simplicity are given without proofs.Let: and denote by λ i and λi the i-th smallest eigenvalues of M and M + E respectively.Note that ∆ p is a permuted n × n diagonal matrix, comprised of the two uniform primal regularization matrices H , it can be proven as before that: A counterpart of Lemma 2.4 for this case follows from [8] and states that if we have: where For a detailed derivation of the previous results, the interested reader can look at [8], Lemmas 2.8, 2.9.
Finally, the counterpart of Theorem 2.1 for this case states that for each i such that: i .These bounds are slightly less intuitive than the ones provided for the linear programming case, however similar arguments to those used in the linear programming case can be employed here, supporting the claim that the uniform regularization that we introduce does not perturb the sufficiently large (in the absolute value sense) eigenvalues of the non-regularized system significantly.The main reason why we provide these bounds is for completeness.We could proceed by showing, as in the linear programming case, that further dropping off (as the proposed non-diagonal regularization suggests) alters the eigenvalues of the diagonally regularized system in a controlled way, but for ease of presentation we omit this for a future study.

Rank deficient matrices and the value of
Notice that both in the linear and the quadratic programming case, during some iterations of the IPM, no columns will satisfy the respective conditions for entering N .In order to ensure that rank deficiency will not get in the way of the proposed method, at every such iteration k, we apply a uniform dual regularization R d = reg thr,k I m , where reg thr,k is updated as stated in sub-section 2.4.1.In the quadratic programming case, we also include a uniform primal regularization R p = reg thr,k I n .We expect that sufficiently large (in the absolute value sense) eigenvalues ( 2•reg thr,k ) of the system are perturbed insignificantly by using such a uniform regularization.Once at least one column enters N , we drop this uniform regularization, and start using the regularization matrices presented in this paper.
Notice that reg thr is not allowed to decrease more than a pre-specified value > 0. We set this to: = max{ 0.1•tol A 2 , 10 −13 }, where tol is the error tolerance for successful termination of the algorithm and is usually set to the values 10 −6 or 10 −8 .This value is based on the bounds derived in the motivation paragraphs presented both for the linear and the quadratic programming case, so that φ 2 i is small.

Spectral Analysis
This section focuses on analysing the spectral properties of the regularized systems provided in the previous section.As before, the analysis is split into linear and quadratic programming respectively.For each of these cases, we will provide the spectral properties of the respective augmented and partially reduced augmented system, showing the effectiveness of the proposed regularization method.

Linear Programming
For linear programming problems, we employ only dual regularization, that is we set R p = 0 and use only R d 0. In sub-section 2.4.1, it was noted that ∆ d is chosen such that R d 0 and diagonally dominant.This is very easy to see, by looking at the definition of (2.16) combined with (2.15).Since R d is diagonally dominant, we are able to invoke the Gershgorin Circle Theorem, which states that if: then any eigenvalue of R d is positive and lies in one of the following discs: where λ i represents the i-th eigenvalue of R d .On the other hand, by construction, we know that Let us now analyse the spectral properties of the matrix in (2.13).For that we provide the following theorem, which gives bounds for the eigenvalues of the system.The proof is based on the developments in [26] and [24].
Theorem 3.1.For all (x, z) > 0 and R d as defined in (2.15), the coefficient matrix of (2.13) has exactly n negative and m positive eigenvalues.Order and denote them as: These eigenvalues satisfy the following bounds: In case rank(A) < m, the eigenspace of the eigenvalues originating only from R d is {0} × Null(A T ) and there are m − rank(A) such eigenvalues.
Proof Firstly, from Sylvester's law of inertia we know that since Θ and AΘA T + R d are positive definite, the regularized augmented system matrix of (2.13) possesses precisely n negative and m positive eigenvalues.If µ is an eigenvalue of the linear system matrix of (2.13), then there are vectors u ∈ R n and p ∈ R m that cannot both be zero, using which the eigenvalue problem can be written in the following form: As observed in [11], if rank(A) < m, there are some eigenvalues of the matrix in (2.13), that satisfy: R d p = µp.The associated eigenspace is {0} × Null(A T ).
If µ < 0 then u = 0 since otherwise p = 0 because R d 0. On the other hand, if µ > 0 then p = 0 since otherwise u = 0 because Θ −1 0. Taking the inner product of the first equation of (3.2) with u, and the second equation with p and subtracting the former from the latter gives: Using the fact that Θ −1 0, along with R d 0, and assuming that µ < 0 (i.e.u = 0): (min where the inequality follows because the left hand side is as small as possible and we dropped the positive term p T R d p.But since µ < 0 in this case, we know that − min j (Θ −1 ) jj > µ = µ −1 .Furthermore, if µ < 0 then we know that R d − µI 0. Hence it is invertible and we can solve the second equation of (3.2) with respect to p, substitute the result in the first equation and take the inner product with u to get: Hence: where we observed that the left hand side has negative terms, took the most negative possible values for these terms and divided by u T u.Note that for the second term of the left hand side, we used the fact that for two positive definite matrices A, B, we have that λ min (A + B) ≥ λ min (A) + λ min (B).Solving the previous inequality with respect to µ (and using the roots of the second order equation), we get that: Now, for the case where µ > 0 (where we know that p = 0), we solve the first equation of (3.2) with respect to u, substitute the result in the second one and take the inner product with p, to get: Observe that λ max (( Given that all the terms on the left hand side are positive, we can take upper bounds for every term, multiply everything by µ (since µ > 0) and divide both sides by p T p.This gives us the following second order inequality with respect to µ: Solving the previous quadratic inequality, gives: where we used the right-most upper bound given in (3.1).Working similarly using the same equation but slightly altered, that is: and by taking lower bounds on each term of the left hand side and re-arranging them, we get the following inequality: Solving the previous gives us the last bound: which completes the proof.Below we provide an analogous theorem applied to the matrix of (2.14).Again, we use the definition of R d that is given in (2.15).With this in mind, we know that the (2,2) block of (2.14) is comprised of two diagonal matrices, i.e.: 16).The proof is similar to that of the previous theorem, and hence it is not provided here.Theorem 3.2.For all (x, z) > 0 and R d as defined in (2.15), the coefficient matrix of (2.14) has exactly n 2 negative and m positive eigenvalues.Order and denote them as: These eigenvalues satisfy the following bounds: In case rank(A B ) < m, the eigenspace of the eigenvalues originating only from D * is {0} × Null(A T B ) and there are m − rank(A B ) such eigenvalues.Now we can compare the bounds given in Theorems 3.1 and 3.2 and observe clear advantages of using the partially reduced augmented system (2.14) over the full augmented system (2.13).Firstly, one can easily note that − min j (Θ −1 B ) jj = − min j (Θ −1 ) jj , hence the bound for the largest negative eigenvalue is identical for both systems.However, there are two main differences: 1. We have that max j (Θ −1 B ) jj ≤ max j (Θ −1 ) jj (and usually max j (Θ −1 B ) jj max j (Θ −1 ) jj ).As a consequence the bound on the most negative eigenvalue of (2.13) will be larger (in the absolute value sense), than the bound on the respective eigenvalue of (2.14).
2. Our guaranteed lower bound for the minimum eigenvalue of R d is smaller than the respective lower bound for the minimum eigenvalue of D * .In fact, where δ d is defined in (2.16) and the second lower bound is given in (3.1).By construction the first bound is better.As a consequence, the smallest positive eigenvalue of (2.14) is guaranteed to be at least as large as δ d .

Quadratic Programming
For quadratic programming problems we employ a primal-dual regularization.In subsection 2.4.
• For R d , we are able to invoke the Gershgorin circle theorem as in the linear programming case stating that if: then any eigenvalue of R d is positive and lies in one of the following discs: where , where λ i is the i-th eigenvalue of R d .On the other hand, by construction we know that • For R pB , we apply the same theorem however in this case we have: and any eigenvalue of R pB is positive and lies in one of the following discs: where As before, we know that: where λ i is the i-th eigenvalue of R pB .
• Finally, we can work similarly to examine the spectral properties of R pN .Again by letting: any eigenvalue of R pN is positive and lies in one of the following discs: where , where λ i is the i-th eigenvalue of R pN .But since Q N 0 as a principal minor of Q 0, we know that if a diagonal element of Q N is zero, then its respective column and row are also zero.Hence this implies tighter final bounds, that is: Let us now analyse the spectral properties of (2.20).For that we provide the following theorem, which is the extension of Theorem 3.1 for the QP case.The proof is almost identical and hence it is not provided here.For notational convenience, let: Theorem 3.3.For all (x, z) > 0 and R d , R pB , R pN as defined in (2.26), (2.24) and (2.21) respectively, the coefficient matrix of (2.20) has exactly n negative and m positive eigenvalues.Order and denote them as: These eigenvalues satisfy the following bounds: In case rank(A) < m, the eigenspace of the eigenvalues originating only from R d is {0} × Null(A T ) and there are m − rank(A) such eigenvalues.
Below we provide a similar theorem, applied to (2.23).For that, we will use R d , R pB , R pN as defined in sub-section 2.4.2 as well as the respective eigenvalue bounds given in (3.3), (3.4) and (3.5).Using the definitions of the regularization matrices, we know that the matrix in the (1,1) block of (2.23) takes the form: ), while the (2,2) block of (2.23) becomes: For all (x, z) > 0 and R d , R pB , R pN as defined in (2.26), (2.24) and (2.21) respectively, the coefficient matrix of (2.23) has exactly n 2 negative and m positive eigenvalues.Order and denote them as: These eigenvalues satisfy the following bounds:

In case rank(A
3. As in the LP case, our guaranteed lower bound for the minimum eigenvalue of R d is smaller than the respective lower bound for the minimum eigenvalue of D * .In fact, where we use δ d as defined in (2.27), while the last inequality follows from (3.3).By construction, the first bound is better.As a consequence, the smallest positive eigenvalue of (2.23) is guaranteed to be at least as large as δ d .
4 Implementation and Numerical Results

The Algorithmic Framework
At this point, we are providing a generic algorithm (IPM-NDR), summarizing the infeasible primaldual IPM with non-diagonal regularization.The algorithm solves the Newton system arising from the optimality conditions of (2.1)-(2.2),at each iteration, using a direct method.Note that this is just a general outline and does not contain the actual details of the implemented method.Implementation details will be presented in the next sub-section.Note that in the algorithm, we make the distinction between linear and quadratic programming problems, by using the logical variables LP and QP, respectively.

Implementation Details
We implemented the algorithm in Matlab.Our implementation solves linear and convex quadratic programming problems in the standard form.However, all the free variables are treated as variables bounded by some box constraints.We set some initial bounds, for all the free variables.If the method pushes some of these variables to take values outside of this box, then the respective bounds are increased to give space for variables to increase their values.Note that this heuristic causes that extra iterations are needed to converge for a few problems, since every time the box constraints are changed, the method loses primal feasibility.

Regularization
We set reg thr,0 = 1, and we decrease it at the same rate as µ k decreases, until it becomes smaller than = max tol•10 −1 A 2 2 , 10 −13 .Then, it takes this value and stays constant for the rest of the optimization process.As before, tol is the error tolerance specified by the user.At every iteration, we enable columns to enter the set N only if: max j∈N (Θ) jj max AA T ∞ , QQ T ∞ ≤ reg thr,k .This ensures that (∆ d ) ii , as defined in (2.16) and (2.27) for linear and convex quadratic problems respectively, is smaller than reg thr,k , ∀i ∈ {1, • • • , m}, ∀ k ≥ 0. The latter also holds for (∆ pB ) ii as in (2.25) , ∀i ∈ {1, • • • , n 2 }, which is only defined for quadratic programming problems.Of course for linear programming problems we have R p = 0. Note that during the first iterations of the method, N is usually empty.In order to avoid instability, we include a uniform dual regularization R d = reg thr,k I m .For the quadratic programming case, we also include a uniform primal regularization, that is: R p = reg thr,k I n .This uniform regularization is dropped when N is non-empty.As an extra safeguard, when the factorization of the system fails, we increase reg thr by a factor of 10 and repeat the factorization.If this process is repeated for 6 consecutive times, we stop the method.All other implementation details concerning the regularization follow from Section 2.

Newton-step computation
For general convex quadratic problems, the Newton direction is calculated from system (2.23), after computing its symmetric LDL T decomposition, where L is a lower triangular matrix and D is diagonal.For that, we use the build-in Matlab symmetric decomposition (i.e.ldl).We know that such a decomposition always exists, with D diagonal, for the aforementioned system, since after introducing the regularization, the matrix of (2.23) is guaranteed to be quasi-definite; a class of matrices known to be strongly factorizable, [28].For that reason, we change the default pivot threshold of ldl to 10 −14 .We use such a small pivot threshold in order to avoid any 2x2 pivots during the factorization routine.For linear programming problems, we solve the system (2.12) (with Q = 0), using the build-in Cholesky decomposition of Matlab (i.e.chol).∆x is then recovered from (2.11).In the quadratic programming case, ∆s is recovered from (2.7).In both cases ∆z is recovered from (2.9) and ∆r from (2.6).

Starting point
We have already mentioned that the method is infeasible and hence the starting point does not need to be primal and dual feasible.The only requirement is that the initial values of the variables x, z are strictly positive.We use a starting point that was proposed in [19].Here we will only state it for completeness.To construct this point, we try to solve the pair of problems (P), (D), but we ignore the non-negativity constraints.Such relaxed problems have closed form solutions: Then, in order to guarantee positivity and sufficient magnitude of x, z, we compute the expressions δ x = max(−1.5min{xi }, 0) and δ z = max(−1.5min{zi }, 0) and we obtain: where e is the vector of ones of appropriate dimension.Finally, we define the starting point by setting:

Centring parameter
As minimum and maximum centring parameters, we fix σ min = 0.05 and σ max = 0.95.In the first iteration we use σ 0 = 0.5.Then, at each iteration k, in order to determine the centring parameter σ k , we perform the following operations: where a k−1 x , a k−1 z are the step-lengths in directions ∆x, ∆z of the previous iteration, respectively.Then we assign: and finally The latter is a heuristic which performs well in infeasible IPM implementations.
Step-length computation In order to calculate the step-length, we apply the fraction to the boundary rule, that is we compute the largest step-lengths to the boundary of the non-negative orthant, i.e.: and we set: where τ ∈ ]0, 1[ is set to τ = 0.995.The constant τ acts as a safeguard against bad directions.Taking a full step towards a direction can potentially push the iterates of the algorithm close to the boundary.This in turn, can significantly slow down the convergence of the method.The primal variables x, r are updated using the step-length a x while the dual variables y, s, z are updated using the step-length a z .

Termination Criteria
Finally, the algorithm is terminated either if the number of maximum iterations specified by the user is reached, or when all the following three conditions are satisfied: where tol is the tolerance specified by the user.

Numerical Results
We have made a particular effort to keep the implementation as simple as possible, so that the regularization effects can easily be seen and analysed.For that reason, we applied scaling only to problems which required it to converge and this was needed only for 5 out of the 218 problems solved.On the other hand, no predictor-corrector technique was included.We tested our method on problems coming from the Netlib collection [21] as well as on a set of convex quadratic programming problems given in [17].We present the numerical results, firstly for linear programming problems and then for quadratic programming ones.In order to demonstrate the effects of the proposed regularization method, we will compare it with an interior point method that uses a uniform regularization.This uniform regularization scheme, can be interpreted as the application of a standard proximal point method, in contrast to the proposed method, which can be interpreted as the application of a generalized proximal point method.The experiments were conducted on a PC with a 2.2GHz Inter Core i5 processor (dual-core) and 4GB RAM, run under Linux operating system.The Matlab version used was R2018a.

Linear programming problems
As we have already stated, for linear programming problems we use only dual regularization, that is we set R p = 0 and s = 0 in (P r )-(D r ).For that reason, we will compare our method with an algorithm that uses a uniform dual regularization, R d = reg thr,k I m , ∀k ≥ 0, where reg thr,k is updated as indicated in the previously presented Regularization paragraph.If N = ∅, the two methods are exactly the same.Hence, the difference between the methods arises when some columns of the constraint matrix have entered the set N .The tolerance used in the experiments for the linear programming problems was tol = 10 −6 .We will not use a smaller tolerance because our method does not have primal regularization.As a consequence, if some elements of Θ B become very large, this can create numerical instability if there is no primal regularization to keep such entries manageable in terms of machine precision.As an extra safeguard, when the factorization fails, we increase the uniform regularization value by a factor of 10 until the factorization is completed successfully.Finally, we set the maximum iterations of the method to be maxit = 200.If this number is reached, the algorithm stops indicating that the optimal solution was not found.To conclude we use: tol = 10 −6 , maxit = 200.
The statistics of runs of the proposed IPM with non-diagonal regularization and of the previously mentioned IPM with uniform regularization, over the Netlib test set, have been collected in Table 1.
Notice that Table 1 contains only a sub-set of the 96 problems of the Netlib collection.All problems for which the set N stayed empty throughout the whole optimization process have been excluded.In this case, the two methods are completely equivalent.Both IPM with non-diagonal regularization and IPM with uniform regularization, solved all 96 problems of the Netlib collection.The former did so in 146,6 seconds and a total of 3322 IPM iterations.The latter needed 165,7 seconds and a total of 3442 iterations.In other words, the IPM using the proposed regularization, solved the whole set in 11.5% less time, requiring 3% less iterations.The computational benefits of the non-diagonal regularization become obvious in the larger instances of the Netlib collection.See for example problems DFL001, QAP15 in Table 1.
We also include Table 2, in which the factorization times are compared when using non-diagonal and uniform regularization respectively, over the last four iterations of problems DFL001 and GREEN-BEA.The size of the respective constraint matrices also includes columns which were added to transform the problems to the standard form.Extra information, concerning the cardinality of the partition N , the iteration count as well as the time needed to compute the Cholesky factorization of the system matrix at the respective iteration, has been collected in Table 2. Analysing the results reported in Tables 1 and 2, one can observe that while the proposed nondiagonal regularization matrix does not affect the convergence of the method, it can accelerate the factorization significantly through the sparsity that it introduces in the system matrix.Notice that for both DFL001 and GREENBEA, almost half of their columns lie in the partition N and this does not prevent the algorithm from converging.
Finally, in order to present the importance of regularization, as well as the overall comparison of the two different regularization schemes, we also include Figure 1, which contains the performance profiles, over the whole Netlib set, of three different methods.The green triangles correspond to the IPM with non-diagonal regularization.The red stars correspond to the IPM with uniform regularization, and finally the blue crosses correspond to an IPM without regularization.In Figure 1a, we present the performance profiles with respect to the total time to convergence, while in Figure 1b the performance profiles with respect to the total number of iterations.The horizontal axis (in logarithmic scale), represents the performance ratio with respect to the best performance achieved by one of the three methods for each problem.For example, 2 in the horizontal axis is interpreted as: "what percentage of problems was solved by each method, in at most 2 times the best achieved time for each problem".The vertical axis shows the percentage of problems solved by each method for different values of the performance ratio.Efficiency is measured by the rate at which each of the lines increases, as the ratio increases.Robustness is measured by the maximum percentage achieved by each of the methods.For more information about performance profiles, we refer the reader to [9], where this benchmarking scheme was proposed.By looking at Figure 1, one can observe the importance of regularization in terms of robustness of the method.The IPM scheme that does not employ any regularization, fails to solve 18.75% of the problems in the Netlib collection.On the other hand, the IPM with non-diagonal regularization is more efficient in terms of time to convergence, when compared to the other two methods.Notice that this is not the case for the IPM using uniform regularization, which is less efficient than the other two methods for 70% of the problems.As expected, the IPM that does not use regularization converges in less iterations for most of the problems that it successfully solves.This is expected, since in the regularized schemes, we are perturbing the Newton system.Obviously, this perturbation is benign, in the sense that it allows us to significantly improve the robustness of the method.

Convex quadratic programming problems
For this class of problems, we employ a primal-dual dynamic regularization.Hence, we will compare our method with an algorithm that uses a uniform primal-dual regularization.Such a method adds two uniform diagonal matrices R p = reg thr,k I and R d = reg thr,k I to the (1,1) and (2,2) blocks of the augmented system respectively.This scheme can be interpreted as the primal and dual application of the standard proximal point method, in contrast to the proposed regularization scheme, which is the primal and dual application of a generalized proximal point method.As an extra safeguard, when the factorization fails, we increase the uniform regularization value by a factor of 10 until the factorization is completed successfully.The tolerance used in the experiments for this class of problems was tol = 10 −8 .As in the linear programming case, we set the maximum iterations of the method to be maxit = 200.To conclude we use: tol = 10 −8 , maxit = 200.
For this problem set, the algorithm did not employ any scaling in the problem matrices.The computational results, obtained with the proposed non-diagonally regularized IPM and with the previously mentioned uniformly regularized IPM, over the Maros and Mészáros repository of convex quadratic programming problems, are presented in Table 3.As before, Table 3 contains only a subset of the 122 problems of the collection.All problems for which the set N stayed empty throughout the whole optimization process have been excluded.In contrast to the linear programming case, the results collected in Table 3 do not demonstrate any significant advantage in terms of sparsity of linear systems achievable by the new regularization technique.This is a consequence of the fact that the problems under consideration are of small to medium size, while the overhead of setting up the partially reduced augmented system (2.23) is time consuming in Matlab, where manipulating a permuted matrix is costly, due to Matlab's default mechanism to store matrices by columns.Nevertheless, both IPM with non-diagonal regularization and IPM with uniform regularization, solved all 122 problems.The former required 386,1 seconds and a total of 4162 IPM iterations.The latter required 400,2 seconds and a total of 4170 iterations.
In other words, the non-diagonal scheme required 3% less time and a similar number of iterations, as compared to the uniform scheme, for this test set.
As before, in order to illustrate the effect of the non-diagonal regularization in terms of factorization performance, we provide Table 4, in which the factorization times obtained when using non-diagonal and uniform regularization respectively are compared, over the last four iterations of problems LISWET1, FORPLAN and SHELL.The size of the constraint matrix in each case also includes columns which were added to transform the problem to the standard form.Information concerning the cardinality of the partition N , the iteration count as well as the time needed to compute the LDL T factorization of the system matrix at the respective iteration, is gathered in Table difference is almost negligible.We should mention here, that the proposed tuning of the non-diagonal regularization is quite conservative.Hence, we would expect that one could improve the efficiency of such a method at the expense of its robustness.

Conclusions
In this paper, we derive a dynamic non-diagonal regularization scheme suitable for interior point methods.The proposed scheme is automatically tuned based on the properties of the problem, such that sufficiently large eigenvalues of the Newton system are perturbed insignificantly.The presence of non-diagonal terms in the regularization matrices allows us to introduce more sparsity in the linear system, solved to determine the Newton direction at each iteration of the interior point method.The regularization matrices can be computed expeditiously, enabling more efficient factorizations of the system matrix.The method has been implemented and the computational results demonstrate its efficiency.The results also support the claim that the proposed rule, for tuning the regularization matrices based on the properties of the problem, produces a regularization which perturbs the system almost insignificantly while maintaining numerical stability.An extension of this regularization, to interior point methods that solve the Newton system using an iterative scheme, seems natural and will be addressed in a future work.

( a )
Performance profile with respect to time (b) Performance profile with respect to iterations

Figure 1 :
Figure 1: Performance profiles over the Netlib test set.
1 ×n 1 is a uniform diagonal matrix, which ensures that R pN 0 and diagonally dominant.Although ∆ pN can have sizeable values, (2.19) ensures that the respective elements in Θ −1 N have significantly larger values, making this perturbation acceptable.Using (2.21), the (1,1) block of (2.20) becomes: 2, it was noted that ∆ d is chosen such that R d 0 and diagonally dominant, while ∆ pB is chosen such that R pB 0 and diagonally dominant.This can be seen by looking at (2.27) combined with (2.26) and (2.25) combined with (2.24), respectively.Similarly, positive definiteness and diagonal dominance of R pN follows immediately by construction, i.e. by looking at equations (2.21) and(2.22).For notational convenience, we define:

Table 2 :
Sparsity introduced from the non-diagonal regularization (linear programming)