An interior point-proximal method of multipliers for convex quadratic programming

In this paper we combine an infeasible Interior Point Method (IPM) with the Proximal Method of Multipliers (PMM). The resulting algorithm (IP-PMM) is interpreted as a primal-dual regularized IPM, suitable for solving linearly constrained convex quadratic programming problems. We apply few iterations of the interior point method to each sub-problem of the proximal method of multipliers. Once a satisfactory solution of the PMM sub-problem is found, we update the PMM parameters, form a new IPM neighbourhood and repeat this process. Given this framework, we prove polynomial complexity of the algorithm, under standard assumptions. To our knowledge, this is the first polynomial complexity result for a primal-dual regularized IPM. The algorithm is guided by the use of a single penalty parameter; that of the logarithmic barrier. In other words, we show that IP-PMM inherits the polynomial complexity of IPMs, as well as the strict convexity of the PMM sub-problems. The updates of the penalty parameter are controlled by IPM, and hence are well-tuned, and do not depend on the problem solved. Furthermore, we study the behavior of the method when it is applied to an infeasible problem, and identify a necessary condition for infeasibility. The latter is used to construct an infeasibility detection mechanism. Subsequently, we provide a robust implementation of the presented algorithm and test it over a set of small to large scale linear and convex quadratic programming problems. The numerical results demonstrate the benefits of using regularization in IPMs as well as the reliability of the method.


Introduction 1.Primal-Dual Pair of Convex Quadratic Programming Problems
In this paper, we consider the following primal-dual pair of linearly constrained convex quadratic programming problems, in the standard form: where c, x, z ∈ R n , b, y ∈ R m , A ∈ R m×n , Q 0 ∈ R n×n .Without loss of generality m ≤ n.Duality, between the problems (P)-(D), arises by introducing the Lagrangian function of the primal, using y ∈ R m and z ∈ R n , z ≥ 0, as the Lagrange multipliers for the equality and inequality constraints, respectively.Hence, we have: Using the Lagrangian function, one can formulate the first-order optimality conditions (known as Karush-Kuhn-Tucker (KKT) conditions) for this primal-dual pair.In particular, we define the vector w = (x T , y T , z T ) T , and compute the gradient of L(•), that is, F (w) = ∇L(w), where F (w) : R 2n+m → R 2n+m .Then, equating F to the zero vector gives the desired conditions, i.e.: where e denotes the vector of ones of appropriate dimension, while X, Z ∈ R n×n denote the diagonal matrices satisfying X ii = x i and Z ii = z i , ∀ i ∈ {1, • • • , n}.For simplicity of exposition, when referring to convex quadratic programming problems, we implicitly assume that the problems are linearly constrained.If both (P) and (D) are feasible problems, it can easily be verified that there exists an optimal primal-dual triple (x, y, z), satisfying the KKT optimality conditions of this primal-dual pair (see for example Prop.2.3.4 in [7]).

A Primal-Dual Interior Point Method
Primal-dual Interior Point Methods (IPMs) are popular for simultaneously solving (P) and (D), iteratively.As indicated in the name, primal-dual IPMs act on both primal and dual variables.There are numerous variants of IPMs and the reader is referred to [16] for an extended literature review.In this paper, an infeasible primal-dual IPM is presented.Such methods are called infeasible because they allow intermediate iterates of the method to be infeasible for (P)-(D), in contrast to feasible IPMs, which require intermediate iterates to be strictly feasible.
Interior point methods, replace the non-negativity constraints of the problems with logarithmic barriers in the objective.That is, at each iteration, we choose a barrier parameter µ and form the logarithmic barrier primal-dual pair : in which non-negativity constraints x > 0 and z > 0 are implicit.As before, we can form the KKT optimality conditions of the pair (1.3)- (1.4), by introducing the Lagrangian of the primal barrier problem: Equating the gradient of the previous function to zero, gives the following conditions: Applying the same methodology for (1.4), gives the final condition, namely: At each IPM iteration, we want to solve the following mildly non-linear system: where F IPM σ,µ (w) = 0 is a slightly perturbed form of the previously presented optimality conditions.In particular, σ ∈ (0, 1) is a centering parameter, which determines how fast µ will be forced to decrease at the next IPM iteration.For σ = 1, we recover the barrier optimality conditions, while for σ = 0 we recover the initial problems' optimality conditions (1.2).The efficiency of the method depends heavily on the choice of σ.In fact, various improvements of the traditional IPM schemes have been proposed in the literature, which solve the previous system for multiple carefully chosen values of the centering parameter σ and of the right hand side, at each IPM iteration.These are the so called predictor-corrector schemes, proposed for the first time in [22].Various extensions of such methods have been proposed and analysed in the literature (e.g.see [15], [32] and references therein).However, for simplicity of exposition, we will follow the traditional approach; σ is chosen heuristically at each iteration, and the previous system is solved only once.
In order to solve the system F IPM σ,µ (w) = 0 at each iteration, the most common approach is to apply the Newton method.Newton method is favored for systems of this form, due to the self-concordance of the function ln(•).For more details on this subject, the interested reader is referred to Chapter 2 in [23].At the beginning of the k-th iteration of the IPM, for k ≥ 0, we have available an iterate w k = (x T k , y T k , z T k ) T , and a barrier parameter µ k , defined as µ k = x T k z k n .We choose a value for the centering parameter σ k ∈ (0, 1) and evaluate the Jacobian of F IPM σ k ,µ k (w k ).Then, the Newton direction is determined by solving the following system of equations: (1.6) For ease of presentation, when a matrix depends on the iteration k (in this case X and Z), we omit the subscript and state the dependence whenever it is not obvious.Notice that as µ → 0, the optimal solution of (1.3)-(1.4)converges to the optimal solution of (P)-(D).Polynomial convergence of such methods has been proved multiple times in the literature (see for example Chapter 6 in [33], [34] and the references therein).

Primal Proximal Point Method
One possible method for solving the primal problem (P), is the so called proximal point method.
Given an arbitrary starting point x 0 , the k-th iteration of the method is summarized by the following minimization problem: x k+1 = arg min where ρ k > 0 is a non-increasing sequence of penalty parameters, and x k is the current estimate for the optimal solution of (P).Notice that such an algorithm is not practical, since we have to solve a sequence of sub-problems, of similar difficulty to that of (P).Nevertheless, the proximal method contributes the ρ k I term to the Hessian of the objective, and hence the sub-problems are strongly convex.This method is known to achieve a linear convergence rate (and superlinear if ρ k → 0), as shown in [29,13], among others, even in the case where the minimization subproblems are solved approximately.Various extensions of this method have been proposed in the literature.For example, one could employ different penalty functions other than the typical 2-norm penalty presented previously (see [9,18,11] and references therein).Despite the fact that such algorithms are not practical, they have served as a powerful theoretical tool and a basis for other important methods.For an extensive review of the applications of standard primal proximal point methods, we refer the reader to [26].

Dual Proximal Point Method
It is a well-known fact, observed for the first time in [29], that the application of the proximal point method on the dual problem, is equivalent to the application of the augmented Lagrangian method on the primal problem, which was proposed for the first time in [17,28].In view of the previous fact, we will present the derivation of the augmented Lagrangian method for the primal problem (P), while having in mind that an equivalent reformulation of the model, results in the proximal point method for the dual (D) (see [29,12] or Chapter 3.4.4 in [8]).
We start by defining the augmented Lagrangian function of (P).Given an arbitrary starting guess for the dual variables y 0 , the augmented Lagrangian function is defined at the k-th iteration as: where δ k > 0 is a non-increasing sequence of penalty parameters and y k is the current estimate of the optimal Lagrange multiplier vector.The augmented Lagrangian method (ALM) is summarized below: Notice that the update of the Lagrange multiplier estimates, can be interpreted as the application of the dual ascent method.A different type of update would be possible, however, the presented update is cheap and effective, due to the strong concavity of the proximal ("regularized") dual problem (since δ k > 0).Convergence and iteration complexity of such methods have been established multiple times (see for example [29,20]).There is a vast literature on the subject of augmented Lagrangian methods.For a plethora of technical results and references, the reader is referred to [6].For a convergence result of various extended versions of the method, the reader is referred to [11].

Primal-Dual Proximal Point Method
In [29], the author presented, for the first time, the proximal method of multipliers (PMM).The method consists of applying both primal and dual proximal point methods for solving (P)-(D).
For an arbitrary starting point (x 0 , y 0 ), and using the already defined augmented Lagrangian function, given in (1.7), PMM can be summarized by the following iteration: where ρ k , δ k are positive non-increasing penalty parameters.One can observe, that at every iteration of the method, the primal problem that we have to solve is strongly convex, while its dual, strongly concave.As shown in [29], the addition of the term ρ k x − x k 2 , in the augmented Lagrangian method, does not affect its convergence rate, while ensuring better numerical behaviour of its iterates.An extension of this algorithm, allowing for the use of more general penalties, can be found in [11].

Regularization in Interior Point Methods
In the context of interior point methods, it is often beneficial to include a regularization, in order to improve the spectral properties of the system matrix in (1.6).For example, notice that if the constraint matrix A is rank-deficient, then the matrix in (1.6) is not invertible.The latter can be immediately addressed by the introduction of a dual regularization, say δ > 0, ensuring that rank([A | δI m ]) = m.For a detailed analysis of the effect of dual regularization on such systems, the reader is referred to [2].On the other hand, the most common and efficient approach in practice, is to eliminate the variables ∆z from system (1.6) and solve the following symmetric reduced (augmented ) system instead: where Θ = XZ −1 .Since Xz → 0 close to optimality, one can observe that the matrix Θ will contain some very large and some very small elements.Hence, the matrix in (1.9) will be increasingly ill-conditioned, as we approach optimality.The introduction of a primal regularization, say ρ > 0, can ensure that the matrix Q + Θ −1 + ρI n , will have eigenvalues that are bounded away from zero, and hence a significantly better worst-case conditioning than that of Q + Θ −1 .In other words, regularization is commonly employed in IPMs, as a means of improving robustness and numerical stability.As we will discuss later, by introducing regularization in IPMs, one can also gain efficiency.This is because regularization transforms the matrix in (1.9) into a quasi-definite one.It is known that such matrices can be factorized efficiently (see [31]).
In view of the previous discussion, one can observe that a very natural way of introducing primal regularization to problem (P), is through the application of the primal proximal point method.Similarly, dual regularization can be incorporated through the application of the dual proximal point method.This is a well-known fact.The authors in [1], presented a primal-dual regularized IPM, and interpreted this regularization as the application of the proximal point method.Consequently, the authors in [12] developed a primal-dual regularized IPM, through the primal-dual application of the proximal point method on the primal barrier problem (1.3).
There, global convergence of the method was proved, under some assumptions.A variation of the method proposed in [12] is given in [27], where general non-diagonal regularization matrices are employed, as a means of improving factorization efficiency.
Similar ideas have been applied in the context of IPMs for general non-linear programming problems.For example, the authors in [3], presented an interior point method combined with the augmented Lagrangian method, and proved that, under some general assumptions, the method converges at a superlinear rate.Similar approaches can be found in [5,14,4], among others.
In this paper, we develop a path-following primal-dual regularized IPM for solving convex quadratic programming problems.The algorithm is obtained by applying one or a few iterations of the infeasible primal-dual interior point method, in order to solve sub-problems arising from the primal-dual proximal point method.Under some mild assumptions, we prove polynomial complexity of the algorithm and provide global convergence guarantees.To our knowledge, this is the first complexity result for a general primal-dual regularized IPM scheme.Notice that a complexity result is given for a primal regularized IPM for linear complementarity problems in [35].However, the authors significantly alter the Newton directions, making their method very difficult to generalize and hard to achieve efficiency in practice.The proposed approach is implemented and its reliability is demonstrated through extensive experimentation.
The implementation slightly deviates from the theory, however, most of the theoretical results are verified in practice.The main purpose of this paper, is to provide a reliable method that can be used for solving general convex quadratic problems, without the need of pre-processing, or of previous knowledge about the problems.Such a method is implemented and supported by a novel theoretical result, indicating that regularization alleviates various issues arising in IPMs, without affecting their important worst-case polynomial complexity.As a side-product of the theory, an implementable infeasibility detection mechanism is also derived and tested in practice.
In Section 2, we provide the algorithmic framework of the method.Consequently, in Section 3, we prove polynomial complexity of the algorithm, and global convergence is established.In Section 4, an infeasibility test is derived.Numerical results of the implemented method, are presented and discussed in Section 5. Finally, we derive some conclusions in Section 6.

Algorithmic Framework
In the previous section, we presented all the necessary tools for deriving the proposed Interior Point-Proximal Method of Multipliers (IP-PMM).Effectively, we would like to merge the proximal method of multipliers with the infeasible interior point method.For that purpose, assume that, at some iteration k of the method, we have available an estimate λ k for the optimal Lagrange multiplier vector y * .Similarly, we denote by ζ k the estimate of the primal solution x * .Now, we define the proximal penalty function that has to be minimized at the k-th iteration of proximal method of multipliers, for solving (P), given the estimates λ k , ζ k : with δ k > 0, ρ k > 0 some non-increasing penalty parameters.In order to solve the PMM sub-problem (1.8), we will apply one (or a few) iterations of the previously presented infeasible IPM.To do that, we alter the previous penalty function, by including logarithmic barriers, that is: where µ k > 0 is the barrier parameter.In order to form the optimality conditions of this sub-problem, we equate the gradient of L IP −P M M δ k ,ρ k ,λ k ,ζ k (•) to the zero vector, i.e.: Following the developments in [3], we can define the variables: e, to get the following (equivalent) system of equations (first-order optimality conditions): As shown before, interior point methods introduce a perturbation in the third block equation of the previous conditions.Hence, at each IPM iteration, we choose a centering parameter σ k > 0, and we want to solve the following system: Observe the similarity between the systems given in (1.2), (1.5) and (2.2).In fact, the PMM system can be written as: while the IPM system as: for w = (x T , y T , z T ) T and F (w) as defined in (1.2).Notice that each PMM sub-problem is parametrized by δ, ρ, λ and ζ.Here, we interpret δ > 0 as the dual regularization parameter, while ρ > 0 as the primal one.The interpretations of λ and ζ have been already stated.Hence, for each such sub-problem, we can define the following sets: Notice that F δ,ρ,λ,ζ and F + δ,ρ,λ,ζ represent the feasibility and strict feasibility sets of the subproblem, respectively.S δ,ρ,λ,ζ represents the solution set.Let us introduce the following notation, that will be used later.
Definition 1.Given two real-valued functions T (x), f (x), x ∈ R + , we say that: if and only if, ∃ constants c, x 0 , such that: We mentioned earlier that we develop a path-following method.Hence, we have to describe a neighbourhood, in which the iterations of the method should lie.However, unlike typical pathfollowing methods, the neighbourhood that we define depends on the PMM sub-problem.In other words, we define the following family of neighbourhoods: where C ≥ 0 is a large constant and γ ∈ (0, 1) a constant preventing component-wise complementarity products from approaching zero faster than µ = x T z n .The terms b and c represent the current infeasibility and will vary at every iteration.The dependence of these vectors on the iteration of the algorithm, k, is omitted for simplicity of exposition.The only requirement for b, c, is that their respective norms do not exceed C. We are now able to derive Algorithm IP-PMM, summarizing the proposed interior point-proximal method of multipliers.We will prove polynomial complexity of this scheme in the next section, under some mild assumptions (to be specified later).
Notice, in Algorithm IP-PMM, that we force σ to be less than 0.5.This value is set, without loss of generality, for simplicity of exposition.Similarly, in the choice of the step-length, we Algorithm IP-PMM Interior Point-Proximal Method of Multipliers Input: A, Q, b, c, tol.Parameters: 0 < σ min ≤ σ max ∈ (0, 0.5] (bounds for the centering parameter).
then Declare convergence and return the optimal solution.return (x k , y k , z k ).else Choose σ k ∈ [σ min , σ max ] and solve (Newton step): The constant 0.01 is chosen for ease of presentation.It depends on the choice of the maximum value of σ.The parameter tol, represents the error tolerance (chosen by the user).The terminating conditions, require the norm of primal and dual infeasibility, as well as complementarity, to be less than this tolerance.In such a case, we accept the iterate as a solution triple.The estimates λ, ζ are not updated if primal or dual feasibility (respectively) are not sufficiently decreased.In this case, we keep the estimates constant while continuing decreasing the penalty parameters δ, ρ.This is the typical approach employed in augmented Lagrangian methods, and is very useful in proving polynomial complexity of the algorithm.Following the usual practice with proximal and augmented Lagrangian methods, we accept a new estimate when the respective residual is sufficiently decreased.Finally, for reasons to become clear later, we force the regularization parameters δ and ρ to be decreased at the same rate as µ.This is a key feature of the approach, that will prove very useful in the convergence analysis.

Convergence Analysis of IP-PMM
For the rest of this section, we will make use of the following feasibility assumption.
Assumption 1.The constraint matrix has full row rank, that is rank(A) = m and there exists an optimal solution (x * , y * , z * ) for the primal-dual pair (P)-(D), such that: Note that this assumption is commonly employed when deriving polynomial complexity for an interior point method.Even in the case where C does depend polynomially on n, the polynomial complexity still holds.In other words, the independence on n is mainly employed for simplicity of exposition.In any case, in the next section, this assumption will be relaxed, in order to assess the behaviour of the method under infeasibility.
One can observe, for every positive parameters δ, ρ and estimates λ, ζ, that there exists a unique optimal triple (x * r , y * r , z * r ) ∈ S δ,ρ,λ,ζ .To see this, notice that convexity of (P) implies that the functions L(•, y, z) and −L(x, •, •) cannot decrease, along any direction, faster than linearly, where L(x, y, z) is defined in (1.1).As a consequence, the functions L(•, y, z) , must have bounded level sets.For a rigorous proof of this, the reader is referred to Section 3.4 of [8].Hence, we can search for the solution of the following problem: Using Weiestrass' extreme value theorem, we know that the solution is attained.Uniqueness follows from the strong convexity of the terms ρ 2 x − ζ 2 and δ 2 y − λ 2 .A similar result is shown in [29].Let us now use these remarks, to derive the following important Lemma.Lemma 3.1.Given Assumption 1, and for all δ > 0, ρ > 0 and λ ∈ R m , ζ ∈ R n produced by Algorithm IP-PMM, there exists a triple (x * r , y * r , z * r ) ∈ S δ,ρ,λ,ζ such that: where C is a constant independent of n.
Proof.We prove the claim, by induction, firstly for x * r .For that, we observe that during an arbitrary iteration, say k, of Algorithm IP-PMM, the estimate ζ either remains the same, or takes the value of x k .We know from the algorithm that ζ 0 = O(C).If, during the first k iterations of the method, ζ remains constant (ζ k = ζ 0 ), then we know that for any such iterate, there exists a unique triple (x * r , y * r , z * r ) ∈ S δ k ,ρ k ,λ k ,ζ 0 .Following the developments in [30]-Theorem 1, we can observe that under Assumption 1, the following inequality holds (which is based on the non-expansiveness of the proximal operator): where x * is defined in Assumption 1.A simple manipulation yields that x * r = O(C).On the other hand, let us assume that at iteration k, we have ζ k = O(C) and that the estimate will be updated.We know, from Algorithm (IP-PMM), that the estimate ζ is updated only if the following holds: On the other hand, from the neighbourhood conditions, we have that: Combining the previous two inequalities gives: But since ρ k+1 = Θ(µ k+1 ) and ζ k = O(C), we can conclude that x k+1 = O(C).Hence, ζ k+1 = x k+1 and ζ k+1 = O(C).Again, we know that there exists a unique optimal solution (x * r , y * r , z * r ) ∈ S δ k+1 ,ρ k+1 ,λ k+1 ,ζ k+1 .Therefore, as before, non-expansiveness of the proximal operator gives the result, that is: This completes the proof of boundedness of x * r , ∀ k ≥ 0. We work similarly to prove boundedness of y * r .As before, we observe that the initial estimate λ 0 satisfies: λ 0 = O(C).At each iteration, k, of the method, we have that either λ k = λ k−1 or λ k = y k .Assume that λ 0 stays constant for the first k iterations.Then, using the non-expansiveness of the proximal operator, we know that: y * r − y * ≤ λ 0 − y * , where y * is defined in Assumption 1.One can easily observe that as long as λ 0 is not updated, we have: y * r = O(C).Assume now, that at iteration k, the estimate is such that λ k = O(C) and that λ k+1 = y k+1 .We know, from Algorithm IP-PMM, that the latter will only happen if: On the other hand, from the neighbourhood conditions we know that: Combining the previous two inequalities, yields that: Since δ k+1 = Θ(µ k+1 ), and λ k = O(C), we can conclude that: y k+1 = O(C).We set λ k+1 = y k+1 , and by using the non-expansiveness of the proximal operator, we get: which in turn yields that y * r = O(C), at iteration k + 1.Having proven boundedness of x * r and y * r , for every iteration of the method, we can use the dual feasibility condition, that is: to conclude that z * r = O(C), ∀ k ≥ 0, which concludes the proof.For simplicity of exposition, we employ the following Assumption: Assumption 2. For all δ k , ρ k , λ k and ζ k produced at iteration k ≥ 0 of Algorithm IP-PMM, there exists a triple (x, ỹ, z), which satisfies the following system of linear equations: with (x, z) > (0, 0), x = O(C) and z = O(C), where b and c are defined in In order to support the previous assumption, we provide Lemma 3.2, in which we prove the existence of a bounded solution for system (3.1), with (x, z) > (0, 0).Lemma 3.2.Given Assumption 1 and for all δ k , ρ k , λ k and ζ k produced at iteration k ≥ 0 of Algorithm IP-PMM, there exists a triple (x, ỹ, z) which satisfies the following system of linear equations: From the previous developments, as well as from the definition of N δ k ,ρ k ,λ k ,ζ k , we know that b and ĉ are bounded.In other words, there is a large constant K ≥ 0, such that: b ≤ K and ĉ ≤ K, ∀ k ≥ 0. We want to find a solution for the following system: Since there is no restriction on the sign of ỹ, we can set: for any choice of x.Substituting ỹ in the second block equation of the previous system, gives: and observe that M 0, since ρ k > 0. Hence, all eigenvalues of M , ψ i , i ∈ {1, • • • , n}, are real and positive.Notice that ĉ − 1 δ k A T b depends on δ k and will have a large magnitude if δ k is close to zero.However, as δ k decreases, the positive eigenvalues of matrix 1 δ k A T A increase (notice that rank(A) = m).Hence, we can find positive scalars where u i u i denotes component-wise multiplication of the vector u i with itself, while e denotes the vector of ones of dimension n.Given the previous scalars a 1 , • • • , a n , we can set: By substituting x in equation (3.2), we get: where we used the fact that u i denotes the i-th eigenvector of M .The last two inequalities follow by construction.This proves that there exists a solution to system (3.1)such that (x, z) > (0, 0), and completes the proof.
In view of Lemma 3.2, one can observe that Assumption 2 is very mild and is mainly employed for simplicity of exposition.In the following Lemma, we derive boundedness of the iterates of Algorithm IP-PMM.
Lemma 3.3.Given Assumptions 1 and 2, and that Algorithm IP-PMM produces iterates we have that: and y k is bounded by some large constant independent of the iteration k.
Proof.From Assumption 2, we know that, at each iteration k, there exists a triple (x, ỹ, z) solving system (3.1),such that: x = O(C), z = O(C) and (x, z) > θ(e, e), for some positive constant θ.On the other hand, from Lemma 3.1, we know that at every iteration k, there exists a triple ( Let us consider the following auxiliary triple: Using (3.3), one can observe that: where the last equality follows from the definition of the neighbourhood Similarly, one can show that: By combining the previous two relations, we have: From (3.4), it can be seen that: However, from Assumption 2 and Lemma 3.1, we have that: x ≥ θe, z ≥ θe, for some positive constant θ = O(1), while . By combining all the previous, we get: where we used the fact that (x * r ) T z * r = 0. Hence, (3.5) implies that: Finally, we have from the neighbourhood conditions that: Boundedness of all terms (except for y k ), as well as the fact that rank(A) = m, yields boundedness of y k , and completes the proof.
As in a typical IPM convergence analysis, we proceed by bounding some components of the scaled Newton direction.The proof is similar to that presented in [33], Lemma 6.5.
Lemma 3.4.Given Assumptions 1 and 2, there exists a constant K = O(nC 3 ), independent of the iteration k, such that: with D 2 = XZ −1 , for every iteration k of Algorithm IP-PMM, where (∆x k , ∆y k , ∆z k ) is the Newton direction obtained by solving system (2.5).
Proof.Let us define the following matrix: W = (XZ) . At each iteration of Algorithm IP-PMM, consider the following auxiliary triple: where (x, ỹ, z) is defined in Assumption 2, and (x * r , y * r , z * r ) is defined in Lemma 3.1.Using the neighbourhood conditions in (2.4), one can observe that the auxiliary triple in (3.6) solves the following homogeneous system: The previous implies that: On the other hand, using the last block equation of the Newton system (2.5), we have: By multiplying both sides of the previous equation by W −1 , we get: But, from (3.7), we know that xT z ≥ 0 and hence: Combining (3.8) with the previous, gives: We isolate one of the two terms of the left hand side of the previous inequality, take square roots on both sides and apply the triangular inequality to it, to get: We now proceed to bounding the terms in the right hand side of (3.9).Firstly, notice from the neighbourhood conditions (2.4), that: This in turn implies that: .
On the other hand, we have that: Hence, combining the previous two relations yields: We proceed by bounding D −1 .For that, we have: .
Similarly, we have that: . Hence: Combining all the previous bounds yields the claimed bound for D −1 ∆x k .One can bound D∆z k in the same way and this completes the proof.
We are now able to prove that at every iteration of Algorithm IP-PMM, there exists a steplength α k > 0, using which, the new iterate satisfies the conditions required by the algorithm.
Proof.The proof follows the developments in Chapter 6, Lemma 6.7 in [33], but we provide it here for completeness.From Lemma 3.4, we have: where K = O nC 3 .Similarly, it is easy to see that: On the other hand, by summing over all n components of the last block equation of the Newton system (2.5), we have: or component-wise: We proceed by proving (3.10).Using (3.13), we have: where we set (without loss of generality) β 1 = σ min 2 .The most-right hand side of the previous inequality will be non-negative for every α satisfying: In order to prove (3.11), we will use (3.14) and the fact that from the neighbourhood conditions we have: In particular, we have: By combining all the previous, we get: In turn, the most-right hand side of the previous relation will be non-negative for every α satisfying: Finally, to prove (3.12), we set (without loss of generality) β 2 = 0.99.We know, from Algorithm IP-PMM, that σ max ≤ 0.5.With the previous two remarks in mind, we have: The last term will be non-positive for every α satisfying: Hence, by combining all the bounds on the step-length, we have that: Further noticing, from Lemma 3.4, that K = O(nC 3 ), where C is a positive constant independent of n, completes the proof.
In the following Lemma, we show that given an initial point in the parametrized neighbourhood (2.4), the iterates of the algorithm will lie in their respective parametrized neighbourhoods.
Lemma 3.6.Given Assumptions 1 and 2, and any iterate Proof.In Lemma 3.5, we showed that (3.11) holds, which in turn is one of the neighbourhood conditions.Hence, we have to prove that the new iterate will satisfy the approximate feasibility conditions of the new neighbourhood.We have that: In that case, we have that: µ 0 , which implies that λ k+1 = λ k .From the Newton system (2.5), we have that: At this point, we want to update δ k such that δ k+1 = Θ(µ k+1 ).Let us assume that: δ k+1 = (1 − α k ψ)δ k .Then, we would like to show that there exists ψ > 0, independent of the iteration k, such that: With this in mind, we see that: From (3.10), we know that: ))µ k .Hence, given the previous and (3.10), we know that (3.16) will hold for every ψ > 0, such that: or equivalently: This implies that (3.16) holds, for every ψ ∈ (0, ψ).However, from Algorithm IP-PMM, we know that δ k = Θ(µ k ), while, from Lemma 3.3, we know that y k+1 must be bounded by a constant independent of the iteration k (and hence λ k is also bounded).In other words, ψ must be bounded away from zero, independently of the iteration.
Similarly, if: then there exists c such that: c ≤ C and −Qx k+1 + A T y k+1 + z k+1 = c + µ k+1 µ 0 c.On the other hand, if the previous does not hold, as before, we can show that there must exist a constant ψ > 0 independent of the iteration k, such that: We omit the details, since the latter can be treated similarly to the approximate primal feasibility condition.
The following Theorem summarizes our results.
Theorem 3.7.The sequence {µ k } generated by Algorithm IP-PMM converges Q-linearly to zero, and the sequences of regularized residual norms Proof.From (3.12) we have that: while, from (3.15), we know that ∀ k ≥ 0, ∃ ᾱ ≥ κ n 2 such that: α k ≥ ᾱ.Hence, we can easily see that: µ k → 0. On the other hand, from the neighbourhood conditions, we know that for all k ≥ 0: This completes the proof.
Theorem 3.8.Let ∈ (0, 1) be a given error tolerance.Suppose also, that a suitable starting point for Algorithm IP-PMM is available, such that µ 0 ≤ C ω for some positive constants C, ω.Given Assumptions 1 and 2, there exists an index K with: Proof.We omit the proof, which follows standard developments.Such a proof can be found in [33,34], among others.
Theorem 3.9.Suppose that Algorithm IP-PMM generates the iterates and terminates when a limit point is reached.Then, if Assumptions 1 and 2 hold, every limit point of {(x k , y k , z k )}, determines a primal-dual solution of the non-regularized pair (P)-(D).
Proof.From Theorem 3.7, we know that {µ k } → 0, and hence, there exists a sub-sequence K ⊆ N, such that: Now, from Algorithm IP-PMM we have that δ k = Θ(µ k ), ρ k = Θ(µ k ) and from Lemma 3.3 we know that {(x k , y k , z k )} is a bounded sequence.Hence, we obtain that: One can readily observe that, under our assumptions, the limit point of the algorithm satisfies the conditions given in (1.2).
Note that a complexity result for the case where rank(A) < m (rank deficient case) could be possible, however, it would substantially complicate the analysis and hence it is left for a future work.Nevertheless, global convergence for this case could be derived, by following the developments in [12].For ease of presentation, we do not present it here.

Infeasible Problems
Let us now drop our previous assumptions (Assumption 1, 2), in order to analyze the behaviour of the algorithm in the case where an infeasible problem is tackled.Let us employ the following two premises: Premise 1.During the iterations of Algorithm IP-PMM, the sequences: Premise 2. There does not exists a primal-dual triple, satisfying the KKT conditions for the primal-dual pair (P)-(D).
The following analysis is based on the developments in [12] and [2].However, in these papers, such an analysis is proposed in order to derive convergence of an IPM, while here, we use it as a tool in order to construct a reliable and implementable infeasibility detection mechanism.In what follows, we show that Premises 1 and 2 are contradictory.In other words, if Premise 2 holds (which means that our problem is infeasible), then we will show that Premise 1 cannot hold.
Lemma 4.1.Given Premise 1, and by assuming that x T k z k > , for some > 0, for all iterations k of Algorithm IP-PMM, the Newton direction produced by (2.5) is uniformly bounded by a constant dependent only on n.
Proof.Let us use a variation of Theorem 1 given in [2].This theorem states that if the following conditions are satisfied, δ A T A is positive definite, then the coefficient matrix of (2.5) is non-singular and has a uniformly bounded inverse.Note that (i), (ii) and (iv) are trivial to verify.Condition (iii) follows since we know that our iterates lie on N δ k ,ρ k ,λ k ,ζ k , while we have x T z ≥ , by assumption.Indeed, from the neighbourhood conditions (2.4), we have that Finally, we have to show that the right hand side of (2.5) is uniformly bounded.But that follows immediately from the neighbourhood conditions (2.4), and completes the proof.
In the following Lemma, given Premise 1, we prove by contradiction that the parameter µ k of Algorithm IP-PMM converges to zero.The proof is based on the developments in [19,12] and is only partially given here, for ease of presentation.
Lemma 4.2.Given Premise 1, and a sequence Proof.Assume, by virtue of contradiction, that µ k > , ∀ k ≥ 0.Then, we know that the Newton direction obtained by the algorithm at every iteration, after solving (2.5), will be uniformly bounded by a constant dependent only on n.Take any k ≥ 0 and define the following functions: with ψ 1 and ψ 2 determined in a similar manner to Lemma 3.6 (see (3.16) and (3.17) respectively).We would like to show that there exists α * > 0, such that: These conditions model the requirement that the next iteration of Algorithm IP-PMM must lie in the updated neighbourhood: N δ k+1 ,ρ k+1 ,λ k+1 ,ζ k+1 .Proving the existence of α * > 0 such that f i (α * ) ≥ 0, ∀ i = 1, • • • , n and h(α * ) ≥ 0, follows exactly the developments in Section 3 of [19].
We prove the claim for g p (•) and observe that a similar analysis is needed for g d (•).We will treat the case where λ k+1 = λ k here, since proving the claim for the case where λ k+1 = y k+1 follows easily.Using Lemma 4.1, we know that there exists a constant π, depending only on n, such that: ∆x T k ∆z k ≤ π.Then, from the second and the third block equations of (2.5), by taking the triangular inequality, and by using the neighbourhood conditions, we have that: Notice however, using a similar reasoning as in Lemma 3.6, that: and from Premise 1 as well as from Algorithm IP-PMM, we know that there always exists such a ψ 1 , bounded away from zero.In other words, we have that: where * ≥ σ min C 2µ 0 , and hence, Working similarly, we can show that all the functions are non-negative, for every α ∈ [0, α * ], where: However, using the inequality: we get that µ k → 0, which contradicts our assumption that µ k > , ∀ k ≥ 0.
Finally, using the following Theorem, we will derive a condition for detecting infeasible problems.
Theorem 4.3.Given Premise 2, i.e. there does not exist a KKT triple for the pair (P)-(D), then Premise 1 fails to hold.
Proof.By virtue of contradiction, let Premise 1 hold.In Lemma 4.2, we proved that given Premise 1, Algorithm IP-PMM produces iterates that belong to the neighbourhood (2.4) and µ k → 0. But from the neighbourhood conditions we can observe that: and Hence, we can choose a sub-sequence K ⊆ N, for which: But since y k − λ k and x k − ζ k are bounded, while δ k → 0 and ρ k → 0, we have that: This contradicts Premise 2, i.e. that the pair (P)-(D) does not have a KKT triple, and completes the proof.

Computational Experience
In this section, we provide some implementation details and present computational results of the method, over a set of small to large scale linear and convex quadratic programming problems.
The code was written in Matlab and can be found here: https://www.maths.ed.ac.uk/ERGO/software.html(source link).

Implementation Details
Our implementation slightly deviates from the theory, in order to gain some additional computational efficiency.

Free Variables
The method takes as input problems in the following form: where I = {1, • • • , n} \ F is the set of indices indicating the non-negative variables.In particular, if a problem instance has only free variables, no logarithmic barrier is employed and the method reduces to a standard proximal method of multipliers.Of course in this case, the derived complexity result does not hold.Nevertheless, a global convergence result holds, as shown in [29].In general, convex optimization problems with only equality constraints are usually easy to deal with, and the proposed algorithm behaves very well when solving such problems in practice.

Constraint Matrix Scaling
In the pre-processing stage, we check if the constraint matrix is well scaled, i.e if: max If the previous is not satisfied, we apply geometric scaling in the rows of A, that is, we multiply each row of A by a scalar of the form: However, for numerical stability, we find the largest integer p i , such that: 2 p i ≤ d i and we set Hence, the scaling factors will be powers of two.Based on the IEEE representation of floating point numbers, multiplying by a power of 2 translates into an addition of this power to the exponent, without affecting the mantissa.

Starting Point, Newton-step Computation and Step-length
We use a starting point, based on the developments in [22].To construct it, we try to solve the pair of problems (P)-(D), ignoring the non-negativity constraints.
However, in order to ensure stability and efficiency, we regularize the matrix AA T and we employ the Preconditioned Congugate Gradient (PCG) method to solve these systems (in order to avoid forming AA T ).We use the classical Jacobi preconditioner to accelerate PCG, i.e.P = diag(AA T ) + δI, where δ = 8, is set as the regularization parameter.
Then, in order to guarantee positivity and sufficient magnitude of x I , z I , we compute δ x = max{−1.5• min(x I ), 0} and δ z = max{−1.5• min(z I ), 0} and we obtain: , where e |I| is the vector of ones of appropriate dimension.Finally, we set: In order to find the Newton step, we employ a widely used predictor-corrector method.We provide the algorithmic scheme in Algorithm PC for completeness, but the reader is referred to [22], for an extensive review of the method.
We solve the systems (5.1) and (5.2), using the build in Matlab symmetric decomposition (i.e.ldl).Such a decomposition always exists, with D diagonal, for the aforementioned systems, since after introducing the regularization, the system matrices are guaranteed to be quasidefinite; a class of matrices known to be strongly factorizable, [31].In order to exploit that, we

Algorithm PC Predictor-Corrector method
Compute the predictor: where where a b denotes component-wise multiplication between the two vectors.Compute the step in the non-negativity orthant: for i = 1, • • • , |I|, and set: with τ = 0.995 (avoid going too close to the boundary).Compute a centrality Compute the corrector: Compute the step in the non-negativity orthant: and set: Update: (x k+1 , y k+1 , z k+1 ) = (x k + α x ∆x, y k + α z ∆y, z k + α z ∆z).
change the default pivot threshold of ldl to a value slightly lower than the minimum allowed regularization value (reg thr ; specified in sub-section 5.1.4).Using such a small pivot threshold guarantees that no 2x2 pivots will be employed during the factorization routine.

Regularization Parameters
At the beginning of the optimization, we set: δ 0 = 8, ρ 0 = 8, λ 0 = y 0 , ζ 0 = x 0 .Then, at the end of every iteration, we employ the algorithmic scheme given in Algorithm PEU.In order to ensure numerical stability, we do not allow δ or ρ to become smaller than a suitable positive value, reg thr .We set: reg thr = max . This value is based on the developments in [27], in order to ensure that we introduce a controlled perturbation in the eigenvalues of the non-regularized linear system.The reader is referred to [27] for an extensive study on the subject.If the factorization fails, we increase the regularization parameters by a factor of 10 and repeat the factorization.If this happens while either δ or ρ have reached their minimum allowed value (reg thr ), then we also increase this value by a factor of 10.If this occurs 5 consecutive times, the method is terminated with a message indicating ill-conditioning.
Algorithm PEU Penalty and Estimate Updates

Termination Criteria
There are three possible termination criteria.
1.The method declares convergence, when all the following three conditions are satisfied: ≤ tol, and µ ≤ tol, where tol is the tolerance specified by the user.
2. Primal-dual infeasibility is declared, if the primal or the dual estimate has not been updated for 5 consecutive iterations and either of the following conditions holds: 3. Finally, the method is terminated if the number of maximum iterations, specified by the user, is reached before any of the previous termination criteria is met.

Numerical Results
At this point, we present the computational results obtained by solving a set of small to large scale linear and convex quadratic problems.In order to stress out the importance of regularization, we compare IP-PMM with a non-regularized IPM.The latter implementation, follows exactly from the implementation of IP-PMM, simply by fixing δ and ρ to zero.In the non-regularized case, the minimum pivot of the ldl function is restored to its default value, in order to avoid numerical instability.Throughout all of the presented experiments, we set the number of maximum iterations to 200.It should be noted here, that we expect IP-PMM to require more iterations to converge, as compared to the non-regularized IPM.In turn, the Newton systems arising in IP-PMM, have better numerical properties (accelerating the factorization process), while overall the method is expected to be significantly more stable.
In what follows, we demonstrate that this increase in the number of iterations is benign, in that it does not make the resulting method inefficient.In contrast, we will show that the acceleration of the factorization process, more than compensates for the increase in the number of iterations.The experiments were conducted on a PC with a 2.2GHz Intel Core i7 processor (hexa-core), 16GB RAM, run under Windows 10 operating system.The Matlab version used was R2018b.

Linear Programming Problems
Let us compare the proposed method with the respective non-regularized implementation, over the Netlib collection, [25].The test set consists of 96 linear programming problems.We set the desired tolerance to tol = 10 −6 .Firstly, we compare the two methods, without using the pre-solved version of the problem collection (e.g.allowing rank-deficient matrices).In this case, the non-regularized IPM converged for only 66 out of the total 96 problems.On the other hand, IP-PMM solved successfully the whole set, in 143 seconds (and a total of 3,015 IPM iterations).Hence, one of the benefits of regularization, that of alleviating rank deficiency of the constraint matrix, becomes immediately obvious.
However, in order to explore more potential benefits of regularization, we run the algorithm on a pre-solved Netlib library.In the pre-solved set, the non-regularized IPM converged for 93 out of 96 problems.The three failures occurred due to instability of the Newton system.The overall time spent was 353 seconds (and a total of 1,871 IPM iterations).On the other hand, IP-PMM solved the whole set in 182 seconds (and a total of 3,035 iterations).We notice here, that IP-PMM converges slower when solving the pre-solved problems.This is most likely the case due to the extra restrictions that are introduced from the pre-solve.Nevertheless, two more benefits of regularization become obvious here.Firstly, we can observe that numerical stability can be a problem, even if we ensure that the constraint matrices are of full row rank.Secondly, notice that despite the fact that IP-PMM required 61% more iterations, it still solved the whole set in 47% less CPU time.This is because in IP-PMM, only diagonal pivots are allowed during the factorization.We could enforce the same condition on the non-regularized IPM, but then significantly more problems would fail to converge (22/96) due to numerical instability.
In Table 1, we collect statistics from the runs of the two methods over some medium scale instances of the presolved Netlib test set.Notice that the number of constraint/variables presented, occurs after adding slack variables and extra constraints needed to transform the problem to the accepted IP-PMM format.From Table 1, it becomes obvious that the factorization efficiency is significantly improved by the introduction of the regularization terms.In all of the presented instances, IP-PMM converged needing more iterations, but requiring less CPU time.
In order to summarize the comparison of the two methods, we include Figure 1, which contains the performance profiles, over the pre-solved Netlib set, of the two methods.IP-PMM is represented by the green line (consisted of triangles), while non-regularized IPM by the blue line (consisted of stars).In Figure 1a, we present the performance profile with respect to time required for convergence, while in Figure 1b, the performance profile with respect to the number of iterations.In both figures, the horizontal axis is in logarithmic scale, and it represents the ratio with respect to the best performance achieved by one of the two methods, for each problem.The vertical axis shows the percentage of problems solved by each method, for different values of the performance ratio.Robustness is "measured" by the maximum achievable percentage, while efficiency by the rate of increase of each of the lines (faster increase translates to better efficiency).For more information about performance profiles, we refer the reader to [10], where this benchmarking was proposed.One can see that all of our previous observations are verified in Figure 1.

Infeasible Problems
In order to assess the accuracy of the proposed infeasibility detection criteria, we attempt to solve 28 infeasible problems, coming from the Netlib infeasible collection ( [25], see also Infeasible Problems).For 22 out of the 28 problems, the method was able to recognize that the problem under consideration is infeasible, and exit before the maximum number of iterations was reached.There were 4 problems, for which the method terminated after reaching the maximum number of iterations.For 1 problem the method was terminated early due to numerical instability.Finally, there was one problem for which the method converged to the least squares solution, which satisfied the optimality conditions for a tolerance of 10 −6 .Overall, IP-PMM run all 28 infeasible problems in 34 seconds (and a total of 1,813 IPM iterations).The proposed infeasibility detection mechanism had a 78% rate of success over the infeasible test set, while no feasible problem was recognized as infeasible.A more accurate infeasibility detection mechanism could be possible, however, the proposed approach is easy to implement and cheap from the computational point of view.Nevertheless, the interested reader is referred to [4,24] and the references therein, for various other infeasibility detection methods.

Quadratic Programming Problems
Next, we present the comparison of the two methods over the Maros-Mészáros test set ( [21]), which is comprised of 122 convex quadratic programming problems.Notice that in the previous experiments, we used the pre-solved version of the collection.However, we do not have a pre-solved version of this test set available.Since the focus of the paper is not in the pre-solve phase of convex problems, we present the comparison of the two methods over the set, without applying any pre-processing.As a consequence, non-regularized IPM fails to solve 27 out of the total 122 problems.However, only 11 out of 27 failed due to rank deficiency.The remaining 16 failures occurred due to numerical instability.On the contrary, IP-PMM solved the whole set successfully in 127 seconds (and a total of 2,756 iterations).As before, the required tolerance was set to 10 −6 .
In Table 2, we collect statistics from the runs of the two methods over some medium scale instances of the Maros-Mészáros collection.Again, the number of constraints/variables presented, occurs after adding slack variables and extra constraints needed to transform the problem in the accepted IP-PMM format.In order to summarize the comparison of the two methods, we include Figure 2, which contains the performance profiles, over the Maros-Mészáros test set, of the two methods.IP-PMM is represented by the green line (consisted of triangles), while non-regularized IPM by the blue line (consisted of stars).In Figure 2a, we present the performance profile with respect to time required for convergence, while in Figure 2b, the performance profile with respect to the number of iterations.Similar remarks can be made here, as those given when summarizing the linear programming experiments.One can readily observe the importance of the stability introduced by the regularization.On the other hand, the overhead in terms of number of iterations that is introduced due to regularization, is acceptable due to the acceleration of the factorization (since we are guaranteed to have a quasi-definite augmented system).

Verification of the Theory
We have already presented the benefits of using regularization in interior point methods.A question arises, as to whether a regularized IPM can actually find an exact solution of the problem under consideration.Theoretically, we have proven this to be the case.However, in practice one is not allowed to decrease the regularization parameters indefinitely, since ill-conditioning will become a problem.Based on the theory of augmented Lagrangian methods, one knows that sufficiently small regularization parameters suffice for exactness (see [6], among others).
In what follows, we provide a "table of robustness" of IP-PMM.We run the method over the Netlib and the Maros-Mészáros collections, for decreasing values of the required tolerance and report the number of problems that converged.One can observe, from Table 3, that IP-PMM is sufficiently robust.Even in the case where a 10 digit accurate solution is required, the method is able to maintain a success rate larger than 91%.

Large Scale Problems
All of our previous experiments were conducted on small to medium scale linear and convex quadratic programming problems.We have showed (both theoretically and practically) that the proposed method is reliable.However, it is worth mentioning the limitations of the method.Since we employ exact factorizations during the iterations of the IPM, we expect that the method will be limited in terms of the size of the problems it can solve.The main bottleneck arises from the factorization, which does not scale well in terms of processing time and memory requirements.In order to explore the limitations, in Table 4 we provide the statistics of the runs of the method over a small set of large scale problems.It contains the dimensions of the problems (after transforming them in the IP-PMM format), the number of non-zeros, as well as the time needed to convergence.The tolerance used in these experiments was 10 −6 .From Table 4, it can be observed that as the dimension of the problem grows, the time to convergence is significantly increased.This increase in time is disproportionate for some problems.
The reason for that, is that the required memory could exceed the available RAM memory, in which case the swap-file is activated.Access to the swap memory is extremely slow and hence the time could potentially increase disproportionately.On the other hand, we retrieve two failures due to lack of available memory.The previous issues, could potentially be addressed by the use of iterative methods.Such methods, embedded in the IP-PMM framework, could significantly relax the memory as well as the processing requirements, at the expense of providing inexact directions.Combining IP-PMM (which is stable and reliable) with such an inexact scheme (which could accelerate the IPM iterations) seems to be a viable and competitive alternative and will be addressed in a future work.

Conclusions
In this paper, we present an algorithm suitable for solving convex quadratic programs.It arises from the combination of an infeasible interior point method with the proximal method of multipliers (IP-PMM).The method is interpreted as a primal-dual regularized IPM, and we prove that it is guaranteed to converge in a polynomial number of iterations, under reasonable assumptions.Then, an easily implementable infeasibility detection mechanism is derived and tested in practice.The algorithm is implemented, and the reliability of the method is demonstrated.At the expense of some extra iterations, regularization improves the numerical properties of the interior point iterations.The increase in the number of iterations is benign, since factorization efficiency as well as stability is gained.Not only the method remains efficient, but it outperforms a similar non-regularized IPM scheme.We observe the limitations of the current approach, due to the cost of factorization, and it is noted that embedding iterative methods in the current scheme can significantly improve the scalability of the algorithm at the expense of inexact directions.Since the reliability of IP-PMM is demonstrated, it only seems reasonable to allow for approximate solutions and still expect fast convergence.Hence, a future goal is to extend the theory as well as the implementation, in order to allow for inexact Newton directions.

Figure 1 :
Figure 1: Performance profiles over the pre-solved Netlib test set.

Table 1 :
Netlib medium scale instances

Table 3 :
Table of robustness

Table 4 :
Large scale problems