On a Primal-Dual Newton Proximal Method for Convex Quadratic Programs

This paper introduces QPDO, a primal-dual method for convex quadratic programs which builds upon and weaves together the proximal point algorithm and a damped semismooth Newton's method. The outer proximal regularization yields a numerically stable method, and we interpret the proximal operator as the unconstrained minimization of the primal-dual proximal augmented Lagrangian function. This allows the inner Newton scheme to exploit sparse symmetric linear solvers and multi-rank factorization updates. Moreover, the linear systems are always solvable independently of the problem data, and exact linesearch can be performed. The proposed method can handle degenerate problems, provides a mechanism for infeasibility detection, and can exploit warm starting, while requiring only convexity. We present details of our open-source C implementation and report on numerical results against state-of-the-art solvers. QPDO proves to be a simple, robust, and efficient numerical method for convex quadratic programming.


Introduction
Quadratic programs (QPs) are one of the fundamental problems in optimization. In this paper, we consider linearly constrained convex QPs of the form

min_x (1/2) x^T Q x + q^T x, s.t. l ≤ Ax ≤ u,    (1)

with x ∈ R^n. Matrix Q ∈ R^{n×n} and vector q ∈ R^n define the objective function, whereas the constraints are encoded by matrix A ∈ R^{m×n} and vectors l, u ∈ R^m. We assume (i) the problem data Q, q, and A are bounded, (ii) matrix Q is symmetric positive semidefinite, i.e., Q ⪰ 0, and (iii) vectors l and u satisfy l ≤ u, l < +∞, and u > −∞ component-wise; cf. [55,30]. We will refer to the nonempty, closed, and convex set C := {z ∈ R^m : l ≤ z ≤ u} as the constraint set. Note that (1) represents a general convex QP, in that it also accommodates equality constraints and bounds.

Background
Optimization problems in the form (1) appear in a variety of applications and are of interest in engineering, statistics, finance, and many other fields. QPs often arise as sub-problems in methods for general nonlinear programming [9,29,40], and greatly vary in terms of problem size and structure.
Convex QPs have been studied since the 1950s [20] and several numerical methods have been developed since then. These differ in how they balance the number of iterations and the cost (e.g., run time) per iteration.
Active-set methods for QPs originated from extending the simplex method for linear programs (LPs) [60]. These methods select a set of binding constraints and iteratively adapt it, seeking the set of active constraints at the solution. Active-set algorithms can be easily warm started and can lead to finite convergence. Moreover, by adding and dropping constraints from the set of binding ones, factorization updates can be adopted for solving successive linear systems. However, these methods may require many iterations to identify the correct set of active constraints. Modern solvers based on active-set methods are qpOASES [18] and NASOQ [12].
First-order methods iteratively compute an optimal solution using only first-order information about the cost function [45,42]. As these methods consist of computationally cheap and simple steps, they are well suited to applications with limited computing resources [55]. However, first-order algorithms usually require many iterations to achieve accurate solutions and may suffer from ill-conditioning of the problem data. Several acceleration schemes have been proposed to improve their behaviour [1,58]. The OSQP [55] solver offers an implementation based on ADMM [8].
Interior-point methods consider the problem constraints in the objective function via barrier functions and solve a sequence of parametric sub-problems [9, Chap. 11], [40, §16.6]. They usually require few but rather demanding iterations [29,40]. Although interior-point methods are not easily warm started, their polynomial complexity makes them appealing for large-scale problems [27]. Recent developments are found in the regularized method IP-PMM [47].
Semismooth Newton's methods apply a nonsmooth version of Newton's method to the KKT conditions of the original problem [49,48]. In the strictly convex case, i.e., with Q ≻ 0, this approach performs very well as long as the underlying linear systems are nonsingular; degeneracy, however, can render them singular. Regularized, or stabilized, semismooth Newton-type methods, such as QPALM [30,31] and FBstab [34], overcome these drawbacks.

Approach
In this work we present a numerical method for solving general QPs. The proposed algorithm is based on the proximal point algorithm and a semismooth Newton's method for solving the sub-problems, which are always solvable for any choice of problem data. We therefore impose no restrictions such as strict convexity of the cost function or linear independence of the constraints. As such, our algorithm gathers together the benefits of fully regularized primal-dual methods and semismooth Newton's methods with active-set structure. Our algorithm can exploit warm starting to reduce the number of iterations, as well as factorization caching and multi-rank update techniques for efficiency, and it can obtain accurate solutions.
Our approach, dubbed QPDO from "Quadratic Primal-Dual Optimizer", is inspired by and shares many characteristics with algorithms that have already been proposed, in particular with QPALM [30] and FBstab [34]. On the other hand, it differs from them on some key aspects. QPALM relates to the proximal method of multipliers [30, Rem. 2], which in turn is associated with the classical (primal) augmented Lagrangian function [52]. Instead, FBstab and QPDO apply the proximal point method, yielding exact primal-dual regularization. However, FBstab reformulates the sub-problem via the (penalized) Fischer-Burmeister NCP function [19,10] and adopts the squared residual norm as a merit function for the inner iterative loop; this prevents the use of symmetric sparse linear solvers. Instead, QPDO adopts the minimum NCP function, which leads to symmetric linear systems with active-set structure. Then, we show that the primal-dual proximal augmented Lagrangian function, introduced in [50,25] and [15], is a suitable merit function for the proximal sub-problem, which allows us to perform an exact linesearch in a fully primal-dual regularized context. Indeed, we believe the main contribution of this work consists in recognizing this link, exploiting it to bridge the gap between previously proposed methods, and developing a robust and efficient algorithm that possesses their advantages but does not suffer from their disadvantages.

Notation The algorithm is described with a nested structure: outer iterations are indexed by k ∈ N. Given an arbitrary vector x, x_k denotes that x depends on k, and analogously for matrices. We denote by y the dual variable associated with the constraints in problem (1). A primal-dual pair (x, y) will be denoted by v, and we will refer interchangeably to it as a vector or to its components x and y. An optimal solution to problem (1) will be denoted by (x*, y*), or v*. Optimal solutions of proximal sub-problems will be denoted using an appropriate subscript, according to the iteration; for example, (x̄_k, ȳ_k), or v̄_k, denotes the solution to the proximal sub-problem corresponding to the k-th outer iteration.
Outline The rest of the paper is organized as follows. In Section 2, we outline our algorithmic framework. Sections 3 and 4 develop and present our algorithm in detail. In particular, in Section 4.1 we establish our key result, which relates the proximal operator and the primal-dual proximal augmented Lagrangian function. Convergence properties are analyzed in Section 5, while Section 6 juxtaposes QPDO with similar methods. We present details of our implementation in Section 7 and report on numerical experience in Section 8.

Algorithm
In this section, we outline our Quadratic Primal-Dual Optimizer (QPDO), which weaves together the proximal point algorithm and a semismooth Newton's method. Effectively, the proximal operator is evaluated by solving a sub-problem via the semismooth Newton's method. Thus, the latter constitutes an inner iterative procedure, embedded into the outer proximal point loop. The proposed numerical scheme is sketched in Algorithms 1 and 2, highlighting the nested structure for clarity of presentation. We denote by r and r_k the outer and inner residuals defined in (4) and (11), respectively, and by v a primal-dual pair (x, y). We provide more details and investigate convergence properties in the following sections.

Outer loop: inexact proximal point method
Our method solves problem (1) using the proximal point algorithm, with inexact evaluation of the proximal operator. In Algorithm 1, this is evaluated by means of a semismooth Newton-type method, which constitutes an inner iterative procedure, further investigated in Section 4. This section focuses on the outer loop corresponding to the proximal point algorithm, which has been extensively studied in the literature [53]. We recall some important results and refer to [37,52,35,41] for more details.

[Algorithm 1 — QPDO: Quadratic Primal-Dual Optimizer. Input: Q, q, A, l, u. Parameters: ε_opt > 0, ε_0 ≥ 0, κ_ε ∈ [0, 1), 0 < σ_min ≤ σ_0, 0 < µ_min ≤ µ_0. Initial guess: v_0.]

[Algorithm 2 — Inner loop: semismooth Newton's method. Set v ← v_k; repeat: obtain δv by solving the linear system (17); obtain the stepsize τ by solving the piecewise linear equation (18).]

Optimality conditions
Problem (1) can be equivalently expressed as

min_x f(x) + g(Ax),    (2)

where f(x) := (1/2) x^T Q x + q^T x and g := δ_C are the objective function and the characteristic (indicator) function of the constraint set C, respectively. The necessary and sufficient first-order optimality conditions of problem (2), and hence of problem (1), read

0 ∈ T(v),    T(v) := ( Qx + q + A^T y, ∂g*(y) − Ax ),    (3)

where ∂g* denotes the conjugate subdifferential of g [41]. We will refer to the vector y ∈ R^m as the dual variable and to T : R^ℓ ⇒ R^ℓ, ℓ := n + m, as the KKT operator for problem (1). These optimality conditions, in the form (3), involve the set-valued operator T. However, noticing that, for any α > 0, the conditions z = Π_C(z + αu) and z ∈ ∂g*(u) are equivalent [51, §23], the conditions in (3) can be equivalently rewritten. Choosing α = 1, we can define the (outer) residual r : R^ℓ → R^ℓ,

r(v) := ( Qx + q + A^T y, Ax − Π_C(Ax + y) ),    (4)

and express the KKT conditions for (1) as r(v) = 0. This reformulation can also be obtained by employing the minimum NCP function [56] and rearranging to obtain the projection operator Π_C. The residual r is analogous to the natural residual function π investigated in [43]. Since it is an error bound for problem (1), i.e., dist(v, T^{-1}(0)) is bounded by a constant multiple of ‖r(v)‖ [43, Thm 18], the norm of r is a sensitive optimality measure and its value can be adopted as a stopping criterion.
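To make the reformulation concrete, the residual r(v) from (4) can be sketched in a few lines of pure Python with dense nested lists. The helper names (matvec, proj_box, outer_residual) are illustrative and not taken from the QPDO code base.

```python
# Sketch of the outer residual r(v) = (Qx + q + A^T y, Ax - proj_C(Ax + y)).

def matvec(M, x):
    """Dense matrix-vector product M @ x."""
    return [sum(Mi[j] * x[j] for j in range(len(x))) for Mi in M]

def proj_box(z, l, u):
    """Component-wise projection onto the constraint set C = [l, u]."""
    return [min(max(zi, li), ui) for zi, li, ui in zip(z, l, u)]

def outer_residual(Q, q, A, l, u, x, y):
    """Both blocks of r(v); the residual vanishes iff v = (x, y) is a KKT pair."""
    At = [list(col) for col in zip(*A)]          # transpose of A
    r_dual = [a + b + c for a, b, c in zip(matvec(Q, x), q, matvec(At, y))]
    Ax = matvec(A, x)
    shifted = [axi + yi for axi, yi in zip(Ax, y)]
    r_prim = [axi - pi for axi, pi in zip(Ax, proj_box(shifted, l, u))]
    return r_dual, r_prim

# Example: min (x1 - 1)^2 + (x2 - 1)^2 over 0 <= x <= 0.5, i.e. Q = 2I, q = -2.
Q = [[2.0, 0.0], [0.0, 2.0]]; q = [-2.0, -2.0]
A = [[1.0, 0.0], [0.0, 1.0]]; l = [0.0, 0.0]; u = [0.5, 0.5]
rd, rp = outer_residual(Q, q, A, l, u, x=[0.5, 0.5], y=[1.0, 1.0])
```

In the example, the pair x = (0.5, 0.5), y = (1, 1) makes both residual blocks vanish, certifying optimality for the tiny box-constrained QP.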

Proximal point algorithm
The proximal point algorithm [53] finds zeros of maximal monotone operators by recursively applying their proximal operator. Since T is a maximal monotone operator [52,41], the proximal point algorithm converges to an element v* of the set of primal-dual solutions T^{-1}(0), if any exists [37,53]. Starting from an initial guess v_0, it generates a sequence {v_k} of primal-dual pairs by recursively applying the proximal operator P_k:

v_{k+1} = P_k(v_k),    P_k := (id + Σ_k^{-1} T)^{-1}.    (5)

Here, {Σ_k} is a sequence of non-increasing positive definite matrices, namely, Σ_k ≻ 0 and Σ_k − Σ_{k+1} ⪰ 0 for all k ∈ N. The matrices Σ_k control the primal-dual proximal regularization and, similarly to exact penalty methods, these are not required to vanish [52,53]. Since T is maximally monotone, the proximal operator P_k is well defined and single-valued for all v ∈ dom T = R^ℓ [37]. Thus, from (5), evaluating the proximal operator P_k at v_k is equivalent to finding the unique v ∈ R^ℓ that satisfies

0 ∈ T(v) + Σ_k (v − v_k).    (6)

This is guaranteed to have a unique solution and to satisfy certain useful regularity properties; see Section 4 below. As a result, we can construct a fast inner solver for these sub-problems based on the semismooth Newton's method.
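A toy scalar illustration of the recursion (5): for a single-valued affine operator T(v) = a·v + b with a ≥ 0 and a constant scalar regularization σ, the proximal step solves T(w) + σ(w − v) = 0 exactly, and the iterates contract toward the zero of T. The data below are illustrative only.

```python
# Proximal point recursion v_{k+1} = P_k(v_k) for a scalar monotone affine T.

def prox_step(a, b, v, sigma):
    """Unique w with a*w + b + sigma*(w - v) = 0."""
    return (sigma * v - b) / (a + sigma)

a, b = 2.0, -4.0          # T(v) = 2v - 4, whose unique zero is v* = 2
v, sigma = 0.0, 1.0
for _ in range(60):       # each step contracts the error by a/(a + sigma)
    v = prox_step(a, b, v, sigma)
```

Here each step multiplies the distance to v* = 2 by σ/(a + σ) = 1/3, so the iterates converge linearly without σ having to vanish, mirroring the remark above on exact penalty-like behavior.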

Early termination
The proximal point algorithm tolerates errors, namely the inexact evaluation of the proximal operator P_k [53]. Criterion (A_r) in [35] provides conditions for the design of convergent inexact proximal point algorithms [35, Thm 2.1]. Let v̄_k := P_k(v_k) denote the unique proximal sub-problem solution and v_{k+1} ≈ v̄_k the actual recurrence update. Then, the aforementioned criterion, with parameter r ≥ 0, requires ‖v_{k+1} − v̄_k‖ to be controlled by a summable sequence {e_k} of nonnegative inner tolerances, i.e., e_k ≥ 0 for all k and Σ_{k=0}^∞ e_k < +∞. However, since v̄_k is effectively unknown, this criterion is impractical in this form. Instead, in Algorithm 1 it is required that v_{k+1} satisfies ‖r_k(v_{k+1})‖_∞ ≤ ε_k. Here, r_k denotes the residual for the k-th sub-problem, defined in (11). In Section 5 we will show that this criterion is a simple and viable substitute, which retains the significance of (A_r).

Warm starting
If a solution v* exists, the (outer) sequence {v_k} generated by (5) converges, by the global convergence of the proximal point algorithm [53]. Then, expectedly, P_k(v_k) and v_k are arbitrarily close for sufficiently large k [34, §4]. This supports the idea of warm starting the inner solver with the current outer estimate v_k, that is, setting v ← v_k in Algorithm 2. For large k, only one or a few Newton-type inner iterations are needed to find an approximate sub-problem solution v_{k+1}.

Inner loop: semismooth Newton's method
In this section we focus on solving sub-problem (6) via a semismooth Newton's method. For the sake of clarity, and without loss of generality, we consider Σ_k = blockdiag(σ_k I_n, µ_k I_m) for some parameters σ_k, µ_k ∈ R_{++}.

Merit function
We now derive the simple yet fundamental result that is key to developing our method. It provides the NCP reformulation of the proximal sub-problem together with a suitable merit function. The former yields symmetric active-set linear systems, while the latter leads to exact linesearch.
Let us express the proximal sub-problem (6) in the explicit form (7). Similarly to (4), for any given α > 0, this can be rewritten in projection form (8), where w_k collects the argument of the projection Π_C. Then, for any positive α ≠ µ_k, the conditions in (8) are equivalent to (10), namely their unique solutions coincide. Now, we observe that the right-hand side of (10) is the gradient of a function which, by construction, is continuously differentiable and whose gradient vanishes at the unique solution of the proximal sub-problem. Furthermore, for any α ∈ (0, µ_k), it is strictly convex and hence admits a unique minimizer, which must coincide with the unique proximal point. Therefore, this function is a suitable merit function for the sub-problem. The particular choice α := µ_k/2 inherits all these properties and leads to the inner optimality conditions r_k(v) = 0, with r_k : R^ℓ → R^ℓ the inner residual defined in (11), and the associated merit function M_k defined in (12). In fact, M_k : R^ℓ → R is the primal-dual proximal augmented Lagrangian function [50,25,15]; see Appendix A for a detailed derivation. This underlines once again the strong relationship between the proximal point algorithm and the augmented Lagrangian framework, pioneered in [52]. On the one hand, by (12), the dual regularization parameter µ_k controls the constraint penalization [21, §3.2]. On the other hand, this provides an "interpretation of the augmented Lagrangian method as an adaptive constraint regularization process" [3, §2].
The inner residual r_k in (11) is piecewise affine, hence strongly semismooth on R^ℓ [48,33]. Effectively, it can be employed as a stopping criterion in place of ‖∇M_k(·)‖. In fact, given the unique, bounded, and nonsingular matrix T_k defined in (13), we have the identity ∇M_k(·) = T_k r_k(·). The availability of a suitable merit function allows us to adopt a damped Newton-type method and design a linesearch-based globalization strategy, in contrast with [46,23,34]. Since M_k is continuously differentiable and piecewise quadratic, an exact linesearch procedure can be carried out, which yields finite convergence [57].
Finally, we highlight that the method asymptotically reduces to a sequence of regularized semismooth Newton's steps applied to the original, unperturbed optimality system, in the vein of [2]. This closely relates to the concept of exact regularization [22]. In fact, the proximal primal-dual regularization is exact; see Proposition 1 and compare [3, Thm 1].

Proposition 1. Let k ∈ N be arbitrary.
(i) Suppose v̄ solves sub-problem (11) with proximal center v_k := v̄, for some σ_k ≥ 0 and µ_k > 0. Then, v̄ solves the original problem (4).
Proof. The proof is immediate by direct comparison of (4) and (11).

Search direction
A semismooth Newton's direction δv = (δx, δy) at v = (x, y) solves the linear system

V_k(v) δv = −r_k(v).    (14)

Here, the matrix V_k(v) is an element of the generalized Jacobian [51, §23] of r_k at v, whose form is given in (15). In turn, the diagonal matrix P_k(v), with binary entries (16) indicating whether the corresponding component of w_k(v) lies within the bounds, is an element of the generalized Jacobian of Π_C at w_k. Owing to the selection of P_k(v) with binary entries, the linear system (14) can be rewritten in symmetric form, similar to those arising in active-set methods [32]. To this end, we notice that, whenever P_k^ii(v) = 1, the corresponding component of the inner residual in (11) reduces to a multiple of y^i, and the linear equation in (14) gives δy^i = −y^i. This yields the crucial observation that, by (16), it holds P_k(v)δy = −P_k(v)y for all v ∈ R^ℓ. Then, an equivalent yet symmetric linear system (17) is obtained, whose solution is the search direction δv at v. The active-set structure introduced by P_k allows us to obtain a symmetric linear system and to adopt multi-rank factorization updates [24,14] while maintaining structure and sparsity of the coefficient matrix [55,12]. The linear system in (17) always admits a unique solution, independently of the problem data, since the coefficient matrix is symmetric quasi-definite [59].

Exact linesearch
Given a primal-dual pair v and a search direction δv, we seek a stepsize τ > 0 to update v to v + τδv in Algorithm 2. Similarly to M_k, the function ψ_k : τ ↦ M_k(v + τδv) is continuously differentiable, piecewise quadratic, and strictly convex. Thus, the optimal stepsize τ := argmin_{t∈R} ψ_k(t) is found as the unique zero of ψ'_k, i.e., ψ'_k(τ) = 0. Since ψ'_k is a piecewise linear, strictly monotone increasing function, the exact linesearch procedure amounts to solving a piecewise linear equation of the form (18) with respect to τ ∈ R, whose coefficients are derived in Appendix B. Thanks to its peculiar structure, (18) can be solved efficiently and exactly (up to numerical precision), e.g., by sorting and linear interpolation.
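The sorting-and-interpolation idea can be sketched on a generic strictly increasing piecewise-linear function f(t) = a0·t + b0 + Σ_i max(a_i·t + b_i, 0), with a0 > 0 and a_i ≥ 0: sort the breakpoints −b_i/a_i, locate the interval where f changes sign, and interpolate. This mirrors the structure of (18), but the function form and the coefficients below are illustrative, not the QPDO formulas.

```python
# Exact root of a strictly increasing piecewise-linear function.

def pwl_root(a0, b0, terms):
    """Unique zero of f(t) = a0*t + b0 + sum(max(a*t + b, 0) for (a, b) in terms)."""
    def f(t):
        return a0 * t + b0 + sum(max(a * t + b, 0.0) for a, b in terms)
    def slope(t):
        # slope of f on the linear piece containing t (t not a breakpoint)
        return a0 + sum(a for a, b in terms if a * t + b > 0.0)
    bps = sorted({-b / a for a, b in terms if a > 0.0})
    for t in bps:
        if f(t) >= 0.0:                  # sign change: root lies on the piece left of t
            return t - f(t) / slope(t - 1e-9)
    t = bps[-1] if bps else 0.0          # root lies right of all breakpoints
    return t - f(t) / slope(t + 1e-9)

tau = pwl_root(1.0, -3.0, [(2.0, -2.0)])   # f(t) = t - 3 + max(2t - 2, 0)
```

For the example, f is t − 3 left of the breakpoint t = 1 and 3t − 5 right of it, so the exact stepsize is τ = 5/3; only one sort and one interpolation are needed, which is why the linesearch is exact up to numerical precision.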
We underline that the stepsize τ is unique and strictly positive, since M_k is strictly convex and δv is a descent direction for M_k at v; indeed, ψ'_k(0) = ∇M_k(v)^T δv < 0 whenever v does not solve the sub-problem.

Convergence analysis
This section discusses the convergence of QPDO as outlined in Algorithms 1 and 2. Our analysis relies on well-established results for Newton's and proximal point methods; in particular, we refer to [53,35,57].
First, we focus on the inner loop, described in Algorithm 2 and detailed in Section 4. Since the linear system (17) is always solvable, the search direction δv exists and is unique. Similarly, there exists a unique, positive optimal stepsize τ which solves (18). Thus, all steps are well-defined. It remains to show that the condition ‖r_k(v)‖_∞ ≤ ε_k is eventually satisfied. Since M_k is continuously differentiable, strictly convex, and piecewise quadratic, the semismooth Newton's method with exact linesearch exhibits finite convergence [57, Thm 3]. Thus, ∇M_k(v) = 0 after finitely many iterations. Then, since ∇M_k(·) = T_k r_k(·) with T_k nonsingular, it reaches r_k(v) = 0. Hence, for any ε_k ≥ 0, the inner stopping criterion is eventually satisfied, and the inner loop terminates.
Let us now consider the outer loop, sketched in Algorithm 1. This consists of inexact proximal point iterations [53], hence global and local convergence properties of the outer loop can be derived based on [35, Thm 2.1]. Recall that, by construction, the regularization parameters are positive and non-increasing. Also, by ε_0 ∈ R_+ and κ_ε ∈ [0, 1), the sequence {ε_k} of inner tolerances is summable. The following result shows that criterion (A_r) [35] holds.

Lemma 1. Let T^{-1}(0) be nonempty, let any v_0 ∈ R^ℓ be given, and let the sequence {v_k} be generated by Algorithm 1. Then, there exists a summable sequence {e_k} ⊆ R_+ such that ‖v_{k+1} − v̄_k‖ ≤ e_k for all k ∈ N, where v̄_k := P_k(v_k).

Proof. By the inner stopping condition, for all k ∈ N it holds ‖r_k(v_{k+1})‖ ≤ ε_k, with summable {ε_k} ⊆ R_+. Moreover, since M_k is Σ_k-strongly convex, for each k there exists η_k > 0 such that ‖v − v̄_k‖ ≤ η_k ‖r_k(v)‖ for all v ∈ R^ℓ. By the boundedness of Σ_k away from zero, the matrix T_k is bounded and there exists a constant η > 0 such that the bound ‖v − v̄_k‖ ≤ η ‖r_k(v)‖ holds for all k ∈ N and v ∈ R^ℓ. Thus, in particular, for all k ∈ N it holds ‖v_{k+1} − v̄_k‖ ≤ η ‖r_k(v_{k+1})‖ ≤ η ε_k. Let e_k := η ε_k, and the proof is complete.
Theorem 1. Let T^{-1}(0) be nonempty, let any v_0 ∈ R^ℓ be given, and let the sequence {v_k} be generated by Algorithm 1. Then, the sequence {v_k} is well-defined and converges linearly to a solution v* ∈ T^{-1}(0).

Relationship with similar methods
Our approach is inspired by and shares many features with other recently developed methods. This section elaborates upon their relationship with QPDO.
FBstab [34] "synergistically combines the proximal point algorithm with a primal-dual semismooth Newton-type method" to solve convex QPs. By adopting the Fischer-Burmeister NCP function [19,10], FBstab does not depend on an estimate of the active set, which may result in a more regular behavior than QPDO's. In contrast, by adopting the minimum NCP function, QPDO can exploit factorization updates, perform exact linesearch by solving a piecewise linear equation, and handle bilateral constraints simultaneously.
QPALM is a "proximal augmented Lagrangian based solver for convex QPs" [30]; recent advancements allow it to handle nonconvex QPs as well [31]. Given a primal-dual estimate v, the exact, unique resolvent update v′ of QPALM [30, Eq. 6], with Σ = blockdiag(σ^{-1}I, µ^{-1}I), is given by (20). Herein, the function ϕ is given by [30, Eq. 8] and closely resembles M_k in (12). Since (20a) yields ∇ϕ(x′) = 0, combining with (20b) and rearranging give (21). Conditions (21) and (11) differ only in the argument of Π_C, where the term −µy/2 is missing in (21b). This underlines the primal-dual nature of QPDO, which may better cope with changes in the active set [32] and control the quality of both primal and dual variables during the iterations [26,2], without any additional computational effort.

OSQP is a "solver for convex quadratic programs based on the alternating direction method of multipliers" [55]. Rearranging from [55, Alg. 1], with parameters α = 1, ρ = µ^{-1}, and given a primal-dual estimate (x, y) and a constraint estimate z, the (unique) primal-auxiliary update (x♦, s♦) satisfies (22). Then, the constraint and dual updates are given by z♦ = Π_C(z + µs♦) and y♦ = s♦ + µ^{-1}(z − z♦), respectively. Although conditions (22) resemble (11), an auxiliary variable s substitutes the dual variable y, and the projection in (11) is replaced by the constraint estimate z. This makes sub-problem (22) a linear system and results in a first-order method.

Implementation details
QPDO has been implemented in C code¹ and provides an interface to MATLAB. This section discusses some relevant aspects of the program, such as the linear solver, the parameter update rules, infeasibility detection, and problem scaling.

Linear solver
The linear system (17) is solved with CHOLMOD [11]. This linear solver is analogous to the one adopted in QPALM [30], for the sake of comparison. Let (r_k^dual, r_k^prim) partition the inner residual r_k in (11). Then, formally solving for δy in (17), we obtain an explicit expression for δy (omitting subscripts and arguments),

¹ Available online at https://github.com/aldma/qpdo.
where the second and third lines are due to the binary structure of P. Substituting δy and rearranging, we obtain a linear system for δx alone. This has a symmetric, positive definite coefficient matrix and can be solved by CHOLMOD [11]. On the one hand, this approach allows multi-rank factorization updates [14], thus avoiding the need for a full re-factorization at every inner iteration. On the other hand, the sparsity pattern may be lost and significant fill-in may arise due to the matrix-matrix product A^T A. For this reason, the current implementation may benefit from solving (17) directly via sparse symmetric linear solvers with multi-rank factorization updates, which is a topic for future research.
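Since the reduced coefficient matrix is symmetric positive definite, a Cholesky factorization applies; a compact dense pure-Python sketch of the factor-and-substitute step follows (the actual solver uses sparse CHOLMOD with factorization updates instead, and the data here are illustrative).

```python
# Dense Cholesky solve M x = b for symmetric positive definite M, via M = L L^T.
import math

def cholesky_solve(M, b):
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):                   # build the lower-triangular factor L
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n                        # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n                        # backward substitution: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

dx = cholesky_solve([[4.0, 2.0], [2.0, 3.0]], [2.0, -1.0])
```

Once L is available, each subsequent right-hand side costs only two triangular solves, which is the property the factorization-caching strategy above relies on.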

Parameters selection
Solving convex QPs via the proximal point algorithm imposes mild restrictions on the sequence of primal-dual regularization parameters {Σ_k}. As mentioned in Section 3.2, there are no additional requirements other than being non-increasing and positive definite. However, similarly to forcing sequences in augmented Lagrangian methods [13], the sequence of regularization parameters greatly affects the behaviour of QPDO, and a careful tuning can positively impact the performance. For instance, although faster convergence rates can be expected if Σ_k → 0 [35], numerical stability and machine precision should be taken into account. Following [31, §5.3] and [55, §5.2], our implementation considers only diagonal matrices of the form Σ_k = blockdiag(σ_k I, diag(µ_k)), and we refer to the effect of σ_k and µ_k as primal and dual regularization, respectively.
Dual regularization This term proves critical for the practical performance of the method. We argue that these parameters have such an impact because they strike the balance between the number of inner and outer iterations, seeking easy-to-solve sub-problems, effective warm starting, or rapid constraint satisfaction. Therefore, we carefully initialize and update these parameters, guided by the interpretation as a constraint penalization offered by the augmented Lagrangian framework; cf. Section 4.1. In our implementation, we consider a vector µ_k to gain finer control over the constraint penalization [13]. Given a (primal) initial guess x_0 ∈ R^n, we initialize µ_0 component-wise as in [7, §12.4], with safeguards µ_0^max ≥ µ_0^min > 0 and scaling κ_µ ≥ 0. Then, following [31, §5.3], we monitor the primal residual r^prim(v) := Ax − Π_C(Ax + y) from (4) and update the dual regularization parameter µ_k accordingly: if the i-th component of the primal residual has not sufficiently decreased, as measured by θ_µ ∈ (0, 1), the parameter µ_k^i is reduced by the factor δ_µ ≥ 0 and safeguarded from below by µ_min > 0; otherwise, we set µ_{k+1}^i = µ_k^i. These rules adapt the constraint penalization to the current residual, seeking a uniform, steady progression towards feasibility, while making sure the sequences {µ_k^i}, i ∈ [1; m], are non-increasing and bounded away from zero. In our implementation, the default values are µ_0^min = 10^{-4}, µ_0^max = 10^4, κ_µ = 0.1, µ_min = 10^{-8}, δ_µ = 10^{-2}, and θ_µ = 0.1.
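The component-wise update can be sketched as follows, under assumptions about the precise rule (the exact formula follows [31, §5.3]): µ_i is shrunk only where the primal residual has not decreased enough, so each sequence {µ_k^i} stays non-increasing and bounded below by µ_min.

```python
# Hypothetical sketch of the adaptive dual-regularization update.

def update_mu(mu, r_prim, r_prim_old, theta=0.1, delta=1e-2, mu_min=1e-8):
    out = []
    for mu_i, ri, ri_old in zip(mu, r_prim, r_prim_old):
        if abs(ri) > theta * abs(ri_old):          # insufficient progress on constraint i
            out.append(max(mu_min, delta * mu_i))  # tighten its penalization
        else:
            out.append(mu_i)                       # keep mu_i unchanged
    return out

mu_next = update_mu([1.0, 1.0], r_prim=[0.5, 0.001], r_prim_old=[1.0, 1.0])
```

In the example, only the first constraint lags behind, so only its parameter is tightened, illustrating the "uniform, steady progression towards feasibility" sought above.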
Primal regularization This term turns out to be less critical than its dual counterpart. For this reason, it is associated with a scalar value and tuned independently of the residual. Starting from σ_0 > 0, we apply the update σ_{k+1} = max(σ_min, κ_σ σ_k), where σ_min > 0 and κ_σ ∈ [0, 1]. In our implementation, the default values are σ_0 = 0.1, σ_min = 10^{-7}, and κ_σ = 0.1.
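The primal-regularization recursion is a geometric decay with a floor; a one-line sketch with the default values from the text (the inner-tolerance rule of Section 7.3 follows the same pattern):

```python
# Geometric decay with a safeguard floor.

def decay(val, kappa, floor):
    """val_{k+1} = max(floor, kappa * val_k): non-increasing, bounded below."""
    return max(floor, kappa * val)

sigma = 0.1                                       # sigma_0
for _ in range(10):
    sigma = decay(sigma, kappa=0.1, floor=1e-7)   # kappa_sigma, sigma_min
```

After a handful of iterations the floor σ_min is reached and the parameter stays there, consistent with the requirement that {Σ_k} be non-increasing but bounded away from zero.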

Early termination
The inner tolerance ε_k also affects the performance of QPDO, since it balances sub-problem accuracy and early termination. In Algorithm 1, these aspects relate to the parameters ε_0 and κ_ε, which drive {ε_k} to zero. However, finite precision should also be taken into account. In fact, although the semismooth Newton's method converges in finitely many iterations, the solution provided is exact only up to round-off errors and numerical precision. Therefore, we deviate from Algorithm 1 in this respect and employ an update rule of the form ε_{k+1} = max(ε_min, κ_ε ε_k), where 0 ≤ ε_min ≤ ε_opt. In our implementation, the default values are ε_0 = 1, κ_ε = 0.1, ε_min = 10^{-14}, and ε_opt = 10^{-6}.

Infeasibility detection
A routine for detecting primal and dual infeasibility of problem (1) is included in Algorithm 1. This allows the algorithm to terminate with either a primal-dual solution or a certificate of primal or dual infeasibility, for some given tolerances. We adopt the mechanism developed in [4, §5.2], which holds whenever the proximal point algorithm is employed to solve the KKT conditions (3). Problem (1) is declared primal or dual infeasible based on conditions on ∆x_k := x_{k+1} − x_k and ∆y_k := y_{k+1} − y_k, k ≥ 0. The reader may refer to [55, §3.4], [30, §V.C], [34, §4.1], and [47, §4] for analogous applications.

Preconditioning
Preconditioning, or scaling, the problem may alleviate ill-conditioning and mitigate numerical issues, especially when the problem data span many orders of magnitude. In our implementation, we closely follow [31, §5.2] and scale the problem data by performing Ruiz's equilibration procedure [54] on the constraint matrix A. This procedure iteratively scales the rows and columns of a matrix in order to make their infinity norms approach one. By default, QPDO performs 10 scaling iterations. Slightly different routines are adopted, e.g., in [55, §5.1] and [47, §5.1.2]. Note that, by default, if the problem is initially scaled, the termination conditions for both optimality and infeasibility refer to the original, unscaled problem.
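A minimal dense sketch of Ruiz equilibration: rows and columns are simultaneously scaled by the inverse square root of their infinity norms, driving all norms toward one. The real implementation works on sparse data and also rescales Q and q; this only illustrates the iteration on A.

```python
# Ruiz equilibration sketch on a dense matrix (no zero rows or columns).
import math

def ruiz(A, iters=10):
    m, n = len(A), len(A[0])
    d_row, d_col = [1.0] * m, [1.0] * n
    B = [row[:] for row in A]
    for _ in range(iters):
        dr = [1.0 / math.sqrt(max(abs(x) for x in B[i])) for i in range(m)]
        dc = [1.0 / math.sqrt(max(abs(B[i][j]) for i in range(m))) for j in range(n)]
        for i in range(m):               # apply both scalings simultaneously
            for j in range(n):
                B[i][j] *= dr[i] * dc[j]
        d_row = [d * s for d, s in zip(d_row, dr)]
        d_col = [d * s for d, s in zip(d_col, dc)]
    return B, d_row, d_col               # scaled matrix and accumulated diagonals

B, drow, dcol = ruiz([[100.0, 1.0], [1.0, 0.01]])
```

After the default 10 iterations, the row and column infinity norms of the badly scaled example are close to one, and the accumulated diagonals give the scaling to be undone when reporting termination measures on the original problem.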

Numerical results
We discuss details of our open-source implementation of QPDO and present computational results on random problems and the Maros-Mészáros set [36]. We test and compare QPDO against the open-source, full-fledged solvers OSQP [55] and QPALM [30,31]. Indeed, "the construction of appropriate software is by no means trivial and we wish to make a thorough job of it" [13]; we plan to improve our current implementation and to report comprehensive numerical results in due time.

Setup
We consider the tolerance ε_opt = 10^{-5}, and set the tolerances in OSQP and QPALM to ε_abs = ε_opt and ε_rel = 0. In addition, we set the maximum number of iterations and the time limit to 10^12 and 100 s, respectively, for every solver, and we leave all the other settings at their internal defaults. It is worth mentioning that, since no initial guess is provided, all the solvers start from v_0 = 0. We deem optimal a primal-dual pair v returned by a solver if it satisfies the condition ‖r(v)‖_∞ ≤ ε_opt; otherwise, we consider it a failure. The code to generate the numerical results is currently available upon reasonable request to the author. All the experiments were carried out on a desktop running Ubuntu 16.04 with Intel Core i7-8700 and 16 GB RAM.
Metrics Let S, P, and t_{s,p} denote the set of solvers, the set of problems, and the time required for solver s ∈ S to return a solution for problem p ∈ P. The shifted geometric mean (sgm) t̂_s of the run times for solver s ∈ S on P is defined by

t̂_s := exp( (1/|P|) Σ_{p∈P} ln(t_{s,p} + t_shift) ) − t_shift,

with the shift t_shift = 1 s [38]. Here, when solver s fails to solve problem p, the term t_{s,p} is set to the time limit. We also adopt performance profiles [16] to compare the solver timings. Considering t_{s,p} = +∞ when solver s fails on problem p, f_s^r(τ) is the fraction of problems solved by solver s within τ times the best timing. Since performance profiles may be misleading when more than two solvers are compared [28], we will compare QPDO to QPALM and OSQP separately. We also report time profiles: f_s^a(t) is the fraction of problems solved by solver s within the time t. Note that, in contrast to f_s^r, the time profile t ↦ f_s^a(t) is independent of the other solvers and is displayed with the actual timings of s.
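The shifted geometric mean is straightforward to compute; a sketch of the metric as defined above, with failures counted at the time limit (the `None`-for-failure convention is ours, not the paper's):

```python
# Shifted geometric mean of run times: exp(mean(log(t + shift))) - shift.
import math

def sgm(times, shift=1.0, time_limit=100.0):
    """Aggregate run times; None marks a failure, counted at the time limit."""
    ts = [time_limit if t is None else min(t, time_limit) for t in times]
    return math.exp(sum(math.log(t + shift) for t in ts) / len(ts)) - shift
```

Compared with the arithmetic mean, the shift damps the influence of very small timings while the geometric mean damps that of outliers, which is why sgm is the customary aggregate in solver benchmarks [38].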

Random problems
We considered QPs in the form (1) with randomly generated problem data. In each problem instance, the number of variables is n = 10^a, with a uniformly distributed, a ∼ U(1, 3), so that n ranges between 10^1 and 10^3. The number of constraints is m = 10n. The linear cost is normally distributed, i.e., q_i ∼ N(0, 1). The cost matrix is Q = P P^T, where P ∈ R^{n×n} has 10% nonzero entries P_ij ∼ N(0, 1). The constraint matrix A ∈ R^{m×n} contains 10% nonzero entries A_ij ∼ N(0, 1). The bounds are uniformly distributed, i.e., l_i ∼ U(−1, 0) and u_i ∼ U(0, 1). We also investigated equality-constrained QPs. For these problems, n ranges between 10^2 and 10^4, m = n/10, and l_i = u_i ∼ N(0, 1). We generated 250 instances of each problem class.

Results
Computational results are summarized in Table 1 and shown in Figs. 1 and 2. The performance profiles suggest that, for both problem classes, QPALM exhibits the best performance, with QPDO slightly behind and OSQP third. However, the time profiles in Fig. 2 show that, on equality-constrained QPs, QPDO scales better than the other solvers. Indeed, QPDO is the first to complete the test set of random problems. OSQP reaches the time limit on a few problems, due to the relatively high accuracy requirement. Overall, all solvers prove competitive.
Results
Computational results are summarized in Table 1 and shown in Figs. 3 and 4. On this test set, QPDO demonstrates its robustness. OSQP is very fast on some problems but has a high failure rate: as a first-order method, it builds upon computationally cheap iterations, but it may take many of them to cope with ill-conditioning and the relatively high accuracy requirements. QPALM fails on some problems, presumably due to linear algebra issues; its relatively high timing in Table 1 reflects this failure rate. Considering only the problems it solved, QPALM's timing (sgm) is 0.050 s, whereas QPDO takes 0.054 s. This shows that QPDO is both reliable and effective.

Conclusions
This paper presented a primal-dual Newton-type proximal method for convex quadratic programs. We built upon a simple yet crucial result: a suitable merit function for the proximal sub-problem is found in the proximal primal-dual augmented Lagrangian function. This allows us to effectively weave the proximal point method together with a semismooth Newton's method, yielding structured symmetric linear systems, exact linesearch, and the possibility to apply sparse multi-rank factorization updates. Requiring only convexity, the method is simple and easily warm started, can exploit sparsity patterns, is robust to early termination, and can detect infeasibility.
We have implemented our method QPDO in a general-purpose solver, written in open-source C code.We benchmarked it against state-of-the-art QP solvers, comparing run times and failure rates.QPDO proved reliable, effective, and competitive.

A Primal-dual proximal augmented Lagrangian function
We show that the merit function M in (12) for the sub-problem (11) is indeed the primal-dual proximal augmented Lagrangian function, proposed and investigated in [50,25,15]. Let us reformulate (1) as an equivalent problem with an auxiliary variable z and consider the associated function M^z_µ, for some given parameter µ > 0 and dual estimate ȳ ∈ R^m; cf. [13,6] and [50,25]. Introducing a primal proximal regularization, we define M^z_{µ,σ}(x, z, y; x̄, ȳ) := M^z_µ(x, z, y; ȳ) + (σ/2) ‖x − x̄‖², for some given parameter σ > 0 and primal estimate x̄ ∈ R^n. In the context of primal-dual augmented Lagrangian methods, the function M^z_{µ,σ} is to be jointly minimized with respect to x, z, and y [50,25]. Following [15], we consider the explicit minimization over the auxiliary variable z. The minimizer z^µ of M^z_{µ,σ} in (24) is readily obtained.

B Exact linesearch coefficients
We prove that the right-hand side of (18) coincides with ψ_k(τ) for all τ ∈ R, with the coefficients given in (19). Let w^k := A x + µ_k (y^k − ȳ/2) and δw^k := A δx − µ_k δy/2; cf. (19).

Notation
Arguments and indices are omitted whenever clear from the context. [a, b], (a, b), [a, b), and (a, b] stand for closed, open, and half-open intervals, respectively, with end points a and b. [a; b], (a; b), [a; b), and (a; b] stand for the corresponding discrete intervals, e.g., [a; b] = [a, b] ∩ Z. Given a vector x ∈ R^n, x^T and x_i denote its transpose and its i-th component, respectively. We adopt the norms ‖x‖ = ‖x‖_2 := √(x^T x) and ‖x‖_∞ := max_{i∈[1;n]} |x_i|. Given a set S, |S| denotes its cardinality. In R^n, the relations <, ≤, =, ≥, and > are understood component-wise. Given a nonempty closed convex set C ⊆ R^n, we denote by χ_C : R^n → R ∪ {+∞} its characteristic function, namely χ_C(x) = 0 if x ∈ C and χ_C(x) = +∞ otherwise; by dist_C : R^n → R its distance function, namely x → min_{z∈C} ‖z − x‖; and by Π_C : R^n → R^n its projection, namely x → arg min_{z∈C} ‖z − x‖.
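For the box constraint set C = [l, u] of problem (1), the projection and distance operators just defined reduce to a component-wise clip. A minimal NumPy illustration (the helper names `project_box` and `dist_box` are ours):

```python
import numpy as np

def project_box(x, l, u):
    """Pi_C(x) for the box C = {z : l <= z <= u}: component-wise clip."""
    return np.minimum(np.maximum(x, l), u)

def dist_box(x, l, u):
    """dist_C(x) = ||x - Pi_C(x)|| for the box C = [l, u]."""
    return np.linalg.norm(x - project_box(x, l, u))

x = np.array([2.0, -3.0, 0.5])
l, u = -np.ones(3), np.ones(3)
```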

Figure 1: Comparison on random problems with performance profiles: fraction of problems solved by each solver as a function of performance ratio.

Table 1 :
Comparison on different problem classes with timings, as shifted geometric means, and failure rates.

Furthermore, performance profiles do not provide the percentage of problems that can be solved (for some given tolerance ε_opt) within a given time t. Thus, in the vein of data profiles [39, §2.2], we plot the function f^a_s : R → [0, 1], s ∈ S, defined as the fraction of problems solved by solver s within the time t.