1 Introduction

Quadratic programs (QPs) are among the fundamental problems in optimization. In this paper, we consider linearly constrained convex QPs of the form:

$$\begin{aligned} \min _{\mathbf {x}} \quad \frac{1}{2} \mathbf {x}^\top \mathbf{Q} \mathbf {x} + \mathbf{q}^\top \mathbf {x} ,\quad s.t.\quad \mathbf{l} \le \mathbf{A} \mathbf {x} \le \mathbf{u} , \end{aligned}$$
(1)

with \(\mathbf{x} \in \mathbb {R}^n\). \(\mathbf{Q} \in \mathbb {R}^{n \times n}\) and \(\mathbf{q} \in \mathbb {R}^n\) define the objective function, whereas the constraints are encoded by \(\mathbf{A} \in \mathbb {R}^{m \times n}\) and \(\mathbf{l}, \mathbf{u} \in {\overline{\mathbb {R}}}^m\). We assume (i) \(\mathbf{Q}\) is symmetric positive semidefinite, i.e., \(\mathbf{Q} \succeq 0\), and (ii) \(\mathbf{l}\) and \(\mathbf{u}\) satisfy \(\mathbf{l} \le \mathbf{u}\), \(\mathbf{l} < +\infty\), and \(\mathbf{u} > -\infty\) component-wise; cf. [33, 59]. We will refer to the nonempty, closed and convex set

$$\begin{aligned} {\mathcal {C}} {:}{=}\{ \mathbf{z} \in \mathbb {R}^m : \mathbf{l} \le \mathbf{z} \le \mathbf{u} \} \end{aligned}$$

as the constraint set. Note that (1) represents a general convex QP, in that it also accommodates equality constraints and bounds. We denote N the total number of nonzero entries in \(\mathbf{Q}\) and \(\mathbf{A}\), i.e., \(N {:}{=}nnz( \mathbf{Q} ) + nnz( \mathbf{A} )\).
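For concreteness, the following sketch (ours, in Python/NumPy, with purely illustrative data, not part of any solver interface) shows how an equality constraint and simple bounds can be encoded within the single two-sided form \(\mathbf{l} \le \mathbf{A} \mathbf {x} \le \mathbf{u}\): an equality row uses \(\mathbf{l}^i = \mathbf{u}^i\), a bound on a variable uses a row of the identity, and a one-sided constraint uses an infinite bound.

```python
import numpy as np

# Toy instance of (1): minimize 0.5 x'Qx + q'x subject to
#   x1 + x2 + x3 = 1   (equality row: l_i = u_i)
#   x >= 0             (bounds via identity rows, u_i = +inf)
n = 3
Q = np.diag([1.0, 2.0, 3.0])                  # symmetric positive semidefinite
q = np.array([1.0, -1.0, 0.0])
A = np.vstack([np.ones((1, n)), np.eye(n)])   # constraint matrix, m = 1 + n rows
l = np.concatenate(([1.0], np.zeros(n)))
u = np.concatenate(([1.0], np.full(n, np.inf)))

assert np.all(l <= u)                         # assumption (ii) on the bounds
```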

1.1 Background

Optimization problems in the form (1) appear in a variety of applications and are of interest in engineering, statistics, finance and many other fields. QPs often arise as sub-problems in methods for general nonlinear programming [10, 31, 44], and greatly vary in terms of problem size and structure.

Convex QPs have been studied since the 1950s [22] and several numerical methods have been developed since then. These differ in how they balance the number of iterations and the cost (e.g., run time) per iteration.

Active-set methods for QPs originated from extending the simplex method for linear programs (LPs) [64]. These methods select a set of binding constraints and iteratively adapt it, seeking the set of active constraints at the solution. Active-set algorithms can be easily warm started and can lead to finite convergence. Moreover, by adding and dropping constraints from the set of binding ones, factorization updates can be adopted for solving successive linear systems. However, these methods may require many iterations to identify the correct set of active constraints. Modern solvers based on active-set methods are qpOASES [20] and NASOQ [13].

First-order methods iteratively compute an optimal solution using only first-order information about the cost function [46, 49]. As these methods consist of computationally cheap and simple steps, they are well suited to applications with limited computing resources [59]. However, first-order algorithms usually require many iterations to achieve accurate solutions and may suffer from ill-conditioning of the problem data. Several acceleration schemes have been proposed to improve their behaviour [1, 62]. The open-source solver OSQP [59] offers an implementation based on ADMM [9].

Interior-point methods move the problem constraints to the objective function via barrier functions and solve a sequence of parametric sub-problems [10, Chap. 11], [44, Sect. 16.6]. Although not easily warm started, interior-point methods are appealing for large-scale problems thanks to their polynomial complexity [29]. They usually require few but rather demanding iterations [31, 44]. Interior-point methods are currently the default algorithms in the commercial solvers GUROBI [32] and MOSEK [43]. Recent developments are found in the regularized method IP-PMM [51].

Semismooth Newton methods apply a nonsmooth version of Newton’s method to the KKT conditions of the original problem [52, 53]. In the strictly convex case, i.e., with \(\mathbf{Q} \succ 0\), this approach performs very well as long as the underlying linear systems are nonsingular; in the merely convex case, however, these systems may become singular. Regularized, or stabilized, semismooth Newton-type methods, such as QPALM [33, 34] and FBstab [37], overcome these drawbacks.

The augmented Lagrangian framework [7, 14, 44], semismooth Newton methods [25, 53], and proximal techniques [48, 56] are undergoing a revival, as their seamless combination exhibits valuable properties and provides useful features, such as regularization and numerical stability [6, 33, 37]. These ideas form the basis for our approach.

1.2 Approach

In this work we present a numerical method for solving general QPs. The proposed algorithm is based on the proximal point algorithm and a semismooth Newton method for solving the sub-problems, which are always solvable for any choice of problem data. We therefore impose no restrictions such as strict convexity of the cost function or linear independence of the constraints. As such, our algorithm gathers together the benefits of fully regularized primal-dual methods and semismooth Newton methods with active-set structure. Our algorithm can exploit warm starting to reduce the number of iterations, as well as factorization caching and multi-rank update techniques for efficiency, and it can obtain accurate solutions.

Our approach, dubbed QPDO from “Quadratic Primal-Dual Optimizer”, is inspired by and shares many characteristics with previously proposed algorithms, in particular QPALM [33] and FBstab [37]. They differ, however, in some key aspects; a more detailed comparison is deferred to Sect. 5. QPALM relates to the proximal method of multipliers [33, Rem. 2], which in turn is associated with the classical (primal) augmented Lagrangian function [55]. Instead, FBstab and QPDO apply the proximal point method, yielding exact primal-dual regularization. FBstab, however, reformulates the sub-problem via the (penalized) Fischer-Burmeister NCP function [11, 21] and adopts the squared residual norm as a merit function for the inner iterative loop, which prevents the use of symmetric sparse linear solvers. QPDO instead adopts the minimum NCP function, which leads to symmetric linear systems with active-set structure. Then, we show that the primal-dual proximal augmented Lagrangian function, introduced in [27, 54] and [17], is a suitable merit function for the proximal sub-problem, which allows us to perform an exact linesearch in a fully primal-dual regularized context. Indeed, we believe the main contribution of this work consists in recognizing this link, exploiting it to bridge the gap between previously proposed methods, and developing a robust and efficient algorithm that enjoys their advantages without suffering from their disadvantages.

Notation \(\mathbb {N}\), \(\mathbb {Z}\), \(\mathbb {R}\), \(\mathbb {R}_+\), and \(\mathbb {R}_{++}\) denote the sets of natural, integer, real, non-negative real, and positive real numbers, respectively. We denote \({\overline{\mathbb {R}}} := \mathbb {R}\cup \{-\infty ,+\infty \}\) the extended real line. The identity matrix and the vector of ones of size n are denoted by \(\mathbf{I}_n\) and \(\mathbf{1}_n\), respectively. We may omit subscripts whenever clear from the context. \([a,b]\), \((a,b)\), \([a,b)\), and \((a,b]\) stand for closed, open, and half-open intervals, respectively, with end points a and b. \([a;b]\), \((a;b)\), \([a;b)\), and \((a;b]\) stand for the corresponding discrete intervals, e.g., \([a;b] = [a,b] \cap {\mathbb {Z}}\). Given a vector \(\mathbf {x} \in \mathbb {R}^n\), \(\mathbf {x}^\top\) and \(\mathbf {x}^i\) denote its transpose and its i-th component, respectively. We adopt the norms \(\Vert \mathbf {x} \Vert = \Vert \mathbf {x} \Vert _2 {:}{=}\sqrt{\mathbf {x}^\top \mathbf {x}}\) and \(\Vert \mathbf {x} \Vert _\infty {:}{=}\max _{i \in [1;n]} |\mathbf {x}^i|\). Given a set S, |S| denotes its cardinality. In \(\mathbb {R}^n\), the relations <, \(\le\), \(=\), \(\ge\), and > are understood component-wise. Given a nonempty closed convex set \({\mathcal {C}} \subseteq \mathbb {R}^n\), we denote \(\chi _{\mathcal {C}} : \mathbb {R}^n \rightarrow \mathbb {R}\cup \{+\infty \}\) its characteristic function, namely \(\chi _{\mathcal {C}}(\mathbf {x}) = 0\) if \(\mathbf {x} \in {\mathcal {C}}\) and \(\chi _{\mathcal {C}}(\mathbf {x}) = +\infty\) otherwise, \(dist_{\mathcal {C}} : \mathbb {R}^n \rightarrow \mathbb {R}\) its distance, namely \({\mathbf {x}} \mapsto {\min }_{\mathbf{z} \in {\mathcal {C}}} \Vert \mathbf{z}-\mathbf {x}\Vert\), and its projection \(\Pi _{\mathcal {C}} : \mathbb {R}^n \rightarrow \mathbb {R}^n\), namely \({\mathbf {x}} \mapsto {\arg \min }_{\mathbf{z} \in {\mathcal {C}}} \Vert \mathbf{z}-\mathbf {x}\Vert\). Thus, it holds \(dist_{\mathcal {C}}(\mathbf {x}) = \Vert \Pi _{\mathcal {C}}(\mathbf {x}) - \mathbf {x} \Vert\).

The algorithm is described with a nested structure, whose outer iterations are indexed by \(k \in \mathbb {N}\). Given an arbitrary vector \({\mathbf {x}}\), \({\mathbf {x}}_k\) denotes that \({\mathbf {x}}\) depends on k, and analogously for matrices. We denote \({\mathbf {y}}\) the dual variable associated with the constraints in problem (1). A primal-dual pair \((\mathbf {x},\mathbf{y})\) will be denoted \({\mathbf {v}}\), and we will refer interchangeably to it as a vector or to its components \({\mathbf {x}}\) and \({\mathbf {y}}\). An optimal solution to (1) will be denoted \(( {\mathbf {x}}^\star ,{\mathbf {y}}^\star )\), or \(\mathbf {v}^\star\). Optimal solutions of proximal sub-problems will be denoted using an appropriate subscript, according to the iteration. For example, \((\mathbf {x}_k^\star ,\mathbf {y}_k^\star )\), or \(\mathbf {v}_k^\star\), denotes the solution to the proximal sub-problem corresponding to the k-th outer iteration.

Outline The rest of the paper is organized as follows. Sections 2 and 3 develop and present our method in detail. In particular, in Sect. 3.1 we establish our key result, which relates the proximal operator and the primal-dual proximal augmented Lagrangian function. Our algorithmic framework is outlined in Sect. 4 and the convergence properties are analyzed in Sect. 4.1, while Sect. 5 contrasts QPDO with similar methods. We present details of our implementation in Sect. 6 and report on numerical experience in Sect. 7.

2 Outer loop: inexact proximal point method

Our method solves (1) using the proximal point algorithm with inexact evaluation of the proximal operator. The latter is evaluated by means of a semismooth Newton-type method, which constitutes an inner iterative procedure further investigated in Sect. 3. Here we focus on the outer loop corresponding to the proximal point algorithm, which has been extensively studied in the literature [56]. We recall some important results and refer to [38, 40, 45, 55] for more details.

2.1 Optimality conditions

Problem (1) can be equivalently expressed as

$$\begin{aligned} \min _{\mathbf {x}}\quad f(\mathbf {x}) + g(\mathbf{A}{\mathbf {x}}) , \end{aligned}$$
(2)

where

$$\begin{aligned} f({\mathbf {x}}) {:}{=}\frac{1}{2} {\mathbf {x}}^\top \mathbf{Q} {\mathbf {x}} + \mathbf{q}^\top {\mathbf {x}} \qquad \text {and}\qquad g(\mathbf{z}) {:}{=}\chi _{\mathcal {C}}(\mathbf{z}) \end{aligned}$$

are the objective function and the characteristic function of the constraint set \({\mathcal {C}}\), respectively. The necessary and sufficient first-order optimality conditions of (2), and hence (1), read

$$\begin{aligned} \mathbf{0} \in {\mathcal {T}}( {\mathbf {v}}) {:}{=}\begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}}\\ - \mathbf{A} {\mathbf {x}} + \partial g^*(\mathbf{y}) \end{pmatrix} , \end{aligned}$$
(3)

where \(\partial g^*\) denotes the (set-valued) conjugate subdifferential of g [45]. For all \(i \in [1;m]\), \(\left( \partial g^*({\mathbf {y}}) \right) ^i = \mathbf{l}^i\) if \({{\mathbf {y}}^i < 0}\), \(\left( \partial g^*({\mathbf {y}}) \right) ^i = [\mathbf{l}^i,\mathbf{u}^i]\) if \({{\mathbf {y}}^i = 0}\), and \(\left( \partial g^*({\mathbf {y}}) \right) ^i = \mathbf{u}^i\) if \({{\mathbf {y}}^i > 0}\). We will refer to \({\mathcal {T}} : \mathbb {R}^\ell \rightrightarrows \mathbb {R}^\ell\), \(\ell {:}{=}n + m\), as the KKT operator for (1). However, noticing that, for any \(\alpha > 0\), the conditions \({\mathbf{z} = \Pi _{{\mathcal {C}}}( \mathbf{z} + \alpha {\mathbf {y}} )}\) and \({\mathbf{z} \in \partial g^*({\mathbf {y}})}\) are equivalent [57, Sect. 23], the conditions in (3) can be reformulated. Choosing \({\alpha = 1}\), we define the (outer) residual \({\mathbf{r} : \mathbb {R}^\ell \rightarrow \mathbb {R}^\ell }\) and equivalently express (3) as

$$\begin{aligned} \mathbf{0} = \mathbf{r}( {\mathbf {v}}) {:}{=}\begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {{\mathbf {y}}} \\ \mathbf{A} {\mathbf {x}} - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf {x}} + {\mathbf {y}}) \end{pmatrix} . \end{aligned}$$
(4)

This reformulation can be obtained also by employing the minimum NCP function [60] and rearranging to obtain the projection operator \(\Pi _{{\mathcal {C}}}\). The residual \(\mathbf{r}\) is analogous to the natural residual function \(\varvec{\pi }\) investigated in [47]. Since it is an error bound for problem (1), i.e., \(dist_{{\mathcal {T}}^{-1}(\mathbf{0})}({\mathbf {v}}) = {\mathcal {O}}( \Vert \mathbf{r}({\mathbf {v}})\Vert )\) [47, Thm 18], \(\mathbf{r}\) is a suitable optimality measure and its norm can be adopted as a stopping criterion. Although equivalent, (3) is considered here only as a theoretical tool for developing the proposed method, whereas the outer residual \(\mathbf{r}\) in (4) serves as a computationally practical optimality criterion.
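As an illustration (our own sketch, not the solver's C code), the residual in (4) can be evaluated with a few dense NumPy operations; the projection \(\Pi _{{\mathcal {C}}}\) is a component-wise clip onto \([\mathbf{l},\mathbf{u}]\).

```python
import numpy as np

def outer_residual(x, y, Q, q, A, l, u):
    """Residual r(v) from (4); || r(v) ||_inf serves as the stopping criterion."""
    r_dual = Q @ x + q + A.T @ y                 # stationarity part
    r_prim = A @ x - np.clip(A @ x + y, l, u)    # primal part, Pi_C = clip onto [l, u]
    return np.concatenate([r_dual, r_prim])
```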

2.2 Proximal point algorithm

The proximal point algorithm [56] finds zeros of maximal monotone operators by recursively applying their proximal operator. Since \({\mathcal {T}}\) is a maximal monotone operator [45, 55], the proximal point algorithm converges to an element \({\mathbf {v}}^\star\) of the set of primal-dual solutions \({\mathcal {T}}^{-1}(\mathbf{0})\), if any exists [40, 56]. Starting from an initial guess \({\mathbf {v}}_0\), it generates a sequence \(\{ {\mathbf {v}}_k \}\) of primal-dual pairs by recursively applying the proximal operator \({\mathcal {P}}_k\):

$$\begin{aligned} {\mathbf {v}}_{k+1} = {\mathcal {P}}_k( {\mathbf {v}}_k ) ,\qquad {\mathcal {P}}_k {:}{=}(\mathbf{I} + \mathbf \Sigma _k^{-1} {\mathcal {T}})^{-1} , \end{aligned}$$
(5)

where \(\{ \mathbf \Sigma _k \}\) is a sequence of non-increasing positive definite matrices, namely, \(\mathbf \Sigma _k \succ 0\) and \(\mathbf \Sigma _k - \mathbf \Sigma _{k+1} \succeq 0\) for all \(k\in \mathbb {N}\). The matrices \(\mathbf \Sigma _k\) control the primal-dual proximal regularization and, similarly to exact penalty methods, these are not required to vanish [55, 56]. Since \({\mathcal {T}}\) is maximal monotone, \({\mathcal {P}}_k\) is well defined and single valued for all \({\mathbf {v}} \in dom{\mathcal {T}} = \mathbb {R}^\ell\) [40]. Thus, from (5), evaluating \({\mathcal {P}}_k\) at \({\mathbf {v}}_k\) is equivalent to finding the unique \({\mathbf {v}} \in \mathbb {R}^\ell\) that satisfies

$$\begin{aligned} \mathbf{0} \in {\mathcal {T}}_k({\mathbf {v}}) {:}{=}{\mathcal {T}}({\mathbf {v}}) + \mathbf \Sigma _k ({\mathbf {v}} - {\mathbf {v}}_k) . \end{aligned}$$
(6)

This is guaranteed to have a unique solution and to satisfy certain useful regularity properties; see Sect. 3 below. As a result, we can construct a fast inner solver for these sub-problems based on the semismooth Newton method.

2.3 Early termination

The proximal point algorithm tolerates errors, namely the inexact evaluation of \({\mathcal {P}}_k\) [56]. Criterion \((A_r)\) in [38] provides conditions for the design of convergent inexact proximal point algorithms [38, Thm 2.1]. Let \({\mathbf {v}}_k^\star {:}{=}{\mathcal {P}}_k({\mathbf {v}}_k)\) denote the unique proximal sub-problem solution and \({\mathbf {v}}_{k+1} \approx {\mathbf {v}}_k^\star\) the actual recurrence update. The aforementioned criterion requires

$$\begin{aligned} \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k^\star \Vert \le e_k \min \left( 1, \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k \Vert ^r \right) , \end{aligned}$$
(7)

where \(r \ge 0\) and \(\{ e_k \}\) is a summable sequence of nonnegative inner tolerances, i.e., \(e_k \ge 0\) for all k and \(\sum _{k=0}^\infty e_k < +\infty\). However, since \({\mathbf {v}}_k^\star\) is effectively unknown, this criterion is impractical. Instead, in Algorithm 1 it is required that \({\mathbf {v}}_{k+1}\) satisfy \(\Vert \mathbf{r}_k({\mathbf {v}}_{k+1}) \Vert _\infty \le \varepsilon _k\). Here, \(\mathbf{r}_k\) denotes the residual for the k-th sub-problem, and is defined in (14). In Sect. 4.1 we will show that this criterion is a simple and viable substitute, which retains the significance of \((A_r)\).

2.4 Warm starting

If a solution \({\mathbf {v}}^\star\) exists, the (outer) sequence \(\{{\mathbf {v}}_k \}\) generated by (5) converges, by the global convergence of the proximal point algorithm [56]. Then, \({\mathcal {P}}_k({\mathbf {v}}_k)\) and \({\mathbf {v}}_k\) are arbitrarily close to each other for sufficiently large k [37, Sect. 4]. This supports the idea of warm starting the inner solver with the current outer estimate \({\mathbf {v}}_k\), that is, setting \({\mathbf {v}} \leftarrow {\mathbf {v}}_k\) in Algorithm 2. In practice, for large k, only one or few Newton-type inner iterations are needed to find an approximate sub-problem solution \({\mathbf {v}}_{k+1}\).

2.5 Primal and dual infeasibility

Infeasibility detection in convex programming has been studied in [4, 5]. Certifying primal infeasibility of (1) amounts to finding a vector \({\mathbf {y}} \in \mathbb {R}^m\) such that

$$\begin{aligned} \mathbf{A}^\top {\mathbf {y}}= \mathbf{0} ,\quad \sum _{\begin{array}{c} i \in [1;m] \\ \mathbf{u}^i \in \mathbb {R} \end{array}} \mathbf{u}^i \max ({\mathbf {y}}^i,0) + \sum _{\begin{array}{c} i \in [1;m] \\ \mathbf{l}^i \in \mathbb {R} \end{array}} \mathbf{l}^i \min ({\mathbf {y}}^i,0) < 0 . \end{aligned}$$
(8)

Similarly, it can be shown that a vector \({\mathbf {x}} \in \mathbb {R}^n\) satisfying

$$\begin{aligned} \mathbf{Q} {\mathbf {x}} = \mathbf{0} ,\quad \mathbf{q}^\top {\mathbf {x}} < 0 ,\quad (\mathbf{A} {\mathbf {x}})^i {\left\{ \begin{array}{ll} = 0 &{} \mathbf{l}^i, \mathbf{u}^i \in \mathbb {R}, \\ \ge 0 &{} \mathbf{l}^i \in \mathbb {R}, \mathbf{u}^i = +\infty , \\ \le 0 &{} \mathbf{l}^i = -\infty , \mathbf{u}^i \in \mathbb {R}, \end{array}\right. } \quad i \in [1;m] , \end{aligned}$$
(9)

is a certificate of dual infeasibility for (1) [4, Prop. 3.1].

3 Inner loop: semismooth Newton method

In this section we focus on solving (6) via a semismooth Newton method. For the sake of clarity, and without loss of generality, we consider

$$\begin{aligned} \mathbf \Sigma _k {:}{=}blockdiag(\sigma _k \mathbf{I}_n, \mu _k \mathbf{I}_m) , \end{aligned}$$

for some parameters \(\sigma _k, \mu _k \in \mathbb {R}_{++}\).

3.1 Merit function

We now derive the simple yet fundamental result that is the key to developing our method. This provides the NCP reformulation of the proximal sub-problem with a suitable merit function. The former yields symmetric active-set linear systems, while the latter leads to exact linesearch.

Let us express (6) in the form

$$\begin{aligned} \mathbf{0} \in \begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}} + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) \\ - \mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}} - {\mathbf {y}}_k) + \partial g^*({\mathbf {y}}) \end{pmatrix} . \end{aligned}$$
(10)

Similarly to (4), for any given \(\alpha > 0\), this can be rewritten as

$$\begin{aligned} \mathbf{0} = \begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}}+ \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) \\ \mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}}_k - {\mathbf {y}}) - \Pi _{{\mathcal {C}}}( \mathbf{w}_k ) \end{pmatrix} , \end{aligned}$$
(11)

where we denote

$$\begin{aligned} \mathbf{w}_k {:}{=}\mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}}_k - {\mathbf {y}}) + \alpha {\mathbf {y}} . \end{aligned}$$

The second condition in (11) can be expressed as \(\mathbf{0} = \mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k ) - \alpha {\mathbf {y}}\). Then, we substitute \({\mathbf {y}}\) with \([\mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k ) ] / \alpha\) in the first condition in (11), and multiply the second one by \((\alpha - \mu _k)/\alpha\). Hence, for any positive \(\alpha \ne \mu _k\), (11) is equivalent to

$$\begin{aligned} \mathbf{0} = \begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \frac{1}{\alpha } \mathbf{A}^\top [\mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k)] + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) \\ \frac{\alpha - \mu _k}{\alpha } [\mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k )] + (\mu _k - \alpha ) {\mathbf {y}} \end{pmatrix} , \end{aligned}$$
(12)

namely their unique solutions coincide. Now, we observe that the right-hand side of (12) is the gradient of the function

$$\begin{aligned} f({\mathbf {x}}) + \frac{1}{2 \alpha } dist_{{\mathcal {C}}}^2(\mathbf{w}_k ) + \frac{\sigma _k}{2} \Vert {\mathbf {x}} - {\mathbf {x}}_k\Vert ^2 + \frac{\mu _k - \alpha }{2} \Vert {\mathbf {y}}\Vert ^2 . \end{aligned}$$
(13)

By construction, this is a continuously differentiable function whose gradient vanishes at the unique solution of the proximal sub-problem. Furthermore, for any \(\alpha \in (0,\mu _k)\), it is strictly convex and hence admits a unique minimizer that must coincide with the unique proximal point. Therefore, (13) is a suitable merit function for the sub-problem. The particular choice \(\alpha {:}{=}\mu _k / 2\) inherits all these properties and leads to the inner optimality conditions

$$\begin{aligned} \mathbf{0} = \mathbf{r}_k( {\mathbf v} ) {:}{=}\begin{pmatrix} \mathbf{Q} {\mathbf x} + \mathbf{q} + \mathbf{A}^\top {\mathbf y} + \sigma _k ({\mathbf x} - {\mathbf x}_k) \\ \mathbf{A} {\mathbf x} + \mu _k ({\mathbf y}_k - {\mathbf y}) - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf x} + \mu _k ({\mathbf y}_k - {\mathbf y}/2)) \end{pmatrix} , \end{aligned}$$
(14)

with \(\mathbf{r}_k : \mathbb {R}^\ell \rightarrow \mathbb {R}^\ell\) the inner residual, and the associated merit function

$$\begin{aligned} {\mathcal {M}}_k(\mathbf{v}) {:}{=}f({\mathbf {x}}) + \frac{1}{\mu _k} dist_{{\mathcal {C}}}^2( \mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}}_k - {\mathbf {y}}/2) ) + \frac{\sigma _k}{2} \Vert {\mathbf {x}} - {\mathbf {x}}_k\Vert ^2 + \frac{\mu _k}{4} \Vert {\mathbf {y}}\Vert ^2 . \end{aligned}$$
(15)

In fact, \({\mathcal {M}}_k : \mathbb {R}^\ell \rightarrow \mathbb {R}\) is the primal-dual proximal augmented Lagrangian function [17, 27, 54]; see Appendix A for a detailed derivation. This underlines once again the strong relationship between the proximal point algorithm and the augmented Lagrangian framework, pioneered in [55]. On the one hand, by (15), the dual regularization parameter \(\mu _k\) controls the constraint penalization [23, Sect. 3.2]. On the other hand, this relationship provides an “interpretation of the augmented Lagrangian method as an adaptive constraint regularization process” [3, Sect. 2].

The inner residual \(\mathbf{r}_k\) in (14) is piecewise affine, hence strongly semismooth on \(\mathbb {R}^\ell\) [36, 52]. In fact, with \(\mu _k\) bounded away from zero, defining the bounded and nonsingular matrix \(\mathbf{T}_k\) by

$$\begin{aligned} \mathbf{T}_k {:}{=}\begin{bmatrix} \mathbf{I} &{} \frac{2}{\mu _k} \mathbf{A}^\top \\ \mathbf{0} &{} -\mathbf{I} \end{bmatrix} , \end{aligned}$$
(16)

we have the identity \(\nabla {\mathcal {M}}_k(\cdot ) = \mathbf{T}_k \mathbf{r}_k(\cdot )\). Effectively, \(\Vert \mathbf{r}_k(\cdot )\Vert\) can be employed as stopping criterion in place of \(\Vert \nabla {\mathcal {M}}_k (\cdot )\Vert\). We prefer the former, since \(\mathbf{r}_k\) corresponds to a perturbation of the outer residual \(\mathbf{r}\); cf. (4).
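To make the objects of this subsection concrete, here is a small NumPy sketch (ours, dense data, illustrative names) of the inner residual (14) and the merit function (15); the identity \(\nabla {\mathcal {M}}_k(\cdot ) = \mathbf{T}_k \mathbf{r}_k(\cdot )\) can then be checked numerically, e.g., by finite differences.

```python
import numpy as np

def inner_residual(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Inner residual r_k(v) from (14)."""
    w = A @ x + mu * (yk - 0.5 * y)
    r_dual = Q @ x + q + A.T @ y + sigma * (x - xk)
    r_prim = A @ x + mu * (yk - y) - np.clip(w, l, u)
    return np.concatenate([r_dual, r_prim])

def merit(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Primal-dual proximal augmented Lagrangian M_k(v) from (15)."""
    w = A @ x + mu * (yk - 0.5 * y)
    d = w - np.clip(w, l, u)                      # w - Pi_C(w), so d @ d = dist_C^2(w)
    f = 0.5 * x @ (Q @ x) + q @ x
    return (f + d @ d / mu
            + 0.5 * sigma * np.dot(x - xk, x - xk)
            + 0.25 * mu * np.dot(y, y))

# Identity (16): grad M_k(v) = T_k r_k(v), with T_k = [[I, (2/mu) A^T], [0, -I]].
```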

The availability of a suitable merit function allows us to adopt a damped Newton-type method and design a linesearch globalization strategy, in contrast with [25, 37, 50]. Since \({\mathcal {M}}_k\) is continuously differentiable and piecewise quadratic, an exact linesearch procedure can be carried out, which yields finite convergence [61].

Finally, we highlight that the method asymptotically reduces to a sequence of regularized semismooth Newton steps applied to the original, unperturbed optimality system, in the vein of [2]. This closely relates to the concept of exact regularization [24]. In fact, the proximal primal-dual regularization is exact; see Theorem 1 and compare with [3, Thm 1].

Proposition 1

Let \(k \in \mathbb {N}\) be arbitrary.

  1. (i)

    Suppose \({\mathbf {v}}_k^\star\) solves (14) for \({\mathbf {v}}_k {:}{=}{\mathbf {v}}_k^\star\) and for some \(\sigma _k \ge 0\) and \(\mu _k > 0\). Then, \({\mathbf {v}}_k^\star\) solves (4).

  2. (ii)

    Alternatively, suppose \({\mathbf {v}}_k^\star\) solves (14) for \({\mathbf {y}}_k {:}{=}{\mathbf {y}}_k^\star\), \(\sigma _k {:}{=}0\), and for some \(\mu _k > 0\). Then, \({\mathbf {v}}_k^\star\) solves (4).

  3. (iii)

    Conversely, suppose \({\mathbf {v}}^\star\) solves (4). Then, \({\mathbf {v}}^\star\) solves (14) for \({\mathbf {v}}_k {:}{=}{\mathbf {v}}^\star\) and for any \(\sigma _k \ge 0\) and \(\mu _k > 0\).

Proof

The proof is immediate by direct comparison of (4) and (14). \(\square\)

Sub-problem (14) is equivalent to the unconstrained minimization of the primal-dual augmented Lagrangian function \({\mathcal {M}}_k\), given in (15). However, by introducing the auxiliary variable \(\mathbf{s} \in \mathbb {R}^m\), we can rewrite sub-problem (14) as the equivalent yet smoother problem

$$\begin{aligned} \min _{{\mathbf {x}},\mathbf{s}}\quad&\frac{1}{2} {\mathbf {x}}^\top \mathbf{Q} {\mathbf {x}} + \mathbf{q}^\top {\mathbf {x}} + \frac{\sigma _k}{2} \Vert {\mathbf {x}} - {\mathbf {x}}_k \Vert ^2 + \frac{1}{2 \mu _k} \Vert \mathbf{s} - \mu _k \mathbf{y}_k \Vert ^2 \nonumber \\ s.t.\quad&\mathbf{l} \le \mathbf{A} {\mathbf {x}} + \mathbf{s} \le \mathbf{u} , \end{aligned}$$
(17)

which is a primal-dual proximal regularization of (1). Indeed, it is always feasible and strictly convex, and its constraints satisfy the linear independence constraint qualification (LICQ). This shows that each outer iteration is associated with a regularized QP, which can be effectively solved by Newton-type methods.
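To make the equivalence explicit, a short derivation (standard, not spelled out above): introducing a multiplier \({\mathbf {y}} \in \mathbb {R}^m\) for the constraint of (17) and adopting the same convention as in (3), the optimality conditions of (17) read

$$\begin{aligned} \mathbf{0}&= \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) + \mathbf{A}^\top {\mathbf {y}} , \\ \mathbf{0}&= \frac{1}{\mu _k} (\mathbf{s} - \mu _k {\mathbf {y}}_k) + {\mathbf {y}} , \\ \mathbf{A} {\mathbf {x}} + \mathbf{s}&\in \partial g^*({\mathbf {y}}) , \end{aligned}$$

and eliminating \(\mathbf{s} = \mu _k ({\mathbf {y}}_k - {\mathbf {y}})\) from the second condition recovers (10).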

3.2 Search direction

A semismooth Newton direction \(\delta {\mathbf {v}} = (\delta {\mathbf {x}},\delta {\mathbf {y}})\) at \(\mathbf{v} = ({\mathbf {x}},{\mathbf {y}})\) solves

$$\begin{aligned} \mathbf{V}_k({\mathbf {v}}) \delta {\mathbf {v}} = - \mathbf{r}_k({\mathbf {v}}) , \end{aligned}$$
(18)

where

$$\begin{aligned} \mathbf{V}_k({\mathbf {v}}) = \begin{bmatrix} \mathbf{Q} + \sigma _k \mathbf{I} &{} \mathbf{A}^\top \\ (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})) \mathbf{A} &{} -\mu _k (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})/2) \end{bmatrix} \end{aligned}$$
(19)

is an element of the generalized Jacobian [57, Sect. 23] of \(\mathbf{r}_k\) at v. In turn, the diagonal matrix \(\mathbf{P}_k({\mathbf {v}})\) with entries

$$\begin{aligned} \mathbf{P}_k^{ii}({\mathbf {v}}) {:}{=}{\left\{ \begin{array}{ll} 1 &{} \text {if}~\mathbf{l}^i< \mathbf{w}_k^i < \mathbf{u}^i \\ 0 &{} \text {otherwise} \end{array}\right. }, ~i=1,\dots ,m , \end{aligned}$$
(20)

is an element of the generalized Jacobian of \(\Pi _{{\mathcal {C}}}\) at \(\mathbf{w}_k\). Owing to (20), (18) can be rewritten in a symmetric form, similar to the systems arising in active-set methods [35]. To this end, we notice that, if \(\mathbf{P}_k^{ii}({\mathbf {v}}) = 1\), the corresponding inner residual in (14) simplifies to \(\mathbf{r}_k^{n+i}({\mathbf {v}}) = - \mu _k {\mathbf {y}}^i / 2\), and the corresponding equation in (18) gives \(\delta {\mathbf {y}}^i = - {\mathbf {y}}^i\). This yields the crucial observation that, by (20), \(\mathbf{P}_k({\mathbf {v}}) \delta {\mathbf {y}} = - \mathbf{P}_k({\mathbf {v}}) {\mathbf {y}}\) for all \({\mathbf {v}} \in \mathbb {R}^\ell\). Then, an equivalent yet symmetric linear system is obtained, whose solution is the search direction \(\delta {\mathbf {v}}\) at v:

$$\begin{aligned} \begin{bmatrix} \mathbf{Q} + \sigma _k \mathbf{I} &{} \mathbf{A}^\top (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})) \\ (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})) \mathbf{A} &{} - \mu _k (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})/2) \end{bmatrix} \begin{pmatrix} \delta {\mathbf {x}} \\ \delta {\mathbf {y}} \end{pmatrix} = \begin{pmatrix} \mathbf{A}^\top \mathbf{P}_k({\mathbf {v}}) {\mathbf {y}} \\ \mathbf{0} \end{pmatrix} - \mathbf{r}_k({\mathbf {v}}) . \end{aligned}$$
(21)

The active-set structure introduced by \(\mathbf{P}_k\) allows us to obtain a symmetric linear system and adopt multi-rank factorization updates [15, 26] while maintaining the structure and sparsity of the coefficient matrix [13, 59]. The linear system in (21) always admits a unique solution, since the coefficient matrix is symmetric quasi-definite [63], regardless of the problem data.
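For illustration, a dense NumPy sketch (ours; the actual implementation relies on sparse factorizations, cf. Sect. 6.1) that assembles and solves the symmetric system (21) for the search direction:

```python
import numpy as np

def newton_direction(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Assemble and solve the symmetric linear system (21) (dense sketch)."""
    n, m = Q.shape[0], A.shape[0]
    w = A @ x + mu * (yk - 0.5 * y)
    p = ((l < w) & (w < u)).astype(float)            # diagonal of P_k(v), cf. (20)
    IP = np.diag(1.0 - p)                            # I - P
    K = np.block([
        [Q + sigma * np.eye(n),  A.T @ IP],
        [IP @ A,                -mu * np.diag(1.0 - 0.5 * p)],
    ])
    r_dual = Q @ x + q + A.T @ y + sigma * (x - xk)  # inner residual (14)
    r_prim = A @ x + mu * (yk - y) - np.clip(w, l, u)
    rhs = np.concatenate([A.T @ (p * y) - r_dual, -r_prim])
    dv = np.linalg.solve(K, rhs)
    return dv[:n], dv[n:]
```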

3.3 Exact linesearch

Given a primal-dual pair v and a search direction \(\delta {\mathbf {v}}\), we seek a stepsize \(\tau > 0\) to effectively update v to \({\mathbf {v}} + \tau \, \delta {\mathbf {v}}\) in Algorithm 2. Similarly to \({\mathcal {M}}_k\), the function \(\psi _k : \tau \mapsto {\mathcal {M}}_k({\mathbf {v}} + \tau \delta {\mathbf {v}})\) is continuously differentiable, piecewise quadratic, and strictly convex. Thus, the optimal stepsize \(\tau {:}{=}arg\,min_{t \in \mathbb {R}} \psi _k(t)\) is found as the unique zero of \(\psi _k^\prime\), i.e., \(\psi _k^\prime (\tau ) = 0\). Since \(\psi _k^\prime\) is a piecewise linear, strictly monotone increasing function, the exact linesearch procedure amounts to solving a piecewise linear equation of the form

$$\begin{aligned} 0 = \alpha _k \tau + \beta _k + \frac{2}{\mu _k} \delta \mathbf{w}_k^\top \left[\mathbf{w}_k + \tau \delta \mathbf{w}_k - \Pi _{{\mathcal {C}}}\left(\mathbf{w}_k + \tau \delta \mathbf{w}_k \right) \right] \end{aligned}$$
(22)

with respect to \(\tau \in \mathbb {R}\). Here, the coefficients are given by

$$\begin{aligned} \alpha _k\,{:}{=}\,\delta {\mathbf {x}}^\top (\mathbf{Q} + \sigma _k \mathbf{I}) \delta {\mathbf {x}} + \mu _k \delta {\mathbf {y}}^\top \delta {\mathbf {y}} /2 , \end{aligned}$$
(23a)
$$\begin{aligned} \beta _k\,{:}{=}\,\delta {\mathbf {x}}^\top [\mathbf{Q} {\mathbf {x}} + \mathbf{q} + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) ] + \mu _k \delta {\mathbf {y}}^\top {\mathbf {y}} /2 , \end{aligned}$$
(23b)
$$\begin{aligned} \mathbf{w}_k\,{:}{=}\,\mathbf{A} {\mathbf {x}} + \mu _k \left( {\mathbf {y}}_k - {\mathbf {y}}/2 \right) , \end{aligned}$$
(23c)
$$\begin{aligned} \delta \mathbf{w}_k\,{:}{=}\,\mathbf{A} \delta {\mathbf {x}} - \mu _k \delta {\mathbf {y}}/2 , \end{aligned}$$
(23d)

whose derivation is reported in Appendix B. Thanks to its peculiar structure, (22) can be solved efficiently and exactly (up to numerical precision), e.g., by sorting and linear interpolation, cf. [33, Alg. 2].
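A possible realization of this procedure, sketched below in NumPy (our own sketch, not [33, Alg. 2] verbatim): collect the positive breakpoints at which a component of \(\mathbf{w}_k + \tau \delta \mathbf{w}_k\) crosses a bound, scan them in increasing order, and interpolate linearly on the piece where \(\psi _k^\prime\) changes sign.

```python
import numpy as np

def exact_linesearch(dx, dy, x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Exact stepsize: unique root of the piecewise linear equation (22)-(23)."""
    alpha = dx @ (Q @ dx) + sigma * dx @ dx + 0.5 * mu * dy @ dy       # (23a)
    beta = dx @ (Q @ x + q + sigma * (x - xk)) + 0.5 * mu * dy @ y     # (23b)
    w = A @ x + mu * (yk - 0.5 * y)                                    # (23c)
    dw = A @ dx - 0.5 * mu * dy                                        # (23d)

    def dpsi(t):                           # psi_k'(t): piecewise linear, increasing
        wt = w + t * dw
        return alpha * t + beta + (2.0 / mu) * dw @ (wt - np.clip(wt, l, u))

    # breakpoints: stepsizes at which a component of w + t*dw hits l or u
    with np.errstate(divide="ignore", invalid="ignore"):
        cand = np.concatenate([(l - w) / dw, (u - w) / dw])
    ts = np.unique(cand[np.isfinite(cand) & (cand > 0.0)])

    lo, f_lo = 0.0, dpsi(0.0)              # descent direction, hence f_lo < 0
    for t in ts:
        f_t = dpsi(t)
        if f_t >= 0.0:                     # sign change on [lo, t]: interpolate exactly
            return lo - f_lo * (t - lo) / (f_t - f_lo)
        lo, f_lo = t, f_t
    slope = dpsi(lo + 1.0) - f_lo          # beyond the last breakpoint psi' is affine
    return lo - f_lo / slope
```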

We underline that the stepsize \(\tau\) is unique and strictly positive, since \({\mathcal {M}}_k\) is strictly convex and \(\delta {\mathbf {v}}\) is a descent direction for \({\mathcal {M}}_k\) at v. This follows from the observation that

$$\begin{aligned} \psi _k^\prime (0) = \delta {\mathbf {v}}^\top \nabla {\mathcal {M}}_k({\mathbf {v}}) = \delta {\mathbf {v}}^\top \mathbf{T}_k \mathbf{r}_k({\mathbf {v}}) = - \delta {\mathbf {v}}^\top \mathbf{T}_k \mathbf{V}_k({\mathbf {v}}) \delta {\mathbf {v}} < 0 , \end{aligned}$$

since \(\partial ^2 {\mathcal {M}}_k({\mathbf {v}}) \ni \mathbf{T}_k \mathbf{V}_k(\mathbf{v}) \succ 0\).

4 Algorithm and convergence

Our Quadratic Primal-Dual Optimizer (QPDO), which weaves together the proximal point algorithm and a semismooth Newton method, is outlined in Algorithms 1 and 2. We highlight the nested structure for clarity of presentation. Effectively, Algorithm 1 corresponds to the proximal point algorithm, as discussed in Sect. 2. The proximal operator, \({\mathcal {P}}_k\), is evaluated in Algorithm 2 by solving a sub-problem via the semismooth Newton method, as detailed in Sect. 3. We denote \(\mathbf{r}\) and \(\mathbf{r}_k\) the outer and inner residuals defined in (4) and (14), respectively, and \(\mathbf{v}\) a primal-dual pair \(({\mathbf {x}},{\mathbf {y}})\). Infeasibility detection, parameter updates, and linear solvers are detailed in Sect. 6.

Algorithm 1 QPDO: Quadratic Primal-Dual Optimizer

 

input: \(\mathbf{Q}\), \(\mathbf{q}\), \(\mathbf{A}\), \(\mathbf{l}\), \(\mathbf{u}\)

 

parameters: \(\epsilon > 0\), \(\epsilon _0 \ge 0\), \(\kappa _\epsilon \in [0,1)\), \(0 < \sigma _{\text {min}} \le \sigma _0\), \(0 < \mu _{\text {min}} \le \mu _0\)

 

guess: \(\mathbf{x}_0 \in \mathbb {R}^n\), \(\mathbf{y}_0 \in \mathbb {R}^m\)

 

for \(k=0,1,2,\dots \) do

 

      if \(\Vert \mathbf{r}( \mathbf{v}_k ) \Vert _\infty \le \epsilon \) then

 

         return \(\mathbf{v}_k\)

 

      end if

 

      find \(\mathbf{v}_{k+1}\) such that \(\Vert \mathbf{r}_k( \mathbf{v}_{k+1} ) \Vert _\infty \le \epsilon _k\) by invoking Algorithm 2

 

      check for primal-dual infeasibility with \(\varDelta \mathbf{v}_k {:}{=}\mathbf{v}_{k+1} - \mathbf{v}_k\)

 

      choose parameters \(\sigma _{k+1} \in [\sigma _{\text {min}},\sigma _k]\) and \(\mu _{k+1} \in [\mu _{\text {min}},\mu _k]\)

 

      set \(\epsilon _{k+1} \leftarrow \kappa _\epsilon \epsilon _k\)

 

end for

 

Algorithm 2 Inner loop: semismooth Newton method

 

\(\mathbf{v} \leftarrow \mathbf{v}_k\)

 

repeat

 

      get the search direction \(\delta \mathbf{v} \in \mathbb {R}^\ell \) by solving the linear system (21)

 

      get the stepsize \(\tau \in \mathbb {R}_{++}\) by solving the piecewise linear equation (22)

 

      set \(\mathbf{v} \leftarrow \mathbf{v} + \tau \, \delta \mathbf{v}\)

 

until \(\Vert \mathbf{r}_k( \mathbf{v} ) \Vert _\infty \le \epsilon _k\)

 

\(\mathbf{v}_{k+1} \leftarrow \mathbf{v}\)

 

4.1 Convergence analysis

This section discusses the convergence of QPDO as outlined in Algorithms 1 and 2. We show that, when problem (1) is solvable, the proposed algorithm generates a sequence of iterates \( \{\mathbf {v}_k \}\) that satisfies the optimality conditions (4) in the limit, and that it provides a certificate of primal and/or dual infeasibility otherwise. Our analysis relies on well-established results for Newton and proximal point methods; in particular, we refer to [38, 56, 61].

First, we focus on the inner loop, described in Algorithm 2 and detailed in Sect. 3.

Lemma 1

Consider an arbitrary but fixed outer iteration, indexed by \(k \in \mathbb {N}\), and suppose \(\varepsilon _k \ge 0\). Then, the procedure in Algorithm 2 is well defined and terminates after finitely many steps.

Proof

The search direction \(\delta {\mathbf {v}}\) exists and is unique, since linear system (21) is always solvable. Similarly, there exists a unique, positive optimal stepsize \(\tau\) which solves (22). Thus, all steps of Algorithm 2 are well-defined. Since \({\mathcal {M}}_k\) is continuously differentiable, strictly convex, and piecewise quadratic, the semismooth Newton method with exact linesearch exhibits finite convergence [61, Thm 3]. Thus, \(\nabla {\mathcal {M}}_k({\mathbf {v}}) = \mathbf{0}\) after finitely many iterations. Then, by \(\nabla {\mathcal {M}}_k(\cdot ) = \mathbf{T}_k \mathbf{r}_k(\cdot )\) with \(\mathbf{T}_k\) nonsingular, it reaches \(\mathbf{r}_k({\mathbf {v}}) = \mathbf{0}\). Hence, for any \(\varepsilon _k \ge 0\), the inner stopping criterion \(\Vert \mathbf{r}_k({\mathbf {v}}) \Vert _\infty \le \varepsilon _k\) is eventually satisfied, and the inner loop terminates. \(\square\)

Notice that, with \(\varepsilon _k = 0\), Algorithm 2 returns the unique (proximal) point \({\mathbf {v}}_k^\star {:}{=}{\mathcal {P}}_k( {\mathbf {v}}_k )\).

Let us consider now the outer loop, sketched in Algorithm 1. Recall that, by construction, the regularization parameters are positive and non-increasing. The outer loop consists of inexact proximal point iterations [56], hence global and local convergence properties can be derived based on [38, Prop. 1.2]. The following result shows that criterion \((A_r)\) [38] holds.

Lemma 2

Let \({\mathcal {T}}^{-1}(\mathbf{0})\) be nonempty, any \({\mathbf {v}}_0 \in \mathbb {R}^\ell\) be given, and the sequence \(\{ {\mathbf {v}}_k \}\) be generated by Algorithm 1. Then, there exists a summable sequence \(\{ e_k \} \subseteq \mathbb {R}_+\) such that

$$\begin{aligned} \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k^\star \Vert \le e_k \quad \forall k \in \mathbb {N}. \end{aligned}$$

Proof

By \(\varepsilon _0 \in \mathbb {R}_+\) and \(\kappa _\varepsilon \in [0,1)\), the sequence \(\{\varepsilon _k\} \subseteq \mathbb {R}_+\) is summable, since \(\sum _{k \in \mathbb {N}} \varepsilon _k = \sum _{k \in \mathbb {N}} \kappa _\varepsilon ^k \varepsilon _0 = \varepsilon _0 / (1 - \kappa _\varepsilon ) < + \infty\). By the inner stopping condition, for all \(k \in \mathbb {N}\) it holds \(\Vert \mathbf{r}_k({\mathbf {v}}_{k+1}) \Vert \le \varepsilon _k\). Moreover, since \({\mathcal {M}}_k\) is \(\mathbf \Sigma _k\)-strongly convex, there exists \({\tilde{\eta }}_k > 0\) such that

$$\begin{aligned} {\tilde{\eta }}_k \Vert {\mathbf {v}} - {\mathbf {v}}_k^\star \Vert \le \Vert \nabla {\mathcal {M}}_k({\mathbf {v}}) - \nabla {\mathcal {M}}_k({\mathbf {v}}_k^\star ) \Vert = \Vert \nabla {\mathcal {M}}_k({\mathbf {v}}) \Vert = \Vert \mathbf{T}_k \, \mathbf{r}_k({\mathbf {v}}) \Vert \end{aligned}$$

for all \({\mathbf {v}} \in \mathbb {R}^\ell\). By the boundedness of \(\mathbf \Sigma _k\) away from zero, matrix \(\mathbf{T}_k\) is bounded and there exists a constant \(\eta > 0\) such that the bound \(\Vert {\mathbf {v}} - {\mathbf {v}}_k^\star \Vert \le \eta \Vert \mathbf{r}_k({\mathbf {v}}) \Vert\) holds for all \(k \in \mathbb {N}\) and \({\mathbf {v}} \in \mathbb {R}^\ell\). Thus, in particular, for all \(k \in \mathbb {N}\) it is

$$\begin{aligned} \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k^\star \Vert \le \eta \Vert \mathbf{r}_k({\mathbf {v}}_{k+1}) \Vert \le \eta \varepsilon _k . \end{aligned}$$

Let \(e_k {:}{=}\eta \varepsilon _k\), and the proof is complete. \(\square\)

Notice that we choose \(r = 0\) in \((A_r)\), particularly in (7), for the sake of simplicity, although this may prevent faster convergence; see [38, Thm 2.1]. Relying on the inexact proximal point algorithm, the following result states that Algorithm 1 converges to a solution, if one exists.

Theorem 1

Let \({\mathcal {T}}^{-1}(\mathbf{0})\) be nonempty, any \({\mathbf {v}}_0 \in \mathbb {R}^\ell\) be given, and the sequence \(\{ {\mathbf {v}}_k \}\) be generated by Algorithm 1. Then, the sequence \(\{ {\mathbf {v}}_k \}\) is well defined and converges to a solution \({\mathbf {v}}^\star \in {\mathcal {T}}^{-1}(\mathbf{0})\).

Proof

The error bound condition, namely criterion \((A_r)\), is enforced by construction; cf. Lemma 2. It remains to show that there exist \(a, \varepsilon > 0\) such that for all \(\mathbf{u} \in \mathbb {R}^\ell\), \(\Vert \mathbf{u} \Vert \le \varepsilon\), it holds \(dist_{{\mathcal {T}}^{-1}(\mathbf{0})}({\mathbf {v}}) \le a \Vert \mathbf{u} \Vert\) for all \({\mathbf {v}} \in {\mathcal {T}}^{-1}(\mathbf{u})\). Since problem (3) is a polyhedral variational inequality, this property holds globally [19, Sect. 3D]. Hence, we can invoke [38, Prop. 1.2] to conclude that \(\Vert {\mathbf {v}}_k - {\mathbf {v}}^\star \Vert \rightarrow 0\). \(\square\)

Finally, Theorem 2 guarantees that Algorithm 1 terminates if the original problem (1) does not admit any solution. This allows our method to detect infeasibility and to return a certificate.

Theorem 2

Suppose problem (1) is primal and/or dual infeasible, i.e., \({\mathcal {T}}^{-1}(\mathbf{0})\) is empty. Let any \({\mathbf {v}}_0 \in \mathbb {R}^\ell\) be given, the sequence \(\{ {\mathbf {v}}_k \}\) be generated by Algorithm 1, and define \(\varDelta {\mathbf {v}}_k {:}{=}{\mathbf {v}}_{k+1} - {\mathbf {v}}_k\). Then, the sequence \(\{ \varDelta {\mathbf {v}}_k \}\) admits a limit \(\varDelta {\mathbf {v}}\), i.e., \(\varDelta {\mathbf {v}}_k \rightarrow \varDelta {\mathbf {v}}\). Moreover,

  1. (i)

    if \(\varDelta {\mathbf {y}} \ne \mathbf{0}\), then problem (1) is primal infeasible and \(\varDelta {\mathbf {y}}\) satisfies the primal infeasibility condition (8);

  2. (ii)

    if \(\varDelta {\mathbf x} \ne \mathbf{0}\), then problem (1) is dual infeasible and \(\varDelta {\mathbf {x}}\) satisfies the dual infeasibility condition (9).

Proof

Lemma 5.1 in [4] ensures that \(\varDelta {\mathbf {v}}_k \rightarrow \varDelta \mathbf{v}\), since Algorithm 1 is an instance of the proximal point algorithm. If \({\mathcal {T}}^{-1}(\mathbf{0}) = \emptyset\), then \(\varDelta {\mathbf {v}} \ne \mathbf{0}\), and this gives certificates of primal and/or dual infeasibility according to [4, Thm 5.1]. \(\square\)

5 Relationship with similar methods

Our approach is inspired by and shares many features with other recently developed methods. This section elaborates upon their relationship with QPDO.

FBstab [37] “synergistically combines the proximal point algorithm with a primal-dual semismooth Newton-type method” to solve convex QPs. By adopting the Fischer-Burmeister [11, 21] NCP function, FBstab does not depend on an estimate of the active set, which may result in a more regular behavior than QPDO. In contrast, adopting the minimum NCP function, QPDO can exploit factorization updates, perform an exact linesearch by solving a piecewise linear equation, and handle both sides of bilateral constraints simultaneously.

QPALM is a “proximal augmented Lagrangian based solver for convex QPs” [33]; recent advancements allow it to handle nonconvex QPs as well [34]. Given a primal-dual estimate \(\overline{\mathbf {v}}\), the exact, unique resolvent update \({\mathbf {v}}^\triangle\) of QPALM [33, Eq. 6], with \(\mathbf \Sigma = blockdiag(\sigma ^{-1} \mathbf{I},\mu ^{-1} \mathbf{I})\), is given by

$$\begin{aligned} {\mathbf x}^\triangle= \textstyle arg\,min_{{\mathbf x} \in \mathbb {R}^n} \; \varphi ( {\mathbf x} ) , \end{aligned}$$
(24a)
$$\begin{aligned} {\mathbf y}^\triangle= \overline{{\mathbf y}} + \mu ^{-1} \left[ \mathbf{A} {\mathbf x}^\triangle - \Pi _{{\mathcal {C}}}\left( \mathbf{A} {\mathbf x}^\triangle + \mu \overline{{\mathbf y}} \right) \right] . \end{aligned}$$
(24b)

In (24a), \(\varphi\) is given by [33, Eq. 8]

$$\begin{aligned} \varphi ({\mathbf {x}}) {:}{=}f({\mathbf {x}}) + \frac{1}{2 \mu } dist_{{\mathcal {C}}}^2\left( \mathbf{A} {\mathbf {x}} + \mu \overline{{\mathbf {y}}} \right) + \frac{\sigma }{2} \Vert {\mathbf {x}} - \overline{\mathbf {x}}\Vert ^2 \end{aligned}$$

and closely resembles \({\mathcal {M}}_k\) in (15). Since (24a) yields \(\nabla \varphi ({\mathbf {x}}^\triangle ) = \mathbf{0}\), combining with (24b) and rearranging give

$$\begin{aligned} \mathbf{0}&= \mathbf{Q} {\mathbf {x}}^\triangle + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}}^\triangle + \sigma \left( {\mathbf {x}}^\triangle - \overline{{\mathbf {x}}} \right) , \end{aligned}$$
(25a)
$$\begin{aligned} \mathbf{0}&= \mathbf{A} {\mathbf {x}}^\triangle + \mu \left( \overline{{\mathbf {y}}} - {\mathbf {y}}^\triangle \right) - \Pi _{{\mathcal {C}}}\left( \mathbf{A} {\mathbf {x}}^\triangle + \mu \overline{{\mathbf {y}}} \right) . \end{aligned}$$
(25b)

Conditions (25) and (14) differ only in the argument of \(\Pi _{{\mathcal {C}}}\): the term \(- \mu {\mathbf {y}}/2\) is missing in (25b). This underlines the primal-dual nature of QPDO. A comparative investigation into how QPDO copes with changes in the active set [35] and controls the quality of both primal and dual variables during the iterations [2, 28] is a topic for future work.

OSQP is a “solver for convex quadratic programs based on the alternating direction method of multipliers” [59]. Rearranging from [59, Alg. 1], with parameters \(\alpha = 1\), \(\rho = \mu ^{-1}\), and given primal-dual estimate \((\overline{{\mathbf {x}}},\overline{{\mathbf {y}}})\) and constraint estimate \(\overline{\mathbf{z}}\), the (unique) primal-auxiliary update \(({\mathbf {x}}^\lozenge ,\mathbf{s}^\lozenge )\) satisfies

$$\begin{aligned} \mathbf{0}&= \mathbf{Q} {\mathbf {x}}^\lozenge + \mathbf{q} + \mathbf{A}^\top \mathbf{s}^\lozenge + \sigma ( {\mathbf {x}}^\lozenge - \overline{{\mathbf {x}}} ) , \end{aligned}$$
(26a)
$$\begin{aligned} \mathbf{0}&= \mathbf{A} {\mathbf {x}}^\lozenge + \mu (\overline{{\mathbf {y}}} - \mathbf{s}^\lozenge ) - \overline{\mathbf{z}} . \end{aligned}$$
(26b)

Then, the constraint and dual updates are given by \(\mathbf{z}^\lozenge = \Pi _{{\mathcal {C}}}\left( \overline{\mathbf{z}} + \mu \mathbf{s}^\lozenge \right)\) and \(\mathbf{y}^\lozenge = \mathbf{s}^\lozenge + \mu ^{-1} \left( \overline{\mathbf{z}} - \mathbf{z}^\lozenge \right)\), respectively. Although conditions (26) resemble (14), an auxiliary variable \(\mathbf{s}\) replaces the dual variable \(\mathbf{y}\), and the projection in (14) is replaced by the constraint estimate \(\overline{\mathbf{z}}\). This makes sub-problem (26) a linear system and results in a first-order method.

6 Implementation details

QPDO has been implemented in C and provides a MATLAB interface. It can solve QPs of the form (1) and makes no assumptions about the problem data other than convexity; it is available online at

https://github.com/aldma/qpdo.

This section discusses some relevant aspects of the program, such as the linear solver, parameters update rules, infeasibility detection, and problem scaling.

6.1 Linear solver

The linear system (21) is solved with CHOLMOD [12], using a sparse Cholesky factorization. This linear solver is analogous to the one adopted in QPALM [33], for the sake of comparison. Let \((\mathbf{r}_k^\text {dual},\mathbf{r}_k^\text {prim})\) partition the inner residual \(\mathbf{r}_k\) in (14). Then, formally solving for \(\delta {\mathbf {y}}\) in (21), we obtain the expression (omitting subscripts and arguments)

$$\begin{aligned} \delta {\mathbf {y}}&= \mu ^{-1} (\mathbf{I} - \mathbf{P}/2)^{-1} \left[ (\mathbf{I} - \mathbf{P}) \mathbf{A} \delta {\mathbf {x}} + \mathbf{r}^{\text {prim}} \right] \\&= \mu ^{-1} (\mathbf{I} + \mathbf{P}) \left[ (\mathbf{I} - \mathbf{P}) \mathbf{A} \delta {\mathbf {x}} + \mathbf{r}^{\text {prim}} \right] \\&= \mu ^{-1} (\mathbf{I} - \mathbf{P}) \mathbf{A} \delta {\mathbf {x}} + \mu ^{-1} (\mathbf{I} + \mathbf{P}) \mathbf{r}^{\text {prim}} , \end{aligned}$$

where the second and third lines are due to the binary structure of \(\mathbf{P}\). Substituting \(\delta {\mathbf {y}}\) and rearranging, we obtain a linear system for \(\delta {\mathbf {x}}\):

$$\begin{aligned} \left[ \mathbf{Q} + \sigma \mathbf{I} + \mu ^{-1} \mathbf{A}^\top (\mathbf{I} - \mathbf{P}) \mathbf{A} \right] \delta {\mathbf {x}}&= \mathbf{A}^\top \mathbf{P} {\mathbf {y}} - \mu ^{-1} \mathbf{A}^\top (\mathbf{I} - \mathbf{P}) \mathbf{r}^{\text {prim}} - \mathbf{r}^\text {dual} , \end{aligned}$$

which has a symmetric, positive definite coefficient matrix and can be solved by CHOLMOD [12]. On the one hand, this approach allows multi-rank factorization updates [15], thus avoiding the need for a full re-factorization at every inner iteration. On the other hand, the sparsity pattern may be lost and significant fill-in may arise due to the matrix-matrix product \(\mathbf{A}^\top \mathbf{A}\). For this reason, the current implementation may benefit from directly solving (21) via sparse symmetric linear solvers, possibly with multi-rank factorization updates. To better exploit the data sparsity pattern and the capabilities of the proposed method, we plan to add other linear solvers in future versions.
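The reduction described above can be sketched as follows (dense NumPy, ours; CHOLMOD and the multi-rank factorization updates are replaced by a plain Cholesky factorization for illustration):

```python
import numpy as np

def reduced_newton_step(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Eliminate dy from (21) and solve the positive definite system for dx."""
    w = A @ x + mu * (yk - 0.5 * y)
    p = ((l < w) & (w < u)).astype(float)            # diagonal of P
    r_dual = Q @ x + q + A.T @ y + sigma * (x - xk)
    r_prim = A @ x + mu * (yk - y) - np.clip(w, l, u)
    AIP = (1.0 - p)[:, None] * A                     # (I - P) A
    H = Q + sigma * np.eye(Q.shape[0]) + AIP.T @ AIP / mu
    rhs = A.T @ (p * y) - AIP.T @ r_prim / mu - r_dual
    L = np.linalg.cholesky(H)                        # H is symmetric positive definite
    dx = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    dy = (AIP @ dx + (1.0 + p) * r_prim) / mu
    return dx, dy
```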

6.2 Parameters selection

Solving convex QPs via the proximal point algorithm imposes mild restrictions on the sequence of primal-dual regularization parameters \(\{ \mathbf \Sigma _k \}\). As mentioned in Sect. 2.2, there are no additional requirements other than being non-increasing and positive definite. However, similarly to forcing sequences in augmented Lagrangian methods [14], the sequence of regularization parameters greatly affects the behaviour of QPDO, and a careful tuning can positively impact the performance. For instance, although faster convergence rates can be expected if \(\mathbf \Sigma _k \rightarrow \mathbf{0}\) [38], numerical stability and machine precision should be taken into account. Following [34, Sect. 5.3] and [59, Sect. 5.2], our implementation considers only diagonal matrices of the form \(\mathbf \Sigma _k = blockdiag( \sigma _k \mathbf{I}, diag(\varvec{\mu }_k) )\), and we refer to the effect of \(\sigma _k\) and \(\varvec{\mu }_k\) as primal and dual regularization, respectively.

The dual regularization parameter \(\varvec{\mu }_k\) proves critical for the practical performance of the method since it strikes the balance between the number of inner and outer iterations, seeking easy-to-solve sub-problems, effective warm starting, and rapid constraints satisfaction. Therefore, we carefully initialize and update \(\varvec{\mu }_k\), guided by the interpretation as a constraint penalization offered by the augmented Lagrangian framework; cf. Sect. 3.1. In our implementation, we consider a vector \(\varvec{\mu }_k\) to gain a finer control over the constraint penalization [14]. Given a (primal) initial guess \({\mathbf {x}}_0 \in \mathbb {R}^n\), we initialize as in [8, Sect. 12.4]:

$$\begin{aligned} \mathbf{d}_0&{:}{=}\mathbf{A} {\mathbf {x}}_0 - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf {x}}_0) , \\ \varvec{\mu }_0^i&{:}{=}\Pi _{\left[ \mu _0^{\text {min}} , \mu _0^{\text {max}} \right] }\left( \kappa _\mu \frac{\max ( 1, (\mathbf{d}_0^i)^2/2 )}{\max ( 1, |f({\mathbf {x}}_0)| )} \right) , ~i \in [1;m] , \end{aligned}$$

where \(\mu _0^{\text {max}} \ge \mu _0^{\text {min}} > 0\) and \(\kappa _\mu \ge 0\). Then, following [34, Sect. 5.3], we monitor the primal residual \(\mathbf{r}_{\text {prim}}({\mathbf {v}}) {:}{=}\mathbf{A} {\mathbf {x}} - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf {x}}+{\mathbf {y}})\) from (4) and update the dual regularization parameter \(\varvec{\mu }_k\) accordingly. If \(| \mathbf{r}_{\text {prim}}^i({\mathbf {v}}_{k+1}) | > \max \left( \theta _\mu | \mathbf{r}_{\text {prim}}^i({\mathbf {v}}_k) | , \varepsilon _{\text {opt}} \right)\), we set

$$\begin{aligned} \varvec{\mu }_{k+1}^i = \Pi _{\left[ \mu _{\text {min}}, \varvec{\mu }_k^i \right] }\left( \delta _\mu \frac{\Vert \mathbf{r}_{\text {prim}}({\mathbf {v}}_{k+1}) \Vert _\infty }{| \mathbf{r}_{\text {prim}}^i({\mathbf {v}}_{k+1}) |} \varvec{\mu }_k^i \right) , \end{aligned}$$

where \(\theta _\mu \in (0,1)\), \(\mu _{\text {min}} > 0\), and \(\delta _\mu \ge 0\). Otherwise, we set \(\varvec{\mu }_{k+1}^i = \varvec{\mu }_k^i\). These rules adapt the constraint penalization based on the current residual, seeking a uniform, steady progression towards feasibility, while making sure the sequences \(\{\varvec{\mu }_k^i\}\), \(i \in [1;m]\), are non-increasing and bounded away from zero. In our implementation, the default values are \(\mu _0^{\text {min}} = 10^{-3}\), \(\mu _0^{\text {max}} = 10^3\), \(\kappa _\mu = 0.1\), \(\mu _{\text {min}} = 10^{-9}\), \(\delta _\mu = 10^{-2}\), and \(\theta _\mu = 0.25\).

The primal regularization turns out to be less critical than its dual counterpart. For this reason, it is associated with a scalar value and tuned independently of the residual. Starting from \(\sigma _0 > 0\), we apply

$$\begin{aligned} \sigma _{k+1} = \max ( \sigma _{\text {min}} , \kappa _\sigma \sigma _k ) , \end{aligned}$$

where \(\sigma _{\text {min}} > 0\) and \(\kappa _\sigma \in [0,1]\). In our implementation the default values are \(\sigma _0 = 10^{-3}\), \(\sigma _{\text {min}} = 10^{-7}\), and \(\kappa _\sigma = 0.1\).
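The rules above translate into a few lines of code; the following NumPy sketch (ours, with the stated default values as keyword arguments; \(f_0\) denotes the objective value \(f({\mathbf {x}}_0)\)) covers the initialization of \(\varvec{\mu }_0\) and the updates of \(\varvec{\mu }_k\) and \(\sigma _k\).

```python
import numpy as np

def init_mu(x0, f0, A, l, u, mu0_min=1e-3, mu0_max=1e3, kappa_mu=0.1):
    """Component-wise initialization of the dual regularization (cf. [8, Sect. 12.4])."""
    d0 = A @ x0 - np.clip(A @ x0, l, u)
    mu0 = kappa_mu * np.maximum(1.0, 0.5 * d0 ** 2) / max(1.0, abs(f0))
    return np.clip(mu0, mu0_min, mu0_max)

def update_mu(mu, r_prim_new, r_prim_old, theta_mu=0.25, delta_mu=1e-2,
              mu_min=1e-9, eps_opt=1e-6):
    """Shrink mu_k^i where the primal residual is not decreasing fast enough."""
    mu_new = mu.copy()
    slow = np.abs(r_prim_new) > np.maximum(theta_mu * np.abs(r_prim_old), eps_opt)
    target = delta_mu * np.linalg.norm(r_prim_new, np.inf) \
             / np.abs(r_prim_new[slow]) * mu[slow]
    mu_new[slow] = np.clip(target, mu_min, mu[slow])
    return mu_new

def update_sigma(sigma, sigma_min=1e-7, kappa_sigma=0.1):
    """Scalar primal regularization update."""
    return max(sigma_min, kappa_sigma * sigma)
```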

Early termination The inner tolerance \(\varepsilon _k\) also affects the performance of QPDO, since it balances sub-problem accuracy and early termination. In Algorithm 1, these aspects relate to the parameters \(\varepsilon _0\) and \(\kappa _\varepsilon\), which drive \(\{ \varepsilon _k \}\) to zero. However, finite precision should also be taken into account. In fact, although the semismooth Newton method converges in finitely many iterations, the solution provided is exact up to round-off errors and numerical precision. Therefore, we deviate from Algorithm 1 in this respect and employ the update rule

$$\begin{aligned} \varepsilon _{k+1} = \max ( \varepsilon _{\text {min}} , \kappa _\varepsilon \varepsilon _k ) , \end{aligned}$$

where \(0 \le \varepsilon _{\text {min}} \le \varepsilon _{\text {opt}}\). In our implementation, the default values are \(\varepsilon _0 = 1\), \(\kappa _\varepsilon = 0.1\), \(\varepsilon _{\text {min}} = 0.1 \varepsilon _{\text {opt}}\), and \(\varepsilon _{\text {opt}} = 10^{-6}\).

6.3 Infeasibility detection

A routine for detecting primal and dual infeasibility of (1) is included in Algorithm 1. This allows the algorithm to terminate with either a primal-dual solution or a certificate of primal or dual infeasibility, for some given tolerances. We adopt the mechanism developed in [4, Sect. 5.2], which holds whenever the proximal point algorithm is employed to solve the KKT conditions (3). Problem (1) is declared primal or dual infeasible based on the conditions given in Sect. 2.5 and the vectors \(\varDelta {\mathbf {x}}_k {:}{=}{\mathbf {x}}_{k+1} - {\mathbf {x}}_k\) and \(\varDelta {\mathbf {y}}_k {:}{=}{\mathbf {y}}_{k+1} - {\mathbf {y}}_k\), \(k \ge 0\). As in [34], we deem the problem primal infeasible if \(\varDelta {\mathbf {y}}_k \ne \mathbf{0}\) and the following two conditions hold

$$\begin{aligned} \Vert \mathbf{A}^\top \varDelta {\mathbf {y}}_k \Vert _\infty&\le \varepsilon _{\text {pinf}} \Vert \varDelta {\mathbf {y}}_k \Vert _\infty , \end{aligned}$$
(27a)
$$\begin{aligned} \mathbf{u}^\top \max (\varDelta {\mathbf {y}}_k, \mathbf{0}) + \mathbf{l}^\top \min (\varDelta {\mathbf {y}}_k, \mathbf{0})&\le - \varepsilon _{\text {pinf}} \Vert \varDelta {\mathbf {y}}_k \Vert _\infty , \end{aligned}$$
(27b)

where \(\varepsilon _{\text {pinf}} > 0\) is some tolerance level. The problem is considered dual infeasible if \(\varDelta {\mathbf {x}}_k \ne \mathbf{0}\) and the following conditions hold

$$\begin{aligned} \Vert \mathbf{Q} \varDelta {\mathbf {x}}_k \Vert _\infty&\le \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty , \end{aligned}$$
(28a)
$$\begin{aligned} \mathbf{q}^\top \varDelta {\mathbf {x}}_k&\le - \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty , \end{aligned}$$
(28b)
$$\begin{aligned} (\mathbf{A} \varDelta {\mathbf {x}}_k)^i&{\left\{ \begin{array}{ll} \in [-\varepsilon _{\text {dinf}}, \varepsilon _{\text {dinf}}] \Vert \varDelta {\mathbf {x}}_k \Vert _\infty &{} \mathbf{u}^i, \mathbf{l}^i \in \mathbb {R}, \\ \ge - \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty &{} \mathbf{u}^i = +\infty , \\ \le \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty &{} \mathbf{l}^i = -\infty , \end{array}\right. } ,~i \in [1;m], \end{aligned}$$
(28c)

where \(\varepsilon _{\text {dinf}} > 0\) is some tolerance level. In case of primal and/or dual infeasibility, we return the vectors \(\varDelta {\mathbf {y}}_k\) and \(\varDelta {\mathbf {x}}_k\) as certificates of primal and dual infeasibility, respectively. In our implementation, the default values are \({\varepsilon _{\text {pinf}} = \varepsilon _{\text {dinf}} = 10^{-6}}\). The reader may refer to [59, Sect. 3.4], [33, Sect. V.C], [37, Sect. 4.1], and [51, Sect. 4] for analogous applications.
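As a sketch of how these tests may look in code (ours, dense NumPy; infinite bounds are handled explicitly, consistently with (8) and (9)):

```python
import numpy as np

def check_infeasibility(dx, dy, Q, q, A, l, u, eps_pinf=1e-6, eps_dinf=1e-6):
    """Primal (27) and dual (28) infeasibility tests on dv_k = v_{k+1} - v_k."""
    primal, dual = False, False
    ndy = np.linalg.norm(dy, np.inf)
    if ndy > 0:
        # support-function term: components with an infinite bound must not contribute
        sup = np.where(np.isfinite(u), u, 0.0) @ np.maximum(dy, 0.0) \
            + np.where(np.isfinite(l), l, 0.0) @ np.minimum(dy, 0.0)
        if np.any(np.isinf(u) & (dy > 0)) or np.any(np.isinf(l) & (dy < 0)):
            sup = np.inf                                   # no certificate possible
        primal = (np.linalg.norm(A.T @ dy, np.inf) <= eps_pinf * ndy
                  and sup <= -eps_pinf * ndy)
    ndx = np.linalg.norm(dx, np.inf)
    if ndx > 0:
        Adx = A @ dx
        cone_ok = np.where(np.isinf(l) & np.isinf(u), True,
                   np.where(np.isinf(u), Adx >= -eps_dinf * ndx,
                    np.where(np.isinf(l), Adx <= eps_dinf * ndx,
                             np.abs(Adx) <= eps_dinf * ndx)))
        dual = (np.linalg.norm(Q @ dx, np.inf) <= eps_dinf * ndx
                and q @ dx <= -eps_dinf * ndx
                and np.all(cone_ok))
    return primal, dual
```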

6.4 Preconditioning

Preconditioning, or scaling, the problem may alleviate ill-conditioning and mitigate numerical issues, especially when the problem data span several orders of magnitude. In our implementation, we closely follow [34, Sect. 5.2] and scale the problem data by performing Ruiz equilibration [58] on the constraint matrix \(\mathbf{A}\). This procedure iteratively scales the rows and columns of a matrix in order to make their infinity norms approach one. By default, QPDO performs 10 scaling iterations. Slightly different routines are adopted, e.g., in [59, Sect. 5.1] and [51, Sect. 5.1.2]. Note that, by default, if the problem is initially scaled, the termination conditions for optimality and infeasibility refer to the original, unscaled problem.
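A minimal dense sketch of the equilibration step (ours; the actual implementation follows [34, Sect. 5.2] and also rescales the remaining problem data accordingly):

```python
import numpy as np

def ruiz_equilibrate(A, iters=10):
    """Ruiz equilibration: scale rows and columns of A so that their
    infinity norms approach one; returns the scalings d, e and the scaled matrix."""
    m, n = A.shape
    d, e = np.ones(m), np.ones(n)
    As = A.astype(float).copy()
    for _ in range(iters):
        dr = 1.0 / np.sqrt(np.maximum(np.abs(As).max(axis=1), 1e-12))
        dc = 1.0 / np.sqrt(np.maximum(np.abs(As).max(axis=0), 1e-12))
        As = dr[:, None] * As * dc[None, :]
        d, e = d * dr, e * dc
    return d, e, As   # diag(d) @ A @ diag(e) equals As
```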

7 Numerical results

We discuss details of our open-source C implementation of QPDO and present computational results on random problems and the Maros-Mészáros set [39]. We test and compare QPDO against the open-source, full-fledged solvers OSQP [59] and QPALM [33, 34], and the commercial interior-point solver MOSEK [43]. Indeed, “the construction of appropriate software is by no means trivial and we wish to make a thorough job of it” [14]; we plan to improve our current implementation, in particular the linear solver discussed in Sect. 6.1, and to report comprehensive numerical results in due time.

7.1 Setup

We consider the tolerance \(\varepsilon _{\text {opt}} = 10^{-5}\) and set the tolerances of all the solvers accordingly. In addition, we set the maximum run time of each solver to 100 s and impose no limit on the number of iterations. We leave all the other settings to the internal defaults. It is worth mentioning that, since no initial guess is provided, QPDO, OSQP, and QPALM start with \({\mathbf {v}}_0 = \mathbf{0}\).

In general, it is hard to compare the solution accuracy because the solvers may adopt different termination criteria. While QPDO, QPALM, and OSQP monitor the residual \(\mathbf{r}\) in (4) and check the condition \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty \le \varepsilon _{\text {opt}}\), MOSEK checks complementary slackness with different metrics and scalings. Therefore, we decided not to include checks on \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty\). Instead, we deem a primal-dual pair \({\mathbf {v}}^\star\) optimal if it is returned by a solver declaring success; otherwise, we count it as a failure.

All the experiments were carried out on a desktop running Ubuntu 16.04 LTS with Intel Core i7-8700, 3.20 GHz, and 16 GB RAM. The code for all the numerical examples is available online at [16].

Metrics Let S, P, and \(t_{s,p}\) denote the set of solvers, the set of problems, and the time required for solver \(s \in S\) to return a solution for problem \(p \in P\). The shifted geometric mean (sgm) \({\widehat{t}}_s\) of the run times for solver \(s \in S\) on P is defined by

$$\begin{aligned} {\widehat{t}}_s {:}{=}\exp \left( \frac{1}{|P|} \sum _{p \in P} \ln \left( t_{s,p} + t_{\text {shift}} \right) \right) - t_{\text {shift}} \end{aligned}$$

with the shift \(t_{\text {shift}} = 1\) s [41]. Here, when solver s fails to solve problem p, \(t_{s,p}\) is set to the time limit. We also adopt performance profiles [18] to compare the solver timings. These plot the function \(f_s^\text {r} : \mathbb {R}\rightarrow [0,1]\), \(s \in S\), defined by

$$\begin{aligned} f_s^\text {r}(\tau ) {:}{=}\frac{|\{ p \in P : t_{s,p} \le \tau \, t_p^\text {min} \}|}{|P|} ,\quad t_p^\text {min} {:}{=}\min _{s \in S} t_{s,p} . \end{aligned}$$

Setting \(t_{s,p} = +\infty\) when solver s fails on problem p, \(f_s^\text {r}(\tau )\) is the fraction of problems solved by solver s within \(\tau\) times the best timing. Note that, although performance profiles do not necessarily allow us to assess the performance of one solver relative to another, they remain a valid tool for comparing each solver against the best one [30].

However, performance profiles do not show the fraction of problems that can be solved (to a given tolerance \(\varepsilon _\text {opt}\)) within a given time t. Thus, in the vein of data profiles [42, Sect. 2.2], we plot the function \(f_s^\text {a} : \mathbb {R}\rightarrow [0,1]\), \(s \in S\), defined by

$$\begin{aligned} f_s^\text {a}(t) {:}{=}\frac{|\{ p \in P : t_{s,p} \le t \}|}{|P|} . \end{aligned}$$

Setting \(t_{s,p} = +\infty\) when solver s fails on problem p, \(f_s^\text {a}(t)\) is the fraction of problems solved by solver s within the time t. Note that, in contrast to \(f_s^\text {r}\), the time profile \(t \mapsto f_s^\text {a}(t)\) is independent of the other solvers and displays the actual timings of s.
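The three metrics above are straightforward to evaluate from a table of run times. The sketch below does so on a dense timing array, encoding failed runs as infinity, which is replaced by the time limit for the shifted geometric mean and left as-is for the profiles, as described in the text. The data layout and function names are illustrative.

```c
/* Sketch of the reporting metrics: shifted geometric mean, performance
 * profile f_s^r(tau), and time profile f_s^a(t). Illustrative only;
 * t[s*nprob + p] holds the run time of solver s on problem p, INFINITY on failure. */
#include <math.h>
#include <stddef.h>

/* shifted geometric mean of the run times of solver s; failures count as the time limit */
double sgm(const double *t_s, size_t nprob, double t_shift, double time_limit) {
    double acc = 0.0;
    for (size_t p = 0; p < nprob; p++) {
        double tp = isfinite(t_s[p]) ? t_s[p] : time_limit;
        acc += log(tp + t_shift);
    }
    return exp(acc / (double)nprob) - t_shift;
}

/* fraction of problems solver s solves within tau times the best timing */
double perf_profile(const double *t, size_t nsolv, size_t nprob, size_t s, double tau) {
    size_t count = 0;
    for (size_t p = 0; p < nprob; p++) {
        if (!isfinite(t[s*nprob + p])) continue;   /* failure: never counted */
        double tmin = INFINITY;
        for (size_t r = 0; r < nsolv; r++) tmin = fmin(tmin, t[r*nprob + p]);
        if (t[s*nprob + p] <= tau * tmin) count++;
    }
    return (double)count / (double)nprob;
}

/* fraction of problems solver s solves within the time tmax */
double time_profile(const double *t_s, size_t nprob, double tmax) {
    size_t count = 0;
    for (size_t p = 0; p < nprob; p++)
        if (t_s[p] <= tmax) count++;
    return (double)count / (double)nprob;
}
```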

7.2 Random problems

We considered QPs in the form (1) with randomly generated problem data. In each problem instance, the number of variables is \(n = \lceil 10^a \rceil\), with \(a \sim {\mathcal {U}}(2,3)\) uniformly distributed, so that n ranges between \(10^2\) and \(10^3\). The number of constraints is \(m = \lceil b \, n \rceil\), with \(b \sim {\mathcal {U}}(2,5)\). The linear cost is normally distributed, i.e., \(\mathbf{q}_i \sim {\mathcal {N}}(0,1)\). The cost matrix is \(\mathbf{Q} = \mathbf{P} \mathbf{P}^\top + \alpha \mathbf{I}_n\), where \(\mathbf{P} \in \mathbb {R}^{n \times n}\) has \(10\%\) nonzero entries \(\mathbf{P}_{ij} \sim {\mathcal {N}}(0,1)\), and \(\alpha = 10^{-6}\). The constraint matrix \(\mathbf{A} \in \mathbb {R}^{m \times n}\) contains \(10\%\) nonzero entries \(\mathbf{A}_{ij} \sim {\mathcal {N}}(0,1)\). The bounds are uniformly distributed, i.e., \(\mathbf{l}_i \sim {\mathcal {U}}(-1,0)\) and \(\mathbf{u}_i \sim {\mathcal {U}}(0,1)\). We also investigated equality-constrained QPs. For these problems, \(m = \lceil n/b\rceil\), with \(b \sim {\mathcal {U}}(2,5)\), and \(\mathbf{l} = \mathbf{u} = \mathbf{A} \tilde{\mathbf {x}}\), where \(\tilde{\mathbf {x}}_i \sim {\mathcal {N}}(0,1)\). We generated 500 instances of each problem class.
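The fragment below mirrors this recipe for the inequality-constrained case, using dense storage and Box-Muller sampling; it is only a sketch, and the scripts actually used for the experiments are available at [16].

```c
/* Sketch of the random QP generator described above (inequality-constrained case).
 * Dense storage and Box-Muller sampling; illustrative only, cf. [16] for the scripts. */
#include <math.h>
#include <stdlib.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double unif(double lo, double hi) {            /* U(lo, hi) */
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}
static double gauss(void) {                           /* N(0, 1) via Box-Muller */
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0), u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

/* Fills Q (n*n), q (n), A (m*n), l, u (m), all caller-allocated, with
 * n = ceil(10^a), a ~ U(2,3), and m = ceil(b*n), b ~ U(2,5). */
void random_qp(size_t n, size_t m, double *Q, double *q,
               double *A, double *l, double *u) {
    const double alpha = 1e-6, density = 0.10;
    /* P with 10% N(0,1) entries, then Q = P P^T + alpha I */
    double *P = calloc(n * n, sizeof *P);
    for (size_t i = 0; i < n * n; i++)
        if (unif(0.0, 1.0) < density) P[i] = gauss();
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = (i == j) ? alpha : 0.0;
            for (size_t k = 0; k < n; k++) s += P[i*n + k] * P[j*n + k];
            Q[i*n + j] = s;
        }
    free(P);
    for (size_t i = 0; i < n; i++) q[i] = gauss();        /* q_i ~ N(0,1) */
    for (size_t i = 0; i < m * n; i++)                    /* A with 10% N(0,1) entries */
        A[i] = (unif(0.0, 1.0) < density) ? gauss() : 0.0;
    for (size_t i = 0; i < m; i++) {                      /* l_i ~ U(-1,0), u_i ~ U(0,1) */
        l[i] = unif(-1.0, 0.0);
        u[i] = unif( 0.0, 1.0);
    }
}
```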

Results Computational results are summarized in Table 1 and shown in Figs. 1, 2. Both performance and time profiles suggest that, for random QPs, QPALM exhibits the best performance, with OSQP slightly slower and QPDO third. For equality-constrained QPs, instead, OSQP performs best with QPALM and QPDO slightly behind. MOSEK is generally slower than the other solvers and, for random QPs, it often declares success with a solution that does not satisfy the condition \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty \le \varepsilon _{\text {opt}}\).

Fig. 1 Comparison on random problems with performance profiles

Fig. 2 Comparison on random problems with time profiles

Table 1 Comparison on different problem classes with run times, as shifted geometric means (sgm), and failure rates

7.3 Maros-Mészáros problems

We considered the Maros-Mészáros test set [39] of hard QPs and selected those with \(n \le 10^3\), due to the limitations mentioned in Sect. 6.1. This yields 73 problems, with \(2 \le n \le 1000\), \(3 \le m \le 1750\), and the number of nonzeros \(6 \le N \le 22292\).

Results Computational results are summarized in Tables 1, 2 and shown in Figs. 3, 4. On this test set, QPDO demonstrates its robustness, solving all the problems. OSQP is very fast for some problems but has a high failure rate; it fails on 5 of the 20 problems reported in Table 2. As a first-order method, OSQP builds upon computationally cheap iterations, but it may take many iterations to cope with ill-conditioning and the relatively tight accuracy requirements. QPALM is still competitive but fails on the VALUES problem, due to linear algebra issues. MOSEK seems to perform better than the other solvers on the larger problems, but it often does not satisfy the condition \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty \le \varepsilon _{\text {opt}}\) and fails on many problems. Overall, these results indicate that QPDO is both reliable and effective.

Fig. 3 Comparison on Maros-Mészáros problems with performance profiles

Fig. 4 Comparison on Maros-Mészáros problems with time profiles

Table 2 Comparison on the larger Maros-Mészáros problems (\(N \ge 5000\)) with KKT residual and run time for QPDO, QPALM, and MOSEK

7.4 Degenerate and infeasible problems

Consider the following parameterized QP, adapted from [37, Sect. 5.4]:

$$\begin{aligned} \min _{\mathbf {x}} \quad&\frac{1}{2} \mathbf {x}^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf {x} + \begin{pmatrix} 1 \\ c \end{pmatrix}^\top \mathbf {x} \\ s.t.\quad&\begin{pmatrix} -\infty \\ 1 \\ 1 \end{pmatrix} \le \begin{bmatrix} a & a \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \mathbf {x} \le \begin{pmatrix} 0 \\ 3 \\ b \end{pmatrix} . \end{aligned}$$

By varying a, b and c, we can create degenerate or infeasible test problems.
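For reference, the problem data for the three cases discussed below can be written down as follows. The dense layout and the use of IEEE infinities for absent bounds are generic choices for this illustration and do not reflect QPDO's actual C interface.

```c
/* Data for the 2-variable, 3-constraint example, parameterized by (a, b, c).
 * Generic dense layout, INFINITY for absent bounds; illustrative only. */
#include <math.h>

typedef struct { double Q[4], q[2], A[6], l[3], u[3]; } tiny_qp;

static tiny_qp make_example(double a, double b, double c) {
    tiny_qp p = {
        .Q = { 1, 0,
               0, 0 },
        .q = { 1, c },
        .A = { a, a,
               1, 0,
               0, 1 },
        .l = { -INFINITY, 1, 1 },
        .u = { 0, 3, b },
    };
    return p;
}

/* the three cases discussed in the text:
 *   degenerate:        make_example(0.0, 3.0,      0.0)
 *   primal infeasible: make_example(1.0, 3.0,      0.0)
 *   dual infeasible:   make_example(0.0, INFINITY, -1.0)           */
```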

First, we consider the degenerate problem obtained by setting \({a=0}\), \({b = 3}\), and \({c = 0}\). This problem admits primal solutions \({\mathbf{x}}^\star \in \{ (1,\alpha ) \mid 1 \le \alpha \le 3 \}\). Running with default settings, QPDO signals optimality after 6 proximal iterations and 14 Newton iterations, and returns \({\mathbf{x}} = (1.0,1.0)\), \({\mathbf{y}} = (0.0,-2.0,0.0)\), with residual \(\Vert \mathbf{r}({\mathbf{v}}) \Vert _\infty = 1.0 \cdot 10^{-7}\).

Second, we consider a primal infeasible QP by setting \(a=1\), \(b = 3\) and \(c = 0\). QPDO signals primal infeasibility after 3 proximal iterations and 8 Newton iterations, and returns the certificate \(\varDelta {\mathbf{y}} = (6.6, -6.6, -6.6) \cdot 10^4\).

Finally, we consider a dual infeasible QP by setting \(a=0\), \(b = +\infty\) and \(c = -1\). For this problem, (0, 1) is a direction of unbounded descent. QPDO signals dual infeasibility after 5 proximal iterations and 12 Newton iterations, and returns the certificate \(\varDelta {\mathbf{x}} = (1.1 \cdot 10^{-5}, 1.0 \cdot 10^7)\).

8 Conclusions

This paper presented a primal-dual Newton-type proximal method for convex quadratic programs. It builds upon a simple yet crucial result: a suitable merit function for the proximal sub-problem is found in the proximal primal-dual augmented Lagrangian function. This allows us to effectively weave the proximal point method together with a semismooth Newton method, yielding structured symmetric linear systems, exact linesearch, and the possibility of applying sparse multi-rank factorization updates. Requiring only convexity, the method is simple, easily warm started, can exploit sparsity, is robust to early termination, and can detect infeasibility. We have implemented our method in QPDO, a general-purpose solver written in open-source C code. We benchmarked it against state-of-the-art QP solvers, comparing run times and failure rates. QPDO proved reliable, effective, and competitive.