1 Introduction

Quadratic programs (QPs) are among the fundamental problems in optimization. In this paper, we consider linearly constrained convex QPs of the form:

$$\begin{aligned} \min _{\mathbf {x}} \quad \frac{1}{2} \mathbf {x}^\top \mathbf{Q} \mathbf {x} + \mathbf{q}^\top \mathbf {x} ,\quad s.t.\quad \mathbf{l} \le \mathbf{A} \mathbf {x} \le \mathbf{u} , \end{aligned}$$
(1)

with \(\mathbf{x} \in \mathbb {R}^n\). \(\mathbf{Q} \in \mathbb {R}^{n \times n}\) and \(\mathbf{q} \in \mathbb {R}^n\) define the objective function, whereas the constraints are encoded by \(\mathbf{A} \in \mathbb {R}^{m \times n}\) and \(\mathbf{l}, \mathbf{u} \in {\overline{\mathbb {R}}}^m\). We assume (i) \(\mathbf{Q}\) is symmetric positive semidefinite, i.e., \(\mathbf{Q} \succeq 0\), and (ii) \(\mathbf{l}\) and \(\mathbf{u}\) satisfy \(\mathbf{l} \le \mathbf{u}\), \(\mathbf{l} < +\infty\), and \(\mathbf{u} > -\infty\) component-wise; cf. [33, 59]. We will refer to the nonempty, closed and convex set

$$\begin{aligned} {\mathcal {C}} {:}{=}\{ \mathbf{z} \in \mathbb {R}^m : \mathbf{l} \le \mathbf{z} \le \mathbf{u} \} \end{aligned}$$

as the constraint set. Note that (1) represents a general convex QP, in that it also accommodates equality constraints and bounds. We denote N the total number of nonzero entries in \(\mathbf{Q}\) and \(\mathbf{A}\), i.e., \(N {:}{=}nnz( \mathbf{Q} ) + nnz( \mathbf{A} )\).
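For concreteness, the following sketch (ours, in Python/NumPy, with purely illustrative data, not part of any solver interface) shows how an equality constraint and simple bounds can be encoded within the single two-sided form \(\mathbf{l} \le \mathbf{A} \mathbf {x} \le \mathbf{u}\): an equality row uses \(\mathbf{l}^i = \mathbf{u}^i\), a bound on a variable uses a row of the identity, and a one-sided constraint uses an infinite bound.

```python
import numpy as np

# Toy instance of (1): minimize 0.5 x'Qx + q'x subject to
#   x1 + x2 + x3 = 1   (equality row: l_i = u_i)
#   x >= 0             (bounds via identity rows, u_i = +inf)
n = 3
Q = np.diag([1.0, 2.0, 3.0])                  # symmetric positive semidefinite
q = np.array([1.0, -1.0, 0.0])
A = np.vstack([np.ones((1, n)), np.eye(n)])   # constraint matrix, m = 1 + n rows
l = np.concatenate(([1.0], np.zeros(n)))
u = np.concatenate(([1.0], np.full(n, np.inf)))

assert np.all(l <= u)                         # assumption (ii) on the bounds
```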

1.1 Background

Optimization problems in the form (1) appear in a variety of applications and are of interest in engineering, statistics, finance and many other fields. QPs often arise as sub-problems in methods for general nonlinear programming [10, 31, 44], and greatly vary in terms of problem size and structure.

Convex QPs have been studied since the 1950s [22] and several numerical methods have been developed since then. These differ in how they balance the number of iterations and the cost (e.g., run time) per iteration.

Active-set methods for QPs originated from extending the simplex method for linear programs (LPs) [64]. These methods select a set of binding constraints and iteratively adapt it, seeking the set of active constraints at the solution. Active-set algorithms can be easily warm started and can lead to finite convergence. Moreover, by adding and dropping constraints from the set of binding ones, factorization updates can be adopted for solving successive linear systems. However, these methods may require many iterations to identify the correct set of active constraints. Modern solvers based on active-set methods are qpOASES [20] and NASOQ [13].

First-order methods iteratively compute an optimal solution using only first-order information about the cost function [46, 49]. As these methods consist of computationally cheap and simple steps, they are well suited to applications with limited computing resources [59]. However, first-order algorithms usually require many iterations to achieve accurate solutions and may suffer from ill-conditioning of the problem data. Several acceleration schemes have been proposed to improve their behaviour [1, 62]. The open-source solver OSQP [59] offers an implementation based on ADMM [9].

Interior-point methods move the problem constraints to the objective function via barrier functions and solve a sequence of parametric sub-problems [10, Chap. 11], [44, Sect. 16.6]. Although not easily warm started, interior-point methods are appealing for large-scale problems thanks to their polynomial complexity [29]. They usually require few but rather demanding iterations [31, 44]. Interior-point methods are currently the default algorithms in the commercial solvers GUROBI [32] and MOSEK [43]. Recent developments are found in the regularized method IP-PMM [51].

Semismooth Newton methods apply a nonsmooth version of Newton’s method to the KKT conditions of the original problem [52, 53]. In the strictly convex case, i.e., with \(\mathbf{Q} \succ 0\), this approach performs very well as long as the underlying linear systems are nonsingular; in the merely convex case, however, these systems may become singular. Regularized, or stabilized, semismooth Newton-type methods, such as QPALM [33, 34] and FBstab [37], overcome these drawbacks.

The augmented Lagrangian framework [7, 14, 44], semismooth Newton methods [25, 53], and proximal techniques [48, 56] are undergoing a revival, as their seamless combination exhibits valuable properties and provides useful features, such as regularization and numerical stability [6, 33, 37]. These ideas form the basis for our approach.

1.2 Approach

In this work we present a numerical method for solving general QPs. The proposed algorithm is based on the proximal point algorithm and a semismooth Newton method for solving the sub-problems, which are always solvable for any choice of problem data. We therefore impose no restrictions such as strict convexity of the cost function or linear independence of the constraints. As such, our algorithm gathers together the benefits of fully regularized primal-dual methods and semismooth Newton methods with active-set structure. Our algorithm can exploit warm starting to reduce the number of iterations, as well as factorization caching and multi-rank update techniques for efficiency, and it can obtain accurate solutions.

Our approach, dubbed QPDO from “Quadratic Primal-Dual Optimizer”, is inspired by and shares many characteristics with previously proposed algorithms, in particular QPALM [33] and FBstab [37]. They differ, however, in some key aspects; a more detailed comparison is deferred to Sect. 5. QPALM relates to the proximal method of multipliers [33, Rem. 2], which in turn is associated with the classical (primal) augmented Lagrangian function [55]. Instead, FBstab and QPDO apply the proximal point method, yielding exact primal-dual regularization. FBstab, however, reformulates the sub-problem via the (penalized) Fischer-Burmeister NCP function [11, 21] and adopts the squared residual norm as a merit function for the inner iterative loop, which prevents the use of symmetric sparse linear solvers. QPDO instead adopts the minimum NCP function, which leads to symmetric linear systems with active-set structure. Then, we show that the primal-dual proximal augmented Lagrangian function, introduced in [27, 54] and [17], is a suitable merit function for the proximal sub-problem, which allows us to perform an exact linesearch in a fully primal-dual regularized context. Indeed, we believe the main contribution of this work consists in recognizing this link, exploiting it to bridge the gap between previously proposed methods, and developing a robust and efficient algorithm that enjoys their advantages without suffering from their disadvantages.

Notation \(\mathbb {N}\), \(\mathbb {Z}\), \(\mathbb {R}\), \(\mathbb {R}_+\), and \(\mathbb {R}_{++}\) denote the sets of natural, integer, real, non-negative real, and positive real numbers, respectively. We denote \({\overline{\mathbb {R}}} := \mathbb {R}\cup \{-\infty ,+\infty \}\) the extended real line. The identity matrix and the vector of ones of size n are denoted by \(\mathbf{I}_n\) and \(\mathbf{1}_n\), respectively. We may omit subscripts whenever clear from the context. \([a,b]\), \((a,b)\), \([a,b)\), and \((a,b]\) stand for closed, open, and half-open intervals, respectively, with end points a and b. \([a;b]\), \((a;b)\), \([a;b)\), and \((a;b]\) stand for the corresponding discrete intervals, e.g., \([a;b] = [a,b] \cap {\mathbb {Z}}\). Given a vector \(\mathbf {x} \in \mathbb {R}^n\), \(\mathbf {x}^\top\) and \(\mathbf {x}^i\) denote its transpose and its i-th component, respectively. We adopt the norms \(\Vert \mathbf {x} \Vert = \Vert \mathbf {x} \Vert _2 {:}{=}\sqrt{\mathbf {x}^\top \mathbf {x}}\) and \(\Vert \mathbf {x} \Vert _\infty {:}{=}\max _{i \in [1;n]} |\mathbf {x}^i|\). Given a set S, |S| denotes its cardinality. In \(\mathbb {R}^n\), the relations <, \(\le\), \(=\), \(\ge\), and > are understood component-wise. Given a nonempty closed convex set \({\mathcal {C}} \subseteq \mathbb {R}^n\), we denote \(\chi _{\mathcal {C}} : \mathbb {R}^n \rightarrow \mathbb {R}\cup \{+\infty \}\) its characteristic function, namely \(\chi _{\mathcal {C}}(\mathbf {x}) = 0\) if \(\mathbf {x} \in {\mathcal {C}}\) and \(\chi _{\mathcal {C}}(\mathbf {x}) = +\infty\) otherwise, \(dist_{\mathcal {C}} : \mathbb {R}^n \rightarrow \mathbb {R}\) its distance, namely \({\mathbf {x}} \mapsto {\min }_{\mathbf{z} \in {\mathcal {C}}} \Vert \mathbf{z}-\mathbf {x}\Vert\), and its projection \(\Pi _{\mathcal {C}} : \mathbb {R}^n \rightarrow \mathbb {R}^n\), namely \({\mathbf {x}} \mapsto {\arg \min }_{\mathbf{z} \in {\mathcal {C}}} \Vert \mathbf{z}-\mathbf {x}\Vert\). Thus, it holds \(dist_{\mathcal {C}}(\mathbf {x}) = \Vert \Pi _{\mathcal {C}}(\mathbf {x}) - \mathbf {x} \Vert\).

The algorithm is described with a nested structure, whose outer iterations are indexed by \(k \in \mathbb {N}\). Given an arbitrary vector \({\mathbf {x}}\), \({\mathbf {x}}_k\) denotes that \({\mathbf {x}}\) depends on k, and analogously for matrices. We denote \({\mathbf {y}}\) the dual variable associated with the constraints in problem (1). A primal-dual pair \((\mathbf {x},\mathbf{y})\) will be denoted \({\mathbf {v}}\), and we will refer interchangeably to it as a vector or to its components \({\mathbf {x}}\) and \({\mathbf {y}}\). An optimal solution to (1) will be denoted \(( {\mathbf {x}}^\star ,{\mathbf {y}}^\star )\), or \(\mathbf {v}^\star\). Optimal solutions of proximal sub-problems will be denoted using an appropriate subscript, according to the iteration. For example, \((\mathbf {x}_k^\star ,\mathbf {y}_k^\star )\), or \(\mathbf {v}_k^\star\), denotes the solution to the proximal sub-problem corresponding to the k-th outer iteration.

Outline The rest of the paper is organized as follows. Sections 2 and 3 develop and present our method in detail. In particular, in Sect. 3.1 we establish our key result, which relates the proximal operator and the primal-dual proximal augmented Lagrangian function. Our algorithmic framework is outlined in Sect. 4 and the convergence properties are analyzed in Sect. 4.1, while Sect. 5 contrasts QPDO with similar methods. We present details of our implementation in Sect. 6 and report on numerical experience in Sect. 7.

2 Outer loop: inexact proximal point method

Our method solves (1) using the proximal point algorithm with inexact evaluation of the proximal operator. The latter is evaluated by means of a semismooth Newton-type method, which constitutes an inner iterative procedure further investigated in Sect. 3. Here we focus on the outer loop corresponding to the proximal point algorithm, which has been extensively studied in the literature [56]. We recall some important results and refer to [38, 40, 45, 55] for more details.

2.1 Optimality conditions

Problem (1) can be equivalently expressed as

$$\begin{aligned} \min _{\mathbf {x}}\quad f(\mathbf {x}) + g(\mathbf{A}{\mathbf {x}}) , \end{aligned}$$
(2)

where

$$\begin{aligned} f({\mathbf {x}}) {:}{=}\frac{1}{2} {\mathbf {x}}^\top \mathbf{Q} {\mathbf {x}} + \mathbf{q}^\top {\mathbf {x}} \qquad \text {and}\qquad g(\mathbf{z}) {:}{=}\chi _{\mathcal {C}}(\mathbf{z}) \end{aligned}$$

are the objective function and the characteristic function of the constraint set \({\mathcal {C}}\), respectively. The necessary and sufficient first-order optimality conditions of (2), and hence (1), read

$$\begin{aligned} \mathbf{0} \in {\mathcal {T}}( {\mathbf {v}}) {:}{=}\begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}}\\ - \mathbf{A} {\mathbf {x}} + \partial g^*(\mathbf{y}) \end{pmatrix} , \end{aligned}$$
(3)

where \(\partial g^*\) denotes the (set-valued) conjugate subdifferential of g [45]. For all \(i \in [1;m]\), \(\left( \partial g^*({\mathbf {y}}) \right) ^i = \mathbf{l}^i\) if \({{\mathbf {y}}^i < 0}\), \(\left( \partial g^*({\mathbf {y}}) \right) ^i = [\mathbf{l}^i,\mathbf{u}^i]\) if \({{\mathbf {y}}^i = 0}\), and \(\left( \partial g^*({\mathbf {y}}) \right) ^i = \mathbf{u}^i\) if \({{\mathbf {y}}^i > 0}\). We will refer to \({\mathcal {T}} : \mathbb {R}^\ell \rightrightarrows \mathbb {R}^\ell\), \(\ell {:}{=}n + m\), as the KKT operator for (1). However, noticing that, for any \(\alpha > 0\), the conditions \({\mathbf{z} = \Pi _{{\mathcal {C}}}( \mathbf{z} + \alpha {\mathbf {y}} )}\) and \({\mathbf{z} \in \partial g^*({\mathbf {y}})}\) are equivalent [57, Sect. 23], the conditions in (3) can be reformulated. Choosing \({\alpha = 1}\), we define the (outer) residual \({\mathbf{r} : \mathbb {R}^\ell \rightarrow \mathbb {R}^\ell }\) and equivalently express (3) as

$$\begin{aligned} \mathbf{0} = \mathbf{r}( {\mathbf {v}}) {:}{=}\begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {{\mathbf {y}}} \\ \mathbf{A} {\mathbf {x}} - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf {x}} + {\mathbf {y}}) \end{pmatrix} . \end{aligned}$$
(4)

This reformulation can be obtained also by employing the minimum NCP function [60] and rearranging to obtain the projection operator \(\Pi _{{\mathcal {C}}}\). The residual \(\mathbf{r}\) is analogous to the natural residual function \(\varvec{\pi }\) investigated in [47]. Since it is an error bound for problem (1), i.e., \(dist_{{\mathcal {T}}^{-1}(\mathbf{0})}({\mathbf {v}}) = {\mathcal {O}}( \Vert \mathbf{r}({\mathbf {v}})\Vert )\) [47, Thm 18], \(\mathbf{r}\) is a suitable optimality measure and its norm can be adopted as a stopping criterion. Although equivalent, (3) is considered here only as a theoretical tool for developing the proposed method, whereas the outer residual \(\mathbf{r}\) in (4) serves as a computationally practical optimality criterion.
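As an illustration (our own sketch, not the solver's C code), the residual in (4) can be evaluated with a few dense NumPy operations; the projection \(\Pi _{{\mathcal {C}}}\) is a component-wise clip onto \([\mathbf{l},\mathbf{u}]\).

```python
import numpy as np

def outer_residual(x, y, Q, q, A, l, u):
    """Residual r(v) from (4); || r(v) ||_inf serves as the stopping criterion."""
    r_dual = Q @ x + q + A.T @ y                 # stationarity part
    r_prim = A @ x - np.clip(A @ x + y, l, u)    # primal part, Pi_C = clip onto [l, u]
    return np.concatenate([r_dual, r_prim])
```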

2.2 Proximal point algorithm

The proximal point algorithm [56] finds zeros of maximal monotone operators by recursively applying their proximal operator. Since \({\mathcal {T}}\) is a maximal monotone operator [45, 55], the proximal point algorithm converges to an element \({\mathbf {v}}^\star\) of the set of primal-dual solutions \({\mathcal {T}}^{-1}(\mathbf{0})\), if any exists [40, 56]. Starting from an initial guess \({\mathbf {v}}_0\), it generates a sequence \(\{ {\mathbf {v}}_k \}\) of primal-dual pairs by recursively applying the proximal operator \({\mathcal {P}}_k\):

$$\begin{aligned} {\mathbf {v}}_{k+1} = {\mathcal {P}}_k( {\mathbf {v}}_k ) ,\qquad {\mathcal {P}}_k {:}{=}(\mathbf{I} + \mathbf \Sigma _k^{-1} {\mathcal {T}})^{-1} , \end{aligned}$$
(5)

where \(\{ \mathbf \Sigma _k \}\) is a sequence of non-increasing positive definite matrices, namely, \(\mathbf \Sigma _k \succ 0\) and \(\mathbf \Sigma _k - \mathbf \Sigma _{k+1} \succeq 0\) for all \(k\in \mathbb {N}\). The matrices \(\mathbf \Sigma _k\) control the primal-dual proximal regularization and, similarly to exact penalty methods, these are not required to vanish [55, 56]. Since \({\mathcal {T}}\) is maximal monotone, \({\mathcal {P}}_k\) is well defined and single valued for all \({\mathbf {v}} \in dom{\mathcal {T}} = \mathbb {R}^\ell\) [40]. Thus, from (5), evaluating \({\mathcal {P}}_k\) at \({\mathbf {v}}_k\) is equivalent to finding the unique \({\mathbf {v}} \in \mathbb {R}^\ell\) that satisfies

$$\begin{aligned} \mathbf{0} \in {\mathcal {T}}_k({\mathbf {v}}) {:}{=}{\mathcal {T}}({\mathbf {v}}) + \mathbf \Sigma _k ({\mathbf {v}} - {\mathbf {v}}_k) . \end{aligned}$$
(6)

This is guaranteed to have a unique solution and to satisfy certain useful regularity properties; see Sect. 3 below. As a result, we can construct a fast inner solver for these sub-problems based on the semismooth Newton method.

2.3 Early termination

The proximal point algorithm tolerates errors, namely the inexact evaluation of \({\mathcal {P}}_k\) [56]. Criterion \((A_r)\) in [38] provides conditions for the design of convergent inexact proximal point algorithms [38, Thm 2.1]. Let \({\mathbf {v}}_k^\star {:}{=}{\mathcal {P}}_k({\mathbf {v}}_k)\) denote the unique proximal sub-problem solution and \({\mathbf {v}}_{k+1} \approx {\mathbf {v}}_k^\star\) the actual recurrence update. The aforementioned criterion requires

$$\begin{aligned} \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k^\star \Vert \le e_k \min \left( 1, \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k \Vert ^r \right) , \end{aligned}$$
(7)

where \(r \ge 0\) and \(\{ e_k \}\) is a summable sequence of nonnegative inner tolerances, i.e., \(e_k \ge 0\) for all k and \(\sum _{k=0}^\infty e_k < +\infty\). However, since \({\mathbf {v}}_k^\star\) is effectively unknown, this criterion is impractical. Instead, in Algorithm 1 it is required that \({\mathbf {v}}_{k+1}\) satisfy \(\Vert \mathbf{r}_k({\mathbf {v}}_{k+1}) \Vert _\infty \le \varepsilon _k\). Here, \(\mathbf{r}_k\) denotes the residual for the k-th sub-problem, and is defined in (14). In Sect. 4.1 we will show that this criterion is a simple and viable substitute, which retains the significance of \((A_r)\).

2.4 Warm starting

If a solution \({\mathbf {v}}^\star\) exists, the (outer) sequence \(\{{\mathbf {v}}_k \}\) generated by (5) converges, by the global convergence of the proximal point algorithm [56]. Then, \({\mathcal {P}}_k({\mathbf {v}}_k)\) and \({\mathbf {v}}_k\) are arbitrarily close to each other for sufficiently large k [37, Sect. 4]. This supports the idea of warm starting the inner solver with the current outer estimate \({\mathbf {v}}_k\), that is, setting \({\mathbf {v}} \leftarrow {\mathbf {v}}_k\) in Algorithm 2. In practice, for large k, only one or few Newton-type inner iterations are needed to find an approximate sub-problem solution \({\mathbf {v}}_{k+1}\).

2.5 Primal and dual infeasibility

Infeasibility detection in convex programming has been studied in [4, 5]. Certifying primal infeasibility of (1) amounts to finding a vector \({\mathbf {y}} \in \mathbb {R}^m\) such that

$$\begin{aligned} \mathbf{A}^\top {\mathbf {y}}= \mathbf{0} ,\quad \sum _{\begin{array}{c} i \in [1;m] \\ \mathbf{u}^i \in \mathbb {R} \end{array}} \mathbf{u}^i \max ({\mathbf {y}}^i,0) + \sum _{\begin{array}{c} i \in [1;m] \\ \mathbf{l}^i \in \mathbb {R} \end{array}} \mathbf{l}^i \min ({\mathbf {y}}^i,0) < 0 . \end{aligned}$$
(8)

Similarly, it can be shown that a vector \({\mathbf {x}} \in \mathbb {R}^n\) satisfying

$$\begin{aligned} \mathbf{Q} {\mathbf {x}} = \mathbf{0} ,\quad \mathbf{q}^\top {\mathbf {x}} < 0 ,\quad (\mathbf{A} {\mathbf {x}})^i {\left\{ \begin{array}{ll} = 0 &{} \mathbf{l}^i, \mathbf{u}^i \in \mathbb {R}, \\ \ge 0 &{} \mathbf{l}^i \in \mathbb {R}, \mathbf{u}^i = +\infty , \\ \le 0 &{} \mathbf{l}^i = -\infty , \mathbf{u}^i \in \mathbb {R}, \end{array}\right. } \quad i \in [1;m] , \end{aligned}$$
(9)

is a certificate of dual infeasibility for (1) [4, Prop. 3.1].

3 Inner loop: semismooth Newton method

In this section we focus on solving (6) via a semismooth Newton method. For the sake of clarity, and without loss of generality, we consider

$$\begin{aligned} \mathbf \Sigma _k {:}{=}blockdiag(\sigma _k \mathbf{I}_n, \mu _k \mathbf{I}_m) , \end{aligned}$$

for some parameters \(\sigma _k, \mu _k \in \mathbb {R}_{++}\).

3.1 Merit function

We now derive the simple yet fundamental result that is the key to developing our method. This provides the NCP reformulation of the proximal sub-problem with a suitable merit function. The former yields symmetric active-set linear systems, while the latter leads to exact linesearch.

Let us express (6) in the form

$$\begin{aligned} \mathbf{0} \in \begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}} + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) \\ - \mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}} - {\mathbf {y}}_k) + \partial g^*({\mathbf {y}}) \end{pmatrix} . \end{aligned}$$
(10)

Similarly to (4), for any given \(\alpha > 0\), this can be rewritten as

$$\begin{aligned} \mathbf{0} = \begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}}+ \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) \\ \mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}}_k - {\mathbf {y}}) - \Pi _{{\mathcal {C}}}( \mathbf{w}_k ) \end{pmatrix} , \end{aligned}$$
(11)

where we denote

$$\begin{aligned} \mathbf{w}_k {:}{=}\mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}}_k - {\mathbf {y}}) + \alpha {\mathbf {y}} . \end{aligned}$$

The second condition in (11) can be expressed as \(\mathbf{0} = \mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k ) - \alpha {\mathbf {y}}\). Then, we substitute \({\mathbf {y}}\) with \([\mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k ) ] / \alpha\) in the first condition in (11), and multiply the second one by \((\alpha - \mu _k)/\alpha\). Hence, for any positive \(\alpha \ne \mu _k\), (11) is equivalent to

$$\begin{aligned} \mathbf{0} = \begin{pmatrix} \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \frac{1}{\alpha } \mathbf{A}^\top [\mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k)] + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) \\ \frac{\alpha - \mu _k}{\alpha } [\mathbf{w}_k - \Pi _{{\mathcal {C}}}(\mathbf{w}_k )] + (\mu _k - \alpha ) {\mathbf {y}} \end{pmatrix} , \end{aligned}$$
(12)

namely their unique solutions coincide. Now, we observe that the right-hand side of (12) is the gradient of the function

$$\begin{aligned} f({\mathbf {x}}) + \frac{1}{2 \alpha } dist_{{\mathcal {C}}}^2(\mathbf{w}_k ) + \frac{\sigma _k}{2} \Vert {\mathbf {x}} - {\mathbf {x}}_k\Vert ^2 + \frac{\mu _k - \alpha }{2} \Vert {\mathbf {y}}\Vert ^2 . \end{aligned}$$
(13)

By construction, this is a continuously differentiable function whose gradient vanishes at the unique solution of the proximal sub-problem. Furthermore, for any \(\alpha \in (0,\mu _k)\), it is strictly convex and hence admits a unique minimizer that must coincide with the unique proximal point. Therefore, (13) is a suitable merit function for the sub-problem. The particular choice \(\alpha {:}{=}\mu _k / 2\) inherits all these properties and leads to the inner optimality conditions

$$\begin{aligned} \mathbf{0} = \mathbf{r}_k( {\mathbf v} ) {:}{=}\begin{pmatrix} \mathbf{Q} {\mathbf x} + \mathbf{q} + \mathbf{A}^\top {\mathbf y} + \sigma _k ({\mathbf x} - {\mathbf x}_k) \\ \mathbf{A} {\mathbf x} + \mu _k ({\mathbf y}_k - {\mathbf y}) - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf x} + \mu _k ({\mathbf y}_k - {\mathbf y}/2)) \end{pmatrix} , \end{aligned}$$
(14)

with \(\mathbf{r}_k : \mathbb {R}^\ell \rightarrow \mathbb {R}^\ell\) the inner residual, and the associated merit function

$$\begin{aligned} {\mathcal {M}}_k(\mathbf{v}) {:}{=}f({\mathbf {x}}) + \frac{1}{\mu _k} dist_{{\mathcal {C}}}^2( \mathbf{A} {\mathbf {x}} + \mu _k ({\mathbf {y}}_k - {\mathbf {y}}/2) ) + \frac{\sigma _k}{2} \Vert {\mathbf {x}} - {\mathbf {x}}_k\Vert ^2 + \frac{\mu _k}{4} \Vert {\mathbf {y}}\Vert ^2 . \end{aligned}$$
(15)

In fact, \({\mathcal {M}}_k : \mathbb {R}^\ell \rightarrow \mathbb {R}\) is the primal-dual proximal augmented Lagrangian function [17, 27, 54]; see Appendix A for a detailed derivation. This underlines once again the strong relationship between the proximal point algorithm and the augmented Lagrangian framework, pioneered in [55]. On the one hand, by (15), the dual regularization parameter \(\mu _k\) controls the constraint penalization [23, Sect. 3.2]. On the other hand, this relationship provides an “interpretation of the augmented Lagrangian method as an adaptive constraint regularization process” [3, Sect. 2].

The inner residual \(\mathbf{r}_k\) in (14) is piecewise affine, hence strongly semismooth on \(\mathbb {R}^\ell\) [36, 52]. In fact, with \(\mu _k\) bounded away from zero, defining the bounded and nonsingular matrix \(\mathbf{T}_k\) by

$$\begin{aligned} \mathbf{T}_k {:}{=}\begin{bmatrix} \mathbf{I} &{} \frac{2}{\mu _k} \mathbf{A}^\top \\ \mathbf{0} &{} -\mathbf{I} \end{bmatrix} , \end{aligned}$$
(16)

we have the identity \(\nabla {\mathcal {M}}_k(\cdot ) = \mathbf{T}_k \mathbf{r}_k(\cdot )\). Effectively, \(\Vert \mathbf{r}_k(\cdot )\Vert\) can be employed as stopping criterion in place of \(\Vert \nabla {\mathcal {M}}_k (\cdot )\Vert\). We prefer the former, since \(\mathbf{r}_k\) corresponds to a perturbation of the outer residual \(\mathbf{r}\); cf. (4).
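To make the objects of this subsection concrete, here is a small NumPy sketch (ours, dense data, illustrative names) of the inner residual (14) and the merit function (15); the identity \(\nabla {\mathcal {M}}_k(\cdot ) = \mathbf{T}_k \mathbf{r}_k(\cdot )\) can then be checked numerically, e.g., by finite differences.

```python
import numpy as np

def inner_residual(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Inner residual r_k(v) from (14)."""
    w = A @ x + mu * (yk - 0.5 * y)
    r_dual = Q @ x + q + A.T @ y + sigma * (x - xk)
    r_prim = A @ x + mu * (yk - y) - np.clip(w, l, u)
    return np.concatenate([r_dual, r_prim])

def merit(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Primal-dual proximal augmented Lagrangian M_k(v) from (15)."""
    w = A @ x + mu * (yk - 0.5 * y)
    d = w - np.clip(w, l, u)                      # w - Pi_C(w), so d @ d = dist_C^2(w)
    f = 0.5 * x @ (Q @ x) + q @ x
    return (f + d @ d / mu
            + 0.5 * sigma * np.dot(x - xk, x - xk)
            + 0.25 * mu * np.dot(y, y))

# Identity (16): grad M_k(v) = T_k r_k(v), with T_k = [[I, (2/mu) A^T], [0, -I]].
```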

The availability of a suitable merit function allows us to adopt a damped Newton-type method and design a linesearch globalization strategy, in contrast with [25, 37, 50]. Since \({\mathcal {M}}_k\) is continuously differentiable and piecewise quadratic, an exact linesearch procedure can be carried out, which yields finite convergence [61].

Finally, we highlight that the method asymptotically reduces to a sequence of regularized semismooth Newton steps applied to the original, unperturbed optimality system, in the vein of [2]. This closely relates to the concept of exact regularization [24]. In fact, the proximal primal-dual regularization is exact; see Theorem 1 and compare with [3, Thm 1].

Proposition 1

Let \(k \in \mathbb {N}\) be arbitrary.

  1. (i)

    Suppose \({\mathbf {v}}_k^\star\) solves (14) for \({\mathbf {v}}_k {:}{=}{\mathbf {v}}_k^\star\) and for some \(\sigma _k \ge 0\) and \(\mu _k > 0\). Then, \({\mathbf {v}}_k^\star\) solves (4).

  2. (ii)

    Alternatively, suppose \({\mathbf {v}}_k^\star\) solves (14) for \({\mathbf {y}}_k {:}{=}{\mathbf {y}}_k^\star\), \(\sigma _k {:}{=}0\), and for some \(\mu _k > 0\). Then, \({\mathbf {v}}_k^\star\) solves (4).

  3. (iii)

    Conversely, suppose \({\mathbf {v}}^\star\) solves (4). Then, \({\mathbf {v}}^\star\) solves (14) for \({\mathbf {v}}_k {:}{=}{\mathbf {v}}^\star\) and for any \(\sigma _k \ge 0\) and \(\mu _k > 0\).

Proof

The proof is immediate by direct comparison of (4) and (14). \(\square\)

Sub-problem (14) is equivalent to the unconstrained minimization of the primal-dual augmented Lagrangian function \({\mathcal {M}}_k\), given in (15). However, by introducing the auxiliary variable \(\mathbf{s} \in \mathbb {R}^m\), we can rewrite sub-problem (14) as the equivalent yet smoother problem

$$\begin{aligned} \min _{{\mathbf {x}},\mathbf{s}}\quad&\frac{1}{2} {\mathbf {x}}^\top \mathbf{Q} {\mathbf {x}} + \mathbf{q}^\top {\mathbf {x}} + \frac{\sigma _k}{2} \Vert {\mathbf {x}} - {\mathbf {x}}_k \Vert ^2 + \frac{1}{2 \mu _k} \Vert \mathbf{s} - \mu _k \mathbf{y}_k \Vert ^2 \nonumber \\ s.t.\quad&\mathbf{l} \le \mathbf{A} {\mathbf {x}} + \mathbf{s} \le \mathbf{u} , \end{aligned}$$
(17)

which is a primal-dual proximal regularization of (1). Indeed, it is always feasible and strictly convex, and its constraints satisfy the linear independence constraint qualification (LICQ). This shows that each outer iteration is associated with a regularized QP, which can be effectively solved by Newton-type methods.
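To make the equivalence explicit, a short derivation (standard, not spelled out above): introducing a multiplier \({\mathbf {y}} \in \mathbb {R}^m\) for the constraint of (17) and adopting the same convention as in (3), the optimality conditions of (17) read

$$\begin{aligned} \mathbf{0}&= \mathbf{Q} {\mathbf {x}} + \mathbf{q} + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) + \mathbf{A}^\top {\mathbf {y}} , \\ \mathbf{0}&= \frac{1}{\mu _k} (\mathbf{s} - \mu _k {\mathbf {y}}_k) + {\mathbf {y}} , \\ \mathbf{A} {\mathbf {x}} + \mathbf{s}&\in \partial g^*({\mathbf {y}}) , \end{aligned}$$

and eliminating \(\mathbf{s} = \mu _k ({\mathbf {y}}_k - {\mathbf {y}})\) from the second condition recovers (10).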

3.2 Search direction

A semismooth Newton direction \(\delta {\mathbf {v}} = (\delta {\mathbf {x}},\delta {\mathbf {y}})\) at \(\mathbf{v} = ({\mathbf {x}},{\mathbf {y}})\) solves

$$\begin{aligned} \mathbf{V}_k({\mathbf {v}}) \delta {\mathbf {v}} = - \mathbf{r}_k({\mathbf {v}}) , \end{aligned}$$
(18)

where

$$\begin{aligned} \mathbf{V}_k({\mathbf {v}}) = \begin{bmatrix} \mathbf{Q} + \sigma _k \mathbf{I} &{} \mathbf{A}^\top \\ (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})) \mathbf{A} &{} -\mu _k (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})/2) \end{bmatrix} \end{aligned}$$
(19)

is an element of the generalized Jacobian [57, Sect. 23] of \(\mathbf{r}_k\) at v. In turn, the diagonal matrix \(\mathbf{P}_k({\mathbf {v}})\) with entries

$$\begin{aligned} \mathbf{P}_k^{ii}({\mathbf {v}}) {:}{=}{\left\{ \begin{array}{ll} 1 &{} \text {if}~\mathbf{l}^i< \mathbf{w}_k^i < \mathbf{u}^i \\ 0 &{} \text {otherwise} \end{array}\right. }, ~i=1,\dots ,m , \end{aligned}$$
(20)

is an element of the generalized Jacobian of \(\Pi _{{\mathcal {C}}}\) at \(\mathbf{w}_k\). Owing to (20), (18) can be rewritten in a symmetric form, similar to the systems arising in active-set methods [35]. To this end, we notice that, if \(\mathbf{P}_k^{ii}({\mathbf {v}}) = 1\), the corresponding inner residual in (14) simplifies to \(\mathbf{r}_k^{n+i}({\mathbf {v}}) = - \mu _k {\mathbf {y}}^i / 2\), and the corresponding equation in (18) gives \(\delta {\mathbf {y}}^i = - {\mathbf {y}}^i\). This yields the crucial observation that, by (20), \(\mathbf{P}_k({\mathbf {v}}) \delta {\mathbf {y}} = - \mathbf{P}_k({\mathbf {v}}) {\mathbf {y}}\) for all \({\mathbf {v}} \in \mathbb {R}^\ell\). Then, an equivalent yet symmetric linear system is obtained, whose solution is the search direction \(\delta {\mathbf {v}}\) at v:

$$\begin{aligned} \begin{bmatrix} \mathbf{Q} + \sigma _k \mathbf{I} &{} \mathbf{A}^\top (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})) \\ (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})) \mathbf{A} &{} - \mu _k (\mathbf{I} - \mathbf{P}_k({\mathbf {v}})/2) \end{bmatrix} \begin{pmatrix} \delta {\mathbf {x}} \\ \delta {\mathbf {y}} \end{pmatrix} = \begin{pmatrix} \mathbf{A}^\top \mathbf{P}_k({\mathbf {v}}) {\mathbf {y}} \\ \mathbf{0} \end{pmatrix} - \mathbf{r}_k({\mathbf {v}}) . \end{aligned}$$
(21)

The active-set structure introduced by \(\mathbf{P}_k\) allows us to obtain a symmetric linear system and adopt multi-rank factorization updates [15, 26] while maintaining the structure and sparsity of the coefficient matrix [13, 59]. The linear system in (21) always admits a unique solution, since the coefficient matrix is symmetric quasi-definite [63], regardless of the problem data.
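For illustration, a dense NumPy sketch (ours; the actual implementation relies on sparse factorizations, cf. Sect. 6.1) that assembles and solves the symmetric system (21) for the search direction:

```python
import numpy as np

def newton_direction(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Assemble and solve the symmetric linear system (21) (dense sketch)."""
    n, m = Q.shape[0], A.shape[0]
    w = A @ x + mu * (yk - 0.5 * y)
    p = ((l < w) & (w < u)).astype(float)            # diagonal of P_k(v), cf. (20)
    IP = np.diag(1.0 - p)                            # I - P
    K = np.block([
        [Q + sigma * np.eye(n),  A.T @ IP],
        [IP @ A,                -mu * np.diag(1.0 - 0.5 * p)],
    ])
    r_dual = Q @ x + q + A.T @ y + sigma * (x - xk)  # inner residual (14)
    r_prim = A @ x + mu * (yk - y) - np.clip(w, l, u)
    rhs = np.concatenate([A.T @ (p * y) - r_dual, -r_prim])
    dv = np.linalg.solve(K, rhs)
    return dv[:n], dv[n:]
```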

3.3 Exact linesearch

Given a primal-dual pair v and a search direction \(\delta {\mathbf {v}}\), we seek a stepsize \(\tau > 0\) to effectively update v to \({\mathbf {v}} + \tau \, \delta {\mathbf {v}}\) in Algorithm 2. Similarly to \({\mathcal {M}}_k\), the function \(\psi _k : \tau \mapsto {\mathcal {M}}_k({\mathbf {v}} + \tau \delta {\mathbf {v}})\) is continuously differentiable, piecewise quadratic, and strictly convex. Thus, the optimal stepsize \(\tau {:}{=}arg\,min_{t \in \mathbb {R}} \psi _k(t)\) is found as the unique zero of \(\psi _k^\prime\), i.e., \(\psi _k^\prime (\tau ) = 0\). Since \(\psi _k^\prime\) is a piecewise linear, strictly monotone increasing function, the exact linesearch procedure amounts to solving a piecewise linear equation of the form

$$\begin{aligned} 0 = \alpha _k \tau + \beta _k + \frac{2}{\mu _k} \delta \mathbf{w}_k^\top \left[\mathbf{w}_k + \tau \delta \mathbf{w}_k - \Pi _{{\mathcal {C}}}\left(\mathbf{w}_k + \tau \delta \mathbf{w}_k \right) \right] \end{aligned}$$
(22)

with respect to \(\tau \in \mathbb {R}\). Here, the coefficients are given by

$$\begin{aligned} \alpha _k\,{:}{=}\,\delta {\mathbf {x}}^\top (\mathbf{Q} + \sigma _k \mathbf{I}) \delta {\mathbf {x}} + \mu _k \delta {\mathbf {y}}^\top \delta {\mathbf {y}} /2 , \end{aligned}$$
(23a)
$$\begin{aligned} \beta _k\,{:}{=}\,\delta {\mathbf {x}}^\top [\mathbf{Q} {\mathbf {x}} + \mathbf{q} + \sigma _k ({\mathbf {x}} - {\mathbf {x}}_k) ] + \mu _k \delta {\mathbf {y}}^\top {\mathbf {y}} /2 , \end{aligned}$$
(23b)
$$\begin{aligned} \mathbf{w}_k\,{:}{=}\,\mathbf{A} {\mathbf {x}} + \mu _k \left( {\mathbf {y}}_k - {\mathbf {y}}/2 \right) , \end{aligned}$$
(23c)
$$\begin{aligned} \delta \mathbf{w}_k\,{:}{=}\,\mathbf{A} \delta {\mathbf {x}} - \mu _k \delta {\mathbf {y}}/2 , \end{aligned}$$
(23d)

whose derivation is reported in Appendix B. Thanks to its peculiar structure, (22) can be solved efficiently and exactly (up to numerical precision), e.g., by sorting and linear interpolation, cf. [33, Alg. 2].
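A possible realization of this procedure, sketched below in NumPy (our own sketch, not [33, Alg. 2] verbatim): collect the positive breakpoints at which a component of \(\mathbf{w}_k + \tau \delta \mathbf{w}_k\) crosses a bound, scan them in increasing order, and interpolate linearly on the piece where \(\psi _k^\prime\) changes sign.

```python
import numpy as np

def exact_linesearch(dx, dy, x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Exact stepsize: unique root of the piecewise linear equation (22)-(23)."""
    alpha = dx @ (Q @ dx) + sigma * dx @ dx + 0.5 * mu * dy @ dy       # (23a)
    beta = dx @ (Q @ x + q + sigma * (x - xk)) + 0.5 * mu * dy @ y     # (23b)
    w = A @ x + mu * (yk - 0.5 * y)                                    # (23c)
    dw = A @ dx - 0.5 * mu * dy                                        # (23d)

    def dpsi(t):                           # psi_k'(t): piecewise linear, increasing
        wt = w + t * dw
        return alpha * t + beta + (2.0 / mu) * dw @ (wt - np.clip(wt, l, u))

    # breakpoints: stepsizes at which a component of w + t*dw hits l or u
    with np.errstate(divide="ignore", invalid="ignore"):
        cand = np.concatenate([(l - w) / dw, (u - w) / dw])
    ts = np.unique(cand[np.isfinite(cand) & (cand > 0.0)])

    lo, f_lo = 0.0, dpsi(0.0)              # descent direction, hence f_lo < 0
    for t in ts:
        f_t = dpsi(t)
        if f_t >= 0.0:                     # sign change on [lo, t]: interpolate exactly
            return lo - f_lo * (t - lo) / (f_t - f_lo)
        lo, f_lo = t, f_t
    slope = dpsi(lo + 1.0) - f_lo          # beyond the last breakpoint psi' is affine
    return lo - f_lo / slope
```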

We underline that the stepsize \(\tau\) is unique and strictly positive, since \({\mathcal {M}}_k\) is strictly convex and \(\delta {\mathbf {v}}\) is a descent direction for \({\mathcal {M}}_k\) at v. This follows from the observation that

$$\begin{aligned} \psi _k^\prime (0) = \delta {\mathbf {v}}^\top \nabla {\mathcal {M}}_k({\mathbf {v}}) = \delta {\mathbf {v}}^\top \mathbf{T}_k \mathbf{r}_k({\mathbf {v}}) = - \delta {\mathbf {v}}^\top \mathbf{T}_k \mathbf{V}_k({\mathbf {v}}) \delta {\mathbf {v}} < 0 , \end{aligned}$$

since \(\partial ^2 {\mathcal {M}}_k({\mathbf {v}}) \ni \mathbf{T}_k \mathbf{V}_k(\mathbf{v}) \succ 0\).

4 Algorithm and convergence

Our Quadratic Primal-Dual Optimizer (QPDO), which weaves together the proximal point algorithm and a semismooth Newton method, is outlined in Algorithms 1 and 2. We highlight the nested structure for clarity of presentation. Effectively, Algorithm 1 corresponds to the proximal point algorithm, as discussed in Sect. 2. The proximal operator, \({\mathcal {P}}_k\), is evaluated in Algorithm 2 by solving a sub-problem via the semismooth Newton method, as detailed in Sect. 3. We denote \(\mathbf{r}\) and \(\mathbf{r}_k\) the outer and inner residuals defined in (4) and (14), respectively, and \(\mathbf{v}\) a primal-dual pair \(({\mathbf {x}},{\mathbf {y}})\). Infeasibility detection, parameter updates, and linear solvers are detailed in Sect. 6.

Algorithm 1 QPDO: Quadratic Primal-Dual Optimizer

 

input: \(\mathbf{Q}\), \(\mathbf{q}\), \(\mathbf{A}\), \(\mathbf{l}\), \(\mathbf{u}\)

 

parameters: \(\epsilon > 0\), \(\epsilon _0 \ge 0\), \(\kappa _\epsilon \in [0,1)\), \(0 < \sigma _{\text {min}} \le \sigma _0\), \(0 < \mu _{\text {min}} \le \mu _0\)

 

guess: \(\mathbf{x}_0 \in \mathbb {R}^n\), \(\mathbf{y}_0 \in \mathbb {R}^m\)

 

for \(k=0,1,2,\dots \) do

 

      if \(\Vert \mathbf{r}( \mathbf{v}_k ) \Vert _\infty \le \epsilon \) then

 

         return \(\mathbf{v}_k\)

 

      end if

 

      find \(\mathbf{v}_{k+1}\) such that \(\Vert \mathbf{r}_k( \mathbf{v}_{k+1} ) \Vert _\infty \le \epsilon _k\) by invoking Algorithm 2

 

      check for primal-dual infeasibility with \(\varDelta \mathbf{v}_k {:}{=}\mathbf{v}_{k+1} - \mathbf{v}_k\)

 

      choose parameters \(\sigma _{k+1} \in [\sigma _{\text {min}},\sigma _k]\) and \(\mu _{k+1} \in [\mu _{\text {min}},\mu _k]\)

 

      set \(\epsilon _{k+1} \leftarrow \kappa _\epsilon \epsilon _k\)

 

end for

 

Algorithm 2 Inner loop: semismooth Newton method

 

\(\mathbf{v} \leftarrow \mathbf{v}_k\)

 

repeat

 

      get the search direction \(\delta \mathbf{v} \in \mathbb {R}^\ell \) by solving the linear system (21)

 

      get the stepsize \(\tau \in \mathbb {R}_{++}\) by solving the piecewise linear equation (22)

 

      set \(\mathbf{v} \leftarrow \mathbf{v} + \tau \, \delta \mathbf{v}\)

 

until \(\Vert \mathbf{r}_k( \mathbf{v} ) \Vert _\infty \le \epsilon _k\)

 

\(\mathbf{v}_{k+1} \leftarrow \mathbf{v}\)

 

4.1 Convergence analysis

This section discusses the convergence of QPDO as outlined in Algorithms 1 and 2. We show that, when problem (1) is solvable, the proposed algorithm generates a sequence of iterates \( \{\mathbf {v}_k \}\) that satisfies the optimality conditions (4) in the limit, and that it provides a certificate of primal and/or dual infeasibility otherwise. Our analysis relies on well-established results for Newton and proximal point methods; in particular, we refer to [38, 56, 61].

First, we focus on the inner loop, described in Algorithm 2 and detailed in Sect. 3.

Lemma 1

Consider an arbitrary but fixed outer iteration, indexed by \(k \in \mathbb {N}\), and suppose \(\varepsilon _k \ge 0\). Then, the procedure in Algorithm 2 is well defined and terminates after finitely many steps.

Proof

The search direction \(\delta {\mathbf {v}}\) exists and is unique, since linear system (21) is always solvable. Similarly, there exists a unique, positive optimal stepsize \(\tau\) which solves (22). Thus, all steps of Algorithm 2 are well-defined. Since \({\mathcal {M}}_k\) is continuously differentiable, strictly convex, and piecewise quadratic, the semismooth Newton method with exact linesearch exhibits finite convergence [61, Thm 3]. Thus, \(\nabla {\mathcal {M}}_k({\mathbf {v}}) = \mathbf{0}\) after finitely many iterations. Then, by \(\nabla {\mathcal {M}}_k(\cdot ) = \mathbf{T}_k \mathbf{r}_k(\cdot )\) with \(\mathbf{T}_k\) nonsingular, it reaches \(\mathbf{r}_k({\mathbf {v}}) = \mathbf{0}\). Hence, for any \(\varepsilon _k \ge 0\), the inner stopping criterion \(\Vert \mathbf{r}_k({\mathbf {v}}) \Vert _\infty \le \varepsilon _k\) is eventually satisfied, and the inner loop terminates. \(\square\)

Notice that, with \(\varepsilon _k = 0\), Algorithm 2 returns the unique (proximal) point \({\mathbf {v}}_k^\star {:}{=}{\mathcal {P}}_k( {\mathbf {v}}_k )\).

Let us consider now the outer loop, sketched in Algorithm 1. Recall that, by construction, the regularization parameters are positive and non-increasing. The outer loop consists of inexact proximal point iterations [56], hence global and local convergence properties can be derived based on [38, Prop. 1.2]. The following result shows that criterion \((A_r)\) [38] holds.

Lemma 2

Let \({\mathcal {T}}^{-1}(\mathbf{0})\) be nonempty, any \({\mathbf {v}}_0 \in \mathbb {R}^\ell\) be given, and the sequence \(\{ {\mathbf {v}}_k \}\) be generated by Algorithm 1. Then, there exists a summable sequence \(\{ e_k \} \subseteq \mathbb {R}_+\) such that

$$\begin{aligned} \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k^\star \Vert \le e_k \quad \forall k \in \mathbb {N}. \end{aligned}$$

Proof

By \(\varepsilon _0 \in \mathbb {R}_+\) and \(\kappa _\varepsilon \in [0,1)\), the sequence \(\{\varepsilon _k\} \subseteq \mathbb {R}_+\) is summable, since \(\sum _{k \in \mathbb {N}} \varepsilon _k = \sum _{k \in \mathbb {N}} \kappa _\varepsilon ^k \varepsilon _0 = \varepsilon _0 / (1 - \kappa _\varepsilon ) < + \infty\). By the inner stopping condition, for all \(k \in \mathbb {N}\) it holds \(\Vert \mathbf{r}_k({\mathbf {v}}_{k+1}) \Vert \le \varepsilon _k\). Moreover, since \({\mathcal {M}}_k\) is \(\mathbf \Sigma _k\)-strongly convex, there exists \({\tilde{\eta }}_k > 0\) such that

$$\begin{aligned} {\tilde{\eta }}_k \Vert {\mathbf {v}} - {\mathbf {v}}_k^\star \Vert \le \Vert \nabla {\mathcal {M}}_k({\mathbf {v}}) - \nabla {\mathcal {M}}_k({\mathbf {v}}_k^\star ) \Vert = \Vert \nabla {\mathcal {M}}_k({\mathbf {v}}) \Vert = \Vert \mathbf{T}_k \, \mathbf{r}_k({\mathbf {v}}) \Vert \end{aligned}$$

for all \({\mathbf {v}} \in \mathbb {R}^\ell\). By the boundedness of \(\mathbf \Sigma _k\) away from zero, matrix \(\mathbf{T}_k\) is bounded and there exists a constant \(\eta > 0\) such that the bound \(\Vert {\mathbf {v}} - {\mathbf {v}}_k^\star \Vert \le \eta \Vert \mathbf{r}_k({\mathbf {v}}) \Vert\) holds for all \(k \in \mathbb {N}\) and \({\mathbf {v}} \in \mathbb {R}^\ell\). Thus, in particular, for all \(k \in \mathbb {N}\) it is

$$\begin{aligned} \Vert {\mathbf {v}}_{k+1} - {\mathbf {v}}_k^\star \Vert \le \eta \Vert \mathbf{r}_k({\mathbf {v}}_{k+1}) \Vert \le \eta \varepsilon _k . \end{aligned}$$

Let \(e_k {:}{=}\eta \varepsilon _k\), and the proof is complete. \(\square\)

Notice that we choose \(r = 0\) in \((A_r)\), particularly in (7), for the sake of simplicity, although this may prevent faster convergence; see [38, Thm 2.1]. Relying on the inexact proximal point algorithm, the following result states that Algorithm 1 converges to a solution, if one exists.

Theorem 1

Let \({\mathcal {T}}^{-1}(\mathbf{0})\) be nonempty, any \({\mathbf {v}}_0 \in \mathbb {R}^\ell\) be given, and the sequence \(\{ {\mathbf {v}}_k \}\) be generated by Algorithm 1. Then, the sequence \(\{ {\mathbf {v}}_k \}\) is well defined and converges to a solution \({\mathbf {v}}^\star \in {\mathcal {T}}^{-1}(\mathbf{0})\).

Proof

The error bound condition, namely criterion \((A_r)\), is enforced by construction; cf. Lemma 2. It remains to show that there exist \(a, \varepsilon > 0\) such that for all \(\mathbf{u} \in \mathbb {R}^\ell\), \(\Vert \mathbf{u} \Vert \le \varepsilon\), it holds \(dist_{{\mathcal {T}}^{-1}(\mathbf{0})}({\mathbf {v}}) \le a \Vert \mathbf{u} \Vert\) for all \({\mathbf {v}} \in {\mathcal {T}}^{-1}(\mathbf{u})\). Since problem (3) is a polyhedral variational inequality, this property holds globally [19, Sect. 3D]. Hence, we can invoke [38, Prop. 1.2] to conclude that \(\Vert {\mathbf {v}}_k - {\mathbf {v}}^\star \Vert \rightarrow 0\). \(\square\)

Finally, Theorem 2 guarantees that Algorithm 1 terminates if the original problem (1) does not admit any solution. This allows our method to detect infeasibility and to return a certificate.

Theorem 2

Suppose problem (1) is primal and/or dual infeasible, i.e., \({\mathcal {T}}^{-1}(\mathbf{0})\) is empty. Let any \({\mathbf {v}}_0 \in \mathbb {R}^\ell\) be given, the sequence \(\{ {\mathbf {v}}_k \}\) be generated by Algorithm 1, and define \(\varDelta {\mathbf {v}}_k {:}{=}{\mathbf {v}}_{k+1} - {\mathbf {v}}_k\). Then, the sequence \(\{ \varDelta {\mathbf {v}}_k \}\) admits a limit \(\varDelta {\mathbf {v}}\), i.e., \(\varDelta {\mathbf {v}}_k \rightarrow \varDelta {\mathbf {v}}\). Moreover,

  1. (i)

    if \(\varDelta {\mathbf {y}} \ne \mathbf{0}\), then problem (1) is primal infeasible and \(\varDelta {\mathbf {y}}\) satisfies the primal infeasibility condition (8);

  2. (ii)

    if \(\varDelta {\mathbf x} \ne \mathbf{0}\), then problem (1) is dual infeasible and \(\varDelta {\mathbf {x}}\) satisfies the dual infeasibility condition (9).

Proof

Lemma 5.1 in [4] ensures that \(\varDelta {\mathbf {v}}_k \rightarrow \varDelta \mathbf{v}\), since Algorithm 1 is an instance of the proximal point algorithm. If \({\mathcal {T}}^{-1}(\mathbf{0}) = \emptyset\), then \(\varDelta {\mathbf {v}} \ne \mathbf{0}\), and this gives certificates of primal and/or dual infeasibility according to [4, Thm 5.1]. \(\square\)

5 Relationship with similar methods

Our approach is inspired by and shares many features with other recently developed methods. This section elaborates upon their relationship with QPDO.

FBstab [37] “synergistically combines the proximal point algorithm with a primal-dual semismooth Newton-type method” to solve convex QPs. By adopting the Fischer-Burmeister [11, 21] NCP function, FBstab does not depend on an estimate of the active set, which may result in a more regular behavior than QPDO. In contrast, adopting the minimum NCP function, QPDO can exploit factorization updates, perform an exact linesearch by solving a piecewise linear equation, and handle both sides of bilateral constraints simultaneously.

QPALM is a “proximal augmented Lagrangian based solver for convex QPs” [33]; recent advancements allow it to handle nonconvex QPs as well [34]. Given a primal-dual estimate \(\overline{\mathbf {v}}\), the exact, unique resolvent update \({\mathbf {v}}^\triangle\) of QPALM [33, Eq. 6], with \(\mathbf \Sigma = blockdiag(\sigma ^{-1} \mathbf{I},\mu ^{-1} \mathbf{I})\), is given by

$$\begin{aligned} {\mathbf x}^\triangle= \textstyle arg\,min_{{\mathbf x} \in \mathbb {R}^n} \; \varphi ( {\mathbf x} ) , \end{aligned}$$
(24a)
$$\begin{aligned} {\mathbf y}^\triangle= \overline{{\mathbf y}} + \mu ^{-1} \left[ \mathbf{A} {\mathbf x}^\triangle - \Pi _{{\mathcal {C}}}\left( \mathbf{A} {\mathbf x}^\triangle + \mu \overline{{\mathbf y}} \right) \right] . \end{aligned}$$
(24b)

In (24a), \(\varphi\) is given by [33, Eq. 8]

$$\begin{aligned} \varphi ({\mathbf {x}}) {:}{=}f({\mathbf {x}}) + \frac{1}{2 \mu } dist_{{\mathcal {C}}}^2\left( \mathbf{A} {\mathbf {x}} + \mu \overline{{\mathbf {y}}} \right) + \frac{\sigma }{2} \Vert {\mathbf {x}} - \overline{\mathbf {x}}\Vert ^2 \end{aligned}$$

and closely resembles \({\mathcal {M}}_k\) in (15). Since (24a) yields \(\nabla \varphi ({\mathbf {x}}^\triangle ) = \mathbf{0}\), combining with (24b) and rearranging give

$$\begin{aligned} \mathbf{0}&= \mathbf{Q} {\mathbf {x}}^\triangle + \mathbf{q} + \mathbf{A}^\top {\mathbf {y}}^\triangle + \sigma \left( {\mathbf {x}}^\triangle - \overline{{\mathbf {x}}} \right) , \end{aligned}$$
(25a)
$$\begin{aligned} \mathbf{0}&= \mathbf{A} {\mathbf {x}}^\triangle + \mu \left( \overline{{\mathbf {y}}} - {\mathbf {y}}^\triangle \right) - \Pi _{{\mathcal {C}}}\left( \mathbf{A} {\mathbf {x}}^\triangle + \mu \overline{{\mathbf {y}}} \right) . \end{aligned}$$
(25b)

Conditions (25) and (14) differ only in the argument of \(\Pi _{{\mathcal {C}}}\): the term \(- \mu {\mathbf {y}}/2\) is missing in (25b). This underlines the primal-dual nature of QPDO. A comparative investigation into how QPDO copes with changes in the active set [35] and controls the quality of both primal and dual variables during the iterations [2, 28] is a topic for future work.

OSQP is a “solver for convex quadratic programs based on the alternating direction method of multipliers” [59]. Rearranging from [59, Alg. 1], with parameters \(\alpha = 1\), \(\rho = \mu ^{-1}\), and given primal-dual estimate \((\overline{{\mathbf {x}}},\overline{{\mathbf {y}}})\) and constraint estimate \(\overline{\mathbf{z}}\), the (unique) primal-auxiliary update \(({\mathbf {x}}^\lozenge ,\mathbf{s}^\lozenge )\) satisfies

$$\begin{aligned} \mathbf{0}&= \mathbf{Q} {\mathbf {x}}^\lozenge + \mathbf{q} + \mathbf{A}^\top \mathbf{s}^\lozenge + \sigma ( {\mathbf {x}}^\lozenge - \overline{{\mathbf {x}}} ) , \end{aligned}$$
(26a)
$$\begin{aligned} \mathbf{0}&= \mathbf{A} {\mathbf {x}}^\lozenge + \mu (\overline{{\mathbf {y}}} - \mathbf{s}^\lozenge ) - \overline{\mathbf{z}} . \end{aligned}$$
(26b)

Then, the constraint and dual updates are given by \(\mathbf{z}^\lozenge = \Pi _{{\mathcal {C}}}\left( \overline{\mathbf{z}} + \mu \mathbf{s}^\lozenge \right)\) and \(\mathbf{y}^\lozenge = \mathbf{s}^\lozenge + \mu ^{-1} \left( \overline{\mathbf{z}} - \mathbf{z}^\lozenge \right)\), respectively. Although conditions (26) resemble (14), an auxiliary variable \(\mathbf{s}\) replaces the dual variable \(\mathbf{y}\), and the projection in (14) is replaced by the constraint estimate \(\overline{\mathbf{z}}\). This makes sub-problem (26) a linear system and results in a first-order method.

6 Implementation details

QPDO has been implemented in C and provides a MATLAB interface. It can solve QPs of the form (1) and makes no assumptions about the problem data other than convexity; it is available online at

https://github.com/aldma/qpdo.

This section discusses some relevant aspects of the program, such as the linear solver, parameters update rules, infeasibility detection, and problem scaling.

6.1 Linear solver

The linear system (21) is solved with CHOLMOD [12], using a sparse Cholesky factorization. This linear solver is analogous to the one adopted in QPALM [33], for the sake of comparison. Let \((\mathbf{r}_k^\text {dual},\mathbf{r}_k^\text {prim})\) partition the inner residual \(\mathbf{r}_k\) in (14). Then, formally solving for \(\delta {\mathbf {y}}\) in (21), we obtain the expression (omitting subscripts and arguments)

$$\begin{aligned} \delta {\mathbf {y}}&= \mu ^{-1} (\mathbf{I} - \mathbf{P}/2)^{-1} \left[ (\mathbf{I} - \mathbf{P}) \mathbf{A} \delta {\mathbf {x}} + \mathbf{r}^{\text {prim}} \right] \\&= \mu ^{-1} (\mathbf{I} + \mathbf{P}) \left[ (\mathbf{I} - \mathbf{P}) \mathbf{A} \delta {\mathbf {x}} + \mathbf{r}^{\text {prim}} \right] \\&= \mu ^{-1} (\mathbf{I} - \mathbf{P}) \mathbf{A} \delta {\mathbf {x}} + \mu ^{-1} (\mathbf{I} + \mathbf{P}) \mathbf{r}^{\text {prim}} , \end{aligned}$$

where the second and third lines are due to the binary structure of \(\mathbf{P}\). Substituting \(\delta {\mathbf {y}}\) and rearranging, we obtain a linear system for \(\delta {\mathbf {x}}\):

$$\begin{aligned} \left[ \mathbf{Q} + \sigma \mathbf{I} + \mu ^{-1} \mathbf{A}^\top (\mathbf{I} - \mathbf{P}) \mathbf{A} \right] \delta {\mathbf {x}}&= \mathbf{A}^\top \mathbf{P} {\mathbf {y}} - \mu ^{-1} \mathbf{A}^\top (\mathbf{I} - \mathbf{P}) \mathbf{r}^{\text {prim}} - \mathbf{r}^\text {dual} , \end{aligned}$$

which has a symmetric, positive definite coefficient matrix and can be solved by CHOLMOD [12]. On the one hand, this approach allows multi-rank factorization updates [15], thus avoiding the need for a full re-factorization at every inner iteration. On the other hand, the sparsity pattern may be lost and significant fill-in may arise due to the matrix-matrix product \(\mathbf{A}^\top \mathbf{A}\). For this reason, the current implementation may benefit from directly solving (21) via sparse symmetric linear solvers, possibly with multi-rank factorization updates. To better exploit the data sparsity pattern and the capabilities of the proposed method, we plan to add other linear solvers in future versions.
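The reduction described above can be sketched as follows (dense NumPy, ours; CHOLMOD and the multi-rank factorization updates are replaced by a plain Cholesky factorization for illustration):

```python
import numpy as np

def reduced_newton_step(x, y, xk, yk, Q, q, A, l, u, sigma, mu):
    """Eliminate dy from (21) and solve the positive definite system for dx."""
    w = A @ x + mu * (yk - 0.5 * y)
    p = ((l < w) & (w < u)).astype(float)            # diagonal of P
    r_dual = Q @ x + q + A.T @ y + sigma * (x - xk)
    r_prim = A @ x + mu * (yk - y) - np.clip(w, l, u)
    AIP = (1.0 - p)[:, None] * A                     # (I - P) A
    H = Q + sigma * np.eye(Q.shape[0]) + AIP.T @ AIP / mu
    rhs = A.T @ (p * y) - AIP.T @ r_prim / mu - r_dual
    L = np.linalg.cholesky(H)                        # H is symmetric positive definite
    dx = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    dy = (AIP @ dx + (1.0 + p) * r_prim) / mu
    return dx, dy
```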

6.2 Parameters selection

Solving convex QPs via the proximal point algorithm imposes mild restrictions on the sequence of primal-dual regularization parameters \(\{ \mathbf \Sigma _k \}\). As mentioned in Sect. 2.2, there are no additional requirements other than being non-increasing and positive definite. However, similarly to forcing sequences in augmented Lagrangian methods [14], the sequence of regularization parameters greatly affects the behaviour of QPDO, and a careful tuning can positively impact the performance. For instance, although faster convergence rates can be expected if \(\mathbf \Sigma _k \rightarrow \mathbf{0}\) [38], numerical stability and machine precision should be taken into account. Following [34, Sect. 5.3] and [59, Sect. 5.2], our implementation considers only diagonal matrices of the form \(\mathbf \Sigma _k = blockdiag( \sigma _k \mathbf{I}, diag(\varvec{\mu }_k) )\), and we refer to the effect of \(\sigma _k\) and \(\varvec{\mu }_k\) as primal and dual regularization, respectively.

The dual regularization parameter \(\varvec{\mu }_k\) proves critical for the practical performance of the method since it strikes the balance between the number of inner and outer iterations, seeking easy-to-solve sub-problems, effective warm starting, and rapid constraints satisfaction. Therefore, we carefully initialize and update \(\varvec{\mu }_k\), guided by the interpretation as a constraint penalization offered by the augmented Lagrangian framework; cf. Sect. 3.1. In our implementation, we consider a vector \(\varvec{\mu }_k\) to gain a finer control over the constraint penalization [14]. Given a (primal) initial guess \({\mathbf {x}}_0 \in \mathbb {R}^n\), we initialize as in [8, Sect. 12.4]:

$$\begin{aligned} \mathbf{d}_0&{:}{=}\mathbf{A} {\mathbf {x}}_0 - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf {x}}_0) , \\ \varvec{\mu }_0^i&{:}{=}\Pi _{\left[ \mu _0^{\text {min}} , \mu _0^{\text {max}} \right] }\left( \kappa _\mu \frac{\max ( 1, (\mathbf{d}_0^i)^2/2 )}{\max ( 1, |f({\mathbf {x}}_0)| )} \right) , ~i \in [1;m] , \end{aligned}$$

where \(\mu _0^{\text {max}} \ge \mu _0^{\text {min}} > 0\) and \(\kappa _\mu \ge 0\). Then, following [34, Sect. 5.3], we monitor the primal residual \(\mathbf{r}_{\text {prim}}({\mathbf {v}}) {:}{=}\mathbf{A} {\mathbf {x}} - \Pi _{{\mathcal {C}}}(\mathbf{A} {\mathbf {x}}+{\mathbf {y}})\) from (4) and update the dual regularization parameter \(\varvec{\mu }_k\) accordingly. If \(| \mathbf{r}_{\text {prim}}^i({\mathbf {v}}_{k+1}) | > \max \left( \theta _\mu | \mathbf{r}_{\text {prim}}^i({\mathbf {v}}_k) | , \varepsilon _{\text {opt}} \right)\), we set

$$\begin{aligned} \varvec{\mu }_{k+1}^i = \Pi _{\left[ \mu _{\text {min}}, \varvec{\mu }_k^i \right] }\left( \delta _\mu \frac{\Vert \mathbf{r}_{\text {prim}}({\mathbf {v}}_{k+1}) \Vert _\infty }{| \mathbf{r}_{\text {prim}}^i({\mathbf {v}}_{k+1}) |} \varvec{\mu }_k^i \right) , \end{aligned}$$

where \(\theta _\mu \in (0,1)\), \(\mu _{\text {min}} > 0\), and \(\delta _\mu \ge 0\). Otherwise, we set \(\varvec{\mu }_{k+1}^i = \varvec{\mu }_k^i\). These rules adapt the constraint penalization based on the current residual, seeking a uniform, steady progression towards feasibility, while making sure the sequences \(\{\varvec{\mu }_k^i\}\), \(i \in [1;m]\), are non-increasing and bounded away from zero. In our implementation, the default values are \(\mu _0^{\text {min}} = 10^{-3}\), \(\mu _0^{\text {max}} = 10^3\), \(\kappa _\mu = 0.1\), \(\mu _{\text {min}} = 10^{-9}\), \(\delta _\mu = 10^{-2}\), and \(\theta _\mu = 0.25\).

The primal regularization turns out to be less critical than its dual counterpart. For this reason, it is associated with a scalar value and tuned independently of the residual. Starting from \(\sigma _0 > 0\), we apply

$$\begin{aligned} \sigma _{k+1} = \max ( \sigma _{\text {min}} , \kappa _\sigma \sigma _k ) , \end{aligned}$$

where \(\sigma _{\text {min}} > 0\) and \(\kappa _\sigma \in [0,1]\). In our implementation the default values are \(\sigma _0 = 10^{-3}\), \(\sigma _{\text {min}} = 10^{-7}\), and \(\kappa _\sigma = 0.1\).
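The rules above translate into a few lines of code; the following NumPy sketch (ours, with the stated default values as keyword arguments; \(f_0\) denotes the objective value \(f({\mathbf {x}}_0)\)) covers the initialization of \(\varvec{\mu }_0\) and the updates of \(\varvec{\mu }_k\) and \(\sigma _k\).

```python
import numpy as np

def init_mu(x0, f0, A, l, u, mu0_min=1e-3, mu0_max=1e3, kappa_mu=0.1):
    """Component-wise initialization of the dual regularization (cf. [8, Sect. 12.4])."""
    d0 = A @ x0 - np.clip(A @ x0, l, u)
    mu0 = kappa_mu * np.maximum(1.0, 0.5 * d0 ** 2) / max(1.0, abs(f0))
    return np.clip(mu0, mu0_min, mu0_max)

def update_mu(mu, r_prim_new, r_prim_old, theta_mu=0.25, delta_mu=1e-2,
              mu_min=1e-9, eps_opt=1e-6):
    """Shrink mu_k^i where the primal residual is not decreasing fast enough."""
    mu_new = mu.copy()
    slow = np.abs(r_prim_new) > np.maximum(theta_mu * np.abs(r_prim_old), eps_opt)
    target = delta_mu * np.linalg.norm(r_prim_new, np.inf) \
             / np.abs(r_prim_new[slow]) * mu[slow]
    mu_new[slow] = np.clip(target, mu_min, mu[slow])
    return mu_new

def update_sigma(sigma, sigma_min=1e-7, kappa_sigma=0.1):
    """Scalar primal regularization update."""
    return max(sigma_min, kappa_sigma * sigma)
```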

Early termination The inner tolerance \(\varepsilon _k\) also affects the performance of QPDO, since it balances sub-problem accuracy and early termination. In Algorithm 1, these aspects relate to the parameters \(\varepsilon _0\) and \(\kappa _\varepsilon\), which drive \(\{ \varepsilon _k \}\) to zero. However, finite precision should also be taken into account. In fact, although the semismooth Newton method converges in finitely many iterations, the solution provided is exact up to round-off errors and numerical precision. Therefore, we deviate from Algorithm 1 in this respect and employ the update rule

$$\begin{aligned} \varepsilon _{k+1} = \max ( \varepsilon _{\text {min}} , \kappa _\varepsilon \varepsilon _k ) , \end{aligned}$$

where \(0 \le \varepsilon _{\text {min}} \le \varepsilon _{\text {opt}}\). In our implementation, the default values are \(\varepsilon _0 = 1\), \(\kappa _\varepsilon = 0.1\), \(\varepsilon _{\text {min}} = 0.1 \varepsilon _{\text {opt}}\), and \(\varepsilon _{\text {opt}} = 10^{-6}\).

6.3 Infeasibility detection

A routine for detecting primal and dual infeasibility of (1) is included in Algorithm 1. This allows the algorithm to terminate with either a primal-dual solution or a certificate of primal or dual infeasibility, for some given tolerances. We adopt the mechanism developed in [4, Sect. 5.2], which holds whenever the proximal point algorithm is employed to solve the KKT conditions (3). Problem (1) is declared primal or dual infeasible based on the conditions given in Sect. 2.5 and the vectors \(\varDelta {\mathbf {x}}_k {:}{=}{\mathbf {x}}_{k+1} - {\mathbf {x}}_k\) and \(\varDelta {\mathbf {y}}_k {:}{=}{\mathbf {y}}_{k+1} - {\mathbf {y}}_k\), \(k \ge 0\). As in [34], we deem the problem primal infeasible if \(\varDelta {\mathbf {y}}_k \ne \mathbf{0}\) and the following two conditions hold

$$\begin{aligned} \Vert \mathbf{A}^\top \varDelta {\mathbf {y}}_k \Vert _\infty&\le \varepsilon _{\text {pinf}} \Vert \varDelta {\mathbf {y}}_k \Vert _\infty , \end{aligned}$$
(27a)
$$\begin{aligned} \mathbf{u}^\top \max (\varDelta {\mathbf {y}}_k, \mathbf{0}) + \mathbf{l}^\top \min (\varDelta {\mathbf {y}}_k, \mathbf{0})&\le - \varepsilon _{\text {pinf}} \Vert \varDelta {\mathbf {y}}_k \Vert _\infty , \end{aligned}$$
(27b)

where \(\varepsilon _{\text {pinf}} > 0\) is some tolerance level. The problem is considered dual infeasible if \(\varDelta {\mathbf {x}}_k \ne \mathbf{0}\) and the following conditions hold

$$\begin{aligned} \Vert \mathbf{Q} \varDelta {\mathbf {x}}_k \Vert _\infty&\le \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty , \end{aligned}$$
(28a)
$$\begin{aligned} \mathbf{q}^\top \varDelta {\mathbf {x}}_k&\le - \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty , \end{aligned}$$
(28b)
$$\begin{aligned} (\mathbf{A} \varDelta {\mathbf {x}}_k)^i&{\left\{ \begin{array}{ll} \in [-\varepsilon _{\text {dinf}}, \varepsilon _{\text {dinf}}] \Vert \varDelta {\mathbf {x}}_k \Vert _\infty &{} \mathbf{u}^i, \mathbf{l}^i \in \mathbb {R}, \\ \ge - \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty &{} \mathbf{u}^i = +\infty , \\ \le \varepsilon _{\text {dinf}} \Vert \varDelta {\mathbf {x}}_k \Vert _\infty &{} \mathbf{l}^i = -\infty , \end{array}\right. } ,~i \in [1;m], \end{aligned}$$
(28c)

where \(\varepsilon _{\text {dinf}} > 0\) is some tolerance level. In case of primal and/or dual infeasibility, we return the vectors \(\varDelta {\mathbf {y}}_k\) and \(\varDelta {\mathbf {x}}_k\) as certificates of primal and dual infeasibility, respectively. In our implementation, the default values are \({\varepsilon _{\text {pinf}} = \varepsilon _{\text {dinf}} = 10^{-6}}\). The reader may refer to [59, Sect. 3.4], [33, Sect. V.C], [37, Sect. 4.1], and [51, Sect. 4] for analogous applications.
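As a sketch of how these tests may look in code (ours, dense NumPy; infinite bounds are handled explicitly, consistently with (8) and (9)):

```python
import numpy as np

def check_infeasibility(dx, dy, Q, q, A, l, u, eps_pinf=1e-6, eps_dinf=1e-6):
    """Primal (27) and dual (28) infeasibility tests on dv_k = v_{k+1} - v_k."""
    primal, dual = False, False
    ndy = np.linalg.norm(dy, np.inf)
    if ndy > 0:
        # support-function term: components with an infinite bound must not contribute
        sup = np.where(np.isfinite(u), u, 0.0) @ np.maximum(dy, 0.0) \
            + np.where(np.isfinite(l), l, 0.0) @ np.minimum(dy, 0.0)
        if np.any(np.isinf(u) & (dy > 0)) or np.any(np.isinf(l) & (dy < 0)):
            sup = np.inf                                   # no certificate possible
        primal = (np.linalg.norm(A.T @ dy, np.inf) <= eps_pinf * ndy
                  and sup <= -eps_pinf * ndy)
    ndx = np.linalg.norm(dx, np.inf)
    if ndx > 0:
        Adx = A @ dx
        cone_ok = np.where(np.isinf(l) & np.isinf(u), True,
                   np.where(np.isinf(u), Adx >= -eps_dinf * ndx,
                    np.where(np.isinf(l), Adx <= eps_dinf * ndx,
                             np.abs(Adx) <= eps_dinf * ndx)))
        dual = (np.linalg.norm(Q @ dx, np.inf) <= eps_dinf * ndx
                and q @ dx <= -eps_dinf * ndx
                and np.all(cone_ok))
    return primal, dual
```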

6.4 Preconditioning

Preconditioning, or scaling, the problem may alleviate ill-conditioning and mitigate numerical issues, especially when the problem data span several orders of magnitude. In our implementation, we closely follow [34, Sect. 5.2] and scale the problem data by performing Ruiz equilibration [58] on the constraint matrix \(\mathbf{A}\). This procedure iteratively scales the rows and columns of a matrix in order to make their infinity norms approach one. By default, QPDO performs 10 scaling iterations. Slightly different routines are adopted, e.g., in [59, Sect. 5.1] and [51, Sect. 5.1.2]. Note that, by default, if the problem is initially scaled, the termination conditions for optimality and infeasibility refer to the original, unscaled problem.
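A minimal dense sketch of the equilibration step (ours; the actual implementation follows [34, Sect. 5.2] and also rescales the remaining problem data accordingly):

```python
import numpy as np

def ruiz_equilibrate(A, iters=10):
    """Ruiz equilibration: scale rows and columns of A so that their
    infinity norms approach one; returns the scalings d, e and the scaled matrix."""
    m, n = A.shape
    d, e = np.ones(m), np.ones(n)
    As = A.astype(float).copy()
    for _ in range(iters):
        dr = 1.0 / np.sqrt(np.maximum(np.abs(As).max(axis=1), 1e-12))
        dc = 1.0 / np.sqrt(np.maximum(np.abs(As).max(axis=0), 1e-12))
        As = dr[:, None] * As * dc[None, :]
        d, e = d * dr, e * dc
    return d, e, As   # diag(d) @ A @ diag(e) equals As
```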

7 Numerical results

We discuss details of our open-source C implementation of QPDO and present computational results on random problems and the Maros-Mészáros set [39]. We test and compare QPDO against the open-source, full-fledged solvers OSQP [59] and QPALM [33, 34], and the commercial interior-point solver MOSEK [43]. Indeed, “the construction of appropriate software is by no means trivial and we wish to make a thorough job of it” [14]; we plan to improve our current implementation, in particular the linear solver discussed in Sect. 6.1, and to report comprehensive numerical results in due time.

7.1 Setup

We consider the tolerance \(\varepsilon _{\text {opt}} = 10^{-5}\) and set the tolerances of all the solvers accordingly. In addition, we set the maximum run time of each solver to 100 s and impose no limit on the number of iterations. We leave all the other settings to the internal defaults. It is worth mentioning that, since no initial guess is provided, QPDO, OSQP, and QPALM start with \({\mathbf {v}}_0 = \mathbf{0}\).

In general, it is hard to compare the solution accuracy because the solvers may adopt different termination criteria. While QPDO, QPALM, and OSQP monitor the residual \(\mathbf{r}\) in (4) and check the condition \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty \le \varepsilon _{\text {opt}}\), MOSEK checks complementary slackness with different metrics and scalings. Therefore, we decided not to include checks on \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty\). Instead, we deem a primal-dual pair \({\mathbf {v}}^\star\) optimal if it is returned by a solver declaring success; otherwise, we count it as a failure.

All the experiments were carried out on a desktop running Ubuntu 16.04 LTS with Intel Core i7-8700, 3.20 GHz, and 16 GB RAM. The code for all the numerical examples is available online at [16].

Metrics Let S, P, and \(t_{s,p}\) denote the set of solvers, the set of problems, and the time required for solver \(s \in S\) to return a solution for problem \(p \in P\). The shifted geometric mean (sgm) \({\widehat{t}}_s\) of the run times for solver \(s \in S\) on P is defined by

$$\begin{aligned} {\widehat{t}}_s {:}{=}\exp \left( \frac{1}{|P|} \sum _{p \in P} \ln \left( t_{s,p} + t_{\text {shift}} \right) \right) - t_{\text {shift}} \end{aligned}$$

with the shift \(t_{\text {shift}} = 1\) s [41]. Here, when solver s fails to solve problem p, \(t_{s,p}\) is set to the time limit. We also adopt performance profiles [18] to compare the solver timings. These plot the function \(f_s^\text {r} : \mathbb {R}\rightarrow [0,1]\), \(s \in S\), defined by

$$\begin{aligned} f_s^\text {r}(\tau ) {:}{=}\frac{|\{ p \in P : t_{s,p} \le \tau \, t_p^\text {min} \}|}{|P|} ,\quad t_p^\text {min} {:}{=}\min _{s \in S} t_{s,p} . \end{aligned}$$

Setting \(t_{s,p} = +\infty\) when solver s fails on problem p, \(f_s^\text {r}(\tau )\) is the fraction of problems solved by solver s within \(\tau\) times the best timing. Note that, although performance profiles do not necessarily allow us to assess the performance of one solver relative to another, they remain a valid tool for comparing each solver against the best one [30].

However, performance profiles do not show the fraction of problems that can be solved (to a given tolerance \(\varepsilon _\text {opt}\)) within a given time t. Thus, in the vein of data profiles [42, Sect. 2.2], we plot the function \(f_s^\text {a} : \mathbb {R}\rightarrow [0,1]\), \(s \in S\), defined by

$$\begin{aligned} f_s^\text {a}(t) {:}{=}\frac{|\{ p \in P : t_{s,p} \le t \}|}{|P|} . \end{aligned}$$

Setting \(t_{s,p} = +\infty\) when solver s fails on problem p, \(f_s^\text {a}(t)\) is the fraction of problems solved by solver s within the time t. Note that, in contrast to \(f_s^\text {r}\), the time profile \(t \mapsto f_s^\text {a}(t)\) is independent of the other solvers and displays the actual timings of s.
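The three metrics above are straightforward to evaluate from a table of run times. The sketch below does so on a dense timing array, encoding failed runs as infinity, which is replaced by the time limit for the shifted geometric mean and left as-is for the profiles, as described in the text. The data layout and function names are illustrative.

```c
/* Sketch of the reporting metrics: shifted geometric mean, performance
 * profile f_s^r(tau), and time profile f_s^a(t). Illustrative only;
 * t[s*nprob + p] holds the run time of solver s on problem p, INFINITY on failure. */
#include <math.h>
#include <stddef.h>

/* shifted geometric mean of the run times of solver s; failures count as the time limit */
double sgm(const double *t_s, size_t nprob, double t_shift, double time_limit) {
    double acc = 0.0;
    for (size_t p = 0; p < nprob; p++) {
        double tp = isfinite(t_s[p]) ? t_s[p] : time_limit;
        acc += log(tp + t_shift);
    }
    return exp(acc / (double)nprob) - t_shift;
}

/* fraction of problems solver s solves within tau times the best timing */
double perf_profile(const double *t, size_t nsolv, size_t nprob, size_t s, double tau) {
    size_t count = 0;
    for (size_t p = 0; p < nprob; p++) {
        if (!isfinite(t[s*nprob + p])) continue;   /* failure: never counted */
        double tmin = INFINITY;
        for (size_t r = 0; r < nsolv; r++) tmin = fmin(tmin, t[r*nprob + p]);
        if (t[s*nprob + p] <= tau * tmin) count++;
    }
    return (double)count / (double)nprob;
}

/* fraction of problems solver s solves within the time tmax */
double time_profile(const double *t_s, size_t nprob, double tmax) {
    size_t count = 0;
    for (size_t p = 0; p < nprob; p++)
        if (t_s[p] <= tmax) count++;
    return (double)count / (double)nprob;
}
```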

7.2 Random problems

We considered QPs in the form (1) with randomly generated problem data. In each problem instance, the number of variables is \(n = \lceil 10^a \rceil\), with \(a \sim {\mathcal {U}}(2,3)\) uniformly distributed, so that n ranges between \(10^2\) and \(10^3\). The number of constraints is \(m = \lceil b \, n \rceil\), with \(b \sim {\mathcal {U}}(2,5)\). The linear cost is normally distributed, i.e., \(\mathbf{q}_i \sim {\mathcal {N}}(0,1)\). The cost matrix is \(\mathbf{Q} = \mathbf{P} \mathbf{P}^\top + \alpha \mathbf{I}_n\), where \(\mathbf{P} \in \mathbb {R}^{n \times n}\) has \(10\%\) nonzero entries \(\mathbf{P}_{ij} \sim {\mathcal {N}}(0,1)\), and \(\alpha = 10^{-6}\). The constraint matrix \(\mathbf{A} \in \mathbb {R}^{m \times n}\) contains \(10\%\) nonzero entries \(\mathbf{A}_{ij} \sim {\mathcal {N}}(0,1)\). The bounds are uniformly distributed, i.e., \(\mathbf{l}_i \sim {\mathcal {U}}(-1,0)\) and \(\mathbf{u}_i \sim {\mathcal {U}}(0,1)\). We also investigated equality-constrained QPs. For these problems, \(m = \lceil n/b\rceil\), with \(b \sim {\mathcal {U}}(2,5)\), and \(\mathbf{l} = \mathbf{u} = \mathbf{A} \tilde{\mathbf {x}}\), where \(\tilde{\mathbf {x}}_i \sim {\mathcal {N}}(0,1)\). We generated 500 instances of each problem class.
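The fragment below mirrors this recipe for the inequality-constrained case, using dense storage and Box-Muller sampling; it is only a sketch, and the scripts actually used for the experiments are available at [16].

```c
/* Sketch of the random QP generator described above (inequality-constrained case).
 * Dense storage and Box-Muller sampling; illustrative only, cf. [16] for the scripts. */
#include <math.h>
#include <stdlib.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double unif(double lo, double hi) {            /* U(lo, hi) */
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}
static double gauss(void) {                           /* N(0, 1) via Box-Muller */
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0), u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

/* Fills Q (n*n), q (n), A (m*n), l, u (m), all caller-allocated, with
 * n = ceil(10^a), a ~ U(2,3), and m = ceil(b*n), b ~ U(2,5). */
void random_qp(size_t n, size_t m, double *Q, double *q,
               double *A, double *l, double *u) {
    const double alpha = 1e-6, density = 0.10;
    /* P with 10% N(0,1) entries, then Q = P P^T + alpha I */
    double *P = calloc(n * n, sizeof *P);
    for (size_t i = 0; i < n * n; i++)
        if (unif(0.0, 1.0) < density) P[i] = gauss();
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = (i == j) ? alpha : 0.0;
            for (size_t k = 0; k < n; k++) s += P[i*n + k] * P[j*n + k];
            Q[i*n + j] = s;
        }
    free(P);
    for (size_t i = 0; i < n; i++) q[i] = gauss();        /* q_i ~ N(0,1) */
    for (size_t i = 0; i < m * n; i++)                    /* A with 10% N(0,1) entries */
        A[i] = (unif(0.0, 1.0) < density) ? gauss() : 0.0;
    for (size_t i = 0; i < m; i++) {                      /* l_i ~ U(-1,0), u_i ~ U(0,1) */
        l[i] = unif(-1.0, 0.0);
        u[i] = unif( 0.0, 1.0);
    }
}
```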

Results Computational results are summarized in Table 1 and shown in Figs. 1, 2. Both performance and time profiles suggest that, for random QPs, QPALM exhibits the best performance, with OSQP slightly slower and QPDO third. For equality-constrained QPs, instead, OSQP performs best with QPALM and QPDO slightly behind. MOSEK is generally slower than the other solvers and, for random QPs, it often declares success with a solution that does not satisfy the condition \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty \le \varepsilon _{\text {opt}}\).

Fig. 1 Comparison on random problems with performance profiles

Fig. 2 Comparison on random problems with time profiles

Table 1 Comparison on different problem classes with run times, as shifted geometric means (sgm), and failure rates

7.3 Maros-Mészáros problems

We considered the Maros-Mészáros test set [39] of hard QPs and selected those with \(n \le 10^3\), due to the limitations mentioned in Sect. 6.1. This yields 73 problems, with \(2 \le n \le 1000\), \(3 \le m \le 1750\), and the number of nonzeros \(6 \le N \le 22292\).

Results Computational results are summarized in Tables 1, 2 and shown in Figs. 3, 4. On this test set, QPDO demonstrates its robustness, solving all the problems. OSQP is very fast for some problems but has a high failure rate; it fails on 5 of the 20 problems reported in Table 2. As a first-order method, OSQP builds upon computationally cheap iterations, but it may take many iterations to cope with ill-conditioning and the relatively tight accuracy requirements. QPALM is still competitive but fails on the VALUES problem, due to linear algebra issues. MOSEK seems to perform better than the other solvers on the larger problems, but it often does not satisfy the condition \(\Vert \mathbf{r}({\mathbf {v}}^\star ) \Vert _\infty \le \varepsilon _{\text {opt}}\) and fails on many problems. Overall, these results indicate that QPDO is both reliable and effective.

Fig. 3 Comparison on Maros-Mészáros problems with performance profiles

Fig. 4 Comparison on Maros-Mészáros problems with time profiles

Table 2 Comparison on the larger Maros-Mészáros problems (\(N \ge 5000\)) with KKT residual and run time for QPDO, QPALM, and MOSEK

7.4 Degenerate and infeasible problems

Consider the following parameterized QP, adapted from [37, Sect. 5.4]:

$$\begin{aligned} \min _{\mathbf {x}} \quad&\frac{1}{2} \mathbf {x}^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \mathbf {x} + \begin{pmatrix} 1 \\ c \end{pmatrix}^\top \mathbf {x} \\ s.t.\quad&\begin{pmatrix} -\infty \\ 1 \\ 1 \end{pmatrix} \le \begin{bmatrix} a & a \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \mathbf {x} \le \begin{pmatrix} 0 \\ 3 \\ b \end{pmatrix} . \end{aligned}$$

By varying a, b and c, we can create degenerate or infeasible test problems.
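For reference, the problem data for the three cases discussed below can be written down as follows. The dense layout and the use of IEEE infinities for absent bounds are generic choices for this illustration and do not reflect QPDO's actual C interface.

```c
/* Data for the 2-variable, 3-constraint example, parameterized by (a, b, c).
 * Generic dense layout, INFINITY for absent bounds; illustrative only. */
#include <math.h>

typedef struct { double Q[4], q[2], A[6], l[3], u[3]; } tiny_qp;

static tiny_qp make_example(double a, double b, double c) {
    tiny_qp p = {
        .Q = { 1, 0,
               0, 0 },
        .q = { 1, c },
        .A = { a, a,
               1, 0,
               0, 1 },
        .l = { -INFINITY, 1, 1 },
        .u = { 0, 3, b },
    };
    return p;
}

/* the three cases discussed in the text:
 *   degenerate:        make_example(0.0, 3.0,      0.0)
 *   primal infeasible: make_example(1.0, 3.0,      0.0)
 *   dual infeasible:   make_example(0.0, INFINITY, -1.0)           */
```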

First, we consider the degenerate problem obtained by setting \({a=0}\), \({b = 3}\), and \({c = 0}\). This problem admits primal solutions \({\mathbf{x}}^\star \in \{ (1,\alpha ) \mid 1 \le \alpha \le 3 \}\). Running with default settings, QPDO signals optimality after 6 proximal iterations and 14 Newton iterations, and returns \({\mathbf{x}} = (1.0,1.0)\), \({\mathbf{y}} = (0.0,-2.0,0.0)\), with residual \(\Vert \mathbf{r}({\mathbf{v}}) \Vert _\infty = 1.0 \cdot 10^{-7}\).

Second, we consider a primal infeasible QP by setting \(a=1\), \(b = 3\) and \(c = 0\). QPDO signals primal infeasibility after 3 proximal iterations and 8 Newton iterations, and returns the certificate \(\varDelta {\mathbf{y}} = (6.6, -6.6, -6.6) \cdot 10^4\).

Finally, we consider a dual infeasible QP by setting \(a=0\), \(b = +\infty\) and \(c = -1\). For this problem, (0, 1) is a direction of unbounded descent. QPDO signals dual infeasibility after 5 proximal iterations and 12 Newton iterations, and returns the certificate \(\varDelta {\mathbf{x}} = (1.1 \cdot 10^{-5}, 1.0 \cdot 10^7)\).

8 Conclusions

This paper presented a primal-dual Newton-type proximal method for convex quadratic programs. It builds upon a simple yet crucial result: a suitable merit function for the proximal sub-problem is found in the proximal primal-dual augmented Lagrangian function. This allows us to effectively weave the proximal point method together with a semismooth Newton method, yielding structured symmetric linear systems, exact linesearch, and the possibility of applying sparse multi-rank factorization updates. Requiring only convexity, the method is simple, easily warm started, can exploit sparsity, is robust to early termination, and can detect infeasibility. We have implemented our method in QPDO, a general-purpose solver written in open-source C code. We benchmarked it against state-of-the-art QP solvers, comparing run times and failure rates. QPDO proved reliable, effective, and competitive.