A perturbation view of level-set methods for convex optimization

Level-set methods for convex optimization are predicated on the idea that certain problems can be parameterized so that their solutions can be recovered as the limiting process of a root-finding procedure. This idea emerges time and again across a range of algorithms for convex problems. Here we demonstrate that strong duality is a necessary condition for the level-set approach to succeed. In the absence of strong duality, the level-set method identifies ϵ-infeasible points that do not converge to a feasible point as ϵ tends to zero. The level-set approach is also used as a proof technique for establishing sufficient conditions for strong duality that are different from Slater's constraint qualification.


Introduction
Duality in convex optimization may be interpreted as a notion of the sensitivity of an optimization problem to perturbations of its data. Similar notions of sensitivity appear in numerical analysis, where the effects of numerical errors on the stability of the computed solution are of central concern. Indeed, backward-error analysis (Higham 2002, §1.5) describes the related notion that computed approximate solutions may be considered as exact solutions of perturbations of the original problem. It is natural, then, to ask if duality can help us to understand the behavior of a class of numerical algorithms for convex optimization. In this paper, we describe how the level-set method (van den Berg and Friedlander 2007, 2008a; Aravkin et al. 2018) produces an incorrect solution when applied to a problem for which strong duality fails to hold. In other words, the level-set method cannot succeed if there does not exist a dual pairing that is tight. This failure of strong duality indicates that the stated optimization problem is brittle, in the sense that its value as a function of small perturbations to its data is discontinuous; this violates a vital assumption needed for the level-set method to succeed.
Consider the convex optimization problem
$$\min_{x\in\mathcal{X}}\ f(x) \quad\text{subject to}\quad g(x)\le 0, \tag{P}$$
where $f$ and $g$ are closed proper convex functions that map $\mathbb{R}^n$ to the extended real line $\mathbb{R}\cup\{\infty\}$, and $\mathcal{X}$ is a convex set in $\mathbb{R}^n$. Let the optimal value be $\tau_p^* := \inf\,(\mathrm{P}) < \infty$, so that (P) is feasible. In the context of level-set methods, we may think of $g(x)\le 0$ as representing a constraint that poses a computational challenge. For example, there may not exist any efficient algorithm that can compute the projection onto the constraint set $\{\, x\in\mathcal{X} \mid g(x)\le 0 \,\}$. In many important cases, the objective function has a useful structure that makes it computationally convenient to swap the roles of the objective $f$ with the constraint $g$, and to instead solve the level-set problem
$$\min_{x\in\mathcal{X}}\ g(x) \quad\text{subject to}\quad f(x)\le\tau, \tag{Q$_\tau$}$$
where $\tau$ is an estimate of the optimal value $\tau_p^*$. If $\tau\approx\tau_p^*$, the level-set constraint $f(x)\le\tau$ ensures that a solution $x_\tau\in\mathcal{X}$ of this problem causes $f(x_\tau)$ to have a value near $\tau_p^*$. If, additionally, $g(x_\tau)\le 0$, then $x_\tau$ is a nearly optimal and feasible solution for (P). The trade-off for this potentially more convenient problem is that we must compute a sequence of parameters $\tau_k$ that converges to the optimal value $\tau_p^*$.

Define the optimal-value function, or simply the value function, of (Q$_\tau$) by
$$v(\tau) := \inf_{x\in\mathcal{X}}\{\, g(x) \mid f(x)\le\tau \,\}. \tag{1}$$
This definition suggests that if the constraint in (P) is active at a solution (i.e., $g(x)=0$), then $\tau_p^*$ is a root of the equation $v(\tau)=0$, and in particular, the leftmost root:
$$\tau_p^* = \inf\{\, \tau \mid v(\tau)=0 \,\}. \tag{2}$$
The surprise is that this is not always true. Fix $\tau_d^*$ to be the optimal value of the Lagrange-dual problem (Rockafellar and Wets 1998, §I) to (P). In general,
$$\tau_d^* = \inf\{\, \tau \mid v(\tau)\le 0 \,\} \le \tau_p^*,$$
and the inequality is tight only when strong duality holds, i.e., $\tau_p^*=\tau_d^*$. Thus, if strong duality fails for (P), i.e., $\tau_d^*<\tau_p^*$, then a root-finding algorithm, such as bisection or Newton's method, implemented so as to yield the leftmost root of the equation $v(\tau)=0$, converges to a value of $\tau$ that prevents (Q$_\tau$) from attaining an optimal solution. This idea is illustrated in Figure 1 and manifested by Example 3.2.
Our discussion identified $\tau_d^*$ as the optimal value of the Lagrange dual problem. However, the same conclusion holds when we consider any dual pairing that arises from a convex perturbation. The following theorem, proven in a later section, encapsulates the main result of this paper. The inequality on $v$ in the following statement accommodates the possibility that the constraint in (P) is not active at the solution, and so applies more generally than the preceding discussion.
Theorem 1.1 For the value function $v$,
(a) $v(\tau)>0$ for all $\tau<\tau_d^*$;
(b) $v(\tau)\le 0$ for all $\tau>\tau_d^*$.

Note that the theorem does not address conditions under which $v(\tau_p^*)\le 0$. Any $x^*\in\operatorname{argmin}\,(\mathrm{P})$ is feasible for (Q$_\tau$) with $\tau=\tau_p^*$ and satisfies $g(x^*)\le 0$, and hence $v(\tau_p^*)\le 0$ whenever $\operatorname{argmin}\,(\mathrm{P})$ is not empty. However, if $\operatorname{argmin}\,(\mathrm{P})$ is empty, then no feasible point $x$ of (Q$_\tau$) with $\tau=\tau_p^*$ satisfies $g(x)\le 0$; in that case $v(\tau_p^*)\le 0$ can hold only if the infimum in (1) is not attained, and $v(\tau_p^*)$ may even be $+\infty$. We do not assume that readers are experts in convex duality theory, so we present an abbreviated summary of the machinery needed to develop the proof of Theorem 1.1. We also describe a generalized version of the level-set pairing between the problems (P) and (Q$_\tau$) and thus establish Theorem 5.1, which generalizes Theorem 1.1. We show how this theory can be used as a theoretical device to establish sufficient conditions for strong duality.

Level-set methods
The technique of exchanging the roles of the objective and constraint functions has a long history. For example, the isoperimetric problem, which dates back to the second century B.C.E., seeks the maximum area that can be circumscribed by a curve of fixed length (Wiegert 2010). The converse problem seeks the minimum-length curve that encloses a certain area. Both problems yield the same circular solution. The mean-variance model of financial portfolio optimization pioneered by Markowitz (Markowitz 1987) is another example. It can be phrased as either the problem of allocating assets to minimize risk (i.e., variance) subject to a specified mean return, or as the problem of maximizing the mean return subject to a specified risk. The correct choice of parameters causes both problems to have the same solution.
The idea of rephrasing an optimization problem as a root-finding problem appears often in the optimization literature. The celebrated Levenberg-Marquardt algorithm (Marquardt 1963; Morrison 1960), and trust-region methods (Conn et al. 2000) more generally, use a root-finding procedure to solve a parameterized version of the optimization problem. The widely used SPGL1 software package for sparse optimization (van den Berg and Friedlander 2013) implements the level-set method for obtaining sparse solutions of linear least-squares problems and underdetermined linear systems (van den Berg and Friedlander 2008b, 2011).
In practice, only an approximate solution of the problem (P) is required, and the level-set method can be used to obtain an approximate root $\tau$ that satisfies $v(\tau)\le\epsilon$. The solution $x\in\mathcal{X}$ of the corresponding level-set problem (Q$_\tau$) is super-optimal and $\epsilon$-infeasible:
$$f(x)\le\tau_p^* \quad\text{and}\quad g(x)\le\epsilon. \tag{3}$$
Aravkin et al. (2018) give a full description of the general algorithm, together with a complexity analysis showing that only $O(\log(1/\epsilon))$ approximate evaluations of $v$ are required to obtain an $\epsilon$-infeasible solution, where the subproblem accuracy is proportional to the final required accuracy specified by (3).
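The outer root-finding loop can be sketched in a few lines. This sketch is illustrative only and rests on our own assumptions: it replaces the Newton and secant updates analyzed by Aravkin et al. (2018) with plain bisection, and it uses the hypothetical toy instance of (P) given by minimizing $\|x\|_2$ subject to $g(x)=1-\langle a,x\rangle\le 0$, for which $v(\tau)=1-\tau\|a\|_2$ when $\tau\ge 0$. The names `toy_value_function` and `level_set_bisection` are ours.

```python
import math
from typing import Callable, Sequence


def toy_value_function(tau: float, a: Sequence[float]) -> float:
    """Hypothetical toy instance: v(tau) = inf { 1 - <a, x> : ||x||_2 <= tau }.

    The infimum is attained at x = tau * a / ||a||_2, giving 1 - tau * ||a||_2.
    """
    if tau < 0:
        return math.inf  # the level set { x : ||x||_2 <= tau } is empty
    return 1.0 - tau * math.sqrt(sum(ai * ai for ai in a))


def level_set_bisection(v: Callable[[float], float], lo: float, hi: float,
                        eps: float = 1e-8, max_iter: int = 200) -> float:
    """Approximate the leftmost tau with v(tau) <= eps by bisection.

    Assumes v is nonincreasing, v(lo) > eps, and v(hi) <= eps. The returned
    tau always satisfies v(tau) <= eps.
    """
    if not (v(lo) > eps and v(hi) <= eps):
        raise ValueError("[lo, hi] does not bracket the leftmost root")
    for _ in range(max_iter):
        if hi - lo <= eps:
            break
        mid = 0.5 * (lo + hi)
        if v(mid) <= eps:
            hi = mid  # an eps-infeasible point exists with f(x) <= mid
        else:
            lo = mid  # v(mid) > eps: the leftmost root lies to the right of mid
    return hi


if __name__ == "__main__":
    a = (3.0, 4.0)  # ||a||_2 = 5, so tau_p^* = 1/5 for this toy problem
    tau = level_set_bisection(lambda t: toy_value_function(t, a), 0.0, 1.0)
    print(tau)  # close to 0.2
```

Bisection is chosen here only because it needs nothing beyond monotonicity of $v$; it already uses $O(\log(1/\epsilon))$ evaluations of $v$, but, unlike the methods of Aravkin et al., it does not exploit derivative information.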

Problem formulation
The formulation (P) is very general, even though the constraint $g(x)\le 0$ represents only a single function of the full constraint set represented by $\mathcal{X}$. There are various avenues for reformulating any combination of constraints that lead to this formulation. For instance, Example 3.2 demonstrates how multiple linear constraints of the form $Ax=b$ can be represented as a constraint on the norm of the residual, i.e., $g(x)=\|Ax-b\|\le 0$. More generally, for any set of constraints $c(x)\le 0$, where $c=(c_i)$ is a vector of convex functions $c_i$, we may set $g(x)=\rho(\max\{0,c(x)\})$, with the maximum taken componentwise, for any convenient nonnegative convex function $\rho$ that vanishes only at the origin and is nondecreasing on the nonnegative orthant (so that $g$ is convex).
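As a concrete sketch of this construction (our own illustration, not code from the paper), the helper below builds $g$ from a list of scalar constraint functions. It assumes the Euclidean norm as $\rho$; this choice is convex, vanishes only at the origin, and is nondecreasing on the nonnegative orthant. The names `make_infeasibility_measure` and `rho` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def make_infeasibility_measure(
    constraints: Sequence[Callable[[Vector], float]],
    rho: Optional[Callable[[List[float]], float]] = None,
) -> Callable[[Vector], float]:
    """Return g(x) = rho(max{0, c(x)}) for the constraints c_i(x) <= 0.

    rho defaults to the Euclidean norm, which is nondecreasing on the
    nonnegative orthant, so g is convex whenever every c_i is convex.
    """
    if rho is None:
        rho = lambda r: math.sqrt(sum(ri * ri for ri in r))

    def g(x: Vector) -> float:
        violations = [max(0.0, c(x)) for c in constraints]  # componentwise [c(x)]_+
        return rho(violations)

    return g


# Hypothetical constraints x_0 <= 1 and x_1 >= 0, written as c_i(x) <= 0.
g = make_infeasibility_measure([lambda x: x[0] - 1.0, lambda x: -x[1]])
print(g([0.5, 2.0]))   # 0.0: the point is feasible
print(g([4.0, -4.0]))  # 5.0: the violations (3, 4) have Euclidean norm 5
```

Any other monotone choice of $\rho$, such as the sum of the violations, can be passed in through the `rho` argument.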

Examples
Some examples will best illustrate the behavior of the value function. The first two are semidefinite programs (SDPs) for which strong duality fails to hold. These demonstrate that the level-set method can produce diverging iterates. The last example is a 1-norm regularized least-squares problem. It confirms that the level-set method succeeds in that important case, which often arises in applications of sparse recovery (Chen et al. 2001) and compressed sensing (Candès 2006; Donoho 2006).
For the first two examples below, let $x_{ij}$ denote the $(i,j)$th entry of the $n$-by-$n$ symmetric matrix $X=(x_{ij})$. The notation $X\succeq 0$ denotes the requirement that $X$ is symmetric positive semidefinite.
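Example 3.1 (SDP with infinite gap) Consider the $2\times 2$ SDP
$$\min_{X}\ -2x_{21} \quad\text{subject to}\quad x_{11}=0,\ X\succeq 0. \tag{4}$$
Because $X\succeq 0$ and $x_{11}=0$ together force $x_{21}=0$, every feasible point is optimal: the solution set consists of all matrices $X^*\succeq 0$ with $x^*_{11}=x^*_{21}=0$, and the optimal value is $\tau_p^*=0$. The Lagrange dual is a feasibility problem:
$$\max_{y}\ 0 \quad\text{subject to}\quad \begin{bmatrix} -y & -1\\ -1 & 0\end{bmatrix}\succeq 0.$$
Because the determinant of this matrix is $-1$ for every $y$, the dual problem is infeasible, and we assign the dual optimal value $\tau_d^*=-\infty$. With $f(X):=-2x_{21}$ and $g(X):=|x_{11}|$, the level-set problem (Q$_\tau$) becomes
$$\min_{X}\ |x_{11}| \quad\text{subject to}\quad -2x_{21}\le\tau,\ X\succeq 0. \tag{5}$$
Because any primal optimal $X^*$ is feasible for (5) with $g(X^*)=0$ whenever $\tau\ge 0$, we have $v(\tau)=0$ for all $\tau\ge\tau_p^*=0$. Now consider the parametric matrix
$$X(\tau,\epsilon) := \begin{bmatrix} \epsilon & -\tau/2\\ -\tau/2 & \tau^2/(4\epsilon)\end{bmatrix}$$
for all $\tau<0$ and $\epsilon>0$. Its diagonal is positive, its determinant vanishes, and $-2x_{21}=\tau$, so it is feasible for the level-set problem (5), i.e., $v(\tau)$ is finite. The level-set problem clearly has a zero lower bound that can be approached by sending $\epsilon\downarrow 0$, because $g(X(\tau,\epsilon))=\epsilon$. Thus, $v(\tau)=0$ for all $\tau<0$. In summary, $v(\tau)=0$ for all $\tau$, and so $v$ has roots less than the true optimal value $\tau_p^*$. Furthermore, for $\tau<0$, the infimum in (5) is not attained, because $\lim_{\epsilon\downarrow 0}X(\tau,\epsilon)$ does not exist.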
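The mechanism behind Example 3.1 can be checked numerically. This sketch assumes the concrete family $X(\tau,\epsilon)$ with entries $x_{11}=\epsilon$, $x_{21}=-\tau/2$, and $x_{22}=\tau^2/(4\epsilon)$ for $\tau<0$ and $\epsilon>0$ (stated here so that the sketch is self-contained), together with $f(X)=-2x_{21}$ and $g(X)=|x_{11}|$. It confirms that each such matrix satisfies the level-set constraint and is positive semidefinite up to rounding, that $g$ equals $\epsilon$, and that $x_{22}$ grows without bound as $\epsilon\downarrow 0$. The helper names are hypothetical.

```python
import math


def parametric_matrix(tau: float, eps: float):
    """Entries (x11, x21, x22) of the symmetric 2x2 matrix X(tau, eps), tau < 0, eps > 0."""
    x21 = -tau / 2.0
    return eps, x21, x21 * x21 / eps  # x11 * x22 = x21**2, so det X = 0


def min_eigenvalue(x11: float, x21: float, x22: float) -> float:
    """Smallest eigenvalue of [[x11, x21], [x21, x22]] via the closed-form 2x2 formula."""
    mean = 0.5 * (x11 + x22)
    radius = math.hypot(0.5 * (x11 - x22), x21)
    return mean - radius


def check_level_set_point(tau: float, eps: float):
    """Return (level-set constraint holds, PSD up to rounding, g(X), x22)."""
    x11, x21, x22 = parametric_matrix(tau, eps)
    level_ok = -2.0 * x21 <= tau                         # f(X) = -2 x21 <= tau
    psd_ok = min_eigenvalue(x11, x21, x22) >= -1e-9 * max(1.0, x22)
    return level_ok, psd_ok, abs(x11), x22              # g(X) = |x11|


for eps in (1e-1, 1e-3, 1e-6):
    # g(X) shrinks like eps while x22 blows up like 1/eps
    print(eps, check_level_set_point(-1.0, eps))
```

The tolerance in the semidefiniteness test scales with $x_{22}$ because the closed-form eigenvalue suffers cancellation when the diagonal entries differ greatly in size.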

Example 3.2 (SDP with finite gap)
Consider the $3\times 3$ SDP (6). The positive semidefinite constraint on $X$, together with the constraint $x_{11}=0$, implies that $x_{31}$ must vanish. Thus, the solution and optimal value are given, respectively, by (7). The constraint of the Lagrange dual problem requires $y_2=1$, and thus the optimal dual value is $\tau_d^*=-1$. For the application of the level-set method to the primal problem (6), we assign the functions $f$ and $g$ in (8), which together define the value function $v$ in (9). As in Example 3.1, any nonnegative convex function $g$ that vanishes on the feasible set could have been used to define $v$. It follows from (7) that $v(\tau)=0$ for all $\tau\ge 0$. Also, it can be verified that $v(\tau)=0$ for all $\tau\ge\tau_d^*=-1$. To understand this, first define the parametric matrix $X_\epsilon$, which is feasible for the level-set problem (9) and has objective value $g(X_\epsilon)=2\epsilon$. Because $X_\epsilon$ is feasible for all positive $\epsilon$ and $g$ is nonnegative, we have $0\le v(\tau)\le 2\epsilon$ for every $\epsilon>0$, and so the optimal value vanishes. Moreover, the set of minimizers for (9) is empty for all $\tau\in(-1,0)$. Figure 1 illustrates the behavior of this value function.
The level-set method fails because the leftmost root of $v$ identifies the optimal dual value $\tau_d^*=-1$ instead of the optimal primal value $\tau_p^*$. The proof of Theorem 1.1 reveals that the behavior exhibited by Examples 3.1 and 3.2 stems from the failure of strong duality with respect to perturbations in the linear constraints. In the case of Example 3.2, we can produce a sequence of matrices $X_\epsilon$, each of which is nearly feasible with respect to the infeasibility measure $g$ (cf. (8)), with $g(X_\epsilon)=2\epsilon$. However, the limit as $\epsilon\downarrow 0$ does not produce a feasible point; in fact, the limit does not even exist, because the entry $x_{33}$ of $X_\epsilon$ goes to infinity.
Example 3.3 (Basis pursuit denoising (Chen et al. 1998, 2001)) The level-set method implemented in the SPGL1 software package solves the 1-norm regularized least-squares problem
$$\min_x\ \|x\|_1 \quad\text{subject to}\quad \|Ax-b\|_2\le\sigma$$
for any value of $\sigma\ge 0$, assuming that the problem remains feasible. (The case $\sigma=0$ is important, as it accommodates the case in which we seek a sparse solution to the underdetermined linear system $Ax=b$.) The algorithm approximately solves a sequence of flipped problems
$$\min_x\ \|Ax-b\|_2 \quad\text{subject to}\quad \|x\|_1\le\tau_k,$$
where $\tau_k$ is chosen so that the residuals $\|Ax_k-b\|_2$ of the corresponding solutions $x_k$ converge to $\sigma$. Strong duality holds because the domains of the nonlinear functions (i.e., the 1- and 2-norms) cover the whole space; see Rockafellar and Wets (1998, Theorem 11.39), which is also summarized by Theorem 4.1.
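To make the flipped iteration of Example 3.3 concrete, here is a small, dependency-free sketch. It is not SPGL1's implementation or interface: SPGL1 uses a spectral projected-gradient solver and a Newton update on $\tau$, whereas this sketch, under our own simplifying assumptions, evaluates $v(\tau)=\min\{\|Ax-b\|_2 : \|x\|_1\le\tau\}$ with plain projected gradient (step $1/\|A\|_F^2$), projects onto the 1-norm ball with the standard sort-based method, and updates $\tau$ by bisection. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project_l1_ball(y: Sequence[float], tau: float) -> List[float]:
    """Euclidean projection of y onto { x : ||x||_1 <= tau } (sort-based method)."""
    if tau <= 0.0:
        return [0.0] * len(y)
    if sum(abs(t) for t in y) <= tau:
        return list(y)
    u = sorted((abs(t) for t in y), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        candidate = (cumsum - tau) / j
        if uj > candidate:
            theta = candidate  # keeps the threshold of the largest such j
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in y]


def residual(A: Matrix, x: Sequence[float], b: Sequence[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def lasso_value(A: Matrix, b: Sequence[float], tau: float, iters: int = 500) -> float:
    """Approximate v(tau) = min { ||Ax - b||_2 : ||x||_1 <= tau } by projected gradient."""
    n = len(A[0])
    step = 1.0 / sum(aij * aij for row in A for aij in row)  # 1/||A||_F^2 <= 1/||A||_2^2
    x = [0.0] * n
    for _ in range(iters):
        r = residual(A, x, b)
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        x = project_l1_ball([x[j] - step * grad[j] for j in range(n)], tau)
    return math.sqrt(sum(ri * ri for ri in residual(A, x, b)))


def bpdn_tau(A: Matrix, b: Sequence[float], sigma: float, tau_hi: float,
             tol: float = 1e-6) -> float:
    """Bisection on tau for v(tau) = sigma; assumes v(0) > sigma >= v(tau_hi)."""
    lo, hi = 0.0, tau_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lasso_value(A, b, mid) <= sigma:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [3.0, 1.0]
    # For this data, v(tau) = (4 - tau)/sqrt(2) when 2 <= tau <= 4,
    # so sigma = 1/sqrt(2) corresponds to tau = 3.
    print(bpdn_tau(A, b, 1.0 / math.sqrt(2.0), tau_hi=4.0))
```

The identity matrix is used only because it makes $v$ available in closed form for checking; for a general $A$ the same loop applies, but the inner solver then needs many more iterations.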

Equivalence between value functions
The level-set method is based on a kind of inverse-function relationship between the value function (1) of the flipped problem and the value function
$$p(\sigma) := \inf_{x\in\mathcal{X}}\{\, f(x) \mid g(x)\le\sigma \,\}$$
of the perturbed original problem. Clearly, $\tau_p^*=p(0)$. Aravkin et al. (2013) describe the formal relationship between the value functions $v$ and $p$, and their respective solutions. We summarize the key aspects here.
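For intuition, the inverse-function relationship can be checked on a hypothetical toy instance of our own: minimize $\|x\|_2$ subject to $1-\langle a,x\rangle\le 0$, with $a=(3,4)$ so that $\|a\|_2=5$. Elementary calculations give $v(\tau)=1-5\tau$ for $\tau\ge 0$ and $p(\sigma)=\max\{0,1-\sigma\}/5$. The sketch confirms numerically that $p(v(\tau))=\tau$ for $\tau\ge 0$ and that $\tau_p^*=p(0)=0.2$.

```python
import math

A_NORM = 5.0  # ||a||_2 for the hypothetical data a = (3, 4)


def v(tau: float) -> float:
    """Level-set value function v(tau) = inf { 1 - <a, x> : ||x||_2 <= tau }."""
    return math.inf if tau < 0 else 1.0 - A_NORM * tau


def p(sigma: float) -> float:
    """Perturbation value function p(sigma) = inf { ||x||_2 : 1 - <a, x> <= sigma }."""
    return max(0.0, 1.0 - sigma) / A_NORM


for tau in (0.0, 0.05, 0.2, 0.3):
    print(tau, p(v(tau)))  # p(v(tau)) reproduces tau up to rounding
print(p(0.0))              # tau_p^* = p(0) = 1/5
```

In this instance the level-set constraint $\|x\|_2\le\tau$ is active at every minimizer when $\tau>0$, which is the situation in which the inverse relationship is expected to hold.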
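Let $\operatorname{argmin} v(\tau)$ and $\operatorname{argmin} p(\sigma)$, respectively, denote the set of solutions to the optimization problems underlying the value functions $v$ and $p$ (which may be empty). Thus, for example,
$$\operatorname{argmin} p(\sigma) = \{\, x\in\mathcal{X} \mid f(x)=p(\sigma),\ g(x)\le\sigma \,\}$$
if $p(\sigma)<\infty$, and $\operatorname{argmin} p(\sigma)$ is empty otherwise. Clearly, $\operatorname{argmin} p(0)=\operatorname{argmin}\,(\mathrm{P})$. Because $p$ is defined via an infimum, $\operatorname{argmin} p(\sigma)$ can be empty even if $p(\sigma)$ is finite, in which case we say that the optimal value $p(\sigma)$ is not attained.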
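Let $\mathcal{S}$ be the set of parameters $\tau$ for which the level-set constraint $f(x)\le\tau$ of (Q$_\tau$) holds with equality. Formally,
$$\mathcal{S} := \{\, \tau \mid \emptyset\ne\operatorname{argmin} v(\tau)\subseteq\{\, x\in\mathcal{X} \mid f(x)=\tau \,\} \,\}.$$
Theorem 3.1 (Value-function inverses (Aravkin et al. 2013, Theorem 2.1)) For every $\tau\in\mathcal{S}$, the following statements hold:
(a) $p(v(\tau))=\tau$;
(b) $\operatorname{argmin} v(\tau)=\operatorname{argmin} p(v(\tau))\subseteq\{\, x\in\mathcal{X} \mid f(x)=\tau \,\}$.

The condition $\tau\in\mathcal{S}$ means that the constraint of the level-set problem (Q$_\tau$) must be active in order for the result to hold. The following example illustrates the need for this condition.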
so it has the wrong optimal value.This theorem is symmetric, and holds if the roles of f and g, and p and v, are reversed.We note that this result holds even if any of the underlying objects that define (P) are not convex.
Part (b) of the theorem confirms that if $\tau_p^*\in\mathcal{S}$, i.e., the constraint $g(x)\le 0$ holds with equality at a solution of (P), then solutions of the level-set problem coincide with solutions of the original problem defined by $p(0)$. In order to establish an inverse-function-like relationship between the value functions $p$ and $v$ that always holds for convex problems, we provide a modified definition of the epigraphs for $p$ and $v$.
This definition is almost exactly the same as the regular definition for the epigraph of a function, given by $\operatorname{epi} p := \{\, (\sigma,\tau) \mid p(\sigma)\le\tau \,\}$. The result below follows immediately from the definition of the value function epigraph. It establishes that (2) holds if (Q$_\tau$) has a solution that attains its optimal value (as opposed to relying on the infimal operator to achieve that value).
Proposition 3.1 For the value functions $p$ and $v$,

Duality in convex optimization
Duality in convex optimization can be understood as describing the behavior of an optimization problem under perturbation to its data. From this point of view, dual variables describe the sensitivity of the problem's optimal value to that perturbation. The description that we give here summarizes a well-developed theory fully described by Rockafellar and Wets (1998). We adopt a geometric viewpoint that we have found helpful for understanding the connection between duality and the level-set method.
For this section only, consider the generic convex optimization problem
$$\min_x\ f(x),$$
where $f:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ is some arbitrary closed proper convex function. The perturbation approach is predicated on fixing an arbitrary convex function $F:\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}\cup\{\infty\}$ with the property that $F(x,0)=f(x)$ for all $x$. (Example 4.1 below describes an instance of such a function.) The choice of $F$ determines the perturbation function
$$p(u) := \inf_x F(x,u),$$
which describes how the optimal value of $f$ changes under a perturbation $u$. We seek the behavior of the perturbation function about the origin, at which the value of $p$ coincides with the optimal value $\tau_p^*$, i.e., $p(0)=\tau_p^*$.
A dual optimization problem is constructed by considering the set of affine functions $u\mapsto\langle\mu,u\rangle+q$ parameterized by $(\mu,q)\in\mathbb{R}^m\times\mathbb{R}$ that minorize $p$:
$$\langle\mu,u\rangle+q\le p(u)\quad\forall u. \tag{10}$$
The tightest lower minorant is obtained by maximizing the left-hand side $\langle\mu,u\rangle+q$ over $(\mu,q)$ subject to the constraint above. This can be accomplished by first eliminating $q$ and setting it to
$$q_\mu := \inf_u\{\, p(u)-\langle\mu,u\rangle \,\} = -p^*(\mu),$$
where $p^*(\mu) := \sup_u\{\, \langle\mu,u\rangle-p(u) \,\}$ is the convex conjugate of $p$. Figure 2(a) illustrates how the choice of $q_\mu$ causes the affine function to support the epigraph of $p$. Thus, (10) is equivalent to the expression
$$\langle\mu,u\rangle-p^*(\mu)\le p(u)\quad\forall u.$$
Now take the supremum over $\mu$ of the left-hand side, which results in the value
$$\sup_\mu\{\, \langle\mu,u\rangle-p^*(\mu) \,\} = p^{**}(u).$$
It is thus evident that the biconjugate $p^{**}$, which is necessarily closed convex, provides a global lower envelope for $p$, i.e., $p^{**}(u)\le p(u)$ for all $u$.

Fig. 2 The relationship between the primal perturbation $p(u)$ and a single instance (with slope $\mu$ and intercept $q_\mu$) of the uncountably many minorizing affine functions that define the dual problem. The panel on the left depicts a non-optimal supporting hyperplane that generates a value $q_\mu<\tau_p^*$; the panel on the right depicts an optimal supporting hyperplane that generates a slope $\mu$ that causes
This inequality is tight at a point $u$, i.e., $p^{**}(u)=p(u)$, if and only if $p$ is lower semicontinuous at $u$. Because of the connection between lower semicontinuity and the closure of the epigraph, we say that $p$ is closed at such points $u$; see Rockafellar (1970, Theorem 7.1).
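Weak duality expresses the following relationship between the optimal primal and dual values:
$$\tau_d^* := p^{**}(0) \le p(0) = \tau_p^*.$$
Strong duality holds when $\tau_p^*=\tau_d^*$; because $\tau_d^*$ equals the closure of $p$ at the origin, this happens exactly when $p$ is closed at the origin. The functions $p$ and $p^{**}$, evaluated at the origin, define dual pairs of optimization problems given by
$$p(0)=\inf_x F(x,0) \quad\text{and}\quad p^{**}(0)=\sup_\mu\,\{-F^*(0,\mu)\}. \tag{12}$$
The expression for $p^{**}(0)$ is a direct application of the conjugate operator, because $p^*(\mu)=F^*(0,\mu)$.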
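The gap between $p$ and its biconjugate can be observed numerically. The sketch below uses a hypothetical one-dimensional perturbation function chosen to mimic the values in Example 3.2: $p(u)=+\infty$ for $u<0$, $p(0)=0$, and $p(u)=-1$ for $u>0$. This $p$ is convex and nonincreasing but not lower semicontinuous at the origin. Approximating the conjugate and the biconjugate by maxima over finite grids (a discretization we introduce purely for illustration) gives $p^{**}(0)\approx -1<0=p(0)$, i.e., $\tau_d^*<\tau_p^*$.

```python
import math


def p(u: float) -> float:
    """Hypothetical perturbation function: convex, but not closed at u = 0."""
    if u < 0:
        return math.inf
    return 0.0 if u == 0 else -1.0


# Finite grids standing in for the suprema over u and over mu.
U_GRID = [0.0] + [10.0 ** (-k) for k in range(1, 9)] + [0.5 * k for k in range(1, 5)]
MU_GRID = [0.1 * k for k in range(-50, 11)]


def conjugate(mu: float) -> float:
    """Discrete conjugate p*(mu) = max_u { mu * u - p(u) } over U_GRID."""
    return max(mu * u - p(u) for u in U_GRID)


def biconjugate_at_zero() -> float:
    """Discrete biconjugate p**(0) = max_mu { -p*(mu) } over MU_GRID."""
    return max(-conjugate(mu) for mu in MU_GRID)


print(p(0.0), biconjugate_at_zero())  # primal value 0 versus a dual value near -1
```

The small positive grid points matter: $p^*(\mu)$ for $\mu\le 0$ is a supremum approached as $u\downarrow 0$, which is exactly where $p$ fails to be closed.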
The following theorem confirms the weak duality that always holds between the optimal values $\tau_p^*$ and $\tau_d^*$. It also establishes sufficient conditions that guarantee when the perturbation function $p$ is closed at the origin, and hence strong duality holds. The complete version of this theorem includes statements regarding the relationships between the subdifferentials of $p$ and $q$ at the origin (Rockafellar and Wets 1998, Theorem 11.39).
Example 4.1 Consider the problem
$$\min_x\ f(x) \quad\text{subject to}\quad c(x)\le 0,\ Ax=b, \tag{13}$$
where $c$ is a vector-valued convex function and $A$ is a matrix. Introduce perturbations $u_1$ and $u_2$ to the right-hand sides of the constraints, which gives rise to Lagrange duality and corresponds to the perturbation function
$$p(u_1,u_2) := \inf_x\{\, f(x) \mid c(x)\le u_1,\ Ax-b=u_2 \,\}.$$
One valid choice for the value function that corresponds to swapping both constraints with the objective of (13) can be expressed as
$$v(\tau) := \inf_{x,u_1,u_2}\{\, \|[u_1]_+\|+\|u_2\| \mid f(x)\le\tau,\ c(x)\le u_1,\ Ax-b=u_2 \,\},$$
where the operator $[u_1]_+=\max\{0,u_1\}$ is taken componentwise on the elements of $u_1$. This particular formulation of the value function makes explicit the connection to the perturbation function. We may thus interpret the value function as giving the minimal perturbation that corresponds to an objective value less than or equal to $\tau$.

A duality gap gives a false root
We discuss some intuition as to why we can expect level-set methods to return incorrect solutions when applied to problems without strong duality. Apply the level-set method to (P), and for simplicity, suppose that (P) attains its optimal value and that the constraint $g(x)\le 0$ is active at any solution. Consider the perturbation function
$$p(u) := \inf_{x\in\mathcal{X}}\{\, f(x) \mid g(x)\le u \,\} \tag{14}$$
and the value function $v(\tau)$ given by (1). First, suppose that strong duality holds for the problem under the given perturbation, so that $p(0)=p^{**}(0)$, which means that the perturbation function $p$ is closed at the origin. We sketch an example $p(u)$ and the corresponding $v(\tau)$ in the top row of Figure 3.
To understand this picture, first consider a value $\tau_1<\tau_p^*$, as in the top row of Figure 3. It is evident that $v(\tau_1)$ is positive, because otherwise there must exist a vector $x\in\mathcal{X}$ that is super-optimal ($f(x)\le\tau_1<\tau_p^*$) and feasible ($g(x)\le 0$), which contradicts the definition of $\tau_p^*$. It then follows that the value $u:=v(\tau_1)$ yields $p(u)=\tau_1$. For $\tau_2>\tau_p^*$, any solution to the original problem would be feasible (therefore requiring no perturbation $u$) and would achieve objective value $p(0)=\tau_p^*<\tau_2$. Furthermore, notice that as $\tau_1\to\tau_p^*$, the value $p(u_1)$ varies continuously in $\tau_1$, where $u_1$ is the smallest root of $p(u)=\tau_1$.

Now suppose that strong duality fails, so that $p$ is not closed at the origin and $\tau_d^*<\tau_p^*$, as in the bottom row of Figure 3. With $\tau=\tau_1<\tau_d^*$, we have $v(\tau_1)>0$. With $\tau=\tau_3>\tau_p^*$, we have $v(\tau_3)=0$, because any solution to (P) causes (Q$_\tau$) to have zero value. But for $\tau_d^*<\tau_2<\tau_p^*$, we see that $v(\tau_2)=0$, because for any positive $\epsilon$ there exists a positive $u<\epsilon$ such that $p(u)\le\tau_2$. Even though there is no feasible point that achieves a super-optimal value $f(x)\le\tau_2<\tau_p^*$, for any positive $\epsilon$ there exists an $\epsilon$-infeasible point that achieves that objective value.
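The false root can be reproduced with a toy model. The sketch below assumes the hypothetical perturbation function $p(u)=+\infty$ for $u<0$, $p(0)=0$, and $p(u)=-1$ for $u>0$, for which $\tau_p^*=p(0)=0$ while the closure of $p$ at the origin, and hence $\tau_d^*$, equals $-1$. It approximates the value function as the smallest perturbation $u\ge 0$ with $p(u)\le\tau$ over a finite grid, then bisects for the leftmost $\tau$ with $v(\tau)\le\epsilon$. The search lands near $\tau_d^*=-1$ rather than $\tau_p^*=0$, because every $\tau\in(-1,0)$ admits an arbitrarily small perturbation. The grid and all names are our own assumptions.

```python
import math


def p(u: float) -> float:
    """Hypothetical perturbation function with p(0) = 0 but p(u) = -1 for every u > 0."""
    if u < 0:
        return math.inf
    return 0.0 if u == 0 else -1.0


# Grid of perturbations, including very small positive ones.
U_GRID = [0.0] + [10.0 ** (-k) for k in range(1, 13)] + [0.5, 1.0, 2.0]


def v_approx(tau: float) -> float:
    """Grid approximation of v(tau) = inf { u >= 0 : p(u) <= tau }."""
    feasible = [u for u in U_GRID if p(u) <= tau]
    return min(feasible) if feasible else math.inf


def leftmost_root(lo: float, hi: float, eps: float = 1e-9) -> float:
    """Bisection for the leftmost tau with v_approx(tau) <= eps (v is nonincreasing)."""
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if v_approx(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


root = leftmost_root(-2.0, 1.0)
print(root, p(0.0))  # the root is near -1 (the dual value), not 0 (the primal value)
```

At $\tau=-0.5$, for instance, `v_approx` returns the tiny positive perturbation $10^{-12}$: the corresponding point is $\epsilon$-infeasible yet super-optimal, exactly as described above.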

Lagrange duality framework
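Define the function
$$F(x,u) := \begin{cases} f(x) & \text{if } x\in\mathcal{X} \text{ and } g(x)\le u,\\ +\infty & \text{otherwise.}\end{cases}$$
The function $F$ is convex in both of its arguments, and gives rise to the perturbation function $p$ defined by (14). In summary, this choice of perturbation leads to the pair of functions $p$ and $v$ as defined by (14) and (1).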
Choose ν = µ, and so there exists x such that F (x, û) ≤ p(û) + µ.Together with (18), we have Therefore, for each > 0, we can find a pair (x, û) that satisfies ( 16), which completes the proof of the second result.
Next we prove the first result, which is equivalent to proving that v(τ which completes the proof.

General duality framework
We generalize the level-set method to arbitrary perturbations and more general notions of duality. In this case we are interested in the value function pair
$$p(u) := \inf_x F(x,u) \quad\text{and}\quad v(\tau) := \inf_{x,u}\{\, \|u\| \mid F(x,u)\le\tau \,\}, \tag{20}$$
where $\|\cdot\|$ is any norm. Because $p$ is parameterized by a vector $u$, we must consider the norm of the perturbation, rather than just its value. Therefore, $v(\tau)$ is necessarily nonnegative. We are thus interested in the leftmost root of the equation $v(\tau)=0$, rather than an inequality as in Theorem 1.1.
Theorem 5.1 For the functions $p$ and $v$ defined by (20), and with $\tau_d^*:=p^{**}(0)$,
(a) $v(\tau)>0$ for all $\tau<\tau_d^*$;
(b) $v(\tau)=0$ for all $\tau>\tau_d^*$.

The proof is almost identical to that of Theorem 1.1, except that we treat $u$ as a vector, and replace $u$ by $\|u\|$ in (16), (17), and (19). Theorems 1.1 and 5.1 imply that $v(\tau)\le 0$ for all values of $\tau$ larger than the optimal dual value (the inequality $\tau>\tau_d^*$ is strict, as $v(\tau_d^*)$ may be infinite). Thus, if strong duality does not hold, then $v$ identifies the wrong optimal value for the original problem being solved. This means that the level-set method may provide a point arbitrarily close to feasibility, but that point remains at least a fixed distance away from the true solution, independent of how close to feasibility the returned point may be.

Sufficient conditions for strong duality
The condition that $0\in\operatorname{int}\operatorname{dom} p$ may be interpreted as Slater's constraint qualification (Borwein and Lewis 2010, §3.2), which in the context of (P) requires that there exist a point $x$ in the domain of $f$ for which $g(x)<0$. This condition is sufficient to establish strong duality. Here we show how Theorem 1.1 can be used as a device to characterize an alternative set of sufficient conditions that continue to ensure strong duality even for problems that do not satisfy Slater's condition.
Proposition 6.1 Problem (P) satisfies strong duality if either one of the following conditions holds:
(a) the objective $f$ is coercive;
(b) the set $\mathcal{X}$ is compact.

Proof Consider the level-set problem (Q$_\tau$) and its corresponding optimal-value function $v(\tau)$ given by (1). In either case (a) or (b), the feasible set $\{\, x\in\mathcal{X} \mid f(x)\le\tau \,\}$ of (Q$_\tau$) is compact, because either the level sets of $f$ are compact or $\mathcal{X}$ is compact. Therefore, (Q$_\tau$) always attains its minimum for all $\tau\ge\inf\{\, f(x) \mid x\in\mathcal{X} \,\}$.
Suppose strong duality does not hold. Theorem 1.1 then confirms that there exists a parameter $\tau\in(\tau_d^*,\tau_p^*)$ such that $v(\tau)\le 0$. However, because (Q$_\tau$) always attains its minimum, there must exist a point $x\in\mathcal{X}$ such that $f(x)\le\tau<\tau_p^*$ and $g(x)\le 0$, which contradicts the fact that $\tau_p^*$ is the optimal value of (P). We have therefore established that $\tau_d^*=\tau_p^*$, and hence that (P) satisfies strong duality.
We can use Proposition 6.1 to establish that certain optimization problems that do not satisfy a Slater constraint qualification still enjoy strong duality. As an example, consider the conic optimization problem (21) in the case where $c$ lies in the interior of the dual cone $\mathcal{K}^*$, which makes its objective coercive on $\mathcal{K}$. A concrete application of this model problem is the SDP relaxation of the celebrated phase-retrieval problem (Candès et al. 2013; Waldspurger et al. 2015)
$$\min_X\ \operatorname{tr}(X) \quad\text{subject to}\quad \mathcal{A}X=b,\ X\succeq 0, \tag{22}$$
where $\mathcal{K}$ is now the cone of Hermitian positive semidefinite matrices (i.e., all the eigenvalues are real-valued and nonnegative) and $c=I$ is the identity matrix, so that $\langle c,X\rangle=\operatorname{tr}(X)$. In that setting, Candès et al. (2013) prove that with high probability, the feasible set of (22) is a rank-1 singleton (the desired solution), and thus we cannot use Slater's condition to establish strong duality. However, because $\mathcal{K}$ is self-dual (Boyd and Vandenberghe 2004, Example 2.24), clearly $c=I\in\operatorname{int}\mathcal{K}=\operatorname{int}\mathcal{K}^*$, and by the coercivity argument for (21), Proposition 6.1 establishes that strong duality holds for (22). A consequence of Proposition 6.1 is that it is possible to modify (P) in order to guarantee strong duality. In particular, we may regularize the objective, and instead consider a version of the problem with the objective $f(x)+\mu\|x\|$, where the parameter $\mu>0$ controls the degree of regularization contributed by the term $\|x\|$. If, for example, $f$ is bounded below on $\mathcal{X}$, the regularized objective is then coercive, and Proposition 6.1 asserts that the revised problem satisfies strong duality. Thus, the optimal value function of the level-set problem has the correct root, and the level-set method is applicable. For toy problems such as Examples 3.1 and 3.2, where all of the feasible points are optimal, regularization would not perturb the solution; however, in general we expect that the regularization will perturb the resulting solution, and in some cases this may be the desired outcome.

Fig. 1 A sketch of the value function $v$ for Example 3.2; cf. (9). The value function $v(\tau)$ vanishes for all $\tau\ge\tau_d^*$, where $\tau_d^*<\tau_p^*$. This causes the level-set method to converge to an incorrect solution when strong duality fails to hold.
In Example 3.1, this dual pairing fails to have strong duality. The application of the level-set method to the primal problem (4) can be accomplished by defining the functions $f(X):=-2x_{21}$ and $g(X):=|x_{11}|$, which together define the value function of the level-set problem (Q$_\tau$).
Definition 3.1 (Value function epigraph) The value function epigraph of the optimal value function $p(\sigma)$
Theorem 4.1 (Weak and strong duality) Consider the primal-dual pair (12).

Fig. 3 The perturbation function $p(u)$ and corresponding level-set value function $v(\tau)$ for problems with strong duality (top row) and without strong duality (bottom row). Panel (c) illustrates the case when strong duality fails and the graph of $p$ is open at the origin, which implies that $\tau_d^*<\tau_p^*\equiv p(0)$.
Consider the conic optimization problem
$$\min_x\ \langle c,x\rangle \quad\text{subject to}\quad Ax=b,\ x\in\mathcal{K}, \tag{21}$$
where $A:\mathcal{E}_1\to\mathcal{E}_2$ is a linear map between Euclidean spaces $\mathcal{E}_1$ and $\mathcal{E}_2$, and $\mathcal{K}\subseteq\mathcal{E}_1$ is a closed proper convex cone. This wide class of problems includes linear programming (LP), second-order cone programming (SOCP), and SDPs, and has many important scientific and engineering applications (Ben-Tal and Nemirovski 2001). If $c$ is contained in the interior of the dual cone $\mathcal{K}^*=\{\, y\in\mathcal{E}_1 \mid \langle x,y\rangle\ge 0\ \forall x\in\mathcal{K} \,\}$, then $\langle c,x\rangle>0$ for all nonzero $x\in\mathcal{K}$. Equivalently, the function $f(x):=\langle c,x\rangle+\delta_{\mathcal{K}}(x)$ is coercive. Thus, (21) is equivalent to the problem
$$\min_x\ f(x) \quad\text{subject to}\quad Ax=b,$$
which has a coercive objective. Thus, Part (a) of Proposition 6.1 applies, and strong duality holds.
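For the special case $\mathcal{K}=\mathcal{K}^*=\mathbb{R}^n_+$ (linear programming), the coercivity argument can be made quantitative: if every $c_i>0$, then $\langle c,x\rangle\ge(\min_i c_i)\|x\|_1$ for every $x\ge 0$, so each level set $\{\, x\ge 0 \mid \langle c,x\rangle\le\tau \,\}$ lies in a 1-norm ball of radius $\tau/\min_i c_i$. The sketch below is our own illustration (the helper names are hypothetical): it returns that radius and spot-checks the bound on random points of the level set. If some $c_i\le 0$, then $c\notin\operatorname{int}\mathcal{K}^*$, and the level set can be unbounded.

```python
import math
import random
from typing import Sequence


def l1_radius_of_level_set(c: Sequence[float], tau: float) -> float:
    """1-norm radius bound for { x >= 0 : <c, x> <= tau }, assuming tau >= 0."""
    cmin = min(c)
    if cmin <= 0.0:
        return math.inf  # c not in int K*: x = t * e_i with c_i <= 0 stays feasible as t grows
    return tau / cmin


def check_bound(c: Sequence[float], tau: float, trials: int = 1000, seed: int = 0) -> bool:
    """Spot-check ||x||_1 <= tau / min(c) on random points with x >= 0 and <c, x> = tau."""
    rng = random.Random(seed)
    radius = l1_radius_of_level_set(c, tau)
    for _ in range(trials):
        x = [rng.random() for _ in c]
        scale = tau / sum(ci * xi for ci, xi in zip(c, x))  # move x onto <c, x> = tau
        x = [scale * xi for xi in x]
        if sum(x) > radius * (1.0 + 1e-12):
            return False
    return True


print(l1_radius_of_level_set([2.0, 0.5, 1.0], 1.0))  # 2.0
print(check_bound([2.0, 0.5, 1.0], 1.0))            # True
print(l1_radius_of_level_set([1.0, 0.0], 1.0))      # inf
```

For the semidefinite case with $c=I$, the analogous bound is $\operatorname{tr}(X)=\|X\|_*$ for $X\succeq 0$, so the level sets of the trace objective are bounded in the nuclear norm.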