Abstract
We consider the class of convex composite minimization problems, which consists of minimizing the sum of two nonsmooth extended valued convex functions, one of which is composed with a linear map. Convergence rate guarantees for first order methods on this class of problems often require the additional assumption that the nonsmooth objective function composed with the linear map is Lipschitz continuous. We introduce a theoretical framework in which this restrictive Lipschitz continuity assumption is not required. Building on a novel dual representation of the so-called Pasch-Hausdorff envelope, we derive an exact Lipschitz regularization for this class of problems. We then show how the aforementioned result can be utilized in establishing function values-based rates of convergence in terms of the original data. Throughout, we provide examples and applications which illustrate the potential benefits of our approach.
1 Introduction
A fundamental generic optimization problem that covers various classes of convex models arising in many modern applications is the well-known composite minimization problem, which consists of minimizing the sum of two nonsmooth extended valued convex functions, one of which is composed with a linear map:
$$\begin{aligned} (\text{G})\qquad \min _{\textbf{x}\in \mathbb {R}^n} \; f(\textbf{A}\textbf{x})+w(\textbf{x}), \end{aligned}$$(1.1)
where both \(f: \mathbb {R}^m \rightarrow (-\infty ,\infty ]\) and \(w: \mathbb {R}^n \rightarrow (-\infty ,\infty ]\) are proper closed and convex and \(\textbf{A}\in \mathbb {R}^{m \times n}\).
This model is very rich and under specific assumptions on the problem’s data, it has led to the development of fundamental primal and primal-dual optimization algorithms, see e.g., [2, 3, 6] and references therein.
Simple algorithms for solving (G) are based on primal first order methods, whereby we suppose that only w admits a computationally tractable proximal map [13], and we obviously want to avoid the proximal computation of \(\textbf{x}\mapsto f(\textbf{A}\textbf{x})\), which in general is intractable even when f is prox-tractable (see footnote 1). A central property required in the non-asymptotic convergence rate analysis (iteration complexity) in terms of function values of such primal methods is the Lipschitz continuity of the function f. Therefore, whenever the Lipschitz continuity property for the function f is absent, as may occur in many applications modeled by problem (G), the use of simple primal-based methods might be impossible. Two examples of such simple algorithms, in which w is prox-tractable and the Lipschitz continuity of f is required, are:
(a) Proximal subgradient method [15, 20]. The proximal subgradient method takes the form (see footnote 2) \(\textbf{x}^{k+1} = \textrm{prox}_{t_k w}(\textbf{x}^k-t_k \textbf{A}^T f'(\textbf{A}\textbf{x}^k))\), where \(t_k>0\) is a stepsize and, for any proper closed and convex function \(s: \mathbb {R}^n \rightarrow (-\infty ,\infty ]\),
$$\begin{aligned} \textrm{prox}_{s}(\textbf{x}) = \displaystyle \mathop {\text{ argmin }}_{\textbf{u}} \left\{ s(\textbf{u})+\frac{1}{2}\Vert \textbf{u}-\textbf{x}\Vert ^2 \right\} \end{aligned}$$
stands for the proximal map of s [13]. The Lipschitz continuity of f is, however, a key assumption needed for establishing a rate of convergence [2, Section 9.3]; see the sketch after this list.
(b) Smoothing-based methods. A common way to solve (G) is to replace f by a smooth approximation \(f_{\mu }\) (\(\mu >0\) is a smoothing parameter), where by “smooth approximation” we mean that \(f_{\mu }\) is \(\frac{\alpha }{\mu }\)-smooth (\(\alpha >0\)) and satisfies
$$\begin{aligned} (AS)\qquad f_{\mu }(\textbf{x}) \le f(\textbf{x}) \le f_{\mu }(\textbf{x})+\beta \mu , \qquad \text {for some parameter } \beta >0. \end{aligned}$$
Then, an accelerated proximal gradient method is employed on the smooth problem \(\min f_{\mu }(\textbf{A}\textbf{x})+w(\textbf{x})\) [4, 14]. The latter approach, in which the smoothing parameter is fixed in advance, can also be refined into an adaptive smoothing scheme which employs one iteration of an accelerated method on the function \(f_{\mu _k}(\textbf{A}\textbf{x})+w(\textbf{x})\), where \(\mu _k\) is a decreasing sequence that diminishes to zero as k, the dynamic iteration index, increases; see for instance [7, 21]. The existence of such smooth approximations satisfying (AS) is guaranteed when f is Lipschitz continuous. Unfortunately, in general such a guarantee does not exist.
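To make the first scheme concrete, the following sketch (our illustration, not taken from the references) implements the proximal subgradient step \(\textbf{x}^{k+1} = \textrm{prox}_{t_k w}(\textbf{x}^k-t_k \textbf{A}^T f'(\textbf{A}\textbf{x}^k))\) on an instance where f is Lipschitz continuous, namely \(f(\textbf{z}) = \Vert \textbf{z}-\textbf{b}\Vert _1\) and \(w = \lambda \Vert \cdot \Vert _1\), whose proximal map is soft-thresholding; the data names and the stepsize rule are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    # prox of tau*||.||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_subgradient(A, b, lam, x0, num_iters=5000):
    # x^{k+1} = prox_{t_k w}(x^k - t_k A^T f'(A x^k)) with
    # f(z) = ||z - b||_1 (Lipschitz) and w = lam*||.||_1
    x = x0.copy()
    for k in range(1, num_iters + 1):
        g = A.T @ np.sign(A @ x - b)   # a subgradient of x -> ||Ax - b||_1
        t = 1.0 / np.sqrt(k)           # classical diminishing stepsize
        x = soft_threshold(x - t * g, t * lam)
    return x
```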
When both f and w admit computationally efficient proximal maps [13], one can consider tackling the composite model (G) by applying primal-dual Lagrangian based methods, such as the popular alternating direction method of multipliers (ADMM) scheme [9] and its related variants; see for instance [5, 8, 9, 12, 19] and references therein. However, to obtain rates of convergence in terms of function values for these methods, some type of Lipschitz continuity is often required (see for example [8, Remark 3] and [19]), while improved types of convergence results can be derived only under additional assumptions, see e.g., [18].
Contribution and Outline We introduce a theoretical framework where the restrictive Lipschitz continuity of the function f of problem (G) is not required. The derivation and the development of our results rely on a powerful fact involving the so-called Pasch-Hausdorff (PH) envelope of a function [10], which consists of the infimal convolution of the given function with a penalized norm, and which generates a Lipschitz continuous function. This is presented in Sect. 2, where we also derive a new dual formulation of the PH envelope which is a key player in our analysis. The main idea is then to replace the function f with its PH envelope, which allows us to construct an exact Lipschitz regularization of problem (G); a simple and useful property which appears to have been overlooked in the literature. We prove that as long as the PH parameter is larger than a dual optimal bound, problem (G) and its exact Lipschitz regularization counterpart are equivalent; see Sect. 3. In Sect. 4 we show how the aforementioned equivalence result can be utilized in establishing function values-based rates of convergence in terms of the original data. A bound on the norm of the dual optimal solution, as required by the equivalence result, is not always easy to derive. We address this issue in Sect. 5, where we show that given a Slater point for the general convex model (G), we can evaluate such a bound in terms of this Slater point, without actually needing to compute the dual problem. Throughout the paper, we provide examples and applications which illustrate the potential benefits of our approach.
Notation Vectors are denoted by boldface lowercase letters, e.g., \(\textbf{y}\), and matrices by boldface uppercase letters, e.g., \(\textbf{B}\). The vectors of all zeros and ones are denoted by \(\textbf{0}\) and \(\textbf{e}\) respectively. The underlying spaces are \(\mathbb {R}^n\)-spaces endowed with an inner product \(\langle \cdot , \cdot \rangle \). The closed ball with center \(\textbf{c}\in \mathbb {R}^n\) and radius \(r>0\) w.r.t. a norm \(\Vert \cdot \Vert _a\) is denoted by \(B_a[\textbf{c},r] = B_{\Vert \cdot \Vert _a}[\textbf{c},r]=\{\textbf{x}\in \mathbb {R}^n: \Vert \textbf{x}-\textbf{c}\Vert _a \le r\}\) and the corresponding open ball by \(B_a(\textbf{c},r) = \{\textbf{x}\in \mathbb {R}^n: \Vert \textbf{x}-\textbf{c}\Vert _a < r\}\). Given a matrix \(\textbf{A}\in \mathbb {R}^{m \times n}\), \(\Vert \textbf{A}\Vert \) denotes its spectral norm: \(\Vert \textbf{A}\Vert = \sqrt{\lambda _{\max }(\textbf{A}^T \textbf{A})}\). We use the standard notation \([n]\equiv \{1,2,\ldots ,n\}\) for a positive integer n. For any extended real-valued function h, the conjugate is defined as \(h^*(\textbf{y}) \equiv \max _{\textbf{x}} \left\{ \langle \textbf{x},\textbf{y}\rangle - h(\textbf{x}) \right\} \). For a given set S, the indicator function \(\delta _S(\textbf{x})\) is equal to 0 if \(\textbf{x}\in S\) and \(\infty \) otherwise. Further standard definitions or notations in convex analysis which are not explicitly mentioned here can be found in the classical book [17].
2 The Pasch-Hausdorff Lipschitz Regularization
Assume that \(\mathbb {R}^m\) is endowed with some norm \(\Vert \cdot \Vert _a\). The dual norm is denoted by \(\Vert \cdot \Vert _a^*\) (not to be confused with the Fenchel conjugate). A natural way to “transform” a function \(h: \mathbb {R}^m \rightarrow (-\infty , \infty ]\) into a Lipschitz continuous function is via the Pasch-Hausdorff (PH) envelope [1, Section 12.3], defined for a parameter \(M>0\) as the infimal convolution of h with the penalized norm \(M\Vert \cdot \Vert _a\):
$$\begin{aligned} h^{[M]}(\textbf{x}) := (h \, \Box \, (M\Vert \cdot \Vert _a))(\textbf{x}) = \inf _{\textbf{y}\in \mathbb {R}^m} \left\{ h(\textbf{y})+M\Vert \textbf{x}-\textbf{y}\Vert _a \right\} . \end{aligned}$$(2.1)
It is well known [1, Proposition 12.17] that if a proper function h has an M-Lipschitz minorant (w.r.t. \(\Vert \cdot \Vert _a\)), then \(h^{[M]}\) is the largest M-Lipschitz minorant of h, and the only other case is when \(h^{[M]} \equiv -\infty \). This result does not require any convexity assumption on h.
2.1 A Dual Representation of The Pasch-Hausdorff Envelope
In our setting of problem (G), f is proper closed and convex. In this case, we will now show that the PH envelope admits a dual representation that will be essential to our analysis. This property is stated in the following lemma. For the sake of completeness, we also state and prove the elementary properties establishing that \(f^{[M]}\) is an M-Lipschitz minorant of f. Before proceeding, recall that for any set C, \(\delta _C^*(\textbf{y}) = \sigma _C(\textbf{y}):=\max \{\langle \textbf{y}, \textbf{x}\rangle : \textbf{x}\in C\}\), and \(\displaystyle \mathop {\textrm{ri}}(C)\) stands for the relative interior of C, which is nonempty whenever the set C is nonempty and convex [17, Theorem 6.2].
Lemma 2.1
(Dual representation of \(f^{[M]}\)) Let \(f:\mathbb {R}^m \rightarrow (-\infty ,\infty ]\) be a proper closed and convex function. Suppose that there exists \(\hat{\textbf{y}} \in \displaystyle \mathop {\textrm{dom}}(f^*)\) such that \(\Vert \hat{\textbf{y}}\Vert _a^* < M\) for some \(M>0\). Then
(a) It holds that
$$\begin{aligned} f^{[M]} = (f^*+\delta _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]})^*; \end{aligned}$$(2.2)
(b) \(f^{[M]}\) is real-valued and convex and the minimal value in (2.1) is attained;
(c) [1, Proposition 12.17] \(f^{[M]}(\textbf{x})\le f(\textbf{x})\) for all \(\textbf{x}\);
(d) [1, Proposition 12.17] \(f^{[M]}: \mathbb {R}^m \rightarrow \mathbb {R}\) is M-Lipschitz continuous w.r.t. the norm \(\Vert \cdot \Vert _a\).
Proof
(a+b) By [17, Theorem 16.4], if \(\displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f^*)) \cap B_{\Vert \cdot \Vert _a^*}(\textbf{0},M) \ne \emptyset \), then
$$\begin{aligned} (f^*+\delta _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]})^* = f^{**} \, \Box \, \delta _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]}^*. \end{aligned}$$
Since \(f^{**}=f\) (as f is proper closed and convex), and \(\delta _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]}^* = \sigma _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]} = M \Vert \cdot \Vert _a\), we obtain that \((f^*+\delta _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]})^*= f \Box (M \Vert \cdot \Vert _a)=f^{[M]}\). The result [17, Theorem 16.4] also establishes the finiteness and attainment of the minimal value in (2.1). The convexity of \(f^{[M]}\) follows by the fact that it is a conjugate function, see [2, Theorem 4.3]. What is left is to show that \(\displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f^*)) \cap B_{\Vert \cdot \Vert _a^*}(\textbf{0},M) \ne \emptyset \). Indeed, since \(\displaystyle \mathop {\textrm{dom}}(f^*)\) is convex and nonempty (by the convexity and properness of f [2, Theorem 4.5]), it follows that there exists \(\tilde{\textbf{y}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f^*))\). Therefore, recalling that \(\hat{\textbf{y}} \in \displaystyle \mathop {\textrm{dom}}(f^*)\), by the line segment principle, for any \(\lambda \in (0,1)\) we have that \(\hat{\textbf{y}}+\lambda (\tilde{\textbf{y}}-\hat{\textbf{y}}) \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f^*))\). Thus, we can take \(\tilde{\lambda } \in (0,1)\) small enough for which \(\hat{\textbf{y}}+\tilde{\lambda }(\tilde{\textbf{y}}-\hat{\textbf{y}}) \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f^*)) \cap B_{\Vert \cdot \Vert _a^*}(\textbf{0},M).\)
(c) Follows from the following elementary argument:
$$\begin{aligned} f^{[M]}(\textbf{x}) = \min _{\textbf{y}} \left\{ f(\textbf{y})+M\Vert \textbf{x}-\textbf{y}\Vert _a \right\} \le f(\textbf{x})+M\Vert \textbf{x}-\textbf{x}\Vert _a = f(\textbf{x}). \end{aligned}$$
(d) Note that by part (b) \(f^{[M]}\) is real-valued. Then by the triangle inequality, for any \(\textbf{x},\textbf{y}\in \mathbb {R}^m\),
$$\begin{aligned} f^{[M]}(\textbf{x}) = \min _{\textbf{z}} \left\{ f(\textbf{z})+M\Vert \textbf{x}-\textbf{z}\Vert _a \right\} \le \min _{\textbf{z}} \left\{ f(\textbf{z})+M\Vert \textbf{y}-\textbf{z}\Vert _a \right\} +M\Vert \textbf{x}-\textbf{y}\Vert _a = f^{[M]}(\textbf{y})+M\Vert \textbf{x}-\textbf{y}\Vert _a. \end{aligned}$$
Changing the roles of \(\textbf{x}\) and \(\textbf{y}\) we also obtain that \(f^{[M]}(\textbf{y}) \le f^{[M]}(\textbf{x}) + M\Vert \textbf{x}-\textbf{y}\Vert _a\), thus establishing the desired result that \(|f^{[M]}(\textbf{x})-f^{[M]}(\textbf{y})| \le M \Vert \textbf{x}-\textbf{y}\Vert _a\) for any \(\textbf{x},\textbf{y}\in \mathbb {R}^m\). \(\square \)
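As a quick sanity check of definition (2.1), consider the one-dimensional function \(f(x)=x^2\), for which the infimal convolution with \(M|\cdot |\) can be computed by hand: \(f^{[M]}(x)=x^2\) for \(|x| \le M/2\) and \(f^{[M]}(x)=M|x|-M^2/4\) otherwise, an M-Lipschitz minorant of f. The following sketch (our illustration, not taken from the paper) compares this closed form with a brute-force evaluation of (2.1) on a grid.

```python
import numpy as np

def ph_square(x, M):
    # PH envelope of f(x) = x^2, computed by hand from (2.1)
    return x**2 if abs(x) <= M / 2 else M * abs(x) - M**2 / 4

M = 1.5
ys = np.linspace(-10.0, 10.0, 200001)            # grid for the brute-force min
for x in [-3.0, -0.5, 0.0, 0.7, 4.0]:
    brute = np.min(ys**2 + M * np.abs(x - ys))   # min_y f(y) + M|x - y|
    assert abs(brute - ph_square(x, M)) < 1e-3
```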
2.2 Some Examples of PH Envelopes
Obviously, computing the PH envelope can be a challenging task. In this section we describe several cases in which its evaluation is tractable. In what follows, for any nonempty set C the distance function with respect to a norm \(\Vert \cdot \Vert _a\) is defined by \(d_{C,\Vert \cdot \Vert _a}(\textbf{x}) = \min _{\textbf{y}\in C} \Vert \textbf{y}-\textbf{x}\Vert _a\). If the distance function is with respect to the Euclidean norm \(\Vert \cdot \Vert = \sqrt{\langle \cdot ,\cdot \rangle }\), then we will simply write \(d_C\).
Example 2.1
(Indicator function) Suppose \(f=\delta _C\) where C is a nonempty closed and convex set. Then the PH envelope of f is given by
$$\begin{aligned} f^{[M]}(\textbf{x}) = \min _{\textbf{y}} \left\{ \delta _C(\textbf{y})+M\Vert \textbf{x}-\textbf{y}\Vert _a \right\} = M \min _{\textbf{y}\in C} \Vert \textbf{x}-\textbf{y}\Vert _a = M d_{C,\Vert \cdot \Vert _a}(\textbf{x}). \end{aligned}$$
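Whenever the distance to C is computable, so is the envelope. A minimal sketch, under the assumption (ours, for illustration) that C is a Euclidean ball \(B[\textbf{0},\rho ]\), for which \(d_C(\textbf{x}) = \max \{\Vert \textbf{x}\Vert _2-\rho ,0\}\):

```python
import numpy as np

def ph_envelope_indicator_ball(x, M, rho):
    # f = delta_C with C = B[0, rho] (Euclidean ball): f^{[M]}(x) = M*d_C(x)
    return M * max(np.linalg.norm(x) - rho, 0.0)
```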
Example 2.2
(Ball-pen) Consider the so-called “ball-pen” function \(f: \mathbb {R}^n \rightarrow (-\infty ,\infty ]\) given by \(f(\textbf{x}) = -\sqrt{1-\Vert \textbf{x}\Vert _2^2}\) with \(\displaystyle \mathop {\textrm{dom}}(f) = B[\textbf{0},1]\). Here we assume that \(\Vert \cdot \Vert _a = \Vert \cdot \Vert _2\). This function, being extended real-valued, is obviously not Lipschitz continuous. A rather technical argument that uses the dual representation (2.2) shows that the M-Lipschitz PH envelope is given by (see Appendix A)
$$\begin{aligned} f^{[M]}(\textbf{x}) = {\left\{ \begin{array}{ll} -\sqrt{1-\Vert \textbf{x}\Vert _2^2}, &{} \Vert \textbf{x}\Vert _2 < \frac{M}{\sqrt{1+M^2}}, \\ M\Vert \textbf{x}\Vert _2-\sqrt{1+M^2}, &{} \Vert \textbf{x}\Vert _2 \ge \frac{M}{\sqrt{1+M^2}}. \end{array}\right. } \end{aligned}$$(2.3)
A one-dimensional illustration is given in Fig. 1.
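The following sketch (our illustration) numerically verifies the closed form (2.3) in one dimension by comparing it with a brute-force evaluation of the infimal convolution over a fine grid of \(\displaystyle \mathop {\textrm{dom}}(f) = [-1,1]\).

```python
import numpy as np

def ball_pen_envelope(x, M):
    # closed form (2.3) in one dimension (||x||_2 = |x|)
    nrm = np.abs(x)
    thr = M / np.sqrt(1.0 + M**2)
    return np.where(nrm < thr,
                    -np.sqrt(np.maximum(1.0 - nrm**2, 0.0)),
                    M * nrm - np.sqrt(1.0 + M**2))

M = 2.0
ys = np.linspace(-1.0, 1.0, 20001)                # grid over dom(f) = [-1, 1]
f_vals = -np.sqrt(1.0 - ys**2)
for x in [0.0, 0.5, 0.9, 1.5, -3.0]:
    brute = np.min(f_vals + M * np.abs(x - ys))   # min_y f(y) + M|x - y|
    assert abs(brute - ball_pen_envelope(x, M)) < 1e-3
```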
Example 2.3
(Minus sum of logs) Let \(\textbf{b}\in \mathbb {R}^m\) and define \(f(\textbf{z}):= \sum _{i=1}^m f_i(z_i)\), where
$$\begin{aligned} f_i(z_i) = {\left\{ \begin{array}{ll} -\log (z_i-b_i), &{} z_i > b_i, \\ \infty , &{} \text {else}. \end{array}\right. } \end{aligned}$$
We want to find the M-Lipschitz PH envelope of f w.r.t. the \(\ell _1\)-norm:
$$\begin{aligned} f^{[M]}(\textbf{z}) = \min _{\textbf{u}\in \mathbb {R}^m} \left\{ \sum _{i=1}^m f_i(u_i)+M\Vert \textbf{z}-\textbf{u}\Vert _1 \right\} = \sum _{i=1}^m \min _{u_i \in \mathbb {R}} \left\{ f_i(u_i)+M|z_i-u_i| \right\} . \end{aligned}$$(2.4)
Note that in the above we exploited the separability of the \(\ell _1\)-norm, which demonstrates the fact that the choice of norm might be essential to the tractability of computing the PH envelope. Indeed, in this case, computing an explicit expression for the PH envelope under the \(\ell _2\)-norm, for example, seems to be a difficult task. By (2.4),
$$\begin{aligned} f^{[M]}(\textbf{z}) = \sum _{i=1}^m h_{b_i}^{[M]}(z_i), \end{aligned}$$
where for any \(c,z \in \mathbb {R}\) we define
$$\begin{aligned} h_c^{[M]}(z) := \min _{u} \left\{ -\log (u-c)+M|z-u|: u>c \right\} . \end{aligned}$$(2.5)
Thus, computing \(f^{[M]}\) amounts to solving the one-dimensional problem (2.5). An explicit expression for \(h_c^{[M]}\) is
$$\begin{aligned} h_c^{[M]}(z) = {\left\{ \begin{array}{ll} -\log (z-c), &{} z \ge c+\frac{1}{M}, \\ M(c-z)+\log M+1, &{} z < c+\frac{1}{M}. \end{array}\right. } \end{aligned}$$
The validity of the above expression for \(h_c^{[M]}\) is shown in Appendix B.
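The sketch below (our illustration) checks this explicit expression against a brute-force minimization of the one-dimensional problem (2.5) over a grid of the domain \((c,\infty )\).

```python
import numpy as np

def h_envelope(z, c, M):
    # explicit expression for h_c^{[M]}(z) derived in Appendix B
    if z >= c + 1.0 / M:
        return -np.log(z - c)
    return M * (c - z) + np.log(M) + 1.0

c, M = 0.5, 3.0
us = c + np.logspace(-8, 2, 400000)       # grid over the domain (c, inf)
for z in [0.4, 0.5, 0.9, 2.0]:
    brute = np.min(-np.log(us - c) + M * np.abs(z - us))
    assert abs(brute - h_envelope(z, c, M)) < 1e-3
```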
3 Exact Lipschitz Regularization for Model (G)
We now return to the general model (G) (equation (1.1)) under the conditions that f and w are proper closed and convex. The main idea is to replace the function f with its PH envelope to obtain the Lipschitz regularized problem
$$\begin{aligned} (\text{G}_M)\qquad \min _{\textbf{x}\in \mathbb {R}^n} \; f^{[M]}(\textbf{A}\textbf{x})+w(\textbf{x}). \end{aligned}$$
The main question that we wish to address is
Under which conditions are problems (G) and (G\(_M\)) equivalent?
By “equivalent” we mean that the optimal sets of the two problems are identical. We will show in Theorem 3.1 below that as long as M is larger than a bound on the optimal set of the dual problem, such an equivalence holds. Since duality arguments are essential in our analysis, we first recall the well-known dual problem of (G):
$$\begin{aligned} (\text{DG})\qquad \max _{\textbf{y}} \left\{ -f^*(\textbf{y})-w^*(-\textbf{A}^T\textbf{y})\right\} . \end{aligned}$$
According to [17, Corollary 31.2.1], to guarantee strong duality, it is sufficient that the constraint qualification
$$\begin{aligned} \exists \hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(w)) \text { such that } \textbf{A}\hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f)) \end{aligned}$$(3.1)
holds. We are now ready to answer the main question stated above.
Theorem 3.1
(Equivalence between (G) and (G\(_M\))) Suppose that f and w are proper closed and convex functions and that condition (3.1) holds. In addition, assume that \(\text{ val }(G)>-\infty \). Let \(\textbf{y}^*\) be an optimal solution of the dual problem (DG) and let \(M>\Vert \textbf{y}^*\Vert _a^*\). Then
(a) Problems (G) and \((G_M)\) have the same optimal value.
(b) If \(\textbf{x}^*\) is an optimal solution of problem \((G_M)\), then \(f^{[M]}(\textbf{A}\textbf{x}^*)=f(\textbf{A}\textbf{x}^*)\).
(c) Problems (G) and \((G_M)\) have the same optimal sets.
Proof
(a) By condition (3.1) and the finiteness of \(\text{ val }(G)\), it follows that \(\text{ val }(G)=\text{ val }(DG)\). Since the optimal solution \(\textbf{y}^*\) of the dual problem satisfies \(\Vert \textbf{y}^*\Vert _a^* <M\), it follows that \(\text{ val }(DG)=\text{ val }(R)\), where (R) is the problem
$$\begin{aligned} (\text{ R}) \quad \max _{\textbf{y}} \left\{ -f^*(\textbf{y})-\delta _{B_{\Vert \cdot \Vert _a^*}[\textbf{0},M]}(\textbf{y})-w^*(-\textbf{A}^T\textbf{y})\right\} . \end{aligned}$$
Note that by the dual representation (2.2), (R) is actually
$$\begin{aligned} \max _{\textbf{y}} \left\{ -(f^{[M]})^*(\textbf{y})-w^*(-\textbf{A}^T\textbf{y})\right\} , \end{aligned}$$
meaning that (R) is the dual problem to \((G_M)\). In particular, \(\text{ val }(G_M) \ge \text{ val }(R)\) is finite. Moreover, by Lemma 2.1(b), \(\displaystyle \mathop {\textrm{dom}}(f^{[M]})=\mathbb {R}^m\), and thus the condition
$$\begin{aligned} \exists \hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}w), \textbf{A}\hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(f^{[M]})) \end{aligned}$$
amounts to “\(\exists \hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}w)\)”, which trivially holds as the relative interior of a nonempty convex set is always nonempty. Consequently, strong duality between problems (R) and \((G_M)\) holds. We can finally conclude that
$$\begin{aligned} \text{ val }(G_M) = \text{ val }(R) = \text{ val }(DG) = \text{ val }(G). \end{aligned}$$
(b) Note the following observation that follows from part (a): for any \(N>\Vert \textbf{y}^*\Vert _a^*\), it holds that \(\text{ val }(G_N) = \text{ val }(G)\). Suppose that \(\textbf{x}^*\) is an optimal solution of \((G_M)\). Then
$$\begin{aligned} f^{[M]}(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*) = \text{ val }(G). \end{aligned}$$
Assume by contradiction that \(f^{[M]}(\textbf{A}\textbf{x}^*) \ne f(\textbf{A}\textbf{x}^*)\). Then this means that there exists \(\textbf{z}\ne \textbf{A}\textbf{x}^*\) such that \(f^{[M]}(\textbf{A}\textbf{x}^*) = M \Vert \textbf{z}-\textbf{A}\textbf{x}^*\Vert _a+f(\textbf{z})\). Take \(M' \in (\Vert \textbf{y}^*\Vert _a^*, M)\); then \(f^{[M']}(\textbf{A}\textbf{x}^*) \le M' \Vert \textbf{z}-\textbf{A}\textbf{x}^*\Vert _a+f(\textbf{z})< M \Vert \textbf{z}-\textbf{A}\textbf{x}^*\Vert _a+f(\textbf{z}) = f^{[M]}(\textbf{A}\textbf{x}^*)\), and therefore
$$\begin{aligned} \text{ val }(G_{M'}) \le f^{[M']}(\textbf{A}\textbf{x}^*) +w(\textbf{x}^*) <f^{[M]}(\textbf{A}\textbf{x}^*) +w(\textbf{x}^*)=\text{ val }(G_M), \end{aligned}$$
which contradicts the observation indicated at the beginning of the proof of this part.
(c) Assume that \(\textbf{x}^*\) is an optimal solution of (G). Then
$$\begin{aligned} \text{ val }(G) = f(\textbf{A}\textbf{x}^*) +w(\textbf{x}^*) {\mathop {\ge }\limits ^{(*)}} f^{[M]}(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*) \ge \text{ val }(G_M), \end{aligned}$$
where \((*)\) follows from Lemma 2.1(c). However, since by part (a) we have that \(\text{ val }(G_M)=\text{ val }(G)\), it follows that \( f^{[M]}(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*) = \text{ val }(G_M)\) and hence that \(\textbf{x}^*\) is an optimal solution of \((G_M)\). In the opposite direction, assume that \(\textbf{x}^*\) is an optimal solution of problem \((G_M)\). Then by part (b), \(f^{[M]}(\textbf{A}\textbf{x}^*)=f(\textbf{A}\textbf{x}^*)\) and consequently,
$$\begin{aligned} \text{ val }(G_M) = f^{[M]}(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*) = f(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*) \ge \text{ val }(G), \end{aligned}$$
and since \(\text{ val }(G)=\text{ val }(G_M)\) (part (a)) we conclude that \(f(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*) = \text{ val }(G)\), meaning that \(\textbf{x}^*\) is an optimal solution of (G).
\(\square \)
Remark 3.1
(Exact Penalty Viewpoint) Theorem 3.1 can be shown as a consequence of a result of Han and Mangasarian [11] on sufficient conditions for exact penalty functions. Specifically, we can rewrite problem (G) as
$$\begin{aligned} (\text{G-Con})\qquad \min _{\textbf{x},\textbf{z}} \left\{ f(\textbf{z})+w(\textbf{x}): \textbf{A}\textbf{x}= \textbf{z}\right\} \end{aligned}$$
and consider the penalized problem
$$\begin{aligned} (\text{G-Pen}_{M})\qquad \min _{\textbf{x},\textbf{z}} \left\{ f(\textbf{z})+w(\textbf{x})+M\Vert \textbf{A}\textbf{x}-\textbf{z}\Vert _a \right\} . \end{aligned}$$
Fixing \(\textbf{x}\) and minimizing with respect to \(\textbf{z}\), we obtain problem \((G_M)\). Theorem 4.9 from [11] shows that indeed, when M is larger than the dual norm of an optimal dual solution, problem \((\text{ G-Pen}_{M})\) has the same optimal set as problem (G-Con), which readily implies that the optimal sets of (G) and \((G_M)\) coincide. Our simple proof, which is of independent interest, reveals the benefit of the PH envelope for the exact penalty approach. Additionally, our work highlights that existing penalty approaches implicitly generate Lipschitz continuous functions, a property crucial for convergence rate analysis.
What remains is of course the question of how to find a bound on the optimal set of the dual problem. This issue will be studied in Sect. 5.
The objective function of problem \((G_M)\) includes the Lipschitz continuous component \(f^{[M]}\). This enables the use of a basic first-order method to achieve non-asymptotic rates of convergence in terms of function values, as explained in the introduction. However, these rates of convergence will depend on the PH envelope \(f^{[M]}\). In the next section we show that in the case where f is an indicator function, rates of convergence in terms of the original data can be obtained.
4 Algorithm Iteration Complexity for a Constrained Model
We focus on the important constrained model
$$\begin{aligned} (\text{Q})\qquad \min \left\{ w(\textbf{x}): \textbf{A}\textbf{x}\in C \right\} , \end{aligned}$$
where \(w: \mathbb {R}^n \rightarrow (-\infty ,\infty ]\) is proper closed and convex, \(\textbf{A}\in \mathbb {R}^{m \times n}\) and \(C \subseteq \mathbb {R}^m\) is a nonempty closed and convex set. Model (Q) fits model (G) with \(f = \delta _C\).
By Example 2.1, \(f^{[M]}(\textbf{x})=M d_{C,\Vert \cdot \Vert _a}(\textbf{x})\), and hence the M-Lipschitz regularization of problem (Q) is
$$\begin{aligned} (\text{Q}_M)\qquad \min _{\textbf{x}} \left\{ M d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x})+w(\textbf{x}) \right\} . \end{aligned}$$
For example, if \(C = \{\textbf{b}\}\), meaning that problem (Q) is \(\min \{w(\textbf{x}): \textbf{A}\textbf{x}= \textbf{b}\}\), then \((Q_M)\) has the form
$$\begin{aligned} \min _{\textbf{x}} \left\{ M \Vert \textbf{A}\textbf{x}-\textbf{b}\Vert _a+w(\textbf{x}) \right\} . \end{aligned}$$
If \(C = \{ \textbf{z}: \textbf{z}\le \textbf{b}\}\), meaning that problem (Q) is \(\min \{ w(\textbf{x}): \textbf{A}\textbf{x}\le \textbf{b}\}\), then in the case where \(\Vert \cdot \Vert _a = \Vert \cdot \Vert _2\), \((Q_M)\) has the form
$$\begin{aligned} \min _{\textbf{x}} \left\{ M \Vert [\textbf{A}\textbf{x}-\textbf{b}]_+\Vert _2+w(\textbf{x}) \right\} , \end{aligned}$$
where \([\cdot ]_+\) denotes the componentwise positive part.
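A minimal sketch of the two regularized objectives above (with, as an assumption made for concreteness, \(w = \Vert \cdot \Vert _1\)):

```python
import numpy as np

def q_m_equality(x, A, b, M):
    # (Q_M) for C = {b}:  M*||Ax - b||_2 + ||x||_1
    return M * np.linalg.norm(A @ x - b) + np.linalg.norm(x, 1)

def q_m_inequality(x, A, b, M):
    # (Q_M) for C = {z : z <= b}:  M*||[Ax - b]_+||_2 + ||x||_1
    return M * np.linalg.norm(np.maximum(A @ x - b, 0.0)) + np.linalg.norm(x, 1)
```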
Recall that thanks to Theorem 3.1, problem (Q\(_M\)) has the same optimal set as (Q) under the following assumption.
Assumption 1
(a) w is proper closed and convex.
(b) C is nonempty closed and convex.
(c) \(\text{ val }(Q):=\min _{\textbf{A}\textbf{x}\in C} w(\textbf{x}) >-\infty \).
(d) \(\exists \hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}w), \textbf{A}\hat{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(C)\).
(e) \(M>\Vert \textbf{y}^*\Vert _a^*\) for some optimal solution \(\textbf{y}^*\) of the dual problem.
Suppose now that we have an algorithm for solving problem \((\text{ Q}_M)\), and that the sequence \(\{\textbf{x}^k\}_{k \ge 0}\) generated by the algorithm satisfies the following complexity result in terms of function values:
$$\begin{aligned} M d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k)+w(\textbf{x}^k)-\left( M d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^*)+w(\textbf{x}^*)\right) \le \alpha (k), \qquad k \ge 0, \end{aligned}$$(4.1)
where \(\alpha : \mathbb {R}_{++} \rightarrow \mathbb {R}_+\) satisfies
$$\begin{aligned} \alpha (k) \rightarrow 0 \; \text { as } \; k \rightarrow \infty , \end{aligned}$$(4.2)
and \(\textbf{x}^*\) is an optimal solution of problem \((\text{ Q}_M)\). We will now show that the complexity result (4.1) can be translated to a complexity result in terms of the original problem (Q) in the sense that we get an \(\alpha (k)\)-rate of convergence in terms of the original objective function and the constraint violation.
Theorem 4.1
Suppose that Assumption 1 holds for model (Q), and assume that a sequence \(\{\textbf{x}^k\}_{k \ge 0}\) satisfies (4.1) with \(\alpha : \mathbb {R}_{++} \rightarrow \mathbb {R}_+\) satisfying (4.2) and \(\textbf{x}^*\) being an optimal solution of problem \((\text{ Q}_M)\). Then \(\textbf{x}^*\) is an optimal solution of (Q) and the following holds for all \(k \ge 0\):
$$\begin{aligned} w(\textbf{x}^k)-w(\textbf{x}^*) \le \alpha (k), \qquad d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k) \le \frac{2\alpha (k)}{M-\Vert \textbf{y}^*\Vert _a^*}. \end{aligned}$$
Proof
By Theorem 3.1, \(\textbf{x}^*\) is also an optimal solution of problem (Q), and thus, in particular, \(d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^*)=0\). Therefore, (4.1) can be rewritten as
$$\begin{aligned} M d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k)+w(\textbf{x}^k)-w(\textbf{x}^*) \le \alpha (k). \end{aligned}$$(4.3)
By the nonnegativity of the distance function, it follows that \(w(\textbf{x}^k)-w(\textbf{x}^*)\le \alpha (k).\) Take \(M' =\frac{\Vert \textbf{y}^*\Vert _a^*+M}{2}\). Then (4.3) can be written as
$$\begin{aligned} M' d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k)+w(\textbf{x}^k)-w(\textbf{x}^*)+(M-M') d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k) \le \alpha (k). \end{aligned}$$(4.4)
By Assumption 1(e), \(M>\Vert \textbf{y}^*\Vert _a^*\), and hence \(M'>\Vert \textbf{y}^*\Vert _a^*\); it thus follows by Theorem 3.1 that \(\textbf{x}^*\) is also a minimizer of
$$\begin{aligned} \min _{\textbf{x}} \left\{ M' d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x})+w(\textbf{x}) \right\} , \end{aligned}$$
which implies in particular that
$$\begin{aligned} M' d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k)+w(\textbf{x}^k)-w(\textbf{x}^*) \ge 0, \end{aligned}$$
and thus, by (4.4), we conclude that
$$\begin{aligned} d_{C,\Vert \cdot \Vert _a}(\textbf{A}\textbf{x}^k) \le \frac{\alpha (k)}{M-M'} = \frac{2\alpha (k)}{M-\Vert \textbf{y}^*\Vert _a^*}. \end{aligned}$$
\(\square \)
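As a numerical illustration of Theorem 4.1 (our sketch, with illustrative random data, \(w = \Vert \cdot \Vert _1\), \(C=\{\textbf{b}\}\), and the \(\ell _2\)-norm), one can run the proximal subgradient method of the introduction on \((\text{ Q}_M)\), whose objective is now a Lipschitz term plus a prox-tractable term, and monitor the two quantities bounded in the theorem: the objective gap and the feasibility violation.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 30
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
# penalty parameter exceeding the dual bound derived in Example 5.1 below
M = 1.01 * np.sqrt(n) / np.linalg.svd(A, compute_uv=False).min()

def soft_threshold(x, tau):
    # prox of tau*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.zeros(n)
for k in range(1, 20001):
    r = A @ x - b
    nr = np.linalg.norm(r)
    # a subgradient of x -> M*||Ax - b||_2 (take 0 at the kink r = 0)
    g = M * A.T @ (r / nr) if nr > 0 else np.zeros(n)
    t = 1.0 / (M * np.sqrt(k))          # illustrative diminishing stepsize
    x = soft_threshold(x - t * g, t)    # prox of t*||.||_1

print("feasibility violation ||Ax - b||_2:", np.linalg.norm(A @ x - b))
print("objective value      ||x||_1     :", np.linalg.norm(x, 1))
```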
5 Bounding the Dual Optimal Solution
The results in Sect. 3 assume that we are given a bound on the norm of a dual optimal solution. This bound is not always easy to derive. It is very well known that the boundedness of the dual optimal set of a given convex problem is guaranteed under the usual Slater condition, see e.g., [17]. In fact, for the classical inequality constrained convex optimization problem, it is possible to exhibit an explicit bound on the norm of dual optimal solutions. More specifically, with \(\{f_{i}\}_{i=0}^m\) convex functions on \(\mathbb {R}^d\), assume that for the convex optimization problem
$$\begin{aligned} (\text{CC})\qquad f_* = \min \left\{ f_0(\textbf{x}): f_i(\textbf{x}) \le 0, \; i \in [m] \right\} \end{aligned}$$
there exists \(\bar{\textbf{x}} \in \mathbb {R}^d\) such that \( f_i(\bar{\textbf{x}}) <0, i=1,\ldots ,m\), and that \(f_* >-\infty \). Obviously, \(\bar{\textbf{x}}\) is a Slater point of (CC). Then, it is known and easy to show (see e.g., [5, Exercise 5.3.1, p. 516]) that for any dual optimal solution \(\textbf{y}^* \in \mathbb {R}^m_{+}\), one has
$$\begin{aligned} \Vert \textbf{y}^*\Vert _1 \le \frac{f_0(\bar{\textbf{x}})-f_*}{\min _{i \in [m]}\{-f_i(\bar{\textbf{x}})\}}. \end{aligned}$$(5.1)
However, to the best of our knowledge, the derivation of such an explicit bound on an optimal dual solution of the general convex model (G) does not seem to have been addressed in the literature. In this section, we show that given a Slater point of the primal general problem (G), we can evaluate such a bound in terms of the Slater point without actually needing to compute the dual problem. We then illustrate the potential benefits of this theoretical result.
The model that we consider is our general model (G) (equation (1.1)) under the following assumption.
Assumption 2
(a) \(f: \mathbb {R}^m \rightarrow (-\infty ,\infty ]\) is proper closed and convex.
(b) \(w: \mathbb {R}^n \rightarrow (-\infty , \infty ]\) is proper closed and convex.
(c) The optimal set of (G) is nonempty.
For the sake of the analysis of this section, we will assume that \(\displaystyle \mathop {\textrm{dom}}(f)\) has the structure
$$\begin{aligned} \displaystyle \mathop {\textrm{dom}}(f) = \{\textbf{b}\} \times C, \end{aligned}$$
where \(\textbf{b}\in \mathbb {R}^{m_1}\) and \(C \subseteq \mathbb {R}^{m_2}\) is a nonempty closed and convex set (\(m_1+m_2=m\)). We partition \(\textbf{A}\) as \(\textbf{A}= \begin{pmatrix} \textbf{A}_1 \\ \textbf{A}_2 \end{pmatrix}\), where \(\textbf{A}_1 \in \mathbb {R}^{m_1\times n}, \textbf{A}_2\in \mathbb {R}^{m_2 \times n}\). The domain of \(\textbf{x}\mapsto f(\textbf{A}\textbf{x})\) is then given by \(\{\textbf{x}: \textbf{A}_1 \textbf{x}= \textbf{b}, \textbf{A}_2 \textbf{x}\in C\}.\) We assume that \(\textbf{A}_1\) has full row rank (a mild assumption, since otherwise we can remove dependencies). We make the convention that the case \(m_1=0\) corresponds to the situation where \(\displaystyle \mathop {\textrm{dom}}(f) = \{\textbf{x}: \textbf{A}_2 \textbf{x}\in C\}\) and that the case \(m_2=0\) corresponds to the case where \(\displaystyle \mathop {\textrm{dom}}(f) = \{\textbf{x}: \textbf{A}_1 \textbf{x}= \textbf{b}\}\). The partition of a vector \(\textbf{z}\in \mathbb {R}^m\) into \(m_1\) and \(m_2\)-length vectors is given by \(\textbf{z}= (\textbf{z}_1^T,\textbf{z}_2^T)^T\), where \(\textbf{z}_1 \in \mathbb {R}^{m_1}, \textbf{z}_2 \in \mathbb {R}^{m_2}\). We will assume that \(\mathbb {R}^m\) is endowed with the norm
$$\begin{aligned} \Vert \textbf{z}\Vert _a = \Vert \textbf{z}_1\Vert _{\alpha _1}+\Vert \textbf{z}_2\Vert _{\alpha _2}, \end{aligned}$$
where \(\Vert \cdot \Vert _{\alpha _1}\) and \(\Vert \cdot \Vert _{\alpha _2}\) are norms on \(\mathbb {R}^{m_1}\) and \(\mathbb {R}^{m_2}\) respectively. The dual norm is (as before, \(\textbf{y}_1, \textbf{y}_2\) are the \(m_1\) and \(m_2\)-length blocks of \(\textbf{y}\))
$$\begin{aligned} \Vert \textbf{y}\Vert _a^* = \max \left\{ \Vert \textbf{y}_1\Vert _{\alpha _1}^*, \Vert \textbf{y}_2\Vert _{\alpha _2}^* \right\} . \end{aligned}$$
Recall that the dual of problem (G) is
$$\begin{aligned} (\text{DG})\qquad \max _{\textbf{y}} \; q(\textbf{y}). \end{aligned}$$
The dual objective function is thus
$$\begin{aligned} q(\textbf{y}) = \min _{\textbf{x},\textbf{z}} \mathcal {L}(\textbf{x},\textbf{z};\textbf{y}), \end{aligned}$$(5.2)
where \(\mathcal {L}(\textbf{x},\textbf{z};\textbf{y})\) is the Lagrangian function given by
$$\begin{aligned} \mathcal {L}(\textbf{x},\textbf{z};\textbf{y}) = f(\textbf{z})+w(\textbf{x})+\langle \textbf{y}, \textbf{A}\textbf{x}-\textbf{z}\rangle . \end{aligned}$$
Using the partitions of \(\textbf{y}\) and \(\textbf{z}\) into \(m_1\) and \(m_2\)-length vectors \(\textbf{y}= (\textbf{y}_1^T,\textbf{y}_2^T)^T\), \(\textbf{z}= (\textbf{z}_1^T,\textbf{z}_2^T)^T\), the Lagrangian can thus be rewritten as
$$\begin{aligned} \mathcal {L}(\textbf{x},\textbf{z};\textbf{y}) = f(\textbf{z})+w(\textbf{x})+\langle \textbf{y}_1, \textbf{A}_1\textbf{x}-\textbf{z}_1\rangle +\langle \textbf{y}_2, \textbf{A}_2\textbf{x}-\textbf{z}_2\rangle . \end{aligned}$$(5.3)
Strong duality of the pair (G) and (DG) is guaranteed if we assume, in addition to Assumption 2, the following Slater condition (similar to condition (3.1)):
$$\begin{aligned} \exists \bar{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(\displaystyle \mathop {\textrm{dom}}(w)) \text { such that } \textbf{A}_1 \bar{\textbf{x}} = \textbf{b}, \; \textbf{A}_2 \bar{\textbf{x}} \in \displaystyle \mathop {\textrm{ri}}(C). \end{aligned}$$
For the sake of the current analysis, we will replace the above condition with a slightly stronger condition: \(\exists \bar{\textbf{x}}: \textbf{A}_1 \bar{\textbf{x}} = \textbf{b}, \textbf{A}_2 \bar{\textbf{x}} \in \displaystyle \mathop {\textrm{int}}(C), \bar{\textbf{x}} \in \displaystyle \mathop {\textrm{int}}(\displaystyle \mathop {\textrm{dom}}(w))\). The exact assumption, in a more quantitative form, is now stated.
Assumption 3
There exist \(r>0, s>0\) and \(\bar{\textbf{x}}\) such that \(\textbf{A}_1 \bar{\textbf{x}} = \textbf{b}\), \(B_{\alpha _2}[\textbf{A}_2 \bar{\textbf{x}}, r] \subseteq C\) and \(B_2[\bar{\textbf{x}}, s] \subseteq \displaystyle \mathop {\textrm{dom}}(w)\).
As usual, if \(m_2=0\), we make the convention that Assumption 3 reduces to “there exist \(s>0\) and \(\bar{\textbf{x}}\) such that \(\textbf{A}_1 \bar{\textbf{x}} = \textbf{b}, B_2[\bar{\textbf{x}}, s] \subseteq \displaystyle \mathop {\textrm{dom}}(w)\)”, and in the case where \(m_1=0\) the assumption reduces to “there exist \(r>0, s>0\) and \(\bar{\textbf{x}}\) such that \(B_{\alpha _2}[\textbf{A}_2 \bar{\textbf{x}}, r] \subseteq C, B_2[\bar{\textbf{x}}, s] \subseteq \displaystyle \mathop {\textrm{dom}}(w)\)”.
We are now ready to prove the main theorem connecting an upper bound on the norm of optimal dual solutions to a given Slater point.
Theorem 5.1
(Bound on optimal dual solutions) Suppose that Assumptions 2 and 3 hold with \(\bar{\textbf{x}} \in \mathbb {R}^n, r>0\) and \(s>0\). Let \(\textbf{y}\) be an optimal solution of the dual problem (DG). Then
$$\begin{aligned} \Vert \textbf{y}\Vert _a^* = \max \left\{ \Vert \textbf{y}_1\Vert _{\alpha _1}^*, \Vert \textbf{y}_2\Vert _{\alpha _2}^* \right\} \le \max \{\Lambda _1, \Lambda _2\}, \end{aligned}$$
where
$$\begin{aligned} \Lambda _2 = \frac{\max _{\textbf{d}\in B_{\alpha _2}[\textbf{0},r]} f(\bar{\textbf{z}}+\textbf{U}_2 \textbf{d})+w(\bar{\textbf{x}})-\text{ val }(G)}{r}, \qquad \bar{\textbf{z}} = \begin{pmatrix} \textbf{b}\\ \textbf{A}_2 \bar{\textbf{x}} \end{pmatrix}, \end{aligned}$$
and
$$\begin{aligned} \Lambda _1 = \frac{f(\bar{\textbf{z}})+\max _{\textbf{u}\in B_2[\textbf{0},s]} w(\bar{\textbf{x}}+\textbf{u})-\text{ val }(G)+s\Vert \textbf{A}_2\Vert _{2,\alpha _2}\Lambda _2}{s D_{2,\alpha _1^*} \sigma _{\min }(\textbf{A}_1)}, \end{aligned}$$
where \(\Vert \textbf{A}_2\Vert _{2,\alpha _2} = \max \{ \Vert \textbf{A}_2 \textbf{v}\Vert _{\alpha _2}: \Vert \textbf{v}\Vert _2=1\}\), \(\sigma _{\min }(\textbf{A}_1) = \sqrt{\lambda _{\min }(\textbf{A}_1 \textbf{A}_1^T)}\) is the minimal singular value of \(\textbf{A}_1\), \(D_{2,\alpha _1^*}\) is a constant satisfying \(\Vert \textbf{y}_1\Vert _2 \ge D_{2,\alpha _1^*} \Vert \textbf{y}_1\Vert _{\alpha _1}^*\) for all \(\textbf{y}_1 \in \mathbb {R}^{m_1}\), and \(\textbf{U}_2\in \mathbb {R}^{m \times m_2}\) is the submatrix of \(\textbf{I}_{m}\) comprising the last \(m_2\) columns.
Proof
By the definition of \(\textbf{U}_2\), for any \(\textbf{w}\in \mathbb {R}^{m_2}\), we have that \(\textbf{U}_2 \textbf{w}= \begin{pmatrix} \textbf{0}_{m_1} \\ \textbf{w}\end{pmatrix} \in \mathbb {R}^m\). Define \(\bar{\textbf{z}}_1 = \textbf{b}, \bar{\textbf{z}}_2 = \textbf{A}_2 \bar{\textbf{x}}\). For any \(\textbf{d}\in B_{\alpha _2}[\textbf{0},r], \textbf{u}\in B_2[\textbf{0},s]\), utilizing (5.2) and (5.3) and Assumption 3, we have
$$\begin{aligned} q(\textbf{y}) \le \mathcal {L}(\bar{\textbf{x}}+\textbf{u},\bar{\textbf{z}}+\textbf{U}_2\textbf{d};\textbf{y}) = f(\bar{\textbf{z}}+\textbf{U}_2\textbf{d})+w(\bar{\textbf{x}}+\textbf{u})+\langle \textbf{A}_1^T\textbf{y}_1+\textbf{A}_2^T\textbf{y}_2, \textbf{u}\rangle -\langle \textbf{d}, \textbf{y}_2\rangle , \end{aligned}$$
where the last equality follows from the relations \(\textbf{A}_1 \bar{\textbf{x}} = \bar{\textbf{z}}_1\) and \(\textbf{A}_2 \bar{\textbf{x}} = \bar{\textbf{z}}_2\). Rearranging terms, we obtain that
$$\begin{aligned} \langle \textbf{d}, \textbf{y}_2\rangle -\langle \textbf{A}_1^T\textbf{y}_1+\textbf{A}_2^T\textbf{y}_2, \textbf{u}\rangle \le f(\bar{\textbf{z}}+\textbf{U}_2\textbf{d})+w(\bar{\textbf{x}}+\textbf{u})-q(\textbf{y}). \end{aligned}$$(5.5)
Take \(\tilde{\textbf{d}}\) such that \(\Vert \tilde{\textbf{d}}\Vert _{\alpha _2}=r\) for which \(\langle \tilde{\textbf{d}}, \textbf{y}_2 \rangle = r\Vert \textbf{y}_2\Vert _{\alpha _2}^*\) (such \(\tilde{\textbf{d}}\) exists by the definition of the dual norm). Also, define
$$\begin{aligned} \tilde{\textbf{u}} = -s\frac{\textbf{A}_1^T \textbf{y}_1}{\Vert \textbf{A}_1^T \textbf{y}_1\Vert _2}, \end{aligned}$$
so that \(\langle \tilde{\textbf{u}}, \textbf{A}_1^T \textbf{y}_1 \rangle = -s \Vert \textbf{A}_1^T \textbf{y}_1\Vert _2\). Plugging \(\textbf{d}= \tilde{\textbf{d}}\) and \(\textbf{u}= \textbf{0}\) in (5.5), and using the fact that \(q(\textbf{y}) = \text{ val }(DG) = \text{ val }(G)\) by the optimality of \(\textbf{y}\) and strong duality, yields
$$\begin{aligned} r\Vert \textbf{y}_2\Vert _{\alpha _2}^* \le f(\bar{\textbf{z}}+\textbf{U}_2\tilde{\textbf{d}})+w(\bar{\textbf{x}})-q(\textbf{y}) \le \max _{\textbf{d}\in B_{\alpha _2}[\textbf{0},r]} f(\bar{\textbf{z}}+\textbf{U}_2\textbf{d})+w(\bar{\textbf{x}})-\text{ val }(G) = r\Lambda _2. \end{aligned}$$(5.6)
Plugging \(\textbf{d}=\textbf{0}\) and \(\textbf{u}= \tilde{\textbf{u}}\) in (5.5), we obtain
$$\begin{aligned} s\Vert \textbf{A}_1^T \textbf{y}_1\Vert _2 \le f(\bar{\textbf{z}})+w(\bar{\textbf{x}}+\tilde{\textbf{u}})-q(\textbf{y})+\langle \textbf{A}_2^T \textbf{y}_2, \tilde{\textbf{u}}\rangle . \end{aligned}$$
We have by the Cauchy-Schwarz inequality that (see footnote 3)
$$\begin{aligned} \langle \textbf{A}_2^T \textbf{y}_2, \tilde{\textbf{u}}\rangle \le \Vert \textbf{A}_2^T \textbf{y}_2\Vert _2 \Vert \tilde{\textbf{u}}\Vert _2 \le s\Vert \textbf{A}_2^T\Vert _{\alpha _2^*,2} \Vert \textbf{y}_2\Vert _{\alpha _2}^* \le s\Vert \textbf{A}_2\Vert _{2,\alpha _2} \Lambda _2, \end{aligned}$$
which combined with (5.6) yields
$$\begin{aligned} s\Vert \textbf{A}_1^T \textbf{y}_1\Vert _2 \le f(\bar{\textbf{z}})+\max _{\textbf{u}\in B_2[\textbf{0},s]} w(\bar{\textbf{x}}+\textbf{u})-\text{ val }(G)+s\Vert \textbf{A}_2\Vert _{2,\alpha _2} \Lambda _2. \end{aligned}$$
Using the fact that \(\Vert \textbf{A}_1^T \textbf{y}_1\Vert _2\ge \sqrt{\lambda _{\min } (\textbf{A}_1 \textbf{A}_1^T)}\Vert \textbf{y}_1\Vert _2\ge D_{2,\alpha _1^*}\sigma _{\min }(\textbf{A}_1) \Vert \textbf{y}_1\Vert _{\alpha _1}^*\), we finally obtain that
$$\begin{aligned} \Vert \textbf{y}_1\Vert _{\alpha _1}^* \le \Lambda _1, \end{aligned}$$
which together with \(\Vert \textbf{y}_2\Vert _{\alpha _2}^* \le \Lambda _2\) yields the desired bound.
\(\square \)
Remark 5.1
(Case \(m_1=0\)) In the case where \(m_1=0\), in which \(\displaystyle \mathop {\textrm{dom}}(f) = \{ \textbf{x}: \textbf{A}\textbf{x}\in C\}\), the result is that under Assumption 3 it holds that
$$\begin{aligned} \Vert \textbf{y}\Vert _{\alpha _2}^* \le \frac{\max _{\textbf{d}\in B_{\alpha _2}[\textbf{0},r]} f(\textbf{A}\bar{\textbf{x}}+\textbf{d})+w(\bar{\textbf{x}})-\text{ val }(G)}{r}. \end{aligned}$$
Remark 5.2
(Case \(m_2=0\)) In the case where \(m_2=0\), in which \(\displaystyle \mathop {\textrm{dom}}(f) = \{ \textbf{x}: \textbf{A}\textbf{x}= \textbf{b}\}\), the result is that under Assumption 3 it holds that
$$\begin{aligned} \Vert \textbf{y}\Vert _{\alpha _1}^* \le \frac{f(\textbf{b})+\max _{\textbf{u}\in B_2[\textbf{0},s]} w(\bar{\textbf{x}}+\textbf{u})-\text{ val }(G)}{s D_{2,\alpha _1^*} \sigma _{\min }(\textbf{A})}. \end{aligned}$$
Application Examples We end this section with some applications illustrating the potential of our results.
Example 5.1
(Basis pursuit) Consider the so-called “basis pursuit” problem
$$\begin{aligned} \min \left\{ \Vert \textbf{x}\Vert _1: \textbf{A}\textbf{x}=\textbf{b}\right\} \end{aligned}$$(5.7)
that fits the general model (G) with \(w(\textbf{x}) = \Vert \textbf{x}\Vert _1\) and \(f = \delta _{\{\textbf{b}\}}\). Problem (5.7) is a well-known “convex” relaxation of a compressed sensing model. Suppose that \(\bar{\textbf{x}}\) satisfies \(\textbf{A}\bar{\textbf{x}}=\textbf{b}\) and s is an arbitrary positive scalar. If we take \(\Vert \cdot \Vert _{\alpha _1}=\Vert \cdot \Vert _2\), then according to Remark 5.2, the bound that we obtain on \(\Vert \textbf{y}\Vert _2\) is
$$\begin{aligned} \Vert \textbf{y}\Vert _2 \le \frac{\Vert \bar{\textbf{x}}\Vert _1+s\sqrt{n}}{s \sigma _{\min }(\textbf{A})}. \end{aligned}$$
Taking \(s \rightarrow \infty \) (as s can be taken arbitrarily large), we obtain the bound
$$\begin{aligned} \Vert \textbf{y}\Vert _2 \le \frac{\sqrt{n}}{\sigma _{\min }(\textbf{A})}. \end{aligned}$$
Invoking Theorem 3.1, and recalling that \(f^{[\gamma ]}(\textbf{z}) = \gamma \Vert \textbf{z}-\textbf{b}\Vert _2\) (see Example 2.1), we can now deduce that problem (5.7) is equivalent to
$$\begin{aligned} \min _{\textbf{x}} \left\{ \Vert \textbf{x}\Vert _1+\gamma \Vert \textbf{A}\textbf{x}-\textbf{b}\Vert _2 \right\} \end{aligned}$$
whenever \(\gamma >\frac{\sqrt{n}}{\sigma _{\min }(\textbf{A})}\).
This provides an exact penalty (see e.g., [5, 16]) unconstrained reformulation of problem (5.7), with an explicit exact penalty parameter.
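A minimal sketch (our illustration, with random data) of computing the explicit penalty parameter and forming the equivalent unconstrained objective:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n))           # full row rank with probability 1
b = A @ rng.standard_normal(n)

sigma_min = np.linalg.svd(A, compute_uv=False).min()
gamma = 1.01 * np.sqrt(n) / sigma_min     # any gamma > sqrt(n)/sigma_min(A)

def penalized_objective(x):
    # ||x||_1 + gamma*||Ax - b||_2, equivalent to (5.7) by Theorem 3.1
    return np.linalg.norm(x, 1) + gamma * np.linalg.norm(A @ x - b)
```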
Example 5.2
(Nonsmooth minimization over linear inequalities) Consider the problem
$$\begin{aligned} \min \left\{ w(\textbf{x}): \textbf{A}\textbf{x}\le \textbf{b}\right\} , \end{aligned}$$(5.8)
where w is a real-valued convex function. Denote the rows of \(\textbf{A}\in \mathbb {R}^{m \times n}\) by \(\textbf{a}_1^T,\ldots ,\textbf{a}_m^T\). This problem fits the general model (G) with \(f = \delta _{C}\) where \(C = \{\textbf{z}: \textbf{z}\le \textbf{b}\}\). Let \(\bar{\textbf{x}}\) be a point satisfying \(\textbf{A}\bar{\textbf{x}}<\textbf{b}\). Obviously \(B_{\infty }[\textbf{A}\bar{\textbf{x}},r] \subseteq C\) with \(r = \min _{i \in [m]}\{b_i-\textbf{a}_i^T \bar{\textbf{x}}\}\). Then according to Remark 5.1, we have the following bound (see footnote 4) on the \(\ell _1\)-norm of the dual optimal solution:
$$\begin{aligned} \Vert \textbf{y}\Vert _1 \le \frac{w(\bar{\textbf{x}})-\text{ val }(G)}{\min _{i \in [m]}\{b_i-\textbf{a}_i^T \bar{\textbf{x}}\}}. \end{aligned}$$
For a given \(M>0\), the M-Lipschitz counterpart of f is
$$\begin{aligned} f^{[M]}(\textbf{z}) = M d_{C,\Vert \cdot \Vert _{\infty }}(\textbf{z}) = M \max _{i \in [m]} [z_i-b_i]_+. \end{aligned}$$
Thus, by Theorem 3.1, problem (5.8) is equivalent to
$$\begin{aligned} \min _{\textbf{x}} \left\{ w(\textbf{x})+\gamma \max _{i \in [m]} [\textbf{a}_i^T\textbf{x}-b_i]_+ \right\} \end{aligned}$$
as long as \(\gamma >\frac{w(\bar{\textbf{x}})-\text{ val }(G)}{ \min _{i \in [m]}\{b_i-\textbf{a}_i^T \bar{\textbf{x}}\}}\). In the case where \(\textbf{b}>\textbf{0}\) and w is nonnegative, we can choose \(\bar{\textbf{x}}=\textbf{0}\) and use the fact that \(\text{ val }(G)\ge 0\) to obtain the simplified upper bound
$$\begin{aligned} \Vert \textbf{y}\Vert _1 \le \frac{w(\textbf{0})}{\min _{i \in [m]} b_i}. \end{aligned}$$
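A small sketch (our illustration) of the resulting exact penalty formulation, using the simplified parameter for the case \(\textbf{b}>\textbf{0}\) and nonnegative w (the choice \(w=\Vert \cdot \Vert _1\) is an assumption made for concreteness):

```python
import numpy as np

def simplified_gamma(w_at_zero, b):
    # valid when b > 0 and w >= 0 (Slater point xbar = 0):
    # any gamma > w(0)/min_i b_i makes the penalty exact
    return w_at_zero / np.min(b) + 1.0

def penalized_objective(x, A, b, gamma):
    # w(x) + gamma * max_i [a_i^T x - b_i]_+  with w = ||.||_1
    return np.linalg.norm(x, 1) + gamma * max(np.max(A @ x - b), 0.0)
```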
Example 5.3
(Analytic center of polytopes) Consider the problem of finding the analytic center of the set \(P = \{\textbf{x}: \textbf{A}\textbf{x}\ge \textbf{b}\}\) with \(\textbf{A}\in \mathbb {R}^{m \times n}\) and \(\textbf{b}\in \mathbb {R}^m\):
$$\begin{aligned} (\text{AC})\qquad \min _{\textbf{x}} \left\{ -\sum _{i=1}^m \log (\textbf{a}_i^T\textbf{x}-b_i) \right\} . \end{aligned}$$
Here \(\textbf{a}_1^T,\ldots ,\textbf{a}_m^T\) are the rows of \(\textbf{A}\). This problem fits our model (G) with \(w \equiv 0\) and \(f(\textbf{z}) = \sum _{i=1}^m f_i(z_i)\), where
$$\begin{aligned} f_i(z_i) = {\left\{ \begin{array}{ll} -\log (z_i-b_i), &{} z_i > b_i, \\ \infty , &{} \text {else}. \end{array}\right. } \end{aligned}$$
By Example 2.3, we have that
$$\begin{aligned} f^{[M]}(\textbf{z}) = \sum _{i=1}^m h_{b_i}^{[M]}(z_i), \end{aligned}$$
where for any \(c,z \in \mathbb {R}\),
$$\begin{aligned} h_c^{[M]}(z) = {\left\{ \begin{array}{ll} -\log (z-c), &{} z \ge c+\frac{1}{M}, \\ M(c-z)+\log M+1, &{} z < c+\frac{1}{M}. \end{array}\right. } \end{aligned}$$
Since the underlying norm on the primal space is the \(\ell _1\)-norm, it follows that we need to upper bound the \(\ell _{\infty }\)-norm of the dual optimal solution, and this is done using Remark 5.1 (recalling that \(w \equiv 0\)):
$$\begin{aligned} \Vert \textbf{y}\Vert _{\infty } \le \frac{\max _{\textbf{d}\in B_1[\textbf{0},r]} f(\textbf{A}\bar{\textbf{x}}+\textbf{d})-\text{ val }(AC)}{r}, \end{aligned}$$(5.10)
where \(\bar{\textbf{x}}\) is a point satisfying \(\textbf{A}\bar{\textbf{x}}>\textbf{b}\) and \(B_1[\textbf{A}\bar{\textbf{x}}, r] \subseteq \{\textbf{z}: \textbf{z}>\textbf{b}\}\). Since \(\textbf{A}\bar{\textbf{x}}>\textbf{b}\), the choice \(r = \frac{1}{2}\min _{i \in [m]}\{ \textbf{a}_i^T \bar{\textbf{x}}-b_i\}\) implies the inclusion relation, and we will use this value for r. We also have
$$\begin{aligned} \max _{\textbf{d}\in B_1[\textbf{0},r]} f(\textbf{A}\bar{\textbf{x}}+\textbf{d}) \le -\sum _{i=1}^m \log (\textbf{a}_i^T \bar{\textbf{x}}-b_i-r), \end{aligned}$$
where we used the fact that \(\Vert \textbf{d}\Vert _1 \le r\) implies \(d_i \ge -r\) for all i.
If in addition we know that the polytope P is bounded and contained in \(B_2[\textbf{0},R]\), then we can also find a lower bound on the optimal value of (AC) using the following obvious inequality that holds for any \(\textbf{x}\in P\):
$$\begin{aligned} -\sum _{i=1}^m \log (\textbf{a}_i^T\textbf{x}-b_i) \ge -\sum _{i=1}^m \log (R\Vert \textbf{a}_i\Vert _2-b_i). \end{aligned}$$
Thus, the bound (5.10) in this setting implies that
$$\begin{aligned} \Vert \textbf{y}\Vert _{\infty } \le \frac{1}{r}\sum _{i=1}^m \log \left( \frac{R\Vert \textbf{a}_i\Vert _2-b_i}{\textbf{a}_i^T \bar{\textbf{x}}-b_i-r}\right) . \end{aligned}$$
The problem (AC) is therefore equivalent to
$$\begin{aligned} \min _{\textbf{x}} \sum _{i=1}^m h_{b_i}^{[\gamma ]}(\textbf{a}_i^T\textbf{x}) \end{aligned}$$
as long as
$$\begin{aligned} \gamma > \frac{1}{r}\sum _{i=1}^m \log \left( \frac{R\Vert \textbf{a}_i\Vert _2-b_i}{\textbf{a}_i^T \bar{\textbf{x}}-b_i-r}\right) , \end{aligned}$$
where \(r = \frac{1}{2}\min _{i \in [m]}\{ \textbf{a}_i^T \bar{\textbf{x}}-b_i\}\).
Notes
1. We often use the slightly fuzzy notion “prox-tractable” to describe a function whose proximal map can be computed efficiently.
2. \(f'(\textbf{x})\) is an arbitrary member of the subdifferential set \(\partial f(\textbf{x})\).
3. \(\Vert \textbf{A}_2^T\Vert _{\alpha _2^*,2}=\max \{\Vert \textbf{A}_2^T \textbf{v}\Vert _2: \Vert \textbf{v}\Vert _{\alpha _2}^*=1\}\).
4. It coincides with the bound (5.1) with \(f_i(\textbf{x})=\textbf{a}_i^T \textbf{x}-b_i.\)
References
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 2nd edn. Springer, Cham (2017)
Beck, A.: First-order methods in optimization, volume 25 of MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia; Mathematical Optimization Society, Philadelphia (2017)
Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: Palomar, D., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 139–162. Cambridge University Press, Cambridge (2009)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Boţ, R.I., Böhm, A.: Variable smoothing for convex optimization problems using stochastic gradients. J. Sci. Comput. 85(2), 33 (2020)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. (2010). https://doi.org/10.1007/s10851-010-0251-1
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics, vol. 9. Society for Industrial Mathematics (1989)
Hausdorff, F.: Über halbstetige Funktionen und deren Verallgemeinerung. Math. Z. 5, 292–309 (1919)
Han, S.P., Mangasarian, O.L.: Exact penalty functions in nonlinear programming. Math. Program. 17(1), 251–269 (1979)
He, B., Yuan, X.: On the \({O}(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
Moreau, J.J.: Proximité et dualité dans un espace Hilbertien. Bulletin de la Société Mathématique de France 90, 273–299 (1965)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1, Ser. A), 127–152 (2005)
Polyak, B.T.: Minimization of unsmooth functionals. USSR Comput. Math. Phys. 9, 14–29 (1969)
Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., New York (1987)
Rockafellar, R.T.: Convex Analysis. Princeton Mathematical Series, vol. 28. Princeton University Press, Princeton, N.J. (1970)
Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32, 204–227 (2022)
Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24, 269–297 (2014)
Shor, N.Z.: Minimization Methods for Nondifferentiable Functions, volume 3 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin (1985). Translated from the Russian by K.C. Kiwiel and A. Ruszczyński
Tran-Dinh, Q.: Adaptive smoothing algorithms for nonsmooth composite convex minimization. Comput. Optim. Appl. 66(3), 425–451 (2017)
Funding
Open access funding provided by Tel Aviv University.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Communicated by Yurii Nesterov.
The research of A. Beck was partially supported by the Israel Science Foundation Grant 926/21. The research of M. Teboulle was partially supported by the Israel Science Foundation under ISF Grant 2619/20.
Appendices
Appendix A: PH Envelope of Example 2.2
First note that \(f^*(\textbf{y}) = \sqrt{1+\Vert \textbf{y}\Vert _2^2}\) (see [2, Section 4.4.13]). Thus, by the dual representation of the PH envelope (2.2), we have for any \(\textbf{x}\),
$$\begin{aligned} f^{[M]}(\textbf{x}) = \max _{\textbf{y}} \left\{ \langle \textbf{x},\textbf{y}\rangle -\sqrt{1+\Vert \textbf{y}\Vert _2^2}: \Vert \textbf{y}\Vert _2 \le M \right\} . \end{aligned}$$(A.1)
Writing the optimality conditions for the above, we get that \(\textbf{y}\in \mathbb {R}^m\) is an optimal solution if and only if there exists s such that
$$\begin{aligned} \textbf{x}= \left( 2s+\frac{1}{\sqrt{1+\Vert \textbf{y}\Vert _2^2}}\right) \textbf{y}, \qquad s \ge 0, \qquad s(\Vert \textbf{y}\Vert _2^2-M^2) = 0. \end{aligned}$$(A.2)
We explore two cases.
Case I If \(\Vert \textbf{y}\Vert _2<M\), then \(s=0\) and thus
$$\begin{aligned} \textbf{x}= \frac{\textbf{y}}{\sqrt{1+\Vert \textbf{y}\Vert _2^2}}. \end{aligned}$$(A.3)
Taking the squared norm of both sides we obtain \(\Vert \textbf{x}\Vert _2^2 = \frac{\Vert \textbf{y}\Vert _2^2}{1+\Vert \textbf{y}\Vert _2^2}\). Consequently, \(\Vert \textbf{x}\Vert _2 < 1\) and \(\Vert \textbf{y}\Vert _2^2 = \frac{\Vert \textbf{x}\Vert _2^2}{1-\Vert \textbf{x}\Vert _2^2}\). Note that the requirement \(\Vert \textbf{y}\Vert _2 <M\) translates to \(\Vert \textbf{x}\Vert _2 < \frac{M}{\sqrt{1+M^2}}\). Thus, under the condition \(\Vert \textbf{x}\Vert _2 < \frac{M}{\sqrt{1+M^2}}\), by (A.3) the optimal solution of (A.1) is
$$\begin{aligned} \textbf{y}_{\textbf{x}} = \frac{\textbf{x}}{\sqrt{1-\Vert \textbf{x}\Vert _2^2}}, \end{aligned}$$
and the optimal value (whenever \(\Vert \textbf{x}\Vert _2 < \frac{M}{\sqrt{1+M^2}}\)) is
$$\begin{aligned} \langle \textbf{x},\textbf{y}_{\textbf{x}}\rangle -\sqrt{1+\Vert \textbf{y}_{\textbf{x}}\Vert _2^2} = \frac{\Vert \textbf{x}\Vert _2^2}{\sqrt{1-\Vert \textbf{x}\Vert _2^2}}-\frac{1}{\sqrt{1-\Vert \textbf{x}\Vert _2^2}} = -\sqrt{1-\Vert \textbf{x}\Vert _2^2}. \end{aligned}$$
Case II If \(\Vert \textbf{y}\Vert _2 =M\), then by (A.2), \(\textbf{x}= \left( 2s +\frac{1}{\sqrt{1+M^2}} \right) \textbf{y}\). Taking the squared norm of both sides we obtain that \( \left( 2s +\frac{1}{\sqrt{1+M^2}} \right) ^2 = \frac{\Vert \textbf{x}\Vert _2^2}{M^2};\) therefore \(2s+\frac{1}{\sqrt{1+M^2} }= \pm \frac{\Vert \textbf{x}\Vert _2}{M}\), and since \(s \ge 0\), only \(s = \frac{1}{2}\left( \frac{\Vert \textbf{x}\Vert _2}{M}-\frac{1}{\sqrt{1+M^2}}\right) \) is a valid solution, whenever \(\Vert \textbf{x}\Vert _2 \ge \frac{M}{\sqrt{1+M^2}}\). Therefore, in this case, the optimal solution of (A.1) is \(\textbf{y}_{\textbf{x}} = M \frac{\textbf{x}}{\Vert \textbf{x}\Vert _2}\) and the optimal value is
$$\begin{aligned} \langle \textbf{x},\textbf{y}_{\textbf{x}}\rangle -\sqrt{1+\Vert \textbf{y}_{\textbf{x}}\Vert _2^2} = M\Vert \textbf{x}\Vert _2-\sqrt{1+M^2}. \end{aligned}$$
Summarizing the two cases, we obtain that
$$\begin{aligned} f^{[M]}(\textbf{x}) = {\left\{ \begin{array}{ll} -\sqrt{1-\Vert \textbf{x}\Vert _2^2}, &{} \Vert \textbf{x}\Vert _2 < \frac{M}{\sqrt{1+M^2}}, \\ M\Vert \textbf{x}\Vert _2-\sqrt{1+M^2}, &{} \Vert \textbf{x}\Vert _2 \ge \frac{M}{\sqrt{1+M^2}}, \end{array}\right. } \end{aligned}$$
which is the same as the expression in (2.3).
Appendix B: PH Envelope of Example 2.3
The objective is to find the value of
$$\begin{aligned} h_c^{[M]}(z) = \min _{u \in \mathbb {R}} t(u), \end{aligned}$$
where \(t(u)=-\log (u-c)+M|z-u|\) whenever \(u>c\) and \(t(u)=\infty \) otherwise. The minimizer of t is \(u^*=z\) if and only if \(z>c\) and \(0 \in \partial t(z)\), which translates to \( \frac{1}{z-c} \le M\), that is, to \( z \ge c+\frac{1}{M}\). Thus, \(u^*=z\) is the optimal solution if and only if \( z \ge c+\frac{1}{M}\). Assume now that \(z<c+\frac{1}{M}\). In this case, z is not the optimal solution, and thus the optimal solution \(u^*\) is attained at a point in which t is differentiable. Since \(\frac{d}{du}(-\log (u-c))= - \frac{1}{u-c}\) is negative over the domain \((c,\infty )\), it follows that the optimal solution \(u^*\) satisfies \(u^*>z\) and hence \( -\frac{1}{u^*-c}+M=0\), meaning that
$$\begin{aligned} u^* = c+\frac{1}{M}. \end{aligned}$$
Overall, we obtain that the optimal solution is
$$\begin{aligned} u^* = {\left\{ \begin{array}{ll} z, &{} z \ge c+\frac{1}{M}, \\ c+\frac{1}{M}, &{} z < c+\frac{1}{M}, \end{array}\right. } \end{aligned}$$
and the corresponding optimal value is
$$\begin{aligned} h_c^{[M]}(z) = {\left\{ \begin{array}{ll} -\log (z-c), &{} z \ge c+\frac{1}{M}, \\ M(c-z)+\log M+1, &{} z < c+\frac{1}{M}. \end{array}\right. } \end{aligned}$$