Nonlinear Optimization, pp. 117–180
Convex Optimization
Abstract
(a) Unconstrained convex optimization, where the constraint set F represents a given convex subset of \(\mathbb {R}^{n}\) (such as \(\mathbb {R} _{++}^{n} \)).
Sensitivity analysis answers “what if” questions such as “how do small perturbations in some part of the data affect the optimal value of P?”. Subsection 4.4.1 tackles this question when the perturbations affect the right-hand side of the constraints (representing the available resources when P models a production planning problem). Subsections 4.4.3 and 4.4.4 introduce two different dual problems for P and provide their corresponding strong duality theorems.
(d) Conic optimization, where F is defined by means of constraints involving closed convex cones in certain Euclidean spaces. This type of convex optimization problem is very important in practice and can be solved by means of efficient numerical methods based on advanced mathematical tools. Since such problems can hardly be solved with pen and paper, the purpose of Section 4.5 is to introduce the reader to this fascinating field through the construction of dual problems for this class of problems and the review of the corresponding strong duality theorems.
4.1 The Optimal Set
The next result concerns the optimal set \(F^{*}=\mathrm{argmin}\left\{ f\left( x\right) :x\in F\right\} \) of \(P\).
Proposition 4.1
(The optimal set in convex optimization) Let P be the convex optimization problem in (4.1). Then:
(i) \(F^{*}\) is a convex set.
(ii) If f is strictly convex on F, then \(\left| F^{*}\right| \le 1\).
(iii) If F is an unbounded closed set such that \(\mathop {\mathrm{int}}F\ne \emptyset \) and f is strongly convex and continuous on F, then \(\left| F^{*}\right| =1\).
Proof
(i) We can assume that \(F^{*}\ne \emptyset \). Then, due to the convexity of F and f, \(F^{*}=S_{v\left( P\right) }\left( f\right) \) is convex.
(iii) Since F is an unbounded closed set and f is coercive by Proposition 2.60, we have \(\left| F^{*}\right| \ge 1\). Then, \(\left| F^{*}\right| =1\) by (ii).
(v) The inclusion \(\left\{ x\in F:\nabla f\left( x\right) =0_{n}\right\} \subset F^{*}\) is a consequence of (iv) applied to all \(x\in F\), while the reverse inclusion follows from Fermat’s principle. \(\square \)
Example 4.2
The same happens with the function \(f\left( x\right) =\frac{\sqrt{x^{2}+a^{2} }}{v_{1}}+\frac{\sqrt{\left( x-1\right) ^{2}+b^{2}}}{v_{2}}\) used to prove the refraction law.
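As a numerical sanity check of the refraction law, the following sketch minimizes a function of this form (with illustrative values \(a=b=1\), \(v_{1}=1\), \(v_{2}=2\); the routine is ours, not the book’s). Since f is strictly convex, \(f'\) is increasing with \(f'(0)<0<f'(1)\), so bisection on \(f'\) locates the unique critical point, where Snell’s identity \(\sin \theta _{1}/v_{1}=\sin \theta _{2}/v_{2}\) must hold.

```python
import math

def snell_critical_point(a=1.0, b=1.0, v1=1.0, v2=2.0, tol=1e-12):
    """Minimize f(x) = sqrt(x^2+a^2)/v1 + sqrt((x-1)^2+b^2)/v2 by bisecting f'."""
    def fprime(x):
        # Each term is increasing in x, so f' is increasing (f strictly convex).
        return x / (v1 * math.hypot(x, a)) + (x - 1.0) / (v2 * math.hypot(x - 1.0, b))
    lo, hi = 0.0, 1.0  # f'(0) < 0 and f'(1) > 0 bracket the minimizer
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fprime(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = snell_critical_point()
sin1 = x / math.hypot(x, 1.0)                 # sine of the incidence angle
sin2 = (1.0 - x) / math.hypot(1.0 - x, 1.0)   # sine of the refraction angle
# At the minimizer, sin1 / v1 equals sin2 / v2 (Snell's law).
```

Setting \(f'(x)=0\) and reading off the two fractions is exactly the refraction law, which the final two lines verify numerically.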
Example 4.3
It is easy to see that the inclusion in Proposition 4.1(iv) may be strict (consider \(f\left( x\right) =e^{x}\) with F being any closed interval in \(\mathbb {R}\)).
We now show an immediate application of this result to systems of linear inequalities, while the next two sections provide applications to statistical inference and operations research.
Corollary 4.4
Proof
Three elementary proofs of the existence of a solution for the nonlinear system in (4.3) can be found in [25], but the analytical computation of such a solution is usually a hard task. Uniqueness is not guaranteed, as the objective function of \(P_{2}\) is not strictly convex.
Example 4.5
Example 4.6
4.2 Unconstrained Convex Optimization
This section presents two interesting applications of unconstrained convex optimization. The first one is a classical location problem posed by Fermat in the seventeenth century that gave rise to a vast literature illustrating the three paradigms of optimization: the geometrical, the analytical, and the numerical one. The second application consists in a detailed resolution of an important optimization problem arising in statistical inference that is commonly solved in a nonrigorous way.
4.2.1 The Fermat–Steiner Problem
The problem \(P_{FS}\) appears under different names in the mathematical literature (Fermat, Fermat–Steiner, Fermat–Torricelli, Fermat–Steiner–Weber, etc.), and its variants, generically called location (or p-center) problems, are still being studied by operational researchers and analysts. The most prominent of these variants are those that consider more than three given points, those which assign weights to the given points (e.g., the number of residents at each village in the examples above), those which replace the Euclidean distance with non-Euclidean ones, those which replace the plane by spaces of greater (even infinite) dimension, those which replace the objective function by another one, such as \(d\left( P, P_{1}\right) ^{2}+d\left( P, P_{2}\right) ^{2}+d\left( P, P_{3}\right) ^{2}\) (in which case it is easy to prove that the optimal solution is the barycenter of the triangle with vertices \(P_{1}\), \(P_{2}\), and \(P_{3}\)) or \(\max \left\{ d\left( P, P_{1}\right) ,d\left( P, P_{2}\right) ,d\left( P, P_{3}\right) \right\} \) (frequently used when locating emergency services), etc. There exists an abundant literature on the characterization of triangle centers as optimal solutions to suitable unconstrained optimization problems. One of the last centers to be characterized in this way has been the orthocenter (see [55]).
Solving an optimization problem by means of the geometric approach starts with a preliminary empirical phase, which is seldom mentioned in the literature. This phase provides a conjecture on the optimal solution (a point in the case of \(P_{FS}\)), which is followed by the conception of a rigorous compass-and-ruler constructive proof, in the style of Euclid’s Elements. The limitation of this approach is the lack of a general method to obtain conjectures and proofs, which obliges the decision maker to conceive ad hoc experiments and proofs. We illustrate the geometric approach by solving \(P_{FS}\) for triples of points \(P_{1}\), \(P_{2}\), and \(P_{3}\) determining obtuse-angled triangles. We denote by \(\alpha _{i}\) the angle (expressed in degrees) corresponding to vertex \(P_{i}\), \(i=1,2,3\). Intuition suggests that when \(\max \left\{ \alpha _{1},\alpha _{2},\alpha _{3}\right\} \) is large enough, the optimal solution is the vertex corresponding to the obtuse angle. Indeed, we now show that this conjecture is true when \(\max \left\{ \alpha _{1},\alpha _{2},\alpha _{3}\right\} \ge 120\) degrees by means of a compass-and-ruler constructive proof.
Proposition 4.7
(Trivial Fermat–Steiner problems) If \(\alpha _{i}\ge 120\) degrees, then \(P_{i}\) is a global minimum of \(P_{FS} \).
Proof
Assume that the angle at \(P_{2}\), say \(\alpha _{2}\), is not less than 120 degrees. By applying to \(P_{1}\) and to an arbitrary point \(P\ne P_{2}\) a counterclockwise rotation centered at \(P_{2}\), with angle \(\alpha :=180-\alpha _{2}\) degrees, one gets the image points \(P_{1}^{\prime }\) and \(P^{\prime }\), so that the points \(P_{1}^{\prime }, P_{2}\), and \(P_{3}\) are aligned (see Fig. 4.3).
The converse statement claiming that the attainment of the minimum at a vertex entails that \(\max \left\{ \alpha _{1},\alpha _{2},\alpha _{3}\right\} \ge 120\) degrees can also be proved geometrically [68, pp. 280–282].
From now on, we assume that \(\max \left\{ \alpha _{1},\alpha _{2},\alpha _{3}\right\} <120\) degrees, i.e., that the minimum of f is not attained at one of the points \(P_{1}, P_{2}, P_{3}\). The first two solutions to \( P_{FS}\) under this assumption were obtained by Torricelli and his student Viviani, but were published in 1659, 12 years after the publication of Cavalieri’s solution. Another interesting solution was provided by Steiner in 1842, which was later rediscovered by Gallai and, much later, by Hoffmann in 1929 (many results and methods have been rediscovered again and again before the creation of the first mathematical abstracting and reviewing services, Zentralblatt MATH, in 1930, and Mathematical Reviews, in 1940). Steiner’s solution was based on successive 60 degree rotations in the plane, while Torricelli’s consisted in maximizing the area of those equilateral triangles whose sides contain exactly one of the points \(P_{1}\), \(P_{2}\), and \(P_{3}\). A particular case of the latter problem was posed by Moss, in 1755, in the women’s magazine The Ladies Diary or the Woman’s Almanack, which suggests the existence of an important cohort of women with a good mathematical training during the Age of Enlightenment. The solution to the Torricelli–Moss problem, actually the dual problem to \(P_{FS}\), was rediscovered by Vecten, in 1810, and by Fasbender, in 1846, to whom it was mistakenly attributed until 1991.
The second approach used to solve \(P_{FS}\), called analogical, is inspired by the least action metaphysical principle asserting that nature always works in the most economic way, i.e., by minimizing a certain magnitude: time in optics (as observed by Fermat), surface tension in soap bubbles, potential energy in gravitational fields, etc. The inconvenience of this analogical approach is that it is not easy to design an experiment involving some physical magnitude whose minimization is equivalent to the optimization problem to be solved. Moreover, nature does not compute global minima but local ones (in fact, critical points), as happens with beads inserted in wires, whose equilibrium points are the local minima of the height (recall that the potential energy of a particle of given mass m and variable height y is mgy, where g denotes the gravitational constant, so the gravitational field provides local minima of y on the curve represented by the wire). Moreover, these local minima might be attained in a parsimonious way, at least in the case of wildlife (evolution by natural selection is a very slow process).
The analogical approach has inspired many exotic numerical optimization methods imitating nature or even social behavior, but not always in a rigorous way. Such methods, called metaheuristics, are designed to generate, or select, a heuristic (partial search algorithm) that may provide a sufficiently good solution to a given optimization problem. In fact, “in recent years, the field of combinatorial optimization [where some decision variables take integer values] has witnessed a true tsunami of ‘novel’ metaheuristic methods, most of them based on a metaphor of some natural or man-made process. The behavior of virtually any species of insects, the flow of water, musicians playing together—it seems that no idea is too far-fetched to serve as inspiration to launch yet another metaheuristic” [82].
We illustrate the analogical approach by solving \(P_{FS}\) through two experiments that are based on the minimization of the surface tension of a soap bubble and the minimization of the potential energy of a set of mass points, respectively.
4.2.1.1 Minimizing the Surface Tension
The physicist Plateau witnessed, in 1840, a domestic accident caused by a servant who wrongly poured oil into a receptacle containing a mixture of alcohol and water. Intrigued by the spherical shape of the oil bubbles, Plateau started to perform various experiments with soapy water solutions, which led him to the conclusion that the surface tension of the soap film is proportional to its surface area, so that the bubbles are spherical when they are at equilibrium (i.e., when the pressure of the air contained in the bubble equals the atmospheric pressure). Due to the lack of analytic tools, he could not justify his conjecture, which was proved in the twentieth century by Radó, in 1930, and by Douglas, in 1931, independently.
4.2.1.2 Minimizing the Potential Energy
The famous Scottish Book was a notebook used in the 1930s and 1940s by mathematicians of the Lwów School of Mathematics for collecting interesting solved, unsolved, and even probably unsolvable problems (Lwów is the Polish name of the, at present, Ukrainian city of Lviv). The notebook was named after the Scottish Café where it was kept. Among the active participants at these meetings, there were the functional analysts and topologists Banach, Ulam, Mazur, and Steinhaus, who conceived an ingenious experiment to solve the weighted Fermat–Steiner problem. This problem seems to have been first considered in the calculus textbook published by Thomas Simpson in 1750. Tong and Chua (1995) have proposed a geometric solution inspired by Torricelli’s for \(P_{FS}\).
4.2.1.3 The Analytical Solution
We start by exploring the properties of the objective function \( f=\sum _{i=1}^{3}f_{i}\), where \(f_{i}\left( x\right) =\left\| x-x_{i}\right\| \). Obviously, \(f_{i}\) is convex as it is the result of composing an affine function, \(x\mapsto x-x_{i}\), with a convex one (the norm), \(i=1,2,3\), so f is convex too. Thanks to the convexity of \(P_{FS}\), the local minima obtained via the analogical approach are actually global minima. We now prove that f is strictly convex and coercive.
Proposition 4.8
Proposition 4.9
(Geometric solution to the Fermat–Steiner problem) The optimal solution of \(P_{FS}\) is the isogonic center of the triangle with vertices \(x_{1}\), \(x_{2}\), and \(x_{3}\), that is, the point \( \overline{x}\) from which the three sides are seen under the same angle of 120 degrees.
Proof
Example 4.10
([68, pp. 284–285]). In the 1960s, Bell Telephone installed telephone networks connecting the different headquarters of the customer companies with a cost, regulated by law, which was proportional to the total length of the installed network. One of these customers, Delta Airlines, wanted to connect its hubs placed at the airports of Atlanta, Chicago, and New York, which approximately formed an equilateral triangle. Bell Telephone proposed to install cables connecting one of the bases with the other two bases, but Delta Airlines proposed instead to create a virtual base at the isogonic center of the triangle formed by the three hubs, linking this point with the three bases by cables. The Federal Authority agreed with this solution, and Delta Airlines earned a significant amount of money that we can estimate. Take the average distance between hubs as length unit. Since the height of the equilateral triangle is \(\frac{\sqrt{3}}{2}\), the distance from the isogonic center (here coinciding with the orthocenter) to any vertex is \(\frac{2}{3}\times \frac{\sqrt{3}}{2}=\frac{\sqrt{3}}{3}\). Therefore, the lengths of the networks proposed by Bell Telephone and by Delta Airlines were of 2 and \(\sqrt{3}\) units, respectively, with an estimated saving of \(\frac{2-\sqrt{3}}{2}\times 100\approx 13.4\%\).
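Delta’s saving can be reproduced in a few lines (the unit-side coordinates below are an illustrative placement of the three hubs; for an equilateral triangle the isogonic center coincides with the centroid):

```python
import math

# Vertices of an equilateral triangle with unit side (illustrative hub coordinates).
P = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
centroid = (sum(p[0] for p in P) / 3, sum(p[1] for p in P) / 3)  # = isogonic center here

steiner_length = sum(math.dist(centroid, p) for p in P)  # Delta's star network
bell_length = 2.0                                        # two sides of the triangle
saving = (bell_length - steiner_length) / bell_length * 100
# steiner_length = sqrt(3) ≈ 1.732 and saving ≈ 13.4 (percent)
```

Each of the three spokes has length \(\sqrt{3}/3\), so the total is \(\sqrt{3}\), matching the computation above.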
4.2.2 The Fixed-Point Method of Weiszfeld
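The classical Weiszfeld iteration re-centers the current point at the distance-weighted average of the data points, \(x_{k+1}=\bigl( \sum _{i}x_{i}/\left\| x_{k}-x_{i}\right\| \bigr) /\bigl( \sum _{i}1/\left\| x_{k}-x_{i}\right\| \bigr) \). The implementation below is a minimal sketch (our own choice of starting point, iteration count, and stopping rule; it only guards against the iterate landing exactly on a data point):

```python
import math

def weiszfeld(points, iters=1000, eps=1e-12):
    """Weiszfeld iteration for min_x sum_i ||x - x_i|| in the plane (sketch)."""
    # Start at the centroid of the data points.
    x = [sum(p[j] for p in points) / len(points) for j in (0, 1)]
    for _ in range(iters):
        wsum, num = 0.0, [0.0, 0.0]
        for p in points:
            d = math.dist(x, p)
            if d < eps:              # iterate hit a data point: stop there
                return tuple(x)
            w = 1.0 / d              # weight = inverse distance
            wsum += w
            num[0] += w * p[0]
            num[1] += w * p[1]
        x = [num[0] / wsum, num[1] / wsum]
    return tuple(x)

# A triangle with all angles below 120 degrees (illustrative data).
pts = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
fermat = weiszfeld(pts)
```

At the returned point the gradient \(\sum _{i}(x-x_{i})/\left\| x-x_{i}\right\| \) is numerically zero, i.e., the three unit vectors toward the \(x_{i}\) sum to \(0_{2}\), which is the isogonic-center condition.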
4.2.3 An Application to Statistical Inference
Proposition 4.11
(Maximum likelihood estimators) The maximum likelihood estimators of the parameters \(\mu \) and \( \sigma ^{2}\) of a normally distributed random variable are the sample mean and the variance, respectively.
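The proposition can be checked numerically: on a simulated normal sample, perturbing either the sample mean or the (biased, divisor n) sample variance can only decrease the log-likelihood. The sample size, true parameters, and perturbation sizes below are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)
data = [random.gauss(2.0, 3.0) for _ in range(1000)]
n = len(data)

def loglik(mu, s2):
    """Log-likelihood of an i.i.d. N(mu, s2) sample."""
    return -0.5 * n * math.log(2 * math.pi * s2) \
        - sum((x - mu) ** 2 for x in data) / (2 * s2)

mu_hat = sum(data) / n                               # sample mean
s2_hat = sum((x - mu_hat) ** 2 for x in data) / n    # MLE variance (divisor n)

best = loglik(mu_hat, s2_hat)
for d in (-0.1, 0.1):                                # any perturbation loses
    assert loglik(mu_hat + d, s2_hat) < best
    assert loglik(mu_hat, s2_hat + d) < best
```

This is only a finite-sample sanity check, not a proof; the rigorous argument is the one given in the proposition.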
4.3 Linearly Constrained Convex Optimization
4.3.1 Optimality Conditions
Definition 4.12
Since \(I(\overline{x})\) is finite, \(A\left( \overline{x}\right) \) is a finitely generated convex cone, and so, it is closed (see Proposition 2.13).
Definition 4.13
Proposition 4.14
Proof
We shall prove both inclusions.
Lemma 4.15
Proof
To prove the reverse inclusion, we now show that \(Y\subset Y^{\circ \circ }\) (from this inclusion, one concludes that \(\mathop {\mathrm{cl}}\mathop {\mathrm{cone}}Y\subset \mathop {\mathrm{cl}}\mathop {\mathrm{cone}}Y^{\circ \circ }=Y^{\circ \circ }\), as \(Y^{\circ \circ }\) is a closed convex cone). Take any \(y\in Y\). By the definition of \(Y^{\circ }\), one has \(x^{T}y\le 0\) for all \(x\in Y^{\circ }\), and so, \(y\in Y^{\circ \circ }\). \(\square \)
We next characterize those homogeneous linear inequalities which are satisfied by all the solutions to a given homogeneous linear inequality system. In two dimensions, this result characterizes those halfplanes which contain a given angle with apex at the origin and whose boundaries go through the origin.
Corollary 4.16
Proof
This is a direct application of Lemma 4.15 to the set \(Y:=\left\{ a_i, i\in I\right\} \). Indeed, \(a\in Y^{\circ \circ }\), by definition, if \(a^Tx\le 0\) for all \(x\in \mathbb {R}^n\) such that \(a_i^Tx\le 0\) for all \(i\in I\). By Lemma 4.15, we have that \(Y^{\circ \circ }=\mathop {\mathrm{cl}}\mathop {\mathrm{cone}}\left\{ a_i, i\in I\right\} \), which proves the claim. \(\square \)
Trying to characterize in a rigorous way the equilibrium points of dynamic systems, the Hungarian physicist G. Farkas proved in 1901, after several failed attempts, the particular case of the above result where the index set I is finite and the closure operator can be removed (recall that any finitely generated convex cone is closed). This classical Farkas lemma will be used in this chapter to characterize optimal solutions to convex problems and to linearize conic systems. Among its many applications, let us mention machine learning [63], probability, economics, and finance [32, 37]. The generalized Farkas lemma (whose infinite dimensional version was proved by Chu [22]) is just one of the many available extensions of the classical Farkas lemma [31]. We use this version in Chapter 6 to obtain necessary optimality conditions in nonconvex optimization.
The following proposition lists some basic properties of polar cones to be used hereinafter.
Proposition 4.17
(Handling polar sets) Given \(Y, Z\subset \mathbb {R}^{n}\), the following properties hold:
(i) If \(Y\subset Z\), then \(Z^{\circ }\subset Y^{\circ }.\)
(ii) \(Y^{\circ }={\left( \mathop {\mathrm{cone}}Y\right) }^{\circ }={\left( \mathop {\mathrm{cl }}\mathop {\mathrm{cone}}Y \right) }^{\circ }.\)
(iii) \(Y^{\circ \circ }=Y\) if and only if Y is a closed convex cone.
Proof
Statements (i) and (ii) come straightforwardly from the definition of polar cone, while (iii) is immediate from the Farkas Lemma 4.15. \(\square \)
Corollary 4.18
Proof
Example 4.19
We now prove, from the previously obtained relationships between the feasible direction cone and the active cone, the simplest version of the Karush–Kuhn–Tucker (KKT in brief) theorem, which provides a first-order characterization for linearly constrained convex optimization problems in terms of the so-called nonnegativity condition (NC), the stationarity condition (SC), and the complementarity condition (CC), altogether called the KKT conditions. The general version of the KKT theorem, which provides a necessary optimality condition for nonlinear optimization problems with inequality constraints, was first proved by Karush in his unpublished master’s thesis on the extension of the method of Lagrange multipliers for equality constrained problems, in 1939, and rediscovered by Kuhn and Tucker in 1951.
Theorem 4.20
(KKT theorem with linear constraints) Let \(\overline{x}\in F\). Then, the following statements are equivalent:
(i) \(\overline{x}\in F^{*}\).
(ii) \(-\nabla f\left( \overline{x}\right) \in A\left( \overline{x}\right) \).
Proof
We shall prove that (i)\(\Leftrightarrow \)(ii) and (ii)\(\Leftrightarrow \) (iii).
The converse statement is trivial. \(\square \)
Example 4.21
In the particular case of linear optimization, where \(f\left( x\right) =c^{T}x\) for some vector \(c\in \mathbb {R}^{n}\), the global minima are characterized by the condition \(-c\in A\left( \overline{x} \right) \).
It is worth observing that we have proved (i)\(\Rightarrow \)(ii)\( \Leftrightarrow \)(iii) in Theorem 4.20 under the unique assumption that \(\overline{x}\) is a local minimum of the problem. So, if f is not convex, (ii) and the equivalent statement (iii) are necessary conditions for \(\overline{x}\) to be a local minimum. The application of this useful necessary condition allows us to filter the candidates for local minima and to compute the global minima whenever F is compact or f is coercive on F (by comparing candidates). Another useful trick consists in tackling, instead of the given problem, the result of eliminating some constraints, called the relaxed problem, whose feasible set is generally greater than the initial one (this is systematically done when the constraint set C is discrete, e.g., when the decision variables are integer, i.e., \(C=\mathbb {Z} ^{n}\)). The next example illustrates the application of both tools to solve nonconvex optimization problems.
Example 4.22
4.3.2 Quadratic Optimization
Linearly constrained convex quadratic optimization problems enjoy specific theoretical properties, such as the so-called Frank–Wolfe Theorem [36], which guarantees the existence of optimal solutions even when the polyhedral feasible set is unbounded and the objective function is neither convex nor coercive.
Theorem 4.23
Proof
The original proof given by Frank and Wolfe in [36] is beyond the scope of this book. An analytical direct proof was offered by Blum and Oettli in [14]. \(\square \)
Example 4.24
Consider the geometric problem consisting in the separation of two finite sets in \(\mathbb {R}^{n}\), \(\left\{ u_{1},\ldots , u_{p}\right\} \) and \(\left\{ v_{1},\ldots , v_{q}\right\} \), by means of the thickest possible sandwich (the region of \(\mathbb {R}^{n}\) limited by two parallel hyperplanes, also called strip when \(n=2\)); see Fig. 4.11.
The quadratic problem \(P_{U, V}\) presents two inconveniences: it may have multiple optimal solutions, and the computed optimal solution may have many nonzero entries. Observe that, if \(\overline{u}_{k}= \overline{v}_{k}=0\), the kth component of the observations is useless for the classification and can be eliminated from the medical check or from the list of documents attached by the borrower to his/her application.
In order to overcome the first inconvenience, we can add to the objective function of \(P_{U, V}\) a regularization term \(\gamma \left\ y\right\ ^{2}\), with \(\gamma >0\) selected by the decision maker, guaranteeing the existence of a unique optimal solution to the resulting convex quadratic optimization problem (thanks to the strong convexity of the regularization term). This optimal solution, depending on \(\gamma \), approaches the optimal set of \(P_{U, V}\) as \(\gamma \) decreases to 0.
Regarding the second inconvenience, we could follow a similar strategy, consisting in adding to the objective function of \(P_{U, V}\) a sparsity term \(\gamma s\left( y\right) \), with \(s\left( y\right) \) denoting the number of nonzero components of y, and \(\gamma >0\). Unfortunately, the function s is not even continuous. For this reason, it is usually replaced by the \( \ell _{1}\) norm, as the optimal solutions of the resulting problems tend to be sparse vectors (i.e., vectors with many zero components).
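The sparsity-inducing effect of the \(\ell _{1}\) norm is already visible in one dimension: the penalized problem \(\min _{y}\frac{1}{2}(y-a)^{2}+\gamma \left| y\right| \) has the closed-form “soft-thresholding” solution sketched below (a generic illustration of the phenomenon, not the solution of \(P_{U, V}\) itself).

```python
def soft_threshold(a, gamma):
    """Unique minimizer of 0.5*(y - a)**2 + gamma*|y|, for gamma > 0."""
    if a > gamma:
        return a - gamma      # shrink large positive coefficients
    if a < -gamma:
        return a + gamma      # shrink large negative coefficients
    return 0.0                # small coefficients are set exactly to zero

# Componentwise application drives small entries exactly to zero:
coeffs = [3.0, 0.4, -0.2, -2.5]
sparse = [soft_threshold(a, 0.5) for a in coeffs]
# sparse == [2.5, 0.0, 0.0, -2.0]
```

A quadratic penalty like \(\gamma y^{2}\) would only shrink the small coefficients toward zero without ever making them exactly zero, which is why the \(\ell _{1}\) term is preferred for sparsity.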
The main reasons to devote a subsection to this particular class of convex optimization problems are its importance in practice and the fact that such problems can be solved with pen and paper when the number of inequality constraints is sufficiently small. Moreover, as shown next, their optimal sets can be expressed by means of closed formulas in the favorable case where all constraints are linear equations, as happens in the design of electric circuits (recall that the heat generated in a conductor is proportional to the resistance times the square of the intensity, so Q is diagonal and \(c = 0_{n}\)).
Proposition 4.25
Proof
It is a straightforward consequence of the Frank–Wolfe Theorem 4.23, Propositions 3.1 and 4.1, and Theorem 4.20. \(\square \)
A multifunction (or setvalued mapping) between two nonempty sets \(X\subset \mathbb {R}^m\) and \(Y\subset \mathbb {R}^n\) is a correspondence associating with each \(x\in X\) a subset of \(Y\).
Definition 4.26
Obviously, \(P_{C}\left( y\right) \ne \emptyset \) as \(d\left( y,\cdot \right) \) is a continuous coercive function on \(\mathbb {R}^n\) and the set C is closed. For instance, if C is a sphere centered at z, \(P_{C}(z) =C\) and \( P_{C}\left( y\right) =C\cap \left\{ z+\lambda \left( y-z\right) :\lambda \ge 0\right\} \) (a unique point) for all \(y\ne z\).
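In the sphere example, the unique projection for \(y\ne z\) is obtained by rescaling \(y-z\) to the radius of the sphere. A quick sketch (the radius r is an extra parameter we introduce for the illustration):

```python
import math

def project_to_sphere(y, z, r):
    """Projection of y onto the sphere {x : ||x - z|| = r}, assuming y != z."""
    d = math.dist(y, z)
    # The unique point of the sphere on the ray from z through y.
    return tuple(zi + r * (yi - zi) / d for yi, zi in zip(y, z))

p = project_to_sphere((3.0, 4.0), (0.0, 0.0), 2.0)
# p lies on the radius-2 circle, on the ray from the origin through (3, 4)
```

Here \((3,4)\) has norm 5, so the projection onto the radius-2 circle is \((1.2, 1.6)\).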
Proposition 4.25 allows us to compute \(P_{C}\left( y\right) \) for any \(y\in \mathbb {R}^{n}\), whenever C is a polyhedral convex set, but not to get an explicit expression for \(P_{C}\).
Example 4.27
We now consider the particular case of the minimization of a quadratic function on an affine manifold.
Proposition 4.28
Proof
Let \(a_{i}^{T}\), \(i=1,\ldots , m\), be the rows of M. By assumption, we know that the set \(\left\{ a_{i}, i=1,\ldots , m\right\} \) is linearly independent.
Example 4.29
4.3.3 Some Closed Formulas
Corollary 4.30
We now apply Proposition 4.28 to the computation of \( P_{C}\left( y\right) \), whenever C is an affine manifold.
Corollary 4.31
Proof

Step 1: Find w such that \(Gw=d-My\).

Step 2: Compute \(P_{F}\left( y\right) =M^{T}w+y\).
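The two steps can be carried out with any linear solver. The sketch below assumes, following Proposition 4.28, that \(F=\left\{ x:Mx=d\right\} \) with M of full row rank, and that G denotes the Gram matrix \(MM^{T}\) (our reading of the notation in Corollary 4.31):

```python
import numpy as np

def project_to_affine(M, d, y):
    """Projection of y onto F = {x : M x = d}; M assumed to have full row rank."""
    G = M @ M.T                          # Gram matrix of the rows of M
    w = np.linalg.solve(G, d - M @ y)    # Step 1: solve G w = d - M y
    return M.T @ w + y                   # Step 2: P_F(y) = M^T w + y

M = np.array([[1.0, 1.0, 1.0]])          # the plane x1 + x2 + x3 = 1
d = np.array([1.0])
y = np.array([1.0, 1.0, 1.0])
p = project_to_affine(M, d, y)           # nearest point of the plane to y
```

For this plane the projection of \((1,1,1)\) is \((1/3,1/3,1/3)\), and \(Mp=d\) holds by construction.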
The next result is an immediate consequence of Corollary 4.31.
Corollary 4.32
Example 4.33
Example 4.34
4.4 Arbitrarily Constrained Convex Optimization\(^{\star }\)
4.4.1 Sensitivity Analysis
We now study the properties of \(\mathcal {\vartheta }\), whose argument z is interpreted as a perturbation of \(0_{m}\). One of these properties is the convexity that we define now for extended functions. To handle these functions, we must first extend the algebraic operations (sum and product) and the natural ordering of \(\mathbb {R}\) to the extended real line \(\overline{ \mathbb {R}}\).
We are now in a position to define convexity of extended real functions, which, unlike Definition 2.26, does not involve a particular convex set C of \(\mathbb {R}^{n}\).
Definition 4.35
It is easy to prove that h is convex if and only if \(\mathop {\mathrm{epi}}h\) is a convex subset of \(\mathbb {R}^{n+1}\). Hence, the supremum of convex functions (in particular, the supremum of affine functions) is also convex. Since linear mappings between linear spaces preserve the convexity of sets and \( \mathop {\mathrm{dom}}h\) is the image of \(\mathop {\mathrm{epi}}h\) by the vertical projection \(\left( x, x_{n+1}\right) \mapsto x\), if h is convex, then \(\mathop {\mathrm{dom}}h\) is convex too.
Definition 4.36
A given point \(\widehat{x}\in C\) is a Slater point for P if \(g_{i}\left( \widehat{x}\right) <0\) for all \(i\in I\) (that is, \(\widehat{x}\in F\) and \( I\left( \widehat{x}\right) =\emptyset \)). We say that the problem P satisfies the Slater constraint qualification (SCQ) when there exists a Slater point.
The next example shows that \(\mathcal {\vartheta }\) might be nondifferentiable at \(0_{m}\) even when SCQ holds, although it is a convex function, as we shall see in Theorem 4.38. It can even be discontinuous; see Exercise 4.16.
Example 4.37
Theorem 4.38
(Convexity of the value function) The value function \(\mathcal {\vartheta }\) of a convex optimization problem P is convex. Moreover, if P satisfies SCQ, then \(0_{m}\in \mathop {\mathrm{int}}\mathop {\mathrm{dom}}\mathcal {\vartheta }. \)
Proof
Observe that the restriction of \(\mathcal {\vartheta }\) to \(\mathop {\mathrm{dom}} \mathcal {\vartheta }\) takes values in \(\mathbb {R}\cup \left\{ -\infty \right\} \). Let \(y, z\in \mathbb {R}^{m}\) and \(\mu \in ] 0,1 [ \). If either \(\mathcal {\vartheta }\left( y\right) =+\infty \) or \(\mathcal {\vartheta }\left( z\right) =+\infty \), then \(\left( 1-\mu \right) \mathcal {\vartheta }\left( y\right) +\mu \mathcal { \vartheta }\left( z\right) =+\infty \), according to the calculus rules on \( \overline{\mathbb {R}}\), and the convexity inequality holds trivially. Hence, we may assume that \(y, z\in \mathop {\mathrm{dom}}\mathcal { \vartheta }\).
Assume now that P satisfies SCQ. Let \(\widehat{x}\in C\) be such that \( g_{i}\left( \widehat{x}\right) <0\) for all \(i\in I\). Let \(\rho :=\min _{i\in I}\left( -g_{i}\left( \widehat{x}\right) \right) >0\), that is, \(\max _{i\in I}g_{i}\left( \widehat{x}\right) =-\rho \).
Given \(z\in \rho \mathbb {B}\), one has \(\left| z_{i}\right| \le \rho \), so \(g_{i}\left( \widehat{x}\right) \le -\rho \le z_{i}\), \(i\in I\). Thus, \(\widehat{x}\in \mathcal {F}(z)\) and one has \(z\in \mathop {\mathrm{ dom}}\mathcal {F}=\mathop {\mathrm{dom}}\mathcal {\vartheta }\). Since \(\rho \mathbb {B} \mathbb {\subset }\mathop {\mathrm{dom}}\mathcal {\vartheta }\), we conclude that \( 0_{m}\in \mathop {\mathrm{int}}\mathop {\mathrm{dom}}\mathcal {\vartheta }\), which completes the proof. \(\square \)
In Exercise 4.16, \(\mathcal {\vartheta }\) is finite valued on \(\mathop {\mathrm{dom}} \mathcal {\vartheta }\). We now show that this fact is a consequence of the convexity of \(\mathcal {\vartheta }\) and its finiteness at some point of \(\mathop {\mathrm{int}}\mathop {\mathrm{dom}}\mathcal {\vartheta }=\mathbb {R}\).
Corollary 4.39
If \(0_{m}\in \mathop {\mathrm{int}}\mathop {\mathrm{dom}}\mathcal { \vartheta }\) and \(\mathcal {\vartheta }\left( 0_{m}\right) \in \mathbb {R}\), then \(\mathcal {\vartheta }\) is finite on the whole of \(\mathop {\mathrm{dom}}\mathcal { \vartheta }\).
Proof
As a consequence of Theorem 4.38, \(\mathop {\mathrm{epi}}\mathcal {\vartheta }\) is convex, but it can be closed (as in Example 4.37) or not. It always satisfies the inclusion \(\mathop {\mathrm{gph}}\vartheta \subset \mathop {\mathrm{bd}} \mathop {\mathrm{cl}}\mathop {\mathrm{epi}}\mathcal {\vartheta }\) because \(\mathop {\mathrm{gph}} \vartheta \subset \mathop {\mathrm{epi}}\mathcal {\vartheta }\) and, for any \(x\in \mathop {\mathrm{dom}}\vartheta \), \(\left( x,\vartheta \left( x\right) -\frac{1}{k} \right) \notin \mathop {\mathrm{epi}}\mathcal {\vartheta }\) for all \(k\in \mathbb {N}\). Thus, by the supporting hyperplane theorem, there exists a hyperplane supporting \(\mathop {\mathrm{cl}}\mathop {\mathrm{epi}}\mathcal {\vartheta }\) at any point of \( \mathop {\mathrm{gph}}\vartheta \). This hyperplane might be unique or not, and when it is unique, it can be vertical or not. For instance, in Example 4.37, any line of the form \(y=-\lambda z\) with \(\lambda \in \left[ 0,1 \right] \) supports \(\mathop {\mathrm{epi}}\mathcal {\vartheta }\) at \(0_2\), while the vertical line \(\left\{ 0\right\} \times \mathbb {R}\) does not support \(\mathop {\mathrm{epi}}\mathcal {\vartheta }\).
Theorem 4.40
Proof
Finally, the case where \(\mathcal {\vartheta }\) is differentiable at \(0_{m}\) is a direct consequence of the last assertion in Proposition 2.33. \(\square \)
A vector \(\lambda \in \mathbb {R}_{+}^{m}\) satisfying (4.24) is said to be a sensitivity vector for \(P\). The geometrical meaning of (4.24) is that there exists an affine function whose graph contains \(\left( 0_{m},\vartheta \left( 0_{m}\right) \right) \) and is a lower approximation of \(\mathcal {\vartheta }\).
In Example 4.37, the sensitivity vectors (scalars here as \(m=1\)) for P are the elements of the interval \(\left[ 0,1\right] \); see Fig. 4.18.
In Example 4.37, \(\mathcal {\vartheta }^{\prime }(0;1)\ge -\lambda \) for all \(\lambda \in [0,1]\), so \(\mathcal {\vartheta }^{\prime }(0;1)\ge 0\). Similarly, \(\mathcal {\vartheta }^{\prime }(0;-1)\ge \lambda \) for all \(\lambda \in [0,1]\), so \(\mathcal {\vartheta }^{\prime }(0;-1)\ge 1\). Actually, one has that \(\mathcal {\vartheta }^{\prime }(0;1)=0\) and \(\mathcal {\vartheta }^{\prime }(0;-1)=1\).
4.4.2 Optimality Conditions
Definition 4.41
We now prove that, if we know a sensitivity vector, we can reformulate P as an unconstrained convex problem.
Theorem 4.42
Proof
The Lagrange function for the problem in Example 4.37 is \(L\left( x,\lambda \right) =\left| x_{1}\right| +x_{2}+\lambda x_{1}\). Taking the sensitivity vector (here a scalar) \(\lambda =1\), one has, for any \(x\in C= \mathbb {R}\times \mathbb {R}_{+}\), \(L\left( x, 1\right) =x_{2}\), if \(x_{1}<0\), and \(L\left( x, 1\right) =2x_{1}+x_{2}\), if \(x_{1}\ge 0\), so we have that \( \inf _{x\in C}L\left( x, 1\right) =0=v\left( P\right) \).
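The identity \(\inf _{x\in C}L\left( x, 1\right) =0\) can be sanity-checked on a grid (a crude numerical sketch; the unbounded set \(C=\mathbb {R}\times \mathbb {R}_{+}\) is truncated to a box):

```python
# Evaluate L(x, 1) = |x1| + x2 + x1 on a grid over [-10, 10] x [0, 10].
vals = []
for i in range(-100, 101):
    x1 = i / 10.0
    for j in range(0, 101):
        x2 = j / 10.0
        vals.append(abs(x1) + x2 + x1)
best = min(vals)
# best == 0.0, attained at every grid point with x1 <= 0 and x2 = 0
```

Indeed, \(\left| x_{1}\right| +x_{1}\ge 0\) and \(x_{2}\ge 0\) on C, with equality whenever \(x_{1}\le 0\) and \(x_{2}=0\).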
Theorem 4.43
(Saddle point theorem) Assume that P is a bounded convex problem satisfying SCQ. Let \(\overline{x}\in C\). Then, \(\overline{x}\in F^{*}\) if and only if there exists \(\overline{\lambda }\in \mathbb {R}^{m}\) such that:
(NC) \(\overline{\lambda }\in \mathbb {R}_{+}^{m}\);
(SPC) \(L\left( \overline{x},\lambda \right) \le L\left( \overline{x}, \overline{\lambda }\right) \le L\left( x,\overline{\lambda }\right) ,\quad \forall x\in C,\lambda \in \mathbb {R}_{+}^{m}\); and
(CC) \(\overline{\lambda }_{i}g_{i}\left( \overline{x}\right) =0,\quad \forall i\in I\).
The new acronym (SPC) refers to saddle point condition.
Remark 4.44
(Comment preceding the proof). (SPC) can be interpreted by observing that \(\left( \overline{x},\overline{\lambda }\right) \) is a saddle point for the function \(\left( x,\lambda \right) \mapsto L\left( x,\lambda \right) \) on \(C\times \mathbb {R}_{+}^{m}\), as \(\overline{x}\) is a minimum of \(L\left( x,\overline{\lambda }\right) \) on C, while \(\overline{\lambda }\) is a maximum of \(L\left( \overline{x},\lambda \right) \) on \(\mathbb {R}_{+}^{m}\). For instance, \(0_{2}\) is a saddle point for the function \(\left( x,\lambda \right) \mapsto x^{2}-\lambda ^{2}\) on \(\mathbb {R}^{2}\), as Fig. 4.20 shows. One can easily check that \(\left( 0,0,1\right) \) is a saddle point for the problem in Example 4.37, as \(L\left( x,1\right) =\left| x_{1}\right| +x_{2}+x_{1}\ge 0=L\left( 0,0,1\right) \) for all \(x\in C\) and \(L\left( 0,0,\lambda \right) =0=L\left( 0,0,1\right) \) for all \(\lambda \in \mathbb {R}_{+}\).
Proof
Throughout this proof, we represent by (SPC1) and (SPC2) the first and the second inequalities in (SPC), respectively.
What Theorem 4.43 asserts is that, under the assumptions on P (boundedness and SCQ), a feasible solution is optimal if and only if there exists a vector \(\overline{\lambda }\in \mathbb {R}^{m}\) such that (NC), (SPC), and (CC) hold, in which case we say that \(\overline{\lambda }\) is a Lagrange vector.
The next simple example, where \(n=m=1\), makes it possible to visualize the saddle point of a convex optimization problem, since \(\mathop {\mathrm{gph}}L\subset \mathbb {R}^{3}\).
Example 4.45
We now show that the condition \(\nabla f\left( \overline{x}\right) \in A( \overline{x})=\mathop {\mathrm{cone}}\left\{ \nabla g_{i}\left( \overline{x}\right) , i\in I\left( \overline{x}\right) \right\} \) (the active cone at \(\overline{x}\) defined in (4.7)) also characterizes the optimality of \(\overline{x}\in F\) when P is a differentiable convex optimization problem and SCQ holds (which means that the assumptions of the KKT Theorems 4.20 and 4.46 are independent of each other).
Theorem 4.46
(KKT theorem with convex constraints) Assume that P is a bounded convex problem satisfying SCQ, with \(f,g_{i}, i\in I\), differentiable on some open set containing \(C\). Let \( \overline{x}\in F\cap \mathop {\mathrm{int}}C\). Then, the following statements are equivalent:
(i) \(\overline{x}\in F^{*}\).
(ii) \(\nabla f\left( \overline{x}\right) \in A\left( \overline{x}\right) \).
(iii) There exists \(\overline{\lambda }\in \mathbb {R}^{m}\) such that (NC), (SC), and (CC) hold.
Proof
Since (ii) and (iii) are trivially equivalent, it is sufficient to prove that (i)\(\Leftrightarrow \)(iii).
A vector \(\overline{\lambda }\in \mathbb {R}^{m}\) such that (NC), (SC), and (CC) hold is said to be a KKT vector.
Remark 4.47
Observe that SCQ was not used in the proof of (iii)\(\Rightarrow \)(i). Therefore, in a convex problem, the existence of a KKT vector associated with \(\overline{x}\) implies the global optimality of \(\overline{x}\).
Revisiting Example 4.37, \(\left( {\overline{x}}^{T},\overline{\lambda } \right) =\left( 0,0,1\right) \) does not satisfy (SC) of Theorem 4.46 as f is not even differentiable at \(0_{2} \). Concerning Example 4.45, it is easy to check that \(0_{2} \) satisfies (NC), (SC), and (CC).
The KKT conditions are used either to confirm or reject the optimality of a given feasible solution (e.g., the current iterate of some convex optimization algorithm) or as a filter for compiling a list of candidates for global minima. Under the assumptions of Theorem 4.46, the sensitivity theorem guarantees the existence of some sensitivity vector, and the proofs of the saddle point and the KKT theorems show that such a vector is a KKT vector. So, if P has a unique KKT vector, then it is a sensitivity vector too.
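The filtering use of the KKT conditions can be sketched on the toy problem of Example 4.45, \(\min \,x^{2}\) s.t. \(g(x)=x^{2}-1\le 0\) (the tolerance and the candidate list below are our own choices):

```python
def satisfies_kkt(x, lam, tol=1e-9):
    """Check (NC), (SC), (CC) and feasibility for min x^2 s.t. x^2 - 1 <= 0."""
    grad_f = 2.0 * x                          # gradient of f(x) = x^2
    grad_g = 2.0 * x                          # gradient of g(x) = x^2 - 1
    g = x * x - 1.0
    feasible = g <= tol                       # primal feasibility
    nc = lam >= -tol                          # (NC) lam >= 0
    sc = abs(grad_f + lam * grad_g) <= tol    # (SC) stationarity
    cc = abs(lam * g) <= tol                  # (CC) complementarity
    return feasible and nc and sc and cc

candidates = [(-1.0, 1.0), (0.0, 0.0), (1.0, 1.0), (0.5, 0.0)]
kkt_points = [(x, lam) for x, lam in candidates if satisfies_kkt(x, lam)]
print(kkt_points)  # -> [(0.0, 0.0)], the global minimum with KKT vector 0
```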
Example 4.48
\(\bullet \) \(I(x)=\emptyset \): The unique solution to \(\nabla f(x)=0_{2}\) is \(x={\left( 0,5\right) }^{T} \notin F\).
Replacing this value, we obtain \(x^{1}={\left( 1,2\right) }^{T} \in F\), which is a minimum of P with corresponding KKT vector \( {\left( 1,0\right) }^{T}\). Once we have obtained the unique minimum of P, there is no need to complete the discussion of the remaining cases \(I\left( x\right) =\left\{ 2\right\} \) and \(I\left( x\right) =\left\{ 1,2\right\} \).
Figure 4.23 shows the partition of F into the sets \(F_{1},F_{2},F_{3},F_{4}\) corresponding to the possible values of \(I\left( x\right) \): \(\emptyset \), \(\{1\}\), \(\{2\}\), and \(\{1,2\}\), respectively. Observe that \(F_{1}=\mathop {\mathrm{int}}F\), \(F_{2}\) is an arc of a circle without its endpoints, \(F_{3}\) is a segment without its endpoints, and, finally, \(F_{4}\) consists of two isolated points (the endpoints of the mentioned segment).
When P has multiple solutions, one can use the next result, which directly proves that the Lagrange vectors are sensitivity vectors.
Theorem 4.49
(Sensitivity and saddle points) Let \(\overline{x}\in F^{*}\). If \(\overline{\lambda }\in \mathbb {R}^{m}\) satisfies
(NC) \(\overline{\lambda }\in \mathbb {R}_{+}^{m}\),
(SPC) \(L\left( \overline{x},\lambda \right) \le L\left( \overline{x}, \overline{\lambda }\right) \le L\left( x,\overline{\lambda }\right) ,\quad \forall x\in C,\lambda \in \mathbb {R}_{+}^{m}\), and
(CC) \(\overline{\lambda }_{i}g_{i}\left( \overline{x}\right) =0,\quad \forall i\in I\),
Proof
4.4.3 Lagrange Duality
The main result in any duality theory establishes that \(v\left( D^L\right) =v\left( P\right) \) under suitable assumptions. This equation allows one to certify the optimality of the current iterate or to stop the execution of primal-dual algorithms whenever an \(\varepsilon \)-optimal solution has been attained. Strong duality holds when, in addition to \(v\left( D^L\right) =v\left( P\right) \), \(G^{*}\ne \emptyset \). In linear optimization, it is known that the simultaneous consistency of the primal problem P and its dual \(D^L\) guarantees that \(v\left( D^L\right) =v\left( P\right) \) with \(F^{*}\ne \emptyset \) and \(G^{*}\ne \emptyset \). This is not the case in convex optimization, where strong duality requires the additional condition that SCQ holds (which is not enough in nonconvex optimization).
Theorem 4.50
(Strong Lagrange duality) If P satisfies SCQ and it is bounded, then \(v\left( D^L\right) =v\left( P\right) \) and \(G^{*}\ne \emptyset \).
Proof
The assumptions guarantee, by Theorem 4.40, the existence of a sensitivity vector \(\overline{y}\in \mathbb {R}_{+}^{m}\), and this vector satisfies \(h\left( \overline{y}\right) =v\left( P\right) \) by Theorem 4.42. Then, \(v\left( P\right) =h\left( \overline{y}\right) \le v\left( D^L\right) \) and the conclusion follows from the weak duality theorem. \(\square \)
Therefore, in the simple Example 4.45, where SCQ holds, one has \(h\left( y\right) =\inf _{x\in \mathbb {R}}\left( x^{2}+y\left( x^{2}-1\right) \right) =-y\) for all \(y\in \mathbb {R}_{+}\). So, \(v\left( D^L\right) =\sup _{y\in \mathbb {R}_{+}}h\left( y\right) =0=v\left( P\right) \) and the optimal value of \(D^L\) is attained at \(\overline{y}=0\).
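The dual function of Example 4.45 can be checked numerically (a grid sketch, exact here only because the minimizer \(x=0\) lies on the grid):

```python
def h(y):
    """Grid approximation of h(y) = inf_x (x^2 + y*(x^2 - 1)) = -y for y >= 0."""
    xs = [t / 100.0 for t in range(-300, 301)]
    return min(x * x + y * (x * x - 1.0) for x in xs)

for y in [0.0, 0.5, 2.0]:
    print(h(y))  # -> 0.0, -0.5, -2.0, i.e., h(y) = -y

# sup of h over y >= 0 is 0 = v(P), attained at y = 0
```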
\(\bullet \) Obtaining, as in linear optimization, an exact optimal solution of \(D_{Q}^{L}\) by means of some quadratic solver, and then the desired optimal solution of \(P_{Q}\) by using (SC) and (CC);
\(\bullet \) Interrupting the execution of any primal-dual algorithm whenever \(f\left( x_{k}\right) -h\left( y_{k}\right) <\varepsilon \) for some tolerance \(\varepsilon >0\) (approximate stopping rule), as shown in Fig. 4.24.
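The approximate stopping rule can be sketched as follows (the iterate values below are illustrative only, not produced by any particular algorithm):

```python
def duality_gap(f_xk, h_yk):
    """Primal-dual gap f(x_k) - h(y_k); nonnegative by weak duality."""
    return f_xk - h_yk

eps = 1e-3
primal_values = [1.0, 0.5, 0.1, 0.01, 0.0001]     # f(x_k), decreasing to v(P) = 0
dual_values = [-1.0, -0.4, -0.05, -0.01, -0.0002]  # h(y_k), increasing to v(P) = 0

for k, (p, d) in enumerate(zip(primal_values, dual_values)):
    if duality_gap(p, d) < eps:
        print("stop at iteration", k)  # -> stop at iteration 4
        break
```

Since \(h(y_k)\le v(P)\le f(x_k)\), a gap below \(\varepsilon \) certifies that \(x_k\) is \(\varepsilon \)-optimal.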
4.4.4 Wolfe Duality
Ph. Wolfe, an expert in quadratic optimization, proposed in 1961 [90] an alternative dual problem that makes it possible to handle convex quadratic optimization problems whose objective function fails to be strongly convex.
Proposition 4.51
(Weak duality) It holds that \(v\left( D^{W}\right) \le v\left( P\right) \).
Proof
Theorem 4.52
(Strong Wolfe duality) If P is solvable and SCQ holds, then \(v\left( D^{W}\right) =v\left( P\right) \) and \(D^{W}\) is solvable.
Proof
Let \(\overline{x}\in F^{*}\). It will be sufficient to show the existence of some \(\overline{y}\in \mathbb {R}_{+}^{m}\) such that \( \left( \overline{x},\overline{y}\right) \in G^{*}\) and \(f\left( \overline{x}\right) =L\left( \overline{x},\overline{y}\right) \).
4.5 A Glimpse on Conic Optimization\(^{\star }\)
\(\bullet \) The positive orthant of \(\mathbb {R}^{m}\), \(\mathbb {R}_{+}^{m}\), which converts the conic constraint \(Ax+b\in K\) into an ordinary linear inequality system;
\(\bullet \) The second-order cone, also called the ice-cream cone,$$ K_{p}^{m}:=\left\{ x\in \mathbb {R}^{m}:x_{m}\ge \left\| {\left( x_{1},\ldots ,x_{m-1}\right) }^{T}\right\| \right\} , $$and Cartesian products of the form \(\prod _{j=1}^{l}K_{p}^{m_{j}+1}\) (\(K_{p}^{3}\) is represented twice in Fig. 4.6);
\(\bullet \) The cone of positive semidefinite symmetric matrices in \(\mathcal {S}_{q}\), usually denoted by \(\mathcal {S}_{q}^{+}\). Here, we identify the space \(\mathcal {S}_{q}\) of all \(q\times q\) symmetric matrices with \(\mathbb {R}^{q(q+1)/2}=\mathbb {R}^{m}\). From now on, we write \(A\succeq 0\) when \(A\in \mathcal {S}_{q}\) is positive semidefinite and \(A\succ 0\) when it is positive definite.
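As a small illustration (our own helper, not part of the text), membership in the second-order cone \(K_{p}^{m}\) can be tested directly from its definition:

```python
import math

def in_second_order_cone(x):
    """Membership test for K_p^m = {x : x_m >= ||(x_1, ..., x_{m-1})||}."""
    *head, last = x
    return last >= math.sqrt(sum(t * t for t in head))

print(in_second_order_cone([3.0, 4.0, 5.0]))  # -> True  (5 >= ||(3,4)|| = 5)
print(in_second_order_cone([3.0, 4.0, 4.9]))  # -> False
```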
It is easy to see that the above cones are closed and convex, and they have nonempty interior (observe that \(A\succ 0\) implies that \(A\in \mathop {\mathrm{int}}\mathcal {S}_{q}^{+}\) as the eigenvalues of A are continuous functions of its entries). Moreover, they are pointed and symmetric (meaning that \(K\cap \left( -K\right) =\left\{ 0_{m}\right\} \) and that \(K^{\circ }=-K\), respectively). The pointedness of \(\mathbb {R}_{+}^{m}\) and \(K_{p}^{m}\) is evident, while, for \(\mathcal {S}_{q}^{+}\), it follows from the characterization of the symmetric positive semidefinite matrices by the nonnegativity of their eigenvalues. The symmetry is also evident for \(\mathbb {R}_{+}^{m}\), it can be easily proved from the Cauchy–Schwarz inequality for \(K_{p}^{m}\), and it is a nontrivial result proved by Moutard for \(\mathcal {S}_{q}^{+}\) [58, Theorem 7.5.4].
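The eigenvalue characterization of \(\mathcal {S}_{q}^{+}\) can be sketched for \(q=2\), where nonnegativity of both eigenvalues is equivalent to \(\mathrm{trace}\,A\ge 0\) and \(\det A\ge 0\) (the helper below is our own, not from the text):

```python
def is_psd_2x2(a11, a12, a22):
    """A = [[a11, a12], [a12, a22]] is PSD iff trace(A) >= 0 and det(A) >= 0,
    since trace and determinant are the sum and product of the eigenvalues."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a12
    return trace >= 0 and det >= 0

print(is_psd_2x2(2.0, 1.0, 2.0))  # -> True  (eigenvalues 1 and 3)
print(is_psd_2x2(1.0, 2.0, 1.0))  # -> False (det = -3 < 0)
```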
\(\bullet \) The Slater constraint qualification for the pair \(P_{K}^{1}\)-\(D_{K}^{1}\) is weaker than the one corresponding to the pair \(P_{K}^{2}\)-\(D_{K}^{2}\), i.e., the existence of some \(\widehat{x}\in \mathbb {R}^{n}\) such that \(g\left( \widehat{x}\right) <0\).
\(\bullet \) \(D_{K}^{2}\) can hardly be solved in practice, as no explicit expression of g is available except in particular cases.
\(\bullet \) In linear optimization, \(\mathcal {Z}=\mathbb {R}^{m}\), \(\left\langle c,x\right\rangle =c^{T}x\), and \(\mathcal {K}=\mathbb {R}_{+}^{m}\). Strong duality holds just assuming the existence of some primal-dual feasible solution; i.e., no CQ is needed.
\(\bullet \) In second-order cone optimization, \(\mathcal {Z}=\prod _{j=1}^{l}\mathbb {R}^{n_{j}+1}\), \(\left\langle c,x\right\rangle =\sum _{j=1}^{l}c_{j}^{T}x_{j}\), and \(\mathcal {K}=\prod _{j=1}^{l}K_{p}^{n_{j}+1}\).
\(\bullet \) In semidefinite optimization, \(\mathcal {Z}=\mathcal {S}_{n}\), \(\left\langle C,X\right\rangle \) is the trace of the product matrix CX, and \(\mathcal {K}=\mathcal {S}_{n}^{+}\).
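The trace inner product \(\left\langle C,X\right\rangle =\mathrm{trace}\left( CX\right) \) used in semidefinite optimization can be sketched as follows (the nested-list matrix representation is our own choice):

```python
def trace_inner(C, X):
    """<C, X> = trace(CX) = sum_{i,j} C[i][j] * X[j][i] for square matrices."""
    n = len(C)
    return sum(C[i][j] * X[j][i] for i in range(n) for j in range(n))

C = [[1.0, 2.0], [2.0, 3.0]]  # symmetric
X = [[0.0, 1.0], [1.0, 4.0]]  # symmetric
print(trace_inner(C, X))  # -> 16.0
```

For symmetric C and X this reduces to the componentwise sum \(\sum _{i,j}C_{ij}X_{ij}\), which is why \(\mathcal {S}_{n}\) can be treated as a Euclidean space.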
The SCQ in second-order cone optimization and semidefinite optimization reads as follows: there exists a primal-dual feasible solution \(\left( \widehat{x},\left( \widehat{y},\widehat{z}\right) \right) \) such that \(\widehat{x},\widehat{z}\in \mathop {\mathrm{int}}\mathcal {K}\). For the last two classes of optimization problems mentioned, SCQ also guarantees the existence of a primal optimal solution [6]. A favorable consequence of the symmetry of the three cones above is that the corresponding conic problems admit efficient primal-dual algorithms [69].
Many textbooks on convex optimization, e.g., [10, 15], pay particular attention to the theory and methods of conic optimization. Several chapters of [7] also deal with the theory and methods of conic and semidefinite optimization, while [2] focuses on second-order cone optimization. Regarding applications, [10, 15] present interesting applications to engineering and finance (e.g., the portfolio problem), while [85] contains chapters reviewing applications of conic optimization to nonlinear optimal control (pp. 121–133), truss topology design (pp. 135–147), and financial engineering (pp. 149–160).
4.6 Exercises
4.1
4.2
Express a positive number a as the sum of three numbers so that the sum of their cubes is minimized, under the following assumptions:
(a) The three numbers are arbitrary.
(b) The three numbers are positive.
(c) The three numbers are nonnegative.
4.3
4.4
4.5
Solve the optimization problem posed in Exercise 1.3 by using the KKT conditions.
4.6
4.7
(b) Solve P graphically.
(c) Prove analytically that the result obtained in (b) is correct.
(d) Solve the problem obtained when we add to P the constraint \( x_{2}\ge 3\).
4.8
4.9
(b) Analyze the fulfillment of the KKT conditions at the point obtained in (a).
4.10
4.11
4.12
(b) Check that \(\mathcal {\vartheta }\) is convex and differentiable at 0.
(c) Find the set of sensitivity vectors for \(P\left( 0\right) \).
4.13
(b) Determine whether the value function is convex and whether P satisfies SCQ.
(c) Compute a sensitivity vector.
4.14
The utility function of a consumer is \(u\left( x,y\right) =xy\), where x and y denote the consumed quantities of two goods A and B, whose unit prices are 2 and 3 c.u., respectively. Maximize the consumer's utility, knowing that she has 90 c.u. to spend.
4.15
A tetrahedron (or triangular pyramid) is called rectangular when three of its faces are right triangles; we will call these faces the legs (cateti), while the fourth face will be called the hypotenuse. Design a rectangular tetrahedron whose hypotenuse has minimum area, given that the height of the pyramid over the hypotenuse is h meters.
4.16
(b) Study the continuity and differentiability of \(\mathcal {\vartheta }\) on its domain.
(c) Compute the sensitivity vectors.
(d) Determine whether the optimal values of P and of its Lagrange dual problem \(D^{L}\) are equal.
4.17
(b) Study the continuity and differentiability of \(\mathcal {\vartheta }\) on its domain.
(c) Compute the sensitivity vectors of \(\mathcal {\vartheta }\).
(d) Compute the optimal set of \(P\).
(e) Compute the optimal set of its Lagrange dual problem \(D^{L}\).
(f) Determine whether strong duality holds.
4.18
4.19
(b) Express analytically and represent graphically \(\mathop {\mathrm{gph}}\mathcal {F}\).
(c) Identify the value function \(\mathcal {\vartheta }\) and represent graphically \(\mathop {\mathrm{gph}}\mathcal {\vartheta }\).
(d) Analyze the differentiability of \(\mathcal {\vartheta }\) on the interior of its domain.
(e) Compute the sensitivity vectors of \(\mathcal {\vartheta }\) (here scalars, since \(m=1\)).
(f) Compute the saddle points of the Lagrange function L.
(g) Compute the KKT vectors on the optimal solutions of \(P\).
(h) Check the strong duality property for the Lagrange dual \(D^{L}\) and for the Wolfe dual \(D^{W}\) of \(P\).