Robust convex optimization: A new perspective that unifies and extends

Robust convex constraints are difficult to handle, since finding the worst-case scenario is equivalent to maximizing a convex function. In this paper, we propose a new approach to deal with such constraints that unifies most approaches known in the literature and extends them in a significant way. The extension either yields better solutions than those proposed in the literature, or provides solutions for classes of problems unaddressed by previous approaches. Our solution is based on an extension of the Reformulation-Linearization-Technique and can be applied to general convex inequalities and general convex uncertainty sets. It generates a sequence of conservative approximations which can be used to obtain both upper and lower bounds on the optimal objective value. We illustrate the numerical benefit of our approach on a robust control and a robust geometric optimization example.


Introduction
In this paper, we consider a general hard robust constraint
h(A(x)z + b(x)) ≤ 0, ∀z ∈ Z, (1)
where h : R^m → [−∞, +∞] is a proper, closed and convex function, A : R^{n_x} → R^{m×L} and b : R^{n_x} → R^m are affine, and Z is a nonempty convex subset of R^L. Unlike inequalities that are concave in the uncertain parameters [4], a tractable equivalent reformulation for the robust constraint (1) is out of reach in general.
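To make the hardness concrete, consider the following pure-Python sketch (our own toy illustration with made-up data, not a method from the paper): for a convex h, the worst case over a box uncertainty set is attained at a vertex, so a naive certificate must enumerate all 2^L vertices, which is exponential in L.

```python
import itertools
import math

# Toy illustration: certifying a robust convex constraint requires MAXIMIZING
# the convex function z -> h(A z + b) over Z. For convex h and the box
# Z = [-1, 1]^L, the maximum sits at a vertex, so we enumerate 2^L vertices.
def h(y):
    # a simple convex function: h(y) = sum of squares
    return sum(v * v for v in y)

def worst_case_over_box(A, b, L):
    """Exact worst case of h(A z + b) over z in [-1, 1]^L via its vertices."""
    best = -math.inf
    for z in itertools.product([-1.0, 1.0], repeat=L):
        y = [sum(A[i][l] * z[l] for l in range(L)) + b[i] for i in range(len(b))]
        best = max(best, h(y))
    return best

A = [[1.0, 0.5], [0.0, 1.0]]
b = [0.1, -0.2]
wc = worst_case_over_box(A, b, 2)  # 2^L evaluations; here wc > 0, so (1) fails
```

The safe approximations developed in this paper avoid this enumeration by working with convex reformulations instead.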
When h(·) is (conic) quadratic, several exact reformulations have been proposed for specific uncertainty sets. We refer to [3,Chapter 6] for an overview of these methods. In the general case, generic approximate methods have been proposed for safely approximating (1). For homogeneous convex functions h(·) and symmetric norm-based uncertainty sets Z, Bertsimas and Sim [8] propose a computationally tractable safe approximation together with probabilistic guarantees. Zhen et al. [21] develop safe approximations for conic quadratic and SDP problems with polyhedral uncertainty sets. They first reformulate the robust inequality into an equivalent set of linear adaptive robust constraints, which can then be approximated by, e.g., static or affine decision rules. Roos et al. [18] extend their approach with affine decision rules to general convex functions h(·) but still with polyhedral uncertainty sets. For general convex uncertainty sets, they propose a similar approach, yet using static decision rules only.
In this paper, we propose a new general safe approximation procedure for the robust convex constraint (1). Our approach is 'general' in the sense that it can be applied to a generic convex function h(·) and convex uncertainty set Z. Using the Reformulation-Perspectification-Technique (RPT) as a unifying lens [20], our approach not only unifies much previous work from the literature but also extends it in a significant way. First, it is superior in special cases with homogeneous functions h(·) and/or polyhedral uncertainty sets. Second, it is applicable to any convex function h(·) and convex uncertainty set Z under mild assumptions, namely that one can explicitly describe the uncertainty set and the domain of the conjugate of h (see definition in Sect. 1.2) via convex inequalities.
We develop our main approximation for a generic class of uncertainty sets, described as the intersection of a polyhedron and a convex set. We first prove that the original uncertain convex inequality (1) is equivalent to an uncertain linear constraint with a non-convex uncertainty set. In other words, we reformulate (1) as a linear robust constraint where the new uncertainty set Θ is non-convex due to the presence of bilinear equality constraints. We then use the recently developed Reformulation-Perspectification-Technique (RPT) [20] to obtain a hierarchy of safe approximations, i.e., we construct convex sets Θ_i, i = 0, 1, 2, 3, such that Θ ⊆ Θ_3 ⊆ Θ_2 ⊆ Θ_1 ⊆ Θ_0. We show that this approach unifies a variety of methods known in the literature and extends them in a significant way. When the uncertainty set lies in the non-negative orthant, we show that the first-level approximation Θ_1 coincides with the perspectification approach, a generalization of the approach of Bertsimas and Sim [8], for which we also provide probabilistic guarantees. Consequently, our approach connects and further generalizes the work of Bertsimas and Sim [8] and Roos et al. [18] to a broader class of uncertainty sets, and improves them by proposing tighter approximations, Θ_2 and Θ_3. For polyhedral uncertainty sets, our RPT-based approach with Θ_1 coincides with the approximation from Roos et al. [18] with linear decision rules [18, Theorem 2].
The paper is structured as follows: In Sect. 2, we present our RPT-based hierarchy of safe approximations. We connect it with existing results from the literature in Sect. 3. In particular, for uncertainty sets lying in the non-negative orthant, we show in Appendix A that our first-level approximation with Θ 1 can be alternatively derived using a 'perspectification' technique and provide probabilistic guarantees in this case. When applied to a minimization problem, our safe approximations provide a valid upper bound on the objective value. Our approach can also be adapted to derive lower bounds, as described in Sect. 4. Finally, we assess the numerical performance of our technique in Sect. 5, and conclude our findings in Sect. 6.

Examples
In this section, we present a few examples of the robust convex constraint (1) often encountered in the literature and summarize them in Table 1.
Quadratic Optimization (QO) We consider the general quadratic constraint
(F(x)z)^T (F(x)z) + f(x)^T z + g(x) ≤ 0, ∀z ∈ Z,
where F : R^{n_x} → R^{m×L}, f : R^{n_x} → R^L and g : R^{n_x} → R are affine in x. The constraint is of the form (1).
Alternatively, the constraint can be represented as a second-order cone constraint, which is also of the form (1).

Piecewise linear constraints (PWL)
We consider the piecewise linear convex constraint
max_{k∈[m]} { a_k(x)^T z + b_k(x) } ≤ 0, ∀z ∈ Z,
where a_k : R^{n_x} → R^L and b_k : R^{n_x} → R are affine functions of x. Such a constraint is a special case of (1) with h(y) = max_{k∈[m]} y_k, the k-th row of A(x) equal to a_k(x)^T, and b(x) = (b_1(x), . . . , b_m(x)).

Sum-of-max of linear constraints (SML)
The sum-of-max of linear constraints is defined via h(y) := Σ_k max_{i∈I_k} y_i, for any y ∈ R^{n_y}.
Geometric optimization (GO) In geometric optimization, the function h in (1) is the log-sum-exp function defined by h(y) := log(e^{y_1} + · · · + e^{y_{n_y}}) for any y ∈ R^{n_y}.
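For intuition on this h, here is a minimal numerically stable evaluation using the standard max-shift trick (our own sketch, not code from the paper):

```python
import math

# h(y) = log(e^{y_1} + ... + e^{y_n}), evaluated with the usual max-shift
# so that the exponentials cannot overflow for large y_i.
def log_sum_exp(y):
    m = max(y)
    return m + math.log(sum(math.exp(v - m) for v in y))

# log-sum-exp is a smooth upper bound on the max:
#   max(y) <= h(y) <= max(y) + log(n)
```

The bracketing by max(y) explains why robust geometric optimization behaves much like a smoothed piecewise linear constraint.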
Summary In this paper, we deliberately study robust convex constraints, and not only robust conic constraints. It is generally believed that most convex problems can be modeled in terms of the five basic cones: linear, quadratic, semidefinite, power, and exponential cones. Still, we do not restrict our attention to robust conic optimization, for the following reason: although conic optimization is a powerful modeling framework, the uncertainty is often split over multiple constraints, and the robust version of the conic representation is not equivalent to the original one. For example, while the sum-of-max of linear functions is linear-cone representable, the robust counterpart of its conic representation is not necessarily equivalent to the robust version of the original constraint. The conclusion is that the right order is to first develop the robust counterpart of the nonlinear inequality and then reformulate it into a conic representation, not the other way around.

Notations
We use nonbold face characters (x) to denote scalars, lowercase bold faced characters (x) to denote vectors, uppercase bold faced characters (X) to denote matrices, and bold calligraphic characters such as X to denote sets. We denote by e i the unit vector with 1 at the ith coordinate and zero elsewhere, with dimension implied by the context. We use [n] to denote the finite index set [n] = {1, . . . , n} with cardinality |[n]| = n. Whenever the domain of an optimization variable is omitted, it is understood to be the entire space (whose definition will be clear from the context).
The function δ*(x|S) denotes the support function of the set S evaluated at x, i.e., δ*(x|S) = sup_{y∈S} y^T x.
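Support functions of standard sets have simple closed forms, e.g., δ*(x | {y : ||y||_2 ≤ γ}) = γ||x||_2 and δ*(x | [−1, 1]^L) = ||x||_1. A small sketch (ours, with a sampled sanity check; the data are made up):

```python
import math
import random

def support_ball(x, gamma):
    # delta*(x | {y : ||y||_2 <= gamma}) = gamma * ||x||_2
    return gamma * math.sqrt(sum(v * v for v in x))

def support_box(x):
    # delta*(x | [-1, 1]^L) = ||x||_1
    return sum(abs(v) for v in x)

# sanity check: no sampled point of the radius-2 sphere beats the closed form
random.seed(0)
x = [0.3, -1.2, 0.5]
ub = support_ball(x, 2.0)
for _ in range(1000):
    y = [random.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(v * v for v in y))
    y = [2.0 * v / n for v in y]  # a point on the sphere of radius 2
    assert sum(a * b for a, b in zip(x, y)) <= ub + 1e-9
```

Such closed forms are what makes Step 3 below (computing robust counterparts via support functions) tractable in practice.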
A function f : R^{n_y} → [−∞, +∞] is said to be proper if there exists at least one vector y such that f(y) < +∞ and, for any y ∈ R^{n_y}, f(y) > −∞. For a proper, closed and convex function f, we define its conjugate as f*(w) = sup_{y∈dom f} { w^T y − f(y) }. When f is closed and convex, f** = f. If the function f : R^{n_y} → [−∞, +∞] is proper, closed and convex, the perspective function of f is defined for all y ∈ R^{n_y} and t ∈ R_+ as t f(y/t) if t > 0, and as the recession function f^∞(y) if t = 0. The recession function f^∞ : R^{n_y} → [−∞, +∞] can be equivalently defined as f^∞(y) = lim_{t↓0} t f(y/t). Among others, the perspective and conjugate functions of f satisfy the relationship t f(y/t) = sup_{w∈dom f*} { w^T y − t f*(w) } for all t ∈ R_+. For ease of exposition, we use t f(y/t) to denote the perspective function, including at t = 0.
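The relationship between the perspective and the conjugate, t f(y/t) = sup_w { w^T y − t f*(w) }, can be sanity-checked numerically. Below, a sketch (ours) for the scalar function f(y) = y², whose conjugate is f*(w) = w²/4; the supremum is approximated on a grid that contains the exact maximizer:

```python
# Check t * f(y/t) = sup_w { w*y - t * f*(w) } for f(y) = y^2, f*(w) = w^2/4.
def f(y):
    return y * y

def f_conj(w):
    return w * w / 4.0

def perspective(y, t):
    return t * f(y / t)  # valid for t > 0

y, t = 1.5, 0.5
grid = [i / 100.0 for i in range(-1000, 1001)]  # w in [-10, 10], step 0.01
lhs = perspective(y, t)                          # = y^2 / t = 4.5
rhs = max(w * y - t * f_conj(w) for w in grid)   # maximizer w = 2*y/t = 6.0
```

The grid contains w = 6.0 exactly, so both sides agree to machine precision.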

The reformulation-perspectification approach
In this section, we describe our main approach, which is based on an extension of the Reformulation-Linearization-Technique, namely the Reformulation-Perspectification-Technique [20]. Our approach comprises three steps:
- Step 1. Reformulate the robust convex constraint (1) into a robust linear optimization problem with bilinear equalities in the uncertainty set.
- Step 2. Apply the Reformulation-Perspectification-Technique to get a safe approximation of the robust convex inequality that is linear in the uncertain parameters.
- Step 3. Construct a computationally tractable robust counterpart of the approximation obtained in Step 2 by using the approach described in Ben-Tal et al. [4].

Step 1: Bilinear reformulation
We first describe Step 1 in more detail. The following proposition shows that the robust convex constraint (1) is equivalent to a robust linear constraint with bilinear equality constraints in the uncertainty set.

Proposition 1 The robust convex constraint (1) is equivalent to the robust linear constraint
Tr(A(x)^T V) + b(x)^T w − w_0 ≤ 0, ∀(w, w_0, z, V, v_0) ∈ Θ, (2)
where
Θ = {(w, w_0, z, V, v_0) : z ∈ Z, w ∈ dom h*, h*(w) ≤ w_0, V = wz^T, v_0 = w_0 z}.

Proof Because h is closed and convex, we have h(y) = sup_{w∈dom h*} { w^T y − h*(w) }, so that (1) holds if and only if w^T A(x)z + w^T b(x) − w_0 ≤ 0 for all z ∈ Z and all (w, w_0) with w ∈ dom h* and h*(w) ≤ w_0. Substituting V = wz^T, for which w^T A(x)z = Tr(A(x)^T V), and v_0 = w_0 z yields (2) for the properly defined set Θ.
The complication in (2) is that the extended uncertainty set Θ is not convex, due to the bilinear constraint V = wz^T. Therefore, in Step 2 we propose a safe approximation of Θ based on the Reformulation-Perspectification-Technique [20]. Formally, we provide tractable approximations of the robust convex constraint (1) by exhibiting convex sets Φ ⊇ Θ; imposing the robust linear constraint for all elements of such a Φ constitutes a safe approximation of (1). In addition, this safe approximation is linear in the uncertain parameter (w, w_0, z, V, v_0) and convex in the decision variable x. Hence, provided that the new uncertainty set Φ is convex, its robust counterpart can be derived in a tractable manner using techniques from Ben-Tal et al. [4] (Step 3). Observe that, since the objective function is linear, we can replace Θ by conv(Θ) in (2) without loss of optimality. So we effectively need to construct sets Φ that approximate conv(Θ) as closely as possible.

Step 2: Hierarchy of safe approximations
To develop a hierarchy of safe approximations (Step 2) for the robust non-convex constraint (2), we assume that the uncertainty set Z is described through K_0 linear and K_1 convex inequalities.

Assumption 1 The uncertainty set Z = {z | Dz ≤ d, c_k(z) ≤ 0, k ∈ [K_1]} has K_0 linear inequalities defined by D and d, and K_1 nonlinear inequalities defined by c_k, k ∈ [K_1], where c_k is proper, closed, and convex for each k ∈ [K_1].
Note that the decomposition of Z into linear and convex constraints in Assumption 1 is not unique. Indeed, affine functions are also convex, so one could assume K_0 = 0 without loss of generality. However, the presence of linear constraints in the definition of Z will be instrumental in deriving good approximations. Consequently, from a practical standpoint, we encourage including as many linear constraints as possible in the linear system of inequalities Dz ≤ d. We illustrate this modeling choice on a simple example. Similarly, we assume that dom h* can be described with J_0 linear and J_1 convex inequalities.

Assumption 2
The set dom h* = {w | Fw ≤ f, g_j(w) ≤ 0, j ∈ [J_1]} has J_0 linear inequalities defined by F and f, and J_1 nonlinear inequalities defined by g_j, j ∈ [J_1], where g_j is proper, closed, and convex for each j ∈ [J_1].
Under Assumptions 1 and 2, the non-convex uncertainty set Θ can be represented through the two set memberships z ∈ Z and (w, w_0) ∈ {(w, w_0) : w ∈ dom h*, h*(w) ≤ w_0}, coupled by the bilinear equalities V = wz^T and v_0 = w_0 z. Each of these two set memberships can be expressed through constraints that are convex in (w, w_0, z, V, v_0). a) Consider the k-th linear constraint in the description of Z, d_k − e_k^T Dz ≥ 0, for some k ∈ [K_0]. We have
(d_k − e_k^T Dz) w = d_k w − w z^T D^T e_k = d_k w − V D^T e_k,
where the last equality follows from the fact that V = wz^T. Since w ∈ dom h*, we must have (d_k w − V D^T e_k)/(d_k − e_k^T Dz) ∈ dom h* whenever d_k − e_k^T Dz > 0. We then derive convex constraints in (w, w_0, z, V, v_0) that enforce this set membership. By multiplying both sides of a generic convex constraint on w, g(w) ≤ 0, by the non-negative scalar d_k − e_k^T Dz, we obtain
(d_k − e_k^T Dz) g( (d_k w − V D^T e_k)/(d_k − e_k^T Dz) ) ≤ 0, (4)
which is convex in (d_k − e_k^T Dz, d_k w − V D^T e_k), since the constraint function is the perspective of the convex function g. Hence, it is convex in (w, z, V). Note that if d_k − e_k^T Dz is not strictly positive, constraint (4) should still hold by a limit argument. We apply this methodology to:
- linear constraints on w, g(w) = Fw − f ≤ 0, in which case (4) can be expressed as a linear constraint in (w, z, V):
F(d_k w − V D^T e_k) ≤ (d_k − e_k^T Dz) f; (5)
- nonlinear constraints on w, g_j(w) ≤ 0, j ∈ [J_1], yielding the perspective constraints (6);
- the epigraph constraint, h*(w) ≤ w_0, yielding
(d_k − e_k^T Dz) h*( (d_k w − V D^T e_k)/(d_k − e_k^T Dz) ) ≤ d_k w_0 − e_k^T D v_0, (7)
where the equivalence follows from v_0 = w_0 z.
Together, these constraints (5)-(6)-(7) enforce that (d_k w − V D^T e_k)/(d_k − e_k^T Dz) ∈ dom h* with conjugate value at most (d_k w_0 − e_k^T D v_0)/(d_k − e_k^T Dz). b) Similarly, we can consider a linear constraint in w, f_j − e_j^T Fw ≥ 0, for some j ∈ [J_0]. We have
(f_j − e_j^T Fw) z = f_j z − z w^T F^T e_j = f_j z − V^T F^T e_j.
Given a generic convex constraint in z, c(z) ≤ 0, we obtain a new convex constraint
(f_j − e_j^T Fw) c( (f_j z − V^T F^T e_j)/(f_j − e_j^T Fw) ) ≤ 0.
Applying these manipulations to the linear constraints Dz ≤ d yields (8), and applying them to the nonlinear constraints c_k(z) ≤ 0, k ∈ [K_1], yields (9). Eventually, we obtain constraints (8)-(9), enforcing that (f_j z − V^T F^T e_j)/(f_j − e_j^T Fw) ∈ Z. Proposition 2 exhibits a hierarchy of approximations to convexify the non-convex set Θ. A safe approximation is then obtained by considering the robust linear constraint with Φ = Θ_i, i = 0, 1, 2. Observe that Θ_0 merely drops the bilinear equality constraints, so its robust counterpart ignores the coupling between z and w altogether. Accordingly, it is fair to admit that Θ_0 offers a poor approximation and that relevant safe approximations are to be found at higher orders of the hierarchy. A higher order in the hierarchy, however, does not necessarily imply a tighter safe approximation, as discussed in Sect. 2.5. So far, we obtained convex constraints on (w, w_0, z, V, v_0) by multiplying a linear constraint on z (resp. w) with a convex constraint on w (resp. z). However, in many cases, multiplying two general convex constraints, e.g., multiplying c_k(z) ≤ 0 with g_j(w) ≤ 0, also yields valuable convex constraints. The resulting set, which we denote Θ_3, provides an even tighter safe approximation of (2). In its definition, "c_k(z) × g_j(w) ≤ 0" denotes the valid convex inequalities obtained when multiplying "c_k(z) ≤ 0" with "g_j(w) ≤ 0". In the following example, we illustrate this approach when c_k and g_j are two conic quadratic functions, which typically occurs with ellipsoidal uncertainty sets and a conic quadratic function h. We refer to Zhen et al. [20] for a thorough analysis of all 15 possible multiplications of any two of the five basic convex cones (linear, conic quadratic, power, exponential, semidefinite).
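The validity of the multiplication step can be checked numerically: on the original set Θ, where V = wz^T, the vector d_k w − V D^T e_k equals (d_k − e_k^T Dz) w exactly, so constraint (4) reduces to a non-negative multiple of g(w) ≤ 0. A pure-Python sketch with random made-up data (ours, not the paper's):

```python
import random

random.seed(1)
L_dim, m_dim, K0 = 3, 2, 2
D = [[random.uniform(-1.0, 1.0) for _ in range(L_dim)] for _ in range(K0)]
z = [random.uniform(-0.3, 0.3) for _ in range(L_dim)]
w = [random.uniform(-1.0, 1.0) for _ in range(m_dim)]
# choose d so that D z <= d holds with slack exactly 1 in every row
d = [sum(D[k][l] * z[l] for l in range(L_dim)) + 1.0 for k in range(K0)]

V = [[w[i] * z[l] for l in range(L_dim)] for i in range(m_dim)]  # V = w z^T

for k in range(K0):
    slack = d[k] - sum(D[k][l] * z[l] for l in range(L_dim))  # d_k - e_k^T D z
    lhs = [d[k] * w[i] - sum(V[i][l] * D[k][l] for l in range(L_dim))
           for i in range(m_dim)]                             # d_k w - V D^T e_k
    for i in range(m_dim):
        # rank-1 structure collapses (4) back to slack * g(w) <= 0
        assert abs(lhs[i] - slack * w[i]) < 1e-9
```

In the relaxations Θ_i, the variable V is decoupled from w and z, and (4) remains a valid convex constraint on the lifted variables.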

Example 2 Consider two conic quadratic inequalities
the first one coming from the description of dom h*, and the second being one of the constraints that define Z. We now apply RPT to these inequalities, i.e., we multiply the LHS and RHS of the two constraints, we multiply the LHS of the first constraint with the second constraint, and we multiply the LHS of the second constraint with the first constraint. Here, ||·||_2 denotes the 2-norm of a matrix, defined as its largest singular value. In particular, we used the fact that for rank-1 matrices M = uv^T, ||M||_2 = ||u||_2 ||v||_2. In practice, constraints of the form ||M||_2 ≤ λ can be modeled as semidefinite constraints using the Schur complement.
To improve scalability, however, one can replace the 2-norm in the first constraint by the Frobenius norm of the matrix. Indeed, ||M||_2 ≤ ||M||_F, and equality holds for rank-1 and zero matrices, so the resulting Frobenius-norm constraints are still valid, while being representable with second-order cones rather than semidefinite constraints.
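The rank-1 fact behind this substitution is easy to verify: for M = uv^T, both ||M||_2 and ||M||_F equal ||u||_2 ||v||_2. A quick check of the Frobenius side (our own sketch, with made-up vectors):

```python
import math

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

u = [1.0, -2.0, 0.5]
v = [3.0, 4.0]
M = [[ui * vj for vj in v] for ui in u]  # rank-1 matrix u v^T

# ||M||_F^2 = sum_{i,j} u_i^2 v_j^2 = ||u||^2 ||v||^2, so ||M||_F = ||u|| ||v||.
# This also equals ||M||_2, since a rank-1 matrix has one nonzero singular value.
fro = math.sqrt(sum(M[i][j] ** 2 for i in range(len(u)) for j in range(len(v))))
```

On general (non-rank-1) matrices the two norms differ, which is where the substitution changes the approximation.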
We could even further tighten our approximations by applying RPT to z and to w separately. More specifically, we could multiply the linear and nonlinear constraints that define Z with each other and express them in terms of z and an additional matrix variable Z = zz^T. The same could be done for the constraints defining dom h*, with W = ww^T. By doing so, all the nonlinear substitutions are concisely defined as
[Z, V^T; V, W] = [z; w] [z^T, w^T].
In a relaxation, the equality = is replaced by ⪰ and, by using Schur complements, yields the semidefinite constraint:
[Z, V^T, z; V, W, w; z^T, w^T, 1] ⪰ 0.
Hence, we can add this LMI to any of the safe approximations Θ_i obtained by RPT, and the resulting uncertainty set, Θ_{LMI,i}, would provide a safe approximation at least as good as the one obtained from Θ_i. However, whether the extra computational burden outweighs the improvement of the approximation remains an open question, whose answer might depend on the specific problem. On a geometric optimization example in Sect. 5.2, for instance, we observed no numerical improvement from adding this LMI.
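The Schur-complement step can be sanity-checked on a tiny instance (our own sketch, with made-up numbers): for x = (u, s), the quadratic form of the block matrix [[M, ξ], [ξ^T, 1]] is u^T M u + 2 s ξ^T u + s², which is minimized over s at s = −ξ^T u, where it equals u^T (M − ξξ^T) u. Hence the block matrix is PSD exactly when M ⪰ ξξ^T.

```python
def block_quad(M, xi, u, s):
    # quadratic form of [[M, xi], [xi^T, 1]] at the vector (u, s)
    n = len(xi)
    val = sum(M[i][j] * u[i] * u[j] for i in range(n) for j in range(n))
    val += 2.0 * s * sum(xi[i] * u[i] for i in range(n))
    return val + s * s

xi = [1.0, -0.5]
eps = 0.1
vals = {}
for sign in (+1.0, -1.0):
    # M = xi xi^T + sign * eps * I: shifted up (PSD case) or down (non-PSD case)
    M = [[xi[i] * xi[j] + (eps * sign if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    u = [1.0, 0.0]
    s = -sum(xi[i] * u[i] for i in range(2))  # minimizing choice of s
    vals[sign] = block_quad(M, xi, u, s)      # = u^T (M - xi xi^T) u = sign * eps

# vals[+1.0] > 0 (block matrix PSD); vals[-1.0] < 0 certifies non-PSDness
```

The same identity, with ξ = (z, w) and M the block matrix of Z, V, W, underlies the LMI above.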

Step 3: Tractable robust counterpart
Finally, the safe approximation (10) is a robust constraint, linear in the uncertain parameters (w, w_0, z, V, v_0), and with a convex uncertainty set Φ = Θ_i, i = 0, 1, 2, 3. Hence, following the approach of Ben-Tal et al. [4], one can derive its robust counterpart by computing the support function of Φ, δ*(·|Φ). In particular, (10) is equivalent to a single convex constraint on x, obtained by evaluating δ*(·|Φ) at the (affine in x) vector of coefficients of the uncertain parameters. In our hierarchy, note that Θ_{i+1} is obtained by imposing additional convex constraints on Θ_i, i.e., Θ_{i+1} = Θ_i ∩ Φ_{i+1} for some convex set Φ_{i+1}. Consequently, the support function of Θ_{i+1} can be expressed as a function of the support functions of Θ_i and Φ_{i+1} [see [4, Lemma 6.4]].

Description of the approach for conic constraints
In this section, we describe our Reformulation-Perspectification-Technique for conic inequalities. We consider an uncertain conic constraint of the type
A(x)z + b(x) ∈ K, ∀z ∈ Z, (12)
where K ⊆ R^m is a closed convex cone with nonempty relative interior. Examples of such cones are the linear cone, the second-order cone, the semidefinite cone, the power cone, and the exponential cone. We refer the reader to MOSEK [16] for an extensive treatment of these five cones. The dual cone of K, denoted K*, is defined as
K* = {w : w^T y ≥ 0, ∀y ∈ K},
and is again a convex cone. Furthermore, we have K = (K*)*, because the cone is closed and convex. Using these properties, we derive that (12) is equivalent to
w^T (A(x)z + b(x)) ≥ 0, ∀z ∈ Z, ∀w ∈ K*.
We can now directly apply our RPT approach to the last inequality, which is bilinear in the uncertain parameters w and z. Note that this conic case is a special case of (2), where h*(w) = 0 and dom h* = K*.
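As a concrete instance, the second-order cone K = {(t, x) : ||x||_2 ≤ t} is self-dual (K* = K). The following sketch (ours, with made-up points) spot-checks the bi-dual characterization y ∈ K if and only if w^T y ≥ 0 for all w ∈ K*:

```python
import math

def in_soc(y):
    # membership in K = {(t, x) : ||x||_2 <= t}, y = (t, x_1, ..., x_{n-1})
    return math.sqrt(sum(v * v for v in y[1:])) <= y[0] + 1e-12

y_in = [2.0, 1.0, 1.0]   # ||(1, 1)|| = sqrt(2) <= 2, so y_in is in K
y_out = [0.5, 1.0, 1.0]  # sqrt(2) > 0.5, so y_out is outside K

# For y_in, w^T y_in >= 0 for every w in K* = K: writing w = (w_0, w_1, w_2),
# w^T y_in = 2 w_0 + w_1 + w_2 >= (2 - sqrt(2)) * ||(w_1, w_2)|| >= 0.
# For y_out, a single w in K serves as a separating certificate:
w = [math.sqrt(2.0), -1.0, -1.0]
sep = sum(a * b for a, b in zip(w, y_out))  # = 0.5*sqrt(2) - 2 < 0
```

It is exactly such dual multipliers w that become the second uncertain parameter in the bilinear reformulation above.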

Trade-off between approximation accuracy and computational complexity
Our RPT approach exhibits a hierarchy of increasingly tighter safe approximations.
In this section, we discuss the trade-off between the quality of a safe approximation and its computational tractability.
In terms of computational tractability, Table 2 reports the number of linear and convex inequalities involved in each set of our hierarchy. Notably, Θ_2 requires no additional linear constraints compared with Θ_1. Moving one level higher in the hierarchy requires adding a quadratic number of linear/convex constraints, relative to the number of constraints defining Z and dom h*. Furthermore, our approach involves the perspective functions and conjugates of the functions defining Z and dom h*. If a function is conic representable in a cone K, then its perspective (resp. conjugate) is conic representable in the same cone K (resp. the dual cone K*). On this aspect, our approach does not increase the computational complexity.
Regarding approximation guarantees, we note that if the description of Z involves no linear constraints (K_0 = 0), then Θ_0 = Θ_1, and Θ_2 would provide a substantial benefit. Alternatively, when Z is polyhedral (K_1 = 0), Θ_1 = Θ_2 and Θ_1 provides all the benefit. Consequently, we intuit that the relative benefit of Θ_2 over Θ_1 depends on how binding the convex constraints in Z are. For instance, if the description of Z contains a convex constraint c(z) ≤ 0 that is redundant with the linear constraints Dz ≤ d, then the corresponding constraints in Θ_2 are redundant with linear constraints in Θ_1.
Proof As previously observed, imposing constraints (5) for all k ∈ [K_0] in Θ_1 is equivalent to imposing (8) for all j ∈ [J_0]. Hence, for any j ∈ [J_0] with f_j − e_j^T Fw > 0, the vector y := (f_j z − V^T F^T e_j)/(f_j − e_j^T Fw) satisfies Dy ≤ d. Since c(z) ≤ 0 is redundant with Dz ≤ d, we deduce c(y) ≤ 0, and constraints (13) are satisfied. Note that the result also holds if f_j − e_j^T Fw = 0, by continuity.
When Z is polyhedral, we will show in Sect. 3.2 that RPT with Θ_1 can be viewed as approximating an adaptive robust optimization problem (with uncertainty set dom h*) with linear decision rules [as in [18, Theorem 2]]. In addition, if h is piecewise linear, then dom h* is a simplex. Since linear decision rules are optimal for simplex uncertainty sets, as proved by Bertsimas and Goyal [7, Theorem 1] and generalized in Ben-Ameur et al. [1, Corollary 1], Θ_1 provides an exact reformulation of (1). In general, for polyhedral Z, the quality of the approximation with Θ_1 will thus depend on the geometry of dom h*, in particular its symmetry and "simplex dilation factor" [5]; see El Housni and Goyal [12,13] and references therein for some average- and worst-case performance results of linear decision rules. Theoretical results that precisely quantify the benefit, in terms of approximation guarantee, of each level of the hierarchy constitute an interesting future direction.
From a practical standpoint, when using our approach with the uncertainty set Θ 1 , one can check whether the additional constraints in Θ 2 are satisfied by the worst-case scenario for the current solution. If so, the current solution remains robust feasible for Θ 2 and Θ 2 will not provide any improvement. Alternatively, our approach provides both an upper and lower bound (see Sect. 4). Combined, they can be used to measure the approximation gap and decide whether to use Θ 2 instead.

Connection with previous approaches from the literature
In this section, we show that our RPT approach unifies many existing robust convex optimization techniques, and extends them in a significant way.

Case when the uncertainty set lies in the non-negative orthant
First, we restrict our analysis to cases where the linear inequalities describing Z in Assumption 1 are non-negativity constraints only. Formally, we make the following assumption:

Assumption 3 The uncertainty set Z satisfies Assumption 1 with the linear system Dz ≤ d given by −z ≤ 0, i.e., Z = {z | z ≥ 0, c_k(z) ≤ 0, k ∈ [K_1]}.

Note that the non-negativity assumption on Z can always be satisfied by properly lifting or shifting the uncertainty set (see the detailed discussion in Sect. 5.1 and Appendix A). Under this more specialized assumption, we reformulate our RPT-based safe approximation with Θ_1 (Corollary 1) and discuss its connection with the approaches of Bertsimas and Sim [8] and Roos et al. [18]. We will later refer to Corollary 1 as the perspectification approach, because it can also be derived without invoking the RPT, using a perspectification lemma instead (as we do in Appendix A).

Corollary 1 For Z satisfying Assumption 3, the safe approximation obtained with RPT and Θ 1 is equivalent to
Proof We need the following identity, valid for any u ≥ 0. Denoting between brackets the dual variables associated with the constraints in the definition of Θ_1, and grouping the terms involving V e_ℓ / z_ℓ, ℓ ∈ [L], the result follows, where the last equality holds because u ≥ 0 and e^T u = 1.
Remark 1 Corollary 1 might be uninformative if the recession function of h is unbounded (e.g., if h is strongly convex). In this case, we propose an alternative to Corollary 1 under the additional requirement that h(0) = 0. Indeed, h(0) = 0 implies w_0 ≥ h*(w) ≥ 0. The set in Proposition 2 should then be defined so that the membership v_0/w_0 ∈ Z is enforced by multiplying each constraint in the definition of Z by w_0 ≥ 0, which yields the corresponding safe approximation. The key benefit of Corollary 1 is that it preserves the uncertainty set. Namely, it safely approximates the robust convex constraint (1) by a robust constraint that is linear in the same uncertain parameter z. Consequently, one can also derive probabilistic guarantees for this safe approximation, after positing a generative distributional model for z. We derive such guarantees in Appendix A.
We now compare this formulation with the approaches from Bertsimas and Sim [8] and Roos et al. [18].
For positively homogeneous functions h, we show that Corollary 1 recovers the approach of Bertsimas and Sim [8].
Indeed, Bertsimas and Sim [8] construct a safe approximation for the robust constraint (1) in the special case where the function h(·) is positively homogeneous, i.e., h(λy) = λh(y) for any λ > 0 and y. In this case, h*(w) = 0 for every w ∈ dom h*, so Corollary 1 simplifies and coincides with the approximation proposed by Bertsimas and Sim [8]. Accordingly, the RPT approach extends the one in Bertsimas and Sim [8] in multiple significant ways. First, Bertsimas and Sim [8] require h to be convex and homogeneous, hence sub-additive, and they use sub-additivity to derive their safe approximation. In contrast, Corollary 1 applies to any convex function. In Appendix A, we show how Corollary 1 can also be obtained independently of the RPT approach, by using a generalization of sub-additivity to general convex functions called the perspectification lemma (Lemma 1). Second, the RPT approach can lead to tighter approximations when (a) some of the generic convex constraints c_k(z) ≤ 0 are also linear and can be used to define a smaller set Θ_1, or when (b) a higher set in the hierarchy of approximations is used. Finally, besides the safe approximation, Bertsimas and Sim [8] also propose a non-convex lifting procedure to transform any norm-based set into a lifted uncertainty set that lies in the non-negative orthant and satisfies Assumption 3. In that regard, our proposal is agnostic of the procedure used in practice to ensure that Assumption 3 holds.
We now show that Corollary 1 coincides with the safe approximation of (1) from Roos et al. [18] with static policies.
Roos et al. [18] propose to first reformulate (1) as an adaptive robust nonlinear constraint, which they then approximate with static decision rules (Proposition 3). Proposition 3 is equivalent to Corollary 1, after observing that, for any y, the two formulations involve the same perspective function. Consequently, our RPT approach extends Roos et al. [18, Theorem 7] by providing tighter approximations when additional linear constraints are present or when using a higher set in the hierarchy.

Case when the uncertainty set is a polyhedron
In this subsection, we show that the safe approximation obtained from Θ_1 for (1) with Z = {z | Dz ≤ d}, where D ∈ R^{K_0×L} and d ∈ R^{K_0}, coincides with that of Theorem 2 in Roos et al. [18].
It follows from Proposition 2 that the convex robust constraint (1) can be safely approximated accordingly. The corresponding tractable robust counterpart constitutes a safe approximation of (1), which coincides with the safe approximation proposed in Roos et al. [18, Theorem 2].

Connection to result for robust quadratic with an ellipsoid
It is well known that robust quadratic inequalities with an ellipsoidal uncertainty set admit an exact SDP reformulation via the S-lemma [see [3]]. Alternatively, we show that the same reformulation can be obtained via RPT. To this end, consider the following robust quadratic inequality:
z^T A(x)z + b(x)^T z ≤ g(x), ∀z ∈ Z, (15)
which is a special case of the bilinear reformulation in Proposition 1 with w = z, and which is equivalent to
Tr(A(x)Z) + b(x)^T z ≤ g(x), ∀(z, Z) ∈ Θ_ell, with Θ_ell = {(z, Z) : z ∈ Z, Z = zz^T}.
The set Θ_ell is non-convex because of the nonlinear equality Z = zz^T, and the outer approximations of Θ_ell proposed in Proposition 2 satisfy Θ_ell ⊆ Θ_ell,0 = Θ_ell,1 = Θ_ell,2. By relaxing the non-convex constraint Z = zz^T to Z ⪰ zz^T, and then using the Schur complement, we obtain Θ_ell,3 with the semidefinite constraint [Z, z; z^T, 1] ⪰ 0. The tractable robust counterpart of max_{(z,Z)∈Θ_ell,3}

Tr(A(x)Z) + b(x)^T z ≤ g(x)
coincides with the tractable robust counterpart of (15) obtained using the S-lemma. Moreover, if the uncertainty set is an intersection of ellipsoids, the safe approximation obtained for the robust quadratic inequality via RPT coincides with that from the approximate S-lemma [2]. In Zhen et al. [20], a similar relationship between the (approximate) S-lemma and RPT is established for quadratically constrained quadratic optimization.

Obtaining lower bounds
One simple way of relaxing constraint (1) is to consider a finite subset of scenarios sampled from the uncertainty set Z; the sampled version of (1) then constitutes a finite set of convex inequalities:
h(A(x)z^(i) + b(x)) ≤ 0, i ∈ [I], (16)
where {z^(1), . . . , z^(I)} = Ẑ ⊆ Z. Here, any point-wise maximum of convex functions in the sampled inequality can be reformulated into a finite set of convex inequalities, so (16) provides a naive progressive approximation of (1). The question that remains is how to choose the finite set of scenarios. One simple way would be to consider all extreme points of Z, in which case the sampled version (16) is exact. However, the number of extreme points of Z can be large in general. Hadjiyiannis et al. [14] propose a way to obtain a small but effective finite set of scenarios for two-stage robust linear problems, by taking the scenarios that are binding for the model solved with affine policies.
Alternatively, we can also adopt a dual perspective and sample scenarios from dom h*. Consider the following form of the bilinear reformulation (2) with a fixed x:
max_{z∈Z, w∈dom h*} { w^T (A(x)z + b(x)) − h*(w) } ≤ 0. (17)
The embedded maximization problem in (17) can be interpreted as a disjoint bilinear problem, that is, fixing either z or w, the embedded problem becomes a convex optimization problem. Hence, given a finite set Ŵ ⊆ dom h*, the constraints
w^T (A(x)z + b(x)) − h*(w) ≤ 0, ∀z ∈ Z, w ∈ Ŵ, (18)
provide a progressive approximation of (1) as well. We therefore propose a hybrid approach to obtain valid lower bounds, in which we replace both Z and dom h* in (17) by finite subsets Ẑ and Ŵ, respectively. The idea here is adopted from Zhen et al. [21].
For a fixed x, we observe that the embedded problem in (17) becomes a convex optimization problem whenever z or w is fixed. Exploiting this observation, for any given z ∈ Ẑ ⊆ Z, the maximizer w* of the embedded problem is a violating scenario if the corresponding optimal value is larger than 0, and w* can be used to update Ŵ, that is, Ŵ ← Ŵ ∪ {w*}. Subsequently, fixing the obtained w* in (17), the maximizer z* is a violating scenario if the corresponding optimal value is larger than 0, and z* can be used to update Ẑ, that is, Ẑ ← Ẑ ∪ {z*}. We can repeat this iterative procedure until no improvement in the lower bound for (17) is achieved. The enriched sets Ẑ and Ŵ can then be used to improve x by solving the optimization problem with both (16) over Ẑ and (18) over Ŵ. The procedure is repeated until no improvement is achieved or a prescribed computational limit is met.
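On a toy disjoint bilinear problem, the alternation looks as follows (our own minimal sketch, omitting the violation checks and the master problem over x): for max { w^T z : ||z||_2 ≤ 1, ||w||_2 ≤ 1 }, fixing one block makes the best response of the other a simple normalization.

```python
import math

def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]

def alternate(z0, iters=20):
    # alternate best responses for max { w^T z : ||z|| <= 1, ||w|| <= 1 }
    z = z0[:]
    val = 0.0
    for _ in range(iters):
        w = normalize(z)  # best w for fixed z is z / ||z||
        z = normalize(w)  # best z for fixed w is w / ||w||
        val = sum(a * b for a, b in zip(w, z))
    return val

val = alternate([0.3, -0.7])  # reaches the optimal value 1 from any nonzero start
```

In the actual procedure, each best response is a convex optimization problem, and the iterates z*, w* enrich the scenario sets Ẑ and Ŵ.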

Remark 2 One can improve the efficiency of the iterative procedure if A(x) = A in (1), that is, if A is independent of x. In this case, for a given w*, de Ruiter et al. [19] have shown that the maximizer z* in (19) dominates w*, that is, the feasible region for x in (16) with Ẑ = {z*} is contained in the one in (18) with Ŵ = {w*}. Therefore, the inequalities involving w* are redundant and can be omitted throughout the procedure.

Computational results
In this section, we assess the numerical benefit of our RPT hierarchy on two examples: a robust control problem and a geometric optimization problem.

Constrained linear-quadratic control
In this section, we consider an example from stochastic linear-quadratic control, as in Bertsimas and Brown [6]. They consider a discrete-time stochastic system in which y_k, u_k, and z_k are the state, control, and disturbance vectors, respectively. The objective is to control the system (find control vectors u_k) so as to minimize the cost function under some uncertainty on the disturbance vectors z_k. After algebraic manipulations, and given a robust description of the uncertainty, the problem can be cast as a minimization problem over x, where Z = {z : ||z||_2 ≤ γ}, for a properly defined vector h and matrices F, C (C ⪰ 0).
Note that x can be subject to other deterministic constraints. For instance, imposing u_k ≥ 0 yields linear constraints on x. We refer the reader to Bertsimas and Brown [6] and references therein for the problem motivation and details on the derivations. Hence, we focus our attention on the resulting robust convex constraint (20). Note that Z is described using a single quadratic constraint. However, our approximations partly rely on the presence of linear constraints in the definition of the uncertainty set. Hence, one could consider a shifted uncertainty set instead, obtained by introducing redundant linear inequalities such as z̲ ≤ z ≤ z̄ with z̄ = γe and z̲ = −γe, or a non-convex lifting of Z as in Bertsimas and Sim [8]. In this section, we compare these modeling alternatives with three objectives in mind: (a) illustrate the benefit of our general RPT approach outlined in Sect. 2 over the perspectification approach from Sect. 3.1 (and Appendix A), which requires more stringent assumptions on the uncertainty set; (b) measure the relative benefit of shifting or lifting the uncertainty set; and (c) compare and assess the overall strength of our proposed safe approximations. It is well known that the robust constraint (20) is equivalent to a semidefinite constraint, which we use as a benchmark to measure the suboptimality of our approach. In our numerical experiments, as in Bertsimas and Brown [6], we take y_0 = −1, ∈ (0, 1), and q_k = r_k = 0.

Benefit from RPT over Perspectification
As derived in Sect. 3.1, applying RPT up to Θ_1 in Proposition 2 leads to the safe approximation presented in Corollary 1. In addition, here there are no linear constraints in the description of dom h*, so Θ_1 = Θ_2 and the benefit of RPT (Proposition 2) over perspectification (Corollary 1) might seem unclear. Yet, as discussed in Sect. 2.2, RPT is more flexible and allows for an arbitrary number of linear constraints, while perspectification can only account for non-negativity constraints on z. We illustrate this point on the present example.
To derive meaningful safe approximations, we need to enrich the definition of Z with linear constraints. A generic method for doing so is shifting the uncertainty set. For instance, any z with ‖z‖_2 ≤ γ satisfies −γe ≤ z ≤ γe. Accordingly, we can consider the shifted uncertain vector z̄ = γe + z, z ∈ Z, so that z̄ ≥ 0, and apply Corollary 1. We refer to this method as [Cor. 1] (see Fig. 1).

Benefit from lifting over shifting
Another option for adding linear constraints to the definition of the uncertainty set is to decompose z into z = z⁺ − z⁻ and consider the lifted uncertainty set {(z⁺, z⁻) ≥ 0 : ‖z⁺ + z⁻‖_2 ≤ γ}. Observe that this kind of non-convex lifting cannot be applied to just any uncertainty set, for the constraint ‖z‖_2 ≤ γ is replaced by ‖z⁺ + z⁻‖_2 ≤ γ, instead of ‖z⁺ − z⁻‖_2 ≤ γ, and constitutes a valid lifting of Z for norm-based sets only [8]. Yet, as demonstrated in Fig. 2, the approximation built on the lifted uncertainty set [8] competes with and even surpasses the third level of approximation on the shifted one. However, we observe that the difference between shifting and lifting shrinks as γ increases. We should also emphasize that shifting is a very generic procedure that can be applied to any bounded uncertainty set, without further assumptions on its structure. Figure 3 compares the performance of the SDP reformulation with the safe approximations obtained from Θ_3 in Proposition 2 for the original uncertainty set {z : ‖z‖_2 ≤ γ} and its shifted and lifted versions, respectively. In short, we observe that our approach provides near-optimal solutions to (20) (within 5% optimality) at a fraction of the cost of solving the exact SDP reformulation.
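The reason the lifting is valid for the Euclidean-norm set can be checked directly: taking z⁺ and z⁻ as the componentwise positive and negative parts of z gives z⁺ + z⁻ = |z|, so the lifted norm constraint holds with the same radius γ. A minimal sketch (illustrative values and helper names, not from the paper):

```python
import math

# Hedged check of the lifting's validity for Euclidean-norm sets:
# z = z_plus - z_minus with z_plus = max(z, 0), z_minus = max(-z, 0)
# gives z_plus + z_minus = |z| componentwise, hence
# ||z_plus + z_minus||_2 = ||z||_2 <= gamma.

def lift(z):
    """Split z into its componentwise positive and negative parts."""
    z_plus = [max(v, 0.0) for v in z]
    z_minus = [max(-v, 0.0) for v in z]
    return z_plus, z_minus

def norm2(z):
    return math.sqrt(sum(v * v for v in z))

z = [0.3, -1.2, 0.0, 0.8, -0.1]
z_plus, z_minus = lift(z)
# the lifted point satisfies the lifted norm constraint with the same radius
assert abs(norm2([p + m for p, m in zip(z_plus, z_minus)]) - norm2(z)) < 1e-12
# and the decomposition recovers z
assert all(abs((p - m) - v) < 1e-12 for p, m, v in zip(z_plus, z_minus, z))
```
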

Robust geometric optimization
We evaluate the performance of our proposed approach on robust geometric optimization instances that are randomly generated as in Hsiung et al. [15]. In particular, we consider geometric problems of the form (21), where c = 1 ∈ R^{n_x} is the all-ones vector and B_i^{(1)}, B_i^{(2)} ∈ R^{n_x×L} are randomly generated sparse matrices with sparsity density 0.1 whose nonzero elements are uniformly distributed on the interval [−1, 1]. The uncertainty set is assumed to be the intersection of a hypercube and a ball, that is, Z = {z : ‖z‖_∞ ≤ 1, ‖z‖_2 ≤ γ}. We consider a set of instances with n_x = I = 100 and L ∈ {6, 8, . . . , 20}. We select γ = (2^L Γ(L/2 + 1) / π^{L/2})^{1/L}, where Γ denotes the gamma function, which ensures that the volume of the ball {z | ‖z‖_2 ≤ γ} coincides with the volume of the hypercube {z | ‖z‖_∞ ≤ 1}. All reported numerical results are averages over 10 randomly generated instances.
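The volume-matching choice of γ follows from the formula for the volume of an L-dimensional Euclidean ball, vol = π^{L/2} r^L / Γ(L/2 + 1), equated to the hypercube volume 2^L. A short sketch verifying this (function names are ours):

```python
import math

# Hedged sketch: gamma = (2^L * Gamma(L/2 + 1) / pi^(L/2))^(1/L) makes
# the volume of the L-ball of radius gamma equal to 2^L, the volume of
# the hypercube {z : ||z||_inf <= 1}.

def matching_radius(L: int) -> float:
    return (2.0 ** L * math.gamma(L / 2 + 1) / math.pi ** (L / 2)) ** (1.0 / L)

def ball_volume(r: float, L: int) -> float:
    """Volume of the Euclidean ball of radius r in R^L."""
    return math.pi ** (L / 2) * r ** L / math.gamma(L / 2 + 1)

for L in range(6, 21, 2):  # the instance sizes considered in the text
    g = matching_radius(L)
    assert abs(ball_volume(g, L) - 2.0 ** L) / 2.0 ** L < 1e-12
```
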
Consider the bilinear reformulation of the i-th constraint of (21), a supremum over the non-convex uncertainty set Θ, which is safely approximated by four convex sets satisfying Θ ⊆ Θ_{LMI,2} ⊆ Θ_2 ⊆ Θ_1 ⊆ Θ̂_1, where Θ_1 and Θ_2 are obtained by applying Proposition 2 to Θ, while Θ_{LMI,2} is obtained by enriching Θ_2 with the LMI in (11). In addition, we compare the safe approximations from Θ_1, Θ_2 and Θ_{LMI,2} with the one from Θ̂_1, where Θ̂_1 is the set corresponding to Θ_1 when Z_box = {z | ‖z‖_∞ ≤ 1} is considered instead of Z. Note that the safe approximation obtained from Θ̂_1 coincides with that of Roos et al. [18], as discussed in Sect. 3.2. We refer to Appendix C.1 for a detailed representation of Θ̂_1, Θ_1, Θ_2 and Θ_{LMI,2}. We resort to comparing our solutions' objective values to a lower bound. To this end, we compute a lower bound by using the optimal solution of the tightest obtained safe approximation to find potentially critical scenarios in the uncertainty set. The lower bound is then constructed by solving a model that only safeguards against this finite set of critical scenarios; see Sect. 4 for more detail.

Numerical performance
From Fig. 4 we observe that, while our proposed safe approximations from Θ_1 and Θ_2 require similar computational effort as that from Θ̂_1, the average optimality gaps from Θ_1 and Θ_2 are consistently smaller than those from Θ̂_1. For all considered instances, we observe that, despite the significant additional computational effort, the optimality gaps obtained from Θ_{LMI,2} coincide with those from Θ_2. Therefore, we do not report the computation time for the safe approximations from Θ_{LMI,2}. Moreover, we also consider the shifted uncertainty set and observe that the resulting optimality gaps coincide with those obtained with the original uncertainty set.

Fig. 4 Computational results on robust geometric optimization: numerical behavior of the safe approximations from Θ̂_1, Θ_1 and Θ_2 as the size of the instance, that is, L ∈ {6, 8, . . . , 20}, increases

Benefit from lifting
Alternatively, we consider the lifted uncertainty set {(z⁺, z⁻) ≥ 0 : ‖z⁺ + z⁻‖_∞ ≤ 1, ‖z⁺ + z⁻‖_2 ≤ γ} of Z.
This non-convex lifting was first proposed in Bertsimas and Sim [8] and then extended to solve adaptive robust linear problems in Chen and Zhang [11]. By substituting z⁺ − z⁻ for z, the bilinear reformulation is posed over the lifted non-convex uncertainty set Θ⁺, which is safely approximated by four convex sets satisfying Θ⁺ ⊆ Θ⁺_2 ⊆ Θ⁺_1 ⊆ Θ̂⁺_1 and Θ⁺ ⊆ Θ⁺_2 ⊆ Θ⁺_1 ⊆ Θ_BS, where Θ⁺_1 and Θ⁺_2 are obtained by applying Proposition 2 to Θ⁺, and Θ̂⁺_1 is the counterpart of Θ̂_1 for the lifted set. We define Θ_BS in such a way that the resulting safe approximation coincides with that of the extended approach of Bertsimas and Sim [8]. We refer to Appendix C.2 for a detailed representation of Θ_BS, Θ̂⁺_1, Θ⁺_1 and Θ⁺_2. We do not consider Θ⁺_{LMI,2}, that is, Θ⁺_2 with an additional LMI as in (11), because we already observed that Θ_{LMI,2} does not improve upon the approximation from Θ_2 for the problem considered here.

Numerical performance with lifting
From Fig. 5 we observe that, while our proposed safe approximations from Θ⁺_1 and Θ⁺_2 require similar computational effort as that from Θ_BS, the average optimality gaps from Θ⁺_1 and Θ⁺_2 are consistently smaller than those from Θ_BS. For all considered instances, we observe that the optimality gaps obtained from Θ̂⁺_1 coincide with those from Θ̂_1 (i.e., without lifting). Therefore, we do not report the computation time for the safe approximations from Θ̂⁺_1.

Overall performance
First, the average optimality gaps from Θ̂_1 (which has the same performance as Roos et al. [18]; see Sect. 3.2) and Θ_BS (which has the same performance as the extended approach of Bertsimas and Sim [8]; see Sect. 3.1 and Appendix A) are very close to each other, while the computational effort required for computing the safe approximation from Θ_BS is almost twice that from Θ̂_1. Second, our proposed safe approximations from Θ_1, Θ_2, Θ⁺_1 and Θ⁺_2 consistently outperform those from Θ̂_1 and Θ_BS. Although the computational effort required for computing the safe approximations from Θ⁺_1 and Θ⁺_2 is more than twice that from Θ̂_1 and Θ_BS, the average optimality gaps from Θ⁺_1 and Θ⁺_2 are half of those from Θ̂_1 and Θ_BS.

Conclusion
We propose a hierarchical safe approximation scheme for (1). Via numerical experiments, we demonstrate that our approach either coincides with or improves upon the existing approaches of Bertsimas and Sim [8] and Roos et al. [18]. Furthermore, our approach not only provides a trade-off between solution quality and computational effort, but also allows direct integration with existing safe approximation schemes, e.g., the lifting technique proposed in Bertsimas and Sim [8] and Chen and Zhang [11], which can be used to further improve the obtained safe approximations.

A The perspectification approach
In this section, we propose an extension of the approach of Bertsimas and Sim [8] to general convex constraints and general convex uncertainty sets. Although this approach is dominated by our main RPT-based approach in Sect. 2, it has the advantage that probabilistic guarantees can be derived for the solutions obtained. While Bertsimas and Sim [8] require the function h to be sub-additive, our generalization is based on the 'perspectification' of the function h(·). As in Bertsimas and Sim [8], we make the additional assumption that the uncertainty set lies in the non-negative orthant, as stated in Assumption 3. The non-negativity assumption on Z holds without loss of generality and can always be satisfied by properly lifting or shifting the uncertainty set. Indeed, if Z ⊄ R^L_+, one can decompose z into z = z⁺ − z⁻ with z⁺, z⁻ ≥ 0. Then A(x)z = A(x)z⁺ − A(x)z⁻ remains linear in x, and our analysis applies to the lifted uncertainty set {(z⁺, z⁻) ≥ 0 : z⁺ − z⁻ ∈ Z}. However, the lifted uncertainty set defined above might be unbounded. As a result, Bertsimas and Sim [8] and Chen and Zhang [11] propose to incorporate a non-convex lifting for norm-based uncertainty sets. For instance, if Z = {z ∈ R^L : ‖z‖_∞ ≤ 1, ‖z‖_2 ≤ γ}, Bertsimas and Sim [8] consider the lifted uncertainty set {(z⁺, z⁻) ≥ 0 : ‖z⁺ + z⁻‖_∞ ≤ 1, ‖z⁺ + z⁻‖_2 ≤ γ}. Alternatively, we can shift the uncertainty set by some constant vector z_0 such that z + z_0 ∈ z_0 + Z ⊆ R^L_+. We discussed the implications of these two alternatives on the numerical behavior of our proposed approximations in Sect. 5.1. As previously discussed in Sect. 3.1, the perspectification lemma we present in this section offers an alternative derivation of Corollary 1, i.e., the RPT-based approximation with Θ_1 under Assumption 3. In addition, for this approach, we derive probabilistic guarantees, i.e., bounds on the probability of constraint violation, under the assumption that the uncertain parameter has sub-Gaussian tails.

A.1 Safe approximations via perspectification
The perspectification approach is based on the following perspectification lemma [19]: for a convex function g, any points y_1, . . . , y_m, any weights α ≥ 0, and any t ≥ 0 satisfying ∑_{i=1}^m α_i t_i = 1,

g(∑_{i=1}^m α_i y_i) ≤ ∑_{i=1}^m α_i t_i g(y_i / t_i). (22)

Remark 3 If g is positively homogeneous, i.e., g(λy) = λg(y) for any vector y ∈ R^m and any positive scalar λ > 0, then (22) reduces to g(∑_{i=1}^m α_i y_i) ≤ ∑_{i=1}^m α_i g(y_i), and the additional variables t_i in (22) need not be introduced. In this case, Lemma 1 simply states that g is sub-additive.
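The inequality in Lemma 1 can be probed numerically for a particular convex function. The sketch below uses g = exp as an illustrative choice (ours, not the paper's), draws random weights α ≥ 0 and points y, and normalizes t so that ∑_i α_i t_i = 1:

```python
import math
import random

# Hedged numerical probe of the perspectification inequality
#   g(sum_i a_i * y_i) <= sum_i a_i * t_i * g(y_i / t_i)
# for convex g, weights a_i >= 0, and t >= 0 with sum_i a_i * t_i = 1.
random.seed(2)
g = math.exp  # illustrative convex function
ok = True
for _ in range(1000):
    m = 4
    a = [random.uniform(0.1, 2.0) for _ in range(m)]
    y = [random.uniform(-1.0, 1.0) for _ in range(m)]
    t = [random.uniform(0.1, 1.0) for _ in range(m)]
    s = sum(ai * ti for ai, ti in zip(a, t))
    t = [ti / s for ti in t]  # enforce sum_i a_i * t_i = 1
    lhs = g(sum(ai * yi for ai, yi in zip(a, y)))
    rhs = sum(ai * ti * g(yi / ti) for ai, ti, yi in zip(a, t, y))
    ok = ok and lhs <= rhs * (1 + 1e-12) + 1e-12
assert ok
```
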

Remark 4
The terms on the right-hand side of the inequality in Lemma 1 should be interpreted as α_i times the perspective function of g evaluated at (t_i, y_i). In particular, this expression is well defined when t_i = 0, in which case it equals α_i g^∞(y_i), where g^∞ denotes the recession function of g.
We use this lemma to develop a safe approximation of (1), i.e., a sufficient condition for x to satisfy (1).

Proposition 4 (Safe Approximation via Perspectification) Under Assumption 3, a safe approximation of (1) is:
where A(x)e_ℓ denotes the ℓ-th column of the matrix A(x), for each ℓ ∈ [L].
Following Remark 4, the terms u_ℓ h(A(x)e_ℓ / u_ℓ) in (23a) should be interpreted as persp_h(u_ℓ, A(x)e_ℓ); in particular, for u_ℓ = 0, u_ℓ h(A(x)e_ℓ / u_ℓ) = h^∞(A(x)e_ℓ). Proposition 4 approximates the robust convex constraint (1) via a set of adaptive robust constraints that depend linearly on the uncertainty z. Note, however, that even in the fully adaptive case, where the variables (u, ū) can depend arbitrarily on the uncertain parameter z, Proposition 4 is only a valid approximation of the robust constraint (1), not an equivalent reformulation.
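The convention for u_ℓ = 0 is the standard closure of the perspective: u h(y/u) → h^∞(y) as u ↓ 0. The sketch below illustrates this limit for the softplus function h(y) = log(1 + e^y), whose recession function is max(y, 0); both the choice of h and the helper name are ours, not the paper's:

```python
import math

# Hedged illustration of the perspective-to-recession limit:
# for h(y) = log(1 + exp(y)), h_inf(y) = max(y, 0), and
# u * h(y / u) -> h_inf(y) as u -> 0+.

def persp_softplus(u: float, y: float) -> float:
    """Numerically stable u * log(1 + exp(y / u)) for u > 0."""
    r = y / u
    # softplus(r) = max(r, 0) + log1p(exp(-|r|)) avoids overflow
    return u * (max(r, 0.0) + math.log1p(math.exp(-abs(r))))

for y in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(persp_softplus(1e-8, y) - max(y, 0.0)) < 1e-6
```
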

Remark 5
The safe approximation in Proposition 4 can be written concisely by expressing h(·) as the conjugate of its conjugate, h = (h*)*.

A.2 Probabilistic guarantees
The main advantage of the safe approximation obtained with Corollary 1-2 is that the resulting constraint is a robust constraint over the same uncertainty set Z, for which probabilistic guarantees can be derived. Providing such guarantees requires distributional assumptions on the "true" uncertain parameter. However, Assumption 3 requires the uncertainty set Z to lie in the non-negative orthant, an assumption that is often satisfied in practice only after the original uncertain parameter has been shifted or lifted. Accordingly, we derive probabilistic guarantees for these two cases. For clarity, we refer to the original uncertain vector as u and identify random variables with tildes (e.g., ũ).

A.2.1 Shifted uncertainty set
We first consider the case where the uncertainty set is shifted and make the following assumption:

Assumption 4
We assume that (i) the uncertainty set Z ⊆ R^L_+ can be decomposed into Z = z_0 + U, where U is full-dimensional and contains 0 in its interior; and (ii) the random vector z̃ can be decomposed into z̃ = z_0 + ũ, where z_0 is a deterministic vector and ũ is a random vector whose coordinates are L independent sub-Gaussian random variables with parameter 1 (see Definition 1).
In practice, the uncertain parameter z is often decomposed into z = ẑ + u, where ẑ is the nominal value and u the (uncertain) deviation from the nominal value. An uncertainty set U is then built for u, and z is taken in the set ẑ + U. Assumption 4 mirrors this modelling process, with the additional caveat that the uncertainty set is further shifted by z_0 − ẑ in order to guarantee that Z ⊆ R^L_+. From a probability distribution standpoint, Assumption 4 requires the coordinates of ũ to be sub-Gaussian, in the following sense.

Definition 1 [17, Definition 1.2] A random variable ũ ∈ R is said to be sub-Gaussian with parameter σ², denoted ũ ∼ subG(σ²), if ũ is centered, i.e., E[ũ] = 0, and for all s ∈ R, E[e^{sũ}] ≤ e^{s²σ²/2}.
Requiring ũ to be sub-Gaussian or, equivalently, to have Gaussian-type tail probabilities is a fairly general assumption. In particular, standard Gaussian random variables, as well as random variables with mean 0 that are bounded between −1 and 1, are special cases of sub-Gaussian random variables with parameter 1; see [17, Lemma 1.8]. From the bound on the moment generating function, one can derive large deviation bounds. More precisely, if ũ ∼ subG(σ²), then for any t > 0, P(ũ > t) ≤ exp(−t²/(2σ²)) [17, Lemma 1.3].
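The bounded case can be checked directly for ũ ∼ Uniform[−1, 1], whose moment generating function is sinh(s)/s: the defining bound E[e^{sũ}] ≤ e^{s²/2} then holds for all s. A small numerical sketch over a grid of s (illustrative, ours):

```python
import math

# Hedged check that a mean-zero variable bounded in [-1, 1] satisfies
# the subG(1) bound: for u ~ Uniform[-1, 1], E[exp(s*u)] = sinh(s)/s,
# and we verify sinh(s)/s <= exp(s^2 / 2) on a grid of s values.

def mgf_uniform(s: float) -> float:
    """Moment generating function of Uniform[-1, 1]."""
    return math.sinh(s) / s if s != 0 else 1.0

ok = all(
    mgf_uniform(s) <= math.exp(s * s / 2) + 1e-12
    for s in [k / 10 for k in range(-50, 51)]
)
assert ok
```
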
Under this set of assumptions, we derive the following probabilistic guarantees.
Proposition 5 provides a posteriori and a priori probabilistic guarantees, i.e., a bound that depends on a specific robust solution x and one that depends only on the uncertainty set U. In both cases, the guarantee involves p_0 ≥ P(z̃ ≱ 0) = P(ũ ≱ −z_0), i.e., a bound on the probability that the random uncertain parameter z̃ does not belong to the non-negative orthant. In cases where z̃ is known or assumed to have bounded support (for instance, z̃ ∈ [−1, 1]^L almost surely), z_0 can be chosen large enough that P(z̃ ≱ 0) = 0. Otherwise, for a general z_0 ≥ 0, a union bound yields P(z̃ ≱ 0) ≤ ∑_{i=1}^L P(ũ_i < −z_{0,i}) ≤ ∑_{i=1}^L exp(−z_{0,i}²/2), where the last inequality follows from the fact that ũ_i ∼ subG(1). Hence, p_0 is a valid upper bound on P(z̃ ≱ 0); it can be as low as 0 and in general depends on the shifting vector z_0. The a priori guarantee in Proposition 5 depends on the uncertainty set through its robust complexity ρ(U), introduced by Bertsimas et al. [10]. We report closed-form expressions of ρ(U), or at least lower bounds, for commonly used uncertainty sets in Table 3.
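The union bound for p_0 is straightforward to evaluate for a given shift z_0. The sketch below computes ∑_i exp(−z_{0,i}²/2) for the illustrative choice z_0 = γe (the values of L and γ are ours, not the paper's):

```python
import math

# Hedged sketch of the a posteriori bound on p0: by the union bound and
# the subG(1) tail inequality P(u_i < -z0_i) <= exp(-z0_i^2 / 2),
#   P(z_tilde not >= 0) <= sum_i exp(-z0_i^2 / 2) =: p0.

def p0_bound(z0):
    """Union-bound estimate of P(z_tilde not in the non-negative orthant)."""
    return sum(math.exp(-v * v / 2.0) for v in z0)

L, gamma = 10, 4.0
p0 = p0_bound([gamma] * L)  # larger shifts give smaller p0
assert abs(p0 - L * math.exp(-gamma * gamma / 2.0)) < 1e-12
```
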
Proof We consider a robust solution x, which, written with support function notation, satisfies the safe approximation of Proposition 4. We denote by A the event that z̃ ≥ 0; hence, P(A^c) = P(z̃ ≱ 0) ≤ p_0 by definition of p_0. Conditioned on A, the assumptions of Lemma 1 are satisfied, leading to a tail bound in which the last inequality follows from the fact that, by affine transformation of sub-Gaussian random variables, ũ^⊤h^∞(A(x)) is itself sub-Gaussian with parameter ‖h^∞(A(x))‖²_2. By definition of the robust complexity, for any vector v, ρ(U)‖v‖_2 ≤ max_{u∈U} v^⊤u, so that we obtain the a priori bound. All things considered, P(h(A(x)z̃ + b(x)) > 0) ≤ P(h(A(x)z̃ + b(x)) > 0 | A) + P(A^c), and the result follows.

Table 3 Valid lower bounds on the robust complexity of U, defined as ρ(U) := min_{y : ‖y‖_2 = 1} max_{u∈U} y^⊤u, for some common uncertainty sets. Instances denoted by a * are valid under the assumption that the true uncertain parameter ũ satisfies ‖ũ‖_∞ ≤ 1. Results are taken from Table 2 in Bertsimas et al. [10]

Remark 7
As previously suggested, if the recession function of h is not finite, we can further assume that h(0) = 0 and show that the resulting system constitutes a valid safe approximation of (1). If x satisfies such a safe approximation, then a similar line of reasoning yields an analogous probabilistic guarantee (proof omitted).

A.2.2 Lifted uncertainty set
We now consider the case where Z is a lifted version of an original uncertainty set. Accordingly, we assume the following:
Under this assumption, we can write Corollary 1-2 as: Since z + and z − cannot be simultaneously positive, the constraint above is equivalent to Despite additional technicalities due to the fact that the coordinates of z ± are defined as the positive (resp. negative) parts of sub-Gaussian random variables, we obtain similar guarantees. Proposition 6 recovers exactly Theorem 2 from Bertsimas et al. [9], under the weaker assumption that u j 's are independent sub-Gaussian random variables (instead of independent Gaussian random variables) and for any uncertainty set U . In particular, our proof relies on a novel generalization of the Hanson-Wright inequality to sub-Gaussian random variables, which is of independent interest (see the Online Supplement, Theorem OS.2 and Corollary OS.4). However, the constant α in Proposition 6 is weaker than the ones obtained by Bertsimas et al. [9]. Indeed, while our bound is generic, they provide h-specific values for α, which we could use to tighten our bound as well. Before proving Proposition 6, we establish a technical lemma. Finally, let us prove the Proposition 6.

Remark 9
We recover exactly Theorem 2 of Bertsimas et al. [9], under the weaker assumption that the u_j's are independent sub-Gaussian random variables (instead of independent Gaussian random variables), and for any uncertainty set U. However, the generic constant α we obtain is weaker. Indeed, Bertsimas et al. [9] provide an h-specific proof of Lemma 2 and are able to derive tighter values of α in these special cases, which could in turn be used to tighten our bound as well.

B Derivations for the robust control example
C Derivations for the robust geometric optimization example

C.1 Original formulation
The safe approximation at Level 1 with Z_box = {z | ‖z‖_∞ ≤ 1} (instead of Z) is: The safe approximation at Level 1 is: The safe approximation at Level 2 is: The safe approximation at Level 3 is: where W ∈ R^{3×3}, V = [v_0 v_1 v_2] ∈ R^{3×L}, Z ∈ R^{L×L} and w = [w_0 w_1 w_2]^⊤.