Distributionally Robust Optimization with Decision Dependent Ambiguity Sets

We study decision dependent distributionally robust optimization models, where the ambiguity sets of probability distributions can depend on the decision variables. These models arise in situations with endogenous uncertainty. The developed framework includes two-stage decision dependent distributionally robust stochastic programming as a special case. Decision dependent generalizations of five types of ambiguity sets are considered. These sets are based on bounds on moments, Wasserstein metric, $\phi$-divergence and Kolmogorov-Smirnov test. For the finite support case, we use linear, conic or Lagrangian duality to give reformulations of the models with a finite number of constraints. These reformulations allow solutions of such problems using global optimization techniques. Certain reformulations give rise to non-convex semi-infinite programs. Techniques from global optimization and semi-infinite programming can be used to solve these reformulations.


Introduction
The uncertain characteristics of a system's performance often depend on its design decisions. This type of uncertainty is called endogenous uncertainty. For example, in a newsvendor model the product demand function may depend on the selling price (Hu et al., 2015). Additional examples of decision problems with endogenous uncertainty from finance, resource management, process design, and network design are given in Section 2.1. The goal of this paper is to present decision dependent ambiguity frameworks for modeling problems that involve endogenous uncertainty. The main contribution is to show that the dualization of a certain inner problem remains applicable in this more general setting. This dualization has a distinct advantage for the problems under consideration: it allows algorithms from nonlinear global optimization to be applied to the resulting reformulations.
Specifically, we study optimization problems in which the ambiguity set of distributions may depend on the decisions, in the following modeling framework:
$$\min_{x \in X} \; f(x) + \max_{P \in \mathcal{P}(x)} \mathbb{E}_P[h(x, \xi)]. \tag{D$^3$RO}$$
Here $x$ is the vector of decision variables with the feasible set $X \subseteq \mathbb{R}^n$, and $\xi$ is the vector of uncertain model parameters, which is defined on a measurable space $(\Xi, \mathcal{F})$; $\Xi$ is the support in $\mathbb{R}^d$, and $\mathcal{F}$ is a $\sigma$-algebra. For a given $x$, the ambiguity set $\mathcal{P}(x)$ of the unknown probability distribution depends on the decision variable $x$, and $\mathcal{P}(x) \subseteq \mathcal{P}(\Xi, \mathcal{F})$, where $\mathcal{P}(\Xi, \mathcal{F})$ is the set of probability distributions defined on $(\Xi, \mathcal{F})$. The function $f(x)$ is the deterministic part of the objective with no uncertain parameters. Keeping this function in (D$^3$RO) allows us to consider decision models involving two-stage decision making. We denote the inner problem $\max_{P \in \mathcal{P}(x)} \mathbb{E}_P[h(x, \xi)]$ as (D$^3$RO)-inner. Note that if $h(x, \xi)$ is a recourse function in a two-stage stochastic program, i.e., $h(x, \xi)$ is the optimal value (1) of minimizing $g(x, y, \xi)$ over $y \in \mathbb{R}^q$ subject to constraints defined by the functions $\psi_i(x, y, \xi)$, where $g(x, y, \xi)$ and $\psi_i(x, y, \xi)$ are bounded and continuous functions of $x$, $y$ and $\xi$, then (D$^3$RO) becomes a two-stage decision dependent distributionally robust stochastic program (TSD$^3$SP), which is an important application of (D$^3$RO). We assume that the minimization problem in (1) is feasible for any $x \in X$ and $\xi \in \Xi$, and that $h(x, \xi)$ is finite. In other words, we assume that (TSD$^3$SP) has complete recourse (Birge and Louveaux, 1997). The ambiguity set $\mathcal{P}(x)$ can be constructed in many different ways. The reformulations given in this paper are for the decision dependent generalizations of the most common types of ambiguity sets proposed in the distributionally robust optimization literature. The current work on distributionally robust optimization, which assumes that the distribution of uncertain parameters is decision independent, is reviewed in Section 2.2.
In Section 3 we investigate the reformulation of (D$^3$RO) for five different specifications of $\mathcal{P}(x)$: (i) ambiguity sets defined by component-wise moment inequalities and bounds on the scenario probabilities; (ii) ambiguity sets defined by mean vector and covariance matrix inequalities; (iii) ambiguity sets defined by the Wasserstein metric; (iv) ambiguity sets defined by $\phi$-divergence; and (v) ambiguity sets defined by the multivariate Kolmogorov-Smirnov test. The reformulations are given in Sections 3.1 to 3.5, respectively. The basic idea in arriving at the reformulations of (D$^3$RO)-inner is to use linear programming duality or conic duality as needed in the specific setting. Lagrangian duality is used in situations where working with the saddle point problem appears more suitable.
We note that the computational complexity of the reformulated problem is not the main motivation of this paper. Our goal is to study modeling frameworks that more realistically represent the underlying phenomenon. We also do not focus on developing new (possibly more efficient) algorithms; that is left for future studies. In general, we refer to global optimization techniques for solving the non-convex optimization problems resulting from our reformulations (Floudas and Pardalos, 2014). Moreover, to simplify the presentation, we consider the finite support case in the main text. In some cases the results also extend to ambiguity sets with continuous support. In these cases the semi-infinite programming reformulation of the corresponding models allows the use of a cutting surface algorithm, and possibly other algorithms, to solve the problems. These results and a well-known cutting surface algorithm for global semi-infinite programming are given in the appendix. The construction of the decision dependent parameters appearing in the specification of the ambiguity set is discussed briefly in the concluding remarks.

Literature Review
We first review prior work on optimization models and methods for problems where decisions influence the problem parameters. We then review recent developments in distributionally robust optimization that are relevant to the current paper.
In the framework of stochastic optimization, endogenous uncertainty affects the underlying probability distribution and the scenario tree. Stochastic programming problems with decision dependent scenario distributions, where the distribution is indexed by a Boolean vector, were studied first, together with an implicit enumeration algorithm for solving these problems based on a branch-and-bound scheme. This model and the proposed branch-and-bound method were applied to an optimal selection and sequencing of oil well exploration problem under reservoir capacity uncertainty (Jonsbråten, 1998). Ahmed (2000) investigated a class of single stage stochastic programs with discrete candidate probability distributions that are based on Luce's choice axiom (Luce, 1977). The decision affects utility functions of the choices, and hence the probability distribution (Hu and Mehrotra, 2015). These types of problems arise from network design and server selection applications (Ahmed, 2000). It is shown that stochastic programs of this class can be reformulated as 0-1 hyperbolic programs. Viswanath et al. (2004) investigated a two-stage shortest path problem in a stochastic network which arises from disaster relief services. Here the first stage investment decisions can reduce the failure probability of links in the network, and a shortest path is identified based on the post-event network. Held and Woodruff (2005) developed a heuristic algorithm to solve a two-stage stochastic network interdiction problem, where the interdictor is the first-stage decision maker whose objective is to maximize the probability that the minimum path length exceeds a certain value after the interdiction. The interdictor's decision changes the network topology and the uncertainty description. The structure of single-stage stochastic programming with a decision dependent probability distribution was also studied by Dupačová (2006).
Lee et al. (2012) investigated a newsvendor model under decision dependent uncertainty, where sequential decisions are made after a re-estimation of the demand distribution. They provide conditions under which the estimation and decision process converges. Grossmann (2004, 2006) developed a disjunctive programming reformulation for multistage decision dependent stochastic programs. They investigated this problem with finitely many scenarios, where exogenous and endogenous uncertain parameters are involved. In their models, endogenous parameters are resolved after the operational decisions are made (e.g., a facility is installed or an investment is made). A branch-and-bound algorithm is developed to solve the disjunctive program by branching on the logic variables involved in the disjunctive clauses, and a lower bound is obtained at each node by solving a Lagrangian dual sub-problem (Goel and Grossmann, 2005). More solution strategies for the disjunctive program are given in (Gupta and Grossmann, 2011). This framework is applied to model and solve the offshore oil or gas field infrastructure multi-stage planning problem with uncertainty in estimating parameters that are not immediately realized (Tarhan et al., 2009). The framework is also applied to optimize process network synthesis problems with yield uncertainty that can be reduced by investing in pilot plants (Tarhan and Grossmann, 2008). Tarhan et al. (2013) developed a computational strategy that combines global optimization and outer-approximation to solve multistage nonlinear mixed-integer programs with decision dependent uncertainty.
Decision dependent uncertainty is also considered in the framework of robust optimization by letting the uncertainty set depend on the decision variables. Spacey et al. (2012) studied a problem of minimizing the run time of a computer program by assigning code segments to execution locations, where the scheduling of code segment execution depends on the assignment. In robust combinatorial optimization, a decision dependent uncertainty set is used to ensure the same relative protection level for all binary decision vectors (Poss, 2013). To model a robust task scheduling problem with uncertainty in the processing time, Vujanic et al. (2016) proposed a decision dependent uncertainty set as a Minkowski sum of some static sets, such that the uncertain completion time interval of a task can naturally depend on the starting time of the task. Hu et al. (2015) studied a newsvendor model where the product demand may depend on the selling price. Since the analytical relationship between the demand and the selling price is unknown, they construct a family of decreasing and convex functions from historical data as the functional ambiguity set of the true demand function, and solve the functionally robust optimization problem of this model using a univariate reformulation. Nohadani and Sharma (2016) investigated robust linear programs with decision dependent budget-type uncertainty (RLP-DDU). They showed that this problem is NP-hard even when the uncertainty set is a polyhedron and the decision dependence is affine. RLP-DDU can be reformulated as a mixed integer linear program (MILP) if the decision variables affect the uncertain variables by controlling their upper bounds. This concept is demonstrated in a robust shortest-path problem, where the uncertainty is resolved progressively when approaching the destination.

Literature review on distributionally robust optimization
Distributionally robust optimization is a generalization of the classical robust optimization framework. It treats uncertain parameters as random variables with an unknown probability distribution. In the DRO framework the unknown distribution is described by an ambiguity set of probability distributions. The DRO framework solves a min-max problem, and identifies an optimal solution by assuming that nature will pick a worst case probability distribution based on the decision maker's choice.
Current approaches to constructing the ambiguity set are based on moment inequalities that specify the set of candidate probability distributions, and on statistical distances between a candidate distribution and a reference distribution (see (Bertsimas et al., 2010; Birge and Wets, 1987; Delage and Ye, 2010; Dupačová, 1987; Mehrotra and Papp, 2015; Prékopa, 1995; Shapiro and Ahmed, 2004; Shapiro and Kleywegt, 2002)). Specifically, Bertsimas et al. (2010) studied two-stage stochastic linear programs with fixed recourse and specified the ambiguity set using first and second moments of the uncertain parameters. They showed that this problem can be reformulated into a semidefinite program. Delage and Ye (2010) studied DRO problems with uncertain parameters in the objective, while the ambiguity set is defined by conic inequalities on the mean vector and the covariance matrix. They showed that this DRO model is polynomial-time solvable under the assumption that the objective is convex in the decision variables and concave in the random parameters. The DRO version of the least-square problem, which does not satisfy the concavity assumption in (Delage and Ye, 2010), was studied in (Mehrotra and Zhang, 2014). It was shown to admit an SDP reformulation for this case.
Using statistical distances is another way to define the ambiguity set. A statistical distance measures the difference between two probability distributions, and hence the ambiguity set can be naturally defined as a set of probability distributions that are within a certain distance from a reference distribution. Among the various types of statistical distances, the Wasserstein metric is a useful choice for defining the ambiguity set due to its tractability. DRO problems with ambiguity sets defined using the Wasserstein metric, and the corresponding solution approaches, are studied in (Gao and Kleywegt, 2016; Esfahani and Kuhn, 2015; Shafieezadeh-Abadeh et al., 2015; Luo and Mehrotra, 2017).
DRO problems with ambiguity sets defined using $\phi$-divergence are investigated in (Ben-Tal et al., 2013; Calafiore, 2007; Jiang and Guan, 2016; Love and Bayraksan, 2016; Wang et al., 2016; Yanıkoglu and Hertog, 2013). Specifically, Calafiore (2007) studied a robust portfolio selection problem using KL divergence to characterize the ambiguity set. Ben-Tal et al. (2013) showed that the robust counterparts of linear optimization problems with an uncertainty set defined by $\phi$-divergence are tractable for most choices of the function $\phi$. Jiang and Guan (2016) investigated distributionally robust chance constraint models where the ambiguity set is defined using $\phi$-divergence. They showed that this type of chance constraint is equivalent to the classical chance constraint with a perturbed risk level, and that this risk level can be evaluated using a line search algorithm. Yanıkoglu and Hertog (2013) proposed a method to construct a confidence region of an unknown random vector using its samples. The method is based on partitioning the sample space into cells, and approximating the continuous unknown probability distribution with counts in each cell. A subset of cells is selected to form the confidence region, and the cell selection process is formulated as a convex optimization problem with a constraint that bounds the $\phi$-divergence between the unknown probability distribution and the empirical distribution induced by the cells.
Many DRO problems are computationally intractable even though the ambiguity set is well defined and convex. Therefore, convex approximation schemes are proposed for certain types of DRO problems. Goh and Sim (2010) studied two-stage distributionally robust linear programs with expectations in the objective and constraints. They developed an approximation framework based on linear decision rules that can reformulate the DRO problems into tractable conic programs if the ambiguity set is conic representable. Wiesemann et al. (2015) proposed a framework for modeling and solving distributionally robust convex optimization problems in which the ambiguity set is conically representable and constraint functions are piecewise affine in both decision variables and random parameters. They showed that the reformulated problem is polynomial-time solvable under a strict nesting condition of the confidence sets. Chen et al. (2017) investigated DRO with the ambiguity set of probability distributions that can be characterized by a tractable conic representable support set with expectation constraints. This ambiguity set leads to a reformulation of DRO with a convex piecewise affine objective function as a tractable conic program (Wiesemann et al., 2015). The conic constraint involved in this ambiguity set can be reformulated as infinitely many constraints induced by elements in the dual cone. Based on this reformulation technique, they proposed an iterative approach for this class of DRO problems by solving a sequence of tractable problems with finitely many constraints. Zhen et al. (2017) used the Fourier-Motzkin elimination technique in the two-stage adjustable robust optimization (ARO) setting to eliminate all or a subset of second stage variables sequentially, and remove some redundant constraints afterwards. In the cases where all second stage variables are eliminated, the two-stage ARO problems become classical robust linear programs, which can be solved to optimality. 
In the cases where a subset of second stage variables are eliminated, this technique can improve the solutions of two-stage ARO. Bertsimas et al. (2017) developed a tractable framework for solving two-stage adaptive distributionally robust linear optimization problems with second-order conic representable ambiguity sets. It is shown that the two-stage adaptive distributionally robust linear optimization problem can be reformulated as a classical robust optimization problem, and a tractable formulation can be obtained by imposing linear decision rules (LDR) on the second stage variables. They also improved the current LDR techniques applied to adaptive distributionally robust linear optimization by incorporating uncertain parameters in the LDR setting.

Reformulation of (D$^3$RO)
We investigate the dual of the inner problem of (D$^3$RO) under the assumption that the probability distributions have a finite support. For some types of ambiguity sets considered in this section, the reformulation can be generalized to the case where $\Xi$ is a continuous support. The results for these more general cases are given in the appendix. We make the following assumption:
Assumption 1. Every $P \in \mathcal{P}(x)$ has a decision independent finite support $\Xi := \{\xi^k\}_{k=1}^N$, for all $x \in X$ and a fixed $N$.
It follows that the candidate probability distributions in $\mathcal{P}(x)$ can be represented as a nonnegative vector $p \in \mathbb{R}^N$ such that $p_i$ is the mass assigned to the point $\xi^i$ ($i \in [N]$) and $\|p\|_1 = 1$ for all $x \in X$. Note that the effective support of the distribution and the number of scenarios are still allowed to change with $x$, since certain scenarios can be forced to have zero probability. In Sections 3.1-3.5, we derive the dual of (D$^3$RO)-inner to reformulate (D$^3$RO) with the five types of ambiguity sets discussed in Section 1: ambiguity sets defined by simple measure and moment inequalities (Section 3.1), by bounds on moment constraints (Section 3.2), by the Wasserstein metric (Section 3.3), by $\phi$-divergence (Section 3.4), and by the K-S test (Section 3.5).

Ambiguity sets defined by simple measure and moment inequalities
We consider the moment robust set $\mathcal{P}^{SM}(x)$ defined in (2). Its definition involves two given measures, for a fixed $x$, that are lower and upper bounds of candidate probability measures, moment bounds $l(x)$ and $u(x)$, and a vector $f := [f_1(\xi), \ldots, f_m(\xi)]$ of moment functions. To ensure that $P$ is a probability distribution, we set $l_1(x) = u_1(x) = 1$ and $f_1(\xi) = 1$ in the definition of $\mathcal{P}^{SM}(x)$. For any $\xi \in \Xi$, let $\xi := [\xi_1, \ldots, \xi_d]$. When standard moments are used, the $i$th ($i \in [m]$) entry of $f$ has the form $f_i(\xi) := (\xi_1)^{k_{i1}} \cdot (\xi_2)^{k_{i2}} \cdots (\xi_d)^{k_{id}}$, where $k_{ij}$ is a nonnegative integer indicating the power of $\xi_j$ in the $i$th moment function. The framework also allows the use of generalized moments by choosing alternative base functions. Note that the first constraint in (2) is used to ensure that $P$ is a probability distribution. The ambiguity set (2) is a generalization of the set in (Mehrotra and Papp, 2015) to the decision dependent case. The following theorem gives a reformulation of (D$^3$RO) with the moment robust ambiguity set $\mathcal{P}^{SM}(x)$.
Theorem 1. Let Assumption 1 hold, so that the measure bounds in the ambiguity set (2) become bounds on the scenario probabilities. If for any $x \in X$ the ambiguity set (2) is nonempty, then the (D$^3$RO) problem with the ambiguity set $\mathcal{P}^{SM}(x)$ can be reformulated as the nonlinear program (3).
Proof. Under Assumption 1, the inner problem of (D$^3$RO) becomes the linear program (4). By the hypothesis of the theorem, this linear program is feasible for any $x \in X$. We take the dual of (4) and combine the dual problem with the outer problem to get the desired reformulation.
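As an illustration of the inner problem used in the proof, the sketch below treats a simplified special case of the ambiguity set: only decision dependent lower and upper bounds on the scenario probabilities are kept, and the moment inequalities are dropped. Under that assumption the inner linear program is a continuous knapsack problem that a greedy rule solves exactly, so a tiny instance of (D$^3$RO) can be solved by enumerating a finite list of candidate decisions. All function and parameter names are hypothetical, and the enumeration only stands in for the global optimization methods the paper refers to.

```python
from typing import Callable, List, Sequence, Tuple


def worst_case_expectation(costs: Sequence[float],
                           lower: Sequence[float],
                           upper: Sequence[float]) -> float:
    """Solve max sum_i p_i * costs[i] s.t. lower <= p <= upper, sum_i p_i = 1."""
    slack = 1.0 - sum(lower)
    if slack < -1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("no probability vector satisfies the bounds")
    slack = max(slack, 0.0)
    p = list(lower)
    # Assign the remaining mass to the most costly scenarios first.
    for i in sorted(range(len(costs)), key=lambda k: costs[k], reverse=True):
        step = min(upper[i] - p[i], slack)
        p[i] += step
        slack -= step
    return sum(pi * ci for pi, ci in zip(p, costs))


def solve_by_enumeration(candidates: Sequence[float],
                         support: Sequence[float],
                         f: Callable[[float], float],
                         h: Callable[[float, float], float],
                         lower: Callable[[float], List[float]],
                         upper: Callable[[float], List[float]]) -> Tuple[float, float]:
    """Minimize f(x) + worst-case E[h(x, xi)] over a finite candidate list."""
    best_x, best_value = candidates[0], float("inf")
    for x in candidates:
        costs = [h(x, xi) for xi in support]
        value = f(x) + worst_case_expectation(costs, lower(x), upper(x))
        if value < best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Toy instance: a larger x tightens the upper bounds on scenario probabilities.
x_star, v_star = solve_by_enumeration(
    candidates=[0.0, 1.0],
    support=[1.0, 2.0, 3.0],
    f=lambda x: 0.05 * x,
    h=lambda x, xi: xi,
    lower=lambda x: [0.2, 0.2, 0.2],
    upper=lambda x: [0.5 - 0.1 * x] * 3,
)
```

In this toy instance the decision changes the bounds, and therefore the worst-case distribution, which is the endogenous effect the model is meant to capture.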
A reformulation for the two-stage case is given in the following corollary.
Corollary 1. If h(·, ·) is a recourse function defined in (1), then the (D 3 RO) problem with the ambiguity set P SM (x) can be formulated as follows:

Ambiguity sets defined by bounds on moment constraints
We now consider a moment robust set $\mathcal{P}^{MB}(x)$ with multivariate bounds, defined in (6). This set is a generalization of the set used in (Delage and Ye, 2010) to the decision dependent case. Note that in a special case of (6), $\mu(x)$ and $Q(x)$ may not depend on $x$. In this case, the confidence region specified by $\alpha(x)$ and $\beta(x)$ captures decision dependent ambiguity in estimating the distribution moments. The following theorem gives a reformulation of (D$^3$RO) with the ambiguity set $\mathcal{P}^{MB}(x)$. This theorem is a generalization of Lemma 1 in (Delage and Ye, 2010) for the finite support case.
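To make the role of the parameters concrete, the sketch below checks whether a finitely supported distribution belongs to a set of this kind in the scalar case ($d = 1$). It assumes the set takes the form used by Delage and Ye (2010) with decision dependent parameters, namely $(\mathbb{E}_P[\xi] - \mu(x))^2 / Q(x) \le \alpha(x)$ and $\mathbb{E}_P[(\xi - \mu(x))^2] \le \beta(x) Q(x)$. That form and all names in the code are assumptions made for illustration, not a transcription of (6).

```python
from typing import Sequence


def in_moment_set(p: Sequence[float],
                  support: Sequence[float],
                  mu: float,
                  q: float,
                  alpha: float,
                  beta: float,
                  tol: float = 1e-9) -> bool:
    """Scalar membership test for an assumed Delage-Ye style moment set.

    mu, q, alpha and beta are the values mu(x), Q(x), alpha(x), beta(x)
    evaluated at the decision x of interest (q must be positive).
    """
    if any(pi < -tol for pi in p) or abs(sum(p) - 1.0) > tol:
        return False  # not a probability vector
    mean = sum(pi * xi for pi, xi in zip(p, support))
    second = sum(pi * (xi - mu) ** 2 for pi, xi in zip(p, support))
    mean_ok = (mean - mu) ** 2 / q <= alpha + tol   # mean confidence region
    cov_ok = second <= beta * q + tol               # second-moment bound
    return mean_ok and cov_ok


support = [0.0, 1.0, 2.0]
centered = in_moment_set([0.25, 0.5, 0.25], support, mu=1.0, q=0.5, alpha=0.1, beta=1.0)
spread = in_moment_set([0.5, 0.0, 0.5], support, mu=1.0, q=0.5, alpha=0.1, beta=1.0)
shifted = in_moment_set([1.0, 0.0, 0.0], support, mu=1.0, q=0.5, alpha=0.1, beta=1.0)
```

Passing different values of `mu`, `q`, `alpha` and `beta` for different decisions reproduces the decision dependence: a distribution that is admissible under one decision can be excluded under another.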
Theorem 2. Let Assumption 1 hold. Suppose that Slater's constraint qualification conditions are satisfied, i.e., for any $x \in X$ there exists a vector $p'$ that is strictly feasible for the constraints defining the ambiguity set. Then the (D$^3$RO) problem with the ambiguity set $\mathcal{P}^{MB}(x)$ can be reformulated as (7), where $N_{SOC}$ is a second-order cone in the variables $y := [y_0, y_1, \ldots]$.
Proof. Under Assumption 1, the ambiguity set is expressed in terms of the probability vector $p$, and the inner problem of (D$^3$RO) can be formulated as (9). The Lagrangian function of (9), using the dual variables indicated in (9), is given in (10). Applying Sion's minimax theorem, the inner problem (9) takes the form (11). Since Slater's constraint qualification conditions are satisfied, strong duality holds, and hence (11) and (9) have the same optimal value. Substituting the Lagrangian function (10) into (11), and solving the inner maximization problem over $p$ and $\tau$, we reformulate the dual problem as (12). Substituting (12) in (D$^3$RO), we obtain (7).
Corollary 2. If $h(\cdot, \cdot)$ is a recourse function defined in (1), then the (D$^3$RO) problem with the ambiguity set $\mathcal{P}^{MB}(x)$ can be reformulated as follows:

Ambiguity sets defined by Wasserstein metric
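Instead of using moment based definitions of the ambiguity set, we may define this set using a statistical distance, such as the Wasserstein metric. We now study the (D$^3$RO) problem with a decision dependent ambiguity set $\mathcal{P}^W(x)$, defined in (14) using the $L_1$-Wasserstein metric, where $P_0$ is a nominal probability distribution, and $W(\cdot, \cdot) : \mathcal{P}(\Xi, \mathcal{F}) \times \mathcal{P}(\Xi, \mathcal{F}) \to \mathbb{R}$ is the $L_1$-Wasserstein metric defined in (Givens and Shortt, 1984):
$$W(P_1, P_2) := \inf_{K \in S(P_1, P_2)} \int_{\Xi \times \Xi} \|s_1 - s_2\| \, K(ds_1 \times ds_2),$$
where $S(P_1, P_2) := \{K \in \mathcal{P}(\Xi \times \Xi, \mathcal{F} \times \mathcal{F}) : K(A \times \Xi) = P_1(A), \; K(\Xi \times A) = P_2(A), \; \forall A \in \mathcal{F}\}$ is the set of all joint probability distributions whose marginals are $P_1$ and $P_2$, and $\|\cdot\|$ is an arbitrary norm defined on $\mathbb{R}^d$. The ambiguity set (14) is a generalization of the one considered in (Gao and Kleywegt, 2016; Esfahani and Kuhn, 2015; Shafieezadeh-Abadeh et al., 2015; Luo and Mehrotra, 2017) to the decision dependent case. As a special case of (14), under Assumption 1, $\mathcal{P}^W(x)$ is written in terms of the probability vector $p$ as (16), where $\hat{p}$ is a given empirical probability distribution on $\Xi$, and the Wasserstein metric simplifies to
$$W(p, \hat{p}) = \min_{w \ge 0} \Big\{ \sum_{i=1}^N \sum_{j=1}^N \|\xi^i - \xi^j\| \, w_{ij} \; : \; \sum_{j=1}^N w_{ij} = p_i \;\; \forall i \in [N], \;\; \sum_{i=1}^N w_{ij} = \hat{p}_j \;\; \forall j \in [N] \Big\}.$$
The following theorem gives a reformulation of (D$^3$RO) for the ambiguity set (16).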
Theorem 3. Let Assumption 1 hold. In the ambiguity set (14), let the reference distribution be $P_0 = \sum_{i=1}^N \hat{p}_i \delta_{\xi^i}$. Then the (D$^3$RO) problem with the ambiguity set (16) can be reformulated as (17).
Proof. Since $\Xi$ is finite, the (D$^3$RO)-inner problem with ambiguity set $\mathcal{P}^W(x)$ can be formulated as a linear program in the variables $p$ and $w$, where $w$ is a joint probability distribution with two marginal distributions given by $p$ and $\hat{p}$, respectively. The dual of this linear program is (19). After substituting (19) into (D$^3$RO), we obtain the desired reformulation (17).
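To make the duality step in the proof concrete, the following is a sketch of the finite-support derivation. It assumes that the ambiguity set is a Wasserstein ball of radius $r(x) \ge 0$ around $\hat{p}$; the symbol $r(x)$, the shorthand $d_{ij}$ and the multiplier names $\lambda$ and $v$ are introduced here for illustration and need not match the notation of (17)-(19).

```latex
% Inner problem for a fixed x, with d_{ij} := \|\xi^i - \xi^j\|:
\begin{aligned}
\max_{p,\; w \ge 0} \quad & \sum_{i=1}^N p_i \, h(x, \xi^i) \\
\text{s.t.} \quad & \sum_{j=1}^N w_{ij} = p_i, \quad i \in [N], \\
                  & \sum_{i=1}^N w_{ij} = \hat{p}_j, \quad j \in [N], \\
                  & \sum_{i=1}^N \sum_{j=1}^N d_{ij} \, w_{ij} \le r(x).
\end{aligned}

% Substituting p_i = \sum_j w_{ij} leaves an LP in w only. With multipliers
% v_j (free) for the marginal constraints and \lambda \ge 0 for the radius
% constraint, linear programming duality gives:
\begin{aligned}
\min_{\lambda \ge 0,\; v} \quad & \lambda \, r(x) + \sum_{j=1}^N \hat{p}_j \, v_j \\
\text{s.t.} \quad & v_j \ge h(x, \xi^i) - \lambda \, d_{ij}, \quad i, j \in [N].
\end{aligned}

% Combining with the outer minimization:
\begin{aligned}
\min_{x \in X,\; \lambda \ge 0,\; v} \quad & f(x) + \lambda \, r(x) + \sum_{j=1}^N \hat{p}_j \, v_j \\
\text{s.t.} \quad & v_j \ge h(x, \xi^i) - \lambda \, d_{ij}, \quad i, j \in [N].
\end{aligned}
```

The primal is feasible (take $w_{jj} = \hat{p}_j$ and zero elsewhere) and bounded when $h$ is finite, so the two linear programs have the same optimal value. The bilinear term $\lambda \, r(x)$ and the dependence of $h(x, \xi^i)$ on $x$ are what generally make the combined problem non-convex.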
A reformulation for the two-stage case is given in the following corollary.
Corollary 3. If $h(\cdot, \cdot)$ is a recourse function defined in (1), then the (D$^3$RO) problem with the ambiguity set $\mathcal{P}^W(x)$ can be reformulated as follows:
The reformulation (17) of (D$^3$RO) with the ambiguity set defined using the Wasserstein metric can be generalized to the case where the support $\Xi$ is continuous. The details of this generalization are given in Appendix A.

Ambiguity sets defined using φ-divergence
We now study the (D$^3$RO) problem using a decision dependent ambiguity set (21) defined using the notion of $\phi$-divergence, $D_\phi(P \| P_0) = \int_\Xi \phi\left(\frac{dP}{dP_0}\right) dP_0$, where $\phi$ is a non-negative and convex function. This type of ambiguity set is a generalization of the one considered in (Ben-Tal et al., 2013; Calafiore, 2007; Jiang and Guan, 2016; Love and Bayraksan, 2016; Wang et al., 2016; Yanıkoglu and Hertog, 2013) to the decision dependent case. Under Assumption 1, and using $P_0 := \sum_{i=1}^N \hat{p}_i \delta_{\xi^i}$ as the nominal distribution, the divergence becomes $\sum_{i=1}^N \hat{p}_i \, \phi(p_i / \hat{p}_i)$, and the ambiguity set (21) is written in terms of the probability vector $p$ and denoted by $\mathcal{P}^\phi(x)$. Two reformulations of (D$^3$RO) with ambiguity set $\mathcal{P}^\phi(x)$ are given in the following theorem.
Theorem 4. Let Assumption 1 hold, and let $\phi$ be a non-negative convex function. Assume that the following Slater condition is satisfied for every $x \in X$: there exists a $p \in \mathbb{R}^N$ with $p_i > 0$ for all $i$ that is strictly feasible for the constraints defining $\mathcal{P}^\phi(x)$. Then (D$^3$RO) with the ambiguity set $\mathcal{P}^\phi(x)$ can be reformulated as the semi-infinite program (23), where $S$ is a set of vectors $p \in \mathbb{R}^N$. Alternatively, (23) also has the reformulation (24).
Proof. The (D$^3$RO) problem can be written as the minimization over $x \in X$ of $f(x)$ plus the optimal objective value of the optimization problem (25). Since $\phi$ is convex, (25) is a convex program with respect to the decision variable $p$. For a fixed $x \in X$, the Lagrangian dual of (25) is written as (26), where $\alpha, \beta, \lambda$ are the Lagrangian multipliers. Since Slater's condition is satisfied for any $x \in X$, strong duality holds. The inner maximization problem of (26) is equivalent to $\{\min z, \; \text{s.t.} \; z \ge L(p; \alpha, \beta, \lambda) \;\; \forall p \in S\}$, which gives the reformulation (23).
Note that the inner problem of (26) is an unconstrained convex optimization problem, so it can be characterized by the KKT optimality conditions. After substituting the expression of the Lagrangian in (26), adding the optimality condition and using strong duality, we obtain the reformulation given in (24).
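For the finite support case, the divergence in the definition above reduces to the sum $\sum_i \hat{p}_i \, \phi(p_i / \hat{p}_i)$, which is simple to evaluate. The sketch below computes it for two common choices of $\phi$ (Kullback-Leibler and modified $\chi^2$, as tabulated in Ben-Tal et al., 2013) and tests membership in a divergence ball. It assumes every $\hat{p}_i$ is strictly positive; the function names and the `radius` argument, which plays the role of a decision dependent bound evaluated at a given $x$, are hypothetical.

```python
import math
from typing import Callable, Sequence


def phi_kl(t: float) -> float:
    """Kullback-Leibler generator: phi(t) = t*log(t) - t + 1, with phi(0) = 1."""
    return t * math.log(t) - t + 1.0 if t > 0.0 else 1.0


def phi_modified_chi2(t: float) -> float:
    """Modified chi-square generator: phi(t) = (t - 1)^2."""
    return (t - 1.0) ** 2


def phi_divergence(p: Sequence[float],
                   p_hat: Sequence[float],
                   phi: Callable[[float], float]) -> float:
    """Evaluate sum_i p_hat[i] * phi(p[i] / p_hat[i]) on a shared finite support."""
    if any(q <= 0.0 for q in p_hat):
        raise ValueError("this sketch assumes p_hat[i] > 0 for all i")
    return sum(q * phi(pi / q) for pi, q in zip(p, p_hat))


def in_phi_ball(p: Sequence[float],
                p_hat: Sequence[float],
                phi: Callable[[float], float],
                radius: float) -> bool:
    """Check whether p lies within the phi-divergence ball around p_hat."""
    return phi_divergence(p, p_hat, phi) <= radius + 1e-12


p_hat = [0.25, 0.75]
p = [0.5, 0.5]
kl_value = phi_divergence(p, p_hat, phi_kl)
chi2_value = phi_divergence(p, p_hat, phi_modified_chi2)
```

Because both vectors sum to one, the Kullback-Leibler value coincides with the familiar expression $\sum_i p_i \log(p_i / \hat{p}_i)$.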
A reformulation for the two-stage stochastic optimization case is given in the following corollary.
Corollary 4. If h(·, ·) is a recourse function defined in (1), then the (D 3 RO) problem with the ambiguity set P φ (x) can be reformulated as follows: where S = {p ∈ R N :

Ambiguity sets defined based on the Kolmogorov-Smirnov test
The K-S distance has been used by Bertsimas et al. (2013) in defining an ambiguity set in data-driven robust optimization models. For two univariate probability distributions $P_1$ and $P_2$, let $F_1$ and $F_2$ be their cumulative distribution functions. The Kolmogorov-Smirnov (K-S) distance between $P_1$ and $P_2$ is defined as
$$\sup_{s \in \mathbb{R}} |F_1(s) - F_2(s)|. \tag{29}$$
We now study the (D$^3$RO) problem with the ambiguity set defined based on the K-S distance. Note that although (29) is defined for a univariate random variable, this definition can be directly generalized to the probability distribution of a random vector with a finite support.
Specifically, under Assumption 1, let $P_0 = \sum_{i=1}^N \hat{p}_i \delta_{\xi^i}$ be an empirical probability distribution. The K-S distance between a discrete probability distribution $P = \sum_{i=1}^N p_i \delta_{\xi^i}$ and $P_0$ can be written in terms of the probability vectors $p$ and $\hat{p}$. The decision dependent ambiguity set of probability distributions is constructed using the K-S distance as in (31). A reformulation of the (D$^3$RO) problem is given in the following theorem.
Theorem 5. Let Assumption 1 hold. The (D$^3$RO) problem with the ambiguity set (31) can be reformulated as (32).
Proof. The (D$^3$RO) problem with the ambiguity set (31) can be written as (33). The inner problem of (33) can be reformulated as a linear program. After taking the dual of this linear program and combining it with the outer problem, we obtain (32).
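As a small illustration of the distance used in this section, the sketch below evaluates the K-S distance between two distributions on a common finite univariate support. It assumes the support points are listed in ascending order, so that the cumulative distribution functions only need to be compared at the support points; for a vector-valued support an ordering convention would have to be fixed first. The function names and the `radius` argument are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def ks_distance(p: Sequence[float], p_hat: Sequence[float]) -> float:
    """K-S distance between two distributions on the same sorted finite support.

    The running sums of p[i] - p_hat[i] are the CDF differences at each
    support point, and the distance is the largest absolute difference.
    """
    cdf_gaps = accumulate(pi - qi for pi, qi in zip(p, p_hat))
    return max(abs(gap) for gap in cdf_gaps)


def in_ks_ball(p: Sequence[float], p_hat: Sequence[float], radius: float) -> bool:
    """Check whether p is within the given K-S radius of p_hat."""
    return ks_distance(p, p_hat) <= radius + 1e-12


p_hat = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
p = [0.5, 0.5, 0.0]
distance = ks_distance(p, p_hat)
```

Each CDF difference is linear in $p$, so a bound on the largest absolute difference can be written as a finite set of linear inequalities, which is consistent with the linear program used in the proof above.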
A reformulation for the two-stage stochastic optimization case is given in the following corollary.
Corollary 5. If $h(\cdot, \cdot)$ is a recourse function defined in (1), then the (D$^3$RO) problem with the ambiguity set $\mathcal{P}^{KS}(x)$ can be reformulated as follows:
The reformulation (32) of (D$^3$RO) with the ambiguity set defined using the K-S distance can be generalized to the case where the support $\Xi$ is continuous. The details of this generalization are given in Appendix B.

Concluding Remarks
We have established a framework for reformulating distributionally robust optimization problems with important types of decision dependent ambiguity sets. These ambiguity sets contain decision dependent parameters. For example, the moment robust ambiguity set (6) contains parameters $\alpha(x)$, $\beta(x)$, $\mu(x)$ and $Q(x)$, which are functions of the decision $x$. We now briefly discuss the estimation of these functions using a data-driven approach. Ambiguity sets for $\xi$ under an arbitrary decision $x$ can be constructed if such information is available from past decisions, or if it is possible to experiment with trial decisions $\{x^i\}_{i=1}^k$ and collect samples of the random vector $\xi$ under each decision $x^i$. From these samples we can establish the analytical relation between the parameters defining the ambiguity set and the decision using statistical learning models. We can subsequently extrapolate this analytical relation to a general decision $x$ to obtain an empirical decision dependent ambiguity set description.
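The data-driven construction described above can be sketched for one parameter, the mean function $\mu(x)$, in the scalar case. The sketch assumes an affine model $\mu(x) = a + b x$ fitted by ordinary least squares to the sample means observed at the trial decisions. The affine form is only one possible statistical learning model, chosen here for brevity, and all names are hypothetical.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple


def fit_affine(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("at least two distinct trial decisions are needed")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def estimate_mean_function(trials: Dict[float, List[float]]) -> Callable[[float], float]:
    """Map trial decisions to sample means, then extrapolate with an affine fit.

    trials maps each trial decision x_i to the samples of xi observed under it.
    """
    decisions = list(trials.keys())
    sample_means = [fmean(samples) for samples in trials.values()]
    intercept, slope = fit_affine(decisions, sample_means)
    return lambda x: intercept + slope * x


# Toy data: the observed demand drops as the price-like decision x increases.
trials = {1.0: [9.0, 11.0], 2.0: [7.0, 9.0], 3.0: [5.0, 7.0]}
mu = estimate_mean_function(trials)
mu_at_new_decision = mu(4.0)
```

The same pattern could be repeated for the other parameters, for instance by fitting the sample variances at each trial decision, although extrapolating far beyond the trial decisions carries the usual risks of any fitted model.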
The goal of this paper was to show that it is possible to extend the dual formulations in DRO even when the ambiguity sets are decision dependent. The analysis suggests that the situations for which DRO models admit a dual reformulation also allow for dual reformulations for the decision dependent case. The reformulated models are generally non-convex optimization problems requiring further investigation towards developing efficient algorithms for the specific situations. The non-convex optimization problems may have further structure when additional assumptions on decision dependent parameters and the feasible set X are imposed. This structure may be exploited for further refined reformulations and the development of efficient algorithms.

Appendix A Reformulation of (D$^3$RO) with Wasserstein metric and continuous support of random parameters
We study the reformulation of (D$^3$RO) with the Wasserstein metric and continuous support of random parameters. In contrast to the case studied in Section 3.3, we do not assume that Assumption 1 holds. As a consequence, the support $\Xi$ of the decision dependent random parameters $\xi$ can be continuous. Suppose that at a decision $x_0$ we have observed $N$ samples of the random variable $\xi$, written as $\{\xi^i\}_{i=1}^N$. We construct the empirical distribution that assigns mass $1/N$ to each sample. Setting the empirical distribution as the center of the Wasserstein ball, we define the decision dependent ambiguity set $\mathcal{P}^{WC}(x)$ as in (36). The reformulation of (D$^3$RO) with the ambiguity set $\mathcal{P}^{WC}(x)$ is given by the following theorem.
Theorem 6. The (D$^3$RO) problem with the ambiguity set $\mathcal{P}^{WC}(x)$ can be reformulated as the semi-infinite program (37).
Proof. From Theorem 3.6 of (Luo and Mehrotra, 2017), the inner problem of (D$^3$RO) with the ambiguity set defined by the Wasserstein metric (36) is equivalent to the conic linear program (38), where $\Xi' := \Xi \setminus \{\xi^i\}_{i=1}^N$, and $\mu \succeq 0$ denotes that $\mu$ is a positive measure. Based on Theorem 3.7 of (Luo and Mehrotra, 2017), we can apply the conic duality theory from (Shapiro, 2001) to (38) and obtain its dual formulation (39). After combining (39) with the outer minimization problem over $x$, we obtain the desired reformulation (37).
Appendix B Reformulation of (D$^3$RO) with K-S distance and continuous support of the random parameters

We now investigate the reformulation of (D$^3$RO) with the ambiguity set defined by K-S distance where the support $\Xi$ is not finite. We assume that $\Xi$ is contained in a hyper-rectangle $[a, b]$. The definition of K-S distance (29) can be generalized for two multivariate cumulative distribution functions as follows:

where $s = [s_1, \dots, s_d]$ and the cumulative distribution function $F_i$ ($i = 1, 2$) is defined as:

Suppose that for some $x_0 \in X$ we have observed $N$ samples of the random vector $\xi$, written as $\{\xi^i\}_{i=1}^N$. We define the empirical distribution as $P_0 := \sum_{i=1}^N \frac{1}{N} \delta_{\xi^i}$ and denote the cumulative distribution function of $P_0$ as $F_0$. Let $\mathcal{P}([a, b], \mathcal{B})$ denote the set of probability distributions on $[a, b]$ with the Borel sigma algebra $\mathcal{B}$. For any $P \in \mathcal{P}([a, b], \mathcal{B})$, let $F_P$ denote the cumulative distribution function of $P$. The decision dependent ambiguity set based on K-S distance can be constructed as:

We now reformulate the ambiguity set (41) … $< b_k$. Let us divide each interval $[a_k, b_k]$ into $N + 1$ sub-intervals as: … , $b_k]$, and create a $d$-dimensional grid based on the sub-intervals for each dimension $k$ to partition $[a, b]$ into $(N + 1)^d$ sub-rectangular cells. Based on this partition and using the convention that $\xi$ …, the reference CDF can be written as:

where $N_{j_1 j_2 \dots j_d}$ is the number of observed samples within the hyper-rectangle $[a_1, \xi$ … For simplicity of notation, we let $I_{j_1 j_2 \dots j_d} := I^1_{j_1} \times I^2_{j_2} \times \cdots \times I^d_{j_d}$. Then the ambiguity set (41) can be reformulated as (43).

The reformulation of (D$^3$RO) with the ambiguity set (41) is given in the following theorem.

Theorem 7. If $h(x, s)$ is continuous in $s \in [a, b]$ for any $x \in X$, then (D$^3$RO) with the ambiguity set (41) can be reformulated as the following semi-infinite program:

Proof. Note that the probability $P(s \in I_{j_1 j_2 \dots j_d})$ in (43) can be written as the expectation of the indicator function $1_{I_{j_1 j_2 \dots j_d}}(s)$ with respect to $P$.
The inner problem of (D$^3$RO) can be reformulated as the following conic linear program:

Applying conic duality (Shapiro, 2001) to (45), we obtain the following dual problem of (45):

Note that by partitioning the range of the vector $s$, the first constraint of (46) can be reformulated as the following semi-infinite constraints:

$$\lambda_{j_1 \dots j_d} + \lambda_{j_1 \dots j_d} + \gamma \ge h(x, s) \quad \forall s \in I_{j_1 \dots j_d}, \ \forall j_r \in \{0, 1, \dots, N\}, \ r \in [d].$$
Since $h(x, s)$ is continuous in $s \in [a, b]$ for any $x \in X$, we can replace $I_{j_1 \dots j_d}$ with the closure $\mathrm{cl}(I_{j_1 \dots j_d})$ in the above semi-infinite constraints. Then by Proposition 2.8(iii) of (Shapiro, 2001), the optimal objective of (45) equals the optimal objective of (46). After combining (46) with the outer minimization problem over $x \in X$, we can reformulate (D$^3$RO) into (44).
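To make the role of the sample-induced grid more concrete, the following sketch evaluates a multivariate K-S distance between two discrete distributions supported on the same atoms. It assumes that the multivariate K-S distance is the supremum over $s$ of $|F_1(s) - F_2(s)|$ with componentwise CDFs. Under that assumption both CDFs are constant on the cells of the grid built from the atom coordinates, so checking the grid points is enough. The function names, atoms and weights are hypothetical and chosen only for illustration.

```python
from itertools import product


def cdf(atoms, weights, s):
    """Multivariate CDF of a discrete distribution: total weight of the
    atoms that are componentwise less than or equal to s."""
    return sum(w for xi, w in zip(atoms, weights)
               if all(c <= sc for c, sc in zip(xi, s)))


def ks_distance(atoms, weights_p, weights_q):
    """Largest absolute CDF difference over the grid spanned by the atom
    coordinates (sufficient because both CDFs are step functions on it)."""
    d = len(atoms[0])
    axes = [sorted({xi[k] for xi in atoms}) for k in range(d)]
    return max(abs(cdf(atoms, weights_p, s) - cdf(atoms, weights_q, s))
               for s in product(*axes))


atoms = [(0.2, 0.3), (0.6, 0.1), (0.8, 0.9)]   # observed samples in [0, 1]^2
w0 = [1 / 3, 1 / 3, 1 / 3]                     # empirical (reference) weights
wp = [0.5, 0.25, 0.25]                         # weights of a candidate P
ks = ks_distance(atoms, wp, w0)
```

For these numbers the largest gap is $1/6$, attained at the grid point $(0.2, 0.3)$. A candidate distribution would belong to a K-S ambiguity set whenever this value does not exceed the radius chosen for the current decision.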
infinite program:

$$\min_{x} \ f(x) \quad \text{s.t.} \quad g(x, t) \le 0, \ \forall t \in T, \quad x \in X, \tag{gen-SIP}$$

where $X \subseteq \mathbb{R}^{k_1}$ and $T \subseteq \mathbb{R}^{k_2} \times \mathbb{Z}^{k_3}$, so that $T$ may be defined as a mixed-integer set. The cutting-surface algorithm is given in Algorithm 1. Its idea is to solve, at each iteration, a relaxation (or master problem) of the semi-infinite program that has only a finite number of constraints. A constraint that is violated by the solution of the current relaxation is then added to the constraint set of the relaxation for the next iteration. Algorithm 1 relies on an oracle to solve the master problem $\min_{x \in X} \{f(x) : g(x, t) \le 0, \ \forall t \in T'\}$ (48), where $T'$ is a finite subset of $T$, and an oracle to solve the separation problem $\max_{t \in T} g(x, t)$ (49) for any $x \in X$. It outputs an $\varepsilon$-optimal solution to (gen-SIP), where the accuracy of a solution to (gen-SIP) is defined in Definition 1. Theorem 8 shows that Algorithm 1 terminates in finitely many iterations if $X \times T$ is compact and $g(x, t)$ is continuous on $X \times T$.
Definition 1. For a general semi-infinite program in the form of (gen-SIP), a point $x_0 \in X$ is an $\varepsilon$-feasible solution of (gen-SIP) if $\max_{t \in T} g(x_0, t) \le \varepsilon$. A point $x_0 \in X$ is an $\varepsilon$-optimal solution of (gen-SIP) if $x_0$ is an $\varepsilon$-feasible solution of (gen-SIP) and $f(x_0) \le \mathrm{Val}(\text{gen-SIP})$.
Algorithm 1 A cutting-surface algorithm (modified exchange algorithm) to solve (gen-SIP).
Prerequisites: An oracle that generates an optimal solution to the master problem (48) and an oracle that generates an $\varepsilon$-optimal solution to the separation problem (49).
Output: An $\varepsilon$-optimal solution of (gen-SIP).
Step 2 Determine an optimal solution $x_k$ of the problem $\min_{x \in X} \{f(x) : g(x, t) \le 0, \ t \in T_k\}$.
Step 3 Determine an $\frac{\varepsilon}{2}$-optimal solution $t_{k+1}$ of the problem $\max_{t \in T} g(x_k, t)$. If $g(x_k, t_{k+1}) \le \frac{\varepsilon}{2}$, stop and return $x_k$; otherwise let $T_{k+1} \leftarrow T_k \cup \{t_{k+1}\}$, $k \leftarrow k + 1$ and go to Step 2.

Theorem 8 (Theorem 7.2 in (Hettich and Kortanek, 1993)). If $X \times T$ is compact, and $g(x, t)$ is continuous on $X \times T$, then Algorithm 1 terminates in finitely many iterations and returns an $\varepsilon$-optimal solution of (gen-SIP).
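The stopping test in Step 3 can be related to Definition 1 by a short calculation. The calculation is sketched here under two assumptions: that the tolerance in Step 3 is $\varepsilon/2$, and that an $\frac{\varepsilon}{2}$-optimal solution $t_{k+1}$ of the separation problem satisfies $g(x_k, t_{k+1}) \ge \max_{t \in T} g(x_k, t) - \frac{\varepsilon}{2}$. If the algorithm stops at iteration $k$, then

$$\max_{t \in T} g(x_k, t) \ \le \ g(x_k, t_{k+1}) + \frac{\varepsilon}{2} \ \le \ \frac{\varepsilon}{2} + \frac{\varepsilon}{2} \ = \ \varepsilon,$$

so $x_k$ is $\varepsilon$-feasible. Moreover, since $T_k \subseteq T$, the problem solved in Step 2 is a relaxation of (gen-SIP), and therefore $f(x_k) \le \mathrm{Val}(\text{gen-SIP})$, which is the second condition in Definition 1.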
We note that the oracle problem (49) in the cutting-surface algorithm reduces to a function evaluation problem in the decision dependent but finite support case. Therefore, in this case the algorithm can be adapted by sequentially adding cuts as constraints whenever violated inequalities are identified.
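A minimal sketch of the cutting-surface idea in this finite setting is given below. It is not the implementation used in the paper: both oracles are replaced by brute-force enumeration over finite grids `X_grid` and `T_grid`, the initial constraint set is taken to be empty, and the toy functions `f` and `g` are hypothetical choices made only to have something to run.

```python
def cutting_surface(f, g, X_grid, T_grid, eps, max_iter=100):
    """Cutting-surface loop with brute-force oracles on finite grids."""
    cuts = []  # current finite constraint set T_k (empty start is an assumption)
    for _ in range(max_iter):
        # Master problem: minimize f over grid points satisfying all cuts.
        feasible = [x for x in X_grid if all(g(x, t) <= 1e-12 for t in cuts)]
        if not feasible:
            raise ValueError("master problem is infeasible on X_grid")
        x_k = min(feasible, key=f)
        # Separation problem: plain function evaluation over the finite T_grid.
        t_next = max(T_grid, key=lambda t: g(x_k, t))
        if g(x_k, t_next) <= eps / 2:
            return x_k, cuts          # stopping test of Step 3
        cuts.append(t_next)           # add the most violated constraint
    raise RuntimeError("iteration limit reached")


# Toy instance (hypothetical): maximize x subject to x*t - t^2 - 0.25 <= 0.
f = lambda x: -x
g = lambda x, t: x * t - t * t - 0.25
X_grid = [i / 100 for i in range(201)]   # candidate decisions in [0, 2]
T_grid = [j / 10 for j in range(11)]     # constraint indices in [0, 1]
x_star, cuts = cutting_surface(f, g, X_grid, T_grid, eps=0.02)
```

On this toy instance the loop generates two cuts before the stopping test is met. The returned point slightly violates the constraints, within the tolerance `eps`, while its objective value is no worse than that of the best exactly feasible grid point, as Definition 1 requires.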