Practicable robust stochastic optimization under divergence measures with an application to equitable humanitarian response planning

We seek to provide practicable approximations of the two-stage robust stochastic optimization model when its ambiguity set is constructed with an f-divergence radius. These models are known to be numerically challenging to various degrees, depending on the choice of the f-divergence function. The numerical challenges are even more pronounced under mixed-integer first-stage decisions. In this paper, we propose novel divergence functions that produce practicable robust counterparts, while maintaining versatility in modeling diverse ambiguity aversions. Our functions yield robust counterparts that have comparable numerical difficulties to their nominal problems. We also propose ways to use our divergences to mimic existing f-divergences without affecting the practicability. We implement our models in a realistic location-allocation model for humanitarian operations in Brazil. Our humanitarian model optimizes an effectiveness-equity trade-off, defined with a new utility function and a Gini mean difference coefficient. With the case study, we showcase (1) the significant improvement in practicability of the robust stochastic optimization counterparts with our proposed divergence functions compared to existing f-divergences, (2) the greater equity of humanitarian response that the objective function enforces and (3) the greater robustness to variations in probability estimations of the resulting plans when ambiguity is considered.

the additional characteristic of being associated with a discrete set of events. In several decision-making problems, uncertainty is inherently event-based. For instance, disasters, such as earthquakes and tsunamis, are event-based occurrences since historical data come from discrete incidents. In such cases, modeling ambiguity without event-wise considerations is fundamentally flawed because it avoids the correlations that exist between data from the same event. In addition, omitting event-wise associations in estimating descriptive statistics or statistical distances leads to the underlying mis-characterization that all events follow the same stochastic process.
In robust stochastic optimization, ambiguity is modeled via uncertainties in scenario/event probabilities. In the literature, ambiguity sets for robust stochastic optimization are most famously constructed by using an f-divergence as statistical distance measurement between scenario probabilities, and allowing probability variations such that their resulting total f-divergence does not exceed a pre-specified radius. From a decision-making standpoint, the choice of which f-divergence to employ is influenced by 1) the decision maker's attitude towards ambiguity and 2) the limit behaviours of f-divergence functions. A conservative decision maker, for instance, will prefer a pointwise smaller f-divergence function since it allows larger probability deviations within the same radius. The behaviours of f-divergence functions at 0 and ∞ determine whether scenarios can be "suppressed" (made impossible) or "popped" (made possible) (Love and Bayraksan 2014). While f-divergence is the more widely used distance measure for scenario-based robust stochastic optimization, the Wasserstein distance is favoured on situations where data is not scenariobased. Several recent works have proposed scenario-based robust stochastic optimization approaches under the Wasserstein distance (Gao and Kleywegt 2022;Xie 2020), but these appear to not be exactly comparable with f-divergence functions. f-divergence balls include only distributions that are absolutely continuous with respect to the nominal distribution. They thus do not accommodate distributions whose supports are bigger than the nominal distribution. This is usually used to motivate the use of the Wasserstein distance. In our case, however, this property of f-divergences is actually beneficial. In situations where data is available on extreme events, distributions with supports outside of the nominal distribution's can lead to over-conservative planning. In particular, this paper deals with application in a disaster response problem in Brazil, where the dataset contains the 2011 Rio de Janeiro floods and mudslides, which is commonly recognized as the worst weather-related natural disaster in Brazilian history. Extending the support beyond a worst-ever disaster situation is not desirable and we therefore omit the Wasserstein distance from consideration in this work.
The main issue with robust stochastic optimization under f-divergence is that the practicability of the models vary significantly depending on the choice of the f-divergence function. In this paper, we borrow the terminology in Chen et al. (2020), which defines a practicable robust optimization model as one where the computational difficulties of solving the robust counterpart is comparable to that of solving the nominal problem. When first-stage decisions are real-valued in a twostage linear robust optimization setting, the variation distance yields a linear programming model whereas the Kullback-Leibler function produces a general convex optimization model with exponential terms which, although admitting a self-concordant barrier and being polynomially tractable, is numerically challenging. When first-stage decisions are mixed-integer, the numerical challenge is even more pronounced. For instance, the Kullback-Leibler function produces a mixed-integer model with exponential terms, which is hard to solve to global optimality with stateof-the-art optimization software. Furthermore, many decision-making problems that are posed using event-wise ambiguity require mixed-integer first-stage decisions, such as location-allocation for disaster response.
Conceptually, different f-divergences, such as Kullback-Leibler, Burg entropy, and Hellinger distance, are used in robust stochastic optimization because they can capture different attitudes towards ambiguity, or ambiguity aversions. These functions possess properties that decision makers with certain ambiguity aversions would naturally prefer. For instance, the Kullback-Leibler function has a higher secondorder derivative than the Hellinger distance, which means that if a decision maker is more averse to large deviations from the nominal distribution, he/she would prefer the Kullback-Leibler over the Hellinger distance. That said, no one can claim that any of these f-divergences characterize exactly his/her ambiguity aversion, because the latter is not an exactly quantifiable concept.
Given the aforementioned practicability issues in robust stochastic optimization and the understanding that ambiguity aversion is not an exactly quantifiable concept, this paper develops novel divergences that 1) are versatile enough to closely mimic ambiguity aversions conceived by every existing f-divergence in the literature and 2) maintain a high level of practicability irrespective of the existing f-divergence they approximate. In addition, our robust stochastic optimization framework remains practicable when first-stage decisions are mixed-integer. In particular, we devise a divergence function based on the infimal convolution of weighted modified variation distances that yields a linear programming robust stochastic optimization counterpart when first-stage decisions are real-valued. Moreover, we derive a piecewise linear divergence that, similar to the infimal convolution, produces a linear programming robust stochastic optimization counterpart, but with greater versatility in modeling ambiguity. Both divergence functions yield mixed-integer programming robust stochastic optimization counterparts when first-stage decisions are mixed-integer. Note that by mimicking f-divergences, we mean that our divergences are able to closely approximate the f-divergence functions. We make no claim about whether the resulting approximations maintain the information theoretic properties of existing f-divergences. The scope of this paper is limited to reproducing the optimal decisions from robust stochastic optimization under f-divergences, rather than investigating statistical or information theoretic properties, which, to our knowledge, have not been linked to the decision-making process anywhere in the literature.
However, these functions are limited in that they are non-differentiable. Differentiability is a desirable quality of divergences in some cases, especially when these are used as surrogate loss functions. To create versatile and smooth divergences, we combine the concepts of piecewise linearity and infimal convolution via the Moreau-Yosida regularization. The resulting robust stochastic optimization counterpart is mixed-integer second-order conic at worst.
Our distinct contributions in this paper are summarized as follows: 1. We introduce three divergence functions, based on infimal convolution, piecewise linearity, and Moreau-Yosida regularization. The developed functions offer greater versatility in modeling ambiguity aversions while yielding more practicable robust stochastic optimization counterparts than existing divergence functions. All our resulting formulations can be efficiently solved by commercial optimization packages. 2. We show ways in which our divergence functions can be used to mimic f-divergences without losing the superior practicability. This is achieved via least-square fitting for the infimal convolution and the piecewise linear divergences and with a combination of least-square fitting and Monte Carlo integration for the Regularized function. We show that these methods yield closed-form functional approximations of existing f-divergence functions. 3. We apply our practicable models to a realistic humanitarian logistics problem that strives to devise an effective and equitable prepositioning relief supply chain based on real Brazilian natural hazards. The academic literature on scenario-based stochastic programming approaches for humanitarian logistics problems has massively increased over the past decades (Sabbaghtorkan et al. 2020). However, most studies still rely on unwarranted assumptions about the scenarios in which the most unrealistic one is perhaps the presumption that the underlying distribution of past disasters is deterministic (Grass and Fischer 2016). This may result in optimal solution prescriptions that are biased towards an inaccurate probability estimation that favors extreme (bad) scenarios and ends up pushing humanitarian assistance in a direction not delivering broader benefits, which is particularly problematic in situations of scarce resources and when people's vulnerability comes into play. Therefore, we propose the first location-allocation humanitarian logistics planning model under ambiguity. Ambiguity has been modelled in other aspects of humanitarian logistics, such as last mile distribution (Zhang et al. 2020) and relief prepositioning . Our model contains an objective function defined with a utility function and a Gini coefficient, that not only models the effectiveness-equity trade-off in humanitarian operations but also prioritizes allocating scarce resources to more vulnerable areas, based on the well-known FGT poverty measure (Foster et al. 1984). In this case study, we showcase the practicability of our robust stochastic optimization counterparts, the impact of considering ambiguity, and the advantages of our effectiveness-equitable objective function.
The remainder of the paper is organized as follows. Section 2 describes the preliminary concepts and notations of the general robust stochastic optimization problem with f-divergences. Section 3 develops the first proposed divergence based on the infimal convolution of modified variation distances. Section 4 develops the piecewise linear divergence. Section 5 presents the Moreau-Yosida regularization of our piecewise linear divergence. Section 6 implements the proposed models in a realistic case study of Brazilian disasters. Section 7 summarizes the work and discusses some future research directions. The appendices to this article contain all formal proofs of theoretical results.

Robust stochastic optimization framework
Let c ∈ ℝ N 1 and b ∈ ℝ M 1 be vectors and A ∈ ℝ M 1 ×N 1 be a matrix, all of parameters of the first-stage model. The notations d ∈ ℝ N 2 , D ∈ ℝ M 2 ×N 1 , E ∈ ℝ M 2 ×N 2 , and f ∈ ℝ M 2 represent parameters of the second-stage model, some of which are dependent on scenario . Given S > 1 and decision variables x and y which are N 1 -dimensional (consisting of N ′ 1 integers and N ′′ 1 continuous, where N � 1 + N �� 1 = N 1 ) and N 2 -dimensional vectors, respectively, and [S] is the set of running indices from 1 to S, our robust stochastic optimization framework can be posed as the following problem: where P is an ambiguity set used to characterize uncertainties in scenario probabilities. This work positions the framework within the practical context of humanitarian logistics planning. A wide array of preparedness/response planning models follow the robust stochastic optimization paradigm in Model (P1), such as locationallocation models, location-prepositioning-allocation models, last-mile distribution models and mass casualty response planning models. The paradigm's main enabling feature is that it allows two-stage scenario-based planning when scenario probabilities are hard to estimate, and this being without restriction on continuity/integrality of first-stage decisions. Humanitarian logistics problems typically involve decisions made before (preparedness) and after (response) uncertainty realizations, which suitwell two-stage modeling. Also, decisions made in the preparedness phase often involve integral components such as facility location, as well as real-valued components such as inventory prepositioning. In addition, these problems rely on scenariobased data, such as disaster occurrences, for which probabilities are unknown or impossible to estimate (this is especially true for low-frequency high-impact events).
We model uncertainties in scenario probabilities through deviations from a given nominal distribution, such that the f-divergence of these deviations does not exceed a given threshold Ξ . The f-divergence between two probability vectors p and q is defined as where (z) is a convex f-divergence function on z ≥ 0 . The limit definitions of when q = 0 are 0 (0∕0) = 0 and 0 (a∕0) = a lim z→∞ (z)∕z . The f-divergence function has two main properties: 1) It is convex and 2) (z) > 0 for every z ≥ 0 , except at z = 1 where (1) = 0 . Our ambiguity set constructed using f-divergence is In practice, the model requires almost the same inputs from the decision maker as a classical two-stage stochastic programming model. From the uncertainty characterization perspective, the only additional requirements are an f-divergence function and a parameter Ξ to represent the decision maker's aversion to deviations from the initial probability estimates. If the decision maker has reasonable certainty about their probability estimates, they would set Ξ to be low (remember that (1) = 0 and ∑ ∈[S] q = 1 , which means that Ξ = 0 would ensure zero deviations from the initial probability estimates). Else, Ξ would be set to a high value, where the highest value can be obtained from the optimization model Throughout this paper, all proposed models will satisfy the following two conditions, borrowing terminologies from Hanasusanto and Kuhn (2018): Definition 1 (Relatively complete recourse). Model (P1) has relatively complete recourse when, for any given Definition 2 (Sufficiently expensive recourse). Model (P1) has sufficiently expensive recourse when, for any given ∈ [S] , the dual of Q(x, ) is feasible.
If Model (P1) has both relatively complete recourse and sufficiently expensive recourse, Q(x, ) is feasible and finite. These two conditions are not overly restrictive and are satisfied by many problems or can be enforced by induced constraints. For instance, in humanitarian problems where demands have to be satisfied, inclusion of unmet demands with finite cost penalties can ensure feasibility and finiteness of Q(x, ) . Another possibility is to include first-stage constraints that ensure enough facilities (propositioned inventory) are located (deployed) to satisfy all demands. The following theorem, adapted from Bayraksan and Love (2015), shows that Model (P1) can be reformulated in terms of the so-called convex conjugate of . The convex conjugate, denoted by the function * ∶ ℝ → ℝ ∪ {∞} , is defined as

Theorem 1 (Bayraksan and Love 2015) Model (P1) is equivalent to
Depending on the f-divergence used, the robust counterpart in Theorem 1 will have different levels of practicability. For example, with the Hellinger distance, the model is mixed-integer conic quadratic, whereas the Kullback-Leibler divergence leads to a general mixed-integer optimization model containing exponential terms. The only f-divergence known in the literature that yields a mixed-integer robust counterpart for Model (P1) is the variation distance. However, the variation distance is linear and lack versatility in portraying ambiguity preferences. For reference, we provide Table 1 that lists some popular examples of f-divergences for cases where closed-form conjugates exist. Humanitarian logistics problems are varied, with contrasting levels of past data consistency. For instance, disaster impacts on a very disaster-prone area with stable infrastructure would likely resemble impacts of similar past disasters, which would make probabilities reliably estimable from past data. In other cases, either events have rare features (e.g. COVID-19 pandemic), or infrastructures change significantly, meaning that disaster impacts vary unpredictably, thereby making probability estimates unreliable or unrealistic. The contrasting , + , − ≥ 0.
Mixed-integer levels of past data consistency make it necessary to offer a wide variety of functions to model different ambiguity preferences and thus capture a wide range of realities via robust stochastic modeling. Moreover, these models tend to be large-scale, making it necessary to find ways to solve them within reasonable times.

A divergence construction using infimal convolution
While possessing desirable properties such as convexity and versatility in modeling diverse ambiguity aversions, f-divergences are never exact measures of ambiguity aversion because the latter is an inherently subjective concept. Even in cases of consistent disaster impacts on stable infrastructure, there still exists a personal element in ambiguity aversion that is attached to each decision maker's preferences (political ideologies of different governments, modus operandi of different humanitarian organizations) and is not dependent on the impacted area per se. In this section and subsequent ones, we seek to propose new divergence functions that offer greater versatility than f-divergences, mainly because 1) they can be built based on the ambiguity preferences of the decision maker and 2) they can mimic existing f-divergences while being more practicable. An infimal convolution of D closed convex functions 1 , … , D is defined by The following theorem shows that under specific modifications, the infimal convolution of the D modified variation distances is itself a divergence function, which we define to be a function that is convex and positive everywhere on the domain, except at z = 1 , where it is equal to zero. This definition follows that of general statistical divergences, with the addition of convexity, which is essential in optimization modeling. The theorem also shows that under further conditions, the infimal convolution is equivalent to the variation distance.
|} is a divergence measure, irrespective of the choice of positive . The decision maker is therefore free to choose values of such that the resulting infimal convolution matches his/her ambiguity aversion. Independent of the choice, the solvable form of Model (P1) is a mixed-integer programming model, as formally shown in the next theorem.
, Model (P1) is equivalent to the following mixed-integer programming model: Whilst different values of yield different divergence functions, the practicability of the robust stochastic optimization counterpart is unchanged. The vector can thus be used to tailor the divergence function to match the decision maker's ambiguity aversion. In the absence of a clear preference, can be computed by using an existing f-divergence function as a reference. We hereafter provide decision makers with a method to estimate based on how well it makes the infimal convolution of modified variation distances (ICV) fit an existing f-divergence function of choice. This helps reduce the practical arbitrariness involved in choosing and thus, in finding an appropriate divergence function. In addition, our method does not affect the practicability of model implementation, since it allows to be computed without solving any auxiliary model. With the resulting value, decision makers can easily plot and visually assess the resulting divergence function. The next theorem shows that if the ICV is a leastsquare fit of an existing f-divergence function, has a closed-form optimal value. To prove that, we first define the sum of square of differences (SSD) for the ICV as where H = max ∈[S] {q q } and q is an upper bound on the probability of scenario .

Theorem 4
The minimizer * of SSD is such that Interestingly, a further practicability improvement can be achieved with the least-square ICV (LS-ICV), i.e., the ICV such that the value of minimizes the SSD. This is because of the following lemma, which proves that the LS-ICV of any f-divergence function is simply a weighted variation distance that is independent of D . This follows from the fact that under the least-square weights, the minimum SSD is independent of D.
Because of Lemma 5, the solvable form of Model (P1) under the LS-ICV is independent of the value of D , which means that it admits fewer variables and constraints than Model (P2), as shown by the following theorem.
equivalent to the following mixed-integer programming model: Model (P1) is therefore shown to be a mixed-integer programming model under the LS-ICV, with fewer variables and constraints than under the general ICV. This means that when an existing f-divergence function is used as a reference to estimate optimal weights in the infimal convolution using the least-square method, the practicability improves. In a way, by offering decision makers a method to tailor the new divergence function according to existing ones, we have also offered them a way to improve the solvability of the resulting decision-making problem. Equipped with this interesting set of results, in subsequent sections, we provide decision makers with new divergence functions that are even more versatile than the LS-ICV and that can capture the behaviors of existing f-divergence functions even more closely, while keeping decision-making models easily solvable and least-square estimates of parameters computable in closed forms.

A piecewise linear divergence
To offer further improve versatility in ambiguity aversion modeling, with little sacrifice to practicability, we propose a piecewise linear divergence. This divergence allows the decision maker to portray different ambiguity aversions in different ranges of probability deviations. The piecewise linear divergence with P pieces is defined by max p∈ [P] {a p z − b p } , such that (1) = 0 and (z) > 0 for all other z. In the same vein as in the infimal convolution case, the decision maker can choose values of a and b that better capture his/her ambiguity aversion. The difference here is that the choices can be varied according to the probability deviation range, which means that the decision maker is able to tailor his/her attitude towards ambiguity depending on ranges of probabilities and on how far from nominal the probability is. The piecewise linear divergence has the added advantage of being able to mimic more closely the behaviours of existing f-divergences, while offering better practicability. Piecewise linear fitting is a challenging task that is generally solved heuristically. There exists a whole body of literature on piecewise linear fitting and the main methods proposed rely on mixed-integer programming approaches (Rebennack and Krasko 2020;Toriello and Vielma 2012) or non-convex real-valued approaches (Rebennack and Kallrath 2015). In our case, such methods would involve solving additional models (non-convex models may even require additional algorithmic developments), which would reduce the practicability of our robust stochastic optimization framework. Indeed, this work focuses on producing robust counterparts with the same level of numerical difficulty as the nominal problem,i.e., practicable robust counterparts. To ensure easily computable piecewise linear approximations, we impose restrictions on the positions of breakpoints. Moreover, we show in the following proposition that a piecewise linear divergence that is a piecewise leastsquare fit of an existing f-divergence, which we term LS-PL, can be expressed as a closed-form recursive function that does not require the solution of additional models. We also show in Theorem 10 that our function can be dualized to produce a mixed-integer programming robust counterpart.
Proposition 7 Suppose a piecewise linear divergence is constructed such that fitting is done separately for ranges z ≤ 1 and 1 ≤ z ≤ H . If L and U pieces with equally spaced intersection points are used for z ≤ 1 and 1 ≤ z ≤ H , respectively, the piecewise linear divergence that fits an existing divergence (z) , such that pieces are sequentially least-square fitted starting at z = 1 , is given by This shows that under mild conditions, to obtain the least-square piecewise linear fit of an existing f-divergence function, we need not solve the complicated definite integral in the SSD formula. In addition, the SSD of the fit can be calculated in closed form, as indicated in the following corollary.

Corollary 8 The minimum SSD when the piecewise linear function G(z) is used to fit an existing divergence (z) is
where and and f a The following theorem states that the LS-PL is always at least as good as the LS-ICV in mimicking the behaviours of existing f-divergences.

Theorem 9 The SSD of the LS-PL is always less than or equal to that of the LS-ICV.
Finally, we show that, similar to the LS-ICV, the LS-PL also yields a mixed-integer programming model ing mixed-integer programming model: Therefore, the piecewise linear divergence is at least as good as the infimal convolution divergence at replicating ambiguity aversions modeled through existing f-divergences, while yielding a solvable form for Model (P2) that is close in practicability to the solvable form under the infimal convolution divergence. Notice that a * and b * are calculated a priori and could have easily been replaced with pre-defined values if the decision maker wishes to model his/her ambiguity aversion differently from existing f-divergences.

Convolution with Moreau-Yosida regularization
One issue in the divergences proposed so far is the lack of differentiability. Our infimal convolution divergence is not differentiable at the point where it is equal to zero and the piecewise linear divergence is not differentiable at the points of intersection of pieces. Differentiability of f-divergences is especially required in the machine learning context (Bartlett et al. 2006) where these divergences are used as surrogate loss functions. In the humanitarian logistics context, differentiability allows decision makers to understand ambiguity aversion based on rates of change of divergence functions with probability deviations. In a way, the divergence function value is a penalty for deviations from nominal probabilities. For instance, if a disaster is consistent with historical data and the probability can be estimated with high level of confidence, decision makers would choose divergence functions with high rates of change, so as to prevent large probability deviations. In the piecewise linear case, decision makers know the rates of change in intervals between intersecting pieces, but not at points of intersection. This becomes problematic if the number of pieces is large, as non-differentiability spreads across a large number of points. The class of f-divergence functions shown in (Ben-Tal et al. 2013) contains a mix of differentiable and non-differentiable functions. In the spirit of proposing divergences that have widespread applicability, we combine the infimal convolution with piecewise linearity to produce smooth and differentiable piecewise linear divergences that can mimic the behaviours of existing f-divergences. The smooth piecewise linear divergence can more accurately mimic existing smooth f-divergences. We use the Moreau-Yosida regularization, itself defined as an infimal convolution, due to its desirable conjugacy and closed-form property.

Definition 3
The Moreau-Yosida regularization Y of a closed convex function g is , the value of m that makes Y a least-square fit of is obtained from the following quadratic programming problem Without running the optimization model in the proposition, the value of m * can be obtained in closed form by setting the derivative to zero for cases where the critical point of SSD is a minimizer. For such cases, The value of seems to be determined a posteriori in this proposition, which is problematic since it is needed in the calculation of m * . A simple solution to this is to initialize and then iteratively lower it until before ≥ max p∈ [L+U] {a * p } m * is violated.
With the regularized divergence, Model (P1) is solvable as a mixed-integer second order conic problem, as shown in the following theorem.
The price of differentiability, when it is administered via the Moreau-Yosida regularization is, then, a loss of tractability. A point that is worth noting is that for cases such as the Kullback-Leibler divergence and the Burg Entropy, the regularization can replicate closely the ambiguity aversions they represent while actually improving the practicability from a general mixed-integer problem with exponential/logarithmic terms to a mixed-integer second order conic problem, which can be very desirable if running time, approximation closeness, and differentiability are crucial in implementing the models in practical decision support systems.
A crucial point to note about the LS-ICV, LS-PL and the regularized LS-PL is that they can also mimic f-divergences that do not have closed-form convex conjugates, such as the J-divergence (see (Ben-Tal et al. 2013)). Within an even bigger picture, our proposed divergences allow the decision maker to altogether avoid the tedious derivation of convex conjugates, and thus use a greater variety of divergence measures for which the conjugates are not readily available, such as Bregman divergences and -divergences.

Numerical study: the Brazilian humanitarian supply chain
In July 2013, the Brazilian National Department for Civil Protection and Defense (SEDEC) reached an agreement with the Brazilian Postal Office Service ('Correios') to preposition relief supplies, such as food, water, and medicine, among others, in municipalities in major states in Brazil. In case of a disaster, SEDEC would transship these supplies between municipalities depending on victim needs, with the local civil defenses thereafter taking over last-mile distributions. In an unexpected turn of events, the endeavour was abandoned due to the lack of coordination of logistics operations and the high implementation costs. SEDEC instead resorted to a supplier selection strategy based on the ability to fulfil relief demands within 192 h for the North region and 96 h for the remaining regions. This strategy is flawed since most relief supplies are needed within the first 48 critical hours after a disaster has struck. In addition, this strategy is blind to imbalances in socioeconomic vulnerabilities, which is a prominent reality in Brazil. Many recent disasters in Brazil are actually a consequence of social processes whose main characteristic is the unequal distribution of opportunities or social inequality that push poorer people to risky areas or to informal settlements often placed in slopes and floodplains that lack basic infrastructure (Carmo and Anazawa 2014; .
Considering that prepositioning of relief aid is one of the most effective preparedness strategies to deal with most types of disasters, but acknowledging it can be prohibitively complicated and expensive, we propose to build a location-allocation network for strategic relief aid prepositioning focused on the protection of the most vulnerable groups in a disaster aftermath. Our approach is meant to be implemented by an entity such as the Federal Government, who will be in charge of allocating relief aid to help the affected municipalities within the first 48 critical hours. When the lack of resources makes it impossible to maintain high levels of prepositioned stock to supply all incoming aid requests at once, our formulation will judiciously select locations that should be prioritized according to their utility profile, which includes a measure of vulnerability, amongst others.
Formally stated, our problem concerns a given geographical region prone to natural hazards and is defined with respect to a number of settlements (towns, villages or even neighbourhoods), each of which we refer to as an affected area, that exhibit different characteristics such as demographic and socioeconomic profiles. In a disaster aftermath, these affected areas may require relief aid from prepositioned stockpiles of supplies in response facilities. Humanitarian logisticians must decide here-and-now on where to set up response facilities and to what level critical supplies must be stored in these facilities. After a disaster strikes, they must make wait-and-see decisions on how to assign victim needs to the set-up response facilities. The proposed optimization model uses the following notation. N is the set of potential response facility locations, A is the set of affected areas, L is the set of response facility sizes, R is the set of relief aid supplies, and [S] is the set of indices running from 1 to S, in which S denotes the number of disaster scenarios. We denote by c o n the fixed cost of setting up response facility of size at location n, c p rn the prepositioning cost of relief aid r at location n, and c d an the unit cost of shipping relief aid supplies from location n to affected area a. The capacity (in volume) of response facility of size is given by resp , and the units of storage space required by relief aid r is given by r . Prepositioning of relief aid r at location n implies a minimum quantity of min rn , and the overall quantity of relief aid r to be stockpiled is denoted by max r . Pre-and post-disaster humanitarian operations are budgeted at and ′ , respectively. Victim needs for relief aid r at affected area a in scenario are represented by d ra and the nominal probability of occurrence of scenario is given by p . Finally, we denote by u ran the utility of assignment of 100% of relief aid r of affected area a to response facility n in scenario . Our model entails the following decision variables: P rn is the quantity of relief aid r prepositioned at response facility n, Y n is the binary variable that indicates whether or not the response facility of category is established at location n, X ran is the fraction of relief aid r of affected area a satisfied by response facility n in scenario , and G is the Gini coefficient of assignments of relief aid r from response facility n in scenario . The optimization model is then (4) P rn ≥ min rn ∑ ∈L Y n ∀r ∈ R, n ∈ N The objective function (1) maximizes only the recourse function since there is no first-stage cost. Constraint (2) ensures that relief supplies can be prepositioned at a node n only if a response facility is established at that node and that this prepositioning respects the space limitation of the response facility. Constraint (3) puts an upper bound onthe quantity of relief aid of type r available for prepositioning. This depends largely on supply amounts accumulated through donations and limits on stockpiles for other reasons such as perishability. It is generally agreed a priori between private suppliers and public bodies or non-governmental organisations in charge of disaster relief operations. Constraint (4) maintains that if the decision is made to preposition relief aid of type r at a response facility located at node n, a minimum quantity of that relief aid is prepositioned. This prevents operational inefficiencies, such as frequent loading and unloading, associated with having small quantities of supplies stocked at facilities. Constraint (5) restricts each node n to only one type or size of response facility. Constraint (6) defines the pre-disaster financial budgets for carrying out prepositioning activities. Constraints (7) and (8) specify the domains of the first-stage decision variables.
The recourse function given in (9) reflects two crucial concepts in humanitarian logistics, effectiveness and equity. The effectiveness measure is ∑ r∈R,a∈A,n∈N u ran X ran , which is the total utility of relief aid assignments, and the equity measure is 1 − G , where the Gini coefficient G is a popular measure of inequity. It is now well-established that equity, whilst being under-studied in humanitarian logistics, is an essential consideration in humanitarian operations (Alem et al. 2016;Sabbaghtorkan et al. 2020;Alem et al. 2022;Çankaya et al. 2019). Given a scenario s, the effectiveness function determines the extent to which the established response facilities cover victim needs, whereas the equity function measures the extent to which the prepositioned stock of relief aids is fairly allocated amongst affected areas. The objective function follows the rationale developed in (Eisenhandler and Tzur 2018) with the additional contribution of a novel utility function tailored for the Brazilian case. The explicit form of this function is detailed in the next paragraph. Constraint (10) ensures that assignments of relief aid r from node n to satisfy victim needs do not exceed the available prepositioned supplies at n. Constraint (11) guarantees that the total amount of relief aid r allocated to affected area a does not exceed victim needs. Constraint (12) bounds the post-disaster expenses for carrying out the relief aid assignments. Finally, constraints (13) specify the domain of the second-stage decision variables.

Devising an equity-effectiveness trade-off measure
The proposed utility function in (9) is built upon the effectiveness principle using 1) the socioeconomic vulnerability of affected areas, 2) the accessibility, reflected by the travel times from response facilities to affected areas, 3) the criticality of relief aids in alleviating human suffering, and 4) the victim needs, as follows: where a represents the socioeconomic vulnerability of affected area a, ran is the accessibility associated with assigning the response facility at node n to cover victim needs of type r in affected area a, w r is the criticality of relief aid r, and d ra represents the victim needs at affected area a for relief aid r in scenario . Notice that, for a given coverage level, say X ran , a scenario-dependent recourse maximization of ∑ r∈R,a∈A,n∈N u ranXran will favour affected areas that present higher utilities.
In this paper, we adopt the poverty measure based on income poverty (FGT) as a proxy for the socioeconomic vulnerability (following developments in ). Poverty is indeed recognized as one of the main drivers that lead to vulnerability to natural hazards in the sense that it narrows coping and resistance strategies, and it causes the loss of diversification, the restriction of entitlements, and the lack of empowerment (de Sherbinin 2008; The World Bank 2017). u ran = a an w r d ra , The FGT poverty measure is a popular measure for its simplicity and desirable axiomatic properties (Foster 2010). Let h a be the total population of affected area a and h EP a , h VP a , h AP a be the number of extremely poor people,very poor people, and almost poor people in a, respectively. The income classes represented by extremely poor, very poor, and almost poor people are assumed to have a per capita household income equal to or less than thresholds given by t EP , t VP , and t AP per month, respectively. The average incomes of these groups are given by EP a , VP a , and AP a , respectively. Let 0 be a poverty line or given threshold for income. The FGT poverty measure for an affected area a is calculated as To quantify the accessibility, we take into account the response time necessary to cover affected area a from node n. Let an be the travel time of any relief aid when the response facility at n is assigned to cover affected area a, and assume that supplies must ideally arrive at affected areas within a reference time of ̄ time units. The accessibility is then defined as an = 1 , if an ≤̄ ; an = 1 − an −̄ , if ̄< an < 2̄ ; otherwise, an = 0.
To evaluate the importance or criticality of relief aid r, let NP r be the number of people affected by the shortage of relied aid r and DT r be the maximum deprivation time per person for relief aid r. Therefore, w � r = NP r DT r represents the importance proportional to the number of people and inversely proportional to the deprivation times. Finally, w r = w � r ∑̄r ∈R w �r to ensure that ∑ r w r = 1. For the equity measure, we use the relative mean difference proxy for the Gini Coefficient, following the rationale from Mandell (1991), with envy level expressed in terms of our proposed utility, to obtain the following expression: where W a = a ∑ a � ∈A a � , leading to the following objective function: which can be easily linearized to provide a mixed-integer programming formulation.
The victim needs is evaluated as where, 'length' is the number of days in which victims need to be supplied, l r shows for how long one unit of aid r can supply the victims, victims a is the number of homeless and displaced people in area a in scenario , and coverage r is the number of people covered by one unit of relief aid r. The characteristics of the different relief aids are summarized in Table 2. We use the historical data from 2007-2016 from the Integrated Disaster Information System (S2ID 2018) to estimate the victim needs for ten disaster scenarios with equal nominal probabilities of 0.1.
We consider very small, small, medium, large and very large response facilities whose capacities are 1269, 2538, 5076, 11559 and 22087 m 3 , respectively. The setup cost for a response facility is assumed to be proportional to the construction cost published in monthly reports by the local Unions of Building Construction Industry. Shipping costs are evaluated based on transportation via medium-sized trucks that cover 2.5 km per litre of diesel, at a cost 3 BRL per litre of diesel. The financial budget is taken as ten percent of the minimum total cost required to meet all victim needs. The budgets for both stages are obtained by solving Model (P1 � ) with a cost-minimization objective. The first-and second-stage budgets are BRL 61,413,460 and BRL 181,496 respectively. The socioeconomic data are extracted from the Human Development Atlas published by the United Nations Development Programme at http:// www. atlas brasil. org. br. Figure 1 shows the computed FGT poverty level for all Brazilian states.
Suppose the Kullback-Leibler (K-L) divergence gives an accurate depiction of the decision maker's ambiguity aversion. We use Ξ = 0.13 to allow a maximum disaster scenario probability of 0.3. The least-square fits are pictured in Fig. 3.
The LS-PL ( L = U = 5 is chosen here) mimics excellently the K-L divergence, but provides better practicability. The Moreau-Yosida regularization further improves the fit, which we do not illustrate in Fig. 3 because the difference is visually imperceptible. The SSDs are 4.84 × 10 −2 , 1.48 × 10 −4 , and 1.30 × 10 −4 for the LS-ICV, LS-PL, and regularized LS-PL, respectively. Table 3 showcases the variations in SSDs of the LS-PL and the regularized LS-PL with the number  Figure 3 shows that with this approximation accuracy, the piecewise linear function is almost indistinguishable from the original f-divergence function. We use CPLEX to solve all models except the model under the K-L divergence, which is mixed-integer with exponential terms. For the latter, we used 3 different solvers in an attempt to get the best solution, BARON, SCIP and DICOPT. The optimal solutions of the various models are illustrated in Table 4. We note that Model (S1) is Model (P1 � ) with an effectiveness-only objective ( ∑ r∈R,a∈A,n∈N u ran X ran ). We see that the model under the K-L divergence fails to converge to the optimal solution after 5 h, whereas the models under our proposed divergences are easily solved. The worst solution time is that of the robust stochastic optimization under the regularized LS-PL, which solves to optimality in under 30 min.
The comparisons of the results of our proposed models with that of a stochastic programming model with an effectiveness-only objective showcase the importance of the Gini coefficient in improving the equity of humanitarian operations. Compared to Model (S1), all other models have improved average Gini scores, at the expense of a drop in the effectiveness. We can see the mechanism through which this greater equitability is achieved in Fig. 2.
We see clearly that while the effectiveness-only model (S1) provides better average demand coverage for the Amazonas (am), when equity considerations are factored in, the demand coverage for the Amazonas is sacrificed to obtain better demand coverage in almost every other federal unit. This yields better average demand coverages over all federal units. The average demand coverages are 0.16, 0.23, 0.22, 0.23, 0.22 for Models (S1), (P1 � ) , LS-ICV, LS-PL, and regularized LS-PL, respectively. It also yields lower standard deviations of demand coverages, with 0.084, 0.072, 0.071, 0.074, 0.083 for Models (S1), (P1 � ) , LS-ICV, LS-PL, and regularized LS-PL, respectively. Therefore, resource is redirected from the Amazonas, where victim needs are disproportionately high, to obtain better average and spread of demand coverage across all federal units.

Impact of ambiguity consideration
The previous section studied equity considerations. In this subsection, we investigate the importance of considering ambiguity in humanitarian operations. To do this, we generate 50 random probability vectors such that probabilities do not exceed 0.3 (to keep within the divergence radius). Figure 4 shows the distribution of optimal second-stage objectives when first-stage decisions are fixed with the optimal solutions of the models used in the previous section.

Probability of obtaining beƩer soluƟons (EffecƟveness x Equity)
than the classical stochasƟc programming model  While the performance distributions in Fig. 4 show the overall performance behaviors of the models, they do not portray the scenario-by-scenario comparisons of model performances, e.g., in scenario s, which model performs best, and by how much? Fig. 5 illustrates how scenario-by-scenario differences between our robust stochastic optimization models and the classical stochastic programming model are then distributed.
The benefit of incorporating ambiguity in humanitarian logistics planning is that it adds greater robustness to wrong probability estimations. In other words, the ambiguity-based approaches hedge against wrong probability predictions, making the solution less sensitive to initial probability assignments, which is where the classical stochastic programming model fails by design. We observe significant improvements in our simulations. Our robust stochastic optimization models yield performance improvements over the classical stochastic programming model in 45 out of the 50 randomly generate probability scenarios. The improvements in the effectiveness × equity measure are greater than 1000, on average, which corresponds to around 30% of the overall average performance (details are found in Table 5).

Practicability under random instances
So far, our models have been "trained" on 10 scenarios representing 10 years of Brazilian disaster impacts. Now, we test the practicability of our approaches over larger numbers of scenarios. Additional scenarios are generated randomly with 10% perturbations around existing ones, so as to maintain the correlation of disaster impacts. Values of Ξ are still computed such that maximum probabilities do not exceed three times the probability in the equiprobable setting. Pieces are recomputed for piecewise linear methods (LS-PL and regularized) when the number of scenarios is changed and the solvers and operating system remain unchanged. Our results are shown in Table 6.
The practicability of our robust stochastic optimization models is evident from the results. As we increase the number of scenarios, the solution times of our linear models (LS-ICV and LS-PL) remain comparable to the classical stochastic programming model. Not surprisingly, the robust stochastic optimization model under Moreau-Yosida regularization of piecewise linear divergences has higher solution times for being mixed-integer second-order conic. As mentioned before, this formulation accepts a sacrifice on practicability in order to ensure the differentiability of the divergence function. In the worst-case, the regularized model takes around 5 h to solve. All our robust stochastic optimization models (LS-ICV, LS-PL and Regularized) perform remarkably better than the robust stochastic optimization model under the original f-divergence measure. The latter could not close the optimality gap under 500%, even after 10 h of running time. This confirms our original thesis that f-divergence functions are largely impractical to implement, especially on mixed-integer models. Our divergence functions approximate f-divergences with great accuracy (especially our piecewise linear and regularized piecewise linear functions), while yielding vastly more practicable models.
In order to make sure that our practicable models are able to maintain their performance improvements over a larger number of scenarios, we ran them on 50 disaster scenarios and 50 randomly generated probability vectors, obtaining the summarized results given in Fig. 6 and Table 7. To offer a greater challenge to our models, we randomly generated 50 different disaster scenarios for each probability vector run. Our robust stochastic optimization models improve the classical stochastic programming model on average, with the best improvements coming from our piecewise linear and regularized piecewise linear divergences. More scenarios experience more improvements than deterioriations and best-case improvements are more pronounced than in the 10-scenario setting.

Conclusion and future Work
In this paper, we have introduced novel divergence functions that yield practicable robust stochastic optimization counterparts, while providing greater versatility in modeling ambiguity aversions. We have also provided ways in which to tailor these functions such that they mimic existing f-divergence functions while offering better practicability. We have applied our approaches to an important problem of disaster preparedness based on a realistic case study of natural hazards in Brazil.
Our humanitarian logistics planning model optimizes an effectiveness-equity tradeoff, defined with a new utility function and a Gini mean absolute deviation coefficient, both encompassing critical aspects in humanitarian settings, such as equity and vulnerability. With the case study, we showcase 1) the significant improvement  in terms of practicability of the robust stochastic optimization counterparts with our proposed divergence functions compared to existing ones, 2) the greater equity of humanitarian response that the objective function enforces and 3) the greater robustness to variations in probability estimations of the resulting plans when ambiguity is considered. In particular, it is remarkable that our findings suggest our robust stochastic optimization models could improve performances on wrong nominal probability estimations in 90% of scenarios, at best, compared to traditional stochastic programming modeling, which is crucial in real-life settings where such improvements mean better allocation of scarce resources to suffering populations. Future research includes enriching our ambiguity sets with CVaR-based metric, in an effort to further reduce the tail of performance deterioriations, thus create even more pronounced performance improvements.

Appendix: Proofs of theorems, propositions, and lemmas
conjugacy. Since I (q, q) = 0 , Slater condition holds and the inequality in the above derivation becomes an equality constraint. If for any fixed , −∞ < Q(x, ) < ∞ for all feasible x and * Q(x, ) − + + − is monotonically increasing in Q(x, ) , combining the first-stage decision, Model (P1) becomes ◻

Proof of Theorem 2
, is infeasible and thus, at least one component To prove that when d = 1 D , ∀d ∈ [D] , this infimal convolution is equivalent to the variation distance | p q − 1 | , we rely on the subadditivity of absolute value functions. The infimal convolution where the first equality results from the definition of infimal convolution, the second equality follows from the definition of conjugacy and the third equality is from the properties of the infimal convolution of proper, convex and lower semicontinuous functions. We can calculate that Therefore, * d ( Q(x, ) − + + − ) can be expressed via auxiliary variables in the set Substituting in Model (14), which is both feasible and bounded, and combining with the first-stage model, we obtain Model (P3). ◻

Proof of Theorem 4
The minimizer * of SSD is such that where the first equality results from the definition of infimal convolution, the second equality follows from the definition of conjugacy and the third equality is from the properties of the infimal convolution of proper, convex and lower semicontinuous functions. Because of Lemma 1, the above model becomes where * is the conjugate of 3Φ (H − 1) 3 + 1 | p q − 1 |.
We can calculate that Therefore, * ( Q(x, ) − + + − ) can be expressed via the set of auxiliary varia- Substituting in Model (15), which is both feasible and bounded, and combining with the first-stage model, we obtain Model (P4).

Proof of Proposition 7
Suppose a piecewise linear divergence is constructed such that fitting is done separately for ranges z ≤ 1 and 1 ≤ z ≤ H . If L and U pieces with equally spaced intersection points are used for z ≤ 1 and 1 ≤ z ≤ H , respectively, the piecewise linear divergence that fits an existing divergence (z) , such that pieces are sequentially least-square fitted starting at z = 1 , is given by Proof Proof. Case 0 ≤ z ≤ 1 : The L th piece has the function a L (1 − z) since it must be zero at z = 1 . Therefore, the SSD is the function This implies that ) and the piecewise linear function is where f L+1 ( l L ) = 0 and Ψ l = ∫ l∕L (l−1)∕L (z)( l L − z)dz.
Differentiating with respect to a u and setting to zero, we obtain

Proof of Corollary 8
The minimum SSD when the piecewise linear function G(z) is used to fit an existing divergence (z) is a p − a p+1 (y − a p ) + b p } . Replacing the conjugate formulation in the above model and given the latter's feasibility and boundedness, we obtain where, from the definition of G , f a L+1 ( and Combining with the first-stage model, we obtain Model (P5). ◻