Sample Average Approximation of Conditional Value-at-risk based Variational Inequalities

This paper focuses on a class of variational inequalities (VIs), where the map defining the VI is given by the component-wise conditional value-at-risk (CVaR) of a random function. We focus on solving the VI using sample average approximation, where solutions of the VI are estimated with solutions of a sample average VI that uses empirical estimates of the CVaRs. We establish two properties for this scheme. First, under continuity of the random map and the uncertainty taking values in a bounded set, we prove asymptotic consistency, establishing almost sure convergence of the solution of the sample average problem to the true solution. Second, under the additional assumption of random functions being Lipschitz, we prove exponential convergence where the probability of the distance between an approximate solution and the true solution being smaller than any constant approaches unity exponentially fast. The exponential decay bound is refined for the case where random functions have a specific separable form in the decision variable and uncertainty. We adapt these results to the case of uncertain routing games and derive explicit sample guarantees for obtaining a CVaR-based Wardrop equilibria using the sample average procedure. We illustrate our theoretical findings by approximating the CVaR-based Wardrop equilibria for a modified Sioux Falls network.


Introduction
Consider the following variational inequality problem VI(X , F ): find x * ∈ X such that (x − x * ) F (x * ) ≥ 0, for all x ∈ X , where X ⊂ R n is a compact set and each component i ∈ {1, 2, . . .n} of the map F : R n → R n , denoted F i : R n → R, is given by In the above equation, f i : X ×U → R is referred to as the random function, the set U ⊂ R m is compact, and P is the distribution of u supported over the set U. We assume f i is continuous.The map F i gives the conditional value-at-risk (CVaR) at level α ∈ (0, 1) of the random function f i .The CVaR computes the tail expectation of the underlying random variable [18] and can be determined by the following optimization where E P is the expectation under the distribution P and the operator [ • ] + gives the positive part, i.e., [v] + = max{0, v}.The parameter α characterizes the risk-averseness.When α is close to unity, the decision-maker is risk-neutral, whereas, α close to the origin implies high risk-averseness.The main purpose of the paper is to analyze the statistical properties of a sample average approximation (SAA) scheme for solving the variational inequality VI(X , F ) given in (1).The set of solutions of this problem is denoted by SOL(X , F ). Variational inequality problems defined using a set of random functions is surveyed in [17].The most widely studied VI problem in this context, termed stochastic variational inequalities (SVIs), is the one where the map defining the VI is the expectation of a random function.Risk-based VIs, where the VI map is given as the risk of a random function, naturally generalize the setup of SVI and find application in finding the Wardrop equilibria in a network routing problem where users are risk-averse.While several works explore sample average schemes for SVIs, there is no such study for risk-averse VIs.This paper aims to fill this gap.
Early investigations on statistical aspects of SAA for generalized equations and SVIs appeared in [9] and [8], respectively.These works focused on asymptotic properties of the SAA schemes, that is, consistency of estimators and their asymptotic distributions.The former is concerned with showing the convergence with probability one of solutions of the SAA to solutions of the original problem as the sample size tends to infinity.The latter determines the distribution of the approximate solutions in the asymptotic limit.While these properties show the limiting behavior, they do not illustrate the guarantees in the finite-sample regime.This feature was explored in [27,15,28,19] where it was shown that for generalized equilibrium problems under various set of assumptions, one can demonstrate exponential convergence of the approximate solutions.Meaning that the probability that the SAA solution is a fixed distance away from the original solution decays exponentially as the sample size tends to infinity.Technically, establishing such a property relies on conducting sensitivity analysis for the VI and then combining it with uniform large deviation bounds on random functions.All these studies share the common property that the underlying map is the expectation of the random function, while in this paper we look at CVaR-based maps.
The works [11], [21], and [1] study SAA of CVaR in the context of stochastic optimization problems, where CVaR is either being minimized or used to define the constraints.In [11] and [21] asymptotic consistency and exponential convergence of Karush-Kuhn-Tucker (KKT) points of the sample average optimization problem to that of the true one was established.In [1], the SAA of CVaR is used to approximate the solution of risk-constrained optimization problem.Since CVaR is used to define a VI problem in our case, the analysis does not follow directly from these existing results.Moreover, as opposed to the general large deviation bounds provided in these works, the exponential bounds derived here are explicit without involving ambiguous constants.In another data-based approach [16], the CVaR is perceived as the expected shortfall and desirable statistical guarantees are obtained for the optimizers of its sample average.One of the motivations for our work is to approximate the Wardrop equilibirum for a network routing problem where agents choose paths that have minimum risk.Such a setting was extensively studied in [13] where various notions of equilibrium and related computational aspects of finding them were discussed.Among other works that consider risk, [12] and [14] assume the cost of each path to be the weighted sum of the mean and the variance of the uncertain cost.However, none of these works focus on CVaR-based routing.In the transportation literature, the CVaR-based equilibrium is also known as the mean excess traffic equilibrium, see e.g., [2,29] and references therein.While these works have explored numerous algorithms for computing the equilibrium, they lack theoretical performance guarantees for sample-based solutions.
For analyzing the SAA of (1), we assume that a certain number of independent and identically distributed samples of the random variable u are available using which the expectation operator in the definition of the CVaR is replaced with its sample average.The resulting empirical CVaR gives rise to a set of functions that are sample average versions of F .Using these, we define a sample average variational inequality.Our contributions are as follows: (i) We establish asymptotic consistency of the sample average scheme, that is, the set of solutions of the sample average VI converge almost surely, in a set-valued sense, to the set SOL(X , F ). (ii) Under the assumption that random functions are uniformly Lipschitz continuous in x, we show exponential convergence of the solution set of the sample average VI to the set SOL(X , F ).That is, given any constant, the probability that the distance of a solution of the sample average prob-lem from the set SOL(X , F ) is less than that constant approaches unity exponentially with the number of samples.(iii) We give tighter sample guarantees with explicit expression for the coefficient in the exponential bound for a particular class of separable random functions.(iv) We illustrate the application of the derived approximations in computing a CVaR-based Wardrop equilibrium for a network routing problem that is defined using uncertain costs.
A preliminary version of the paper appeared as [3], where the focus was finding the Wardrop equilibrium problem for a network routing problem.As compared to it, the present article has a more general problem setup focusing not just on a Wardrop equilibrium problem, but on a general VI.In addition, the tighter sample guarantees for separable functions given in Section 4 are new here and the simulation example is much more elaborate.Notation: Let R, R ≥0 , R >0 , and N denote the set of real, nonnegative real, positive real, and natural numbers, respectively.Let • denote the Euclidean 2-norm.We use [N ] := {1, . . ., N } for positive integer N .For x ∈ R, we let [x] + = max(x, 0) and x be the smallest integer greater than or equal to x.The cardinality of a set S is denoted by |S|.The distance of a point x ∈ R m to a set S ⊂ R m is denoted by dist(x, S) := inf y∈S x − y .The deviation of a set A ⊂ R m from S is D(A, S) := sup y∈A dist(y, S).

Preliminaries
Here we collect relevant mathematical background used throughout the paper.

Variational Inequality
Given a map F : R n → R n and a closed set X ⊂ R n , the variational inequality (VI) problem, denoted VI(X , F ), involves finding x * ∈ X such that (x − x * ) F (x * ) ≥ 0 for all x ∈ X .Such a point is called a solution of the VI problem.The set of solutions of VI(X , F ) are denoted by SOL(X , F ).The map F is monotone on the set X if (F (x) − F (x )) (x − x ) ≥ 0 for all x, x ∈ X .The map F is strictly monotone on X if this inequality is strict for x = x .Finally, F is strongly monotone on X with modulus σ > 0 if (F (x) − F (x )) (x − x ) ≥ σ x − x 2 for all x, x ∈ X .If F is either strictly or strongly monotone, then SOL(X , F ) is singleton.

A sequence of functions {f
, where X and Y are Euclidean spaces, is said to converge uniformly on a set X ⊂ X to f : X → Y if for any > 0, there exists N ∈ N such that Similar definition applies for convergence in probability.That is, consider a random sequence of functions {f ω N : X → Y} ∞ N =1 defined on a probability space (Ω, F, P ).The sequence is said to converge uniformly to f : X → Y on X almost surely (shorthand, a.s.) if f ω N → f uniformly on X for almost all ω ∈ Ω.

Risk Measures
Next we review notions on value-at-risk and CVaR from [18].Given a realvalued random variable Z with probability distribution P, we denote the cumulative distribution function by Given a probability level α ∈ (0, 1), the value-at-risk of Z at level α, denoted VaR P α [Z], is the left-side (1 − α)-quantile of Z. Formally, The CVaR, also referred to as the average value-at-risk in [18], of Z at level α, denoted CVaR P α [Z], is given as Under the continuity of the cumulative distribution function at VaR P α [Z], we have that CVaR P α [Z] is the expectation of Z when it takes values bigger than ].The parameter α characterizes the risk-averseness.When α is close to unity, the decision-maker is risk-neutral, whereas, α close to the origin implies high risk-averseness.The minimum in ( 4) is attained at a point in the interval [t m , t M ], where 3 Sample Average Approximation of VI(X , F ) The approach in the sample average framework is to replace the expectation operator in any problem with the average over the obtained samples [18].This is one of the main Monte Carlo methods for problems with expectations; see [5] for a detailed survey of other sample-based techniques.In our setup, for each component F i , we will replace the expectation operator in the definition of the CVaR in (3) with the sample average.The thus formed set of functions result in a VI problem that approximates VI(X , F ).
Note that the map F is continuous since f i , i ∈ [n] are so and X and U are compact.One can reason this fact using arguments similar to those of the proof of Lemma 4. As a consequence of the continuity of F , the set of solutions SOL(X , F ) of the problem VI(X , F ) is nonempty and compact [7, Corollary 2.2.5].For convenience, we use the notation S = SOL(X , F ).
Let U N := { u 1 , u 2 , . . ., u N } be the set of N ∈ N independent and identically distributed samples of u drawn from P.Then, the sample average approximation of the CVaR associated to component The above expression is also known as the empirical estimate of the CVaR, or empirical CVaR in short.The expression is also the CVaR of the random function at level α under the empirical distribution 1 N N j=1 δ u j , where δ u j is the unit point mass at u j .Note that the operator CVaR N α is random as it depends on the realization U N of the random variable.To emphasize this dependency, we represent with • N entities that are random.Using (5) as the approximate function, define the sample average VI problem as VI(X , F N ), where for all i ∈ [n].We denote the set of solutions of VI(X , F N ) by S N ⊂ X .This serves as a reminder that it approximates S. The notion of approximation is made precise next.Note that S N is nonempty as X is compact and F N is continuous.
Definition 1 (Asymptotic consistency and exponential convergence): The set S N is an asymptotically consistent approximation of S, or in short, S N is asymptotically consistent, if any sequence of solutions { x N ∈ S N } ∞ N =1 has almost surely (a.s.) all accumulation points in S. The set S N is said to converge exponentially to S if for any > 0, there exist positive constants c and δ such that for any sequence { x N ∈ S N } ∞ N =1 , the following holds for all N ∈ N.

•
The asymptotic consistency of S N is equivalent to saying D( S N , S) → 0 a.s. as N → ∞.The expression (6) gives a precise rate for this convergence.In our work, all convergence results are for N → ∞ and so we drop restating this fact for convenience's sake.In the following sections, we will establish the asymptotic consistency and the exponential convergence of S N under suitable assumptions.

Asymptotic Consistency of S N
We begin with stating the bound on the optimizers of the problem defining the CVaR (3) and the empirical CVaR (5).This restricts our attention to compact domains for variables (x, t, u), a property useful in showing consistency.Denote for each i ∈ [n], functions The map ψ N i is the sample average of ψ i .Given our assumption that U is compact, we have that the expected value of f i is bounded for any x ∈ X .Using this fact, one can deduce by strong law of large numbers [6] that for any fixed (x, t) ∈ X × R, ψ N i (x, t) → ψ i (x, t) a.s.We however require uniform convergence of these maps to conclude consistency, which will be established in Theorem 1 below.Observe that, by definition, The following result gives explicit bounds on the optimizers of these problems.
Lemma 1 (Bounds on optimizers of problems defining (empirical) CVaR): For any x ∈ X and i ∈ [n], the optimizers of the problems in (3) and (5) exist and belong to the compact set T = [ , L], where Furthermore, the set of functions for i ∈ [n], satisfy for all (x, t, u) ∈ X × T × U, Proof.From [18, Chapter 6], optimizers of (3) and ( 5) exist and they lie in the closed interval defined by the left-and the right-side (1 − α)-quantile of the respective random variables.Since this interval belongs to the set of values the functions take, we conclude that the optimizers belong to T .To conclude (9), note that Here, the first inequality follows from the bound on f i , the first equality is because t ∈ [ , L], and the second inequality is due to the fact that α < 1.
Similarly, for the lower bound, This completes the proof.
We make a note here that optimizers of problems defining the CVaR in (3) and ( 5) exist and are bounded for more general cases, even when the support of the random variable is unbounded, see e.g., [18,Chapter 6].Nevertheless, the above result provides an explicit bound which is used later in deriving precise exponential convergence guarantees.
As a consequence of Lemma 1, one can show uniform convergence of ψ N p to ψ p .Our next step is to analyze the sensitivity of F as one perturbs the underlying map ψ.In combination with the uniform convergence of ψ N p , this result leads to the uniform convergence of F N to F .Lemma 2 (Sensitivity of F with respect to ψ): Proof.The first step is to show the sensitivity of the map CVaR P α [f i (•, u)] with respect to ψ p .To this end, fix i ∈ [n] and x ∈ X , and let These optimizers exist due to Lemma 1.We now have The first inequality is due to optimality and the second inequality holds by assumption.Similarly, one can show that The above two sets of inequalities along with the fact that CVaR Finally, the conclusion follows from the inequality The final preliminary result states proximity of S N to S given that the difference between F N and F is bounded.The proof is a consequence of [27, Lemma 2.1] that studies sensitivity of generalized equations.
Lemma 3 (Sensitivity of S with respect to F ): For any > 0, there exists δ( ) > 0 such that D( S N , S) ≤ whenever Next is the main result of this section, establishing the asymptotic consistency of S N .The proof puts to use the preliminary lemmas on sensitivity presented above along with the uniform convergence of ψ N p to ψ p .
Theorem 1 (Asymptotic consistency of S N ): We have D( S N , S) → 0 almost surely.
Proof.Consider first the a.s.uniform convergence ψ N i → ψ i over the compact set X × T .Note that ψ i (x, t) = E P [φ i (x, t, u)] where φ i is given in ( 8) and so, ψ N i is the sample average of ψ i .For any fixed u ∈ U, the map φ i (•, •, u) is continuous and for any (x, t) ∈ X × T , due to Lemma 1, the map φ i (x, t, •) is dominated by the integrable function (a constant in this case) + L− α .Hence, by the uniform law of large numbers result [18,Theorem 7.48], we conclude that ψ N i → ψ i uniformly a.s. on X × T .Using this fact in the sensitivity result of Lemma 2 implies that F N → F uniformly a.s. on the set X .Finally, we arrive at the conclusion using Lemma 3.

Exponential Convergence of S N
Here, our strategy will be to use the concentration inequality for the empirical CVaR given in [24] and derive the uniform exponential convergence of F N to F .Later, we will use Lemma 3 to infer exponential convergence of S N .Note that the inequality given in [24] requires compact support of the random variable and it is tight when it comes to the dependency on the risk parameter α.For unbounded support, one can use deviation inequalities from [10].
For a fixed i ∈ [n] and x ∈ X , the deviation between the CVaR and its empirical counterpart can be bounded using the results in [24] as In the above bound, the denominator in the exponent uses the fact that any realization of f i (x, u) given any x is supported on the compact set [ , L].Similar to the narrative of the previous section, while the above inequality holds pointwise, what we need is uniform exponential bound for proximity of F to F N .Below, we will derive such a bound under the following condition.

Assumption 2. (Uniform Lipschitz continuity of f i ):
There exists a constant M > 0 such that for all x, x ∈ X , u ∈ U, and i ∈ [n].

•
Under the above Lipschitz condition on the random functions, one can show the following.
Lemma 4 (Lipschitz continuity of (empirical) CVaR): Under Assumption 2, Proof.We will show the property for the function Assumption 2 yields the Lipschitz continuity property for the map ψ N i .To establish this, fix any i ∈ [n] and t ∈ R and notice that In the above relations, the first is a consequence of the triangle inequality, the second inequality follows from the fact that the map [ • ] + is Lipschitz continuous with constant as unity, and the last inequality uses the Lipschitz continuity property of f i .Now let t, t ∈ R be such that ψ N i (x, t) = inf t∈R ψ N i (x, t) and . Existence of such an optimizer follows from the discussion in [18, Section 6.2.4].Next note the following sequence of inequalities that can be inferred from the optimality condition and the Lipschitz continuity property of ψ N i shown above, One can exchange x with x in the above reasoning and obtain inf t∈R Inequalities ( 14) and ( 15) imply that inf t∈R The proof concludes by using this fact in (13).
Next, we establish the exponential convergence of F N .The proof is largely inspired from the steps given in [19,Theorem 5.1] and is a standard argument in these set of results.We note that the obtained bound is very crude and in practice, the achieved performance is much better.
Proposition 1 (Uniform exponential convergence of F N to F ): Under Assumption 2, for any 0 < < diam(X )/2, the following inequality holds for all N ∈ N, where and diam(X ) = sup x,x ∈X x − x is the diameter of X .
Proof.The idea of moving from the pointwise exponential bound (11) to a uniform bound is to impose the pointwise bound jointly on a finite number of points and use the Lipschitz continuity property (Lemma 4) to bound the deviation of the rest of the set from this finite set.Making precise the mathematical details, note that one can cover the set X with number of points, labeled C := {x 1 , . . ., xK } ⊂ X , such that for any x ∈ X , there exists a point xi(x) ∈ C with The number K can be computed as follows.From (18), we require x− xi(x) ≤ α 4M .Thus, from Definition 3, the number of points in C need only be bigger than the α 4M -covering number of X .Thus, any upper bound on this covering number suffices.From Lemma 7, one such upper bound is 3 diam(X ) , where vol(B) is the volume of the unit norm ball B in R n .Since vol(B) ≥ n/2 !, we get the desired value for K given in (17).Having identified the set of points C, we next combine the Lipschitz bound given in Lemma 4 and the inequality (18), to get for all i ∈ [n] and x ∈ X , The above inequalities control the deviation of CVaR over the set X from the values these functions take on the set C. The next step entails bounding the deviation of the CVaR and the empirical CVaR on the set C. Employing (11) and the union bound, we have The next set of inequalities characterize the difference between the CVaR and the empirical CVaR over the set X using the Lipschitz continuity property (19).Fix i ∈ [n] and let x ∈ X .Note that using (19), Next, the deviation between the CVaR and its empirical counterpart is bounded using (20) and the above characterization as The final step is to connect the above inequality to the difference between F N and F .From the proof of Lemma 2, one can deduce that if Therefore, using ( 21) we obtain This concludes the proof.
The main result is given below.The proof follows from the uniform exponential convergence of F N .Theorem 3 (Exponential convergence of S N to S): Let Assumption 2 hold.Then, for any 0 < < diam(X )/2, and N ∈ N, the following inequality holds where γ and β are given in (16) and δ : R >0 → R >0 is a map such that the pair ( , δ( )) satisfies the condition of Lemma 3.
Proof.Consider any > 0. By Lemma 3, if sup x∈X F N (x) − F (x) ≤ δ( ), then D( S N , S) ≤ .From Proposition 1, for any δ( ) > 0, there exist γ(δ( )) and β(δ( )), given in (16a) and (16b), respectively, such that for all N .The proof follows by using the above facts and the set of inequalities: Remark 1 (Sample guarantees for approximating S with S N ): Theorem 3 implies that if one wants D( S N , S) ≤ with confidence 1 − ζ, where ζ ∈ (0, 1) is a small positive number, then one would require at most number of samples of the random variable.Due to the exponential rate, a good feature of this sample guarantee is that N depends on the accuracy ζ logarithmically.That is, one can obtain high confidence bounds with fewer samples.However, the sample size grows poorly with other parameters, especially, δ( ) and the dimension n.Further, note that to obtain an accurate sample guarantee, one needs to estimate δ(•) which depends on the regularity of random functions.Improving the sample complexity for specific random functions is discussed in the following section.•

Separable Uncertain Functions
Here we illustrate how specific structure of the random functions yields tighter sample guarantees.Further, we discuss the tractability of solving the sample average VI problem.
Proposition 2 (Exponential convergence for separable functions): Assume that the random functions have the form where fi , g i , and fi are non-negative real-valued continuous functions.Then, for any > 0 and N ∈ N, the following holds where β( ) := α 2 11n(f max g rge ) 2 , with f max := sup i∈[n],x∈X fi (x) and g rge := sup i∈[n] sup u∈U g i (u)−inf u∈U g i (u) .
Proof.Since CVaR is positive-homogeneous and shift-invariant [18,Chapter 6], one gets for all i ∈ [n].Using this fact, for any > 0, we reason as where (a) follows from ( 24) (note that a similar equality as ( 24) holds for CVaR P α ) and the fact that f takes non-negative values and (b) is a result of the deviation inequality (11) applied in combination with the union bound.Using (25) and proceeding along the lines of Theorem 3 we obtain (23).
Recounting the way we obtained exponential convergence in the previous section, the key step was in Proposition 1 where we moved from the deviation inequality for finite number of points in X to the uniform exponential convergence of F .Such an exercise was inevitable due to the possible interdependence of x and u in the random function.Not only that, the bound for this reason scaled poorly with many parameters, such as, the dimension n and the size of X .However, when the random function takes the form (22), then one need not construct a cover for X to derive uniform exponential convergence of F .Thus, we obtain a tighter bound (23).
Remark 2 (Sample guarantees and tractability for separable functions): Analogous to Remark 1, we deduce using Proposition 2 that for separable functions (22), the accuracy D( S N , S) ≤ with confidence 1 − ζ is guaranteed with number of samples.As expected, the above sample size does not depend on the size of X .Further, for the above derivation we need not assume the random function to be Lipschitz continuous.
We next comment about solving VI(X , F N ) for separable functions.For convenience, denote the concatenation of fi and fi for i ∈ [n] with functions f and f , respectively.Further, let the vector g N := ( CVaR collect the empirical CVaR of the random function of each path.Then, the aim is to solve VI(X , f g N + f ), where represents component-wise product.Given samples, the approach would be to compute g N and then proceed to solve the VI.Note that computing each component g N i amounts to solving a linear program: The appealing part of this process is the deconstruction into two steps: computing the empirical CVaR independent of x and solving the VI without worrying about samples.Further, one can derive conditions on the underlying functions that guarantee monotonicity of f g N + f that consequently lead to efficient algorithms that solve the VI, see e.g.methods given in [7].
• are denoted by O ⊂ V and D ⊂ V, respectively.The set of origin-destination (OD) pairs is W ⊆ O × D. Let P w denote the set of available paths for the OD pair w ∈ W and let P = ∪ w∈W P w be the set of all paths2 .Consider the setting of nonatomic routing where numerous agents traverse the network and so each individual agent's action has infinitesimal impact on the aggregate traffic flow.As a consequence, flow is modeled as a continuous variable.Each agent is associated with an OD pair w ∈ W and is allowed to select any path p ∈ P w .The route choices give rise to the aggregate traffic which is modeled as a flow vector h ∈ R |P| ≥0 with h p being the flow on a path p ∈ P. The flow between each OD pair must satisfy the travel demand.We denote the demand for the OD pair w ∈ W by d w ∈ R ≥0 and the set of feasible flows by Agents who choose path p ∈ P experience a non-negative uncertain cost denoted by C p : R , where u ∈ R m models the uncertainty.Let P and U ⊂ R m be the distribution and support of u, respectively.Assume that U is compact.For the cost function, assume that for every p ∈ P and u ∈ U, the function h → C p (h, u) is continuous.For every p ∈ P and h ∈ H, the function u → C p (h, u) is measurable.
In addition, for all p ∈ P, assume that C p takes finite value over H×U.The above described elements collectively represent an uncertain routing game.To assign an appropriate objective for agents, we assume that agents are riskaverse and look for paths with least CVaR.We assume that all agents have the same risk-aversion characterized by the parameter α ∈ (0, 1).The CVaR associated to path p as a function of the flow is The notion of equilibrium then is that of Wardrop [4], where the cost associated to a path is its CVaR.
Definition 2 (Conditional value-at-risk based Wardrop equilibrium (CWE)): is called a CVaR-based Wardrop equilibrium (CWE) for the uncertain routing game if: (i) h * satisfies the demand for all OD pairs and (ii) for any OD pair w, a path p ∈ P w has nonzero flow if the CVaR of path p is minimum among all paths in P w .Formally, h * is a CWE if h * ∈ H and h * p > 0 for p ∈ P w only if We denote the set of CWE by S CWE ⊂ H. • One can verify that the set S CWE is equivalent to the set of solutions to the variational inequality (VI) problem VI(H, G) (see Section 2 for relevant notions) [20], where for all p ∈ P. Note that the set H is compact and convex.Further, the map h → G(h) is continuous since C p , p ∈ P are so and H and U are compact.Therefore, the set of solutions SOL(H, G) is nonempty and compact [7, Corollary 2.2.5].Consequently, the set S CWE is nonempty and compact.Due to this connection between the CWE and the solution of the risk-based VI, we can apply the results developed in the previous section to study the sample average approximation to the CWE.We present the following main result.Given N i.i.d samples of u, the sample average approximation of the function G is defined component-wise as where We denote the set of solutions of the VI(H, G N ) by S N CWE .We have the guarantee: The following hold: (i) For any > 0, there exists δ( for all h, h ∈ H, u ∈ U, and p ∈ P. Let Then, for any > 0, and N ∈ N, the following inequality holds ))e −β(δ( ))N , where δ( ) satisfies the condition given in (i), and The proof follows in the same way as that of Theorem 1 and 3, using the fact that the covering number of H can be bounded as given in Lemma 9 in the appendix.This sharper bound on covering number brings out a notable difference in the constants given in the exponential bound (31a) and (31b) as compared to those derived in Theorem 3.

Numerical Example: Sioux Falls
Here we illustrate the method of sample average approximation for the computation of the CWE through an example.We consider the Sioux Falls traffic network that consists of 24 nodes and 76 edges [23].The (deterministic) cost associated to each edge e ∈ E of the network is the travel time and is given as an affine function of the flow on the edge, , where e is the flow on edge e ∈ E, t e is the free-flow travel time, and c e is the capacity of the edge.The values for constants t e and c e are taken from the repository [23].The constant b e is fixed to be 100 for all edges.For simplicity, we consider three OD pairs W = {(1, 19), (13,8), (12,18)} and for each pair, we choose 10 paths that have the shortest free-flow travel time.The demand is given as d (1,19) = 300, d (13,8) = 600, and d (12,18) = 200.The cost of each path is set to be the summation of the costs of the edges contained in it.That is, for some path p ∈ P, where the summation is over all edges that constitute the path.Note that the flow on any edge is the sum of the flow of the paths that use that edge.That is, We assume that the cost associated to each edge that contains either of the nodes 10, 16, or 17 is uncertain.Specifically, for such an edge e, the uncertain cost is given as where u e has uniform distribution over the set [0, 0.5t e ].For all other edges, we set u e to be zero with probability one.All uncertainties are assumed to be mutually independent.The uncertain cost of a path p ∈ P is given as This defines completely the routing game with uncertain costs.

Affine Separable Costs and LCP
In the example explained above, cost C p is linear in flow and the uncertainty is additive.Therefore, solving the sample average VI is equivalent to solving a linear complementarity problem (LCP), which in turn is a convex optimization problem with quadratic cost and affine constraints.We next drive this optimization problem.Let Q ∈ {0, 1} |E|×|P| be the (edge, path)-incidence matrix where Q ep entry is 1 if and only if edge e belongs to path p.Then, using (32), the vector containing all edge flows can be written as = Qh, where h consists of flows on paths.Stacking all uncertain edge costs J e in a vector J and using its affine separable form, we obtain where u and t are vector of uncertainties and free-flow travel time of all edges, respectively, and R is a diagonal matrix where the diagonal entry corresponding to edge e is b e t e /c e .Using the above relation, the vector of costs incurred on paths takes the form Further, since CVaR is shift-invariant, we obtain where the last term in the above relation is the vector of element-wise CVaRs.Similarly, the sample average approximation of G is given as Our aim is find the solution of VI(H, G).To represent the set of feasible flows in a compact form, denote B ∈ {0, 1} |W|×|P| as the (OD pair, path)-incidence matrix where B wp is 1 if and only if p ∈ P w .Denoting the vector of demands as d = (d w ) w∈W , the set of feasible flows are vectors h ≥ 0 satisfying Bh = d.Using this notation and the explanation given in [26, Section 2.2], finding the solution of VI(H, G N ) is equivalent to solving the LCP given as: find x = (h; v) ∈ R |P|+|W| such that x ≥ 0, M N (x) ≥ 0, and x M N (x) = 0, where This LCP can be equivalently solved by finding the optimizer of the following problem minimize x M N (x) subject to M N (x) ≥ 0, Since M N is affine in x, the above problem is quadratic and one can show using the properties of matrices Q, R, and B, that the problem is convex.To summarize, the VI(H, G) can be approximated by VI(H, G N ) and the latter can be solved by first finding CVaR N α [Q u] and then solving (33).

Computing CWE
For the above explained Sioux falls example, we set α = 0.05.This defines uniquely the CWE h * .For the sample average approximation, we consider three scenarios with different number of samples, N ∈ {50, 500, 5000}.We consider 500 runs for each of the scenarios.Each run collects N number of i.i.d samples of the uncertainty u, constructs the empirical CVaR N α [Q u], and computes the approximation of the CWE h N by solving (33).Figure 1 illustrates our results.It plots the cumulative distribution function of the random variable h N − h * as estimated using the 500 runs.Note that the complete distribution moves to the left with increasing number of samples.This confirms our theoretical findings that as N increases, the approximate solution h N approaches the CWE almost surely.

Conclusions
We considered a risk-based variational inequality and studied the sample average approximation method for solving it.In particular, we derived asymptotic consistency and exponential convergence under suitable assumptions.For the case of separable random function, we derived sharper convergence bounds.Lastly we demonstrated the application of our result in the case of uncertain network routing problem where one can determine the CVaR-based Wardrop equilibrium using the sample average scheme.Future work will involve exploring tractability of the resulting sample average VI under various conditions.We also wish to investigate other efficient sampling techniques, especially when the dimension of the problem is large.Finally, we plan to investigate decentralized learning methods for finding the solution of risk-averse VIs.

Appendix
Here, we estimate the covering number of a general set X and the feasible flow set H related to the network routing problem.This computation helps in establishing Proposition 1 and Theorem 4. In order to present the results, we need a couple of definitions.
, where B(x, ) is the closed ball in Euclidean metric with center as x and radius .The minimum number of points required to form an -cover of X is called the -covering number. • From [25], we have the following result.We give the proof here for the sake of completeness.Proof.Note that the -covering number is bounded above by the -packing number of the set [22,Chapter 3].The latter is defined as the maximum number of points that can be selected from X such that they are mutually more than distance apart.Let {x 1 , x 2 , . . ., x P } ⊂ X be these points and P be the -packing number.Our next step is to derive a bound for P .Note that by definition of packing, closed balls B(x i , /2) := {y ∈ R n | x i − y ≤ /2}, i ∈ {1, . . ., P } are disjoint and ∪ P i=1 B(x i , /2) ⊂ X + ( /2)B, where the set addition is considered to be the Minkowski sum.Taking the volume on both sides yields vol(X + ( /2)B) ≥ vol ∪ P i=1 B(x i , /2) = P vol(( /2)B).
The following is an application of the above result.
Lemma 7 (Covering number of a compact set): The -covering number of a compact set X ⊂ R n , where ≤ diam(X )/2, is upper bounded by 3 diam(X ) n 1 vol(B) , where vol(B) is the volume of the unit norm ball B in R n and diam(X ) = sup x,x ∈X x − x is the diameter of X .
Proof.Consider the set M := [− diam(X )/2, diam(X )/2] n , where diam(X ) is the diameter of the set X .One can verify that the covering number of X is upper bounded by that of the set M. This is because the set X can be entirely contained in M after performing a translation operation.Note that vol(M) = diam(X ) n .The result then follows from Lemma 6.
Next, we provide a bound on the covering number of a simplex.In the consequent result, we use this bound to analyze the covering number of the feasible flow set H. x i = d}, the -covering number is bounded above by where K = √ nd .
Proof.Consider the set of points and where K = √ nd .Note that C ⊂ ∆ n d .We will show that this is a valid -cover for ∆ n d .To this end, pick any point x ∈ ∆ n d .We will construct a point x c ∈ C such that x − x c ≤ .Let x up , x dn ∈ R n ≥0 be such that each j-th component is given by Note that x dn x x up , where denotes element-wise inequality.Further, Consequently, for such a vector we have Thus, our aim is to find a vector y that belongs to C and for which x dn y x up holds.Define Note that δ is an integer as n j=1 x j = d and each component x dn j is a product of an integer and the quantity d K .Now consider a vector y δ ∈ {0, 1} n such that Since each i s is a nonnegative integer, using the above inequality, the number of points in C is the number of ways K identical objects can be put into n distinct bins.This number is given as (35).
Using the above result, we next derive an upper bound on the -covering number the set H. where the first condition follows from the triangle inequality and the second from the definition of C w .This completes the proof.

Fig. 1
Fig.1Plot illustrating the the convergence of the approximate solution h N to the CVaRbased Wardrop equilibrium h * for Sioux falls network, see Section 6 for details.Each line is the cumulative distribution of h N − h * with a different sample size used for the approximation.The cdf is obtained using 500 runs.

Lemma 6 (
Covering number of a set): The -covering number of a convex set X ⊂ R n that satisfies B ⊂ X is upper bounded by 3 n vol(X ) vol(B) , where vol stands for volume, B is the unit norm ball in R n , and the operator + represents the Minkowski sum.

Lemma 8 (
Covering number for a simplex): For the simplex∆ n d := {x ∈ R n ≥0 | n i=1 j .By construction, for any vector y satisfying x dn y x up , we have y − x ∞ ≤ d K .

n
j=1 y δ j = δ.Set y = x dn + y δ .It is easy to see that by construction x dn y x up and y ∈ C. The former establishes y − x ≤ due to the reasoning in (36).Thus, C is an -cover for ∆ n d .As a consequence, to complete the proof we need to enumerate the points in C. To this end, note that for any point x c = d i1 K , i2 K , . . .,

Lemma 9 (
Covering number of H):The -covering number of the set of feasible flows H given in(27) is bounded above byw∈W |P w | + K w − 1 K w − 1 ,(37)where denotes the product andK w = |W| √ Pwdwfor all w ∈ W.Proof.First note that H = w∈W H w , where represents the Cartesian product andH w := h w ∈ R |Pw| ≥0 p∈Pw h w p = d wfor all w ∈ W. That is, H w represents the set of feasible flows for paths corresponding to the OD pair w.From Lemma 8, the number of points required to cover the set H w with balls of radius |W| is|P w | + K w − 1 K w − 1 ,whereK w = |W| √ Pwdw .Consider these set of points to be represented by C w ⊂ H w .Now consider the set of points C = {(h w ) w∈W | h w ∈ C w for all w ∈ W}.The number of points in C is equal to the value in (37).We show next that C is an -cover for H. Pick any h = (h w ) w∈W ∈ H, we have min h∈C h − h ≤ w∈W min w∈Cw h w − hw ≤ w∈W |W| = ,