First order asymptotics of the sample average approximation method to solve risk averse stochastic programs

We investigate statistical properties of the optimal value of the Sample Average Approximation of stochastic programs, continuing the study in Krätschmer (2023). Central limit theorem type results are derived for the optimal value. As a crucial point, the investigations are based on a new type of condition from the theory of empirical processes which does not rely on pathwise analytical properties of the goal functions. In particular, continuity in the parameter is not imposed in advance, as is usual in the literature on the Sample Average Approximation method. It is also shown that the new condition is satisfied if the paths of the goal functions are Hölder continuous, so that the main results carry over in this case. Moreover, the main results are applied to goal functions whose paths are piecewise Hölder continuous, as e.g. in two stage mixed-integer programs. The main results are shown for classical risk neutral stochastic programs, but we also demonstrate how to apply them to the Sample Average Approximation of risk averse stochastic programs. In this respect we consider stochastic programs expressed in terms of absolute semideviations and divergence risk measures.


Introduction
where Θ denotes a compact subset of R^m, whereas Z stands for a d-dimensional random vector with distribution P_Z. In general the parameterized distribution of the goal function G is unknown, but some information is available through i.i.d. samples. Using this information, a general device to solve problem (1.1) approximately is provided by the so-called Sample Average Approximation (SAA) (see [29]). For explanation, let us consider a sequence (Z_j)_{j∈N} of independent d-dimensional random vectors on some fixed complete atomless probability space (Ω, F, P) which are identically distributed as the d-dimensional random vector Z, and the approximated problems

inf_{θ∈Θ} (1/n) Σ_{j=1}^n G(θ, Z_j)   (n ∈ N).   (1.2)

The optimal values depend on the sample size and the realization of the samples of Z. Their asymptotic behaviour with increasing sample size, also known as the first order asymptotics of (1.1), is well-known. More precisely, the sequence of optimal values of the approximated optimization problem converges P-a.s. to the optimal value of the genuine stochastic program. Moreover, if G is Lipschitz continuous in θ, and if (1.1) has a unique solution, then the stochastic sequence is asymptotically normally distributed. In [10] asymptotic distributions of this stochastic sequence have also been found for stochastic mixed-integer programs, where typically the objectives are not continuous in the parameter. For these results, and more on asymptotics of the SAA method, the reader may consult the monograph [29], and in addition the contributions [23] and [25].
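As a numerical illustration of the SAA device just described, the following sketch minimizes the sample average over a parameter grid; the goal function G(θ, z) = (θ − z)² and the grid are hypothetical choices of ours, not taken from the paper:

```python
import numpy as np

def saa_optimal_value(G, theta_grid, z_samples):
    """SAA: minimize the sample average (1/n) * sum_j G(theta, Z_j) over a parameter grid."""
    vals = np.array([np.mean(G(t, z_samples)) for t in theta_grid])
    i = int(vals.argmin())
    return theta_grid[i], vals[i]

# hypothetical goal function: G(theta, z) = (theta - z)^2, so the true
# optimum is theta* = E[Z] = 2 with optimal value Var(Z) = 1
rng = np.random.default_rng(1)
z = rng.normal(loc=2.0, scale=1.0, size=50_000)
theta_hat, v_hat = saa_optimal_value(lambda t, z: (t - z) ** 2,
                                     np.linspace(0.0, 4.0, 401), z)
```

For large n the SAA optimal value v_hat approaches the true optimal value, in line with the almost sure convergence stated above.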
In several fields like finance, insurance or microeconomics, the assumption of risk neutral decision makers is considered to be too idealistic. Instead it is preferred there to study the behaviour of actors with a more cautious attitude, known as risk aversion. In this view the optimization problem (1.1) should be replaced with a risk averse stochastic program, i.e. an optimization problem

inf_{θ∈Θ} ρ(G(θ, Z)),   (1.4)

where ρ stands for a functional which is nondecreasing w.r.t. the increasing convex order. A general class of functionals fulfilling this requirement is built by the so-called distribution-invariant convex risk measures (see e.g. [11], [29]). They play an important role as building blocks in quantitative risk management (see [21], [24], [27]), and they have been suggested as a systematic approach for calculations of insurance premia (cf. [15]). Distribution-invariance denotes the property that a functional ρ has the same outcome for random variables with identical distribution. Hence, a distribution-invariant convex risk measure ρ may be associated with a functional R_ρ on sets of distribution functions (see e.g. [19, Section 4.2], and also [5, (2.4)]). In this case (1.4) reads as inf_{θ∈Θ} R_ρ(F_θ), where F_θ is the distribution function of G(θ, Z). Then we may modify the SAA method by

inf_{θ∈Θ} R_ρ(F̂_{n,θ})   (n ∈ N),   (1.5)

where F̂_{n,θ} denotes the empirical distribution function of the sample G(θ, Z_1), ..., G(θ, Z_n).
As the title says, the subject of the paper is the first order asymptotics of the SAA method for (1.4) with ρ being a distribution-invariant convex risk measure. It is already known that under rather general conditions on the mapping G the optimal values converge P-a.s. (see [28]). In [18] nonasymptotic upper estimates of the deviation probabilities are derived, dependent on the sample size n. Besides the risk neutral case, risk averse stochastic programs in terms of upper semideviations and divergence risk measures were considered there. As a by-product, uniform tightness of the stochastic sequence is obtained in all cases. In this paper we continue this work, focussing on the asymptotic distributions of the stochastic sequence. To the best of our knowledge, this issue has been studied in [14] and [7] only. In both contributions, G is assumed to be Lipschitz continuous in θ, and a subclass of distribution-invariant convex risk measures of a specific form is considered. We shall extend the investigations by allowing for more general goal functions G, which makes it possible to apply the results to stochastic programs whose goal functions are not continuous in the parameter. The value functions of two stage mixed-integer stochastic programs are prominent examples of such a type of goal function. Concerning the choice of distribution-invariant convex risk measures we shall restrict ourselves to absolute semideviations and divergence risk measures. These classes have only a small intersection with the class of distribution-invariant convex risk measures in [14], but none with the class in [7].
The paper is organized as follows. We shall start with a general central limit theorem type result for the optimal values of classical risk neutral stochastic programs. The point is that we may extend this result if the SAA method is applied to risk averse stochastic programs. In Section 3 this will be demonstrated in the case that stochastic programs are expressed in terms of absolute semideviations, whereas in Section 4 the application to stochastic programs under divergence risk measures is considered. Our main asymptotic results are based on a technical result concerning the convergence of the sequence ((1/n) Σ_{j=1}^n G(•, Z_j))_{n∈N} in the path space. It will be formulated in Section 5. Finally, Section 6 gathers proofs of results from the previous sections.
The essential new ingredient of our results is to replace analytic conditions on the paths G(•, z) with requirements which intuitively make the family {G(θ, Z) | θ ∈ Θ} of random variables small in a certain sense. Fortunately, the respective invoked conditions are satisfied if the paths G(•, z) are Hölder continuous. We shall also see that we may utilize our results to study the SAA method for stochastic programs where the paths G(•, z) are piecewise Hölder continuous, but not necessarily continuous or convex. Value functions of two stage mixed-integer programs are typical examples of goal functions of such a kind.

First order asymptotics in the risk neutral case
In this section we study the SAA (1.2) associated with the risk neutral stochastic program (1.1). We shall restrict ourselves to mappings G which satisfy properties (A1) and (A2).

(A2) There is some strictly positive P_Z-integrable mapping ξ : R^d → R such that |G(θ, z)| ≤ ξ(z) for every θ ∈ Θ and z ∈ R^d.

Note that under these assumptions the optimization problems (1.1) and (1.2) are well defined with finite optimal values.

Proposition 2.1 Let (A1), (A2) be fulfilled. Then the following statements hold.

1) The optimal value inf_{θ∈Θ} (1/n) Σ_{j=1}^n G(θ, Z_j) of (1.2) is a random variable on (Ω, F, P) for every n ∈ N.

2) There exists a solution of (1.1), and the set of solutions is compact.
The proof may be found in Subsection 6.1.
In order to develop nonasymptotic confidence intervals on the optimal value of (1.1), the authors in [13] provide specific separate upper estimates for the deviation probabilities (2.1) and (2.2). They assume G to be convex in θ and such that the goal function of (1.1) is differentiable, and they also impose conditions on the tail behaviour of the random variables G(θ, Z).
Avoiding such regularity conditions, in [18] upper estimates have been derived for the deviation probabilities (2.3). There, no further restrictions beyond property (A2) are imposed on the tail behaviour of the random variables G(θ, Z). Moreover, the analytical requirements on paths of G are replaced with a specific condition on the function class {G(θ, •) | θ ∈ Θ}, which we shall explain in more detail soon in this section. The results in [18] already imply almost sure convergence of the optimal values (see [18, Theorem 2.2]). Moreover, the sequence (1.3) is uniformly tight, i.e. relatively compact w.r.t. the topology of weak convergence (see [18, Theorem 2.5]).
Throughout this section we want to complete the results from [18] by some criterion to ensure weak convergence of the sequence (1.3), and to find its asymptotic distributions. In contrast to the literature on the first order asymptotics of the SAA method, we do not want to impose analytical properties on the objective G like e.g. continuity or convexity in the parameter θ. Instead, as in [18], we suggest a condition which makes the function class F_Θ := {G(θ, •) | θ ∈ Θ} small in some sense. Convenient ways to express this idea may be provided by general devices from empirical process theory which are based on covering numbers for classes of Borel measurable mappings from R^d into R w.r.t. L^p-norms. To recall these concepts adapted to our situation, let us fix any nonvoid set F of Borel measurable mappings from R^d into R and any probability measure Q on B(R^d).

• Covering numbers for F
We use N(η, F, L^p(Q)) to denote the minimal number of closed d_{Q,p}-balls of radius η > 0 with centers in F needed to cover F. We define N(η, F, L^p(Q)) := ∞ if no finite cover is available.
• An envelope of F is defined to be some Borel measurable mapping C_F : R^d → R satisfying sup_{h∈F} |h| ≤ C_F. If an envelope C_F has strictly positive outcomes, we shall speak of a positive envelope.
• M_fin denotes the set of all probability measures on B(R^d) with finite support.
Usually, upper estimations of covering numbers are used instead of exact calculations.
For abbreviation, let us introduce, for a class F of Borel measurable functions from R^d into R with arbitrary positive envelope C_F, the entropy-integral notation J(F, C_F, δ). As we shall see, the finiteness of the integral J(F_Θ, C_{F_Θ}, 1) is already sufficient for our purposes. Henceforth we shall restrict considerations to "small" classes F_Θ in the sense that J(F_Θ, C_{F_Θ}, 1) is finite for some positive square P_Z-integrable envelope. The following result, Theorem 2.2, concerns the asymptotic distribution of the sequence (1.3). In particular, if the optimization problem (1.1) has a unique solution θ*, then under the given assumptions the sequence (1.3) converges weakly to some centered normally distributed random variable with variance Var(G(θ*, Z_1)).
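For orientation, the display defining this notation was lost in extraction; the standard form of such an entropy integral in empirical process theory (our hedged rendering, cf. [30]) is

```latex
J(\mathcal{F}, C_{\mathcal{F}}, \delta)
  := \int_0^{\delta} \sup_{Q \in \mathrm{M}_{\mathrm{fin}}}
     \sqrt{\log N\bigl(\varepsilon\,\|C_{\mathcal{F}}\|_{Q,2},\; \mathcal{F},\; L^2(Q)\bigr)}\; d\varepsilon ,
```

taking the supremum of the covering numbers over finitely supported probability measures, as introduced above.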
The proof is delegated to Subsection 6.2.
The result on the asymptotic distribution crucially requires J(F_Θ, ξ, 1) to be finite. This property is always satisfied if the involved covering numbers grow at most at polynomial rates. Indeed, this relies on the observation that, by using the change of variables formula several times along with integration by parts, we obtain inequality (2.5). Inequality (2.5) may be applied if there exist K ≥ e and v ≥ 1 such that condition (2.6) is satisfied. Prominent examples satisfying (2.6) are provided by so-called VC-subgraph classes (see e.g. [30]).
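A hedged sketch of how a polynomial covering bound yields finiteness (the displays (2.5), (2.6) were lost in extraction, so the exact form with constants K, v is our reconstruction from the surrounding text):

```latex
\sup_{Q \in \mathrm{M}_{\mathrm{fin}}}
   N\bigl(\varepsilon\,\|C_{\mathcal{F}_\Theta}\|_{Q,2},\; \mathcal{F}_\Theta,\; L^2(Q)\bigr)
   \le (K/\varepsilon)^{v} \quad (0 < \varepsilon < 1)
\quad\Longrightarrow\quad
J(\mathcal{F}_\Theta, C_{\mathcal{F}_\Theta}, \delta)
   \le \int_0^{\delta} \sqrt{v \log(K/\varepsilon)}\; d\varepsilon \;<\; \infty .
```

The integral on the right is finite for every δ > 0 because the logarithmic singularity at 0 is integrable.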
In [18] explicit upper estimates of the terms J(F_Θ, ξ, δ) have been derived for objectives G satisfying specific analytical properties. The line of reasoning there is to show property (2.6) and then to utilize (2.5) (see Propositions 2.6, 2.8 and their proofs in [18]). Let us recall these specializations.
Denoting the Euclidean metric on R^m by d_{m,2}, the first one is built on the following condition.
In the second special case from [18], objectives G have been considered which exhibit the following kind of piecewise Hölder continuity structure.
In two stage mixed-integer programs the goal functions may typically be represented by (PH) if the random vector Z has compact support (see [10, p. 121] together with [18]). Within this special situation with compact Θ, the authors in [10] derive the same asymptotic distributions for the sequence (1.3) as in Theorem 2.2. Their line of reasoning is based upon the corresponding representation (PH) of G, and it also relies on finiteness of the integrals J(F_Θ, C_{F_Θ}, δ). We may extend their result to general objectives G having representation (PH), because under this condition the application of Theorem 2.2 is quite immediate.
According to Proposition 2.8 from [18], requirement (A1) is fulfilled, and the envelope constructed there is square P_Z-integrable, satisfying (A2) and J(F_Θ, ξ, 1) < ∞. Explicit upper estimates of J(F_Θ, ξ, 1) are also provided there. Hence all requirements of Theorem 2.2 are met. Then for every positive upper estimate L ≥ Var(G(θ*, Z_1)) we may define asymptotic confidence intervals on the optimal value. These confidence intervals may be considered as an alternative to the nonasymptotic confidence intervals which may be built directly on the upper estimates for the deviation probabilities (2.3) derived in Theorem 2.2 from [18]. It should be emphasized that both ways of finding confidence intervals do not require path properties of the objective G, like continuity or convexity, in advance. In particular, these methods may be used in the case that objectives have representation (PH), e.g. in two stage mixed-integer programs.
In [13] the authors develop nonasymptotic confidence intervals which are based on specific separate upper estimates for the deviation probabilities (2.1) and (2.2). The special feature of their suggestion is that the confidence intervals are independent of the dimension of the parameter (see [13, Discussion 2.1.3, (3)]). However, G has to be convex in the parameter.

First order asymptotics under absolute semideviations
Let L^1(Ω, F, P) denote the usual L^1-space on (Ω, F, P), where we tacitly identify random variables which differ on P-null sets only.
We want to study the risk averse stochastic program (1.4), where the functional ρ in the objective is an absolute semideviation. This means that for a ∈ ]0, 1] the functional ρ = ρ_{1,a} is defined as follows. It is well-known that absolute semideviations are increasing w.r.t. the increasing convex order (cf. e.g. [29, Theorem 6.51 along with Example 6.23 and Proposition 6.8]). They are also distribution-invariant, so that we may define the associated functional R_{ρ_{1,a}} on the set of distribution functions of random variables with finite first absolute moments. The aim of this section is the optimization problem (3.1), where F_θ stands for the distribution function of G(θ, Z) for θ ∈ Θ. The set of minimizers of this problem will be denoted by argmin R_{ρ_{1,a}}.
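The display defining ρ_{1,a} did not survive extraction; the standard mean-upper-semideviation form, which matches the monotonicity and distribution-invariance properties invoked here, reads (our hedged reconstruction):

```latex
\rho_{1,a}(X) \;:=\; \mathbb{E}[X] \;+\; a\,\mathbb{E}\bigl[(X - \mathbb{E}[X])^{+}\bigr],
\qquad a \in \,]0,1].
```

Here larger values of X are interpreted as worse outcomes, so the upper semideviation penalizes shortfalls above the mean.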
Introducing the notation (3.2), we may also describe this optimization in the following way. The stochastic objective of the approximative problem according to the SAA method has the following representation.
Let (A1), (A2) be fulfilled, and let J(F_Θ, ξ, 1) < ∞, where ξ is from (A2). Then under some minor additional regularity conditions on G we already know that the optimal values converge P-a.s. (see [18, Theorem 3.5]). Moreover, also by Theorem 3.5 from [18], the sequence is uniformly tight. The aim of this section is to find asymptotic distributions of this sequence. The starting point is the observation (3.4) derived from (3.3). The key is then to show that the sequence (3.5) has the same asymptotic distribution as the sequence (3.7) if one of these sequences converges weakly. In this case we may apply Theorem 2.2 to the sequence (3.7) to derive the asymptotic distribution of the sequence (3.5).
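For illustration, the plug-in SAA estimate R_{ρ_{1,a}}(F̂_{n,θ}) for a fixed θ can be sketched as follows, assuming the mean-upper-semideviation form of ρ_{1,a} (an assumption on our part, since the defining display was lost):

```python
import numpy as np

def semideviation_plugin(sample, a):
    """Plug-in estimate of rho_{1,a}(X) = E[X] + a*E[(X - E[X])_+]
    from an i.i.d. sample (this form of rho_{1,a} is assumed, not quoted)."""
    m = sample.mean()
    return m + a * np.maximum(sample - m, 0.0).mean()

# e.g. for the sample {0, 2} and a = 1: mean 1, upper semideviation 0.5, value 1.5
val = semideviation_plugin(np.array([0.0, 2.0]), 1.0)
```

The estimator is simply the empirical functional applied to the empirical distribution function, which is exactly the SAA device (1.5) specialized to absolute semideviations.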
The investigations are based on the following mild continuity requirement for the objective G.
Requirement (A3) implies a useful convergence property at continuity points of the distribution functions F_θ.

Lemma 3.1 Let t ∈ R be a continuity point of F_θ for some θ ∈ Θ, and let (A3) be fulfilled. If θ_n → θ and t_n → t, then F_{θ_n}(t_n) → F_θ(t).
Proof Fix any ε > 0. Then there is some ... The statement may be derived immediately from continuity of F_θ at t. ✷

The following result is the key step to show that the sequences (3.5) and (3.7) have identical asymptotic distributions if (3.7) converges weakly.
Proposition 3.2 Let (A1)-(A3) be fulfilled, where the mapping ξ from (A2) is square P_Z-integrable, and let the distribution function satisfy the displayed condition. Then the stated convergence holds, where P* stands for the outer probability of P.
The proof of Proposition 3.2 may be found in Subsection 6.3. Now, combining Proposition 3.2 with Theorem 2.2, we may derive our result on the first order asymptotics of the SAA under absolute semideviations.

Theorem 3.3 Let a ∈ ]0, 1], and let (A1)-(A3) be fulfilled, where the mapping ξ from condition (A2) is P_Z-integrable of order 4. Furthermore, let the distribution function satisfy the condition of Proposition 3.2.

2) The set argmin R_{ρ_{1,a}} is nonvoid and compact.
3) There exists some centered Gaussian process G = (G_θ)_{θ∈Θ} such that the sequence converges weakly. In particular, if the optimization problem (3.3) has a unique solution θ*, then under the given assumptions we have weak convergence to some centered normally distributed random variable with variance Var(G_{1,a}(θ*, Z_1)).
Secondly, E[G_{1,a}(θ, Z_1)] = ρ_{1,a}(G(θ, Z_1)) holds for any θ ∈ Θ. Hence, in view of Proposition 3.2 along with a version of Slutsky's lemma ([30, Lemma 1.10.2]), it remains to show that the objective G_{1,a} meets the requirements of Proposition 2.1 and Theorem 2.2.
In Lemma 6.1 below it will be shown that the mapping θ → E[G(θ, Z_1)] on Θ is continuous under (A1)-(A3). By Lemma 3.1 this implies the continuity of the mappings involved. For a nonvoid bounded subset K of R we denote by F_K the set of all constant mappings on R^d with outcomes in K. We shall use the notation N(ε, K, |•|) for the minimal number of intervals of the form [a − ε, a + ε], with ε > 0 and a ∈ K, needed to cover K. In particular, using notation (2.4), we have J(F_K, c, 1) < ∞ for every positive constant envelope c of F_K. So we have finiteness of the terms J(F_{K_1}, c_1, 1) and J(F_{K_2}, c_2, 1). Moreover, since ξ is P_Z-integrable of order 4, we obtain by Lemma 3.4 from [18] that there exists some square P_Z-integrable positive envelope ξ_{1,a} with finite J(F^Θ_{1,a}, ξ_{1,a}, 1). Now, invoking Lemma 9.14 and Theorem 9.15, both from [17], and recalling J(F_Θ, ξ, 1) < ∞, the mapping ξ + E[ξ(Z_1)] + ξ_{1,a} is a square P_Z-integrable positive envelope. By continuity of the three mappings involved, the claim follows. ✷

Remark 3.5 Theorem 3.3 offers a method to construct asymptotic confidence intervals on the optimal value of optimization problem (3.1) in the case that there is some unique solution θ*. The way is exactly the same as for the asymptotic confidence intervals on the optimal value of (1.1) according to Remark 2.5. The only difference is the choice of the involved positive estimate L, which should satisfy L ≥ Var(G_{1,a}(θ*, Z_1)). These asymptotic confidence intervals might be compared with nonasymptotic confidence intervals which are based on upper estimates for the deviation probabilities from Theorem 3.5 in [18]. By Example 2.4 both methods may be applied to objectives with representation (PH), e.g. in two stage mixed-integer programs.

First order asymptotics under divergence risk measures
Let L^p := L^p(Ω, F, P) denote the usual L^p-space on (Ω, F, P) (p ∈ [0, ∞]), where we tacitly identify random variables which differ on P-null sets only.
We want to study the risk averse stochastic program (1.4), where we shall focus on ρ being a divergence risk measure. For introduction, let us consider a lower semicontinuous convex mapping Φ with conjugate Φ*, and the associated Orlicz heart H_{Φ*}. The Orlicz heart is known to be a vector space enclosing all P-essentially bounded random variables. Moreover, by Jensen's inequality, all members of H_{Φ*} are P-integrable. For more on Orlicz hearts w.r.t. Young functions the reader may consult [9].
We can define the mapping ρ_Φ for all X ∈ H_{Φ*}, where P_Φ denotes the set of all probability measures Q which are absolutely continuous w.r.t. P and such that Φ(dQ/dP) is P-integrable. Note that (dQ/dP)·X is P-integrable for every Q ∈ P_Φ and any X ∈ H_{Φ*} due to Young's inequality. We shall call ρ_Φ the divergence risk measure w.r.t. Φ.
Theorem 4.1 The divergence risk measure ρ_Φ w.r.t. Φ satisfies the following representation.

The representation in Theorem 4.1 is also known as the optimized certainty equivalent w.r.t. Φ*. As an optimized certainty equivalent, the divergence risk measure ρ_Φ may be seen directly to be nondecreasing w.r.t. the increasing convex order. Theorem 4.1 also shows that ρ_Φ is distribution-invariant. In particular, we may define the functional R_{ρ_Φ} associated with ρ_Φ on the set F_{Φ*} of all distribution functions of the random variables from H_{Φ*}. Note that (Ω, F, P) supports some random variable U which is uniformly distributed on ]0, 1[ because (Ω, F, P) is assumed to be atomless. Then we obtain a corresponding representation for any distribution function F ∈ F_{Φ*} with left-continuous quantile function F←. For ease of reference we shall use the notation M_F to denote the set of all x ∈ R which solve the minimization in definition (4.1) of R_{ρ_Φ}(F). In view of Proposition A.1 from the Appendix, each set M_F is a nonvoid compact interval.
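The optimized certainty equivalent representation referred to here can be sketched as follows (our hedged rendering; sign conventions for the auxiliary variable vary across the literature, and we adopt the "+x" convention matching the variance term in Remark 4.9):

```latex
\rho_\Phi(X) \;=\; \inf_{x\in\mathbb{R}} \Bigl( \mathbb{E}\bigl[\Phi^{*}(X + x)\bigr] - x \Bigr),
\qquad
R_{\rho_\Phi}(F) \;=\; \inf_{x\in\mathbb{R}} \Bigl( \int_0^1 \Phi^{*}\bigl(F^{\leftarrow}(u) + x\bigr)\,du - x \Bigr),
```

so that M_F is precisely the set of minimizers of the inner one-dimensional convex problem.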
Throughout this section we focus on the following specialization (4.2) of optimization problem (1.4), where F_θ stands for the distribution function of G(θ, Z) for θ ∈ Θ. The set of minimizers of problem (4.2) will be denoted by argmin R_{ρ_Φ}. The SAA (1.5) of (4.2) reads as follows.
We shall strengthen condition (A2) to the following property.
Note that (A2') together with (A1) implies that G(θ, Z_1) belongs to H_{Φ*} for every θ ∈ Θ, so that the genuine optimization problem (4.2) is well-defined. According to Theorem 4.6 in [18], the sequence (4.4) is uniformly tight. The essential requirements are assumptions (A1), (A2') and the finiteness of J(F_Θ, ξ, 1), where ξ is from (A2'). In this section we want to derive asymptotic distributions of the sequence (4.4). Representation (4.3) along with Theorem 4.1 suggests applying Theorem 2.2 to the SAA of the optimization problem (4.5). Unfortunately, the application is not immediate because the parameter space is not totally bounded w.r.t. the Euclidean metric on R^{m+1}. We already know that the solution set of the optimization problem (4.5) is compact under (A1), (A2') (see [18, Lemma 5.8]). Conditions (A1), (A2') also imply that the associated SAA problems have nonvoid compact solution sets. Unfortunately, these may depend on the realizations of the samples. In [18] a kind of compactification was suggested which allows us to restrict the parameter set of the random process G_Φ to suitable compact subsets. The idea is to show that, with arbitrarily high probability, we may find for large sample sizes events from F on which all solution sets of the SAA problems are contained in a common compact superset. The following result from [18] gives a precise formulation of this idea. For preparation, consider any mapping ξ as in (A2') and introduce, for δ > 0 and n ∈ N, the events A^ξ_{n,δ}. Note that A^ξ_{n,δ} belongs to F, and P(A^ξ_{n,δ}) → 1 for n → ∞ due to the law of large numbers. The following result has been shown in [18] (Theorem 5.7 with Lemma 5.8).

Proposition 4.2 Let (A1), (A2') be fulfilled. Then the set of solutions of problem (4.5) is nonvoid and compact, and there always exists a solution of (4.3) for any ω.
Furthermore, with the mapping ξ from (A2'), for every δ > 0 and n ∈ N there is some compact set such that the stated inclusions hold for δ > 0, n ∈ N and ω ∈ A^ξ_{n,δ}. Based upon Proposition 4.2 it will turn out that it is already sufficient to apply Theorem 2.2 to the SAA corresponding to function classes F^Θ_{Φ,k} of the following type. The finiteness of the associated entropy terms is already guaranteed by the finiteness of the terms J(F_Θ, C_{F_Θ}, 1) for the genuine objective G. This is the subject of the following result, which has been proved in [18] (Lemma 4.3).

Lemma 4.3 Let Φ*′_+ denote the right-sided derivative of Φ*. If ξ is a square P_Z-integrable positive envelope of F_Θ, then for any k ∈ N the mapping ξ_k is a square P_Z-integrable positive envelope of F^Θ_{Φ,k}.

If G_Φ((θ, x), •) is square P_Z-integrable for (θ, x) ∈ Θ × R, then we shall endow Θ × R with the semimetric d_{Θ,Φ} defined by the corresponding L²-distances. Now the application of Theorem 2.2 to the restricted optimization problems associated with the function classes F^Θ_{Φ,k} reads as follows.
Proposition 4.4 Let (A1), (A2') be fulfilled. The Borel measurable mapping ξ from (A2') is assumed to satisfy the property that the associated mapping ξ_k is square P_Z-integrable for every k ∈ N. If in addition J(F_Θ, ξ, 1) is finite, then the following statements are true.

1) The optimal value of the restricted SAA problem is a random variable on the probability space (Ω, F, P) for arbitrary k, n ∈ N.

2) The solution sets are nonvoid and compact.
3) For k ∈ N the semimetric d_{Θ,Φ} is totally bounded on Θ × [−k, k], and there exists some centered Gaussian process with the stated covariance structure. This Gaussian process has uniformly continuous paths w.r.t. d_{Θ,Φ} and satisfies the stated bound for θ, ϑ ∈ Θ and x, y ∈ [−k, k].

Proof Note that by (A1) the objective G_Φ is measurable w.r.t. the product σ-algebra B(Θ) ⊗ B(R^d) and lower semicontinuous in the parameters (θ, x), because Φ* is continuous and nonincreasing. Next, by Lemma 4.3, the mapping ξ_k is a positive envelope of F^Θ_{Φ,k}, which is also square P_Z-integrable by assumption. Hence G_Φ((θ, x), •) is square P_Z-integrable.

Theorem 4.5 Let (A1), (A2') be fulfilled. The measurable mapping ξ from (A2') is assumed to satisfy the property that the associated mapping ξ_k is square P_Z-integrable for every k ∈ N. If in addition J(F_Θ, ξ, 1) is finite, then the following statements are valid.
3) For k ∈ N the semimetric d_{Θ,Φ} is totally bounded on Θ × [−k, k], and there exists some centered Gaussian process as in Proposition 4.4. Then statement 1) may be concluded immediately. Next, by assumption, the sequences (ξ(Z_j))_{j∈N} and (Φ*(ξ(Z_j)))_{j∈N} consist of independent integrable random variables which are identically distributed as ξ(Z_1) and Φ*(ξ(Z_1)) respectively. Then P(Ω \ A^ξ_{n,1}) → 0 by the law of large numbers, and thus the stated estimate holds for ε > 0. Note also that by the choice of k_1 the set S_R of solutions of (4.5) coincides with the set of solutions of the restricted problem. Then, in view of statement 3) of Proposition 4.4 along with (4.8) and (4.3), the limits coincide, where the involved random variables are identically distributed; then (4.9) applies. Moreover, we already know from statement 2) that the sets argmin R_{ρ_Φ} and M_{F_θ} (θ ∈ Θ) are nonvoid. We may also observe that argmin R_{ρ_Φ} = Pr(S_R) holds, where Pr denotes the standard projection from Θ × R onto Θ.

Next let us illustrate the assumptions of Theorem 4.5 by the example of the so-called Average Value at Risk, also known as Expected Shortfall.
Example 4.8 Let Φ be defined as follows. In particular H_{Φ*} coincides with L^1, and we may recognize R_{ρ_Φ} as the so-called Average Value at Risk w.r.t. α (see e.g. [11], [29]), i.e.
(see e.g. [16]). It may be verified easily that the stated identity holds, where F→ denotes the right-continuous quantile function of F ∈ F_{Φ*}.

1) Let ξ be a square P_Z-integrable positive envelope of F_Θ. Then ξ satisfies (A2'), and the associated mapping is square P_Z-integrable. In particular Theorem 4.5 carries over if (A1) is satisfied, and if J(F_Θ, ξ, 1) is finite.
2) If condition (H) is satisfied, and if G(θ, •) is square P_Z-integrable for some θ ∈ Θ, then by Example 2.3 we may find some square P_Z-integrable positive envelope ξ of F_Θ with finite J(F_Θ, ξ, 1). Hence, in view of statement 1), Theorem 4.5 may always be applied under (A1).
3) In case that G has representation (PH) and is lower semicontinuous in θ, there exists by Example 2.4 some square P_Z-integrable positive envelope ξ of F_Θ with finite J(F_Θ, ξ, 1). Furthermore, (A1) is automatically fulfilled. Therefore, by statement 1), Theorem 4.5 may be applied.
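A hedged numerical sketch of the Average Value at Risk from Example 4.8: on an empirical sample, the one-dimensional OCE/Rockafellar-Uryasev minimization and the tail-average formula agree (the level α and the data are hypothetical choices of ours; larger values are read as losses, matching the convention we infer from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # hypothetical loss sample
alpha = 0.05

# OCE / Rockafellar-Uryasev form: AVaR_alpha = min over c of c + E[(X - c)^+] / alpha;
# for the empirical distribution the minimum is attained at the (1 - alpha)-quantile
c = np.quantile(x, 1.0 - alpha)
avar_oce = c + np.maximum(x - c, 0.0).mean() / alpha

# equivalent tail-average form: mean of the worst alpha-fraction of the sample
k = round(alpha * len(x))      # number of tail observations (here 5000)
avar_tail = np.sort(x)[-k:].mean()
```

Both expressions coincide for empirical distributions whenever n·α is an integer, which makes the tail-average form a convenient cross-check on the minimization form.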
In [14], in addition, convexity in θ was imposed on the goal function G (see [14, Theorem 2]). However, the investigations there are extended to further optimization problems.

Remark 4.9 If the optimization problem (4.2) has a unique solution θ*, and if M_{F_{θ*}} has only one element x_{θ*}, then with a positive upper estimate L ≥ Var(Φ*(G(θ*, Z_1) + x_{θ*})) we may construct by Theorem 4.5 asymptotic confidence intervals on the optimal value of (4.2) in the same way as described in Remark 2.5. Alternatively, nonasymptotic confidence intervals may be built on upper estimates for the deviation probabilities from Theorem 4.6 in [18]. In view of Example 2.4 both approaches may be applied to objectives satisfying representation (PH), e.g. in two stage mixed-integer programs.
In the special case of ρ being the Average Value at Risk, it would be interesting to compare the confidence intervals corresponding to both methods with those confidence intervals obtained with stochastic mirror descent as in Subsection 5.2.2 of [14]. However, for this purpose we would have to impose further regularity conditions on the objective, at least convexity and continuity in the parameters.

The basic technical result
In Sections 3 and 4 the main convergence results, namely Theorems 3.3 and 4.5, are derived as applications of Theorem 2.2. Roughly speaking, our verification of Theorem 2.2 combines a convergence theorem for empirical processes with the functional delta method for optimal values. This section provides a suitable convergence result for empirical processes adapted to the situation of the paper.
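For orientation, the functional delta method for optimal values invoked here can be sketched in the following standard form (our hedged rendering of a well-known result, with S denoting the solution set of (1.1) and 𝔾 the Gaussian limit of the empirical processes):

```latex
\sqrt{n}\,\Bigl(\inf_{\theta\in\Theta}\frac{1}{n}\sum_{j=1}^{n} G(\theta,Z_j)
      \;-\; \inf_{\theta\in\Theta}\mathbb{E}[G(\theta,Z)]\Bigr)
\;\rightsquigarrow\; \inf_{\theta\in S}\,\mathbb{G}_{\theta}.
```

The underlying reason is that the infimum functional f ↦ inf_Θ f on l^∞(Θ) is Hadamard directionally differentiable with derivative h ↦ inf_{θ∈S} h(θ), which explains why a unique solution θ* yields asymptotic normality with variance Var(G(θ*, Z_1)).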
We shall use the following notation. For some nonvoid set T we shall denote by l^∞(T) the space of all bounded real-valued mappings on T. It will be endowed with the supremum norm ‖·‖_{T,∞} and the induced Borel σ-algebra.
Turning to the issue of asymptotic distributions of the random processes Y_n, we are faced with the inconvenience that they might not be Borel random elements of l^∞(Θ). Hence, in general, we may not apply weak convergence to the random processes Y_n. Fortunately, for our purposes it is sufficient to ask instead whether the sequence (√n Y_n)_{n∈N} of empirical processes converges in law (in the sense of Hoffmann-Jørgensen) to some tight Borel random element of l^∞(Θ). Recall that for a sequence ((Ω_n, F_n, P_n))_{n∈N} of probability spaces and a metric space (D, d_D), a sequence (W_n)_{n∈N} of mappings W_n : Ω_n → D is said to converge in law (in the sense of Hoffmann-Jørgensen) to a Borel random element W of D if E*_n[f(W_n)] → E[f(W)] holds for every bounded continuous f : D → R. Here E*_n is used to denote the outer expectation w.r.t. P_n. For an introduction to and further studies of this kind of convergence we recommend [30], where, however, it is called weak convergence. Note that the mappings W_n are not required to be Borel random elements of D. Thus convergence in law differs from the usual weak convergence of Borel random elements. We decided to emphasize this difference by avoiding the term weak convergence. Obviously both concepts coincide if the involved mappings are Borel random elements of D.
Recall the semimetric d_Θ on Θ, defined just before Theorem 2.2. Our basic technical result is the following criterion, which guarantees almost sure convergence of (‖Y_n‖_{Θ,∞})_{n∈N} and convergence in law of (√n Y_n)_{n∈N} to some tight centered Gaussian random element of l^∞(Θ).
for every ω ∈ Ω, and the sequence (√n X^I_n)_{n∈N} converges in law to some tight centered Gaussian random element of l^∞(Θ × I).

Proof of Theorem 5.1 and Theorem 2.2
Let (Y_n)_{n∈N} be the sequence of stochastic processes introduced in Section 5. In order to show the desired convergences of (‖Y_n‖_{Θ,∞})_{n∈N} and (√n Y_n)_{n∈N} in l^∞(Θ), we shall invoke results from empirical process theory. We have to circumvent some subtleties of measurability, recalling the notion of P_Z-measurable classes. A class F of Borel measurable mappings from R^d into R is called a P_Z-measurable class if for n ∈ N, a_1, ..., a_n ∈ R, and every h ∈ F the corresponding mapping is well-defined and measurable on the completion of the n-fold product probability space (R^{nd}, B(R^{nd}), (P_Z)^n) of (R^d, B(R^d), P_Z) (see [30, Definition 2.3.3]). Define for a nonvoid Γ ⊆ Θ × Θ the corresponding function class. If Γ is a Borel subset of Θ × Θ, these classes are already P_Z-measurable classes, which is the subject of the following result.
Since Γ is a Borel subset of a Polish space, the corresponding mappings are random variables on the completion of the probability space (R^{dn}, B(R^{dn}), (P_Z)^n) for n ∈ N and (a_1, ..., a_n) ∈ R^n (see [30, Example 1.7.5]).
In exactly the same way we may also verify F_Θ as a P_Z-measurable class. This completes the proof. ✷

Now we are ready to show Theorem 5.1.
Proof of Theorem 5.1: Let (Ω, F, P) = ((R^d)^N, B(R^d)^⊗N, (P_Z)^N) be the countable product probability space of (R^d, B(R^d), P_Z), and for j ∈ N consider the j-th coordinate mapping. For any real-valued mapping f on Ω, the set E_f of all mappings f̄ : Ω → R ∪ {∞} satisfying f̄ ≥ f pointwise and {f̄ > x} ∈ F for x ∈ R is nonvoid. Moreover, there exists some f* ∈ E_f such that f* ≤ f̄ P-a.s. for every f̄ ∈ E_f (see [30]). Since F_Θ is a P_Z-measurable class by Lemma 6.2, and since the positive envelope ξ is assumed to be square P_Z-integrable, we may apply Theorem 2.14.1 from [30] to find a constant M > 0 such that the corresponding maximal inequality holds. In particular, by finiteness of J(F_Θ, ξ, 1), we end up with a uniform bound. Then, in view of Corollary 3.7.9 from [12], statement 1) follows immediately.
which is Borel measurable, and thus $\Gamma_{\delta}$ is a Borel subset of $\Theta\times\Theta$. So in view of Lemma 6.2 the corresponding function class is a $P_Z$-measurable class.

For further preparation, combining Theorem 5.1 with Egorov's theorem, we may select some sequence $(\Omega_k)_{k\in\mathbb{N}}$ in $\mathcal{F}$ satisfying (6.2). The next result deals with the asymptotics of the first summand in (6.1).
Proof For $\varepsilon>0$ and $n\in\mathbb{N}$ define the set $B^{\varepsilon}_{n}$ to consist of all $\omega\in\Omega$ such that $\sup_{\theta\in\Theta}|W_n(\theta,\omega)|>\varepsilon$. We select a sequence $(\Omega_k)_{k\in\mathbb{N}}$ of events as in (6.2). By choice of these events it suffices to show $\mathbb{P}^{*}(B^{\varepsilon}_{n}\cap\Omega_k)\to 0$ for every $k\in\mathbb{N}$. So let us fix any $k\in\mathbb{N}$, set $I$ to be a suitable nonvoid compact interval, and recall the mapping $G^{I}$ and the random processes $X_n$. In view of (6.2) there is some $n_0\in\mathbb{N}$ such that $X_n(\omega,\theta)\in I$ for $\omega\in\Omega_k$, $\theta\in\Theta$, and $n\in\mathbb{N}$ with $n\geq n_0$. This implies, for $\omega\in\Omega_k$, $\theta\in\Theta$, and $n\in\mathbb{N}$ with $n\geq n_0$. Furthermore, by (6.2). Combining (6.3), (6.4) and (6.5), we end up with $\mathbb{P}^{*}(B^{\varepsilon}_{n}\cap\Omega_k)\to 0$ for $n\to\infty$, which completes the proof. ✷

Let us turn to the asymptotics of the second summand in (6.1).

B. Appendix
Let $G = (G_t)_{t\in T}$ be some centered Gaussian process on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ with intrinsic semimetric $d_T$ on $T$, defined by $d_T(s,t) = \sqrt{\operatorname{Var}(G_s - G_t)}$. We assume

(A1) The set $\Theta$ is a compact subset of $\mathbb{R}^{m}$. The mapping $G$ is measurable w.r.t. the product $\mathcal{B}(\Theta)\otimes\mathcal{B}(\mathbb{R}^{d})$ of the Borel $\sigma$-algebra $\mathcal{B}(\Theta)$ on $\Theta$ and the Borel $\sigma$-algebra on $\mathbb{R}^{d}$, and $G(\cdot,z)$ is lower semicontinuous for every $z\in\mathbb{R}^{d}$.
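For intuition about the intrinsic semimetric introduced in Appendix B, it can be estimated from simulated paths. The sketch below is an illustration under our own choice of process, not part of the text: standard Brownian motion on $T=[0,1]$, for which $d_T(s,t)=\sqrt{\operatorname{Var}(B_s-B_t)}=\sqrt{|t-s|}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate paths of standard Brownian motion on a grid of T = [0, 1].
n_paths, n_grid = 100_000, 101
t = np.linspace(0.0, 1.0, n_grid)
increments = rng.normal(scale=np.sqrt(np.diff(t)), size=(n_paths, n_grid - 1))
paths = np.hstack([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)])

def intrinsic_semimetric(paths, i, j):
    """Estimate d_T(t_i, t_j) = sqrt(Var(G_{t_i} - G_{t_j})) from simulated paths."""
    return float(np.sqrt(np.var(paths[:, i] - paths[:, j])))

# For Brownian motion, d_T(s, t) = sqrt(|t - s|); here s = 0.25, t = 0.75.
d_est = intrinsic_semimetric(paths, 25, 75)
print(d_est)
```

The estimate should be close to $\sqrt{0.5}\approx 0.707$, and the same Monte Carlo estimator applies to any centered Gaussian process one can simulate on a grid.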
the entire statement of Proposition 4.4 follows immediately from Proposition 2.1 and Theorem 2.2, along with Lemma 4.3. ✷

Combining Proposition 4.4 with Proposition 4.2 we may derive our main result concerning the first order asymptotics of the SAA (4.3).

Combining (4.10) and (4.11) with the above mentioned properties of $G_{\Phi}$, the first part of statement 3) follows immediately. The remaining part is an obvious consequence of the first one. The proof is complete. ✷

Remark 4.6 If $\Phi(0) = 0$, and if $\Phi^{*}$ is strictly convex on $]0,\infty[$, then $M_{F_{\theta}}$ is a singleton for every $\theta\in\Theta$ due to Proposition A.2 in the Appendix below. In this situation we may obtain asymptotic normality in statement 3) of Theorem 4.5 if the genuine optimization problem (4.2) has a unique solution $\theta^{*}$.

Remark 4.7 In case that $G$ either satisfies condition (H) or has representation (PH), Example 2.3 or Example 2.4, respectively, shows how to find proper positive envelopes $\xi$ of $\mathcal{F}_{\Theta}$ meeting all requirements of Theorem 4.5.
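Purely as a numerical illustration of the SAA of a divergence risk measure, the following Python sketch treats the best-known special case, the Average Value at Risk at level $\alpha$. The identification via the conjugate $\Phi^{*}(y)=y^{+}/\alpha$ and the Rockafellar–Uryasev representation $\mathrm{AVaR}_{\alpha}(Y)=\inf_{x}\{x+\mathbb{E}[(Y-x)^{+}]/\alpha\}$ are standard facts; the concrete Gaussian example is our assumption, not taken from the text.

```python
import numpy as np
from math import exp, pi, sqrt

rng = np.random.default_rng(1)

def avar_saa(sample, alpha):
    """SAA of the Average Value at Risk via the Rockafellar-Uryasev
    representation AVaR_alpha(Y) = inf_x { x + E[(Y - x)_+] / alpha },
    whose infimum is attained at any (1 - alpha)-quantile of Y."""
    x = np.quantile(sample, 1.0 - alpha)
    return x + np.mean(np.maximum(sample - x, 0.0)) / alpha

alpha = 0.05
sample = rng.normal(size=1_000_000)          # losses Y ~ N(0, 1)

# Closed form for the standard normal: AVaR_alpha = phi(q) / alpha,
# with q the (1 - alpha)-quantile (q ~ 1.6449 for alpha = 0.05).
q = 1.6449
closed_form = exp(-q * q / 2.0) / (sqrt(2.0 * pi) * alpha)
print(avar_saa(sample, alpha), closed_form)
```

With a million samples the SAA value lies close to the closed-form value, consistent with the $\sqrt{n}$-rate of convergence discussed in the text.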

1) $\lim_{n\to\infty}\|Y_n\|_{\Theta,\infty} = 0$ $\mathbb{P}$-a.s.

2) $d_{\Theta}$ is totally bounded, and there exists some tight random element $\mathbb{G}$ of $\ell^{\infty}(\Theta)$ such that the sequence $(\sqrt{n}\,Y_n)_{n\in\mathbb{N}}$ converges in law to $\mathbb{G}$. This tight random element is a centered Gaussian process $\mathbb{G} = (\mathbb{G}_{\theta})_{\theta\in\Theta}$ which has uniformly continuous paths w.r.t. $d_{\Theta}$, satisfying in addition $\mathbb{E}[\mathbb{G}_{\theta}\,\mathbb{G}_{\vartheta}] = \operatorname{Cov}\big(G(\theta,Z_1),G(\vartheta,Z_1)\big)$ for $\theta,\vartheta\in\Theta$.

Theorem 5.1 has the following corollary, which will turn out to be useful in the context of the SAA method under absolute semideviation. Define for any nonvoid compact interval $I\subseteq\mathbb{R}$ the real valued mapping $G^{I}$ on $(\Theta\times I)\times\mathbb{R}^{d}$ via $G^{I}((\theta,t),z) := (G(\theta,z)-t)^{+}$.

Corollary 5.2 Then under the assumptions of Theorem 5.1 the mappings
in addition the paths of the Gaussian process are uniformly continuous on the set $\Theta\times[-k,k]$ w.r.t. $d_{\Theta,\Phi}$ for $k\in\mathbb{N}$. Moreover, if the optimization problem (4.2) has a unique solution $\theta^{*}$, and if $M_{F_{\theta^{*}}}$ has one element $x_{\theta^{*}}$ only, then the weak limit is some centered normally distributed random variable with variance $\operatorname{Var}\big(\Phi^{*}\big(G(\theta^{*},Z_1)+x_{\theta^{*}}\big)\big)$.
Proposition 6.4 Let (A1)–(A3) be fulfilled, where the mapping $\xi$ from (A2) is square $P_Z$-integrable. Furthermore let $F_{\theta}$ be continuous at $\mathbb{E}[G(\theta,Z_1)]$ for every $\theta\in\Theta$. Using notation (2.4), if $J(\mathcal{F}_{\Theta},\xi,1)$ is finite, then
$$\lim_{n\to\infty}\mathbb{P}^{*}\Big(\sup_{\theta\in\Theta}|U_n(\theta,\omega)|>\varepsilon\Big)=0\quad\text{for }\varepsilon>0,$$
where $\mathbb{P}^{*}$ stands for the outer probability of $\mathbb{P}$.

Proof Let us recall the sequence $(Y_n)_{n\in\mathbb{N}}$ introduced at the beginning of Section 5, and note that $\|Y_n\|_{\Theta,\infty}$ is a random variable on $(\Omega,\mathcal{F},\mathbb{P})$. We may apply Theorem 5.1 to conclude that the sequence $(\sqrt{n}\,Y_n)_{n\in\mathbb{N}}$ converges in law to some tight random element of $\ell^{\infty}(\Theta)$. Since the norm $\|\cdot\|_{\Theta,\infty}$ is a continuous function on $\ell^{\infty}(\Theta)$, the application of the continuous mapping theorem for convergence in law (see [30, Theorem 1.11.1]) yields that $(\sqrt{n}\,\|Y_n\|_{\Theta,\infty})_{n\in\mathbb{N}}$ converges weakly to some random variable. In particular, by Prokhorov's theorem, this sequence of random variables is uniformly tight. Hence we may find some strictly increasing sequence $(a_k)_{k\in\mathbb{N}}$ of positive real numbers such that the inequality