1 Introduction

Consider a classical risk neutral stochastic program

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathbb {E}}\big [G(\theta ,Z)\big ], \end{aligned}$$
(1.1)

where \(\Theta \) denotes a compact subset of \({\mathbb {R}}^{m}\), whereas Z stands for a d-dimensional random vector with distribution \({\mathbb {P}}^{Z}\). In general the distribution of the goal function \(G(\theta ,Z)\) is unknown, but some information is available through i.i.d. samples. Using this information, a general device to solve problem (1.1) approximately is provided by the so-called Sample Average Approximation (SAA) (see [29]). For explanation, let us consider a sequence \((Z_{j})_{j\in {\mathbb {N}}}\) of independent d-dimensional random vectors on some fixed complete atomless probability space \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) which are identically distributed as the d-dimensional random vector Z. Let us set

$$\begin{aligned} {\hat{F}}_{n,\theta }(t):= \frac{1}{n}~\sum _{j=1}^{n}\mathbbm {1}_{]-\infty ,t]}\big (G(\theta ,Z_{j})\big ) \end{aligned}$$

to define the empirical distribution function \({\hat{F}}_{n,\theta }\) of \(G(\theta ,Z)\) based on the i.i.d. sample \((Z_{1},\ldots ,Z_{n})\). Then the SAA method approximates the genuine optimization problem (1.1) by the following one

$$\begin{aligned} \inf _{\theta \in \Theta }\int _{{\mathbb {R}}}t ~d{\hat{F}}_{n,\theta }(t) = \inf _{\theta \in \Theta }\frac{1}{n}~\sum _{j=1}^{n}G(\theta ,Z_{j})\quad (n\in {\mathbb {N}}). \end{aligned}$$
(1.2)
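To make the SAA recipe (1.2) concrete, here is a minimal numerical sketch. The quadratic goal function \(G(\theta ,z) = (\theta - z)^2\), the standard normal Z, and the discretized parameter set \(\Theta = [-1,1]\) are our own illustrative assumptions, not taken from the text; the true optimal value is \(\inf _{\theta }(\theta ^{2} + 1) = 1\), attained at \(\theta ^{*} = 0\).

```python
import random

random.seed(42)

def G(theta, z):
    # Hypothetical goal function: G(theta, z) = (theta - z)^2, so that
    # E[G(theta, Z)] = theta^2 + 1 for Z ~ N(0, 1); the true optimal
    # value of (1.1) is 1, attained at theta* = 0.
    return (theta - z) ** 2

n = 4_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

# Theta = [-1, 1], discretized for the illustration.
theta_grid = [k / 100 for k in range(-100, 101)]

# SAA optimal value (1.2): minimize the sample average over the grid.
saa_value = min(
    sum(G(theta, z) for z in samples) / n for theta in theta_grid
)
print(saa_value)  # close to the true optimal value 1
```

The grid minimization stands in for the abstract infimum over the compact set \(\Theta \); for this sample size the SAA value deviates from the true optimum by an error of order \(n^{-1/2}\), in line with (1.3).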

The optimal values depend on the sample size and on the realization of the samples of Z. Their asymptotic behaviour with increasing sample size, also known as the first order asymptotics of (1.1), is well understood. More precisely, the sequence of optimal values of the approximated optimization problems converges \({\mathbb {P}}\)-a.s. to the optimal value of the genuine stochastic program. Moreover, if G is Lipschitz continuous in \(\theta \), and if (1.1) has a unique solution, then the stochastic sequence

$$\begin{aligned} \left( \sqrt{n}\Big [\inf _{\theta \in \Theta }\int _{{\mathbb {R}}}t ~d{\hat{F}}_{n,\theta }(t) - \inf _{\theta \in \Theta }{\mathbb {E}}\big [G(\theta ,Z)\big ]\Big ]\right) _{n\in {\mathbb {N}}} \end{aligned}$$
(1.3)

is asymptotically normally distributed. In [10] asymptotic distributions of this stochastic sequence have also been found for stochastic mixed-integer programs, where typically the objectives are not continuous in the parameter. For these results, and for more on the asymptotics of the SAA method, the reader may consult the monograph [29], and in addition the contributions [23] and [25].

In several fields like finance, insurance or microeconomics, the assumption of risk neutral decision makers is considered too idealistic. There it is preferred instead to study the behaviour of actors with a more cautious attitude, known as risk aversion. In this view the optimization problem (1.1) should be replaced with a risk averse stochastic program, i.e. an optimization problem

$$\begin{aligned} \inf _{\theta \in \Theta }\rho \big (G(\theta ,Z)\big ), \end{aligned}$$
(1.4)

where \(\rho \) stands for a functional which is nondecreasing w.r.t. the increasing convex order. A general class of functionals fulfilling this requirement is formed by the so-called distribution-invariant convex risk measures (see e.g. [11, 29]). They play an important role as building blocks in quantitative risk management (see [21, 24, 27]), and they have been suggested as a systematic approach for the calculation of insurance premia (cf. [15]). Distribution-invariance denotes the property that the functional \(\rho \) yields the same outcome for random variables with identical distribution. Hence, a distribution-invariant convex risk measure \(\rho \) may be associated with a functional \({\mathcal {R}}_{\rho }\) on sets of distribution functions (see e.g. [19, Sect. 4.2], and also [5, (2.4)]). In this case (1.4) reads as follows

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }(F_{\theta }), \end{aligned}$$

where \(F_{\theta }\) is the distribution function of \(G(\theta ,Z)\). Then we may modify the SAA method by

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }({\hat{F}}_{n,\theta })\quad (n\in {\mathbb {N}}). \end{aligned}$$
(1.5)

As the title says, the subject of the paper is the first order asymptotics of the SAA method for (1.4) with \(\rho \) being a distribution-invariant convex risk measure. It is already known that under rather general conditions on the mapping G we have

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }\big ({\hat{F}}_{n,\theta }\big )\rightarrow \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }\big (F_{\theta }\big )\quad {\mathbb {P}}-\hbox {a.s.} \end{aligned}$$

(see [28]). In [18] nonasymptotic upper estimates of

$$\begin{aligned} {\mathbb {P}}\Big (\Big \{\big |\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }\big ({\hat{F}}_{n,\theta }\big ) - \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }\big (F_{\theta }\big )\big |\ge \varepsilon \Big \}\Big )\quad (n\in {\mathbb {N}},\varepsilon > 0) \end{aligned}$$

are derived, depending on the sample size n. Besides the risk neutral case, risk averse stochastic programs in terms of upper semideviations and divergence risk measures were also considered there. As a by-product, uniform tightness of the stochastic sequence

$$\begin{aligned} \Big (\sqrt{n}\big [\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }\big ({\hat{F}}_{n,\theta }\big ) - \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho }\big (F_{\theta }\big )\big ]\Big )_{n\in {\mathbb {N}}} \end{aligned}$$

is obtained in all cases. In this paper we continue this work, focussing on the asymptotic distributions of the stochastic sequence. To the best of our knowledge, this issue has been studied only in [14] and [7]. In both contributions, G is assumed to be Lipschitz continuous in \(\theta \), and a subclass of distribution-invariant convex risk measures of a specific form is considered. We shall extend the investigations by allowing for more general goal functions G, which makes it possible to apply the results to stochastic programs whose goal functions are not continuous in the parameter. The value functions of two stage mixed-integer stochastic programs are prominent examples of goal functions of this type. Concerning the choice of distribution-invariant convex risk measures we shall restrict ourselves to absolute semideviations and divergence risk measures. These classes have only a small intersection with the class of distribution-invariant convex risk measures in [14], and none with the class in [7].

The paper is organized as follows. We shall start with a general central limit theorem type result for the optimal values of classical risk neutral stochastic programs. The point is that we may extend this result if the SAA method is applied to risk averse stochastic programs. In Sect. 3 this will be demonstrated in the case that stochastic programs are expressed in terms of absolute semideviations, whereas in Sect. 4 the application to stochastic programs under divergence risk measures is considered. Our main asymptotic results are based on a technical result concerning the convergence of the sequence \(\big (\sum _{j=1}^{n}G(\cdot ,Z_{j})/n\big )_{n\in {\mathbb {N}}}\) in the path space. It will be formulated in Sect. 5. Finally, Sect. 6 gathers the proofs of results from the previous sections.

The essential new ingredient of our results is to replace analytic conditions on the paths \(G(\cdot ,z)\) with requirements which intuitively make the family \(\{G(\theta ,Z)\mid \theta \in \Theta \}\) of random variables small in a certain sense. Fortunately, the invoked conditions are satisfied if the paths \(G(\cdot ,z)\) are Hölder continuous. We shall also see that we may utilize our results to study the SAA method for stochastic programs where the paths \(G(\cdot ,z)\) are piecewise Hölder continuous but not necessarily continuous or convex. Value functions of two stage mixed-integer programs are typical examples of goal functions of this kind.

2 First order asymptotics in the risk neutral case

In this section we study the SAA (1.2) associated with the risk neutral stochastic program (1.1). We shall restrict ourselves to mappings G which satisfy the following properties.

  1. (A1)

    The set \(\Theta \) is a compact subset of \({\mathbb {R}}^{m}\). The mapping G is measurable w.r.t. the product \({\mathcal {B}}(\Theta )\otimes {\mathcal {B}}({\mathbb {R}}^{d})\) of the Borel \(\sigma \)-algebra \({\mathcal {B}}(\Theta )\) on \(\Theta \) and the Borel \(\sigma \)-algebra on \({\mathbb {R}}^{d}\), and \(G(\cdot ,z)\) is lower semicontinuous for every \(z\in {\mathbb {R}}^{d}\).

  2. (A2)

    There is some strictly positive \({\mathbb {P}}^{Z}\)-integrable mapping \(\xi :{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) such that

    $$\begin{aligned} \sup \limits _{\theta \in \Theta }|G(\theta ,z)|\le \xi (z)\quad \hbox {for}~z\in {\mathbb {R}}^{d}. \end{aligned}$$

Note that under these assumptions the optimization problems (1.1) and (1.2) are well defined with finite optimal values.

Proposition 2.1

Under (A1), (A2) the following statements are valid.

  1. (1)

    \(\inf \limits _{\theta \in \Theta }\frac{1}{n}\sum \limits _{j=1}^{n}G(\theta ,Z_{j}) - \inf \limits _{\theta \in \Theta }{\mathbb {E}}[G(\theta ,Z_{1})]\) is a random variable on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) for every \(n\in {\mathbb {N}}\).

  2. (2)

    There exists a solution of (1.1), and the set of solutions is compact.

The proof may be found in Sect. 6.1.

In order to develop nonasymptotic confidence intervals on the optimal value of (1.1) the authors in [13] provide specific separate upper estimates for the deviation probabilities

$$\begin{aligned}&{\mathbb {P}}\Big (\Big \{\inf \limits _{\theta \in \Theta }~\frac{1}{n}~\sum _{j=1}^{n}G(\theta ,Z_{j}) - \inf \limits _{\theta \in \Theta }{\mathbb {E}}\big [G(\theta ,Z_{1})\big ]\le -\varepsilon /\sqrt{n}\Big \}\Big )\quad (n\in {\mathbb {N}},\varepsilon > 0), \end{aligned}$$
(2.1)
$$\begin{aligned}&{\mathbb {P}}\Big (\Big \{\inf \limits _{\theta \in \Theta }~\frac{1}{n}~\sum _{j=1}^{n}G(\theta ,Z_{j}) - \inf \limits _{\theta \in \Theta }{\mathbb {E}}\big [G(\theta ,Z_{1})\big ]\ge \varepsilon /\sqrt{n}\Big \}\Big )\quad (n\in {\mathbb {N}},\varepsilon > 0). \end{aligned}$$
(2.2)

They assume G to be convex in \(\theta \) such that the goal function of (1.1) is differentiable, and they also impose conditions on the tail behaviour of the random variables \(G(\theta ,Z)\). Avoiding such regularity conditions, upper estimates have been derived in [18] for the deviation probabilities

$$\begin{aligned} {\mathbb {P}}\Big (\Big \{\big |\inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}G(\theta ,Z_{j}) - \inf _{\theta \in \Theta }{\mathbb {E}}[G(\theta ,Z)]\big |\ge \varepsilon \Big \}\Big )\quad (n\in {\mathbb {N}},\varepsilon > 0). \end{aligned}$$
(2.3)

There, no further restrictions beyond property (A2) are imposed on the tail behaviour of the random variables \(G(\theta ,Z)\). Moreover, the analytical requirements on the paths of G are replaced with a specific condition on the function class \(\{G(\theta ,\cdot )\mid \theta \in \Theta \}\) which we shall explain in more detail later in this section.

The results in [18] already imply

$$\begin{aligned} n^{a}\Big [\inf _{\theta \in \Theta }\int _{{\mathbb {R}}}t ~d{\hat{F}}_{n,\theta }(t) - \inf _{\theta \in \Theta }{\mathbb {E}}\big [G(\theta ,Z)\big ]\Big ]\underset{n\rightarrow \infty }{\rightarrow } 0\quad \hbox {in probability}\quad \hbox {for}~a\in [0,1/2[ \end{aligned}$$

(see [18, Theorem 2.2]). Moreover, the sequence (1.3) is uniformly tight, i.e. relatively compact w.r.t. the topology of weak convergence (see [18, Theorem 2.5]).

Throughout this section we want to complement the results from [18] by a criterion which ensures weak convergence of the sequence (1.3), and to find its asymptotic distributions. In contrast to the literature on the first order asymptotics of the SAA method we do not want to impose analytical properties on the objective G such as continuity or convexity in the parameter \(\theta \). Instead, as in [18], we suggest a condition which makes the function class \({\mathbb {F}}^{\Theta }:= \{G(\theta ,\cdot )\mid \theta \in \Theta \}\) small in some sense. Convenient ways to express this idea are provided by general devices from empirical process theory which are based on covering numbers for classes of Borel measurable mappings from \({\mathbb {R}}^{d}\) into \({\mathbb {R}}\) w.r.t. \(L^{p}\)-norms. To recall these concepts adapted to our situation, let us fix any nonvoid set \({\mathbb {F}}\) of Borel measurable mappings from \({\mathbb {R}}^{d}\) into \({\mathbb {R}}\) and any probability measure \({\mathbb {Q}}\) on \({\mathcal {B}}({\mathbb {R}}^{d})\), with metric \(d_{{\mathbb {Q}},p}\) induced by the \(L^{p}\)-norm \(\Vert \cdot \Vert _{{\mathbb {Q}},p}\) for \(p\in [1,\infty [\).

  • Covering numbers for \({\mathbb {F}}\): We use \(N\big (\eta ,{\mathbb {F}},L^{p}({\mathbb {Q}})\big )\) to denote the minimal number of closed \(d_{{\mathbb {Q}},p}\)-balls of radius \(\eta > 0\) with centers in \({\mathbb {F}}\) needed to cover \({\mathbb {F}}\). We define \(N\big (\eta ,{\mathbb {F}},L^{p}({\mathbb {Q}})\big ):= \infty \) if no finite cover is available.

  • An envelope of \({\mathbb {F}}\) is a Borel measurable mapping \(C_{{\mathbb {F}}}:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) satisfying \(\sup _{h\in {\mathbb {F}}}|h|\le C_{{\mathbb {F}}}\). If an envelope \(C_{{\mathbb {F}}}\) takes strictly positive values, we shall speak of a positive envelope.

  • \({\mathcal {M}}_{\text {\tiny fin}}\) denotes the set of all probability measures on \({\mathcal {B}}({\mathbb {R}}^{d})\) with finite support.

Usually, upper estimates of covering numbers are used instead of exact calculations.

As an abbreviation, let us introduce, for a class \({\mathbb {F}}\) of Borel measurable functions from \({\mathbb {R}}^{d}\) into \({\mathbb {R}}\) with an arbitrary positive envelope \(C_{{\mathbb {F}}}\), the following notation

$$\begin{aligned}&{\overline{J}}({\mathbb {F}},C_{{\mathbb {F}}},\delta ) := \int _{0}^{\delta }\sup _{{\mathbb {Q}}\in {\mathcal {M}}_{\text {\tiny fin}}}\sqrt{\ln \big ( N\big (\varepsilon ~\Vert C_{{\mathbb {F}}}\Vert _{{\mathbb {Q}},2},{\mathbb {F}},L^{2}({\mathbb {Q}})\big )\big )}~d\varepsilon . \end{aligned}$$
(2.4)

As we shall see, the finiteness of some integral \({\overline{J}}({\mathbb {F}}^{\Theta },C_{{\mathbb {F}}^{\Theta }},1)\) is already sufficient for our purposes. Henceforth we shall restrict considerations to “small” classes \({\mathbb {F}}^{\Theta }\) in the sense that \({\overline{J}}({\mathbb {F}}^{\Theta },C_{{\mathbb {F}}^{\Theta }},1)\) is finite for some positive square \({\mathbb {P}}^{Z}\)-integrable envelope \(C_{{\mathbb {F}}^{\Theta }}\) of \({\mathbb {F}}^{\Theta }\).

In the following result concerning the asymptotic distribution of the sequence (1.3) we shall use the symbol \({\text {argmin}}_{\text {\tiny RN}}\) to denote the set of minimizers of the problem (1.1) which is nonvoid and compact by Proposition 2.1. Furthermore, we shall endow \(\Theta \) with the alternative semimetric \({\overline{d}}_{\Theta }\) defined by \({\overline{d}}_{\Theta }(\theta ,\vartheta ):= \sqrt{{\mathbb {V}}\text {ar}\big (G(\theta ,Z_{1}) - G(\vartheta ,Z_{1})\big )}\).

Theorem 2.2

Let (A1), (A2) be fulfilled, where the mapping \(\xi \) from (A2) is square \({\mathbb {P}}^{Z}\)-integrable. Using notation (2.4), if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then \({\overline{d}}_{\Theta }\) is totally bounded and there exists some centered Gaussian process \({\mathfrak {G}} = ({\mathfrak {G}}_{\theta })_{\theta \in \Theta }\) such that the sequence (1.3) converges weakly to \( \sup \limits _{\theta \in {\text {argmin}}_{\text {\tiny RN}}}{\mathfrak {G}}_{\theta }. \) This Gaussian process has uniformly continuous paths w.r.t. \({\overline{d}}_{\Theta }\) and satisfies

$$\begin{aligned} {\mathbb {E}}\big [{\mathfrak {G}}_{\theta }\cdot {\mathfrak {G}}_{\vartheta }\big ] = {\mathbb {C}}\text {ov}\big (G(\theta ,Z_{1}), G(\vartheta ,Z_{1})\big )\quad \hbox {for}~\theta , \vartheta \in \Theta . \end{aligned}$$

In particular, if the optimization problem (1.1) has a unique solution \(\theta ^{*}\), under the given assumptions the sequence (1.3) converges weakly to some centered normally distributed random variable with variance \({\mathbb {V}}\text {ar}\big (G(\theta ^{*},Z_{1})\big )\).

The proof is delegated to Sect. 6.2.

The result on the asymptotic distribution crucially requires \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) to be finite. This property is always satisfied if the involved covering numbers have polynomial rates. Indeed, this relies on the observation that, using the change of variables formula several times along with integration by parts, we obtain

$$\begin{aligned} \int _{0}^{1}\sqrt{v\ln (K/\varepsilon )}~d\varepsilon \le 2\sqrt{v \ln (K)}\quad \hbox {for}~v\ge 1, K\ge e. \end{aligned}$$
(2.5)

Inequality (2.5) may be applied if there exist \(K\ge e, v\ge 1\) such that the following condition is satisfied

$$\begin{aligned} N\big (\varepsilon ~\Vert C_{{\mathbb {F}}^{\Theta }}\Vert _{{\mathbb {Q}},2},{\mathbb {F}}^{\Theta },L^{2}({\mathbb {Q}})\big )\le (K/\varepsilon )^{v}\quad \hbox {for}~{\mathbb {Q}}\in {\mathcal {M}}_{\text {\tiny fin}}\quad \hbox {and}~\varepsilon \in ]0,1[. \end{aligned}$$
(2.6)

Prominent examples satisfying (2.6) are provided by so-called VC-subgraph classes (see e.g. [30]).
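As a quick numerical sanity check of inequality (2.5), the sketch below approximates the left-hand side with a midpoint rule; the assumption that this quadrature is accurate enough near the integrable logarithmic singularity at 0 is ours.

```python
import math

def entropy_integral(v, K, steps=200_000):
    # Midpoint-rule approximation of the left-hand side of (2.5):
    # integral over (0, 1) of sqrt(v * ln(K / eps)) d(eps).
    h = 1.0 / steps
    return sum(
        math.sqrt(v * math.log(K / ((i + 0.5) * h))) * h
        for i in range(steps)
    )

v, K = 1.0, math.e
lhs = entropy_integral(v, K)             # about 1.379 for v = 1, K = e
rhs = 2.0 * math.sqrt(v * math.log(K))   # the bound of (2.5), here 2.0
print(lhs, rhs)
```

For \(v = 1\), \(K = e\) the integral evaluates to \(e\,\Gamma (3/2,1)\approx 1.379\), comfortably below the bound 2, so the estimate is far from tight but convenient.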

In [18] explicit upper estimates of the terms \({\overline{J}}({\mathbb {F}}^{\Theta }, \xi ,\delta )\) have been derived for objectives G satisfying specific analytical properties. The line of reasoning there is to show property (2.6) and then to utilize (2.5) (see Propositions 2.6, 2.8 and their proofs in [18]). Let us recall these specializations.

Denoting the Euclidean metric on \({\mathbb {R}}^{m}\) by \(d_{m,2}\), the first one is built on the following condition.

  1. (H)

    There exist some \(\beta \in ]0,1]\) and a square \({\mathbb {P}}^{Z}\)-integrable strictly positive mapping \(C:{\mathbb {R}}^{d}\rightarrow ]0,\infty [\) such that

    $$\begin{aligned} \big |G(\theta ,z) - G(\vartheta ,z)\big |\le C(z)~d_{m,2}(\theta ,\vartheta )^{\beta }\quad \hbox {for}~z\in {\mathbb {R}}^{d}, \theta , \vartheta \in \Theta . \end{aligned}$$

Property (H) simplifies Theorem 2.2.

Example 2.3

Let \(\Theta \) be compact, let condition (H) be fulfilled with \(\beta \in ]0,1]\), and let \(G(\theta ,\cdot )\) be Borel measurable for \(\theta \in \Theta \). Furthermore, \(G({\overline{\theta }},\cdot )\) is assumed to be square \({\mathbb {P}}^{Z}\)-integrable for some \({\overline{\theta }}\in \Theta \).

First of all, (A1) is satisfied (see e.g. [22, Lemma 6.7.3]). Secondly, we are in a position to invoke Proposition 2.6 from [18]. Hence, with \(\Delta (\Theta )\) denoting the diameter of \(\Theta \) w.r.t. \(d_{m,2}\), the mapping \(\xi := C\cdot \Delta (\Theta )^{\beta } + |G({\overline{\theta }},\cdot )|\) is square \({\mathbb {P}}^{Z}\)-integrable and satisfies property (A2), and \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite with certain explicit upper estimates. In particular, Theorem 2.2 may be applied directly, and all the statements there carry over immediately. This extends the classical result on the asymptotic distributions of the sequence (1.3), where condition (H) with \(\beta = 1\) is imposed (see [29]).
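As an illustration of condition (H), consider the toy goal function \(G(\theta ,z) = \sqrt{|\theta - z|}\) (our own example, not from the text): it is Hölder of order \(\beta = 1/2\) with constant mapping \(C \equiv 1\), since \(|\sqrt{a} - \sqrt{b}|\le \sqrt{|a-b|}\). The sketch spot-checks the inequality numerically.

```python
import math
import random

random.seed(0)

def G(theta, z):
    # Toy goal function, Hoelder continuous in theta of order 1/2
    # with envelope constant C(z) = 1 for every z.
    return math.sqrt(abs(theta - z))

beta = 0.5
# Spot-check |G(theta, z) - G(vartheta, z)| <= |theta - vartheta|^beta
# over random parameter pairs and sample points.
violation = 0.0
for _ in range(10_000):
    th = random.uniform(-1.0, 1.0)
    vt = random.uniform(-1.0, 1.0)
    z = random.gauss(0.0, 1.0)
    lhs = abs(G(th, z) - G(vt, z))
    rhs = abs(th - vt) ** beta
    violation = max(violation, lhs - rhs)
print(violation)  # never positive (up to rounding)
```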

In the second special case from [18], objectives G with the following structure of piecewise Hölder continuity have been considered.

  1. (PH)

    \( G(\theta ,z) = \sum \limits _{i = 1}^{r}{\mathbbm {1}_{\bigcap _{l=1}^{s_{i}}\{{\Lambda _{il}(\theta , \cdot )} + a^{i}_{l}\in I_{il}\}}}(z)\cdot {G^{i}(\theta ,z)}, \) where

    • \(r, s_{1},\dots ,s_{r}\in {\mathbb {N}}\),

    • \(G^{i}\) satisfies (A1), and (H) with \(\beta _{i}\in ]0,1]\) as well as strictly positive square \({\mathbb {P}}^{Z}\)-integrable \(C_{i}:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) for \(i\in \{1,\ldots ,r\}\),

    • \(\Lambda _{il}:{\mathbb {R}}^{m}\times {\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) Borel measurable with \(\Lambda _{il}(\cdot ,z)\) affine linear for \(z\in {\mathbb {R}}^{d}\) (\(i\in \{1,\ldots ,r\}\), \(l\in \{1,\ldots ,s_i\}\)) ,

    • \(a^{i}_{l}\in {\mathbb {R}}\) for \(i\in \{1,\dots ,r\}, l\in \{1,\dots ,s_{i}\}\),

    • \(I_{il} = ]0,\infty [\) or \(I_{il} = [0,\infty [\) for \(i\in \{1,\dots ,r\}\) and \(l\in \{1,\dots ,s_{i}\}\),

    • The set

      $$\begin{aligned} \left\{ \bigcap \limits _{l=1}^{s_{i}}\big \{{\Lambda _{il}(\theta , \cdot )} + a^{i}_{l}\in I_{il}\big \}\mid i\in \{1,\ldots ,r\}\right\} \end{aligned}$$

      is a partition of \({\mathbb {R}}^{d}\).

Note that a mapping G satisfying condition (PH) is not, in general, continuous or convex in \(\theta \).

In two stage mixed-integer programs the goal functions typically may be represented as in (PH) if the random vector Z has compact support (see [10, p. 121] together with [18]). Within this special situation with compact \(\Theta \), the authors in [10] derive the same asymptotic distributions for the sequence (1.3) as in Theorem 2.2. Their line of reasoning is based upon the corresponding representation (PH) of G, and it also relies on finiteness of the integrals \({\overline{J}}({\mathbb {F}}^{\Theta }, C_{{\mathbb {F}}^{\Theta }},\delta )\). We may extend their result to general objectives G having representation (PH), because under this condition the application of Theorem 2.2 is quite immediate.
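A minimal sketch of a goal function with representation (PH) may look as follows; the toy instance with \(r = 2\), \(s_{1} = s_{2} = 1\), \(m = d = 1\) is our own. The Hölder pieces are \(G^{1}(\theta ,z) = \theta ^{2}\) and \(G^{2}(\theta ,z) = \theta ^{2} + 1\), glued along the affine function \(\Lambda (\theta ,z) = \theta - z\), so for fixed z the path \(G(\cdot ,z)\) jumps at \(\theta = z\) yet remains lower semicontinuous.

```python
def G(theta, z):
    # Piecewise Hoelder goal function of type (PH):
    # piece G1 on {theta - z in [0, inf)},
    # piece G2 = G1 + 1 on {z - theta in (0, inf)}.
    if theta - z >= 0:
        return theta ** 2          # piece G1
    return theta ** 2 + 1.0        # piece G2

# For fixed z = 0.5 the path G(., z) jumps in theta at theta = 0.5:
print(G(1.0, 0.5))   # 1.0    (G1 piece)
print(G(0.0, 0.5))   # 1.0    (G2 piece: 0 + 1)
print(G(0.49, 0.5))  # 1.2401 (G2 piece, just left of the jump)
```

At the jump point \(\theta = z\) the function takes the smaller of the two one-sided limits, which is exactly the lower semicontinuity required in (A1).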

Example 2.4

Let \(\Theta \) be compact, and let G be lower semicontinuous in \(\theta \) having representation (PH) with mappings \(G^{i}, C_{i}\) and \(\beta _{i}\in ]0,1]\) for \(i\in \{1,\ldots ,r\}\). Moreover, \(G^{1}({\overline{\theta }},\cdot ),\ldots ,G^{r}({\overline{\theta }},\cdot )\) are supposed to be square \({\mathbb {P}}^{Z}\)-integrable for some \({\overline{\theta }}\in \Theta \). The notation \(\Delta (\Theta )\) stands for the diameter of \(\Theta \) w.r.t. the Euclidean metric on \({\mathbb {R}}^{m}\).

According to Proposition 2.8 from [18] requirement (A1) is fulfilled, and

$$\begin{aligned} \xi := \sum _{i=1}^{r}\big [\Delta (\Theta )^{\beta _{i}}~C_{i} + |G^{i}({\overline{\theta }},\cdot )|\big ] \end{aligned}$$

is square \({\mathbb {P}}^{Z}\)-integrable, satisfying (A2) and \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1) < \infty \). Explicit upper estimates of \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) are also provided there. Hence all requirements of Theorem 2.2 are met.

Remark 2.5

Let the assumptions of Theorem 2.2 be fulfilled. If the optimization problem (1.1) has a unique solution \(\theta ^{*}\) we may utilize the result of Theorem 2.2 to construct asymptotic confidence intervals on the optimal value \({\mathbb {E}}[G(\theta ^{*},Z)]\) of (1.1) in the following way. Choose for level \(\beta \in ]0,1[\) real numbers \(a,b > 0\) such that \(\Phi _{N(0,1)}(a) + \Phi _{N(0,1)}(b) \ge 2 -\beta \) holds, where \(\Phi _{N(0,1)}\) denotes the distribution function of the standard normal distribution. Then for every positive upper estimate \(L\ge \sqrt{{\mathbb {V}}\text {ar}\big (G(\theta ^{*},Z_{1})\big )}\) we may define by

$$\begin{aligned} I^{n}_{a,b, L}:= \left[ \inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}G(\theta ,Z_{j}) - \frac{L~a}{\sqrt{n}}~,~\inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}G(\theta ,Z_{j}) + \frac{L~b}{\sqrt{n}}\right] \end{aligned}$$

a sequence \(\big (I^{n}_{a,b, L}\big )_{n\in {\mathbb {N}}}\) of confidence intervals on \({\mathbb {E}}[G(\theta ^{*},Z)]\) which fulfills

$$\begin{aligned} \liminf _{n\rightarrow \infty }{\mathbb {P}}\big (\big \{{\mathbb {E}}[G(\theta ^{*},Z)]\in I^{n}_{a,b, L}\big \}\big )\ge 1 - \beta . \end{aligned}$$

These confidence intervals may be considered as an alternative to the nonasymptotic confidence intervals which may be built directly on the upper estimates for the deviation probabilities (2.3) derived in Theorem 2.2 of [18]. It should be emphasized that both ways to find confidence intervals do not require path properties of the objective G like continuity or convexity in advance. In particular these methods may be used in the case that objectives have representation (PH), e.g. in two stage mixed-integer programs.
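The construction of Remark 2.5 can be sketched numerically. The toy goal function, the symmetric choice \(a = b = 1.96\) for level \(\beta = 0.05\), and the plug-in estimate of L by the sample standard deviation at the empirical minimizer are all our own illustrative assumptions.

```python
import math
import random

random.seed(7)

def G(theta, z):
    # Hypothetical goal function with unique solution theta* = 0 and
    # optimal value E[G(theta*, Z)] = 1 for Z ~ N(0, 1).
    return (theta - z) ** 2

n = 4_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]
theta_grid = [k / 50 for k in range(-50, 51)]  # Theta = [-1, 1]

def saa_objective(theta):
    return sum(G(theta, z) for z in samples) / n

theta_hat = min(theta_grid, key=saa_objective)
v_hat = saa_objective(theta_hat)

# Plug-in estimate L for sqrt(Var(G(theta*, Z_1))), taken at the
# SAA minimizer (here the true value is sqrt(2)).
g = [G(theta_hat, z) for z in samples]
L = math.sqrt(sum((x - v_hat) ** 2 for x in g) / n)

# beta = 0.05; a = b = 1.96 satisfies Phi(a) + Phi(b) >= 2 - beta.
a = b = 1.96
interval = (v_hat - L * a / math.sqrt(n), v_hat + L * b / math.sqrt(n))
print(interval)  # covers the true optimal value 1 with high probability
```

Note that using an estimated L instead of a guaranteed upper bound is a pragmatic shortcut; the remark itself asks for an upper estimate of the standard deviation.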

In [13] the authors develop nonasymptotic confidence intervals which are based on specific separate upper estimates for the deviation probabilities (2.1) and (2.2). The special feature of their suggestion is that the confidence intervals are independent of the dimension of the parameters (see [13, Discussion 2.1.3, (3)]). However, G has to be convex in the parameter.

3 First order asymptotics under absolute semideviations

Let \(L^1(\Omega ,{\mathcal {F}},{\mathbb {P}})\) denote the usual \(L^{1}\)-space on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\), where we tacitly identify random variables which differ on \({\mathbb {P}}\)-null sets only.

We want to study the risk averse stochastic program (1.4), where the functional \(\rho \) in the objective is an absolute semideviation. This means that for \(a\in ]0,1]\) the functional \(\rho = \rho _{1,a}\) is defined as follows

$$\begin{aligned} \rho _{1,a}:L^{1}(\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\rightarrow {\mathbb {R}},~X\mapsto {\mathbb {E}}[X] + a~ {\mathbb {E}}\big [\big (X - {\mathbb {E}}[X]\big )^{+}\big ]. \end{aligned}$$

It is well-known that absolute semideviations are increasing w.r.t. the increasing convex order (cf. e.g. [29, Theorem 6.51 along with Example 6.23 and Proposition 6.8]). They are also distribution-invariant, so that we may define the associated functional \({\mathcal {R}}_{\rho _{1,a}}\) on the set of distribution functions of random variables with finite first absolute moments. The subject of this section is the optimization problem

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big (F_{\theta }\big ), \end{aligned}$$
(3.1)

where \(F_{\theta }\) stands for the distribution function of \(G(\theta ,Z)\) for \(\theta \in \Theta \). The set of minimizers of this problem will be denoted by \({\text {argmin}}{\mathcal {R}}_{\rho _{1,a}}\).

Introducing the notation

$$\begin{aligned} G_{1}:\Theta \times {\mathbb {R}}^{d}\rightarrow {\mathbb {R}},~(\theta ,z)\mapsto \big (G(\theta ,z) - {\mathbb {E}}[G(\theta ,Z_{1})]\big )^{+}, \end{aligned}$$
(3.2)

we may describe this optimization also in the following way

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big (F_{\theta }\big ) = \inf _{\theta \in \Theta }\left\{ {\mathbb {E}}[G(\theta ,Z_{1})] + a~{\mathbb {E}}[G_{1}(\theta ,Z_{1})]\right\} \end{aligned}$$
(3.3)

The stochastic objective of the approximating problem under the SAA method has the following representation.

$$\begin{aligned} {\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) = \frac{1}{n}\sum _{j=1}^{n}G(\theta ,Z_{j}) + a~\frac{1}{n}\sum _{j=1}^{n}\big [G(\theta ,Z_{j}) - \frac{1}{n}\sum _{i=1}^{n}G(\theta ,Z_{i})\big ]^{+} \end{aligned}$$
(3.4)
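The plug-in quantity (3.4) is straightforward to evaluate on a sample; here is a minimal sketch (the concrete sample values are illustrative only).

```python
def empirical_absolute_semideviation(values, a):
    # R_{rho_{1,a}}(F_hat_n) from (3.4): the sample mean plus a times
    # the mean positive part of deviations from the sample mean.
    n = len(values)
    mean = sum(values) / n
    return mean + a * sum(max(x - mean, 0.0) for x in values) / n

# Example: sample values (0, 2) of G(theta, Z_j), a = 1:
# mean = 1, positive deviations (0, 1), hence 1 + 0.5 = 1.5.
print(empirical_absolute_semideviation([0.0, 2.0], 1.0))  # 1.5
```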

Let (A1), (A2) be fulfilled, and let \({\overline{J}}({\mathbb {F}}^{\Theta }, \xi ,1) < \infty \), where \(\xi \) is from (A2). Then under some minor additional regularity conditions on G we already know that

$$\begin{aligned} n^{\gamma }~\Big [\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big (F_{\theta }\big )\Big ]\underset{n\rightarrow \infty }{\rightarrow } 0\quad \hbox {in probability} \end{aligned}$$

holds for every \(\gamma \in ]0,1/2[\) (see [18, Theorem 3.5]). Moreover, also by Theorem 3.5 from [18], the sequence

$$\begin{aligned} \Big (\sqrt{n}~\Big [\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big (F_{\theta }\big )\Big ]\Big )_{n\in {\mathbb {N}}} \end{aligned}$$
(3.5)

is uniformly tight. The aim of this section is to find the asymptotic distributions of this sequence. The starting point is the following observation derived from (3.3):

$$\begin{aligned} \rho _{1,a}\big (G(\theta ,Z_{1})\big ) = {\mathbb {E}}[{\widehat{G}}_{1,a}(\theta ,Z_{1})] \quad \hbox {for}~\theta \in \Theta , \end{aligned}$$
(3.6)

where, setting \({\overline{F}}_{\theta }:= 1- F_{\theta }\), the mapping \({\widehat{G}}_{1,a}:\Theta \times {\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) is defined by

$$\begin{aligned} {\widehat{G}}_{1,a}(\theta ,z)&= \big [1- a{\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\big ]~G(\theta ,z) \\&\quad + a {\mathbb {E}}[G(\theta ,Z_{1})]{\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big ) + a G_{1}(\theta ,z). \end{aligned}$$
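Identity (3.6) holds exactly for any distribution: with \(\mu := {\mathbb {E}}[G(\theta ,Z_{1})]\), taking expectations in the display above gives \(\mu - a\mu {\overline{F}}_{\theta }(\mu ) + a\mu {\overline{F}}_{\theta }(\mu ) + a{\mathbb {E}}[(G(\theta ,Z_{1})-\mu )^{+}] = \rho _{1,a}(G(\theta ,Z_{1}))\). A minimal check on a finite distribution follows (the four-point sample is our own illustration).

```python
def check_identity(values, a):
    # Treat `values` as the distribution of G(theta, Z_1), uniform weights.
    n = len(values)
    mu = sum(values) / n
    # rho_{1,a}(X) = E[X] + a * E[(X - E[X])^+]
    rho = mu + a * sum(max(x - mu, 0.0) for x in values) / n
    # Survival function F_bar evaluated at mu.
    f_bar = sum(1 for x in values if x > mu) / n
    # G_hat_{1,a} per the display above, then its expectation.
    g_hat = [
        (1.0 - a * f_bar) * x + a * mu * f_bar + a * max(x - mu, 0.0)
        for x in values
    ]
    return rho, sum(g_hat) / n

rho, mean_g_hat = check_identity([0.0, 1.0, 2.0, 3.0], 0.5)
print(rho, mean_g_hat)  # both 1.75
```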

Then the key is to show that the sequence (3.5) has the same asymptotic distribution as the following sequence

$$\begin{aligned} \Big (\sqrt{n}\Big [\inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}{\widehat{G}}_{1,a}(\theta ,Z_{j}) - \inf _{\theta \in \Theta }{\mathbb {E}}\big [{\widehat{G}}_{1,a}\big (\theta ,Z_{1}\big )\big ]\Big ]\Big )_{n\in {\mathbb {N}}} \end{aligned}$$
(3.7)

if one of these sequences converges weakly. In this case we may apply Theorem 2.2 to the sequence (3.7) to derive the asymptotic distribution of sequence (3.5).

The investigations are based on the following mild continuity requirement for the objective G.

  1. (A3)

    \(G(\theta _{n},\cdot )\rightarrow G(\theta ,\cdot )\) in \({\mathbb {P}}^{Z}\)-probability whenever \(\theta _{n}\rightarrow \theta \) w.r.t. the Euclidean metric.

Requirement (A3) implies a useful convergence property at continuity points of the distribution functions \(F_{\theta }\).

Lemma 3.1

Let \(t\in {\mathbb {R}}\) be a continuity point of \(F_{\theta }\) for some \(\theta \in \Theta \), and let (A3) be fulfilled. If \(\theta _{n}\rightarrow \theta \) and \(t_{n}\rightarrow t\), then \(F_{\theta _{n}}(t_{n})\rightarrow F_{\theta }(t)\).

Proof

Fix any \(\varepsilon > 0\). Then there is some \(n_{0}\in {\mathbb {N}}\) such that \(|t_{n} - t|\le \varepsilon /2\) for \(n\in {\mathbb {N}}\) with \(n\ge n_{0}\). Furthermore

$$\begin{aligned}&F_{\theta _{n}}(t_{n})\le F_{\theta _{n}}(t + \varepsilon /2)\le F_{\theta }(t + \varepsilon ) + {\mathbb {P}}\big (\big \{|G(\theta _{n},Z_{1}) - G(\theta ,Z_{1})|> \varepsilon /2\big \}\big )\\&F_{\theta _{n}}(t_{n})\ge F_{\theta _{n}}(t - \varepsilon /2)\ge {\mathbb {P}}\big (\big \{G(\theta ,Z_{1}) \le t-\varepsilon ,|G(\theta _{n},Z_{1}) - G(\theta ,Z_{1})|\le \varepsilon /2\big \}\big )\\&\qquad \quad \ge F_{\theta }(t - \varepsilon ) - {\mathbb {P}}\big (\big \{|G(\theta _{n},Z_{1}) - G(\theta ,Z_{1})| > \varepsilon /2\big \}\big ) \end{aligned}$$

for \(n\in {\mathbb {N}}\) with \(n\ge n_{0}\). Hence by (A3)

$$\begin{aligned} \liminf _{n\rightarrow \infty }F_{\theta _{n}}(t_{n})\ge F_{\theta }(t-\varepsilon )\quad \hbox {and}\quad \limsup _{n\rightarrow \infty }F_{\theta _{n}}(t_{n})\le F_{\theta }(t+\varepsilon ). \end{aligned}$$

The statement may be derived immediately from continuity of \(F_{\theta }\) at t. \(\square \)

The following result is the decisive step in showing that the sequences (3.5) and (3.7) have identical asymptotic distributions if (3.7) converges weakly.

Proposition 3.2

Let (A1)–(A3) be fulfilled, where the mapping \(\xi \) from (A2) is square \({\mathbb {P}}^{Z}\)-integrable. Furthermore, assume that the distribution function \(F_{\theta }\) is continuous at \({\mathbb {E}}[G(\theta ,Z_{1})]\) for every \(\theta \in \Theta \). Using notation (2.4), if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}^{*}\Big (\Big \{\big |\sqrt{n}\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \sqrt{n}\inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}{\widehat{G}}_{1,a}(\theta ,Z_{j})\big |> \varepsilon \Big \}\Big ) = 0\quad \hbox {for}~\varepsilon > 0, \end{aligned}$$

where \({\mathbb {P}}^{*}\) stands for the outer probability of \({\mathbb {P}}\).

The proof of Proposition 3.2 may be found in Sect. 6.3.

Now, combining Proposition 3.2 with Theorem 2.2, we may derive our result on the first order asymptotics of the SAA under absolute semideviations.

Theorem 3.3

Let \(a\in ]0,1]\) and let (A1)–(A3) be fulfilled, where the mapping \(\xi \) from condition (A2) is \({\mathbb {P}}^{Z}\)-integrable of order 4. Furthermore, assume that the distribution function \(F_{\theta }\) is continuous at \({\mathbb {E}}[G(\theta ,Z_{1})]\) for every \(\theta \in \Theta \). Using notation (2.4), if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then the following statements hold.

  1. (1)

    \(\sqrt{n}\big [\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{1,a}\big (F_{\theta }\big )\big ]\) is a random variable on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) for \(n\in {\mathbb {N}}\).

  2. (2)

    The set \({\text {argmin}}{\mathcal {R}}_{\rho _{1,a}}\) is nonvoid and compact.

  3. (3)

    There exists some centered Gaussian process \(\widehat{{\mathfrak {G}}} = (\widehat{{\mathfrak {G}}}_{\theta })_{\theta \in \Theta }\) such that the sequence

    $$\begin{aligned} \Big (\sqrt{n}\big [\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \inf _{\theta \in \Theta }{\mathcal {R}}_{1,a}\big (F_{\theta }\big )\big ]\Big )_{n\in {\mathbb {N}}} \end{aligned}$$

    converges weakly to \( \sup \limits _{\theta \in {\text {argmin}}{\mathcal {R}}_{\rho _{1,a}}}\widehat{{\mathfrak {G}}}_{\theta }. \) This Gaussian process has uniformly continuous paths w.r.t. the Euclidean metric and satisfies

    $$\begin{aligned} {\mathbb {E}}\big [\widehat{{\mathfrak {G}}}_{\theta }\cdot \widehat{{\mathfrak {G}}}_{\vartheta }\big ] = {\mathbb {C}}\text {ov}\big ({\widehat{G}}_{1,a}(\theta ,Z_{1}), {\widehat{G}}_{1,a}(\vartheta ,Z_{1})\big )\quad \hbox {for}~\theta , \vartheta \in \Theta . \end{aligned}$$

    In particular, if the optimization problem (3.3) has a unique solution \(\theta ^{*}\), then under the given assumptions we have weak convergence to some centered normally distributed random variable with variance \({\mathbb {V}}\text {ar}\big ({\widehat{G}}_{1,a}(\theta ^{*},Z_{1})\big )\).

Proof

Firstly, by (A2) the genuine optimization problem and its SAA counterpart have finite optimal values. Furthermore

$$\begin{aligned} \big \{\omega \in \Omega \mid \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big )_{|\omega }< t\big \} = \text {Pr}_{\Omega }\Big (\big \{(\theta ,\omega )\in \Theta \times \Omega \mid {\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big )_{|\omega } < t\big \}\Big )\quad \hbox {for}~t\in {\mathbb {R}}, \end{aligned}$$

where \(\text {Pr}_{\Omega }\) denotes the standard projection from \(\Theta \times \Omega \) onto \(\Omega \). In view of (3.4) the set \(\big \{(\theta ,\omega )\in \Theta \times \Omega \mid {\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big )_{|\omega } < t\big \}\) belongs to \({\mathcal {B}}(\Theta )\otimes {\mathcal {F}}\) for every \(t\in {\mathbb {R}}\) due to (A1). Since \(\Theta \) is a Polish space, and since \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) is complete, we end up with \(\big \{\omega \in \Omega \mid \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big )_{|\omega } < t\big \}\in {\mathcal {F}}\) (see [6, Proposition 8.4.4]). This shows statement 1).

Secondly, \({\mathbb {E}}[{\widehat{G}}_{1,a}(\theta ,Z_{1})] = \rho _{1,a}\big (G(\theta ,Z_{1})\big )\) holds for any \(\theta \in \Theta \). Hence in view of Proposition 3.2 along with a version of Slutsky’s lemma ([30, Lemma 1.10.2]) it remains to show that the objective \({\widehat{G}}_{1,a}\) meets the requirements of Proposition 2.1 and Theorem 2.2.

In Lemma 6.1 below it will be shown that the mapping \(\theta \mapsto {\mathbb {E}}[G(\theta ,Z_{1})]\) on \(\Theta \) is continuous under (A1)–(A3). By Lemma 3.1 this implies the continuity of the mappings \(\theta \mapsto {\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\) and \(\theta \mapsto {\mathbb {E}}[G(\theta ,Z_{1})]~{\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\) on \(\Theta \) because each \(F_{\theta }\) is assumed to be continuous at \({\mathbb {E}}[G(\theta ,Z_{1})]\). Since in addition the number \(\big [1 - a {\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\big ]\) is nonnegative for every \(\theta \in \Theta \) and \((\cdot )^{+}\) is nondecreasing, the objective \({\widehat{G}}_{1,a}\) is lower semicontinuous in \(\theta \) due to (A1). Property (A1) also implies that \({\widehat{G}}_{1,a}\) is measurable w.r.t. \({\mathcal {B}}(\Theta )\otimes {\mathcal {B}}({\mathbb {R}}^{d})\).

For a nonvoid bounded subset \({\mathcal {K}}\) of \({\mathbb {R}}\) we denote by \({\overline{{\mathbb {F}}}}^{{\mathcal {K}}}\) the set of all constant mappings on \({\mathbb {R}}^{d}\) with outcomes in \({\mathcal {K}}\). We shall use the notation \(N(\varepsilon ,{\mathcal {K}},|\cdot |)\) for the minimal number of intervals of the form \([a - \varepsilon ,a+ \varepsilon ]\) with \(a\in {\mathcal {K}}\) needed to cover \({\mathcal {K}}\), where \(\varepsilon > 0\). Then

$$\begin{aligned} N\big (\varepsilon ,{\overline{{\mathbb {F}}}}^{{\mathcal {K}}},L^{2}({\mathbb {Q}})\big )\le N(\varepsilon ,{\mathcal {K}},|\cdot |)\le \frac{2 (\sup {\mathcal {K}}- \inf {\mathcal {K}})}{\varepsilon }\quad \hbox {for}~{\mathbb {Q}}\in {\mathcal {M}}_{\text {\tiny fin}},~\varepsilon > 0. \end{aligned}$$

In particular, using notation (2.4), we have \({\overline{J}}({\overline{{\mathbb {F}}}}^{{\mathcal {K}}},c,1) < \infty \) for every positive, constant envelope c of \({\overline{{\mathbb {F}}}}^{{\mathcal {K}}}\). So we have finiteness of the terms \({\overline{J}}({\overline{{\mathbb {F}}}}^{{\mathcal {K}}_{1}},c_{1},1)\) and \({\overline{J}}({\overline{{\mathbb {F}}}}^{{\mathcal {K}}_{2}},c_{2},1)\), where \({\mathcal {K}}_{1}:= \big \{1 - a {\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\mid \theta \in \Theta \big \}\), \({\mathcal {K}}_{2}:= \big \{a ~{\mathbb {E}}[G(\theta ,Z_{1})]~{\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\mid \theta \in \Theta \big \}\) and \(c_{1}:= 1\), \(c_{2}:={\mathbb {E}}[\xi (Z_{1})]\). Moreover, since \(\xi \) is \({\mathbb {P}}^{Z}\)-integrable of order 4, we obtain by Lemma 3.4 from [18] that there exists some square \({\mathbb {P}}^{Z}\)-integrable positive envelope \(\xi _{1,a}\) of \({\overline{{\mathbb {F}}}}^{\Theta ,1}_{a}:= \{a~G_{1}(\theta ,\cdot )\mid \theta \in \Theta \}\) with finite \({\overline{J}}({\overline{{\mathbb {F}}}}_{a}^{\Theta ,1},\xi _{1,a},1)\). Now, invoking Lemma 9.14 and Theorem 9.15 both from [17], and recalling \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1) < \infty \), the mapping \(\xi + {\mathbb {E}}[\xi (Z_{1})] + \xi _{1,a}\) is a square \({\mathbb {P}}^{Z}\)-integrable positive envelope of \({\widehat{{\mathbb {F}}}}_{1,a}:= \{{\widehat{G}}_{1,a}(\theta ,\cdot )\mid \theta \in \Theta \}\) with finite \({\overline{J}}\big ({\widehat{{\mathbb {F}}}}_{1,a},\xi + {\mathbb {E}}[\xi (Z_{1})]+\xi _{1,a},1\big )\).
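The interval covering bound displayed above is elementary and easy to check numerically; in the following illustrative sketch, `covering_number` returns the exact minimal cover of a closed interval by intervals \([a-\varepsilon ,a+\varepsilon ]\) with centers in the interval, and the bound \(N(\varepsilon ,{\mathcal {K}},|\cdot |)\le 2(\sup {\mathcal {K}}-\inf {\mathcal {K}})/\varepsilon \) is verified for small \(\varepsilon \):

```python
import math

def covering_number(lo, hi, eps):
    # Exact minimal number of intervals [a - eps, a + eps] with a in [lo, hi]
    # needed to cover [lo, hi]: place centers at lo + eps, lo + 3*eps, ...
    return max(1, math.ceil((hi - lo) / (2.0 * eps)))

for eps in (0.5, 0.1, 0.01, 0.001):
    n = covering_number(0.0, 1.0, eps)
    assert n <= 2 * (1.0 - 0.0) / eps   # the bound from the display above
print(covering_number(0.0, 1.0, 0.1))   # 5 intervals of radius 0.1 cover [0, 1]
```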

Now, we are ready to apply Proposition 2.1 and Theorem 2.2 to the objective \({\widehat{G}}_{1,a}\). Statement 2) follows immediately from Proposition 2.1. According to Theorem 2.2 we may find some centered Gaussian process \(\widehat{{\mathfrak {G}}} = (\widehat{{\mathfrak {G}}}_{\theta })_{\theta \in \Theta }\) with the same covariances as in statement 3) such that the sequence (3.7) converges weakly to \( \sup \limits _{\theta \in {\text {argmin}}{\mathcal {R}}_{\rho _{1,a}}}\widehat{{\mathfrak {G}}}_{\theta }. \) If the optimization problem (3.3) has a unique solution \(\theta ^{*}\), the limit is a centered normally distributed random variable with variance \({\mathbb {V}}\text {ar}\big ({\widehat{G}}_{1,a}(\theta ^{*},Z_{1})\big )\).

It is also known from Theorem 2.2 that the process \(\widehat{\mathfrak {G}}\) has uniformly continuous paths w.r.t. the semimetric \({\overline{d}}_{\Theta ,1,a}\) defined by \({\overline{d}}_{\Theta ,1,a}(\theta ,\vartheta ) = \sqrt{{\mathbb {V}}\text {ar}\big ({\widehat{G}}_{1,a}(\theta ,Z_{1}) - {\widehat{G}}_{1,a}(\vartheta ,Z_{1})\big )}\). By continuity of the three mappings \(\theta \mapsto {\mathbb {E}}[G(\theta ,Z_{1})]\), \(\theta \mapsto {\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\) as well as \(\theta \mapsto {\mathbb {E}}[G(\theta ,Z_{1})]~{\overline{F}}_{\theta }\big ({\mathbb {E}}[G(\theta ,Z_{1})]\big )\) on \(\Theta \) along with (A3) we may conclude convergence \({\widehat{G}}_{1,a}(\theta _{n},Z_{1})\rightarrow {\widehat{G}}_{1,a}(\theta ,Z_{1})\) in probability for \(\theta _{n}\rightarrow \theta \) w.r.t. the Euclidean metric. Moreover, in this case the sequence \(\big ({\widehat{G}}_{1,a}(\theta _{n},Z_{1})\big )_{n\in {\mathbb {N}}}\) is dominated by the square integrable random variable \(\xi (Z_{1}) + {\mathbb {E}}[\xi (Z_{1})] + \xi _{1,a}(Z_{1})\). Then by Vitali's theorem (see [1, Proposition 21.4]) we end up with \(\sqrt{{\mathbb {E}}[|{\widehat{G}}_{1,a}(\theta _{n},Z_{1}) -{\widehat{G}}_{1,a}(\theta ,Z_{1})|^{2}]}\rightarrow 0\), and thus \({\overline{d}}_{\Theta ,1,a}(\theta _{n},\theta )\rightarrow 0\). Therefore \(\widehat{{\mathfrak {G}}}\) also has continuous paths w.r.t. the Euclidean metric, and they are even uniformly continuous due to compactness of \(\Theta \). \(\square \)

Remark 3.4

Let \(G(\theta ,\cdot )\) be a \({\mathbb {P}}^{Z}\)-integrable random variable with distribution function \(F_{\theta }\) continuous at \({\mathbb {E}}[G(\theta ,Z_{1})]\) for every \(\theta \in \Theta \). If G either satisfies condition (H) or has representation (PH), then Example 2.3 or Example 2.4, respectively, provides constructions of proper positive envelopes \(\xi \) of \({\mathbb {F}}^{\Theta }\) to apply Theorem 3.3.

Remark 3.5

Theorem 3.3 offers a method to construct asymptotic confidence intervals for the optimal value of the optimization problem (3.1) in the case that there is some unique solution \(\theta ^{*}\). The procedure is exactly the same as the one for asymptotic confidence intervals for the optimal value of (1.1) described in Remark 2.5. The only difference is the choice of the involved positive estimate L, which should satisfy \(L\ge \sqrt{{\mathbb {V}}\text {ar}\big ({\widehat{G}}_{1,a}(\theta ^{*},Z_{1})\big )}\).

These asymptotic confidence intervals might be compared with nonasymptotic confidence intervals which are based on upper estimates for the deviation probabilities

$$\begin{aligned} {\mathbb {P}}\Big (\Big \{\big |\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big (F_{\theta }\big )\big |\ge \varepsilon \Big \}\Big )\quad (n\in {\mathbb {N}},\varepsilon > 0) \end{aligned}$$

from Theorem 3.5 in [18]. By Example 2.4 both methods may be applied to objectives with representation (PH), e.g. in two-stage mixed-integer programs.
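To make the confidence interval construction of Remark 3.5 concrete, here is a minimal Monte Carlo sketch. It assumes, as is standard for the upper absolute semideviation, that \({\mathcal {R}}_{\rho _{1,a}}(F)\) equals the mean plus a times the expected positive deviation from the mean; the objective G, the finite grid of \(\theta \)-values and the plug-in surrogate for the bound L are illustrative assumptions, not the authors' construction:

```python
import math, random

def semidev_risk(xs, a=0.5):
    # Empirical mean-upper-semideviation (assumed form of rho_{1,a}):
    # mean(xs) + a * mean of positive deviations from the mean.
    m = sum(xs) / len(xs)
    return m + a * sum(max(x - m, 0.0) for x in xs) / len(xs)

def saa_ci(samples_by_theta, a=0.5, z=1.96):
    # SAA optimal value over a finite grid of thetas, plus a normal
    # asymptotic confidence interval with a crude plug-in surrogate L
    # for the limiting standard deviation (illustrative only).
    vals = {th: semidev_risk(xs, a) for th, xs in samples_by_theta.items()}
    th_star = min(vals, key=vals.get)
    xs = samples_by_theta[th_star]
    n = len(xs)
    m = sum(xs) / n
    L = (1 + a) * math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    v = vals[th_star]
    return v - z * L / math.sqrt(n), v + z * L / math.sqrt(n)

random.seed(0)
# G(theta, Z) = (Z - theta)^2 with Z ~ N(0, 1); the minimizer is theta = 0.
data = {th: [(random.gauss(0.0, 1.0) - th) ** 2 for _ in range(2000)]
        for th in (-0.5, 0.0, 0.5)}
lo, hi = saa_ci(data)
print(lo, hi)   # interval around the optimal risk value
```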

4 First order asymptotics under divergence risk measures

Let us denote by \(L^p:=L^p(\Omega ,{\mathcal {F}},{\mathbb {P}})\) the usual \(L^{p}\)-space on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) (\(p\in [0,\infty ]\)), where we tacitly identify random variables which differ on \({\mathbb {P}}\)-null sets only.

We want to study the risk averse stochastic program (1.4), where we shall focus on \(\rho \) being a divergence risk measure. To introduce this class, let us consider a lower semicontinuous convex mapping \(\Phi : [0,\infty [\rightarrow [0,\infty ]\) satisfying \(\Phi (0) < \infty \), \(\Phi (x_{0}) < \infty \) for some \(x_{0} > 1,\) \(\inf _{x\ge 0}\Phi (x) = 0,\) and the growth condition \(\lim _{x\rightarrow \infty }\frac{\Phi (x)}{x} = \infty .\) Its Fenchel-Legendre transform

$$\begin{aligned} \Phi ^{*}:{\mathbb {R}}\rightarrow {\mathbb {R}}\cup \{\infty \},~y\mapsto \sup _{x\ge 0}~\big (xy - \Phi (x)\big ) \end{aligned}$$

is a finite nondecreasing convex function whose restriction \(\Phi ^{*}\bigr |_{[0,\infty [}\) to \([0,\infty [\) is a finite Young function, i.e. a continuous nondecreasing and unbounded real valued mapping with \(\Phi ^{*}(0) = 0\) (cf. [2, Lemma A.1]). Note also that the right-sided derivative \(\Phi ^{*'}_{+}\) of \(\Phi ^{*}\) is nonnegative and nondecreasing. We shall use \(H^{\Phi ^{*}}\) to denote the Orlicz heart w.r.t. \(\Phi ^{*}\bigr |_{[0,\infty [}\), defined as the set of all random variables X on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) satisfying \({\mathbb {E}}[\,\Phi ^{*}(c|X|)\,]<\infty \) for all \(c > 0\).

The Orlicz heart is known to be a vector space containing all \({\mathbb {P}}\)-essentially bounded random variables. Moreover, by Jensen's inequality all members of \(H^{\Phi ^{*}}\) are \({\mathbb {P}}\)-integrable. For more on Orlicz hearts w.r.t. Young functions the reader may consult [9].

We can define the following mapping

$$\begin{aligned} \rho ^{\Phi }(X)=\sup _{{\overline{{\mathbb {P}}}}\in {\mathcal {P}}_{\Phi }}\left( {\mathbb {E}}_{{\overline{{\mathbb {P}}}}}\left[ X\right] - {\mathbb {E}}\left[ \Phi \left( \frac{d{\overline{{\mathbb {P}}}}}{d{\mathbb {P}}}\right) \right] \right) \end{aligned}$$

for all \(X\in H^{\Phi ^{*}}\), where \({\mathcal {P}}_{\Phi }\) denotes the set of all probability measures \({\overline{{\mathbb {P}}}}\) which are absolutely continuous w.r.t. \({\mathbb {P}}\) such that \(\Phi \left( \frac{d{\overline{{\mathbb {P}}}}}{d{\mathbb {P}}}\right) \) is \({\mathbb {P}}\)-integrable. Note that \(\frac{d{\overline{{\mathbb {P}}}}}{d{\mathbb {P}}}~ X\) is \({\mathbb {P}}\)-integrable for every \({\overline{{\mathbb {P}}}}\in {\mathcal {P}}_{\Phi }\) and any \(X\in H^{\Phi ^{*}}\) due to Young's inequality. We shall call \(\rho ^{\Phi }\) the divergence risk measure w.r.t. \(\Phi \).

Ben-Tal and Teboulle ([3, 4]) discovered another more convenient representation. It reads as follows (see [2]).

Theorem 4.1

The divergence risk measure \(\rho ^{\Phi }\) w.r.t. \(\Phi \) satisfies the following representation

$$\begin{aligned} \rho ^{\Phi }(X) = \inf _{x\in {\mathbb {R}}}{\mathbb {E}}\left[ \Phi ^{*}(X + x) - x\right] \quad \hbox {for all}~X\in H^{\Phi ^{*}}. \end{aligned}$$

The representation in Theorem 4.1 is also known as the optimized certainty equivalent w.r.t. \(\Phi ^{*}\). As an optimized certainty equivalent, the divergence risk measure \(\rho ^{\Phi }\) may be seen directly to be nondecreasing w.r.t. the increasing convex order. Theorem 4.1 also shows that \(\rho ^{\Phi }\) is distribution-invariant. In particular, we may define the functional \({\mathcal {R}}_{\rho ^{\Phi }}\) associated with \(\rho ^{\Phi }\) on the set \({\mathbb {F}}_{\Phi ^{*}}\) of all distribution functions of the random variables from \(H^{\Phi ^{*}}\). Note that \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) supports some random variable U which is uniformly distributed on ]0, 1[ because \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) is assumed to be atomless. Then we obtain for any distribution function \(F\in {\mathbb {F}}_{\Phi ^{*}}\) with left-continuous quantile function \(F^{\leftarrow }\)

$$\begin{aligned} {\mathcal {R}}_{\rho ^{\Phi }}(F) ~=~ \rho ^{\Phi }\big (F^{\leftarrow }(U)\big ) ~=~ \inf _{x\in {\mathbb {R}}}\left( \int _{0}^{1}\mathbbm {1}_{]0,1[}(u)~\Phi ^{*}\big (F^{\leftarrow }(u) + x\big )~du - x\right) .\nonumber \\ \end{aligned}$$
(4.1)

For ease of reference we shall use notation \(M_{F}\) to denote the set of all \(x\in {\mathbb {R}}\) which solve the minimization in definition (4.1) of \({\mathcal {R}}_{\rho ^{\Phi }}(F)\). In view of Proposition A.1 from the Appendix, each set \(M_{F}\) is a nonvoid compact interval.
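For orientation, the best-known special case is the Average Value at Risk: with \(\Phi ^{*}(y) = \max (y,0)/(1-\alpha )\), formula (4.1) reduces, after substituting \(x\mapsto -x\), to the Rockafellar-Uryasev representation of \(\mathrm {AV@R}_{\alpha }\). The sketch below evaluates the empirical optimized certainty equivalent; the grid search over x is an illustrative shortcut, not the paper's method:

```python
alpha = 0.9

def phi_star(y):
    # Conjugate associated with AV@R_alpha (illustrative choice).
    return max(y, 0.0) / (1.0 - alpha)

def oce_risk(xs, phi_star, x_grid):
    # Empirical version of (4.1):
    # rho(X) = inf_x ( E[phi_star(X + x)] - x ), minimized over a finite grid.
    n = len(xs)
    return min(sum(phi_star(v + x) for v in xs) / n - x for x in x_grid)

xs = [float(i) for i in range(1, 101)]        # uniform on {1, ..., 100}
x_grid = [0.1 * k for k in range(-1000, 1)]   # x in [-100, 0]
print(oce_risk(xs, phi_star, x_grid))
# AV@R_0.9 of the sample: 95.5, the mean of the worst ten values
```

Here the minimizing x-values form an interval (the negatives of the 0.9-quantiles of the sample), in line with the statement that each \(M_{F}\) is a nonvoid compact interval.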

Throughout this section we focus on the following specialization of optimization problem (1.4)

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big (F_{\theta }\big ), \end{aligned}$$
(4.2)

where \(F_{\theta }\) stands for the distribution function of \(G(\theta ,Z)\) for \(\theta \in \Theta \). The set of minimizers of the problem (4.2) will be denoted by \({\text {argmin}}{\mathcal {R}}_{\rho ^{\Phi }}\).

The SAA (1.5) of (4.2) reads as follows.

$$\begin{aligned} \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big ({\hat{F}}_{n,\theta }\big ) = \inf _{\theta \in \Theta }\inf _{x\in {\mathbb {R}}}\left( \frac{1}{n}\sum _{i=1}^{n}\Phi ^{*}\big ( G(\theta , Z_{i}) +x\big ) - x\right) \quad (n\in {\mathbb {N}}). \end{aligned}$$
(4.3)
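Computationally, (4.3) is a joint minimization in \((\theta ,x)\). A hedged sketch over finite grids follows; the quadratic objective, the AV@R-type choice of \(\Phi ^{*}\) and the grids are illustrative assumptions:

```python
import random

alpha = 0.9

def phi_star(y):
    # Conjugate associated with AV@R_alpha (illustrative choice).
    return max(y, 0.0) / (1.0 - alpha)

def saa_value(thetas, x_grid, zs, G):
    # Empirical version of (4.3): inf over theta and x of
    # (1/n) sum_i phi_star(G(theta, Z_i) + x) - x, over finite grids.
    n = len(zs)
    best = float("inf")
    for th in thetas:
        g = [G(th, z) for z in zs]
        for x in x_grid:
            best = min(best, sum(phi_star(v + x) for v in g) / n - x)
    return best

random.seed(1)
zs = [random.gauss(0.0, 1.0) for _ in range(2000)]
G = lambda th, z: (z - th) ** 2                  # illustrative objective
val = saa_value([-0.5, 0.0, 0.5], [-0.05 * k for k in range(0, 201)], zs, G)
print(val)   # near AV@R_0.9 of Z^2 (theta = 0 is optimal here)
```

Restricting x to a compact grid mirrors the compactification argument of this section: for large n the unconstrained minimizers in x stay in a fixed compact interval.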

We shall strengthen condition (A2) to the following property.

  1. (A2’)

There exists some positive envelope \(\xi \) of \({\mathbb {F}}^{\Theta }\) satisfying \(\xi (Z_{1})\in H^{\Phi ^{*}}\).

Note that (A2’) together with (A1) implies that \(G(\theta ,Z_{1})\) belongs to \(H^{\Phi ^{*}}\) for every \(\theta \in \Theta \), so that the genuine optimization problem (4.2) is well-defined.

According to Theorem 4.6 in [18]

$$\begin{aligned} n^{a}~\Big [\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big ({\hat{F}}_{n,\theta }\big ) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big (F_{\theta }\big )\Big ]\underset{n\rightarrow \infty }{\rightarrow } 0\quad \hbox {in probability}\quad \hbox {for}~a\in ]0,1/2[, \end{aligned}$$

and the sequence

$$\begin{aligned} \Big (\sqrt{n}~\Big [\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big ({\hat{F}}_{n,\theta }\big ) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big (F_{\theta }\big )\Big ]\Big )_{n\in {\mathbb {N}}} \end{aligned}$$
(4.4)

is uniformly tight. The essential requirements are assumptions (A1), (A2’) and the finiteness of \({\overline{J}}({\mathbb {F}}^{\Theta }, \xi ,1)\), where \(\xi \) is from (A2’). In this section we want to derive asymptotic distributions of the sequence (4.4).

Representation (4.3) along with Theorem 4.1 suggests applying Theorem 2.2 to the SAA of

$$\begin{aligned} \inf _{(\theta ,x)\in \Theta \times {\mathbb {R}}}{\mathbb {E}}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\big ], \end{aligned}$$
(4.5)

where

$$\begin{aligned} G_{\Phi }: (\Theta \times {\mathbb {R}})\times {\mathbb {R}}^{d}\rightarrow {\mathbb {R}}, \big ((\theta ,x),z\big )\mapsto \Phi ^{*}\big (G(\theta ,z) + x\big ) - x. \end{aligned}$$
(4.6)

Unfortunately, the application is not immediate because the parameter space is not totally bounded w.r.t. the Euclidean metric on \({\mathbb {R}}^{m + 1}\). We already know that the solution set of the optimization problem (4.5) is compact under (A1), (A2’) (see [18, Lemma 5.8]). Conditions (A1), (A2’) also imply that the associated SAA problems have nonvoid compact solution sets. Unfortunately, these sets may depend on the realizations of the samples. In [18] a kind of compactification was suggested which allows us to restrict the parameter set of the random process \(G_{\Phi }\) to suitable compact subsets. The idea is to show that, with arbitrarily high probability, we may find for large sample sizes events from \({\mathcal {F}}\) on which all solution sets of the SAA problems are contained in a common compact superset. The following result from [18] gives a precise formulation of this idea. As preparation, consider any mapping \(\xi \) as in (A2’) and let us introduce for \(\delta > 0\) and \(n\in {\mathbb {N}}\)

$$\begin{aligned} A_{n,\delta }^{\xi }:= \left\{ \frac{1}{n}\sum _{j=1}^{n}\xi (Z_{j})\le {\mathbb {E}}[\xi (Z_{1})] + \delta ,~\frac{1}{n}\sum _{j=1}^{n}\Phi ^{*}\big (\xi (Z_{j})\big )\le {\mathbb {E}}\big [\Phi ^{*}\big (\xi (Z_{1})\big )\big ] + \delta \right\} . \end{aligned}$$

Note that \(A_{n,\delta }^{\xi }\) belongs to \({\mathcal {F}}\), and \({\mathbb {P}}(A_{n,\delta }^{\xi })\rightarrow 1\) for \(n\rightarrow \infty \) due to the law of large numbers. The following result has been shown in [18] (Theorem 5.7 with Lemma 5.8).
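The convergence \({\mathbb {P}}(A_{n,\delta }^{\xi })\rightarrow 1\) can also be seen empirically. The sketch below uses the illustrative choices \(\xi (z) = |z|\) with Z standard normal and \(\Phi ^{*}(y) = y^{2}\) (a Young function), and estimates the probability of \(A_{n,\delta }^{\xi }\) by Monte Carlo:

```python
import random

XI_MEAN = (2.0 / 3.141592653589793) ** 0.5   # E|Z| = sqrt(2/pi) for Z ~ N(0,1)
PHI_MEAN = 1.0                               # E[Z^2] = 1, with phi_star(y) = y^2

def in_A(n, delta, rng):
    # One realization of the event A_{n,delta}^xi for xi(Z) = |Z|.
    xs = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
    m1 = sum(xs) / n                  # empirical mean of xi(Z_j)
    m2 = sum(x * x for x in xs) / n   # empirical mean of phi_star(xi(Z_j))
    return m1 <= XI_MEAN + delta and m2 <= PHI_MEAN + delta

rng = random.Random(42)
freq = {n: sum(in_A(n, 0.1, rng) for _ in range(200)) / 200 for n in (10, 1000)}
print(freq)   # the frequency at n = 1000 is close to 1
```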

Proposition 4.2

Let (A1), (A2’) be fulfilled. Then the set of solutions of problem (4.5) is nonvoid and compact, and the SAA problem (4.3) admits a solution for every \(\omega \in \Omega \). Furthermore, with the mapping \(\xi \) from (A2’), for every \(\delta > 0\) there is some \(k_{\delta }\in {\mathbb {N}}\) such that

$$\begin{aligned} \Big \{(\theta ,x)\in \Theta \times {\mathbb {R}}\mid {\mathbb {E}}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\big ] = \inf _{\begin{array}{c} \theta \in \Theta \\ x\in {\mathbb {R}} \end{array}}{\mathbb {E}}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\big ]\Big \}\subseteq \Theta \times [-k_{\delta },k_{\delta }], \end{aligned}$$

and

$$\begin{aligned} \Bigg \{ (\theta ,x)\in \Theta \times {\mathbb {R}}\mid \frac{1}{n}\sum _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}(\omega )\big ) = \inf _{\begin{array}{c} \theta \in \Theta \\ x\in {\mathbb {R}} \end{array}}\frac{1}{n}\sum _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}(\omega )\big )\Bigg \}\subseteq \Theta \times [-k_{\delta },k_{\delta }] \end{aligned}$$

for \(\delta > 0\), \(n\in {\mathbb {N}}\) and \(\omega \in A_{n,\delta }^{\xi }\).

Based upon Proposition 4.2 it will turn out that it suffices to apply Theorem 2.2 to the SAA corresponding to function classes of the following type

$$\begin{aligned} {\mathbb {F}}^{\Theta }_{\Phi ,k}:=\big \{G_{\Phi }\big ((\theta ,x),\cdot \big )\mid (\theta ,x)\in \Theta \times [-k,k]\big \}\quad (k\in {\mathbb {N}}). \end{aligned}$$
(4.7)

The finiteness of the terms \({\overline{J}}({\mathbb {F}}^{\Theta }_{\Phi ,k},C_{{\mathbb {F}}^{\Theta }_{\Phi ,k}},1)\) is already guaranteed by finiteness of the terms \({\overline{J}}({\mathbb {F}}^{\Theta },C_{{\mathbb {F}}^{\Theta }},1)\) associated with the genuine objective G. This will be the subject of the following result which has been proved in [18] (Lemma 4.3).

Lemma 4.3

Let \(\Phi ^{*'}_{+}\) denote the right-sided derivative of \(\Phi ^{*}\). If \(\xi \) is a square \({\mathbb {P}}^{Z}\)-integrable positive envelope of \({\mathbb {F}}^{\Theta }\), then for any \(k\in {\mathbb {N}}\) the mapping

$$\begin{aligned} C_{{\mathbb {F}}^{\Theta }_{\Phi ,k}}:= 2 \big [\Phi ^{*'}_{+}\big (\xi + k\big ) + 1] \sqrt{\xi ^{2} + k^{2}} \end{aligned}$$

is a positive envelope of \({\mathbb {F}}^{\Theta }_{\Phi ,k}\) satisfying \({\overline{J}}({\mathbb {F}}^{\Theta }_{\Phi ,k},C_{{\mathbb {F}}^{\Theta }_{\Phi ,k}},1) < \infty \) if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite.

Next, if \(G_{\Phi }\big ((\theta ,x),\cdot \big )\) is square \({\mathbb {P}}^{Z}\)-integrable for \((\theta ,x)\in \Theta \times {\mathbb {R}}\), then we shall endow \(\Theta \times {\mathbb {R}}\) with the semimetric \({\overline{d}}_{\Theta ,\Phi }\) defined by

$$\begin{aligned} {\overline{d}}_{\Theta ,\Phi }\big ((\theta ,x),(\vartheta ,y)\big ):= \sqrt{{\mathbb {V}}\text {ar}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big ) - G_{\Phi }\big ((\vartheta ,y),Z_{1}\big )\big ]}. \end{aligned}$$

Now the application of Theorem 2.2 to the restricted optimization problems associated with the function classes \({\mathbb {F}}^{\Theta }_{\Phi ,k}\) reads as follows.

Proposition 4.4

Let (A1), (A2’) be fulfilled. With \(\xi \) from (A2’) the mapping \(\xi _{k}:= [\Phi ^{*'}_{+}(\xi + k) + 1]\sqrt{\xi ^{2} + k^{2}}\) is assumed to be square \({\mathbb {P}}^{Z}\)-integrable for every \(k\in {\mathbb {N}}\). If \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then the following statements are true.

  1. (1)

    \(\inf \limits _{\begin{array}{c} \theta \in \Theta \\ x\in [-k,k] \end{array}}\frac{1}{n}\sum \limits _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}\big ) - \inf \limits _{\begin{array}{c} \theta \in \Theta \\ x\in [-k,k] \end{array}}{\mathbb {E}}\Big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\Big ]\) is a random variable on the probability space \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) for arbitrary \(k,n\in {\mathbb {N}}\).

  2. (2)

    For \(k\in {\mathbb {N}}\) the set

    $$\begin{aligned} {\mathcal {S}}_{k}:= \Big \{(\theta ,x)\in \Theta \times [-k,k]\mid {\mathbb {E}}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\big ] = \inf _{\begin{array}{c} \theta \in \Theta \\ x\in [-k,k] \end{array}}{\mathbb {E}}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\big ]\Big \} \end{aligned}$$

    is nonvoid and compact.

  3. (3)

    For \(k\in {\mathbb {N}}\) the semimetric \({\overline{d}}_{\Theta ,\Phi }\) is totally bounded on \(\Theta \times [-k,k]\), and there exists some centered Gaussian process \({\mathfrak {G}}^{k} = ({\mathfrak {G}}^{k}_{(\theta ,x)})_{(\theta ,x)\in \Theta \times [-k,k]}\) such that the sequence

    $$\begin{aligned} \Big (\sqrt{n}\Big [\inf \limits _{\begin{array}{c} \theta \in \Theta \\ x\in [-k,k] \end{array}}\frac{1}{n}\sum \limits _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}\big ) - \inf \limits _{\begin{array}{c} \theta \in \Theta \\ x\in [-k,k] \end{array}}{\mathbb {E}}[G_{\Phi }\big ((\theta ,x),Z_{1}\big )]\Big ]\Big )_{n\in {\mathbb {N}}} \end{aligned}$$

    converges weakly to \( \sup \limits _{(\theta ,x)\in {\mathcal {S}}_{k}}{\mathfrak {G}}^{k}_{(\theta ,x)}. \) This Gaussian process has uniformly continuous paths w.r.t. \({\overline{d}}_{\Theta ,\Phi }\) and satisfies

    $$\begin{aligned} {\mathbb {E}}\big [{\mathfrak {G}}^{k}_{(\theta ,x)}\cdot {\mathfrak {G}}^{k}_{(\vartheta ,y)}\big ] = {\mathbb {C}}\text {ov}\Big (G_{\Phi }\big ((\theta ,x),Z_{1}\big ), G_{\Phi }\big ((\vartheta ,y),Z_{1})\Big ) \end{aligned}$$

    for \(\theta , \vartheta \in \Theta \) and \(x, y\in [-k,k]\).

Proof

Note that by (A1) the objective \(G_{\Phi }\) is measurable w.r.t. the product \(\sigma \)-algebra \({\mathcal {B}}(\Theta \times {\mathbb {R}})\otimes {\mathcal {B}}({\mathbb {R}}^{d})\) and lower semicontinuous in the parameters \((\theta ,x)\) because \(\Phi ^{*}\) is continuous and nondecreasing. Next, by Lemma 4.3, the mapping \(\xi _{k}\) is a positive envelope of \({\mathbb {F}}^{\Theta }_{\Phi ,k}\), which is also square \({\mathbb {P}}^{Z}\)-integrable by assumption. Hence \(G_{\Phi }\big ((\theta ,x),\cdot \big )\) is square \({\mathbb {P}}^{Z}\)-integrable for \((\theta ,x)\in \Theta \times [-k,k]\). Thus \({\overline{d}}_{\Theta ,\Phi }\) is well-defined on \(\Theta \times [-k,k]\) for \(k\in {\mathbb {N}}\).

Now, the entire statement of Proposition 4.4 follows immediately from Proposition 2.1 and Theorem 2.2 along with Lemma 4.3. \(\square \)

Combining Proposition 4.4 with Proposition 4.2 we may derive our main result concerning first order asymptotics of the SAA (4.3).

Theorem 4.5

Let (A1), (A2’) be fulfilled, and assume that the mapping \(\xi \) from (A2’) is such that \(\xi _{k}:= [\Phi ^{*'}_{+}(\xi + k) + 1]\sqrt{\xi ^{2} + k^{2}}\) is square \({\mathbb {P}}^{Z}\)-integrable for every \(k\in {\mathbb {N}}\). If \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then the following statements are valid.

  1. (1)

    \(\big (\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}({\widehat{F}}_{n,\theta }) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}(F_{\theta })\big )_{n\in {\mathbb {N}}}\) is a sequence of random variables.

  2. (2)

    The sets \({\text {argmin}}{\mathcal {R}}_{\rho ^{\Phi }}\) and \(M_{F_{\theta }}\) (\(\theta \in \Theta \)) are nonvoid and compact.

  3. (3)

    For \(k\in {\mathbb {N}}\) the semimetric \({\overline{d}}_{\Theta ,\Phi }\) is totally bounded on \(\Theta \times [-k,k]\), and there exists some centered Gaussian process \({\mathfrak {G}}^{\Phi } = ({\mathfrak {G}}^{\Phi }_{(\theta ,x)})_{(\theta ,x)\in \Theta \times {\mathbb {R}}}\) with

    $$\begin{aligned} {\mathbb {E}}\big [{\mathfrak {G}}^{\Phi }_{(\theta ,x)}\cdot {\mathfrak {G}}^{\Phi }_{(\vartheta ,y)}\big ] = {\mathbb {C}}\text {ov}\Big (G_{\Phi }\big ((\theta ,x),Z_{1}\big ), G_{\Phi }\big ((\vartheta ,y),Z_{1}\big )\Big )\quad \hbox {for}~\theta , \vartheta \in \Theta ;~x, y\in {\mathbb {R}}\end{aligned}$$

    such that the sequence \(\big (\sqrt{n}[\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}({\widehat{F}}_{n,\theta }) - \inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}(F_{\theta })]\big )_{n\in {\mathbb {N}}}\) converges weakly to

    $$\begin{aligned} \sup _{\theta \in {\text {argmin}}{\mathcal {R}}_{\rho ^{\Phi }}}\sup _{x\in M_{F_{\theta }}}{\mathfrak {G}}^{\Phi }_{(\theta ,x)}. \end{aligned}$$

    In addition, the paths of the Gaussian process are uniformly continuous on the set \(\Theta \times [-k,k]\) w.r.t. \({\overline{d}}_{\Theta ,\Phi }\) for \(k\in {\mathbb {N}}\). Moreover, if the optimization problem (4.2) has a unique solution \(\theta ^{*}\), and if \(M_{F_{\theta ^{*}}}\) consists of exactly one element \(x_{\theta ^{*}}\), then the weak limit is some centered normally distributed random variable with variance \({\mathbb {V}}\text {ar}\big (\Phi ^{*}\big (G(\theta ^{*},Z_{1}) + x_{\theta ^{*}}\big )\big )\).

Proof

By Theorem 4.1 and representation (4.3) we have

$$\begin{aligned}&\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}(F_{\theta }) = \inf \limits _{(\theta ,x)\in \Theta \times {\mathbb {R}}}{\mathbb {E}}\big [G_{\Phi }\big ((\theta ,x),Z_{1}\big )\big ],\\&\inf \limits _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}({\widehat{F}}_{n,\theta }) = \inf _{k\in {\mathbb {N}}}~\inf _{(\theta ,x)\in \Theta \times [-k,k]}\frac{1}{n}\sum _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}\big )\quad \hbox {for}~n\in {\mathbb {N}}. \end{aligned}$$
(4.8)

Then statement 1) may be concluded immediately from statement 1) of Proposition 4.4 because the optimal values of the optimization problems (4.2) and (4.3) are always finite due to Proposition 4.2. Concerning statement 2), we already know from Proposition A.1 in the Appendix that for any \(\theta \in \Theta \), the set \(M_{F_{\theta }}\) is nonvoid and compact. Furthermore, the set of minimizers of (4.5) is nonvoid and compact by Proposition 4.2. Moreover, \(\theta \) belongs to \({\text {argmin}}~{\mathcal {R}}_{\rho ^{\Phi }}\) if and only if \((\theta ,x)\) is a solution of (4.5) for some \(x\in {\mathbb {R}}\). Hence \({\text {argmin}}~{\mathcal {R}}_{\rho ^{\Phi }}\) is nonvoid and compact so that statement 2) is verified. It remains to show statement 3).

For this purpose let us select a sequence \((\mathfrak {G}^{k})_{k\in {\mathbb {N}}}\) of centered Gaussian processes \(\mathfrak {G}^{k} = \big (\mathfrak {G}^{k}_{(\theta ,x)}\big )_{(\theta ,x)\in \Theta \times [-k,k]}\) as in statement 3) of Proposition 4.4. With \(\xi \) from (A2’), and defining for \(n\in {\mathbb {N}}\) the set \(A_{n,1}^{\xi }\) as in the text preceding Proposition 4.2, we may find by Proposition 4.2 some \(k_{1}\in {\mathbb {N}}\) such that the set of minimizers of \({\mathbb {E}}\big [G_{\Phi }(\cdot ,Z_{1})\big ]\) is contained in \(\Theta \times [-k_{1},k_{1}]\), and for \(n\in {\mathbb {N}}, \omega \in A_{n,1}^{\xi }\)

$$\begin{aligned} \inf _{(\theta ,x)\in \Theta \times {\mathbb {R}}}\frac{1}{n}\sum _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}(\omega )\big ) = \inf _{\begin{array}{c} \theta \in \Theta \\ x\in [-k_{1},k_{1}] \end{array}}\frac{1}{n}\sum _{j=1}^{n} G_{\Phi }\big ((\theta ,x),Z_{j}(\omega )\big ). \end{aligned}$$

Next, by assumption, the sequences \(\big (\xi (Z_{j})\big )_{j\in {\mathbb {N}}}\) and \(\big (\Phi ^{*}\big (\xi (Z_{j})\big )\big )_{j\in {\mathbb {N}}}\) consist of independent integrable random variables which are identically distributed as \(\xi (Z_{1})\) and \(\Phi ^{*}\big (\xi (Z_{1})\big )\) respectively. Then \({\mathbb {P}}(\Omega \setminus A_{n,1}^{\xi }) \rightarrow 0\) by the law of large numbers, and thus

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}\Big (\Big \{\big |\sqrt{n}\big [\inf _{\begin{array}{c} \theta \in \Theta \\ |x|\le k_{1} \end{array}}\frac{1}{n}\sum _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}\big )- \inf _{\begin{array}{c} \theta \in \Theta \\ x\in {\mathbb {R}} \end{array}}\frac{1}{n}\sum _{j=1}^{n}G_{\Phi }\big ((\theta ,x),Z_{j}\big )\big ]\big | > \varepsilon \Big \}\Big ) = 0 \end{aligned}$$

for every \(\varepsilon > 0\). Note also that by choice of \(k_{1}\) the set \({\mathcal {S}}_{{\mathbb {R}}}\) of solutions of (4.5) coincides with the set \({\mathcal {S}}_{k_{1}}\) of minimizers of \({\mathbb {E}}\big [G_{\Phi }(\cdot ,Z_{1})\big ]\) on \(\Theta \times [-k_{1},k_{1}]\). Then in view of statement 3) of Proposition 4.4 along with (4.8) and (4.3)

$$\begin{aligned} \big (\sqrt{n}\big [ \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}({\widehat{F}}_{n,\theta }) - \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}(F_{\theta })\big ]\big )_{n\in {\mathbb {N}}}~\hbox {converges weakly to}~\inf _{(\theta ,x)\in {\mathcal {S}}_{{\mathbb {R}}}}\mathfrak {G}^{k_{1}}_{(\theta ,x)}.\nonumber \\ \end{aligned}$$
(4.9)

Note also that the semimetric \({\overline{d}}_{\Theta ,\Phi }\) is totally bounded on \(\Theta \times [-k,k]\) for every \(k\in {\mathbb {N}}\) by statement 3) of Proposition 4.4. So we may find an isotone sequence \((\Gamma _{k})_{k\in {\mathbb {N}}}\) of at most countable sets such that \(\Gamma _{k}\subseteq \Theta \times [-k,k]\) is dense w.r.t. \({\overline{d}}_{\Theta ,\Phi }\) for every \(k\in {\mathbb {N}}\). Set \(\Gamma := \bigcup _{k=1}^{\infty }\Gamma _{k}\).

By Kolmogorov’s consistency theorem there exists some centered Gaussian process \(\overline{{\mathfrak {G}}}^{\Phi } = \big (\overline{{\mathfrak {G}}}^{\Phi }_{(\theta ,x)}\big )_{(\theta ,x)\in \Gamma }\) on a probability space \(({\overline{\Omega }},\overline{{{{\mathcal {F}}}}},{\overline{{\mathbb {P}}}})\) with covariances as in statement 3) of Theorem 4.5 (see e.g. [8, Theorem 12.1.3]). Hence \(\overline{{\mathfrak {G}}}^{\Phi }\) and \({\mathfrak {G}}^{k}\) have identical finite dimensional marginal distributions on \(\Theta \times [-k,k]\) for \(k\in {\mathbb {N}}\). This implies for \(\delta >0\) and \(k\in {\mathbb {N}}\)

$$\begin{aligned} {\mathbb {E}}\big [\sup _{((\theta ,x), (\vartheta ,y))\in {\mathcal {U}}^{k}_{\delta }}|\overline{{\mathfrak {G}}}^{\Phi }_{(\theta ,x)} - \overline{{\mathfrak {G}}}^{\Phi }_{(\vartheta ,y)}|\big ]&= {\mathbb {E}}\big [\sup _{((\theta ,x), (\vartheta ,y))\in {\mathcal {U}}^{k}_{\delta }}|{\mathfrak {G}}^{k}_{(\theta ,x)} - {\mathfrak {G}}^{k}_{(\vartheta ,y)}|\big ], \end{aligned}$$

where \({\mathcal {U}}^{k}_{\delta }:= \big \{\big ((\theta ,x),(\vartheta ,y)\big )\in \Gamma _{k}\times \Gamma _{k}\mid {\overline{d}}_{\Theta ,\Phi }\big ((\theta ,x),(\vartheta ,y)\big ) < \delta \big \}\). Since each \({\mathfrak {G}}^{k}\) has \({\overline{d}}_{\Theta ,\Phi }\)-uniformly continuous paths, we may invoke Theorem 4 from [20] to conclude

$$\begin{aligned} \lim _{\delta \searrow 0}{\mathbb {E}}\big [\sup _{((\theta ,x), (\vartheta ,y))\in {\mathcal {U}}^{k}_{\delta }}|\overline{{\mathfrak {G}}}^{\Phi }_{(\theta ,x)} - \overline{{\mathfrak {G}}}^{\Phi }_{(\vartheta ,y)}|\big ] = 0\quad \hbox {for}~ k\in {\mathbb {N}}. \end{aligned}$$

Then by Proposition B.1 in Appendix 1 there exists some version \(\mathfrak {G}^{\Phi } = \big (\mathfrak {G}^{\Phi }_{(\theta ,x)}\big )_{(\theta ,x)\in \Theta \times {\mathbb {R}}}\) of \(\overline{{\mathfrak {G}}}^{\Phi }\) which has \({\overline{d}}_{\Theta ,\Phi }\)-uniformly continuous paths on \(\Theta \times [-k,k]\) for \(k\in {\mathbb {N}}\). In particular \(\mathfrak {G}^{\Phi }\) is a centered Gaussian process with covariances as in statement 3) of Theorem 4.5. This also means that \(\mathfrak {G}^{\Phi }\) and \(\mathfrak {G}^{k_{1}}\) have identical finite-dimensional distributions on \(\Theta \times [-k_{1},k_{1}]\), implying that \(\inf \nolimits _{(\theta ,x)\in {\widetilde{\Gamma }}}\mathfrak {G}^{\Phi }_{(\theta ,x)}\) and \(\inf \nolimits _{(\theta ,x)\in {\widetilde{\Gamma }}}\mathfrak {G}^{k_{1}}_{(\theta ,x)}\) are identically distributed for any nonvoid at most countable subset \({\widetilde{\Gamma }}\subseteq \Theta \times [-k_{1},k_{1}]\). Note that \({\overline{d}}_{\Theta ,\Phi }\) is totally bounded on \({\mathcal {S}}_{{\mathbb {R}}}\), so that \({\mathcal {S}}_{{\mathbb {R}}}\) is separable w.r.t. \({\overline{d}}_{\Theta ,\Phi }\). Recall also that the paths of the processes \(\mathfrak {G}^{\Phi }\) and \(\mathfrak {G}^{k_{1}}\) are uniformly continuous on \(\Theta \times [-k_{1},k_{1}]\) w.r.t. \({\overline{d}}_{\Theta ,\Phi }\). Therefore we may verify that the mappings \(\inf \nolimits _{(\theta ,x)\in {\mathcal {S}}_{{\mathbb {R}}}}\mathfrak {G}^{\Phi }_{(\theta ,x)}\) and \(\inf \nolimits _{(\theta ,x)\in {\mathcal {S}}_{{\mathbb {R}}}}\mathfrak {G}^{k_{1}}_{(\theta ,x)}\) are identically distributed random variables. Then in view of (4.9)

$$\begin{aligned} \big (\sqrt{n}\big [ \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}({\widehat{F}}_{n,\theta }) - \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}(F_{\theta })\big ]\big )_{n\in {\mathbb {N}}}~\hbox {converges weakly to}~\inf _{(\theta ,x)\in {\mathcal {S}}_{{\mathbb {R}}}}\mathfrak {G}^{\Phi }_{(\theta ,x)}.\nonumber \\ \end{aligned}$$
(4.10)

Moreover, we already know from statement 2) that the sets \({\text {argmin}}{\mathcal {R}}_{\rho ^{\Phi }}\) and \( M_{F_{\theta }}\) \((\theta \in \Theta )\) are nonvoid. We may also observe that \({\text {argmin}}{\mathcal {R}}_{\rho ^{\Phi }} = \text {Pr}({\mathcal {S}}_{{\mathbb {R}}})\) holds, where \(\text {Pr}\) denotes the standard projection from \(\Theta \times {\mathbb {R}}\) onto \(\Theta \). Therefore

$$\begin{aligned} M_{F_{\theta }}\subseteq [-k_{1},k_{1}]~\hbox {for}~\theta \in {\text {argmin}}~{\mathcal {R}}_{\rho ^{\Phi }},\quad \inf _{(\theta ,x)\in {\mathcal {S}}_{{\mathbb {R}}}}\mathfrak {G}^{\Phi }_{(\theta ,x)} = \inf _{\theta \in {\text {argmin}}~{\mathcal {R}}_{\rho ^{\Phi }}}\inf _{x\in M_{F_{\theta }}}\mathfrak {G}^{\Phi }_{(\theta ,x)}.\nonumber \\ \end{aligned}$$
(4.11)

Combining (4.10) and (4.11) with the above-mentioned properties of \(\mathfrak {G}^{\Phi }\), the first part of statement 3) follows immediately. The remaining part is an obvious consequence of the first one. The proof is complete. \(\square \)

Remark 4.6

If \(\Phi (0) = 0\), and if \(\Phi ^{*}\) is strictly convex on \(]0,\infty [\), then \(M_{F_{\theta }}\) is a singleton for every \(\theta \in \Theta \) due to Proposition A.2 in the Appendix below. In this situation we may obtain asymptotic normality in statement 3) of Theorem 4.5 if the genuine optimization problem (4.2) has a unique solution \(\theta ^{*}\).

Remark 4.7

If G either satisfies condition (H) or has representation (PH), then Example 2.3 or Example 2.4, respectively, shows how to find proper positive envelopes \(\xi \) of \({\mathbb {F}}^{\Theta }\) meeting all requirements of Theorem 4.5.

Next let us illustrate the assumptions of Theorem 4.5 by the example of the so-called Average Value at Risk, also known as Expected Shortfall.

Example 4.8

Let \(\Phi = \Phi _{\alpha }\) be defined by \(\Phi _{\alpha }(x):= 0\) for \(x\le 1/(1-\alpha )\) and \(\Phi _{\alpha }(x):= \infty \) for \(x > 1/(1-\alpha )\), where \(\alpha \in ]0,1[\). Then \(\Phi ^{*}_{\alpha }(y) = y^{+}/(1-\alpha )\) for \(y\in {\mathbb {R}}\). In particular \(H^{\Phi ^{*}}\) coincides with \(L^{1}\), and we may recognize \({\mathcal {R}}_{\rho ^{\Phi }}\) as the so-called Average Value at Risk w.r.t. \(\alpha \) (e.g. [11, 29]), i.e.

$$\begin{aligned} {\mathcal {R}}_{\rho ^{\Phi }}(F){} & {} = \frac{1}{1-\alpha }~\int _{\alpha }^{1}\mathbbm {1}_{]0,1[}(u)~F^{\leftarrow }(u)~du \\{} & {} = \inf _{x\in {\mathbb {R}}}\left( \int _{0}^{1}\mathbbm {1}_{]0,1[}(u)~\frac{(F^{\leftarrow }(u) + x)^{+}}{1-\alpha }~du - x\right) \end{aligned}$$

(see e.g. [16]). It may be verified easily that \(M_{F_{0}} = [F^{\leftarrow }_{0}(\alpha ),F^{\rightarrow }_{0}(\alpha )]\), where \(F_{0}^{\rightarrow }\) denotes the right-continuous quantile function of \(F_{0}\in {\mathbb {F}}_{\Phi ^{*}}\).

  1. (1)

    Let \(\xi \) be a square \({\mathbb {P}}^{Z}\)-integrable positive envelope of \({\mathbb {F}}^{\Theta }\). Then \(\xi \) satisfies (A2’), and the mapping \(\xi _{k}:= [\Phi ^{*'}_{\alpha +}(\xi + k) + 1]\sqrt{\xi ^{2} + k^{2}}\) is square \({\mathbb {P}}^{Z}\)-integrable for every \(k\in {\mathbb {N}}\). In particular Theorem 4.5 applies if (A1) is satisfied, and if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite.

  2. (2)

    If condition (H) is satisfied, and if \(G({\overline{\theta }},\cdot )\) is square \({\mathbb {P}}^{Z}\)-integrable for some \({\overline{\theta }}\in \Theta \), then we may find by Example 2.3 some square \({\mathbb {P}}^{Z}\)-integrable positive envelope \(\xi \) of \({\mathbb {F}}^{\Theta }\) with finite \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\). Hence in view of statement 1) Theorem 4.5 may always be applied under (A1).

  3. (3)

    If G has representation (PH) and is lower semicontinuous in \(\theta \), then there exists by Example 2.4 some square \({\mathbb {P}}^{Z}\)-integrable positive envelope \(\xi \) of \({\mathbb {F}}^{\Theta }\) with finite \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\). Furthermore (A1) is automatically fulfilled. Therefore by statement 1) Theorem 4.5 may be applied.

  4. (4)

    In [14] the asymptotic result of Theorem 4.5 was obtained under (H) with \(\beta = 1\). In addition convexity in \(\theta \) was imposed on the goal function G (see [14, Theorem 2]). However, the investigations there extend to optimization problems

    $$\begin{aligned} \inf _{\theta \in \Theta }\sup _{w\in \mathfrak {W}}\Big (w_{0}{\mathbb {E}}\big [G(\theta ,Z_{1})\big ] + \sum _{i=1}^{r}w_{i} \rho _{\Phi _{\alpha _{i}}}\big (G(\theta ,Z_{1})\big )\Big ) \end{aligned}$$

    for fixed \(\alpha _{1},\ldots ,\alpha _{r}\in ]0,1[\). Here \(\mathfrak {W}\) denotes a nonvoid subset of the simplex \(\Delta _{r+1}\), which consists of all \(w\in [0,\infty [^{r+1}\) satisfying \(w_{0}+\dots +w_{r} = 1\).
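An empirical sketch of this robust objective for a finite weight set (all names and numbers below are hypothetical, and the plain tail-average AVaR estimator is a simplification):

```python
import numpy as np

def avar(sample, alpha):
    # plain empirical Average Value at Risk: mean of the upper (1 - alpha)-tail
    q = np.quantile(sample, alpha)
    return float(sample[sample >= q].mean())

def robust_objective(sample, alphas, weights):
    # sup over the finite weight set of  w_0 * mean + sum_i w_i * AVaR_{alpha_i}
    stats = [float(sample.mean())] + [avar(sample, a) for a in alphas]
    return max(float(np.dot(w, stats)) for w in weights)

rng = np.random.default_rng(1)
losses = rng.exponential(size=10_000)                     # toy loss sample
W = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.0, 1.0)]  # finite subset of the simplex
val = robust_objective(losses, alphas=(0.9, 0.99), weights=W)
```

Since every AVaR dominates the mean, the supremum here is attained at the weight vector concentrating on the largest level \(\alpha _{i}\).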

Remark 4.9

If the optimization problem (4.2) has a unique solution \(\theta ^{*}\), and if \(M_{F_{\theta ^{*}}}\) has one element \(x_{\theta ^{*}}\) only, then with an arbitrary positive upper estimate \(L\ge \sqrt{{\mathbb {V}}\text {ar}\big (\Phi ^{*}\big (G(\theta ^{*},Z_{1}) + x_{\theta ^{*}}\big )\big )}\) we may construct by Theorem 4.5 asymptotic confidence intervals for the optimal value of (4.2) in the same way as described in Remark 2.5. Alternatively, nonasymptotic confidence intervals may be built on upper estimates for the deviation probabilities

$$\begin{aligned} {\mathbb {P}}\Big (\Big \{\big |\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big ({\hat{F}}_{n,\theta }\big ) - \inf _{\theta \in \Theta }{\mathcal {R}}_{\rho ^{\Phi }}\big (F_{\theta }\big )\big |\ge \varepsilon \Big \}\Big )\quad (n\in {\mathbb {N}},\varepsilon > 0) \end{aligned}$$

from Theorem 4.6 in [18]. In view of Example 2.4 both approaches may be applied to objectives satisfying representation (PH), e.g. in two-stage mixed-integer programs.
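Concretely, the asymptotic interval takes the familiar normal form \({\hat{v}}_{n} \pm z_{1-\delta /2}\,L/\sqrt{n}\). A minimal sketch, with all numeric inputs made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

def asymptotic_ci(v_n, n, L, level=0.95):
    """Two-sided asymptotic confidence interval for the optimal value,
    given the SAA optimal value v_n and an upper variance estimate L**2."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # standard normal quantile
    half = z * L / sqrt(n)
    return v_n - half, v_n + half

lo, hi = asymptotic_ci(v_n=1.8, n=10_000, L=2.5, level=0.95)  # half-width ~0.049
```

Any valid upper estimate L widens the interval but preserves the asymptotic coverage level.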

In the special case of \(\rho \) being the Average Value at Risk it would be interesting to compare the confidence intervals corresponding to both methods with the confidence intervals obtained with stochastic mirror descent as in Sect. 5.2.2 of [14]. However, for this purpose we have to impose further regularity conditions on the objective, at least convexity and continuity in the parameters.

5 The basic technical result

In Sects. 3 and 4 the main convergence results, namely Theorems 3.3 and 4.5, are derived as applications of Theorem 2.2. Roughly speaking, our verification of Theorem 2.2 combines a convergence theorem for empirical processes with the functional delta method for optimal values. This section provides a suitable convergence result for empirical processes adapted to the situation of the paper.

We shall use the following notation. For a nonvoid set \({\mathbb {T}}\) we shall denote by \(l^{\infty }({\mathbb {T}})\) the space of all bounded real-valued mappings on \({\mathbb {T}}\). It will be endowed with the supremum norm \(\Vert \cdot \Vert _{{\mathbb {T}},\infty }\) and the induced Borel \(\sigma \)-algebra.

Let us introduce the random processes

$$\begin{aligned} Y_{n}: \Theta \times \Omega \rightarrow {\mathbb {R}},~(\theta ,\omega )\mapsto \frac{1}{n}\sum _{j=1}^{n}\Big (G\big (\theta ,Z_{j}(\omega )\big ) - {\mathbb {E}}[G(\theta ,Z_{1})]\Big )\quad (n\in {\mathbb {N}}). \end{aligned}$$

If the objective G satisfies the properties (A1) and (A2), then \(Y_{n}(\cdot ,\omega )\) belongs to \(l^{\infty }(\Theta )\) for every \(\omega \in \Omega \).
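A toy illustration of this uniform behaviour (the goal function \(G(\theta ,z) = (z-\theta )^{2}\), the standard normal distribution of Z, and the grid are made-up choices, so that \({\mathbb {E}}[G(\theta ,Z_{1})] = 1+\theta ^{2}\) is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(4)
thetas = np.linspace(0.0, 1.0, 101)                  # grid approximating Theta = [0, 1]

def sup_dev(n):
    """sup over the theta-grid of |Y_n(theta, omega)| for one simulated omega."""
    z = rng.normal(size=n)                           # Z_1(omega), ..., Z_n(omega)
    emp = ((z[None, :] - thetas[:, None]) ** 2).mean(axis=1)  # (1/n) sum_j G(theta, Z_j)
    return float(np.abs(emp - (1.0 + thetas ** 2)).max())

d_small, d_large = sup_dev(100), sup_dev(50_000)     # deviation typically shrinks with n
```

Since \(Y_{n}(\cdot ,\omega )\) is bounded on the compact \(\Theta \), the grid maximum is a finite proxy for \(\Vert Y_{n}\Vert _{\Theta ,\infty }\).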

Under (A1), (A2) the mapping \(\theta \mapsto {\mathbb {E}}[G(\theta ,Z_{1})]\) on \(\Theta \) is lower semicontinuous (see Lemma 6.1 below). Moreover, each process \(Y_{n}\) is \({\mathcal {B}}(\Theta )\otimes {{{\mathcal {F}}}}\)-measurable by (A1), \(\Theta \) is a Polish space, and \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) is complete. Then \(\sup _{\theta \in \Theta }|Y_{n}(\theta ,\cdot )| = \Vert Y_{n}\Vert _{\Theta ,\infty }\) is a random variable on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) (see [30, Lemma 1.7.5]). We shall study the convergence of the sequence built upon these random variables.

Turning to the issue of asymptotic distributions of the random processes \(Y_{n}\), we are faced with the inconvenience that they might not be Borel random elements of \(l^{\infty }(\Theta )\). Hence in general we may not apply weak convergence to the random processes \(Y_{n}\). Fortunately, for our purposes it is sufficient to ask instead when the sequence \((\sqrt{n}~Y_{n})_{n\in {\mathbb {N}}}\) of empirical processes \(\sqrt{n}~Y_{n}\) converges in law (in the sense of Hoffmann-Jorgensen) to some tight Borel random element of \(l^{\infty }(\Theta )\). Recall that for a sequence \(\big (({\overline{\Omega }}_{n},{\overline{{\mathcal {F}}}}_{n},{\overline{{\mathbb {P}}}}_{n})\big )_{n\in {\mathbb {N}}}\) of probability spaces and a metric space \(({\mathcal {D}},d_{{\mathcal {D}}})\), a sequence \(({\overline{W}}_{n})_{n\in {\mathbb {N}}}\) of mappings \({\overline{W}}_{n}:{\overline{\Omega }}_{n}\rightarrow {\mathcal {D}}\) is said to converge in law (in the sense of Hoffmann-Jorgensen) to a Borel random element \({\overline{W}}\) of \({\mathcal {D}}\) if the convergence \({\mathbb {E}}_{n}^{*}[f({\overline{W}}_{n})]\rightarrow {\mathbb {E}}[f({\overline{W}})]\) holds for every bounded continuous \(f:{\mathcal {D}}\rightarrow {\mathbb {R}}\). Here \({\mathbb {E}}_{n}^{*}\) is used to denote the outer expectation w.r.t. \({\overline{{\mathbb {P}}}}_{n}\). For an introduction and further studies of this kind of convergence we recommend [30], where, however, it is called weak convergence. Note that the mappings are not required to be Borel random elements of \({\mathcal {D}}\). Thus convergence in law differs from the usual weak convergence of Borel random elements. We decided to emphasize this difference by avoiding the term weak convergence. Obviously both concepts coincide if the involved mappings are Borel random elements of \({\mathcal {D}}\).

Let us recall the semimetric \({\overline{d}}_{\Theta }\) on \(\Theta \) defined just before Theorem 2.2. Our basic technical result is the following criterion which guarantees almost sure convergence of \(\big (\Vert Y_{n}\Vert _{\Theta ,\infty }\big )_{n\in {\mathbb {N}}}\) to zero, and convergence in law of \((\sqrt{n}~Y_{n})_{n\in {\mathbb {N}}}\) to some tight centered Gaussian random element of \(l^{\infty }(\Theta )\).

Theorem 5.1

Let (A1), (A2) be fulfilled, where the mapping \(\xi \) from (A2) is square \({\mathbb {P}}^{Z}\)-integrable. Using notation (2.4), if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then the following statements are valid.

  1. (1)

    \(\lim \limits _{n\rightarrow \infty }\Vert Y_{n}\Vert _{\Theta ,\infty } = 0\) \({\mathbb {P}}\)-a.s..

  2. (2)

    \({\overline{d}}_{\Theta }\) is totally bounded, and there exists some tight random element \({\mathfrak {G}}\) of \(l^{\infty }(\Theta )\) such that the sequence \((\sqrt{n}~Y_{n})_{n\in {\mathbb {N}}}\) converges in law to \({\mathfrak {G}}\). This tight random element is a centered Gaussian process \({\mathfrak {G}} = ({\mathfrak {G}}_{\theta })_{\theta \in \Theta }\) which has uniformly continuous paths w.r.t. \({\overline{d}}_{\Theta }\), satisfying in addition

    $$\begin{aligned} {\mathbb {E}}\big [{\mathfrak {G}}_{\theta }\cdot {\mathfrak {G}}_{\vartheta }\big ] = {\mathbb {C}}\text {ov}\big (G(\theta ,Z_{1}), G(\vartheta ,Z_{1})\big )\quad \hbox {for}~\theta , \vartheta \in \Theta . \end{aligned}$$
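As a hedged numerical illustration of this covariance structure (the goal function \(G(\theta ,z) = |z-\theta |\), the standard normal distribution and all sample sizes are arbitrary toy choices): the covariance of the \(\sqrt{n}\)-rescaled sample means across independent replications should match \({\mathbb {C}}\text {ov}\big (G(\theta ,Z_{1}), G(\vartheta ,Z_{1})\big )\).

```python
import numpy as np

rng = np.random.default_rng(2)

def G(theta, z):
    return np.abs(z - theta)             # toy goal function

theta, vartheta, n, reps = 0.3, 0.8, 400, 4000
means = np.empty((reps, 2))
for r in range(reps):                    # independent replications of the sample means
    z = rng.normal(size=n)
    means[r] = [G(theta, z).mean(), G(vartheta, z).mean()]
emp = n * np.cov(means.T)                # covariance matrix of the sqrt(n)-rescaled means
big = rng.normal(size=1_000_000)
target = np.cov(G(theta, big), G(vartheta, big))  # Monte Carlo Cov(G(theta,Z), G(vartheta,Z))
```

Centering by the empirical mean across replications avoids having to know \({\mathbb {E}}[G(\theta ,Z_{1})]\) in closed form.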

Theorem 5.1 has the following corollary which will turn out to be useful in the context of the SAA method under absolute semideviation.

Corollary 5.2

Define for any nonvoid compact interval \({\mathcal {I}}\subseteq {\mathbb {R}}\) the real valued mapping \({\overline{G}}_{{\mathcal {I}}}\) on \((\Theta \times {\mathcal {I}})\times {\mathbb {R}}^{d}\) via \({\overline{G}}_{{\mathcal {I}}}\big ((\theta ,t),z\big ):= \big (G(\theta ,z)- t\big )^{+}\). Then under the assumptions of Theorem 5.1 the mappings

$$\begin{aligned}{} & {} {\overline{X}}^{{\mathcal {I}}}_{n}: (\Theta \times {\mathcal {I}})\times \Omega \rightarrow {\mathbb {R}},\\{} & {} \big ((\theta ,t), \omega \big )\mapsto \frac{1}{n}\sum _{j=1}^{n}{\overline{G}}_{{\mathcal {I}}}\big ((\theta ,t),Z_{j}(\omega )\big ) -{\mathbb {E}}\big [{\overline{G}}_{{\mathcal {I}}}\big ((\theta ,t),Z_{1}\big )\big ]\quad (n\in {\mathbb {N}}) \end{aligned}$$

satisfy \({\overline{X}}^{{\mathcal {I}}}_{n}(\cdot ,\omega )\in l^{\infty }(\Theta \times {\mathcal {I}})\) for every \(\omega \in \Omega \), and the sequence \((\sqrt{n}{\overline{X}}^{{\mathcal {I}}}_{n})_{n\in {\mathbb {N}}}\) converges in law to some tight centered Gaussian random element of \(l^{\infty }(\Theta \times {\mathcal {I}})\).

Proof

We may observe

$$\begin{aligned} \big |{\overline{G}}_{{\mathcal {I}}}\big ((\theta ,t),z\big ) - {\overline{G}}_{{\mathcal {I}}}\big ((\vartheta ,s),z\big )\big |^{2} \le 4\big (|G(\theta ,z) - G(\vartheta ,z)|^{2} + |t - s|^{2}\big ) \end{aligned}$$

for \(\theta ,\vartheta \in \Theta ; t, s\in {\mathbb {R}}\). Property (A2) is assumed to hold for some square \({\mathbb {P}}^{Z}\)-integrable mapping \(\xi \). Then \(\xi + |\inf {\mathcal {I}}| + |\sup {\mathcal {I}}|\) is a square \({\mathbb {P}}^{Z}\)-integrable positive envelope of \(\big \{{\overline{G}}_{{\mathcal {I}}}\big ((\theta ,t),\cdot \big )\mid (\theta ,t)\in \Theta \times {\mathcal {I}} \big \}\). Now, the statement of Corollary 5.2 may be derived easily from Theorem 5.1, using Corollary 2.10.13 from [30]. \(\square \)
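The elementary bound at the start of the proof follows from the positive part being 1-Lipschitz; a brute-force numerical sanity check on random inputs (illustrative only):

```python
import numpy as np

# |(a - t)^+ - (b - s)^+|^2 <= (|a - b| + |t - s|)^2 <= 4 * (|a - b|^2 + |t - s|^2)
rng = np.random.default_rng(3)
a, b, t, s = rng.normal(scale=5.0, size=(4, 100_000))
lhs = (np.maximum(a - t, 0.0) - np.maximum(b - s, 0.0)) ** 2
rhs = 4.0 * ((a - b) ** 2 + (t - s) ** 2)
ok = bool(np.all(lhs <= rhs + 1e-9))     # holds for every sampled quadruple
```

The constant 2 would already suffice; the proof's constant 4 is simply a coarser valid bound.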

6 Proofs

Let us introduce the sequence \(\big (X_{n}\big )_{n\in {\mathbb {N}}}\) of random processes

$$\begin{aligned} X_{n}:\Omega \times \Theta \rightarrow {\mathbb {R}},~X_{n}(\omega ,\theta ):= \frac{1}{n}~\sum _{i=1}^{n}G\big (\theta ,Z_{i}(\omega )\big ) \quad (n\in {\mathbb {N}}), \end{aligned}$$

and under (A2) the mapping \( \psi :\Theta \rightarrow {\mathbb {R}},~\theta \mapsto {\mathbb {E}}[G(\theta ,Z_{1})]. \) First of all we want to establish lower semicontinuity of the mapping \(\psi \).

Lemma 6.1

Under (A1), (A2) the mapping \(\psi \) is lower semicontinuous. It is even continuous if in addition property (A3) holds.

Proof

With \(\xi \) from (A2) the mapping \(G\big (\cdot ,Z_{1}(\omega )\big ) + \xi \big (Z_{1}(\omega )\big )\) is nonnegative and lower semicontinuous for \(\omega \in \Omega \) due to (A2) along with (A1). Then an application of Fatou’s Lemma shows that \(\psi + {\mathbb {E}}[\xi (Z_{1})]\) is lower semicontinuous. This implies lower semicontinuity of \(\psi \). If in addition (A3) is fulfilled, then continuity of \(\psi \) follows directly from Vitali’s theorem (see [1, Proposition 21.4]). This completes the proof. \(\square \)

6.1 Proof of Proposition 2.1

The mapping \(\psi \) is lower semicontinuous due to Lemma 6.1. Then statement 2) follows immediately due to compactness of \(\Theta \).

Concerning statement 1) let \(n\in {\mathbb {N}}\). The mapping \(X_{n}(\omega ,\cdot )\) is bounded from below on \(\Theta \) for every \(\omega \in \Omega \) by (A2). Furthermore

$$\begin{aligned} \big \{\omega \in \Omega \mid \inf _{\theta \in \Theta }X_{n}(\omega ,\theta )< t\big \} = \text {Pr}_{\Omega }\big (\big \{(\omega ,\theta )\in \Omega \times \Theta \mid X_{n}(\omega ,\theta ) < t\big \}\big )\quad \hbox {for}~t\in {\mathbb {R}}, \end{aligned}$$

where \(\text {Pr}_{\Omega }\) denotes the standard projection from \(\Omega \times \Theta \) onto \(\Omega \). By assumption (A1) the set \(\big \{(\omega ,\theta )\in \Omega \times \Theta \mid X_{n}(\omega ,\theta ) < t\big \}\) belongs to \({\mathcal {F}}\otimes {\mathcal {B}}(\Theta )\) for every \(t\in {\mathbb {R}}\). Since \(\Theta \) is a Polish space, and since \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\) is complete, we may conclude that the set \(\big \{\omega \in \Omega \mid \inf _{\theta \in \Theta }X_{n}(\omega ,\theta ) < t\big \}\) is a member of \({\mathcal {F}}\) (see [6, Proposition 8.4.4]). In particular, \(\inf _{\theta \in \Theta }X_{n}(\cdot ,\theta )\) is a random variable on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\), which completes the proof. \(\Box \)

6.2 Proof of Theorem 5.1 and Theorem 2.2

Let \((Y_{n})_{n\in {\mathbb {N}}}\) be the sequence of stochastic processes introduced in Sect. 5. In order to show the desired convergences of \(\big (\Vert Y_{n}\Vert _{\Theta ,\infty }\big )_{n\in {\mathbb {N}}}\) and \((\sqrt{n}~Y_{n})_{n\in {\mathbb {N}}}\) in \(l^{\infty }(\Theta )\) we shall invoke results from empirical process theory. We have to circumvent some subtleties of measurability, recalling the notion of \({\mathbb {P}}^{Z}\)-measurable classes. A class \({\mathbb {F}}\) of Borel measurable mappings from \({\mathbb {R}}^{d}\) into \({\mathbb {R}}\) is called a \({\mathbb {P}}^{Z}\)-measurable class if for every \(n\in {\mathbb {N}}\) and \(a_{1},\ldots ,a_{n}\in {\mathbb {R}}\) the mapping

$$\begin{aligned} {\mathbb {R}}^{nd}\rightarrow {\mathbb {R}},~(z_{1},\ldots ,z_{n})\mapsto \sup _{h\in {\mathbb {F}}}\big |\sum _{j=1}^{n}a_{j} h(z_{j})\big | \end{aligned}$$

is well-defined and measurable on the completion of the n-times product probability space \(\big ({\mathbb {R}}^{nd},{\mathcal {B}}({\mathbb {R}}^{nd}),({\mathbb {P}}^{Z})^{n}\big )\) of \(\big ({\mathbb {R}}^{d},{\mathcal {B}}({\mathbb {R}}^{d}),{\mathbb {P}}^{Z}\big )\) (see [30, Definition 2.3.3]).

Define for a nonvoid \(\Gamma \subseteq \Theta \times \Theta \) the function class \({\mathbb {F}}_{\Gamma }^{\Theta }\) consisting of all \(G(\theta ,\cdot ) - G(\vartheta ,\cdot )\) with \((\theta , \vartheta )\in \Gamma \), and set \({\mathbb {F}}^{\Theta ,2}_{\Gamma }:= \big \{|G(\theta ,\cdot ) - G(\vartheta ,\cdot )|^{2}\mid (\theta , \vartheta )\in \Gamma \big \}\). If \(\Gamma \) is a Borel subset of \(\Theta \times \Theta \), these classes are already \({\mathbb {P}}^{Z}\)-measurable classes, which is the subject of the following result.

Lemma 6.2

Let \(\Gamma \) be some nonvoid Borel subset of \(\Theta \times \Theta \). If (A1) and (A2) are satisfied, then \({\mathbb {F}}^{\Theta }\), \({\mathbb {F}}^{\Theta }_{\Gamma }\) and \({\mathbb {F}}^{\Theta ,2}_{\Gamma }\) are \({\mathbb {P}}^{Z}\)-measurable classes.

Proof

Under (A1) and (A2), for \(n\in {\mathbb {N}}\), \((a_{1},\ldots ,a_{n})\in {\mathbb {R}}^{n}\) the processes

$$\begin{aligned}&\big ((z_{1},\ldots ,z_{n}), (\theta ,\vartheta )\big )\mapsto \sum _{j=1}^{n}a_{j}~\big [G(\theta ,z_{j}) - G(\vartheta ,z_{j})\big ]\\&\big ((z_{1},\ldots ,z_{n}), (\theta ,\vartheta )\big )\mapsto \sum _{j=1}^{n}a_{j}~\big [G(\theta ,z_{j}) - G(\vartheta ,z_{j})\big ]^{2} \end{aligned}$$

on \({\mathbb {R}}^{d n}\times \Gamma \) are measurable w.r.t. the product \({\mathcal {B}}({\mathbb {R}}^{d n})\otimes {\mathcal {B}}(\Gamma )\) of the Borel \(\sigma \)-algebra \({\mathcal {B}}({\mathbb {R}}^{d n})\) on \({\mathbb {R}}^{dn}\) and the Borel \(\sigma \)-algebra \({\mathcal {B}}(\Gamma )\) on \(\Gamma \) with

$$\begin{aligned} \sup _{(\theta ,\vartheta )\in \Gamma }\big |\sum _{j=1}^{n}a_{j}~[G(\theta ,z_{j}) - G(\vartheta ,z_{j})]\big |~\vee ~ \sup _{(\theta ,\vartheta )\in \Gamma }\big |\sum _{j=1}^{n}a_{j}~[G(\theta ,z_{j}) - G(\vartheta ,z_{j})]^{2}\big | < \infty \end{aligned}$$

for \((z_{1},\ldots ,z_{n})\in {\mathbb {R}}^{dn}\). Since \(\Gamma \) is a Borel subset of a Polish space, the mappings

$$\begin{aligned}&{\mathbb {R}}^{dn}\rightarrow {\mathbb {R}},~(z_{1},\ldots ,z_{n})\mapsto \sup _{(\theta ,\vartheta )\in \Gamma }\big |\sum _{j=1}^{n}a_{j}~[G(\theta ,z_{j}) - G(\vartheta ,z_{j})]\big |\\&{\mathbb {R}}^{dn}\rightarrow {\mathbb {R}},~(z_{1},\ldots ,z_{n})\mapsto \sup _{(\theta ,\vartheta )\in \Gamma }\big |\sum _{j=1}^{n}a_{j}~[G(\theta ,z_{j}) - G(\vartheta ,z_{j})]^{2}\big | \end{aligned}$$

are random variables on the completion of the probability space \(\big ({\mathbb {R}}^{dn},{\mathcal {B}}({\mathbb {R}}^{d n}),({\mathbb {P}}^{Z})^{n}\big )\) for \(n\in {\mathbb {N}}\) and \((a_{1},\ldots ,a_{n})\in {\mathbb {R}}^{n}\) (see [30, Example 1.7.5]).

In exactly the same way we may also verify \({\mathbb {F}}^{\Theta }\) as a \({\mathbb {P}}^{Z}\)-measurable class. This completes the proof. \(\square \)

Now, we are ready to show Theorem 5.1.

Proof of Theorem 5.1:

Let \(({\overline{\Omega }},\overline{{{{\mathcal {F}}}}},{\overline{{\mathbb {P}}}})= \big (({\mathbb {R}}^{d})^{{\mathbb {N}}},{\mathcal {B}}({\mathbb {R}}^{d})^{\otimes {\mathbb {N}}},({\mathbb {P}}^{Z})^{{\mathbb {N}}}\big )\) be the countable product probability space of \(\big ({\mathbb {R}}^{d},{\mathcal {B}}({\mathbb {R}}^{d}), {\mathbb {P}}^{Z}\big )\). Furthermore for \(j\in {\mathbb {N}}\), the mapping \(\pi _{j}: ({\mathbb {R}}^{d})^{{\mathbb {N}}}\rightarrow {\mathbb {R}}^{d}\) is defined by \(\pi _{j}\big ((z_{i})_{i\in {\mathbb {N}}}\big ) = z_{j}\).

For any real valued mapping f on \({\overline{\Omega }}\) the set \({\mathcal {E}}_{f}\) of all mappings \({\overline{f}}: {\overline{\Omega }}\rightarrow {\mathbb {R}}\cup \{\infty \}\) satisfying \({\overline{f}}\ge f\) pointwise and \(\{{\overline{f}} > x\}\in {\overline{{\mathcal {F}}}}\) for \(x\in {\mathbb {R}}\) is nonvoid. Moreover, there exists some \(f^{*}\in {\mathcal {E}}_{f}\) such that \(f^{*}\le {\overline{f}}\) \({\overline{{\mathbb {P}}}}\)-a.s. for every \({\overline{f}}\in {\mathcal {E}}_{f}\) (see [30, Lemma 1.2.1]). It is also known that \((\lambda f)^{*} = \lambda f^{*}\) holds for any \(\lambda > 0\) (see [30, Lemma 1.2.2]).

The mappings \(\big (\sup _{\theta \in \Theta }|\sum _{j=1}^{n}G(\theta ,\pi _{j})/n - {\mathbb {E}}_{{\overline{{\mathbb {P}}}}}[G(\theta ,\pi _{1})]|\big )^{*}\) have finite values if (A2) is fulfilled. Then by Markov’s inequality

$$\begin{aligned}&{\overline{{\mathbb {P}}}}\Big (\Big \{\big (\sup _{\theta \in \Theta }|\frac{1}{n}\sum _{j=1}^{n}G(\theta ,\pi _{j}) - {\mathbb {E}}_{{\overline{{\mathbb {P}}}}}[G(\theta ,\pi _{1})]|\big )^{*} > \varepsilon \Big \}\Big ) \\&\le {\mathbb {E}}_{{\overline{{\mathbb {P}}}}}\Big [\big (\sup _{\theta \in \Theta }|\frac{1}{\sqrt{n}}\sum _{j=1}^{n}G(\theta ,\pi _{j}) - \sqrt{n}{\mathbb {E}}_{{\overline{{\mathbb {P}}}}}[G(\theta ,\pi _{1})]|\big )^{*}\Big ]/(\sqrt{n}\varepsilon ) \quad \hbox {for}~n\in {\mathbb {N}}. \end{aligned}$$

Since \({\mathbb {F}}^{\Theta }\) is a \({\mathbb {P}}^{Z}\)-measurable class by Lemma 6.2, and since the positive envelope \(\xi \) is assumed to be square \({\mathbb {P}}^{Z}\)-integrable, we may apply Theorem 2.14.1 from [30] to find a constant \(M > 0\) such that

$$\begin{aligned}{} & {} {\mathbb {E}}_{{\overline{{\mathbb {P}}}}}\Big [\big (\sup _{\theta \in \Theta }|\frac{1}{\sqrt{n}}\sum _{j=1}^{n}G(\theta ,\pi _{j}) - \sqrt{n}{\mathbb {E}}_{{\overline{{\mathbb {P}}}}}[G(\theta ,\pi _{1})]|\big )^{*}\Big ]\\{} & {} \quad \le M \big [1 + {\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\big ]~\Vert \xi \Vert _{{\mathbb {P}}^{Z},2} \quad \hbox {for}~n\in {\mathbb {N}}. \end{aligned}$$

In particular, by finiteness of \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\), we end up with

$$\begin{aligned} \lim _{n\rightarrow \infty }{\overline{{\mathbb {P}}}}\Big (\Big \{\big (\sup _{\theta \in \Theta }|\frac{1}{n}\sum _{j=1}^{n}G(\theta ,\pi _{j}) - {\mathbb {E}}_{{\overline{{\mathbb {P}}}}}[G(\theta ,\pi _{1})]|\big )^{*} > \varepsilon \Big \}\Big ) = 0. \end{aligned}$$

Then in view of Corollary 3.7.9 from [12] statement 1) follows immediately.

Concerning statement 2) let for \(\delta > 0\) the set \(\Gamma _{\delta }\) consist of all \((\theta ,\vartheta )\in \Theta \times \Theta \) such that \(\Vert G(\theta ,\cdot ) - G(\vartheta ,\cdot )\Vert _{{\mathbb {P}}^{Z},2} < \delta \). Note that all \(G(\theta ,\cdot )\) are square \({\mathbb {P}}^{Z}\)-integrable because the positive envelope \(\xi \) of \({\mathbb {F}}^{\Theta }\) is assumed to be square \({\mathbb {P}}^{Z}\)-integrable. The mapping \(\big ((\theta ,\vartheta ),z\big )\mapsto |G(\theta ,z) - G(\vartheta ,z)|^{2}\) on \((\Theta \times \Theta )\times {\mathbb {R}}^{d}\) is measurable w.r.t. the product \(\sigma \)-algebra \({\mathcal {B}}(\Theta \times \Theta )\otimes {\mathcal {B}}({\mathbb {R}}^{d})\) of the Borel \(\sigma \)-algebra \({\mathcal {B}}(\Theta \times \Theta )\) and the Borel \(\sigma \)-algebra \({\mathcal {B}}({\mathbb {R}}^{d})\) on \({\mathbb {R}}^{d}\) due to (A1). Furthermore \(|G(\theta ,\cdot ) - G(\vartheta ,\cdot )|^{2}\) is \({\mathbb {P}}^{Z}\)-integrable for \(\theta , \vartheta \in \Theta \). Then by Tonelli’s theorem, \(\phi (\theta ,\vartheta ):= \Vert G(\theta ,\cdot ) - G(\vartheta ,\cdot )\Vert _{{\mathbb {P}}^{Z},2}\) defines a mapping \(\phi :\Theta \times \Theta \rightarrow {\mathbb {R}}\) which is Borel measurable, and thus \(\Gamma _{\delta }\) is a Borel subset of \(\Theta \times \Theta \). So in view of Lemma 6.2 the set \({\mathbb {F}}^{\Theta }_{\Gamma _{\delta }}\) is a \({\mathbb {P}}^{Z}\)-measurable class. We also know from Lemma 6.2 that \({\mathbb {F}}^{\Theta ,2}_{\Theta \times \Theta }\) is a \({\mathbb {P}}^{Z}\)-measurable class.

Since in addition \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, we are in the position to apply Theorem 2.5.2 from [30]. According to this result we may immediately conclude the entire statement 2), which completes the proof. \(\Box \)

Let us turn to the proof of Theorem 2.2. It is a straightforward consequence of Theorem 5.1 via the functional delta method.

Proof of Theorem 2.2:

In view of the convergence result for \((\sqrt{n}~Y_{n})_{n\in {\mathbb {N}}}\) by Theorem 5.1, the functional delta method for infimal value mappings on \(l^{\infty }(\Theta )\) (see [26, Proposition 1 with Theorem 1]) yields the claimed convergence of the sequence (1.3) in Theorem 2.2. The remaining part of Theorem 2.2 is an obvious consequence. \(\Box \)

6.3 Proof of Proposition 3.2

Let us recall the random processes \(X_{n}\) and the mapping \(\psi \) introduced at the beginning of this section. For abbreviation we introduce the sequence \((V_{n})_{n\in {\mathbb {N}}}\) of random processes \(V_{n}: \Theta \times \Omega \rightarrow {\mathbb {R}}\), defined by

$$\begin{aligned} V_{n}(\theta ,\omega ) = \frac{1}{n} \sum _{j=1}^{n}\Big (G\big (\theta ,Z_{j}(\omega )\big ) - X_{n}(\omega ,\theta )\Big )^{+} - {\mathbb {E}}\big [\big (G\big (\theta ,Z_{1}\big ) - X_{n}(\omega ,\theta )\big )^{+}\big ]. \end{aligned}$$

These random processes are building blocks of the following mappings

$$\begin{aligned}{} & {} W_{n}: \Theta \times \Omega \rightarrow {\mathbb {R}},\\{} & {} (\theta ,\omega )\mapsto \sqrt{n}~\big [V_{n}(\theta ,\omega ) - \frac{1}{n}\sum _{j=1}^{n}\big (G_{1}(\theta ,Z_{j}(\omega )) - {\mathbb {E}}\big [G_{1}(\theta ,Z_{1})\big ]\big )\big ]~(n\in {\mathbb {N}}). \end{aligned}$$

The sequence \((U_{n})_{n\in {\mathbb {N}}}\) of mappings \(U_{n}: \Theta \times \Omega \rightarrow {\mathbb {R}}\), defined by

$$\begin{aligned} U_{n}(\theta ,\omega )= & {} \sqrt{n}~{\mathbb {E}}\big [\big (G(\theta ,Z_{1})- X_{n}(\omega ,\theta )\big )^{+} - G_{1}(\theta ,Z_{1})\big ] \\{} & {} + \sqrt{n}~{\overline{F}}_{\theta }\big (\psi (\theta )\big )~[X_{n}(\omega ,\theta ) - \psi (\theta )] \end{aligned}$$

will also play an important role.
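Although not part of the proof, the centering in \(V_{n}\) may be illustrated numerically. The following minimal Python sketch is purely illustrative: the toy goal function \(G(\theta ,z)=\theta z\), the standard normal samples, and the assumption that \(X_{n}\) is the sample-mean process \(\frac{1}{n}\sum _{j=1}^{n}G(\theta ,Z_{j})\) are our choices, not part of the paper's setting; the expectation is approximated by a large independent sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(theta, z):
    # toy goal function (our illustrative choice); any measurable G works
    return theta * z

def V_n(theta, z_sample, big_sample):
    # X_n assumed here to be the sample-mean process (illustrative assumption)
    x_n = np.mean(G(theta, z_sample))
    # empirical average of the positive parts
    emp = np.mean(np.maximum(G(theta, z_sample) - x_n, 0.0))
    # E[(G(theta, Z_1) - X_n)^+], approximated via a large independent sample
    pop = np.mean(np.maximum(G(theta, big_sample) - x_n, 0.0))
    return emp - pop

z = rng.standard_normal(1000)       # i.i.d. sample Z_1, ..., Z_n
big = rng.standard_normal(200_000)  # proxy for the distribution of Z
print(V_n(1.0, z, big))             # fluctuates around zero
```

For fixed \(\theta \), both terms estimate the same expected positive part, so \(V_{n}(\theta ,\cdot )\) fluctuates around zero at rate \(n^{-1/2}\).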

We start with the following observation

$$\begin{aligned} \sqrt{n}~\Big [{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big )_{|\omega } - \frac{1}{n}\sum _{j=1}^{n}{\widehat{G}}_{1,a}\big (\theta ,Z_{j}(\omega )\big ) \Big ] = a ~W_{n}(\theta ,\omega ) + a ~U_{n}(\theta ,\omega ) \end{aligned}$$

for \(\omega \in \Omega \) and \(\theta \in \Theta \). In particular

$$\begin{aligned}{} & {} \sqrt{n}~\Big |\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big )_{|\omega } - \inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}{\widehat{G}}_{1,a}\big (\theta ,Z_{j}(\omega )\big ) \Big |\nonumber \\{} & {} \le a~\sup _{\theta \in \Theta }|W_{n}(\theta ,\omega )| + a~\sup _{\theta \in \Theta }|U_{n}(\theta ,\omega )| \quad \hbox {for}~\omega \in \Omega . \end{aligned}$$
(6.1)

For further preparation, combining Theorem 5.1 with Egorov’s theorem, we may select some sequence \((\Omega _{k})_{k\in {\mathbb {N}}}\) in \({\mathcal {F}}\) satisfying

$$\begin{aligned} {\mathbb {P}}(\Omega _{k}) \ge 1 - 1/2^{k}\quad \hbox {and}\quad \lim _{n\rightarrow \infty }\sup _{\omega \in \Omega _{k}}\sup _{\theta \in \Theta }\big |X_{n}(\omega ,\theta ) - \psi (\theta )\big | = 0\quad \hbox {for}~ k\in {\mathbb {N}}. \end{aligned}$$
(6.2)

The next result deals with the asymptotics of the first summand in (6.1).

Proposition 6.3

Let (A1), (A2) be fulfilled, where the mapping \(\xi \) from (A2) is square \({\mathbb {P}}^{Z}\)-integrable. Furthermore let \(F_{\theta }\) be continuous at \({\mathbb {E}}[G(\theta ,Z_{1})]\) for every \(\theta \in \Theta \). Using notation (2.4), if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}^{*}\big (\big \{\sup _{\theta \in \Theta }|W_{n}(\theta ,\cdot )|> \varepsilon \big \}\big ) = 0\quad \hbox {for}~\varepsilon > 0, \end{aligned}$$

where \({\mathbb {P}}^{*}\) denotes the outer probability of \({\mathbb {P}}\).

Proof

For \(\varepsilon > 0\) and \(n\in {\mathbb {N}}\) define the set \(B_{n}^{\varepsilon }\) to consist of all \(\omega \in \Omega \) such that \(\sup _{\theta \in \Theta }|W_{n}(\theta ,\omega )| > \varepsilon \). We select a sequence \((\Omega _{k})_{k\in {\mathbb {N}}}\) of events as in (6.2). By the choice of these events it suffices to show \({\mathbb {P}}^{*}(B_{n}^{\varepsilon }\cap \Omega _{k})\rightarrow 0\) for every \(k\in {\mathbb {N}}\). So let us fix any \(k\in {\mathbb {N}}\).

Set \({\mathcal {I}}:= \big [-1 - {\mathbb {E}}[\xi (Z_{1})], 1 + {\mathbb {E}}[\xi (Z_{1})]\big ]\) and let us recall the mapping \({\overline{G}}_{{\mathcal {I}}}\) and the random processes \({\overline{X}}_{n}^{{\mathcal {I}}}\) defined in Corollary 5.2. Note that \(V_{n}(\theta ,\omega ) = {\overline{X}}_{n}^{{\mathcal {I}}}\big ((\theta ,X_{n}(\omega ,\theta )),\omega \big )\) holds for \(X_{n}(\omega ,\theta )\in {\mathcal {I}}\).

From Corollary 5.2 we already know that the sequence \((\sqrt{n}~{\overline{X}}^{{\mathcal {I}}}_{n})_{n\in {\mathbb {N}}}\) converges in law to some tight Gaussian random element of \(l^{\infty }(\Theta \times {\mathcal {I}})\). This means

$$\begin{aligned} \lim _{\delta \searrow 0}\limsup _{n\rightarrow \infty }{\mathbb {P}}^{*}\Big (\Big \{\sup _{\begin{array}{c} (\theta ,t),(\vartheta ,s)\in \Theta \times {\mathcal {I}}\\ {\overline{d}}((\theta ,t),(\vartheta ,s)) < \delta \end{array}}\big |\sqrt{n}~{\overline{X}}^{{\mathcal {I}}}_{n}\big ((\theta ,t),\cdot \big )- \sqrt{n}~{\overline{X}}^{{\mathcal {I}}}_{n}\big ((\vartheta ,s),\cdot \big )\big | > \varepsilon \Big \}\Big ) = 0,\nonumber \\ \end{aligned}$$
(6.3)

where \({\overline{d}}\) denotes the semimetric on \(\Theta \times {\mathcal {I}}\), defined by

$$\begin{aligned} {\overline{d}}\big ((\theta ,t),(\vartheta ,s)\big ) = \sqrt{{\mathbb {V}}\text {ar}\Big ({\overline{G}}_{{\mathcal {I}}}\big ((\theta ,t),Z_{1}\big ) - {\overline{G}}_{{\mathcal {I}}}\big ((\vartheta ,s),Z_{1}\big )\Big )} \end{aligned}$$

(see e.g. [30, Example 1.5.10]).

In view of (6.2) there is some \(n_{0}\in {\mathbb {N}}\) such that \(X_{n}(\omega ,\theta )\in {\mathcal {I}}\) for \(\omega \in \Omega _{k}\), \(\theta \in \Theta \), and \(n\in {\mathbb {N}}\) with \(n\ge n_{0}\). This implies

$$\begin{aligned} |W_{n}(\theta ,\omega )| = \Big |\sqrt{n}~{\overline{X}}^{{\mathcal {I}}}_{n}\Big (\big (\theta ,X_{n}(\omega ,\theta )\big ),\omega \Big )- \sqrt{n}~{\overline{X}}^{{\mathcal {I}}}_{n}\Big (\big (\theta ,\psi (\theta )\big ),\omega \Big )\Big | \end{aligned}$$
(6.4)

for \(\omega \in \Omega _{k}\), \(\theta \in \Theta \), and \(n\in {\mathbb {N}}\) with \(n\ge n_{0}\). Furthermore by (6.2)

$$\begin{aligned} \limsup _{n\rightarrow \infty }\sup _{\theta \in \Theta }\sup _{\omega \in \Omega _{k}}{\overline{d}}\Big (\big (\theta ,X_{n}(\omega ,\theta )\big ),\big (\theta ,\psi (\theta )\big )\Big )\le \lim _{n\rightarrow \infty }\sup _{\theta \in \Theta }\sup _{\omega \in \Omega _{k}}\big |X_{n}(\omega ,\theta ) - \psi (\theta )\big | = 0.\nonumber \\ \end{aligned}$$
(6.5)

Combining (6.3), (6.4) and (6.5), we end up with \({\mathbb {P}}^{*}(B_{n}^{\varepsilon }\cap \Omega _{k})\rightarrow 0\) for \(n\rightarrow \infty \) which completes the proof. \(\square \)

We now turn to the asymptotics of the second summand in (6.1).

Proposition 6.4

Let (A1)–(A3) be fulfilled, where the mapping \(\xi \) from (A2) is square \({\mathbb {P}}^{Z}\)-integrable. Furthermore let \(F_{\theta }\) be continuous at \({\mathbb {E}}[G(\theta ,Z_{1})]\) for every \(\theta \in \Theta \). Using notation (2.4), if \({\overline{J}}({\mathbb {F}}^{\Theta },\xi ,1)\) is finite, then

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}^{*}\big (\big \{\sup _{\theta \in \Theta }|U_{n}(\theta ,\cdot )|> \varepsilon \big \}\big ) = 0\quad \hbox {for}~\varepsilon > 0, \end{aligned}$$

where \({\mathbb {P}}^{*}\) stands for the outer probability of \({\mathbb {P}}\).

Proof

Let us recall the sequence \((Y_{n})_{n\in {\mathbb {N}}}\) introduced at the beginning of Sect. 5, and note that \(\Vert Y_{n}\Vert _{\Theta ,\infty }\) is a random variable on \((\Omega ,{{{\mathcal {F}}}},{\mathbb {P}})\). We may apply Theorem 5.1 to conclude that the sequence \((\sqrt{n}~Y_{n})_{n\in {\mathbb {N}}}\) converges in law to some tight random element of \(l^{\infty }(\Theta )\). Since the norm \(\Vert \cdot \Vert _{\Theta ,\infty }\) is a continuous function on \(l^{\infty }(\Theta )\), the application of the continuous mapping theorem for convergence in law (see [30, Theorem 1.11.1]) yields that \((\sqrt{n}~ \Vert Y_{n}\Vert _{\Theta ,\infty })_{n\in {\mathbb {N}}}\) converges weakly to some random variable. In particular, by Prokhorov’s theorem, this sequence of random variables is uniformly tight. Hence we may find some strictly increasing sequence \((a_{k})_{k\in {\mathbb {N}}}\) of positive real numbers such that the inequality \({\mathbb {P}}(\{\sqrt{n}~\Vert Y_{n}\Vert _{\Theta ,\infty }\le a_{k}\})\ge 1 - 1/2^{k}\) holds for \(k,n\in {\mathbb {N}}\).
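The selection of the levels \(a_{k}\) can be mimicked empirically. In the following toy Python sketch (our illustration, not the setting of the paper), \(a_{k}\) is taken as the empirical \((1-2^{-k})\)-quantile of simulated values of \(\sqrt{n}\,\Vert Y_{n}\Vert _{\Theta ,\infty }\) for a simple linear goal function on a finite grid standing in for the compact set \(\Theta \):

```python
import numpy as np

rng = np.random.default_rng(2)
thetas = np.linspace(0.0, 1.0, 50)  # grid standing in for the compact set Theta

def sup_norm_Yn(n):
    # toy centered process: Y_n(theta) = (1/n) sum theta*Z_j - E[theta*Z_1]
    z = rng.standard_normal(n)
    return np.max(np.abs(thetas * z.mean()))

n = 400
samples = np.array([np.sqrt(n) * sup_norm_Yn(n) for _ in range(2000)])
# a_k chosen so that P(sqrt(n) * sup |Y_n| <= a_k) >= 1 - 2^{-k}
a = {k: float(np.quantile(samples, 1.0 - 2.0 ** (-k))) for k in (1, 2, 3)}
print(a)  # increasing thresholds a_1 < a_2 < a_3
```

Uniform tightness guarantees that such thresholds can be chosen uniformly in \(n\); the simulation merely illustrates the quantile construction for one fixed \(n\).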

For \(k,n\in {\mathbb {N}}\), \(\theta \in \Theta \) and \(\omega \in A_{kn}:= \{\sqrt{n}~\Vert Y_{n}\Vert _{\Theta ,\infty }\le a_{k}\}\) we have

$$\begin{aligned}&|U_{n}(\theta ,\omega )|\nonumber \\&\nonumber = \sqrt{n}~\big |{\mathbb {E}}\big [\big (G(\theta ,Z_{1}) - \psi (\theta ) - Y_{n}(\theta ,\omega )\big )^{+}- \big (G(\theta ,Z_{1}) - \psi (\theta )\big )^{+}\big ] + {\overline{F}}_{\theta }\big (\psi (\theta )\big ) Y_{n}(\theta ,\omega )\big |\\&\nonumber = \sqrt{n}~\int _{- Y_{n}(\theta ,\omega )^{-}}^{0}\hspace{-1cm}\big [{\overline{F}}_{\theta }\big (\psi (\theta ) + t\big ) - {\overline{F}}_{\theta }\big (\psi (\theta )\big )\big ]~dt ~-~ \sqrt{n}~\int ^{Y_{n}(\theta ,\omega )^{+}}_{0}\hspace{-1cm}\big [{\overline{F}}_{\theta }\big (\psi (\theta ) + t\big ) - {\overline{F}}_{\theta }\big (\psi (\theta )\big )\big ]~dt\\&\nonumber = \sqrt{n}~\int ^{Y_{n}(\theta ,\omega )^{-}}_{0}\hspace{-1cm}\big [{\overline{F}}_{\theta }\big (\psi (\theta ) - t\big ) - {\overline{F}}_{\theta }\big (\psi (\theta )\big )\big ]~dt ~-~ \sqrt{n}~\int ^{Y_{n}(\theta ,\omega )^{+}}_{0}\hspace{-1cm}\big [{\overline{F}}_{\theta }\big (\psi (\theta ) + t\big ) - {\overline{F}}_{\theta }\big (\psi (\theta )\big )\big ]~dt\\&\nonumber \le \sqrt{n}~\int ^{a_{k}/\sqrt{n}}_{0}\big [F_{\theta }\big (\psi (\theta ) + t\big ) - F_{\theta }\big (\psi (\theta ) - t\big )\big ]~dt\\&= \int _{0}^{a_{k}}\big [F_{\theta }\big (\psi (\theta ) + s/\sqrt{n}\big ) - F_{\theta }\big (\psi (\theta ) - s/\sqrt{n}\big )\big ]~ds =: b_{k,n}(\theta ), \end{aligned}$$
(6.6)

where in the last step we have used the change of variables \(s = \sqrt{n}\,t\).
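The computation above rests on the elementary identity \({\mathbb {E}}[(X-a)^{+}]=\int _{a}^{\infty }{\overline{F}}_{X}(t)\,dt\), whose differencing at two thresholds produces the integrals of \({\overline{F}}_{\theta }\). A quick numerical sanity check (purely illustrative, with an Exp(1) variable where both sides equal \(e^{-a}\)):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=500_000)  # Exp(1): survival function exp(-t)

a = 0.7
# left-hand side: Monte Carlo estimate of E[(X - a)^+]
lhs = np.mean(np.maximum(x - a, 0.0))
# right-hand side: trapezoidal integral of the survival function over [a, 20]
ts = np.linspace(a, 20.0, 100_001)
vals = np.exp(-ts)
rhs = np.sum((vals[:-1] + vals[1:]) / 2.0) * (ts[1] - ts[0])
print(lhs, rhs, math.exp(-a))  # all three approximately equal
```

Truncating the integral at 20 is harmless here since the survival function of Exp(1) is of order \(e^{-20}\) beyond that point.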

Next, let \((\theta _{k,n})_{n\in {\mathbb {N}}}\) be a sequence in \(\Theta \) satisfying \(b_{k,n}(\theta _{k,n}) > \sup _{\vartheta \in \Theta }b_{k,n}(\vartheta ) - 1/n\) for every \(n\in {\mathbb {N}}\). By compactness of \(\Theta \) any subsequence \((\theta _{k,i(n)})_{n\in {\mathbb {N}}}\) of \((\theta _{k,n})_{n\in {\mathbb {N}}}\) has a further subsequence \((\theta _{k,\varphi (i(n))})_{n\in {\mathbb {N}}}\) which converges to some \(\theta \in \Theta \) w.r.t. the Euclidean metric. The mapping \(\psi \) is continuous under assumptions (A1)–(A3) due to Lemma 6.1. Hence \(\psi (\theta _{k,\varphi (i(n))}) + s/\sqrt{\varphi (i(n))}\rightarrow \psi (\theta )\) and \(\psi (\theta _{k,\varphi (i(n))}) - s/\sqrt{\varphi (i(n))}\rightarrow \psi (\theta )\) for \(s\in [0,a_{k}]\). Since \(F_{\theta }\) is assumed to be continuous at \(\psi (\theta )\), we may invoke Lemma 3.1 to end up with

$$\begin{aligned} \big [F_{\theta _{k,\varphi (i(n))}}\big (\psi (\theta _{k,\varphi (i(n))}) \!+\! s/\sqrt{\varphi (i(n))}\big )\!-\! F_{\theta _{k,\varphi (i(n))}}\big (\psi (\theta _{k,\varphi (i(n))}) \!-\! s/\sqrt{\varphi (i(n))}\big )\big ]\!\rightarrow \! 0 \end{aligned}$$

for \(s\in [0,a_{k}]\). Then by the dominated convergence theorem \(b_{k,\varphi (i(n))}(\theta _{k,\varphi (i(n))})\rightarrow 0\), and thus \(\sup _{\vartheta \in \Theta }b_{k,\varphi (i(n))}(\vartheta )\rightarrow 0\) due to the choice of the sequence \((\theta _{k,n})_{n\in {\mathbb {N}}}\). Therefore we may conclude

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\vartheta \in \Theta }b_{k,n}(\vartheta ) = 0\quad \hbox {for}~k\in {\mathbb {N}}. \end{aligned}$$
(6.7)

Putting (6.6) and (6.7) together we may conclude

$$\begin{aligned} \limsup _{n\rightarrow \infty }{\mathbb {P}}^{*}\big (\big \{\sup _{\theta \in \Theta }|U_{n}(\theta ,\cdot )|> \varepsilon \big \}\big )&\le \limsup _{n\rightarrow \infty }{\mathbb {P}}^{*}\big (\big \{\sup _{\theta \in \Theta }|U_{n}(\theta ,\cdot )| > \varepsilon \big \}\cap A_{kn}\big ) + 1/2^{k}\\&= 1/2^{k} \end{aligned}$$

for \(\varepsilon > 0\) and \(k\in {\mathbb {N}}\). This completes the proof. \(\square \)

Now we are ready to show Proposition 3.2.

Proof of Proposition 3.2:

Putting (6.1) together with Proposition 6.3 and Proposition 6.4, we obtain for any \(\varepsilon > 0\)

$$\begin{aligned}&\limsup _{n\rightarrow \infty }{\mathbb {P}}^{*}\Big (\Big \{\big |\sqrt{n}\inf _{\theta \in \Theta }{\mathcal {R}}_{\rho _{1,a}}\big ({\hat{F}}_{n,\theta }\big ) - \sqrt{n}\inf _{\theta \in \Theta }\frac{1}{n}\sum _{j=1}^{n}{\widehat{G}}_{1,a}(\theta ,Z_{j})\big |> \varepsilon \Big \}\Big )\\&\le \limsup _{n\rightarrow \infty }{\mathbb {P}}^{*}\big (\big \{\sup _{\theta \in \Theta }|W_{n}(\theta ,\cdot )| + \sup _{\theta \in \Theta }|U_{n}(\theta ,\cdot )|> \varepsilon /a\big \}\big )\\&\le \limsup _{n\rightarrow \infty }\big [{\mathbb {P}}^{*}\big (\big \{\sup _{\theta \in \Theta }|W_{n}(\theta ,\cdot )|> \varepsilon /(2 a)\big \}\big ) + {\mathbb {P}}^{*}\big (\big \{ \sup _{\theta \in \Theta }|U_{n}(\theta ,\cdot )| > \varepsilon /(2 a)\big \}\big )\big ] = 0. \end{aligned}$$

This completes the proof. \(\Box \)