1 Introduction

In stochastic optimization, stability usually refers to the continuity properties of optimal values and solution sets as mappings from a set of probability measures, endowed with a suitable distance, into the extended reals and solution space, respectively, see [22]. The distance on the space of probability measures must be selected in order to allow the estimation of differences of the relevant functions, which depend on probability measures. There exists a wide variety of possible distances of probability measures based on various constructions [19, 27]. In the present context, distances with \(\zeta \)-structure introduced first in [27] appear as a natural choice. For a given metric space \(\varOmega \) such a distance is of the form

$$\begin{aligned} d_{\mathfrak {F}}(\mathbb {P},\mathbb {Q}) = \sup _{f \in \mathfrak {F}} \left| \int _{\varOmega } f(\omega ) \ \mathrm {d} \mathbb {P}(\omega ) - \int _{\varOmega } f(\omega ) \ \mathrm {d} \mathbb {Q}(\omega ) \right| , \end{aligned}$$
(1)

where \(\mathfrak {F}\) is a family of Borel measurable functions from \(\varOmega \) to \(\overline{\mathbb {R}}\) and \(\mathbb {P}\), \(\mathbb {Q}\) are Borel probability measures on \(\varOmega \). Note that the distance \(d_{\mathfrak {F}}\) is non-negative, symmetric and satisfies the triangle inequality. It also satisfies \(d_{\mathfrak {F}}(\mathbb {P},\mathbb {P})=0\) and is thus a probability metric in the sense of [27]. However, \(d_{\mathfrak {F}}(\mathbb {P},\mathbb {Q})=0\) implies \(\mathbb {P}=\mathbb {Q}\) only when the family \(\mathfrak {F}\) is rich enough. Hence, in general, \(d_{\mathfrak {F}}\) is only a semi-metric in the usual terminology.
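To make definition (1) concrete, the following minimal sketch evaluates \(d_{\mathfrak {F}}\) for two discrete measures supported on the same finite set of atoms and for a finite family \(\mathfrak {F}\). The function name `zeta_distance`, the atoms, the weights, and the two families are illustrative assumptions and do not come from the text. The second family contains only a constant function, which illustrates why \(d_{\mathfrak {F}}\) is in general only a semi-metric.

```python
def zeta_distance(family, atoms, p_weights, q_weights):
    """Evaluate sup_{f in family} |E_P[f] - E_Q[f]| for two discrete
    measures P and Q that share the same finite list of atoms."""
    def expectation(f, weights):
        return sum(w * f(x) for x, w in zip(atoms, weights))

    return max(
        abs(expectation(f, p_weights) - expectation(f, q_weights))
        for f in family
    )


# Hypothetical data: three atoms on the real line and two weight vectors.
atoms = [0.0, 1.0, 2.0]
P = [0.5, 0.25, 0.25]
Q = [0.25, 0.25, 0.5]

# A small family that distinguishes P from Q.
rich_family = [lambda x: x, lambda x: min(x, 1.0)]
# A family that is too poor: constants integrate to the same value.
poor_family = [lambda x: 1.0]

d_rich = zeta_distance(rich_family, atoms, P, Q)
d_poor = zeta_distance(poor_family, atoms, P, Q)
print(d_rich, d_poor)
```

Here the poor family yields distance zero although the two weight vectors differ, whereas the richer family separates them.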

The smallest relevant family \(\mathfrak {F}\) of Borel measurable functions in our stability studies contains only those functions which appear in the stochastic optimization problem under consideration. In this case, \(d_{\mathfrak {F}}\) may be called the minimal information (m.i.) distance. Stability results with respect to such m.i. distances serve as the starting point (i) to study stability with respect to the weak convergence of probability measures and (ii) to enlarge the family \(\mathfrak {F}\) properly by functions sharing essential analytical properties with the original ones. The latter strategy may lead to probability metrics that enjoy desirable properties like dual representations and convergence characterizations.

This method of probability metrics provides quantitative statements on the stability of solutions and optimal values of stochastic programming problems. Nevertheless, the existing theory has not been developed for optimization problems in which the design or decision variables may be infinite-dimensional, as is the case in PDE-constrained optimization under uncertainty. By including infinite-dimensional feasible sets, we introduce a number of complications; in particular, the loss of norm compactness of the feasible set, even in the case of convex, closed, and bounded feasible sets.

After fixing some essential notation in Sect. 2, we state in Sect. 3 the class of infinite-dimensional stochastic optimization problems whose stability we study in the subsequent sections. Section 4 contains qualitative results: it provides conditions that imply convergence of optimal values and solutions if the underlying sequence of probability distributions converges to a limit distribution in some sense. In Sect. 5 we show that optimal values and solutions even allow Lipschitz or Hölder estimates in terms of the \(\zeta \)-distance. In Sect. 6 we argue that the stability analysis of the preceding sections applies to certain stochastic PDE-constrained optimization problems. Finally, in Sect. 7, we provide a study of the results in Sect. 6 for the case when Monte Carlo approximations are used.

2 Notation and preliminary results

We assume throughout that \(\varOmega \) is a complete separable metric space, i.e., a Polish space, and \(\mathcal {F}\) the associated Borel \(\sigma \)-algebra. In addition, we will work exclusively with Borel probability measures \(\mathbb P : \mathcal {F} \rightarrow [0,1]\) that ensure \((\varOmega ,\mathcal {F},\mathbb P)\) is a complete probability space. In particular, if \(\varOmega \) is finite, then \(\mathcal {F}\) must be the power set of \(\varOmega \). For the abstract portion of our results, we will always assume that \(\varTheta \) is a real separable Hilbert space and \(\varTheta _\mathrm{ad} \subset \varTheta \) is a nonempty, closed, and convex set. Given an appropriately chosen integrand \(f: \varTheta \times \varOmega \rightarrow \overline{\mathbb R}\), we will consider the potentially infinite-dimensional stochastic optimization problems:

$$\begin{aligned} \nu (\mathbb P) := \inf _{\theta \in \varTheta _\mathrm{ad}} \int _{\varOmega } f(\theta ,\omega ) \ \mathrm {d}\mathbb P(\omega ). \end{aligned}$$
(2)

Here, we also introduce the notion of optimal value function \(\nu \) as a function from the space of all Borel probability measures \(\mathcal {P}(\varOmega )\) into \(\overline{\mathbb R}\). This potentially extended real-valued function will play a key role in our discussions. We will denote the expectation by \(\mathbb E\) or, when the underlying measure is not clear from the context, by \(\mathbb E_{\mathbb P}\) to indicate the dependence on the measure \(\mathbb P\).

Given a complete probability space \((\varOmega ,\mathcal {F},\mathbb P)\) and a real Banach space W, we recall the definition of the Bochner space \(L^p(\varOmega ,\mathcal {F},\mathbb {P};W)\), \(p \in [1,\infty )\), as the space of (equivalence classes of) strongly measurable functions v, which map \(\varOmega \) into W and satisfy \(\int _{\varOmega } \Vert v(\omega ) \Vert ^{p}_{W} \ \mathrm {d} \mathbb P(\omega ) < +\infty \), cf. [13]. If \(p = \infty \), then \(L^\infty (\varOmega ,\mathcal {F},\mathbb {P};W)\) consists of essentially bounded W-valued strongly measurable functions. In both cases \(L^p(\varOmega ,\mathcal {F},\mathbb {P};W)\) is a Banach space with the natural norm(s)

$$\begin{aligned} \Vert v\Vert _{L^p(\varOmega ,\mathcal {F},\mathbb {P};W)} =\left\{ \begin{array}{ll} \left[ \mathbb {E}\Vert v\Vert ^p_W\right] ^{1/p}, &{} \text { for }p \in [1,\infty ),\\ \operatorname*{ess\,sup}_{\omega \in \varOmega }\Vert v(\omega )\Vert _{W}, &{} \text { for }p=\infty . \end{array}\right. \end{aligned}$$

In the special case when \(W=\mathbb {R}\), we simply write \(L^p(\varOmega ,\mathcal {F},\mathbb {P})\). As usual, norm convergence will typically be denoted by \(\rightarrow \), whereas \(\rightharpoonup \) signifies weak convergence and \({\mathop {\rightharpoonup }\limits ^{*}}\) weak-star convergence.

In our stability analysis, we make use of distances with \(\zeta \)-structure on \(\mathcal {P}(\varOmega )\) having the form (1). We will refer to these objects as \(\zeta \)-distances for brevity. Given a family \(\mathfrak {F}\) of Borel measurable functions from \(\varOmega \) into \(\overline{\mathbb R}\), the \(\zeta \)-distance \(d_{\mathfrak {F}}\) on \((\varOmega ,\mathcal {F})\) is a highly flexible structure that allows us to define so-called minimal information distances and Fortet-Mourier metrics, each defined in the text below. Properties of \(\zeta \)-distances, such as a characterization of their maximal generator and their relation to weak convergence of probability measures, can be found in [18, 23]. Recall that a sequence of probability measures \(\left\{ \mathbb P_N \right\} \) on \((\varOmega , \mathcal {F})\) is said to narrowly/weakly converge to the probability measure \(\mathbb P\) provided

$$\begin{aligned} \mathbb E_{\mathbb P_N}[f] \rightarrow \mathbb E_{\mathbb P}[f] \quad \forall f \in C^0_b(\varOmega ), \end{aligned}$$

where \(C^0_{b}(\varOmega )\) is the space of all bounded continuous functions on \(\varOmega \). A family \(\mathfrak {F}\) of Borel measurable functions is called a \(\mathbb {P}\)-uniformity class if

$$\begin{aligned} \lim _{N\rightarrow \infty }d_{\mathfrak {F}}(\mathbb {P},\mathbb {P}_{N})=0 \end{aligned}$$

holds for each sequence \(\left\{ \mathbb {P}_N \right\} \) of probability measures converging weakly to \(\mathbb {P}\). For example, it is known that \(\mathfrak {F}\) is a \(\mathbb {P}\)-uniformity class if \(\mathfrak {F}\) is uniformly bounded and \(\mathbb {P}(\{\omega \in \varOmega :\mathfrak {F} \text{ is } \text{ not } \text{ equicontinuous } \text{ at } \omega \})=0\) [23].

Finally, we recall that given a \(\sigma \)-algebra \(\mathcal {F}\) along with a nominal \(\sigma \)-finite \(\sigma \)-additive positive measure \(\mathbb P\) on \(\varOmega \), e.g., a Borel probability measure \(\mathbb P \in \mathcal {P}(\varOmega )\), the dual space of \(L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) can be identified with the space of all finitely additive signed measures \(\mathrm {ba}(\varOmega )\) on \(\mathcal {F}\) absolutely continuous with respect to \(\mathbb P\), see e.g., [9].

3 The optimization problem

In order to carry out the stability analysis, we restrict the class of allowable integrands \(f(\theta ,\omega )\). These restrictions will henceforth be taken as standing assumptions. The particular class considered in this paper is inspired by applications in PDE-constrained optimization under uncertainty in which the PDE is given by a linear elliptic partial differential equation with random coefficients, right-hand side, and/or boundary conditions. We refer the reader to [17] for an overview of the state-of-the-art theory including more general objective functions and risk measures. In addition, many problems in functional data analysis exhibit practically the same form used below, see e.g., [21].

Let V and H be real Hilbert spaces such that V embeds continuously into H, and \(\theta _d \in H\). For \(\theta \in \varTheta \) and \(\omega \in \varOmega \), let \(\varSigma (\omega )\theta = S(\omega )\theta - s(\omega )\), where \(S(\omega ):\varTheta \rightarrow V\) is bounded and linear in \(\theta \) independently of \(\omega \) and \(s(\omega ) \in H\). We then define

$$\begin{aligned} f(\theta ,\omega ) := \frac{1}{2} \Vert \varSigma (\omega )\theta - \theta _d \Vert ^2_{H} = \frac{1}{2} \Vert S(\omega )\theta - (\theta _d + s(\omega )) \Vert ^2_{H}. \end{aligned}$$

Furthermore, we assume that for every \(\theta \in \varTheta \) (or \(\theta \in \varTheta _\mathrm{ad}\)) and any \(\mathbb P \in \mathcal {P}(\varOmega )\)

$$\begin{aligned} f(\theta ,\cdot ) \in L^1(\varOmega ,\mathcal {F},\mathbb P). \end{aligned}$$

This implicitly adds mild regularity assumptions on S and s that are typically fulfilled when S is related to the solution of a parametric elliptic PDE, e.g., \( S(\cdot )\theta , s(\cdot ) \in L^2(\varOmega ,\mathcal {F},\mathbb P; V). \) Then for \(\alpha > 0\), we consider the optimization problems

$$\begin{aligned} \inf _{\theta \in \varTheta _\mathrm{ad}} F(\theta ) := \mathbb E_{\mathbb P}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }. \end{aligned}$$
(3)

Theorem 1

Problem (3) admits a unique solution \(\theta _{\mathbb P} \in \varTheta _\mathrm{ad}\) for every \(\mathbb P \in \mathcal {P}(\varOmega )\).

Proof

For existence, it suffices to prove F is proper, convex, lower-semicontinuous and coercive, cf. e.g., [3, Sec. 3.3]. Since \(f(\theta ,\cdot ) \in L^1(\varOmega , \mathcal {F}, \mathbb P)\) for any \(\theta \in \varTheta _\mathrm{ad}\) and \(f \ge 0\), F is proper. Convexity follows directly from the \(\mathbb P\)-a.e. convexity of \(\theta \mapsto f(\theta ,\omega ) + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }\) and the monotonicity of the expectation \(\mathbb E_{\mathbb P}\). Lower semicontinuity is a result of Fatou’s lemma: Let \(\theta _k \rightarrow \theta \) in \(\varTheta \). Then since \(f \ge 0\) and \(f(\theta _k,\cdot ) \rightarrow f(\theta ,\cdot )\) \(\mathbb P\)-a.s. (by the assumptions on S) we have

$$\begin{aligned} \liminf _{k} \mathbb E_{\mathbb P}\left[ f\left( \theta _k\right) \right] + \frac{\alpha }{2} \Vert \theta _k\Vert ^2_{\varTheta } \ge \mathbb E_{\mathbb P}\left[ \liminf _{k} f\left( \theta _k\right) \right] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta } = \mathbb E_{\mathbb P}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }. \end{aligned}$$

Since \(\mathbb E_{\mathbb P}[f(\theta ,\cdot )] \ge 0\) for all \(\theta \in \varTheta _\mathrm{ad}\), F is coercive. Given F is proper, convex, and lower semicontinuous, F is weakly lower semicontinuous, as well. Since F is coercive, the level set \( \left\{ \theta \in \varTheta _\mathrm{ad} \left| F(\theta ) \le \alpha _0\right. \right\} , \) where \(\theta _0 \in \varTheta _\mathrm{ad}\) and \(\alpha _0 := F(\theta _0)\), is weakly sequentially compact. It then follows from the direct method that (3) admits a solution \(\theta _{\mathbb P}\). Given \(\alpha > 0\), F is strictly convex. Hence, \(\theta _{\mathbb P}\) is unique. \(\square \)
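As an illustration of Theorem 1, consider the simplest possible instance of (3): \(\varTheta = H = \mathbb R\), \(\varTheta _\mathrm{ad}\) a closed interval, and \(\mathbb P\) a discrete measure with finitely many atoms. In this scalar case the unconstrained minimizer is \(\mathbb E_{\mathbb P}[S(\theta _d + s)] / (\mathbb E_{\mathbb P}[S^2] + \alpha )\), and strict convexity implies that the constrained minimizer is its projection onto the interval. The sketch below implements this; the data \(S\), \(s\), \(\theta _d\), \(\alpha \), the atoms, and all function names are hypothetical choices made only for this example.

```python
def expect(g, atoms, weights):
    """Expectation of g under a discrete measure with the given atoms and weights."""
    return sum(p * g(w) for w, p in zip(atoms, weights))


def objective(theta, atoms, weights, S, s, theta_d, alpha):
    """F(theta) = E[0.5 * (S*theta - (theta_d + s))^2] + 0.5 * alpha * theta^2."""
    misfit = expect(
        lambda w: 0.5 * (S(w) * theta - (theta_d + s(w))) ** 2, atoms, weights
    )
    return misfit + 0.5 * alpha * theta ** 2


def solve(atoms, weights, S, s, theta_d, alpha, lower, upper):
    """Unique minimizer of the scalar problem over the interval [lower, upper]."""
    second = expect(lambda w: S(w) ** 2, atoms, weights)
    cross = expect(lambda w: S(w) * (theta_d + s(w)), atoms, weights)
    theta_free = cross / (second + alpha)  # stationary point of the quadratic
    return min(max(theta_free, lower), upper)  # projection onto the interval


# Hypothetical data for the illustration.
atoms = [-1.0, 0.0, 1.0, 2.0]
weights = [0.25, 0.25, 0.25, 0.25]
S = lambda w: 1.0 + 0.5 * w
s = lambda w: 0.1 * w

theta_star = solve(atoms, weights, S, s, theta_d=1.0, alpha=0.5, lower=0.0, upper=0.5)
print(theta_star, objective(theta_star, atoms, weights, S, s, 1.0, 0.5))
```

With these data the unconstrained stationary point lies to the right of the interval, so the upper bound is active and the computed minimizer sits on the boundary of \(\varTheta _\mathrm{ad}\).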

4 Qualitative stability

In this section, we provide stability results that ensure the approximating optimization problems obtained by replacing \(\mathbb P\) by another probability measure \(\mathbb Q \in \mathcal {P}(\varOmega )\) will converge in some sense to the original problem. In particular, we show that the solutions \(\theta _{\mathbb Q}\) will strongly converge to \(\theta _{\mathbb P}\) provided \(\mathbb Q\) converges to \(\mathbb P\) with respect to a properly chosen \(\zeta \)-distance. This basic result serves as the foundation needed to prove continuity of the solutions with respect to narrow convergence of probability measures. However, in order to do the latter, additional regularity properties will be required on the integrands with respect to \(\omega \). These stability results are in some sense more versatile than the quantitative results below. Nevertheless, they do not provide us with a rate of convergence.

Theorem 2

In the context of Theorem 1, suppose we are given a sequence \(\left\{ \mathbb P_N\right\} \) with \(\mathbb P_N \in \mathcal {P}(\varOmega )\) and a probability measure \(\mathbb P \in \mathcal {P}(\varOmega )\) such that \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P) \rightarrow 0\), where \(\mathfrak {F}\) is any class of measurable functions from \(\varOmega \) into \(\overline{\mathbb R}\) large enough to contain \(f(\theta ,\cdot )\) for any \(\theta \in \left\{ \theta _{N} : N \in \mathbb N \right\} \cup \left\{ \theta _{\mathbb P}\right\} \) with \(\theta _{N} := \theta _{\mathbb P_{N}}\). Then \(\theta _{N} \rightarrow \theta _{\mathbb P}\) strongly in \(\varTheta \) as \(N \rightarrow +\infty \).

Remark 1

The obvious candidate for the set \(\mathfrak {F}\) would be to choose the collection of all possible integrands \(f(\theta ,\cdot ) :\varOmega \rightarrow \mathbb R\) indexed by \(\theta \in \varTheta _\mathrm{ad}\). In terms of the associated \(\zeta \)-distance, this would result in what is referred to in [19, 20, 22] as the minimal information metric.

Proof

We first show \(\left\{ \theta _{N}\right\} \) is uniformly bounded in \(\varTheta \). Indeed, we have

$$\begin{aligned} \frac{\alpha }{2} \Vert \theta _N \Vert ^2_{\varTheta } \le \mathbb E_{\mathbb P_{N}}\left[ f(\theta _N)\right] + \frac{\alpha }{2} \Vert \theta _{N} \Vert ^2_{\varTheta } \le \mathbb E_{\mathbb P_N}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }\quad \theta \in \varTheta _\mathrm{ad}. \end{aligned}$$
(4)

For any fixed \(\theta \in \varTheta _\mathrm{ad}\), it follows from the hypotheses that

$$\begin{aligned} \mathbb E_{\mathbb P_N}[f(\theta )] = \mathbb E_{\mathbb P_N}[f(\theta )] - \mathbb E_{\mathbb P}[f(\theta )] + \mathbb E_{\mathbb P}[f(\theta )] \le d_{\mathfrak {F}}(\mathbb P_N,\mathbb P) + \mathbb E_{\mathbb P}[f(\theta )]. \end{aligned}$$
(5)

Substituting this into (4) we obtain the bound

$$\begin{aligned} \frac{\alpha }{2} \Vert \theta _N \Vert ^2_{\varTheta } \le d_{\mathfrak {F}}(\mathbb P_N,\mathbb P) + F(\theta ) \quad \theta \in \varTheta _\mathrm{ad}. \end{aligned}$$

Since \(d_{\mathfrak {F}}(\mathbb P_N, \mathbb P) \rightarrow 0\), \(\left\{ \theta _N\right\} \) is bounded in \(\varTheta \). Therefore, there exists a \(\widehat{\theta } \in \varTheta _\mathrm{ad}\) and a weakly convergent subsequence \(\left\{ \theta _{N_{\ell }}\right\} \) such that \(\theta _{N_{\ell }} \rightharpoonup \widehat{\theta }\) as \(\ell \rightarrow +\infty \).

For fixed \(\mathbb P\), it follows from the proof of Theorem 1 that \(\mathbb E_{\mathbb P}[f(\cdot )] : \varTheta \rightarrow \mathbb R\) is weakly lower semicontinuous. Therefore,

$$\begin{aligned} \begin{aligned} \mathbb E_{\mathbb P}\left[ f\left( \widehat{\theta }\right) \right] + \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta }&\le \liminf _{\ell } \mathbb E_{\mathbb P}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta }\\&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P}\left[ f\left( \theta _{N_{\ell }}\right) \right] - \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] \right] \\&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } + d_{\mathfrak {F}}\left( \mathbb P_{N_{\ell }},\mathbb P\right) \right] \\&= \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } \right] . \end{aligned} \end{aligned}$$
(6)

It then follows from (6), the optimality of \(\theta _{N_{\ell }}\), and (5) that for any \(\theta \in \varTheta _\mathrm{ad}\) we have:

$$\begin{aligned} \begin{aligned} \mathbb E_{\mathbb P}[f(\widehat{\theta })] + \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta }&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } \right] \\&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta } \right] \\&\le \liminf _{\ell } d_{\mathfrak {F}}(\mathbb P_{N_{\ell }},\mathbb P) + \mathbb E_{\mathbb P}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }\\&= \mathbb E_{\mathbb P}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }. \end{aligned} \end{aligned}$$
(7)

Hence, \(\theta _{\mathbb P} = \widehat{\theta }\). Since \(\theta _{\mathbb P}\) is unique and the previous arguments hold for all weakly convergent subsequences of \(\left\{ \theta _{N}\right\} \), we have \(\theta _{N} \rightharpoonup \theta _{\mathbb P} = \widehat{\theta }\) as \(N \rightarrow +\infty \). It remains to prove \(\Vert \theta _{N} - \theta _{\mathbb P}\Vert _{\varTheta } \rightarrow 0\).

Clearly we have the inequality

$$\begin{aligned} \liminf _{N} \Vert \theta _{N} \Vert _{\varTheta } \ge \Vert \widehat{\theta } \Vert _{\varTheta } \end{aligned}$$
(8)

by weak lower semicontinuity of the norm \(\Vert \cdot \Vert _{\varTheta }\). On the other hand, by rearranging terms, the definition of \(\theta _{N}\) and feasibility of \(\widehat{\theta }\) yield

$$\begin{aligned} \begin{aligned} \frac{\alpha }{2} \Vert \theta _{N} \Vert ^2_{\varTheta }&\le \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P_{N}}[f(\widehat{\theta })] - \mathbb E_{\mathbb P_{N}}[f(\theta _{N})]\\&= \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P_{N}}[f(\widehat{\theta })] - \mathbb E_{\mathbb P_{N}}[f(\theta _{N})] + \mathbb E_{\mathbb P}[f(\theta _{N})] - \mathbb E_{\mathbb P}[f(\theta _{N})]\\&\le \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P_{N}}[f(\widehat{\theta })] - \mathbb E_{\mathbb P}[f(\theta _{N})] + d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P)\\&= \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P_{N}}[f(\widehat{\theta })] - \mathbb E_{\mathbb P}[f(\widehat{\theta })] + \mathbb E_{\mathbb P}[f(\widehat{\theta })] - \mathbb E_{\mathbb P}[f(\theta _{N})] + d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P)\\&\le \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + 2 d_{\mathfrak {F}}(\mathbb P_N,\mathbb P) + \mathbb E_{\mathbb P}[f(\widehat{\theta })] - \mathbb E_{\mathbb P}[f(\theta _{N})]. \end{aligned} \end{aligned}$$

Therefore, we again appeal to the weak lower semicontinuity of \(\mathbb E_{\mathbb P}[f(\cdot )]\) on \(\varTheta \) to obtain

$$\begin{aligned} \begin{aligned} \limsup _{N} \frac{\alpha }{2} \Vert \theta _{N} \Vert ^2_{\varTheta }&\le \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \limsup _{N}\left[ 2 d_{\mathfrak {F}}(\mathbb P_N,\mathbb P) + \mathbb E_{\mathbb P}[f(\widehat{\theta })] - \mathbb E_{\mathbb P}[f(\theta _{N})]\right] \\&= \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P}[f(\widehat{\theta })] - \liminf _{N} \mathbb E_{\mathbb P}[f(\theta _{N})]\\&\le \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P}[f(\widehat{\theta })] - \mathbb E_{\mathbb P}[f(\widehat{\theta })]\\&= \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta }. \end{aligned} \end{aligned}$$
(9)

Combining (8) and (9), we have \( \Vert \theta _{N} \Vert _{\varTheta } \rightarrow \Vert \widehat{\theta } \Vert _{\varTheta }\). Then since \(\varTheta \) is a Hilbert space and \(\theta _{N} \rightharpoonup \widehat{\theta }\), the assertion follows. \(\square \)
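For completeness, the final step of the preceding proof can be spelled out as follows. Here \((\cdot ,\cdot )_{\varTheta }\) denotes the inner product of \(\varTheta \), a notation introduced only for this remark. Expanding the square gives

$$\begin{aligned} \Vert \theta _{N} - \widehat{\theta } \Vert ^2_{\varTheta } = \Vert \theta _{N} \Vert ^2_{\varTheta } - 2 (\theta _{N},\widehat{\theta })_{\varTheta } + \Vert \widehat{\theta } \Vert ^2_{\varTheta } \rightarrow \Vert \widehat{\theta } \Vert ^2_{\varTheta } - 2 \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \Vert \widehat{\theta } \Vert ^2_{\varTheta } = 0 \quad \text {as } N \rightarrow +\infty . \end{aligned}$$

The first term converges because \(\Vert \theta _{N} \Vert _{\varTheta } \rightarrow \Vert \widehat{\theta } \Vert _{\varTheta }\), and the middle term converges because \(\theta _{N} \rightharpoonup \widehat{\theta }\) may be tested against the fixed element \(\widehat{\theta }\).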

An alternative perspective on qualitative stability is offered by our next result. Here, we will prove convergence of the sequence of minimizers under different data assumptions on the integrands and a different form of weak convergence of measures. We note that in PDE-constrained optimization under uncertainty these assumptions are less restrictive than they may appear. In particular, we do not require \(f(\theta ,\cdot ) : \varOmega \rightarrow \mathbb R\) to be continuous as is needed below for the Fortet-Mourier metric. The caveat here is the requirement that \(\mathbb P_N\) is absolutely continuous with respect to \(\mathbb P\).

Theorem 3

In addition to the standing assumptions, fix some \(\mathbb P \in \mathcal {P}(\varOmega )\) and suppose that for all \(\theta \in \varTheta _\mathrm{ad}\) \(f(\theta ,\cdot ) \in L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\). Assume furthermore that the superposition operator \(\varPhi : \varTheta \rightarrow L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) defined by

$$\begin{aligned} \varPhi (\theta )(\omega ) := f(\theta ,\omega ) \end{aligned}$$

is completely continuous. Let \(\left\{ \mathbb P_{N}\right\} \subset \mathcal {P}(\varOmega )\) such that

  1. for all \(N \in \mathbb N\), \(\mathbb P_{N} \ll \mathbb P\) (that is, \(\mathbb P_{N}\) is absolutely continuous with respect to \(\mathbb P\)), and

  2. \(\mathbb P_{N}\rightarrow \mathbb P\) with respect to the weak-star topology on \((L^{\infty }(\varOmega ,\mathcal {F},\mathbb P))^{*}\).

Then \(\theta _{N} \rightarrow \theta _{\mathbb P}\).

Proof

As noted in Sect. 2, each \(\mathbb Q \in \mathcal {P}(\varOmega )\) is an element of \((L^{\infty }(\varOmega ,\mathcal {F},\mathbb P))^*\) provided \(\mathbb Q \ll \mathbb P\). The rest of the proof mirrors that of Theorem 2. Given the sequence of minimizers \(\left\{ \theta _{N}\right\} \), we immediately obtain a uniform bound on \(\Vert \theta _N\Vert \) from (4), since for any \(\theta \in \varTheta _\mathrm{ad}\) we have \(f(\theta ,\cdot ) \in L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) and \(\mathbb P_N {\mathop {\rightharpoonup }\limits ^{*}} \mathbb P\). As before, we let \(\left\{ \theta _{N_\ell }\right\} _{\ell =1}^{\infty }\) denote a weakly convergent subsequence and \(\widehat{\theta }\) the associated weak limit.

Turning now to the estimate derived in (6), we see that

$$\begin{aligned} \begin{aligned} \mathbb E_{\mathbb P}\left[ f\left( \widehat{\theta }\right) \right] + \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta }&\le \liminf _{\ell } \mathbb E_{\mathbb P}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta }\\&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P}\left[ f\left( \theta _{N_{\ell }}\right) \right] - \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] \right] \\&= \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } \right] . \end{aligned} \end{aligned}$$
(10)

Here, \(\varPhi (\theta _{N_{\ell }}) \rightarrow \varPhi (\widehat{\theta })\) strongly in \(L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) due to the assumption of complete continuity. Therefore, both \(\mathbb E_{\mathbb P}[f(\theta _{N_{\ell }})]\) and \(\mathbb E_{\mathbb P_{N_{\ell }}}[f(\theta _{N_{\ell }})]\) converge to \(\mathbb E_{\mathbb P}[f(\widehat{\theta })]\).

As in the proof of Theorem 2, we obtain optimality of \(\widehat{\theta }\) by adapting the inequality (7), i.e., for every \(\theta \in \varTheta _\mathrm{ad}\) we have

$$\begin{aligned} \begin{aligned} \mathbb E_{\mathbb P}\left[ f\left( \widehat{\theta }\right) \right] + \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta }&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}\left[ f\left( \theta _{N_{\ell }}\right) \right] + \frac{\alpha }{2} \Vert \theta _{N_{\ell }} \Vert ^2_{\varTheta } \right] \\&\le \liminf _{\ell } \left[ \mathbb E_{\mathbb P_{N_{\ell }}}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta } \right] \\&= \mathbb E_{\mathbb P}[f(\theta )] + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }. \end{aligned} \end{aligned}$$
(11)

Here, the regularity of the integrand ensures that \(\mathbb E_{\mathbb P_{N_{\ell }}}[f(\theta )] \) converges to \(\mathbb E_{\mathbb P}[f(\theta )]\); from which it follows that \(\widehat{\theta } = \theta _{\mathbb P}\). As in the proof of Theorem 2, we can again argue that the entire sequence \(\left\{ \theta _{\mathbb P_{N}}\right\} \) weakly converges to \(\theta _{\mathbb P}\).

In order to prove norm convergence, we note that

$$\begin{aligned} \begin{aligned} \frac{\alpha }{2} \Vert \theta _{N} \Vert ^2_{\varTheta }&\le \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P_{N}}[f(\widehat{\theta })] - \mathbb E_{\mathbb P_{N}}[f(\theta _{N})].\\ \end{aligned} \end{aligned}$$

Then by the complete continuity and regularity assumptions, we have

$$\begin{aligned} \limsup _{N} \frac{\alpha }{2} \Vert \theta _{N} \Vert ^2_{\varTheta } \le \limsup _{N} \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta } + \mathbb E_{\mathbb P_{N}}[f(\widehat{\theta })] - \mathbb E_{\mathbb P_{N}}[f(\theta _{N})] = \limsup _{N} \frac{\alpha }{2} \Vert \widehat{\theta } \Vert ^2_{\varTheta }. \end{aligned}$$

This completes the proof. \(\square \)

Next, we return to the setting using probability metrics to obtain some important implications of Theorem 2 under further regularity assumptions on the integrands. In our setting, we recall that the space of all (Borel) probability measures with finite p-th moments is defined by

$$\begin{aligned} \mathcal {P}_{p}(\varOmega ) := \left\{ \mathbb P \in \mathcal {P}(\varOmega ) \left| \; \int _{\varOmega } d(\omega _0,\omega )^p \mathrm {d} \mathbb P(\omega ) < +\infty \right. \right\} , \end{aligned}$$

for some arbitrary \(\omega _0 \in \varOmega \). We recall that a sequence \(\left\{ \mathbb P_{N}\right\} \subset \mathcal {P}_{p}(\varOmega )\) converges weakly (narrowly) to \(\mathbb P \in \mathcal {P}_{p}(\varOmega )\) provided for all \(\varphi \in C^0_b(\varOmega )\)

$$\begin{aligned} \mathbb E_{\mathbb P_N}[ \varphi ] \rightarrow \mathbb E_{\mathbb P}[\varphi ] \text { and } \mathbb E_{\mathbb P_N}\left[ d\left( \omega _0,\cdot \right) ^p\right] \rightarrow \mathbb E_{\mathbb P}\left[ d\left( \omega _0,\cdot \right) ^p\right] \end{aligned}$$

as \(N \rightarrow \infty \). This type of weak convergence shares an intimate link with a certain class of \(\zeta \)-distances known as Fortet-Mourier metrics. To start, for \(p \in [1,\infty )\), we define the sets \(\mathcal {F}_p(\varOmega )\) of locally Lipschitz functions with a certain p-related growth condition by

$$\begin{aligned}&\mathcal {F}_{p}(\varOmega ) := \left\{ f : \varOmega \rightarrow \mathbb R \left| \;\right. \right. | f\left( \omega _1\right) - f\left( \omega _2\right) | \\&\quad \left. \left. \le \max \left\{ 1,d\left( \omega _1,\omega _0\right) ^{p-1}, d\left( \omega _2,\omega _0\right) ^{p-1}\right\} d\left( \omega _1,\omega _2\right) \quad \forall \omega _1,\omega _2 \in \varOmega \right. \right\} . \end{aligned}$$

We then define the Fortet-Mourier metric of order p for two measures \(\mathbb P,\mathbb Q \in \mathcal {P}_{p}(\varOmega )\) by

$$\begin{aligned} \zeta _{p}(\mathbb P,\mathbb Q) = d_{\mathcal {F}_{p}(\varOmega )}(\mathbb P,\mathbb Q). \end{aligned}$$

In particular, \(\zeta _{p}\) is equivalent to the so-called Kantorovich-Rubinstein functional with cost function given by

$$\begin{aligned} c\left( \omega _1,\omega _2\right) = \max \left\{ 1,d\left( \omega _1,\omega _0\right) ^{p-1}, d\left( \omega _2,\omega _0\right) ^{p-1}\right\} d\left( \omega _1,\omega _2\right) . \end{aligned}$$

(see [19, Theorem 5.3.3] along with the discussion on page 93 in [19]). Furthermore, it follows from [19, Theorem 6.2.1] that \(\left\{ \mathbb P_{N}\right\} \subset \mathcal {P}_{p}(\varOmega )\) converges weakly (narrowly) to \(\mathbb P \in \mathcal {P}_{p}(\varOmega )\) if and only if \(\zeta _p(\mathbb P_N,\mathbb P) \rightarrow 0\) as \(N \rightarrow +\infty \). We may therefore connect Theorem 2 directly to the weak convergence of probability measures.

Proposition 1

In the setting of Theorem 2, suppose there exists a \(p \in [1,\infty )\) and some \(L > 0\) such that

$$\begin{aligned} \mathfrak {F}= \mathcal {F}_{p}(\varOmega ) \text { and } \left\{ \frac{1}{L}f(\theta , \cdot ) : \varOmega \rightarrow \mathbb R \left| \; \theta \in \varTheta _\mathrm{ad} \right. \right\} \subset \mathfrak {F}. \end{aligned}$$

Then the solution mapping \(\mathcal {P}_{p}(\varOmega ) \ni \mathbb Q \mapsto \theta _{\mathbb Q} \in \varTheta _\mathrm{ad}\) is continuous with respect to weak (narrow) convergence of probability measures on \(\mathcal {P}_{p}(\varOmega )\).

Proof

After rescaling the integrands by \(1/L > 0\), this is a direct consequence of Theorem 2 in light of the preceding arguments. \(\square \)

Finally, we note that an alternative means of obtaining the sequential convergence result in Proposition 1 would be to appeal to the link between the weak topology on \(\mathcal {P}_{p}(\varOmega )\) and the topologies generated by the well-known Wasserstein distance \(W_p\) of order p. Let \(\gamma _i\) \((i=1,2)\) be the projection onto the first or second term of \(\varOmega \times \varOmega \), respectively, and for \(\pi \in \mathcal {P}(\varOmega \times \varOmega )\) denote the marginals by \(\pi ^i := \pi _{\#} \gamma _i := \pi \circ \gamma _i^{-1}\). Then the Wasserstein distance of order p is given by

$$\begin{aligned} W^p_p\left( \mathbb P,\mathbb Q\right) = \inf \left\{ \int _{\varOmega \times \varOmega } d(\omega _1,\omega _2)^p \mathrm {d}\pi \left( \omega _1,\omega _2\right) \left| \pi \in \mathcal {P}\left( \varOmega \times \varOmega \right) , \pi ^1 = \mathbb P \text { and } \pi ^2 = \mathbb Q \right. \right\} . \end{aligned}$$

For this distance we have the estimate:

$$\begin{aligned} \zeta _p\left( \mathbb P,\mathbb Q\right) \le \left( 1 + \int _{\varOmega } d(\omega _0,\omega )^p \ \mathrm {d} \mathbb P(\omega ) + \int _{\varOmega } d(\omega _0,\omega )^p \ \mathrm {d}\mathbb Q(\omega )\right) ^{\frac{p-1}{p}} W_p(\mathbb P,\mathbb Q). \end{aligned}$$
(12)

Therefore, if we start with a sequence of Borel probability measures \(\left\{ \mathbb P_{N}\right\} \) and \(\mathbb P \in \mathcal {P}(\varOmega )\) such that \(W_p(\mathbb P_N,\mathbb P) \rightarrow 0\), then we obtain the same statement as in Proposition 1. However, as shown in [20] the convergence in the Wasserstein metric is potentially strictly slower than in the Fortet-Mourier metric.
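For intuition about the estimate (12), consider the special case \(\varOmega = \mathbb R\) with \(d(\omega _1,\omega _2) = |\omega _1 - \omega _2|\) and two empirical measures with the same number of equally weighted atoms. In this case an optimal coupling pairs the sorted samples, so \(W_p\) can be evaluated directly. The following Python sketch is only an illustration under these assumptions; the function names `wasserstein_p_1d` and `moment_factor` are hypothetical, and \(\omega _0 = 0\) is an arbitrary choice of reference point.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=1.0):
    """W_p between two equally weighted empirical measures on the real line.

    Assumes len(x) == len(y); in 1D the monotone (sorted) coupling is optimal.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same length")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

def moment_factor(x, y, p=1.0, omega0=0.0):
    """Factor multiplying W_p on the right-hand side of (12)."""
    mx = np.mean(np.abs(np.asarray(x, dtype=float) - omega0) ** p)
    my = np.mean(np.abs(np.asarray(y, dtype=float) - omega0) ** p)
    return float((1.0 + mx + my) ** ((p - 1.0) / p))

x = np.array([0.0, 1.0, 2.0])
y = x + 0.5                                # every atom shifted by 0.5
w2 = wasserstein_p_1d(x, y, p=2.0)
bound = moment_factor(x, y, p=2.0) * w2    # upper bound on zeta_2 from (12)
```

For \(p = 1\) the exponent \(\frac{p-1}{p}\) vanishes, so the factor equals one and (12) reduces to \(\zeta _1 \le W_1\).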

5 Quantitative stability

As mentioned above, quantitative stability provides us with Lipschitz or Hölder-type estimates of the optimal values and solutions. This is first done using the “weakest” possible \(\zeta \)-distance \(d_{\mathfrak {F}}\) in which \(\mathfrak {F}\) is directly related to the integrands without additional regularity assumptions on the dependence on \(\omega \). Further estimates related to Fortet-Mourier and Wasserstein metrics then follow as corollaries under Lipschitz conditions on the integrands.

Theorem 4

Under the standing assumptions, let \(\mathbb P,\mathbb Q \in \mathcal {P}(\varOmega )\) and let \(\mathfrak {F}\) be any set of Borel measurable functions that contains \(g_\theta (\cdot ) := f(\theta ,\cdot )\) for \(\theta \in \{\theta _{\mathbb P}, \theta _{\mathbb Q}\}\). Then we have the estimates:

$$\begin{aligned} |\nu (\mathbb Q) - \nu (\mathbb P)|\le & {} d_{\mathfrak {F}}(\mathbb Q,\mathbb P) \end{aligned}$$
(13)
$$\begin{aligned} \Vert \theta _{\mathbb Q} - \theta _{\mathbb P} \Vert\le & {} 2 \sqrt{\frac{2}{\alpha } d_{\mathfrak {F}}(\mathbb Q,\mathbb P)}. \end{aligned}$$
(14)

Proof

For the Lipschitz estimate (13), we observe that

$$\begin{aligned} | \nu (\mathbb Q) - \nu (\mathbb P)| =&\max \left\{ \nu (\mathbb Q) - \nu (\mathbb P),\nu (\mathbb P) - \nu (\mathbb Q)\right\} \\ =&\max \left\{ \mathbb E_{\mathbb Q}\left[ f(\theta _{\mathbb Q})\right] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q} \Vert ^2 - \mathbb E_{\mathbb P}\left[ f\left( \theta _{\mathbb P}\right) \right] - \frac{\alpha }{2} \Vert \theta _{\mathbb P} \Vert ^2,\right. \\&\quad \left. \mathbb E_{\mathbb P}\left[ f\left( \theta _{\mathbb P}\right) \right] + \frac{\alpha }{2} \Vert \theta _{\mathbb P} \Vert ^2 - \mathbb E_{\mathbb Q}\left[ f\left( \theta _{\mathbb Q}\right) \right] - \frac{\alpha }{2} \Vert \theta _{\mathbb Q} \Vert ^2\right\} \\ \le&\max \left\{ \mathbb E_{\mathbb Q}[f(\theta _{\mathbb P})] - \mathbb E_{\mathbb P}\left[ f\left( \theta _{\mathbb P}\right) \right] , \mathbb E_{\mathbb P}\left[ f\left( \theta _{\mathbb Q}\right) \right] - \mathbb E_{\mathbb Q}\left[ f\left( \theta _{\mathbb Q}\right) \right] \right\} \\ \le&\max \{ |\mathbb E_{\mathbb Q}\left[ f\left( \theta _{\mathbb P}\right) \right] - \mathbb E_{\mathbb P}\left[ f\left( \theta _{\mathbb P}\right) \right] |,\\&\quad |\mathbb E_{\mathbb P}\left[ f\left( \theta _{\mathbb Q}\right) \right] - \mathbb E_{\mathbb Q}\left[ f\left( \theta _{\mathbb Q}\right) \right] |\}\\ \le&d_{\mathfrak {F}}(\mathbb Q,\mathbb P). \end{aligned}$$

For the Hölder estimate on the solution mapping (14), we start by letting \(\delta := d_{\mathfrak {F}}(\mathbb Q,\mathbb P)\) and observing that

$$\begin{aligned} 2 \delta&\ge \delta + |\nu (\mathbb Q) - \nu (\mathbb P)|\nonumber \\&\ge \delta + \nu (\mathbb Q) - \nu (\mathbb P)\nonumber \\&= \delta + \mathbb E_{\mathbb Q}[f(\theta _{\mathbb Q})] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q} \Vert ^2 - \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] - \frac{\alpha }{2} \Vert \theta _{\mathbb P} \Vert ^2\nonumber \\&= \delta \pm (\mathbb E_{\mathbb P}[f(\theta _{\mathbb Q})] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q}\Vert ^2) + \mathbb E_{\mathbb Q}[f(\theta _{\mathbb Q})] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q} \Vert ^2 - \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] - \frac{\alpha }{2} \Vert \theta _{\mathbb P} \Vert ^2\nonumber \\&= \delta - (\mathbb E_{\mathbb P}[f(\theta _{\mathbb Q})] - \mathbb E_{\mathbb Q}[f(\theta _{\mathbb Q})] )\nonumber \\&\quad \quad + \mathbb E_{\mathbb P}[f(\theta _{\mathbb Q})] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q}\Vert ^2 - \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] - \frac{\alpha }{2} \Vert \theta _{\mathbb P}\Vert ^2\nonumber \\&\ge \mathbb E_{\mathbb P}[f(\theta _{\mathbb Q})] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q}\Vert ^2 - \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] - \frac{\alpha }{2} \Vert \theta _{\mathbb P}\Vert ^2. \end{aligned}$$
(15)

Using the quadratic term \(\frac{\alpha }{2} \Vert \cdot \Vert ^2\), the convexity of the integrand \(f(\cdot ,\omega )\), the convexity of \(\varTheta _\mathrm{ad}\), and optimality of \(\theta _{\mathbb P}\), we have

$$\begin{aligned} \begin{aligned} \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] + \frac{\alpha }{2} \Vert \theta _{\mathbb P}\Vert ^2&\le \mathbb E_{\mathbb P}[f(\theta _{\mathbb P}/2 + \theta _{\mathbb Q}/2)] + \frac{\alpha }{2} \Vert \theta _{\mathbb P}/2 + \theta _{\mathbb Q}/2\Vert ^2\\&\le \frac{1}{2} ( \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] + \frac{\alpha }{2} \Vert \theta _{\mathbb P} \Vert ^2) \\&\quad +\frac{1}{2} (\mathbb E_{\mathbb P}[f(\theta _{\mathbb Q})] +\frac{\alpha }{2} \Vert \theta _{\mathbb Q}\Vert ^2) - \frac{\alpha }{8} \Vert \theta _{\mathbb P} - \theta _{\mathbb Q} \Vert ^2. \end{aligned} \end{aligned}$$

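Subtracting \(\frac{1}{2} ( \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] + \frac{\alpha }{2} \Vert \theta _{\mathbb P} \Vert ^2)\) from both sides and multiplying by 2, it follows that

$$\begin{aligned} \mathbb E_{\mathbb P}[f(\theta _{\mathbb Q})] + \frac{\alpha }{2} \Vert \theta _{\mathbb Q}\Vert ^2 - \mathbb E_{\mathbb P}[f(\theta _{\mathbb P})] - \frac{\alpha }{2} \Vert \theta _{\mathbb P}\Vert ^2 \ge \frac{\alpha }{4} \Vert \theta _{\mathbb Q} - \theta _{\mathbb P} \Vert ^2. \end{aligned}$$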
(16)

Combining (16) with (15) above yields

$$\begin{aligned} \Vert \theta _{\mathbb Q} - \theta _{\mathbb P} \Vert \le \sqrt{\frac{8 \delta }{\alpha }} = 2\sqrt{\frac{2 d_{\mathfrak {F}}(\mathbb Q,\mathbb P)}{\alpha }} \end{aligned}$$

as was to be shown. \(\square \)
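The estimates (13) and (14) can be checked numerically on a small example. The following Python sketch is a toy illustration, not part of the analysis above: it assumes a scalar decision variable, \(\varTheta _\mathrm{ad} = [-1,1]\), the integrand \(f(\theta ,\omega ) = \frac{1}{2}(\theta - \omega )^2\), and two discrete measures on a common finite support. All numerical values are arbitrary, and the helper `solve` is hypothetical.

```python
import numpy as np

ALPHA = 0.1  # regularization parameter alpha > 0 (illustrative value)

def f(theta, omega):
    """Toy integrand, convex in theta: f(theta, omega) = 0.5 * (theta - omega)^2."""
    return 0.5 * (theta - omega) ** 2

def solve(atoms, weights):
    """Minimize E[f(theta)] + (ALPHA/2) * theta^2 over theta in [-1, 1].

    The unconstrained minimizer is mean / (1 + ALPHA); since the objective is a
    strongly convex quadratic in one variable, clipping gives the constrained one.
    """
    mean = float(np.dot(weights, atoms))
    theta = float(np.clip(mean / (1.0 + ALPHA), -1.0, 1.0))
    value = float(np.dot(weights, f(theta, atoms)) + 0.5 * ALPHA * theta ** 2)
    return theta, value

atoms = np.array([-2.0, 0.0, 1.0, 3.0])   # common finite support
p = np.array([0.25, 0.25, 0.25, 0.25])    # weights of P
q = np.array([0.10, 0.20, 0.30, 0.40])    # weights of Q

theta_p, nu_p = solve(atoms, p)
theta_q, nu_q = solve(atoms, q)

# Minimal information distance for F = {f(theta_P, .), f(theta_Q, .)}.
d_F = max(abs(float(np.dot(p - q, f(th, atoms)))) for th in (theta_p, theta_q))

lhs_value, rhs_value = abs(nu_q - nu_p), d_F               # estimate (13)
lhs_sol = abs(theta_q - theta_p)                           # estimate (14), left
rhs_sol = 2.0 * float(np.sqrt(2.0 * d_F / ALPHA))          # estimate (14), right
```

In this example the constraint is active for \(\mathbb Q\), and both left-hand sides stay below the corresponding right-hand sides, as Theorem 4 predicts.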

We may now return to the results at the end of Sect. 4 in order to derive quantitative stability results using the familiar Fortet-Mourier and Wasserstein distances.

Corollary 1

In the setting of Theorem 4, suppose there exists a \(p \in [1,\infty )\) and some \(L > 0\) such that

$$\begin{aligned} \mathfrak {F}= \mathcal {F}_{p}(\varOmega ) \text { and } \left\{ \frac{1}{L}f(\theta , \cdot ) : \varOmega \rightarrow \mathbb R \left| \; \theta \in \varTheta _\mathrm{ad} \right. \right\} \subset \mathfrak {F}. \end{aligned}$$

Then the following estimates hold for the associated Fortet-Mourier metric:

$$\begin{aligned} |\nu (\mathbb Q) - \nu (\mathbb P)| \le L \zeta _p(\mathbb Q,\mathbb P),\quad \Vert \theta _{\mathbb Q} - \theta _{\mathbb P} \Vert \le 2 L \sqrt{\frac{2}{\alpha } \zeta _p(\mathbb Q,\mathbb P)}. \end{aligned}$$

Consequently, the following estimates hold for the Wasserstein metric \(W_p\):

$$\begin{aligned} \begin{aligned} |\nu (\mathbb Q) - \nu (\mathbb P)|&\le L c(\omega _0,\mathbb P,\mathbb Q)W_p(\mathbb Q,\mathbb P),\\ \Vert \theta _{\mathbb Q} - \theta _{\mathbb P} \Vert&\le 2 L \sqrt{\frac{2}{\alpha } c(\omega _0,\mathbb P,\mathbb Q) W_p(\mathbb Q,\mathbb P)}. \end{aligned} \end{aligned}$$

Here, we set \(c(\omega _0,\mathbb P,\mathbb Q) = \left( 1 + \int _{\varOmega } d(\omega _0,\omega )^p \mathrm {d} \mathbb P(\omega ) + \int _{\varOmega } d(\omega _0,\omega )^p \mathrm {d}\mathbb Q(\omega )\right) ^{\frac{p-1}{p}} \).

6 An application to PDE-constrained optimization under uncertainty

We conclude the theoretical discussion with an example from PDE-constrained optimization under uncertainty to demonstrate the applicability of our results. For the purpose of discussion, we start with an arbitrary Borel probability measure \(\mathbb P\) in order to introduce the problem. The notation mirrors in part that of [17]. For readability, we indicate the associated quantities in the general discussions above.

Our goal is twofold. Under reasonable data assumptions, we will define a class of integrands \(\mathfrak {F}\), which allow us to use the m.i. metric in our stability results to prove convergence of optimal values and optimal solutions under weak convergence of a sequence of measures \(\{\mathbb {P}_{N}\}\). Afterwards, assuming the underlying function spaces are replaced by finite-dimensional subspaces defined by a standard finite-element discretization, we derive an a priori-type error bound and argue that the fully discrete problems converge to the original continuous problems.

We will consider a class of optimization problems in which we seek to minimize the objective function

$$\begin{aligned} \begin{aligned} {\mathcal {J}}(u,z)&:= \frac{1}{2} \int _{\varOmega } \int _{D} | u(x,\omega ) - \widetilde{u}_d(x)|^2 \, \mathrm {d} x \mathrm {d} \mathbb P(\omega ) + \frac{\alpha }{2} \int _{D} |z(x)|^2 \ \mathrm {d}x \\&= \frac{1}{2} \mathbb E_{\mathbb P}\left[ \Vert u - \widetilde{u}_d \Vert ^2_{L^2(D)}\right] + \frac{\alpha }{2} \Vert z \Vert ^2_{L^2(D)} \end{aligned} \end{aligned}$$
(17)

subject to the condition that \(z \in Z_\mathrm{ad} \subset L^2(D)\), a closed bounded convex set, and for \(z \in Z_\mathrm{ad}\), u solves a random partial differential equation (PDE), which we define below. The function \(\widetilde{u}_{d} \in L^2(D)\) can be thought of as a desired state or general target function.

To be precise, let \(D\subset \mathbb {R}^{n}\) be an open, bounded Lipschitz domain, \(V=H_0^1(D)\) the classical Sobolev space with inner product \((\cdot ,\cdot )_V\), and \(V^{\star }=H^{-1}(D)\) its dual with norm \(\Vert \cdot \Vert _{\star }\) and dual pairing \(\langle \cdot ,\cdot \rangle \). In addition, let \(H=L^{2}(D)\) with inner product \((\cdot ,\cdot )_H\). Furthermore, let \(\varOmega \) be a metric space with metric \(\rho \) and Borel \(\sigma \)-field \(\mathcal {F}\) and let \(\mathbb {P}\) be a Borel probability measure.

Within this framework, we consider the bilinear form \(a(\cdot ,\cdot ;\omega ):V\times V\rightarrow \mathbb {R}\) defined by

$$\begin{aligned} a(u,v;\omega )=\int _{D}\sum _{i,j=1}^{n}b_{ij}(x,\omega )\frac{\partial u(x)}{\partial x_{i}} \frac{\partial v(x)}{\partial x_{j}}dx \end{aligned}$$

where \(\omega \in \varOmega \). The associated random PDE can be defined pointwise as:

$$\begin{aligned} a(u,v;\omega )=\int _{D}(z(x)+g(x,\omega ))v(x)dx \quad \text{ for } \mathbb {P}\text{-a.e. } \omega \in \varOmega , \end{aligned}$$

for all test functions \(v\in C_{0}^{\infty }(D)\), z varying in a constraint set \(Z_\mathrm{ad}\subset H\), and \(g, b_{ij}:D\times \varOmega \rightarrow \mathbb {R}\), which are assumed to be at least measurable in \(\varOmega \) and square (Lebesgue) integrable in D.

In order to use our stability results in this context, we will need further data assumptions on the bilinear form. For each \(\omega \in \varOmega \), we let \(A(\omega ) : V\rightarrow V^{\star }\) be the mapping

$$\begin{aligned} \langle A(\omega )u,v\rangle =a(u,v;\omega )\quad (\forall u,v\in V). \end{aligned}$$

The existence of \(A(\omega )\) as a bounded linear operator is due to the Riesz representation theorem and the Lax-Milgram lemma based on the following assumptions, see e.g., [1, Chap. 6., 6.1, 6.2]. First, we impose the condition that there exist \(L> \gamma >0\) such that

$$\begin{aligned} \gamma \sum _{i=1}^{n}y_{i}^{2}\le \sum _{i,j=1}^{n}b_{ij}(x,\omega )y_iy_j\le L\sum _{i=1}^{n}y_{i}^{2} \quad (\forall y\in \mathbb {R}^{n}) \end{aligned}$$

for all \(x\in D\) and \(\mathbb {P}\)-a.e. \(\omega \in \varOmega \). This implies that each \(b_{ij}\) is essentially bounded in \(D \times \varOmega \) with respect to the associated product measure. Moreover, the mapping \(A(\omega ) :V\rightarrow V^{\star }\) is uniformly positive definite (with constant \(\gamma \)) and uniformly bounded (with constant L) with respect to \(\mathbb {P}\)-a.e. \(\omega \in \varOmega \), i.e.,

$$\begin{aligned} \gamma \Vert u\Vert _{V}^{2}\le \langle A(\omega )u,u\rangle \le L\Vert u\Vert _{V}^{2}\quad (\forall u\in V). \end{aligned}$$

In addition, the inverse mapping \(A(\omega )^{-1}:V^{\star }\rightarrow V\) is again uniformly positive definite (with constant \(\frac{1}{L}\)) and uniformly bounded (with constant \(\frac{1}{\gamma }\)) with respect to \(\mathbb {P}\)-a.e. \(\omega \in \varOmega \).

Under these data assumptions, we may now define a class of integrands for the m.i. metric for which \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P) \rightarrow 0\) for any sequence of Borel probability measures \(\{\mathbb {P}_{N}\}\) that converges weakly to \(\mathbb {P}\). To this end, we define the functions \(f:Z_\mathrm{ad}\times \varOmega \rightarrow \mathbb {R}\) by

$$\begin{aligned} f(z,\omega )= & {} \frac{1}{2}\Vert A(\omega )^{-1}\left( z+g(\omega )\right) -\widetilde{u}_{d}\Vert _{H}^{2}=\frac{1}{2}\Vert A(\omega )^{-1}z - \left( \widetilde{u}_{d}-A(\omega )^{-1}g(\omega )\right) \Vert _{H}^{2}\\= & {} \frac{1}{2}\int _{D}([A(\omega )^{-1}z](x)-(\widetilde{u}_{d}(x)-[A(\omega )^{-1}g(\cdot ,\omega )](x)))^2 dx. \end{aligned}$$

Our aim is to derive conditions implying that the class \(\mathfrak {F}=\{f(z,\cdot ):z\in Z_\mathrm{ad}\}\) is uniformly bounded and equicontinuous, and consequently a \(\mathbb {P}\)-uniformity class, cf. [23].

Lemma 1

In addition to the standing assumptions, suppose that \(\widetilde{u}_{d}\in H\) and \(g\in L^{2}(\varOmega ,\mathcal {F},\mathbb P;V^{\star })\). Then for some \(C>0\) we have

$$\begin{aligned} |f(z,\omega )|\le C(1+\Vert g(\omega )\Vert _{\star }^2)\quad (\mathbb {P}\text{- } \text{ a.e. } \omega \in \varOmega ,\,z\in Z_\mathrm{ad}). \end{aligned}$$

Proof

For \(z\in Z_\mathrm{ad}\) and \(\omega \in \varOmega \) we obtain

$$\begin{aligned} |f(z,\omega )|\le & {} \left( \Vert A(\omega )^{-1}z\Vert _H^2+\Vert \widetilde{u}_{d}-A(\omega )^{-1}g(\omega )\Vert _H^2\right) \\\le & {} \left( \Vert A(\omega )^{-1}z\Vert _H^2+2\Vert \widetilde{u}_d\Vert _{H}^{2}+2\Vert A(\omega )^{-1}g(\omega )\Vert _H^2\right) \\\le & {} \left( \Vert A(\omega )^{-1}z\Vert _V^2+2\Vert A(\omega )^{-1}g(\omega )\Vert _V^2+2\Vert \widetilde{u}_d\Vert _{H}^{2}\right) \\\le & {} \left( \frac{c}{\gamma }(\Vert z\Vert _{\star }^2+2\Vert g(\omega )\Vert _{\star }^2)+2\Vert \widetilde{u}_d\Vert _{H}^{2}\right) , \end{aligned}$$

where we used the Poincaré-Friedrichs inequality twice (with some constant c) and the uniform boundedness of \(\Vert A(\omega )^{-1}\Vert \) by \(\frac{1}{\gamma }\). Since z varies in the bounded set \(Z_\mathrm{ad}\), there is a positive constant C such that the assertion holds. \(\square \)

Lemma 1 provides us with a uniform bound on all functions in \(\mathfrak {F}\). The proof of the following lemma makes use of a result in [12]. Since this book is not readily available in English, we provide the result and a short proof in the “Appendix”.

Lemma 2

In addition to the assumptions of Lemma 1, suppose there is a constant \(C>0\) such that

$$\begin{aligned} \sqrt{\sum _{i,j=1}^{n}|b_{ij}(x,\omega )-b_{ij}(x,\omega ')|^{2}}\le C\rho (\omega ,\omega ')\quad (\forall \omega ,\omega '\in \varOmega ). \end{aligned}$$

Then for any \(g\in V^{\star }\) and \(\omega ,\omega '\in \varOmega \) we have

$$\begin{aligned} \Vert A(\omega )^{-1}g-A(\omega ')^{-1}g\Vert _{V}\le \frac{t}{1-\kappa (t)}C\Vert g\Vert _{\star } \rho (\omega ,\omega ')\quad \forall t\in \left( 0,\frac{2\gamma }{L^2}\right) , \end{aligned}$$

where \(\kappa (t) = \sqrt{1-2\gamma t + L^2 t^2}\).

Proof

First we study the dependence of \(A(\omega )u\) on \(\omega \). Let \(u,v\in V\) and \(\omega ,\omega '\in \varOmega \).

$$\begin{aligned} |\langle (A(\omega )- A(\omega '))u,v\rangle |= & {} \left| \int _{D}\sum _{i,j=1}^{n}(b_{ij}(x,\omega ) -b_{ij}(x,\omega '))\frac{\partial u(x)}{\partial x_{i}}\frac{\partial v(x)}{\partial x_{j}}dx\right| \\= & {} \left| \left( B(\cdot ;\omega ,\omega ') \nabla u(\cdot ), \nabla v(\cdot ) \right) \right| \\\le & {} \int _{D}\Vert B(x;\omega ,\omega ')\Vert |\nabla u(x)| |\nabla v(x)| dx\\\le & {} \sup _{x\in D}\Vert B(x;\omega ,\omega ')\Vert \Vert u\Vert _{V}\Vert v\Vert _{V}, \end{aligned}$$

where \(B(x;\omega ,\omega ')\) denotes the \(n\times n\)-matrix

$$\begin{aligned} B(x;\omega ,\omega ')=(b_{ij}(x,\omega )-b_{ij}(x,\omega '))_{i,j=1,\ldots ,n} \end{aligned}$$

with Frobenius norm \(\Vert B(x;\omega ,\omega ')\Vert \), \(\nabla u\) the gradient of u in the sense of Sobolev and \(|\nabla u|\) its Euclidean norm. Hence, we obtain

$$\begin{aligned} \Vert (A(\omega )- A(\omega '))u\Vert _{\star }\le \Vert u\Vert _{V}\sup _{x\in D}\sqrt{\sum _{i,j=1}^{n} |b_{ij}(x,\omega )-b_{ij}(x,\omega ')|^{2}}\le C\Vert u\Vert _{V}\rho (\omega ,\omega '). \end{aligned}$$

Next we consider the mapping \(K_{t}(\omega )u=u-tJ^{-1}(A(\omega )u-g)\) for some \(t\in (0,\frac{2\gamma }{L^2})\), \(\omega \in \varOmega \), \(g\in V^{\star }\) and any \(u\in V\). Then it follows from Proposition 2 that

$$\begin{aligned} \Vert K_{t}(\omega )u-K_{t}(\omega )u'\Vert _{V}\le \kappa (t)\Vert u-u'\Vert _{V} \end{aligned}$$

for any \(u,u'\in V\) and \(\kappa (t)=\sqrt{1-2t\gamma +t^2L^2}<1\). Furthermore, the unique fixed point of \(K_t(\omega )\) belongs to the ball around zero with radius

$$\begin{aligned} r=\frac{\Vert K_{t}0-0\Vert _{V}}{1-\kappa (t)} =\frac{t}{1-\kappa (t)}\Vert J^{-1}g\Vert _{V} =\frac{t}{1-\kappa (t)}\Vert g\Vert _{\star }. \end{aligned}$$

For any \(u\in V\) and \(\omega ,\omega '\in \varOmega \) we have

$$\begin{aligned} \Vert K_{t}(\omega )u-K_{t}(\omega ')u\Vert _{V}=t\Vert J^{-1}(A(\omega )-A(\omega '))u\Vert _{V} =t\Vert (A(\omega )-A(\omega '))u\Vert _{\star } \end{aligned}$$

and apply Proposition 3 from the “Appendix” with \(P=\varOmega \), \(X=\mathbb {B}(0,r)=\{u\in V:\Vert u\Vert _{V}\le r\}\), \(F(p,u)=K_{t}(\omega )u\) and \(F(p',u)=K_{t}(\omega ')u\). We obtain

$$\begin{aligned} \Vert \bar{x}(g,\omega )-\bar{x}(g,\omega ')\Vert \le \frac{t}{1-\kappa (t)}C r \rho (\omega ,\omega ') \end{aligned}$$

where \(\bar{x}(g,\omega )=A(\omega )^{-1}g\) and \(r=\frac{t}{1-\kappa (t)}\Vert g\Vert _{\star }\). This completes the proof. \(\square \)
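The fixed-point construction used in this proof can be illustrated in finite dimensions. The following Python sketch is a simplified stand-in under stated assumptions: \(A(\omega )\) is replaced by a symmetric matrix with spectrum in \([\gamma ,L]\), the Riesz map J by the identity, and the values of n, \(\gamma \), L and t are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma, L = 5, 1.0, 4.0

# Symmetric matrix with spectrum in [gamma, L]: a finite-dimensional stand-in
# for A(omega); the Riesz map J is taken to be the identity here.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(gamma, L, n)) @ Q.T
g = rng.standard_normal(n)

t = gamma / L**2                                  # some t in (0, 2*gamma/L^2)
kappa = np.sqrt(1.0 - 2.0 * gamma * t + (L * t) ** 2)

def K(u):
    """One application of the map K_t u = u - t * (A u - g)."""
    return u - t * (A @ u - g)

u = np.zeros(n)
for _ in range(2000):                             # plain fixed-point iteration
    u = K(u)

lipschitz = np.linalg.norm(np.eye(n) - t * A, 2)  # Lipschitz constant of K
radius = t / (1.0 - kappa) * np.linalg.norm(g)    # ball containing the fixed point
```

Here the iteration converges to \(A^{-1}g\), the Lipschitz constant of the map is bounded by \(\kappa (t) < 1\), and the limit lies in the ball of radius r.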

We now have enough results to prove that \(\mathfrak {F}\) constitutes a \(\mathbb P\)-uniformity class, which is a direct consequence of the following theorem.

Theorem 5

In addition to the hypotheses of Lemma 2, assume \(g\in L^{\infty }(\varOmega ,\mathcal {F},\mathbb {P}; H)\) and there exists \(\bar{C}>0\) such that

$$\begin{aligned} |g(x,\omega )-g(x,\omega ')|\le \bar{C}\rho (\omega ,\omega ')\quad (\forall \omega ,\omega ' \in \varOmega ,\text{ a. } \text{ e. } \text{ in } D). \end{aligned}$$

Then \(\mathfrak {F}\) is uniformly bounded and equi-Lipschitz continuous with respect to \(\rho \) on \(\varOmega \).

Proof

For any \(z\in Z_\mathrm{ad}\) and \(\omega \in \varOmega \) let \(F(z,\omega )=A(\omega )^{-1}(z+g(\omega )) -\widetilde{u}_{d}\). Our assumptions imply that \(F(\cdot ,\cdot )\) is \(\mathbb {P}\)-a.s. uniformly bounded in H by some constant \(\hat{C}\) (see the proof of Lemma 1). Furthermore, we obtain for any \(z\in Z_\mathrm{ad}\) and \(\omega ,\omega '\in \varOmega \):

$$\begin{aligned} |f(z,\omega )-f(z,\omega ')|= & {} \frac{1}{2}(\Vert F(z,\omega )\Vert _{H}+\Vert F(z,\omega ')\Vert _{H})|\Vert F(z,\omega )\Vert _{H} -\Vert F(z,\omega ')\Vert _{H}|\\\le & {} \!\!\hat{C}\Vert F(z,\omega )-F(z,\omega ')\Vert _{H}\\\le & {} \!\!\hat{C}(\Vert (A(\omega )^{-1}-A(\omega ')^{-1})z\Vert _{V}+ \Vert A(\omega )^{-1}g(\omega )-A(\omega ')^{-1}g(\omega ')\Vert _{V}), \end{aligned}$$

where we used the uniform boundedness and the Poincaré-Friedrichs inequality. For the first term on the right-hand side we argue as in Lemma 2. The second term is estimated by

$$\begin{aligned} \Vert A(\omega )^{-1}g(\omega )-A(\omega ')^{-1}g(\omega ')\Vert _{V}\le & {} \Vert (A(\omega )^{-1}-A(\omega ')^{-1})g(\omega )\Vert _{V}\\&+\Vert A(\omega ')^{-1}(g(\omega )-g(\omega '))\Vert _{V}. \end{aligned}$$

Now, we use again Lemma 2 for the first term and both Lemma 1 and the assumption on g for the second. Combining these observations, we obtain the assertion. \(\square \)

Theorem 5 establishes the \(\mathbb P\)-uniformity of \(\mathfrak {F}\) under relatively mild assumptions. In particular, we do not require the terms \(b_{ij}\) and g to be smooth in any way with respect to \(\omega \). Of course, in many interesting applications, see e.g. [4, 5], one can demonstrate much higher regularity of \(F(z,\omega )=A(\omega )^{-1}(z+g(\omega )) -\widetilde{u}_{d}\) in \(\omega \) for each \(z \in Z_\mathrm{ad}\) if some smoothness of \(b_{ij}\) and g is in fact available. Although the presence of \(\Vert \cdot \Vert ^2_{H}\) in \(f(z,\omega )\) rules out 1-Lipschitz continuity, as required by estimates using the Wasserstein distances, the quantitative estimates using the Fortet-Mourier metric, e.g., \(\zeta _2\), which are incidentally strictly sharper than the Wasserstein estimates, are still applicable provided the local growth conditions for functions in \(\mathcal {F}_p(\varOmega )\) are fulfilled. On the other hand, as a general point of critique, the minimal information metric along with Theorems 4 and 5 removes the need to enlarge the set of integrands \(\mathfrak {F}\) in order to make use of the rougher estimates given by the Fortet-Mourier metric.

This brings us to our second goal of this section. In order to solve optimization problems of the type

$$\begin{aligned} \min \; \mathbb E_{\mathbb P}[f(z)] + \frac{\alpha }{2} \Vert z\Vert ^2_{H} \text { over } z \in Z_\mathrm{ad} \end{aligned}$$
(18)

with \(f \in \mathfrak {F}\) numerically, not only \(\mathbb P\) but also the decision variables z and the underlying partial differential equation must be approximated. In other words, using a finite-sample-based approximation \(\mathbb P_{N}\) of \(\mathbb P\) and a finite-element discretization for the deterministic quantities in H and V, we would typically consider finite-dimensional problems of the type:

$$\begin{aligned} \begin{aligned} \min \; \frac{1}{2}\sum _{i=1}^N \pi _i\left[ \Vert (A_i^h)^{-1}(z^{h}) - u_{d,i}^{h} \Vert _{H}^2\right] + \frac{\alpha }{2} \Vert z^{h}\Vert ^2_{H} \text { over } z^{h} \in Z^{h}_\mathrm{ad}.&\\ \end{aligned} \end{aligned}$$
(19)

Here, \(A^h_i\) is defined from \(A(\omega )\) by replacing \(\omega \) with a realization \(\omega _i\) and V by a finite-dimensional subspace \(V_{h}\) derived from a standard finite-element approximation. Furthermore, \(u^h_{d,i} = (A_i^h)^{-1}(g^{h}_i - \widetilde{u}_{d,h})\), and \(Z^h_\mathrm{ad}\) is an approximation of \(Z_\mathrm{ad}\) based on a finite-element approximation of H.

For piecewise constant approximations of the control z, the original ideas date back to Falk [10], see the recent chapter [2] for a quick reference. For another more recent, comprehensive treatment see [26] (for a posteriori error estimates) as well as the monograph [15, Chap. 3] and the many references therein. In many of these works, as well as in our concrete example in the next section, the set \(Z_\mathrm{ad}\) has the concrete form:

$$\begin{aligned} Z_\mathrm{ad} := \left\{ z \in L^2(D) \left| \; \underline{a}(x) \le z(x) \le \overline{a}(x) \text { a.e. } x \in D \right. \right\} , \end{aligned}$$

where \(\underline{a}, \overline{a}\) are sufficiently regular, often elements of \(L^{\infty }(D)\). Therefore, for the sake of argument, we may assume that if D, B, g, \(\underline{a}\), \(\overline{a}\), and \(\widetilde{u}_d\) are sufficiently regular, then there exists a (random variable) \(C_{N} \ge 0\) along with a real \(q \in (0,3/2]\) such that

$$\begin{aligned} \Vert z^h_{\mathbb P_N} - z_{\mathbb P_N} \Vert _{H} \le C_{N} h^q\quad \omega \in \varOmega . \end{aligned}$$
(20)

Here, the dependence on N results from the fact that the typical estimates, see e.g., [2, Thm. 10], depend on constants related to the coefficients of the PDE, the right-hand side g, and \(u_{d}\), which is stochastic. Therefore, in the estimate (20), \(C_{N}\) is related to a realization of a random sample of length N. Using (20), we can apply Theorem 4 and the triangle inequality to obtain the estimate

$$\begin{aligned} \begin{aligned} \Vert z_{\mathbb P} - z^h_{\mathbb P_N}\Vert _{H}&\le 2 \sqrt{2} \alpha ^{-1/2} d_{\mathfrak {F}}(\mathbb P,\mathbb P_{N})^{1/2} + C_{N} h^q. \end{aligned} \end{aligned}$$
(21)

It should also be noted that for piecewise constant approximations of \(Z^{h}_\mathrm{ad}\), we can only expect \(q = 1\). The case \(q = 3/2\) requires a significant amount of regularity, and \(q \in (1,3/2)\) depends on both the regularity and the type of discretization. In light of Theorem 5, (21) guarantees the convergence of the fully discrete solutions \(z^h_{\mathbb P_N}\) to the original infinite-dimensional solution, provided \(h \downarrow 0\) and \(\mathbb P_{N} \rightarrow \mathbb P\) weakly.

7 Monte Carlo approximation and numerical illustration for PDE-constrained optimization

In this final section, we provide additional discussions on the behavior of \(d_{\mathfrak {F}}(\mathbb P, \mathbb P_N)\) for Monte Carlo approximations \(\mathbb P_N\) of \(\mathbb P\) in order to give the reader a better impression of the potential rate of convergence; in particular, for the setting in the previous section.

7.1 Monte Carlo approximation

For the sake of argument, we assume that the integrands f can be written

$$\begin{aligned} f(z,\omega ) = f(z,\varvec{\xi }(\omega )) + \frac{\alpha }{2} \Vert z\Vert ^2_{H}, \end{aligned}$$

where \(\varvec{\xi }: \varOmega \rightarrow \varXi \subset \mathbb R^d\) is a random vector. This is often the case for PDE-models and is used in the example in Sect. 7.2 below.

Let \(\varvec{\xi }^1,\varvec{\xi }^2,\ldots ,\varvec{\xi }^N,\ldots \) be independent identically distributed \(\varXi \)-valued random vectors on some probability space \((\varOmega ,\mathcal {F}, P)\) having the common probability law \(\mathbb P\), i.e., \(\mathbb P= P \circ (\varvec{\xi }^1)^{-1}\). In this context, we define the empirical measures

$$\begin{aligned} \mathbb P_N(\cdot )=\frac{1}{N}\sum _{i=1}^{N}\delta _{\varvec{\xi }^i(\cdot )} \qquad (N\in \mathbb N) \end{aligned}$$
(22)

and the empirical or Monte Carlo approximation of the stochastic program (18) with sample size N, i.e.,

$$\begin{aligned} \min \left\{ \frac{1}{N}\sum \limits _{i=1}^{N} f(z,\varvec{\xi }^i(\cdot )): z\in Z_\mathrm{ad}\right\} . \end{aligned}$$
(23)

The optimal value \(\nu (\mathbb P_{N}(\cdot ))\) of (23) is a real random variable and the solution \(z_{\mathbb P_{N}(\cdot )}\) is an H-valued random element. It is well known that the sequence \(\left\{ \mathbb P_{N}(\cdot )\right\} \) of empirical measures converges weakly to \(\mathbb P\) P-almost surely (see, e.g., [8, Theorem 11.4.1]). Theorem 5 implies that the class \(\mathfrak {F}\) is a \(\mathbb P\)-uniformity class and, hence, the sequence \(d_{\mathfrak {F}}(\mathbb P_{N}(\cdot ),\mathbb P)\) converges to zero P-almost surely. According to Theorem 4, the sequences \(\left\{ \nu (\mathbb P_{N}(\cdot ))\right\} \) and \(\left\{ z_{\mathbb P_{N}(\cdot )}\right\} \) of empirical optimal values and solutions converge P-almost surely to the true optimal value \(\nu (\mathbb P)\) and solution \(z_{\mathbb P}\), respectively.
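The almost sure convergence of empirical solutions can be observed on a one-dimensional toy version of (23). The following Python sketch is an illustration only, not the PDE-constrained problem: it assumes a scalar control, the integrand \(\frac{1}{2}(z-\xi )^2 + \frac{\alpha }{2} z^2\), a scalar \(\xi \) uniformly distributed on \([-1,1]\), and the admissible interval \([-0.75,0.75]\). The helper `saa_solution` is hypothetical.

```python
import numpy as np

ALPHA = 1e-3
rng = np.random.default_rng(0)

def saa_solution(xi):
    """Minimizer of (1/N) sum_i [0.5*(z - xi_i)^2 + (ALPHA/2)*z^2] over [-0.75, 0.75]."""
    return float(np.clip(np.mean(xi) / (1.0 + ALPHA), -0.75, 0.75))

z_true = 0.0   # exact solution, since xi ~ Uniform[-1, 1] has mean zero
errors = {}
for N in (10**2, 10**3, 10**4, 10**5):
    xi = rng.uniform(-1.0, 1.0, size=N)      # i.i.d. sample xi^1, ..., xi^N
    errors[N] = abs(saa_solution(xi) - z_true)
```

The dictionary `errors` records the distance between the empirical and the true solution for each sample size; in this toy setting the error is governed by the sample mean and is therefore expected to decay like \(N^{-1/2}\).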

In order to obtain rates of convergence for the sequences of empirical optimal values and solutions, we consider their mean or mean square distance and conclude from Theorem 4 and Corollary 1 the estimates

$$\begin{aligned} \mathbb E\left[ |\nu (\mathbb P_{N})-\nu (\mathbb P)|\right]\le & {} \mathbb E\left[ d_{\mathfrak {F}}\left( \mathbb P_{N},\mathbb P\right) \right] \le L\,\mathbb E\left[ \zeta _{1}\left( \mathbb P_{N},\mathbb P\right) \right] \\ \left( \mathbb E\left[ \Vert z_{\mathbb P_{N}} -z_{\mathbb P} \Vert ^2_{H}\right] \right) ^{\frac{1}{2}}\le & {} \left( \mathbb E\left[ d_{\mathfrak {F}}\left( \mathbb P_{N},\mathbb P\right) \right] \right) ^{\frac{1}{2}} \le \left( \hat{L}\,\mathbb E\left[ \zeta _{1}\left( \mathbb P_{N}, \mathbb P\right) \right] \right) ^{\frac{1}{2}}, \end{aligned}$$

where L is the constant appearing in Corollary 1 and \(\hat{L}\) is given by \(\hat{L}=2L\sqrt{\frac{2}{\alpha }}\). Unfortunately, rates for the mean convergence of \(d_{\mathfrak {F}}(\mathbb P_{N}(\cdot ),\mathbb P)\) are not known, but for \(\mathfrak {F}\) containing uniformly bounded and equi-Lipschitz continuous functions, the rate coincides essentially with that of \(\mathbb E[\zeta _{1}(\mathbb P_{N},\mathbb P)]\). For the latter it follows from [6, Theorem 1] that the estimate

$$\begin{aligned} \mathbb E\left[ \zeta _{1}\left( \mathbb P_{N},\mathbb P\right) \right] \le \kappa _{s}M_{s}\left( \mathbb P\right) N^{-\frac{1}{d}} \end{aligned}$$
(24)

is valid if \(d\ge 3\), \(s>\frac{d}{d-1}\), \(\kappa _{s}\) is a constant only depending on s and the sth absolute moment

$$\begin{aligned} M_{s}(\mathbb P)=\left( \int _{\mathbb R^d}\Vert \xi \Vert ^{s} \ \mathrm {d}\mathbb P(\xi )\right) ^{\frac{1}{s}} \end{aligned}$$

is finite. It is argued in [11] that the rate (24) is sharp if \(\mathbb P\) is the uniform distribution on the cube \([-1,1]^{d}\). For the numerical experiments in the next subsection, we observe better rates than those guaranteed by the probability metrics. This is a limitation of the probability metrics themselves, not their application to the problem at hand.
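To see what the rate in (24) means in practice, one can compute the sample size needed to push the right-hand side below a given tolerance. The following Python sketch does this arithmetic; the function `samples_for_tolerance` is hypothetical, and the argument `const` is a placeholder for the product \(\kappa _{s}M_{s}(\mathbb P)\), whose actual value is not specified here.

```python
import math

def samples_for_tolerance(eps, d, const=1.0):
    """Smallest N with const * N**(-1/d) <= eps, up to floating-point rounding.

    `const` stands in for kappa_s * M_s(P) in (24); its true value is unknown here.
    """
    return math.ceil((const / eps) ** d)

d = 4                                     # dimension of the random vector in Sect. 7.2
n_coarse = samples_for_tolerance(0.10, d)
n_fine = samples_for_tolerance(0.05, d)   # tolerance halved
ratio = n_fine / n_coarse
```

Halving the tolerance multiplies the required sample size by roughly \(2^{d}\), which is the sense in which the guaranteed rate deteriorates with the dimension d.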

7.2 Numerical illustration

The previous discussion provides useful upper bounds for the case of the empirical measure \(\mathbb P_N\). However, these bounds are based on estimates of the Wasserstein metrics, not the minimal information metric. Therefore, in order to obtain a better impression of the possible rate of convergence associated with \(d_{\mathfrak {F}}(\mathbb P, \mathbb P_N)\) in practice, we provide here a concrete numerical example.

We propose a model problem, which is based on [16, Ex. 6.1, Ex. 6.2]. Let \(\alpha = 10^{-3}\), \(D = (0,1)\), \(\widetilde{u}_{d}(x) = \sin (50 x/\pi )\), and consider the optimal control problem

$$\begin{aligned} {\text {minimize}}_{z\in L^2(D)} \frac{1}{2} \mathbb E_{\mathbb P}\left[ \Vert u- \widetilde{u}_{d} \Vert ^2_{H}\right] +\frac{\alpha }{2} \Vert z \Vert ^2_{H} \end{aligned}$$
(25)

where \(z \in Z_\mathrm{ad}\) with

$$\begin{aligned} Z_\mathrm{ad} := \left\{ w \in L^2(D) \left| -0.75 \le w(x) \le 0.75 \text { a.e. } x \in D\right. \right\} \end{aligned}$$

and \(u=u(z)\in L^{\infty }(\varOmega , \mathcal {F}, \mathbb P;H^1(D))\) solves the weak form of

$$\begin{aligned} -\nu (\omega ) \partial _{xx} u(\omega ,x)&= g(\omega ,x) + z(x) \quad&(\omega ,x)\in \varOmega \times D, \end{aligned}$$
(26a)
$$\begin{aligned} u(\omega ,0) = d_0(\omega ),\quad u(\omega ,1)&= d_1(\omega ) \quad&\omega \in \varOmega . \end{aligned}$$
(26b)

Furthermore, we suppose that

$$\begin{aligned} \begin{aligned} \nu (\omega )&:= 10^{\xi _1(\omega )-2},\quad&g(\omega ,x)&:= \frac{\xi _2(\omega )}{100}\\ d_0(\omega )&:= 1+\frac{\xi _3(\omega )}{1000}\quad&d_1(\omega )&:= \frac{\xi _4(\omega )}{1000}, \end{aligned} \end{aligned}$$

with random variables \(\xi _i :\varOmega \rightarrow \mathbb R\), \(i=1,2,3,4\), such that the supports of \(\xi _i\), \(i=1,2,3\), are \([-1,1]\) and the support of \(\xi _4\) is [1, 3]. For the sake of illustration, we assume that each of these random variables is uniformly distributed and, after the usual change of variables, we consider instead

$$\begin{aligned} -\nu (\xi ) \partial _{xx} u(\xi ,x)&= g(\xi ,x) + z(x) \quad&(\xi ,x)\in \varXi \times D, \end{aligned}$$
(27a)
$$\begin{aligned} u(\xi ,0) = d_0(\xi ),\quad u(\xi ,1)&= d_1(\xi ) \quad&\xi \in \varXi . \end{aligned}$$
(27b)

with \(\varXi = [-1,1]\times [-1,1] \times [-1,1] \times [1,3]\), endowed with the associated uniform density. We define \(\varvec{\xi }:= (\xi _1,\dots ,\xi _4) \in \varXi \).
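The parametrization above is straightforward to sample. The following Python sketch draws realizations of \(\varvec{\xi }\) and maps them to the data \(\nu \), g, \(d_0\), \(d_1\); it is an illustration only and not the code used for the experiments. As a sanity check it also evaluates the state for a constant control z, for which (27) has the closed-form solution \(u(x) = d_0 + (d_1 - d_0)x + \frac{g+z}{2\nu }x(1-x)\). The function names and the value \(z = 0.5\) are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_xi(M):
    """Draw M i.i.d. samples from the uniform distribution on Xi."""
    low = np.array([-1.0, -1.0, -1.0, 1.0])
    high = np.array([1.0, 1.0, 1.0, 3.0])
    return rng.uniform(low, high, size=(M, 4))

def coefficients(xi):
    """Map one sample xi = (xi_1, ..., xi_4) to the data (nu, g, d0, d1)."""
    nu = 10.0 ** (xi[0] - 2.0)
    g = xi[1] / 100.0
    d0 = 1.0 + xi[2] / 1000.0
    d1 = xi[3] / 1000.0
    return nu, g, d0, d1

def state_constant_control(x, xi, z):
    """Closed-form solution of (27) when the control z is a constant."""
    nu, g, d0, d1 = coefficients(xi)
    return d0 + (d1 - d0) * x + (g + z) / (2.0 * nu) * x * (1.0 - x)

xis = sample_xi(1000)
x = np.linspace(0.0, 1.0, 257)
u = state_constant_control(x, xis[0], z=0.5)
```

Note that \(\nu (\xi )\) ranges over \([10^{-3},10^{-1}]\), so the state is considerably more sensitive to \(\xi _1\) than to the remaining components.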

Finally, we note that the linearity of the differential equation allows us to use the superposition principle to write the unique, \(z\)-dependent solution \(u(z)\) as a sum of random fields, \(u(z) = S(z) + \widehat{u}\), where \(S(z)\) maps \(z\) from \(H\) into the solution space and \(\widehat{u}\) is a fixed random field. To show that the continuity properties used in the previous section are fulfilled, let \(\widehat{u}:D\times \varXi \rightarrow \mathbb R\) denote a function such that \(\widehat{u}(\cdot ,\xi )\) belongs to \(H^1(D)\) and \(\widehat{u}\) satisfies the inhomogeneous random boundary condition in (27b). We recast (27) as the random elliptic PDE with random boundary conditions

$$\begin{aligned} A(\xi )u=z+g(\xi ),\quad u(x)=b(x,\xi )\quad (x\in \partial D,\,\xi \in \varXi ) \end{aligned}$$
(28)

and determine \(u\in H^1_0(D)\) such that

$$\begin{aligned} A(\xi )u=z+g(\xi )-A(\xi )\widehat{u}(\xi )\quad (\xi \in \varXi ). \end{aligned}$$

Then \(u+\widehat{u}\) solves the random boundary value problem (28), since \(A(\xi )(u+\widehat{u}) = z+g(\xi )\) and \(u\) vanishes on \(\partial D\). Based on the model assumptions, we see that both \(A\) and the altered right-hand side \(g - A \widehat{u}\) satisfy the necessary conditions for our theory.
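The lifting argument can be illustrated for a single realization of \(\xi \). The following Python sketch is a minimal example under stated assumptions and is not the discretization used for the reported results: it uses a uniform mesh with a lumped load vector (which coincides with the standard three-point finite-difference scheme), a dense direct solve, and the particular linear lifting \(\widehat{u}(x) = d_0 + (d_1-d_0)x\). For this choice \(\partial _{xx}\widehat{u} = 0\), so the lifting contributes nothing to the right-hand side in the interior and the homogeneous problem is driven by \(z+g\) alone. The name `solve_state` is hypothetical.

```python
import numpy as np

def solve_state(z, nu, g, d0, d1):
    """Solve -nu u'' = g + z on (0,1) with u(0) = d0, u(1) = d1.

    z holds nodal values of the control on a uniform mesh (boundary nodes included).
    Returns the mesh nodes x and the nodal values of u = u0 + u_hat.
    """
    n = z.size
    h = 1.0 / (n - 1)
    x = np.linspace(0.0, 1.0, n)
    # Linear lifting of the boundary data; its second derivative vanishes.
    u_hat = d0 + (d1 - d0) * x
    # Stiffness matrix on the interior nodes for the homogeneous problem.
    m = n - 2
    K = (nu / h) * (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))
    rhs = h * (g + z[1:-1])          # lumped load vector
    u0 = np.zeros(n)                 # u0 vanishes at both boundary nodes
    u0[1:-1] = np.linalg.solve(K, rhs)
    return x, u0 + u_hat

# Example: zero control and the data for xi = (0, 1, 0, 2).
x, u = solve_state(np.zeros(257), nu=1e-2, g=1e-2, d0=1.0, d1=2e-3)
```

For \(z = 0\) the exact solution is \(u(x) = d_0 + (d_1-d_0)x + \frac{g}{2\nu }x(1-x)\), which this scheme reproduces at the nodes up to rounding error and which provides a simple correctness check.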

In order to illustrate the sensitivities for this model problem, we generate a sample on \(\varXi \) of size \(M\) and replace \(\mathbb P\) by the corresponding empirical measure \(\mathbb P_{M}\). To avoid confusion, we leave off the subscript \(M\) in the following discussion. The underlying spaces for the control, state, and adjoint variables are discretized using standard piecewise linear finite elements on a uniform mesh with parameter \(h = 1/(2^8-1)\).

The resulting finite-dimensional deterministic optimization problems are solved by a semismooth Newton method as proposed in [14, 24, 25], which in the current context is mesh-independent. The forward and adjoint equations as well as the linear equations for the Hessian-vector products are solved directly, and the reduced system for the semismooth Newton step is solved by the conjugate gradient (CG) method, as some of the associated operators are only given implicitly.

The algorithm terminates once the discrete \(\ell ^2\)-norm of the residual of the optimality system reaches a tolerance of \(10^{-8}\). On average, the semismooth Newton algorithm required between three and four iterations with approximately 13 to 15 inner iterations for the CG solver. The solution \(z_{\mathbb P}\) of this problem is treated as the “true” solution. See Fig. 1 for the solution \(z_{\mathbb P}\) with \(M = 100\) and its behavior out of sample on the mean value of the state \(u\). As expected, the control behaves well on average.
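For readers unfamiliar with semismooth Newton methods for box-constrained controls, the following Python sketch shows the iteration in its primal-dual active set form. It is a simplified illustration and not the implementation used here. It assumes that, after discretization and up to a common scaling by the mesh width, the reduced problem reads: minimize \(\frac{1}{2} z^\top (\alpha I + B) z + c_0^\top z\) subject to \(a \le z \le b\), with a symmetric positive semidefinite matrix \(B\) (playing the role of the sample average of \(S^* S\)) that is available explicitly. The Newton system is solved by a dense direct solver rather than by CG, and the names `pdas_box_qp`, `B`, and `c0` are hypothetical.

```python
import numpy as np

def pdas_box_qp(B, c0, alpha, lo, hi, tol=1e-8, max_iter=50):
    """Primal-dual active set (semismooth Newton) iteration for
    min 0.5 z^T (alpha I + B) z + c0^T z  subject to  lo <= z <= hi."""
    n = c0.size
    H = alpha * np.eye(n) + B
    z = np.zeros(n)
    residual = np.inf
    for it in range(1, max_iter + 1):
        # Candidate control obtained from the stationarity condition.
        q = -(B @ z + c0) / alpha
        upper = q > hi
        lower = q < lo
        active = upper | lower
        inactive = ~active
        # Fix the control on the predicted active sets.
        z = np.empty(n)
        z[upper] = hi
        z[lower] = lo
        # Solve the reduced Newton system on the inactive set.
        if inactive.any():
            rhs = -c0[inactive] - H[np.ix_(inactive, active)] @ z[active]
            z[inactive] = np.linalg.solve(H[np.ix_(inactive, inactive)], rhs)
        # Residual of the optimality system z = Proj(-(B z + c0) / alpha).
        residual = np.linalg.norm(z - np.clip(-(B @ z + c0) / alpha, lo, hi))
        if residual <= tol:
            break
    return z, it, residual

# Example with B = 0, where the solution is the projection of -c0 / alpha.
z_opt, iters, res = pdas_box_qp(np.zeros((3, 3)), np.array([-2.0, 0.5, 2.0]),
                                alpha=1.0, lo=-0.75, hi=0.75)
```

The stopping test mirrors the criterion described above: the iteration ends once the norm of the residual of the projected optimality condition falls below the tolerance. Convergence of this simple variant is only guaranteed under additional structural conditions on \(\alpha I + B\).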

Fig. 1

(l) Optimal solution \(z_{\mathbb P}\) of (25) for \(M = 100\), \(h = 1/(2^{10}-1)\). (r) Empirical mean value of \(u(z_{\mathbb P})\) out of sample using 1000 iid random realizations of \((\xi _1,\dots ,\xi _4)\)

We now investigate \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P)\), where \(\mathbb P_{N}\) is an empirical probability measure based on a random sample of size \(N\) (\(1 \le N \le M\)) of the \(M\) realizations \(\varvec{\xi }^1,\dots , \varvec{\xi }^{M}\). Since a direct calculation of \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P)\) for even simple examples can be quite challenging, we calculate instead the optimal value \(\nu (\mathbb P_N)\) and the solution \(z_{\mathbb P_{N}}\) and compute the associated errors

$$\begin{aligned} | \nu \left( \mathbb P_N\right) - \nu \left( \mathbb P\right) | \text { and } \Vert z_{\mathbb P_{N}} - z_{\mathbb P}\Vert _{H}. \end{aligned}$$

For each \(N =1,\dots , M\), the experiment is repeated 100 times. The plots of these errors can be seen in Fig. 2.

The choice \(M =100\) was made for computational expediency. Though the semismooth Newton method implemented for this example works exceptionally well, it still requires hundreds if not thousands of PDE solves for each \(N\). Nevertheless, due to the relatively small choice of \(M\), we cannot use all the data points in Fig. 2 to estimate a rate of convergence. As a compromise, we only use the data points for \(N =1,\dots , 25\). This provides us with the estimates

$$\begin{aligned} \Vert z_{\mathbb P_{N}} - z_{\mathbb P}\Vert _{H} = O\left( N^{ -0.4995}\right) \;\text { and }\; | \nu \left( \mathbb P_{N}\right) - \nu (\mathbb P)| = O\left( N^{ -0.5591}\right) . \end{aligned}$$

The estimates were generated by solving a linear least-squares regression problem with a ridge (\(\ell ^2\)) regularization term. Whereas the convergence of the optimal values would indicate a rate of roughly 1/2, the rate of convergence for the optimal solutions would be a marked improvement over the theoretical estimates based on probability metrics. One explanation for this would be that the integrands for this specific model problem have much better smoothness properties than assumed. We leave a rigorous analysis of this question for future research.
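One way to obtain such rate estimates is sketched below in Python. The sketch assumes that the model \(\mathrm {err} \approx C N^{-r}\) is fitted in log-log coordinates and that the ridge term penalizes both regression coefficients. Neither detail, nor the value of the regularization parameter, is specified above, so the function `fit_rate` and its parameter `ridge` are illustrative assumptions.

```python
import numpy as np

def fit_rate(n_values, errors, ridge=0.0):
    """Fit log(err) = log(C) - r * log(N) by ridge-regularized least squares.

    Returns (log_C, r). With ridge = 0 this is ordinary least squares.
    """
    n_values = np.asarray(n_values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    X = np.column_stack([np.ones_like(n_values), np.log(n_values)])
    y = np.log(errors)
    # Normal equations of min ||X c - y||^2 + ridge * ||c||^2.
    coef = np.linalg.solve(X.T @ X + ridge * np.eye(2), X.T @ y)
    return coef[0], -coef[1]

# Example with synthetic errors that decay exactly like 2 * N^(-1/2).
N = np.arange(1, 26)
log_C, rate = fit_rate(N, 2.0 * N ** -0.5)
```

With repeated experiments per sample size, all pairs of \(N\) and the observed error can be passed to the fit at once. A positive `ridge` shrinks both coefficients toward zero, so the estimated rate depends on the chosen regularization parameter.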

Fig. 2

Scatter plots of the computed errors (l) \(| \nu (\mathbb P_N) - \nu (\mathbb P)|\) (r) \(\Vert z_{\mathbb P_{N}} - z_{\mathbb P}\Vert _{H}\) for \(M = 100\), \(N \in \{1,\dots , 100\}\) repeated 100 times

8 Conclusion

We have shown that a number of known results for stability of stochastic programs with finite-dimensional decision spaces can be carried over to infinite dimensions, provided certain convexity conditions are satisfied. As perhaps expected, the best possible growth rates for the convergence of solutions using probability metrics are of Hölder type. Our analysis gives rise to a number of possible future directions and open questions. For example, we consider a setting that is primarily related to risk-neutral problems, whereas problems using typically non-smooth risk measures in the objective are of significant interest for robust engineering design. In addition, the assumptions on the objective and linear operator \(\varSigma \) are essential for the qualitative stability analysis, as the infinite-dimensional setting often requires us to make use of the weak topology. Without such regularity properties, it is unclear how to proceed in general. Finally, in order to develop adaptive numerical optimization methods based on estimates of the type (21), we need to investigate more closely the convergence of \(d_{\mathfrak {F}}(\mathbb P_N,\mathbb P)\) for the class of functions \(\mathfrak {F}\) used in Sect. 6 and specific approximations \(\mathbb P_N\). Section 7 provides some deeper insight into this question; however, the picture is far from complete.