On quantitative stability in infinite-dimensional optimization under uncertainty

The vast majority of stochastic optimization problems require the approximation of the underlying probability measure, e.g., by sampling or using observations. It is therefore crucial to understand the dependence of the optimal value and optimal solutions on these approximations as the sample size increases or more data becomes available. Due to the weak convergence properties of sequences of probability measures, there is no guarantee that these quantities will exhibit favorable asymptotic properties. We consider a class of infinite-dimensional stochastic optimization problems inspired by recent work on PDE-constrained optimization as well as functional data analysis. For this class of problems, we provide both qualitative and quantitative stability results on the optimal value and optimal solutions. In both cases, we make use of the method of probability metrics. The optimal values are shown to be Lipschitz continuous with respect to a minimal information metric and consequently, under further regularity assumptions, with respect to certain Fortet-Mourier and Wasserstein metrics. We prove that even in the most favorable setting, the solutions are at best Hölder continuous with respect to changes in the underlying measure. The theoretical results are tested in the context of Monte Carlo approximation for a numerical example involving PDE-constrained optimization under uncertainty.


Introduction
In stochastic optimization, stability usually refers to the continuity properties of optimal values and solution sets as mappings from a set of probability measures, endowed with a suitable distance, into the extended reals and the solution space, respectively; see [22]. The distance on the space of probability measures must be selected so as to allow the estimation of differences of the relevant functions, which depend on probability measures. There exists a wide variety of possible distances of probability measures based on various constructions [19,27]. In the present context, distances with ζ-structure, introduced first in [27], appear as a natural choice. For a given metric space Ω such a distance is of the form

  d_F(P, Q) := sup_{f ∈ F} | ∫_Ω f dP − ∫_Ω f dQ |,   (1)

where F is a family of Borel measurable functions from Ω to R and P, Q are Borel probability measures on Ω. Note that the distance d_F is non-negative, symmetric and satisfies the triangle inequality. It also satisfies d_F(P, P) = 0 and is, thus, a probability metric in the sense of [27]. However, d_F(P, Q) = 0 implies P = Q only when the family F is rich enough. Hence, d_F is in general only a semi-metric in the usual terminology. The smallest relevant family F of Borel measurable functions in our stability studies contains only those functions which appear in the stochastic optimization problem under consideration. In this case, d_F may be called the minimal information (m.i.) distance. Stability results with respect to such m.i. distances serve as the starting point (i) to study stability with respect to the weak convergence of probability measures and (ii) to enlarge the family F properly by functions sharing essential analytical properties with the original ones. The latter strategy may lead to probability metrics that enjoy desirable properties like dual representations and convergence characterizations.
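For finitely supported measures and a finite family F, the distance d_F can be evaluated exactly; a minimal sketch (our own illustration, not from the paper):

```python
import numpy as np

# Sketch of the zeta-structure distance d_F(P, Q) = sup_{f in F} |int f dP - int f dQ|
# for finitely supported measures sharing a common support and a finite family F.
def zeta_distance(support, p_weights, q_weights, family):
    support = np.asarray(support, dtype=float)
    p = np.asarray(p_weights, dtype=float)
    q = np.asarray(q_weights, dtype=float)
    return max(abs(np.dot(p, f(support)) - np.dot(q, f(support))) for f in family)

support = [0.0, 1.0, 2.0]
P = [1/3, 1/3, 1/3]   # uniform on {0, 1, 2}
Q = [0.0, 1.0, 0.0]   # point mass at 1; same mean as P
d_mean_only = zeta_distance(support, P, Q, [lambda x: x])                 # F too poor
d_richer = zeta_distance(support, P, Q, [lambda x: x, lambda x: x**2])    # 2/3
```

With F consisting only of the identity, d_F(P, Q) = 0 although P ≠ Q, which is exactly the semi-metric phenomenon described above; enriching F by the square function separates the two measures.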
This method of probability metrics provides quantitative statements on the stability of solutions and optimal values of stochastic programming problems. Nevertheless, the existing theory has not been developed for optimization problems in which the design or decision variables may be infinite-dimensional, as is the case in PDE-constrained optimization under uncertainty. By including infinite-dimensional feasible sets, we introduce a number of complications; in particular, the loss of norm compactness of the feasible set, even in the case of convex, closed, and bounded feasible sets. After fixing some essential notation in Sect. 2, we state in Sect. 3 the class of infinite-dimensional stochastic optimization problems for which we study stability in the subsequent sections. Section 4 contains qualitative results by providing conditions that imply convergence of optimal values and solutions if the underlying sequence of probability distributions converges to a limit distribution in some sense. In Sect. 5 we show that optimal values and solutions even allow Lipschitz or Hölder estimates in terms of the ζ-distance. In Sect. 6 we argue that the stability analysis of the preceding sections applies to certain stochastic PDE-constrained optimization problems. Finally, in Sect. 7, we provide a study of the results of Sect. 6 for the case when Monte Carlo approximations are used.

Notation and preliminary results
We assume throughout that Ω is a complete separable metric space, i.e., a Polish space, and F the associated Borel σ-algebra. In addition, we will work exclusively with Borel probability measures P : F → [0, 1] such that (Ω, F, P) is a complete probability space. In particular, if Ω is finite, then F must be the power set of Ω. For the abstract portion of our results, we will always assume that Θ is a real separable Hilbert space and Θ_ad ⊂ Θ is a nonempty, closed, and convex set. Given an appropriately chosen integrand f : Θ × Ω → R, we will consider the potentially infinite-dimensional stochastic optimization problems:

  min_{θ ∈ Θ_ad}  E_P[f(θ, ω)] = ∫_Ω f(θ, ω) dP(ω).   (2)

Here, we also introduce the notion of the optimal value function ν as a function from the space of all Borel probability measures P(Ω) into R. This potentially extended real-valued function will play a key role in our discussions. If necessary, we will denote the expectation by either E or, if it is not clear in context, E_P to denote the dependence on the measure P. Given a complete probability space (Ω, F, P) and a real Banach space W, we recall the definition of the Bochner space L^p(Ω, F, P; W), p ∈ [1, ∞), as the space of (equivalence classes of) strongly measurable functions v, which map Ω into W and satisfy ∫_Ω ‖v(ω)‖_W^p dP(ω) < +∞, cf. [13]. If p = ∞, then L^∞(Ω, F, P; W) consists of essentially bounded W-valued strongly measurable functions. In both cases L^p(Ω, F, P; W) is a Banach space with the natural norm(s)

  ‖v‖_{L^p} := ( ∫_Ω ‖v(ω)‖_W^p dP(ω) )^{1/p},  resp.  ‖v‖_{L^∞} := ess sup_{ω ∈ Ω} ‖v(ω)‖_W.

In the special case when W = R, we simply write L^p(Ω, F, P). As usual, norm convergence will typically be denoted by →, whereas ⇀ signifies weak convergence and ⇀* weak-star convergence. In our stability analysis, we make use of distances with ζ-structure on P(Ω) having the form (1). We will refer to these objects as ζ-distances for brevity.
Given a family F of Borel measurable functions from Ω into R, the ζ-distance d_F on (Ω, F) is a highly flexible structure that allows us to define so-called minimal information distances and Fortet-Mourier metrics, each defined in the text below. Properties of ζ-distances, like a characterization of their maximal generators and their relation to weak convergence of probability measures, can be found in [18,23]. Recall that a sequence of probability measures {P_N} on (Ω, F) is said to narrowly/weakly converge to the probability measure P provided

  lim_{N→∞} ∫_Ω g dP_N = ∫_Ω g dP  for all g ∈ C_b(Ω),

where C_b(Ω) is the space of all bounded continuous functions on Ω. A family F of Borel measurable functions is called a P-uniformity class if lim_{N→∞} d_F(P, P_N) = 0 holds for each sequence {P_N} of probability measures converging weakly to P. For example, it is known that F is a P-uniformity class if F is uniformly bounded and P({ω ∈ Ω : F is not equicontinuous at ω}) = 0 [23].
Finally, we recall that given a σ -algebra F along with a nominal σ -finite σ -additive positive measure P on Ω, e.g., a Borel probability measure P ∈ P(Ω), the dual space of L ∞ (Ω, F, P) can be identified with the space of all finitely additive signed measures ba(Ω) on F absolutely continuous with respect to P, see e.g., [9].

The optimization problem
In order to carry out the stability analysis, we restrict the class of allowable integrands f (θ, ω). These restrictions will henceforth be taken as standing assumptions. The particular class considered in this paper is inspired by applications in PDE-constrained optimization under uncertainty in which the PDE is given by a linear elliptic partial differential equation with random coefficients, right-hand side, and/or boundary conditions. We refer the reader to [17] for an overview of the state-of-the-art theory including more general objective functions and risk measures. In addition, many problems in functional data analysis exhibit practically the same form used below, see e.g., [21].
Let V and H be real Hilbert spaces such that V embeds continuously into H, and let θ_d ∈ H. For θ ∈ Θ and ω ∈ Ω, let Σ(ω)θ := S(ω)θ − s(ω), where S(ω) : Θ → V is linear and bounded uniformly in ω and s(ω) ∈ H. We then define

  f(θ, ω) := ½ ‖Σ(ω)θ − θ_d‖²_H.

Furthermore, we assume that for every θ ∈ Θ (or θ ∈ Θ_ad) and any P ∈ P(Ω)

  f(θ, ·) ∈ L¹(Ω, F, P).

This implicitly adds mild regularity assumptions on S and s that are typically fulfilled when S is related to the solution of a parametric elliptic PDE, e.g., S(·)θ, s(·) ∈ L²(Ω, F, P; V). Then for α > 0, we consider the optimization problems

  min_{θ ∈ Θ_ad}  E_P[f(θ, ω)] + (α/2) ‖θ‖²_Θ.   (3)

Theorem 1 Problem (3) admits a unique solution θ_P ∈ Θ_ad for every P ∈ P(Ω).
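Theorem 1 can be illustrated numerically. The following is a finite-dimensional sketch of problem (3) with synthetic data (the matrices S_i and vectors s_i, θ_d are random placeholders for S(ω), s(ω), θ_d), solved by projected gradient descent; strong convexity with modulus α makes the minimizer unique regardless of the starting point.

```python
import numpy as np

# Finite-dimensional analogue of problem (3):
#   min over theta in [-1, 1]^n of
#   (1/N) sum_i 0.5*||S_i theta - s_i - theta_d||^2 + (alpha/2)*||theta||^2.
rng = np.random.default_rng(0)
n, m, N, alpha = 5, 7, 50, 0.5
S = rng.standard_normal((N, m, n))
s = rng.standard_normal((N, m))
theta_d = rng.standard_normal(m)

H = np.einsum('imn,imk->nk', S, S) / N + alpha * np.eye(n)   # Hessian of the objective
step = 1.0 / np.linalg.norm(H, 2)                            # 1 / Lipschitz constant

def grad(theta):
    r = S @ theta - s - theta_d                  # residuals, shape (N, m)
    return np.einsum('imn,im->n', S, r) / N + alpha * theta

def solve(theta0, iters=3000):
    theta = theta0.copy()
    for _ in range(iters):
        theta = np.clip(theta - step * grad(theta), -1.0, 1.0)  # project onto the box
    return theta

# Strong convexity (modulus alpha) forces a unique minimizer: two different
# starting points converge to the same solution, as in Theorem 1.
theta_a = solve(np.zeros(n))
theta_b = solve(np.ones(n))
```

The fixed-point residual of the projected gradient map vanishing at θ_a is exactly the first-order optimality condition for the constrained problem.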

Qualitative stability
In this section, we provide stability results that ensure the approximating optimization problems obtained by replacing P by another probability measure Q ∈ P(Ω) will converge in some sense to the original problem. In particular, we show that the solutions θ Q will strongly converge to θ P provided Q converges to P with respect to a properly chosen ζ -distance. This basic result serves as the foundation needed to prove continuity of the solutions with respect to narrow convergence of probability measures. However, in order to do the latter, additional regularity properties will be required on the integrands with respect to ω. These stability results are in some sense more versatile than the quantitative results below. Nevertheless, they do not provide us with a rate of convergence.

Theorem 2
In the context of Theorem 1, suppose we are given a sequence {P_N} with P_N ∈ P(Ω) and a probability measure P ∈ P(Ω) such that d_F(P_N, P) → 0, where F is any class of measurable functions from Ω into R large enough to contain f(θ, ·) for any θ ∈ {θ_N : N ∈ N} ∪ {θ_P} with θ_N := θ_{P_N}. Then θ_N → θ_P strongly in Θ as N → +∞.

Remark 1
The obvious candidate for the set F would be to choose the collection of all possible integrands f (θ, ·) : Ω → R indexed by θ ∈ Θ ad . In terms of the associated ζ -distance, this would result in what is referred to in [19,20,22] as the minimal information metric.
Proof For any fixed θ ∈ Θ_ad, it follows from the hypotheses that E_{P_N}[f(θ, ω)] → E_P[f(θ, ω)]. Substituting this into (4), we obtain a bound on (α/2)‖θ_N‖²_Θ that is uniform in N, so that a subsequence of {θ_N} converges weakly to some θ̄ ∈ Θ_ad. Therefore, appealing again to the weak lower semicontinuity used in (8) and (9), we have ‖θ_N‖_Θ → ‖θ̄‖_Θ. Then, since Θ is a Hilbert space and θ_N ⇀ θ̄, weak convergence together with convergence of the norms yields strong convergence, and the assertion follows.
An alternative perspective on qualitative stability is offered by our next result.
Here, we will prove convergence of the sequence of minimizers under different data assumptions on the integrands and a different form of weak convergence of measures. We note that in PDE-constrained optimization under uncertainty these assumptions are less restrictive than they may appear. In particular, we do not require f (θ, ·) : Ω → R to be continuous as is needed below for the Fortet-Mourier metric. The caveat here is the requirement that P N is absolutely continuous with respect to P.

Theorem 3 In addition to the standing assumptions, fix some P ∈ P(Ω) and suppose that f(θ, ·) ∈ L^∞(Ω, F, P) for all θ ∈ Θ_ad and that the operator S(ω) : Θ → V is completely continuous. Let {P_N} ⊂ P(Ω) be such that 1. P_N ≪ P for all N ∈ N (P_N is absolutely continuous with respect to P) and 2. P_N → P with respect to the weak-star topology on (L^∞(Ω, F, P))*. Then θ_{P_N} → θ_P strongly in Θ as N → +∞.
Proof As noted in Sect. 2, each Q ∈ P(Ω) is an element of (L^∞(Ω, F, P))* provided Q ≪ P. The rest of the proof mirrors that of Theorem 2. Given the sequence of minimizers {θ_N}, we immediately obtain a uniform bound on ‖θ_N‖_Θ from (4), since f(θ, ·) ∈ L^∞(Ω, F, P) for any θ ∈ Θ_ad and P_N ⇀* P. As before, we let {θ_{N_k}}_{k=1}^∞ denote the weakly convergent subsequence and θ̄ the associated weak limit.
Turning now to the estimate derived in (6), we see that it remains valid in the present setting. As in the proof of Theorem 2, we obtain optimality of θ̄ by adapting the inequality (7), which holds for every θ ∈ Θ_ad. Here, the regularity of the integrand ensures that the optimal value is attained at θ̄, from which it follows that θ̄ = θ_P. As in the proof of Theorem 2, we can again argue that the entire sequence {θ_{P_N}} weakly converges to θ_P. In order to prove norm convergence, we expand the difference ‖θ_N − θ_P‖²_Θ and pass to the limit; by the complete continuity and regularity assumptions, the remaining terms converge. This completes the proof.
Next, we return to the setting using probability metrics to obtain some important implications of Theorem 2 under further regularity assumptions on the integrands. In our setting, we recall that the space of all (Borel) probability measures with finite p-th moments is defined, for some fixed ω₀ ∈ Ω, by

  P_p(Ω) := { Q ∈ P(Ω) : ∫_Ω ρ(ω, ω₀)^p dQ(ω) < +∞ }.

A sequence {P_N} ⊂ P_p(Ω) converges weakly in P_p(Ω) to P ∈ P_p(Ω) provided P_N converges narrowly to P and ∫_Ω ρ(ω, ω₀)^p dP_N(ω) → ∫_Ω ρ(ω, ω₀)^p dP(ω) as N → ∞. This type of weak convergence shares an intimate link with a certain class of ζ-distances known as Fortet-Mourier metrics. To start, for p ∈ [1, ∞), we define the sets F_p(Ω) of locally Lipschitz functions with a certain p-related growth condition by

  F_p(Ω) := { f : Ω → R : |f(ω) − f(ω̃)| ≤ c_p(ω, ω̃) ρ(ω, ω̃) for all ω, ω̃ ∈ Ω },
  c_p(ω, ω̃) := max{ 1, ρ(ω, ω₀)^{p−1}, ρ(ω̃, ω₀)^{p−1} }.

We then define the Fortet-Mourier metric of order p for two measures P, Q ∈ P_p(Ω) by

  ζ_p(P, Q) := d_{F_p(Ω)}(P, Q).

In particular, ζ_p is equivalent to the so-called Kantorovich-Rubinstein functional with cost function given by ĉ_p(ω, ω̃) := c_p(ω, ω̃) ρ(ω, ω̃) (see [19, Theorem 5.3.3] along with the discussion on page 93 in [19]).

Proposition 1 Suppose there exists an L > 0 such that f(θ, ·)/L ∈ F_p(Ω) for every θ ∈ Θ_ad. Then the solution mapping P_p(Ω) ∋ Q ↦ θ_Q ∈ Θ_ad is continuous with respect to weak (narrow) convergence of probability measures on P_p(Ω).
Proof After rescaling the integrands by 1/L > 0, this is a direct consequence of Theorem 2 in light of the preceding arguments.
Finally, we note that an alternative means of obtaining the sequential convergence result in Proposition 1 would be to appeal to the link between the weak topology on P_p(Ω) and the topology generated by the well-known Wasserstein distance W_p of order p. Let γ_i (i = 1, 2) denote the projection onto the first or second component of Ω × Ω, respectively, and for π ∈ P(Ω × Ω) denote the marginals by π_i := γ_i # π := π ∘ γ_i^{−1}. Then the Wasserstein distance of order p is given by

  W_p(P, Q) := ( inf { ∫_{Ω×Ω} ρ(ω, ω̃)^p dπ(ω, ω̃) : π ∈ P(Ω × Ω), π_1 = P, π_2 = Q } )^{1/p}.

For this distance we have the estimate ζ_p(P, Q) ≤ C W_p(P, Q), where C ≥ 1 depends only on p and the p-th moments of P and Q (for p = 1 one has ζ_1 = W_1). Therefore, if we start with a sequence of Borel probability measures {P_N} and P ∈ P_p(Ω) such that W_p(P_N, P) → 0, then we obtain the same statement as in Proposition 1. However, as shown in [20], convergence in the Wasserstein metric is potentially strictly slower than in the Fortet-Mourier metric.
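For measures on the real line, W_p between two empirical measures with the same number of atoms has a simple closed form via order statistics; a minimal sketch (a standard fact, not specific to the paper):

```python
import numpy as np

# W_p(P_N, Q_N)^p = (1/N) sum_i |x_(i) - y_(i)|^p for empirical measures on R
# with N atoms each; sorting realizes the optimal (monotone) coupling.
def wasserstein_p(x, y, p=1):
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

w1 = wasserstein_p([0.0, 1.0], [0.5, 1.5], p=1)   # each atom moves by 0.5
w2 = wasserstein_p([0.0, 1.0], [0.5, 1.5], p=2)
```

The monotone pairing is optimal only in one dimension; in higher dimensions the infimum over couplings requires solving a transportation problem.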

Quantitative stability
As mentioned above, quantitative stability provides us with Lipschitz or Hölder-type estimates of the optimal values and solutions. This is first done using the "weakest" possible ζ -distance d F in which F is directly related to the integrands without additional regularity assumptions on the dependence on ω. Further estimates related to Fortet-Mourier and Wasserstein metrics then follow as corollaries under Lipschitz conditions on the integrands.

Theorem 4
Under the standing assumptions, let P, Q ∈ P(Ω) and let F be any set of Borel measurable functions that contains g_θ(·) := f(θ, ·) for θ = θ_P and θ = θ_Q. Then we have the estimates:

  |ν(P) − ν(Q)| ≤ d_F(P, Q),   (13)
  ‖θ_P − θ_Q‖_Θ ≤ 2 α^{−1/2} d_F(P, Q)^{1/2}.   (14)

Proof For the Lipschitz estimate (13), we observe that

  ν(P) ≤ E_P[f(θ_Q, ω)] + (α/2)‖θ_Q‖²_Θ ≤ E_Q[f(θ_Q, ω)] + (α/2)‖θ_Q‖²_Θ + d_F(P, Q) = ν(Q) + d_F(P, Q),

and the reverse inequality follows by exchanging the roles of P and Q. For the Hölder estimate (14) on the solution mapping, we start by letting δ := d_F(Q, P) and observing that

  E_P[f(θ_Q, ω)] ≤ E_Q[f(θ_Q, ω)] + δ  and  E_Q[f(θ_Q, ω)] + (α/2)‖θ_Q‖²_Θ = ν(Q) ≤ ν(P) + δ.   (15)

Using the quadratic term (α/2)‖·‖²_Θ, the convexity of the integrand f(·, ω), the convexity of Θ_ad, and the optimality of θ_P, we have

  (α/2)‖θ_Q − θ_P‖²_Θ ≤ E_P[f(θ_Q, ω)] + (α/2)‖θ_Q‖²_Θ − ν(P).   (16)

Combining (16) with (15) above yields

  (α/2)‖θ_Q − θ_P‖²_Θ ≤ 2δ,  i.e.,  ‖θ_Q − θ_P‖_Θ ≤ 2 α^{−1/2} δ^{1/2},

as was to be shown.
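These estimates can be checked numerically on a toy problem of our own (scalar θ, f(θ, ω) = ½(θ − ω)², two-point measures, and the minimal family F = {f(θ_P, ·), f(θ_Q, ·)}); the value estimate is Lipschitz in d_F, whereas the solution estimate is only of Hölder (square-root) type:

```python
import numpy as np

alpha = 1.0
P = np.array([0.0, 1.0])   # uniform two-point measure
Q = np.array([0.0, 2.0])

def solve(sample):
    # closed form for min_theta E[0.5*(theta - omega)^2] + 0.5*alpha*theta^2
    theta = sample.mean() / (1.0 + alpha)
    value = np.mean(0.5 * (theta - sample) ** 2) + 0.5 * alpha * theta ** 2
    return theta, value

theta_P, nu_P = solve(P)
theta_Q, nu_Q = solve(Q)

def gap(theta):
    # |E_P[f(theta, .)] - E_Q[f(theta, .)]| for f(theta, omega) = 0.5*(theta - omega)^2
    return abs(np.mean(0.5 * (theta - P) ** 2) - np.mean(0.5 * (theta - Q) ** 2))

d_F = max(gap(theta_P), gap(theta_Q))   # minimal-information distance
```

For these data, |ν(P) − ν(Q)| and |θ_P − θ_Q| both stay within the respective d_F-based bounds.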
We may now return to the results at the end of Sect. 4 in order to derive quantitative stability results using the familiar Fortet-Mourier and Wasserstein distances.

Corollary 1
In the setting of Theorem 4, suppose there exist a p ∈ [1, ∞) and some L > 0 such that f(θ, ·)/L ∈ F_p(Ω) for every θ ∈ Θ_ad. Then the following estimates hold for the associated Fortet-Mourier metric:

  |ν(P) − ν(Q)| ≤ L ζ_p(P, Q),
  ‖θ_P − θ_Q‖_Θ ≤ 2 (L/α)^{1/2} ζ_p(P, Q)^{1/2}.

Consequently, the following estimates hold for the Wasserstein metric W_p:

  |ν(P) − ν(Q)| ≤ L c(ω₀, P, Q) W_p(P, Q),
  ‖θ_P − θ_Q‖_Θ ≤ 2 (L c(ω₀, P, Q)/α)^{1/2} W_p(P, Q)^{1/2}.

Here, c(ω₀, P, Q) denotes a constant depending only on p and the p-th moments of P and Q about the fixed point ω₀ ∈ Ω.

An application to PDE-constrained optimization under uncertainty
We conclude the theoretical discussion with an example from PDE-constrained optimization under uncertainty to demonstrate the applicability of our results. For the purpose of discussion, we start with an arbitrary Borel probability measure P in order to introduce the problem. The notation mirrors in part that of [17]. For readability, we indicate how the quantities introduced here correspond to those of the general discussion above. Our goal is twofold. Under reasonable data assumptions, we will define a class of integrands F which allows us to use the m.i. metric in our stability results to prove convergence of optimal values and optimal solutions under weak convergence of a sequence of measures {P_N}. Afterwards, assuming the underlying function spaces are replaced by finite-dimensional subspaces defined by a standard finite-element discretization, we derive an a priori-type error bound and argue that the fully discrete problems converge to the original continuous problem.
We will consider a class of optimization problems in which we seek to minimize the objective function

  J(z) := ½ E_P[ ‖u − u_d‖²_{L²(D)} ] + (α/2) ‖z‖²_{L²(D)}

subject to the condition that z ∈ Z_ad ⊂ L²(D), a closed, bounded, convex set, and that, for z ∈ Z_ad, u solves a random partial differential equation (PDE), which we define below. The function u_d ∈ L²(D) can be thought of as a desired state or general target function.
To be precise, let D ⊂ R^n be an open, bounded Lipschitz domain, V = H¹₀(D) the classical Sobolev space with inner product (·, ·)_V, and V′ = H^{−1}(D) its dual with norm ‖·‖_{V′} and dual pairing ⟨·, ·⟩. In addition, let H = L²(D) with inner product (·, ·)_H. Furthermore, let Ω be a metric space with metric ρ and Borel σ-field F, and let P be a Borel probability measure.
Within this framework, we consider the bilinear form a(·, ·; ω) : V × V → R,

  a(u, v; ω) := ∫_D ( B(x, ω) ∇u(x) ) · ∇v(x) dx,  B := (b_ij)_{i,j=1}^n,

where ω ∈ Ω. The associated random PDE can be defined pointwise as: find u = u(z, ω) ∈ V such that

  a(u, v; ω) = ∫_D ( z(x) + g(x, ω) ) v(x) dx  for all test functions v ∈ C_0^∞(D),

with z varying in a constraint set Z_ad ⊂ H, and g, b_ij : D × Ω → R, which are assumed to be at least measurable in Ω and square (Lebesgue) integrable in D.
In order to use our stability results in this context, we will need further data assumptions on the bilinear form. For each ω ∈ Ω, we let A(ω) : V → V′ be the mapping defined by

  ⟨A(ω)u, v⟩ := a(u, v; ω)  for all u, v ∈ V.

The existence of A(ω) as a bounded linear operator is due to the Riesz representation theorem and the Lax-Milgram lemma based on the following assumptions, see e.g., [1, Chap. 6, 6.1, 6.2]. First, we impose the condition that there exist L > γ > 0 such that

  γ |ξ|² ≤ B(x, ω) ξ · ξ  and  |B(x, ω) ξ| ≤ L |ξ|  for all ξ ∈ R^n,

for all x ∈ D and P-a.e. ω ∈ Ω. This implies that each b_ij is essentially bounded in D × Ω with respect to the associated product measure. Moreover, the mapping A(ω) : V → V′ is uniformly positive definite (with constant γ) and uniformly bounded (with constant L) with respect to P-a.e. ω ∈ Ω, i.e.,

  ⟨A(ω)u, u⟩ ≥ γ ‖u‖²_V  and  ‖A(ω)u‖_{V′} ≤ L ‖u‖_V  for all u ∈ V.

In addition, the inverse mapping A(ω)^{−1} : V′ → V is again uniformly positive definite (with constant 1/L) and uniformly bounded (with constant 1/γ) with respect to P-a.e. ω ∈ Ω.
Under these data assumptions, we may now define a class of integrands for the m.i. metric for which d_F(P_N, P) → 0 for any sequence of Borel probability measures {P_N} that converges weakly to P. To this end, we define the functions f : Z_ad × Ω → R,

  f(z, ω) := ½ ‖A(ω)^{−1}(z + g(ω)) − u_d‖²_H.

Our aim is to derive conditions implying that the class F = { f(z, ·) : z ∈ Z_ad } is uniformly bounded and equicontinuous, and consequently a P-uniformity class, cf. [23].
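To make the integrand concrete, here is a one-dimensional finite-difference sketch (our own illustration; the coefficient b, source g, and target u_d below are made up). It approximates f(z, ω) = ½‖A(ω)⁻¹(z + g(ω)) − u_d‖² in L² for the model equation −(b(x, ω) u′)′ = z + g on D = (0, 1) with homogeneous Dirichlet conditions.

```python
import numpy as np

# Approximate f(z, omega) = 0.5*||A(omega)^{-1}(z + g(omega)) - u_d||_{L2}^2
# for -(b(x, omega) u')' = z + g on (0, 1), u(0) = u(1) = 0, via finite differences.
def integrand(z, b, g, u_d, n=200):
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)          # interior grid points
    bm, bp = b(x - h / 2), b(x + h / 2)     # coefficient at half-grid points
    A = (np.diag(bm + bp) - np.diag(bp[:-1], 1) - np.diag(bm[1:], -1)) / h**2
    u = np.linalg.solve(A, z(x) + g(x))     # u = A(omega)^{-1}(z + g(omega))
    return 0.5 * h * np.sum((u - u_d(x)) ** 2)   # rectangle rule for the L2 norm

# One realization of a hypothetical random coefficient b(x, omega):
omega = 0.3
val = integrand(z=lambda x: np.ones_like(x),
                b=lambda x: 1.0 + 0.5 * omega * np.sin(np.pi * x),
                g=lambda x: np.zeros_like(x),
                u_d=lambda x: np.zeros_like(x))
```

With b ≡ 1, z ≡ 1, g ≡ 0, u_d ≡ 0 the exact value is ½∫(x(1 − x)/2)² dx = 1/240, which the sketch reproduces to discretization accuracy.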

Lemma 1
In addition to the standing assumptions, suppose that u_d ∈ H and g ∈ L²(Ω, F, P; V′). Then there is a C > 0 such that

  |f(z, ω)| ≤ C ( 1 + ‖g(ω)‖²_{V′} )  for all z ∈ Z_ad and ω ∈ Ω.

Proof For z ∈ Z_ad and ω ∈ Ω we obtain

  |f(z, ω)| ≤ ‖A(ω)^{−1}(z + g(ω))‖²_H + ‖u_d‖²_H ≤ (c²/γ²) ( c ‖z‖_H + ‖g(ω)‖_{V′} )² + ‖u_d‖²_H,

where we used the Poincaré-Friedrichs inequality twice (with some constant c) and the uniform boundedness of A(ω)^{−1} by 1/γ. Since z varies in the bounded set Z_ad, there is a positive constant C such that the assertion holds.
Lemma 1 provides us with a bound, uniform in z ∈ Z_ad, on all functions in F. The proof of the following lemma makes use of a result in [12]. Since this book is not readily available in English, we provide it and a short proof in the "Appendix".

Lemma 2 In addition to the assumptions of Lemma 1, suppose there is a constant Λ > 0 such that

  ‖B(x, ω) − B(x, ω̃)‖_F ≤ Λ ρ(ω, ω̃)  for a.e. x ∈ D and all ω, ω̃ ∈ Ω,

where ‖·‖_F denotes the Frobenius norm. Then for any g ∈ V′ and ω, ω̃ ∈ Ω we have

  ‖A(ω)^{−1} g − A(ω̃)^{−1} g‖_V ≤ C Λ ρ(ω, ω̃) ‖g‖_{V′}

with a constant C > 0 depending only on γ and L.

Proof First we study the dependence of A(ω)u on ω. Let u, v ∈ V and ω, ω̃ ∈ Ω.
Then

  |⟨A(ω)u − A(ω̃)u, v⟩| = | ∫_D ( B(x; ω, ω̃) ∇u(x) ) · ∇v(x) dx | ≤ ∫_D ‖B(x; ω, ω̃)‖_F |∇u(x)| |∇v(x)| dx,

with B(x; ω, ω̃) := B(x, ω) − B(x, ω̃), Frobenius norm ‖B(x; ω, ω̃)‖_F, ∇u the gradient of u in the sense of Sobolev, and |∇u| its Euclidean norm. Hence, we obtain

  ‖A(ω)u − A(ω̃)u‖_{V′} ≤ Λ ρ(ω, ω̃) ‖u‖_V.

Next we consider the mapping K_t(ω)u := u − t J^{−1}(A(ω)u − g) for some t ∈ (0, 2γ/L²), ω ∈ Ω, g ∈ V′ and any u ∈ V, where J : V → V′ denotes the Riesz isomorphism. Then it follows from Proposition 2 that

  ‖K_t(ω)u − K_t(ω)ũ‖_V ≤ κ(t) ‖u − ũ‖_V

for any u, ũ ∈ V and κ(t) := (1 − 2tγ + t²L²)^{1/2} < 1. Furthermore, the unique fixed point of K_t(ω) belongs to the ball around zero with radius

  r := t ‖g‖_{V′} / (1 − κ(t)).

For any u ∈ V and ω, ω̃ ∈ Ω we have

  ‖K_t(ω)u − K_t(ω̃)u‖_V = t ‖J^{−1}(A(ω)u − A(ω̃)u)‖_V ≤ t Λ ρ(ω, ω̃) ‖u‖_V,

and apply Proposition 3 from the "Appendix" with P = Ω, X the closed ball of radius r in V, F(ω, u) = K_t(ω)u, α = κ(t), and λ = t Λ r. We obtain

  ‖x̄(g, ω) − x̄(g, ω̃)‖_V ≤ ( t Λ r / (1 − κ(t)) ) ρ(ω, ω̃),

where x̄(g, ω) = A(ω)^{−1} g and r = t ‖g‖_{V′}/(1 − κ(t)). This completes the proof.
We now have enough results to prove that F constitutes a P-uniformity class, which is a direct consequence of the following theorem.
Theorem 5 In addition to the hypotheses of Lemma 2, assume g ∈ L^∞(Ω, F, P; H) and there exists C̃ > 0 such that

  |g(x, ω) − g(x, ω̃)| ≤ C̃ ρ(ω, ω̃)  for all ω, ω̃ ∈ Ω (a.e. in D).

Then F is uniformly bounded and equi-Lipschitz continuous with respect to ρ on Ω.
Proof For any z ∈ Z_ad and ω ∈ Ω let F(z, ω) := A(ω)^{−1}(z + g(ω)) − u_d. Our assumptions imply that F(·, ·) is P-a.s. uniformly bounded in H by some constant Ĉ (see the proof of Lemma 1). Furthermore, we obtain for any z ∈ Z_ad and ω, ω̃ ∈ Ω:

  |f(z, ω) − f(z, ω̃)| ≤ Ĉ ‖F(z, ω) − F(z, ω̃)‖_H ≤ Ĉ c ( ‖A(ω)^{−1}(z + g(ω)) − A(ω̃)^{−1}(z + g(ω))‖_V + ‖A(ω̃)^{−1}(g(ω) − g(ω̃))‖_V ),

where we used the uniform boundedness and the Poincaré-Friedrichs inequality. For the first term on the right-hand side we argue as in Lemma 2. The second term is estimated by

  ‖A(ω̃)^{−1}(g(ω) − g(ω̃))‖_V ≤ (1/γ) ‖g(ω) − g(ω̃)‖_{V′} ≤ (c/γ) ‖g(ω) − g(ω̃)‖_H.

Now, we use again Lemma 2 for the first term and both Lemma 1 and the assumption on g for the second. Combining these observations, we obtain the assertion.
Theorem 5 establishes the P-uniformity of F under relatively mild assumptions. In particular, we do not require the terms b_ij and g to be smooth in any way with respect to ω. Of course, in many interesting applications, see e.g. [4,5], one can demonstrate much higher regularity of F(z, ω) = A(ω)^{−1}(z + g(ω)) − u_d in ω for each z ∈ Z_ad if some smoothness of b_ij and g is in fact available. And though the presence of ‖·‖²_H in f(z, ω) rules out 1-Lipschitz continuity, as required by estimates using the Wasserstein distances, the quantitative estimates using the Fortet-Mourier metric, e.g., ζ₂, which are incidentally strictly sharper than the Wasserstein estimates, are still applicable provided the local growth conditions for functions in F_p(Ω) are fulfilled. On the other hand, as a general point of critique, the minimal information metric along with Theorems 4 and 5 obviates the need to enlarge the set of integrands F in order to make use of the rougher estimates given by the Fortet-Mourier metric.
This brings us to our second goal of this section. In order to solve optimization problems of the type (18) with f ∈ F numerically, not only P but also the decision variables z and the underlying partial differential equation must be approximated. In other words, using a finite-sample-based approximation P_N of P and a finite-element discretization for the deterministic quantities in H and V, we would typically consider finite-dimensional problems of the type:

  min_{z_h ∈ Z_ad^h}  (1/N) Σ_{i=1}^N ½ ‖(A_i^h)^{−1}(z_h + g_i^h) − u_d^h‖²_H + (α/2) ‖z_h‖²_H.

Here, A_i^h is defined from A(ω) by replacing ω with a realization ω_i and V by a finite-dimensional subspace V_h derived by a standard finite-element approximation; the terms g_i^h and u_d^h are defined analogously, and Z_ad^h is an approximation of Z_ad using a finite-element approximation of H.
For piecewise constant approximations of the control z, the original ideas date back to Falk [10]; see the recent chapter [2] for a quick reference. For another, more recent, comprehensive treatment see [26] (for a posteriori error estimates) as well as the monograph [15, Chap. 3] and the many references therein. In many of these works, as well as in our concrete example in the next section, the set Z_ad has the concrete form:

  Z_ad := { z ∈ L²(D) : a(x) ≤ z(x) ≤ ā(x) a.e. in D },

where a, ā are sufficiently regular; often L^∞(D). Therefore, for the sake of argument, we may assume that if D, B, g, a, ā, and u_d are sufficiently regular, then there exists a (random variable) C_N ≥ 0 along with a real q ∈ (0, 3/2] such that

  ‖z_{P_N}^h − z_{P_N}‖_H ≤ C_N h^q.   (20)

Here, the dependence on N results from the fact that the typical estimates, see e.g., [2, Thm. 10], depend on constants related to the coefficients of the PDE, the right-hand side g, and u_d, which are stochastic. Therefore, in the estimate (20), C_N is related to a realization of a random sample of length N. Using (20), we can apply Theorem 4 and the triangle inequality to obtain the estimate

  ‖z_{P_N}^h − z_P‖_H ≤ C_N h^q + 2 α^{−1/2} d_F(P_N, P)^{1/2}.   (21)

It should also be noted that for piecewise constant approximations in Z_ad^h, we can only expect q = 1. The case q = 3/2 requires a significant amount of regularity, and q ∈ (1, 3/2) depends on both the regularity as well as the type of discretization. In light of Theorem 5, (21) guarantees the convergence of the fully discrete solutions z_{P_N}^h to the original infinite-dimensional solution z_P, provided h ↓ 0 and P_N → P weakly.

Monte Carlo approximation and numerical illustration for PDE-constrained optimization
In this final section, we provide additional discussions on the behavior of d F (P, P N ) for Monte Carlo approximations P N of P in order to give the reader a better impression of the potential rate of convergence; in particular, for the setting in the previous section.

Monte Carlo approximation
For the sake of argument, we assume that the integrands f can be written as functions of a random vector ξ : Ω → Ξ ⊂ R^d, i.e., f(z, ω) = f(z, ξ(ω)). This is often the case for PDE models and is used in the example in Sect. 7.2 below. Let ξ¹, ξ², …, ξ^N, … be independent identically distributed Ξ-valued random vectors on some probability space (Ω̂, F̂, ℙ) having the common probability law P, i.e., P = ℙ ∘ (ξ¹)^{−1}. In this context, we define the empirical measures

  P_N(·) := (1/N) Σ_{i=1}^N δ_{ξ^i(·)},  N ∈ N,

where δ_ξ denotes the Dirac measure at ξ, and the empirical or Monte Carlo approximation of the stochastic program (18) with sample size N, i.e.,

  min_{z ∈ Z_ad}  (1/N) Σ_{i=1}^N f(z, ξ^i(·)) + (α/2) ‖z‖²_H.   (23)

The optimal value ν(P_N(·)) of (23) is a real random variable and the solution z_{P_N(·)} an H-valued random element. It is well known that the sequence {P_N(·)} of empirical measures converges weakly to P ℙ-almost surely (see, e.g., [8, Theorem 11.4.1]). Theorem 5 implies that the class F is a P-uniformity class and, hence, the sequence d_F(P_N(·), P) converges to zero ℙ-almost surely. According to Theorem 4, the sequences {ν(P_N(·))} and {z_{P_N(·)}} of empirical optimal values and solutions converge ℙ-almost surely to their true optimal value ν(P) and solution z_P, respectively.
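The empirical construction can be sketched on a scalar toy model of our own (synthetic data, not the PDE problem), where the sample-average solution and value are available in closed form:

```python
import numpy as np

# Scalar toy sketch of the empirical measure P_N = (1/N) sum_i delta_{xi^i}
# and the sample-average problem
#   min_z  E_{P_N}[0.5*(z - xi)^2] + (alpha/2)*z^2.
rng = np.random.default_rng(42)
alpha = 1.0

def saa(xi):
    z = xi.mean() / (1.0 + alpha)                     # closed-form minimizer
    value = np.mean(0.5 * (z - xi) ** 2) + 0.5 * alpha * z ** 2
    return z, value

# For xi ~ N(0, 1): z_P = 0 and nu(P) = 0.5 * E[xi^2] = 0.5; the empirical
# quantities approach these values almost surely as the sample size grows.
z_N, nu_N = saa(rng.standard_normal(100_000))
```

Repeating the draw yields the random variables ν(P_N(·)) and z_{P_N(·)} discussed above.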
In order to obtain rates of convergence for the sequences of empirical optimal values and solutions, we consider their mean or mean-square distance and conclude from Theorem 4 and Corollary 1 the estimates

  E[ |ν(P_N(·)) − ν(P)| ] ≤ L E[ ζ_p(P_N(·), P) ],
  E[ ‖z_{P_N(·)} − z_P‖²_H ] ≤ L̄ E[ ζ_p(P_N(·), P) ],

where L is the constant appearing in Corollary 1 and L̄ is given by L̄ = 2L²/α. Unfortunately, convergence rates of the mean convergence of d_F(P_N(·), P) are not known, but for F containing uniformly bounded and equi-Lipschitz continuous functions, the rate coincides essentially with that of E[ζ₁(P_N, P)]. For the latter, it follows from [6, Theorem 1] that the estimate

  E[ ζ₁(P_N, P) ] ≤ C N^{−1/d}  (d ≥ 3)   (24)

holds with some constant C > 0, provided a moment of P of sufficiently high order is finite. It is argued in [11] that the rate (24) is sharp if P is the uniform distribution on the unit cube [−1, 1]^d. For the numerical experiments in the next subsection, we observe better rates than guaranteed by the probability metrics. This is a limitation of the probability metrics themselves, not of their application to the problem at hand.
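The N^{−1/d} behavior can be observed directly in the simplest case d = 1, where W₁(P_N, P) = ∫|F_N − F| for the empirical CDF F_N (our own check with P = uniform(0, 1), expected decay of order N^{−1/2}):

```python
import numpy as np

# W_1 between the empirical measure of N uniform(0, 1) samples and the uniform
# law itself, via a Riemann sum of |F_N(t) - t| on a fine grid.
rng = np.random.default_rng(7)

def w1_empirical_vs_uniform(N):
    x = np.sort(rng.uniform(0.0, 1.0, N))
    t = np.linspace(0.0, 1.0, 20001)
    F_N = np.searchsorted(x, t, side='right') / N   # empirical CDF at grid points
    return np.sum(np.abs(F_N - t)) * (t[1] - t[0])

w_100 = w1_empirical_vs_uniform(100)        # size roughly proportional to N^{-1/2}
w_10000 = w1_empirical_vs_uniform(10_000)
```

Increasing the sample size by a factor of 100 shrinks the distance by roughly a factor of 10, consistent with the square-root Monte Carlo rate in one dimension.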

Numerical illustration
The previous discussion provides useful upper bounds for the case of the empirical measure P N . However, these bounds are based on estimates of the Wasserstein metrics, not the minimal information metric. Therefore, in order to obtain a better impression of the possible rate of convergence associated with d F (P, P N ) in practice, we provide here a concrete numerical example.
We propose a model problem based on [16, Ex. 6.1, Ex. 6.2]. Let α = 10^{−3}, D = (0, 1), u_d(x) = sin(50x/π), and consider the optimal control problem of minimizing, over z ∈ L²(D),

  ½ E[ ‖u − u_d‖²_{L²(D)} ] + (α/2) ‖z‖²_{L²(D)}

subject to z ∈ Z_ad and the random boundary value problem (27) defining u. Furthermore, we suppose that the coefficients and boundary data in (27) depend on random variables ξ_i : Ω → R, i = 1, 2, 3, 4, such that the supports of ξ_i, i = 1, 2, 3, are [−1, 1] and the support of ξ_4 is [1, 3]. For the sake of illustration, we assume that each of these random variables is uniformly distributed and, after the usual change of variables, we consider instead the parameter set Ξ := [−1, 1]³ × [1, 3], endowed with the associated uniform density. We define ξ := (ξ_1, …, ξ_4) ∈ Ξ. Finally, we note that the linearity of the differential equation allows us to use the superposition principle to separate the unique, z-dependent solution u(z) into the sum of random fields u(z) = S(z) + ū, where S(z) maps z from H into the solution space and ū is a fixed random field. With the aim of showing that the necessary continuity properties used in the previous section are fulfilled, let ū : D × Ξ → R denote a function such that ū(·, ξ) belongs to H¹(D) and ū satisfies the inhomogeneous random boundary condition in (27b). If we then recast (27) as a random elliptic PDE (28) with homogeneous boundary conditions and determine ũ ∈ H¹₀(D) solving (28), then ũ + ū solves the original random boundary value problem. Based on the model assumptions, we see that both A and the altered right-hand side g + Aū satisfy the necessary conditions for our theory.
In order to illustrate the sensitivities for this model problem, we generate a sample on Ξ of size M and replace P by the corresponding empirical measure P M . To avoid confusion, we leave off the M subscript in the following discussion. The underlying spaces for the control, state, and adjoint variables are discretized using standard piecewise linear finite elements on a uniform mesh with parameter h = 1/(2 8 − 1).
The resulting finite-dimensional deterministic optimization problems are solved by a semismooth Newton method as proposed in [14,24,25], which in the current context is mesh-independent. The forward and adjoint equations as well as the linear equations for the Hessian-vector products are solved directly and the reduced system for the semismooth Newton step is calculated using conjugate gradients as some of the associated operators are only implicitly given.
The algorithm terminates once the discrete ℓ²-norm of the residual of the optimality system reaches a tolerance of 1e-8. On average, the semismooth Newton algorithm required three to four iterations with approximately 13 to 15 inner iterations for the CG solver. The solution z_P of this problem is treated as the "true" solution. See Fig. 1 for the solution z_P with M = 100 and its out-of-sample behavior in terms of the mean value of the state u. As expected, the control behaves well on average.
We now investigate d_F(P_N, P), where P_N is an empirical probability measure based on a random sample of size N (1 ≤ N ≤ M) of the M realizations ξ¹, …, ξ^M. Since a direct calculation of d_F(P_N, P) can be quite challenging even for simple examples, we calculate instead ν(P_N) and z_{P_N} and compute the associated errors |ν(P_N) − ν(P)| and ‖z_{P_N} − z_P‖_H.
The experiment is repeated 100 times for each N = 1, …, M. The plots of these errors can be seen in Fig. 2. The rate estimates were generated by solving a linear least-squares regression problem with a ridge (ℓ²) regularization term. Whereas the convergence of the optimal values indicates a rate of roughly 1/2, the rate of convergence for the optimal solutions would be a marked improvement over the theoretical estimates based on probability metrics. One explanation for this would be that the integrands for this specific model problem have much better smoothness properties than assumed. We leave a rigorous analysis of this question for future research.
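The rate-fitting step can be sketched as follows (synthetic errors with a known N^{−1/2} decay stand in for the measured errors; the ridge parameter is a made-up choice):

```python
import numpy as np

# Estimate r in err ~ C * N^{-r} by ridge (l2-regularized) least squares on
# log-log data: model log err = a + b*log N, with rate r = -b.
rng = np.random.default_rng(1)
N = np.array([10, 20, 40, 80, 160, 320, 640], dtype=float)
err = 2.0 * N ** -0.5 * np.exp(0.05 * rng.standard_normal(N.size))  # noisy N^{-1/2} decay

X = np.column_stack([np.ones_like(N), np.log(N)])
y = np.log(err)
lam = 1e-6   # small ridge parameter for numerical stability
coef = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
rate = -coef[1]   # estimated decay rate, close to 0.5 here
```

The tiny ridge term barely biases the slope but keeps the normal equations well conditioned when the design matrix is nearly collinear.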

Conclusion
We have shown that a number of known results for the stability of stochastic programs with finite-dimensional decision spaces can be carried over to infinite dimensions, provided certain convexity conditions are satisfied. As perhaps expected, the best possible growth rates for the convergence of solutions using probability metrics are of Hölder type. Our analysis gives rise to a number of possible future directions and open questions. For example, we consider a setting that is primarily related to risk-neutral problems, whereas problems using typically non-smooth risk measures in the objective are of significant interest for robust engineering design. In addition, the assumptions on the objective and the linear operator Σ are essential for the qualitative stability analysis, as the infinite-dimensional setting often requires us to make use of the weak topology. Without such regularity properties, it is unclear how to proceed in general. Finally, in order to develop adaptive numerical optimization methods based on estimates of the type (21), we need to investigate more closely the convergence of d_F(P_N, P) for the class of functions F used in Sect. 6 and specific approximations P_N. Section 7 provides some deeper insight into this question; however, the picture is far from complete.
Appendix

The following result can be found, e.g., in [7].
Proposition 3 (Theorem 1A.4 in [7]) Let P be a metric space with metric ρ and X a complete metric space with metric d. Let F : P × X → X and assume that there exist α ∈ (0, 1) and λ > 0 such that

  d(F(p, x), F(p, x′)) ≤ α d(x, x′)  for all p ∈ P and x, x′ ∈ X,
  d(F(p, x), F(p′, x)) ≤ λ ρ(p, p′)  for all p, p′ ∈ P and x ∈ X.

Then, for each p ∈ P, there exists a unique fixed point x(p) of F(p, ·) in X and we have the estimate

  d(x(p), x(p′)) ≤ (λ/(1 − α)) ρ(p, p′)  for all p, p′ ∈ P.
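A quick numerical check of the estimate in Proposition 3 (our own example: F(p, x) = ½ sin x + p on X = R, so α = ½, λ = 1, and hence |x(p) − x(p′)| ≤ 2 |p − p′|):

```python
import numpy as np

# F(p, x) = 0.5*sin(x) + p is a contraction in x with constant alpha = 0.5 and
# Lipschitz in p with constant lambda = 1, so the fixed points x(p) satisfy
# |x(p) - x(p')| <= (lambda/(1 - alpha)) |p - p'| = 2*|p - p'|.
def fixed_point(p, iters=200):
    x = 0.0
    for _ in range(iters):
        x = 0.5 * np.sin(x) + p   # Banach iteration; contraction factor 0.5
    return x

p1, p2 = 0.3, 0.7
x1, x2 = fixed_point(p1), fixed_point(p2)
```

After 200 iterations the iterates are at the fixed points to machine precision, and the Lipschitz bound on the fixed-point map holds with room to spare.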