Abstract
The vast majority of stochastic optimization problems require the approximation of the underlying probability measure, e.g., by sampling or using observations. It is therefore crucial to understand the dependence of the optimal value and optimal solutions on these approximations as the sample size increases or more data becomes available. Due to the weak convergence properties of sequences of probability measures, there is no guarantee that these quantities will exhibit favorable asymptotic properties. We consider a class of infinite-dimensional stochastic optimization problems inspired by recent work on PDE-constrained optimization as well as functional data analysis. For this class of problems, we provide both qualitative and quantitative stability results on the optimal value and optimal solutions. In both cases, we make use of the method of probability metrics. The optimal values are shown to be Lipschitz continuous with respect to a minimal information metric and consequently, under further regularity assumptions, with respect to certain Fortet-Mourier and Wasserstein metrics. We prove that even in the most favorable setting, the solutions are at best Hölder continuous with respect to changes in the underlying measure. The theoretical results are tested in the context of Monte Carlo approximation for a numerical example involving PDE-constrained optimization under uncertainty.
1 Introduction
In stochastic optimization, stability usually refers to the continuity properties of optimal values and solution sets as mappings from a set of probability measures, endowed with a suitable distance, into the extended reals and solution space, respectively, see [22]. The distance on the space of probability measures must be selected in order to allow the estimation of differences of the relevant functions, which depend on probability measures. There exists a wide variety of possible distances of probability measures based on various constructions [19, 27]. In the present context, distances with \(\zeta \)-structure introduced first in [27] appear as a natural choice. For a given metric space \(\varOmega \) such a distance is of the form
\[ d_{\mathfrak {F}}(\mathbb {P},\mathbb {Q}) := \sup _{f \in \mathfrak {F}} \left| \int _{\varOmega } f(\omega ) \, \mathrm {d}\mathbb {P}(\omega ) - \int _{\varOmega } f(\omega ) \, \mathrm {d}\mathbb {Q}(\omega ) \right| , \qquad (1) \]
where \(\mathfrak {F}\) is a family of Borel measurable functions from \(\varOmega \) to \(\overline{\mathbb {R}}\) and \(\mathbb {P}\), \(\mathbb {Q}\) are Borel probability measures on \(\varOmega \). Note that the distance \(d_{\mathfrak {F}}\) is non-negative, symmetric and satisfies the triangle inequality. It also satisfies \(d_{\mathfrak {F}}(\mathbb {P},\mathbb {P})=0\) and is, thus, a probability metric in the sense of [27]. However, \(d_{\mathfrak {F}}(\mathbb {P},\mathbb {Q})=0\) only implies \(\mathbb {P}=\mathbb {Q}\), when the family \(\mathfrak {F}\) is rich enough. Hence, \(d_{\mathfrak {F}}\) is a semi-metric in the usual terminology, in general.
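As a minimal sketch of how a distance with \(\zeta \)-structure behaves, the following Python snippet evaluates the supremum of differences of expectations over a finite family for finitely supported measures on the real line. The names `expectation` and `zeta_distance`, the measures, and the two families are illustrative assumptions and do not come from the text. The example also shows why the distance is only a semi-metric in general: a family of constants cannot separate two different measures.

```python
from typing import Callable, Sequence, Tuple

# A finitely supported measure: (atom, weight) pairs whose weights sum to 1.
Measure = Sequence[Tuple[float, float]]


def expectation(f: Callable[[float], float], measure: Measure) -> float:
    """Integral of f with respect to a finitely supported measure."""
    return sum(weight * f(atom) for atom, weight in measure)


def zeta_distance(family: Sequence[Callable[[float], float]],
                  p: Measure, q: Measure) -> float:
    """Largest value of |E_P[f] - E_Q[f]| over a finite family of functions."""
    return max(abs(expectation(f, p) - expectation(f, q)) for f in family)


# Two different measures on the real line (hypothetical data).
P = [(0.0, 0.5), (1.0, 0.5)]
Q = [(0.0, 0.5), (2.0, 0.5)]

rich_family = [lambda x: x, lambda x: x * x]  # first and second moments
poor_family = [lambda x: 1.0]                 # constants only

d_rich = zeta_distance(rich_family, P, Q)  # positive: the family separates P and Q
d_poor = zeta_distance(poor_family, P, Q)  # zero although P differs from Q
print(d_rich, d_poor)
```

With the richer family the distance is positive, whereas the family of constants yields zero for two distinct measures.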
The smallest relevant family \(\mathfrak {F}\) of Borel measurable functions in our stability studies contains only those functions which appear in the stochastic optimization problem under consideration. In this case, \(d_{\mathfrak {F}}\) may be called the minimal information (m.i.) distance. Stability results with respect to such m.i. distances serve as the starting point (i) to study stability with respect to the weak convergence of probability measures and (ii) to enlarge the family \(\mathfrak {F}\) properly by functions sharing essential analytical properties with the original ones. The latter strategy may lead to probability metrics that enjoy desirable properties like dual representations and convergence characterizations.
This method of probability metrics provides quantitative statements on the stability of solutions and optimal values of stochastic programming problems. Nevertheless, the existing theory has not been developed for optimization problems in which the design or decision variables may be infinite-dimensional, as is the case in PDE-constrained optimization under uncertainty. By including infinite-dimensional feasible sets, we introduce a number of complications; in particular, the loss of norm compactness of the feasible set, even in the case of convex, closed, and bounded feasible sets.
After fixing some essential notation in Sect. 2, we state in Sect. 3 the class of infinite-dimensional stochastic optimization problems whose stability we study in the subsequent sections. Section 4 contains qualitative results, providing conditions that imply convergence of optimal values and solutions if the underlying sequence of probability distributions converges to a limit distribution in some sense. In Sect. 5 we show that optimal values and solutions even allow Lipschitz or Hölder estimates in terms of the \(\zeta \)-distance. In Sect. 6 we argue that the stability analysis of the preceding sections applies to certain stochastic PDE-constrained optimization problems. Finally, in Sect. 7, we provide a study of the results in Sect. 6 for the case when Monte Carlo approximations are used.
2 Notation and preliminary results
We assume throughout that \(\varOmega \) is a complete separable metric space, i.e., a Polish space, and \(\mathcal {F}\) the associated Borel \(\sigma \)-algebra. In addition, we will work exclusively with Borel probability measures \(\mathbb P : \mathcal {F} \rightarrow [0,1]\) that ensure \((\varOmega ,\mathcal {F},\mathbb P)\) is a complete probability space. In particular, if \(\varOmega \) is finite, then \(\mathcal {F}\) must be the power set of \(\varOmega \). For the abstract portion of our results, we will always assume that \(\varTheta \) is a real separable Hilbert space and \(\varTheta _\mathrm{ad} \subset \varTheta \) is a nonempty, closed, and convex set. Given an appropriately chosen integrand \(f: \varTheta \times \varOmega \rightarrow \overline{\mathbb R}\), we will consider the potentially infinite-dimensional stochastic optimization problems:
Here, we also introduce the notion of optimal value function \(\nu \) as a function from the space of all Borel probability measures \(\mathcal {P}(\varOmega )\) into \(\overline{\mathbb R}\). This potentially extended real-valued function will play a key role in our discussions. We denote the expectation by \(\mathbb E\) or, when the underlying measure is not clear from context, by \(\mathbb E_{\mathbb P}\) to indicate the dependence on the measure \(\mathbb P\).
Given a complete probability space \((\varOmega ,\mathcal {F},\mathbb P)\) and a real Banach space W, we recall the definition of the Bochner space \(L^p(\varOmega ,\mathcal {F},\mathbb {P};W)\), \(p \in [1,\infty )\), as the space of (equivalence classes of) strongly measurable functions v, which map \(\varOmega \) into W and satisfy \(\int _{\varOmega } \Vert v(\omega ) \Vert ^{p}_{W} \ \mathrm {d} \mathbb P(\omega ) < +\infty \), cf. [13]. If \(p = \infty \), then \(L^\infty (\varOmega ,\mathcal {F},\mathbb {P};W)\) consists of essentially bounded W-valued strongly measurable functions. In both cases \(L^p(\varOmega ,\mathcal {F},\mathbb {P};W)\) is a Banach space with the natural norm(s)
In the special case when \(W=\mathbb {R}\), we simply write \(L^p(\varOmega ,\mathcal {F},\mathbb {P})\). As usual, norm convergence will typically be denoted by \(\rightarrow \), whereas \(\rightharpoonup \) signifies weak convergence and \({\mathop {\rightharpoonup }\limits ^{*}}\) weak-star convergence.
In our stability analysis, we make use of distances with \(\zeta \)-structure on \(\mathcal {P}(\varOmega )\) having the form (1). We will refer to these objects as \(\zeta \)-distances for brevity. Given a family \(\mathfrak {F}\) of Borel measurable functions from \(\varOmega \) into \(\overline{\mathbb R}\), the \(\zeta \)-distance \(d_{\mathfrak {F}}\) on \((\varOmega ,\mathcal {F})\) is a highly flexible structure that allows us to define so-called minimal information distances and Fortet-Mourier metrics; each defined in the text below. Properties of \(\zeta \)-distances like a characterization of its maximal generator and its relation to weak convergence of probability measures can be found in [18, 23]. Recall that a sequence of probability measures \(\left\{ \mathbb P_N \right\} \) on \((\varOmega , \mathcal {F})\) is said to narrowly/weakly converge to the probability measure \(\mathbb P\) provided
\[ \lim _{N \rightarrow +\infty } \int _{\varOmega } \varphi (\omega ) \, \mathrm {d}\mathbb P_N(\omega ) = \int _{\varOmega } \varphi (\omega ) \, \mathrm {d}\mathbb P(\omega ) \quad \text {for all } \varphi \in C^0_{b}(\varOmega ), \]
where \(C^0_{b}(\varOmega )\) is the space of all bounded continuous functions on \(\varOmega \). A family \(\mathfrak {F}\) of Borel measurable functions is called a \(\mathbb {P}\)-uniformity class if
\[ \lim _{N \rightarrow +\infty } d_{\mathfrak {F}}(\mathbb {P}_N,\mathbb {P}) = 0 \]
holds for each sequence \(\left\{ \mathbb {P}_N \right\} \) of probability measures converging weakly to \(\mathbb {P}\). For example, it is known that \(\mathfrak {F}\) is a \(\mathbb {P}\)-uniformity class if \(\mathfrak {F}\) is uniformly bounded and \(\mathbb {P}(\{\omega \in \varOmega :\mathfrak {F} \text{ is } \text{ not } \text{ equicontinuous } \text{ at } \omega \})=0\) [23].
Finally, we recall that given a \(\sigma \)-algebra \(\mathcal {F}\) along with a nominal \(\sigma \)-finite \(\sigma \)-additive positive measure \(\mathbb P\) on \(\varOmega \), e.g., a Borel probability measure \(\mathbb P \in \mathcal {P}(\varOmega )\), the dual space of \(L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) can be identified with the space of all finitely additive signed measures \(\mathrm {ba}(\varOmega )\) on \(\mathcal {F}\) absolutely continuous with respect to \(\mathbb P\), see e.g., [9].
3 The optimization problem
In order to carry out the stability analysis, we restrict the class of allowable integrands \(f(\theta ,\omega )\). These restrictions will henceforth be taken as standing assumptions. The particular class considered in this paper is inspired by applications in PDE-constrained optimization under uncertainty in which the PDE is given by a linear elliptic partial differential equation with random coefficients, right-hand side, and/or boundary conditions. We refer the reader to [17] for an overview of the state-of-the-art theory including more general objective functions and risk measures. In addition, many problems in functional data analysis exhibit practically the same form used below, see e.g., [21].
Let V and H be real Hilbert spaces such that V embeds continuously into H, and \(\theta _d \in H\). For \(\theta \in \varTheta \) and \(\omega \in \varOmega \), let \(\varSigma (\omega )\theta = S(\omega )\theta - s(\omega )\), where \(S(\omega ):\varTheta \rightarrow V\) is bounded and linear in \(\theta \) independently of \(\omega \) and \(s(\omega ) \in H\). We then define
Furthermore, we assume that for every \(\theta \in \varTheta \) (or \(\theta \in \varTheta _\mathrm{ad}\)) and any \(\mathbb P \in \mathcal {P}(\varOmega )\)
This implicitly adds mild regularity assumptions on S and s that are typically fulfilled when S is related to the solution of a parametric elliptic PDE, e.g., \( S(\cdot )\theta , s(\cdot ) \in L^2(\varOmega ,\mathcal {F},\mathbb P; V). \) Then for \(\alpha > 0\), we consider the optimization problems
Theorem 1
Problem (3) admits a unique solution \(\theta _{\mathbb P} \in \varTheta _\mathrm{ad}\) for every \(\mathbb P \in \mathcal {P}(\varOmega )\).
Proof
For existence, it suffices to prove F is proper, convex, lower-semicontinuous and coercive, cf. e.g., [3, Sec. 3.3]. Since \(f(\theta ,\cdot ) \in L^1(\varOmega , \mathcal {F}, \mathbb P)\) for any \(\theta \in \varTheta _\mathrm{ad}\) and \(f \ge 0\), F is proper. Convexity follows directly from the \(\mathbb P\)-a.e. convexity of \(\theta \mapsto f(\theta ,\omega ) + \frac{\alpha }{2} \Vert \theta \Vert ^2_{\varTheta }\) and the monotonicity of the expectation \(\mathbb E_{\mathbb P}\). Lower semicontinuity is a result of Fatou’s lemma: Let \(\theta _k \rightarrow \theta \) in \(\varTheta \). Then since \(f \ge 0\) and \(f(\theta _k,\cdot ) \rightarrow f(\theta ,\cdot )\) \(\mathbb P\)-a.s. (by the assumptions on S) we have
Since \(\mathbb E_{\mathbb P}[f(\theta ,\cdot )] \ge 0\) for all \(\theta \in \varTheta _\mathrm{ad}\), F is coercive. Given F is proper, convex, and lower semicontinuous, F is weakly lower semicontinuous, as well. Since F is coercive, the level set \( \left\{ \theta \in \varTheta _\mathrm{ad} \left| F(\theta ) \le \alpha _0\right. \right\} , \) where \(\theta _0 \in \varTheta _\mathrm{ad}\) and \(\alpha _0 := F(\theta _0)\), is weakly sequentially compact. It then follows from the direct method that (3) admits a solution \(\theta _{\mathbb P}\). Given \(\alpha > 0\), F is strictly convex. Hence, \(\theta _{\mathbb P}\) is unique. \(\square \)
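To make Theorem 1 concrete, the following Python sketch solves a one-dimensional instance in closed form. Everything specific here is an assumption made for illustration: \(\varTheta = \mathbb R\), \(\varTheta _\mathrm{ad}\) is an interval, \(S(\omega )\) and \(s(\omega )\) are real numbers, the integrand is taken to be \(f(\theta ,\omega ) = \frac{1}{2}(S(\omega )\theta - s(\omega ))^2\), and the measure is finitely supported. In this setting the objective is a strictly convex quadratic, so the unique minimizer over the interval is the unconstrained minimizer clipped to the interval.

```python
from typing import Sequence, Tuple

# One atom of a finitely supported measure: (S(omega), s(omega), weight).
Sample = Tuple[float, float, float]


def objective(theta: float, samples: Sequence[Sample], alpha: float) -> float:
    """E[0.5 * (S*theta - s)^2] + 0.5 * alpha * theta^2 (assumed model form)."""
    data_term = sum(w * 0.5 * (S * theta - s) ** 2 for S, s, w in samples)
    return data_term + 0.5 * alpha * theta ** 2


def minimizer(samples: Sequence[Sample], alpha: float,
              lo: float, hi: float) -> float:
    """Unique minimizer over [lo, hi]: clip the unconstrained stationary point."""
    a = sum(w * S * S for S, s, w in samples)  # E[S^2]
    b = sum(w * S * s for S, s, w in samples)  # E[S s]
    unconstrained = b / (a + alpha)            # solves (a + alpha) * theta = b
    return min(max(unconstrained, lo), hi)


samples = [(1.0, 2.0, 0.5), (3.0, 1.0, 0.5)]  # hypothetical data
alpha = 1.0

theta_free = minimizer(samples, alpha, -10.0, 10.0)  # constraint inactive
theta_box = minimizer(samples, alpha, 0.0, 0.25)     # constraint active
print(theta_free, theta_box)
```

The clipping step is specific to the scalar case; in a general Hilbert space the constrained minimizer is not obtained by projecting the unconstrained one.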
4 Qualitative stability
In this section, we provide stability results that ensure the approximating optimization problems obtained by replacing \(\mathbb P\) by another probability measure \(\mathbb Q \in \mathcal {P}(\varOmega )\) will converge in some sense to the original problem. In particular, we show that the solutions \(\theta _{\mathbb Q}\) will strongly converge to \(\theta _{\mathbb P}\) provided \(\mathbb Q\) converges to \(\mathbb P\) with respect to a properly chosen \(\zeta \)-distance. This basic result serves as the foundation needed to prove continuity of the solutions with respect to narrow convergence of probability measures. However, in order to do the latter, additional regularity properties will be required on the integrands with respect to \(\omega \). These stability results are in some sense more versatile than the quantitative results below. Nevertheless, they do not provide us with a rate of convergence.
Theorem 2
In the context of Theorem 1, suppose we are given a sequence \(\left\{ \mathbb P_N\right\} \) with \(\mathbb P_N \in \mathcal {P}(\varOmega )\) and a probability measure \(\mathbb P \in \mathcal {P}(\varOmega )\) such that \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P) \rightarrow 0\), where \(\mathfrak {F}\) is any class of measurable functions from \(\varOmega \) into \(\overline{\mathbb R}\) large enough to contain \(f(\theta ,\cdot )\) for any \(\theta \in \left\{ \theta _{N} : N \in \mathbb N \right\} \cup \left\{ \theta _{\mathbb P}\right\} \) with \(\theta _{N} := \theta _{\mathbb P_{N}}\). Then \(\theta _{N} \rightarrow \theta _{\mathbb P}\) strongly in \(\varTheta \) as \(N \rightarrow +\infty \).
Remark 1
The obvious candidate for the set \(\mathfrak {F}\) would be to choose the collection of all possible integrands \(f(\theta ,\cdot ) :\varOmega \rightarrow \mathbb R\) indexed by \(\theta \in \varTheta _\mathrm{ad}\). In terms of the associated \(\zeta \)-distance, this would result in what is referred to in [19, 20, 22] as the minimal information metric.
Proof
We first show \(\left\{ \theta _{N}\right\} \) is uniformly bounded in \(\varTheta \). Indeed, we have
For any fixed \(\theta \in \varTheta _\mathrm{ad}\), it follows from the hypotheses that
Substituting this into (4) we obtain the bound
Since \(d_{\mathfrak {F}}(\mathbb P_N, \mathbb P) \rightarrow 0\), \(\left\{ \theta _N\right\} \) is bounded in \(\varTheta \). Therefore, there exists a \(\widehat{\theta } \in \varTheta _\mathrm{ad}\) and a weakly convergent subsequence \(\left\{ \theta _{N_{\ell }}\right\} \) such that \(\theta _{N_{\ell }} \rightharpoonup \widehat{\theta }\) as \(\ell \rightarrow +\infty \).
For fixed \(\mathbb P\), it follows from the proof of Theorem 1 that \(\mathbb E_{\mathbb P}[f(\cdot )] : \varTheta \rightarrow \mathbb R\) is weakly lower semicontinuous. Therefore,
It then follows from (6), the optimality of \(\theta _{N_{\ell }}\), and (5) that for any \(\theta \in \varTheta _\mathrm{ad}\) we have:
Hence, \(\theta _{\mathbb P} = \widehat{\theta }\). Since \(\theta _{\mathbb P}\) is unique and the previous arguments hold for all weakly convergent subsequences of \(\left\{ \theta _{N}\right\} \), we have \(\theta _{N} \rightharpoonup \theta _{\mathbb P} = \widehat{\theta }\) as \(N \rightarrow +\infty \). It remains to prove \(\Vert \theta _{N} - \theta _{\mathbb P}\Vert _{\varTheta } \rightarrow 0\).
Clearly we have the inequality
by weak lower semicontinuity of the norm \(\Vert \cdot \Vert _{\varTheta }\). On the other hand, by rearranging terms, the definition of \(\theta _{N}\) and feasibility of \(\widehat{\theta }\) yield
Therefore, we again appeal to the weak lower semicontinuity of \(\mathbb E_{\mathbb P}[f(\cdot )]\) on \(\varTheta \) to obtain
Combining (8) and (9), we have \( \Vert \theta _{N} \Vert _{\varTheta } \rightarrow \Vert \widehat{\theta } \Vert _{\varTheta }\). Then since \(\varTheta \) is a Hilbert space and \(\theta _{N} \rightharpoonup \widehat{\theta }\), the assertion follows. \(\square \)
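The convergence asserted in Theorem 2 can be observed in a small scalar experiment. The sketch below is an illustration under stated assumptions, not the setting of the paper: \(\omega \) is uniformly distributed on \([0,1]\), \(S(\omega ) = 1+\omega \), \(s(\omega ) = \omega \), \(\varTheta _\mathrm{ad} = \mathbb R\), \(\alpha = 1\), and the integrand is assumed to be \(\frac{1}{2}(S(\omega )\theta - s(\omega ))^2\). The approximating measures \(\mathbb P_N\) are uniform on N midpoints of a regular grid, which keeps the example deterministic; Monte Carlo samples could be substituted. Since \(\mathbb E[S^2] = 7/3\) and \(\mathbb E[Ss] = 5/6\), the limit solution is \((5/6)/(7/3+1) = 1/4\).

```python
from typing import Dict, List


def solve(atoms: List[float], alpha: float = 1.0) -> float:
    """Minimizer of the approximate problem for equally weighted atoms."""
    n = len(atoms)
    a = sum((1.0 + w) ** 2 for w in atoms) / n  # E_N[S^2] with S = 1 + omega
    b = sum((1.0 + w) * w for w in atoms) / n   # E_N[S s] with s = omega
    return b / (a + alpha)


def midpoint_atoms(n: int) -> List[float]:
    """Atoms of a measure that converges weakly to the uniform law on [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


THETA_EXACT = 0.25  # (5/6) / (7/3 + 1) for the assumed model

errors: Dict[int, float] = {
    n: abs(solve(midpoint_atoms(n)) - THETA_EXACT) for n in (10, 100, 1000)
}
for n, err in errors.items():
    print(n, err)
```

The printed errors decrease as N grows, consistent with \(\theta _N \rightarrow \theta _{\mathbb P}\); the observed rate is a property of the grid construction and says nothing about Monte Carlo rates.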
An alternative perspective on qualitative stability is offered by our next result. Here, we will prove convergence of the sequence of minimizers under different data assumptions on the integrands and a different form of weak convergence of measures. We note that in PDE-constrained optimization under uncertainty these assumptions are less restrictive than they may appear. In particular, we do not require \(f(\theta ,\cdot ) : \varOmega \rightarrow \mathbb R\) to be continuous as is needed below for the Fortet-Mourier metric. The caveat here is the requirement that \(\mathbb P_N\) is absolutely continuous with respect to \(\mathbb P\).
Theorem 3
In addition to the standing assumptions, fix some \(\mathbb P \in \mathcal {P}(\varOmega )\) and suppose that for all \(\theta \in \varTheta _\mathrm{ad}\) \(f(\theta ,\cdot ) \in L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\). Assume furthermore that the superposition operator \(\varPhi : \varTheta \rightarrow L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) defined by
is completely continuous. Let \(\left\{ \mathbb P_{N}\right\} \subset \mathcal {P}(\varOmega )\) such that
1. for all \(N \in \mathbb N\), \(\mathbb P_{N} \ll \mathbb P\) (\(\mathbb P_{N}\) is absolutely continuous with respect to \(\mathbb P\)), and
2. \(\mathbb P_{N}\rightarrow \mathbb P\) with respect to the weak-star topology on \((L^{\infty }(\varOmega ,\mathcal {F},\mathbb P))^{*}\).
Then \(\theta _{N} \rightarrow \theta _{\mathbb P}\).
Proof
As noted in Sect. 2, each \(\mathbb Q \in \mathcal {P}(\varOmega )\) is an element of \((L^{\infty }(\varOmega ,\mathcal {F},\mathbb P))^*\) provided \(\mathbb Q \ll \mathbb P\). The rest of the proof mirrors that of Theorem 2. Given the sequence of minimizers \(\left\{ \theta _{N}\right\} \), we immediately obtain a uniform bound on \(\Vert \theta _N\Vert \) from (4), since for any \(\theta \in \varTheta _\mathrm{ad}\) we have \(f(\theta ,\cdot ) \in L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) and \(\mathbb P_N {\mathop {\rightharpoonup }\limits ^{*}} \mathbb P\). As before, we let \(\left\{ \theta _{N_\ell }\right\} _{\ell =1}^{\infty }\) denote the weakly convergent subsequence and \(\widehat{\theta }\) the associated weak limit.
Turning now to the estimate derived in (6), we see that
Here, \(\varPhi (\theta _{N_{\ell }}) \rightarrow \varPhi (\widehat{\theta })\) strongly in \(L^{\infty }(\varOmega ,\mathcal {F},\mathbb P)\) due to the assumption of complete continuity. Therefore, both \(\mathbb E_{\mathbb P}[f(\theta _{N_{\ell }})]\) and \(\mathbb E_{\mathbb P_{N_{\ell }}}[f(\theta _{N_{\ell }})]\) converge to \(\mathbb E_{\mathbb P}[f(\widehat{\theta })]\).
As in the proof of Theorem 2, we obtain optimality of \(\widehat{\theta }\) by adapting the inequality (7), i.e., for every \(\theta \in \varTheta _\mathrm{ad}\) we have
Here, the regularity of the integrand ensures that \(\mathbb E_{\mathbb P_{N_{\ell }}}[f(\theta )] \) converges to \(\mathbb E_{\mathbb P}[f(\theta )]\); from which it follows that \(\widehat{\theta } = \theta _{\mathbb P}\). As in the proof of Theorem 2, we can again argue that the entire sequence \(\left\{ \theta _{\mathbb P_{N}}\right\} \) weakly converges to \(\theta _{\mathbb P}\).
In order to prove norm convergence, we note that
Then by the complete continuity and regularity assumptions, we have
This completes the proof. \(\square \)
Next, we return to the setting using probability metrics to obtain some important implications of Theorem 2 under further regularity assumptions on the integrands. In our setting, we recall that the space of all (Borel) probability measures with finite p-th moments is defined by
for some arbitrary \(\omega _0 \in \varOmega \). We recall that a sequence \(\left\{ \mathbb P_{N}\right\} \subset \mathcal {P}_{p}(\varOmega )\) converges weakly (narrowly) provided for all \(\varphi \in C^0_b(\varOmega )\)
as \(N \rightarrow \infty \). This type of weak convergence shares an intimate link with a certain class of \(\zeta \)-distances known as Fortet-Mourier metrics. To start, for \(p \in [1,\infty )\), we define the sets \(\mathcal {F}_p(\varOmega )\) of locally Lipschitz functions with a certain p-related growth condition by
We then define the Fortet-Mourier metric of order p for two measures \(\mathbb P,\mathbb Q \in \mathcal {P}_{p}(\varOmega )\) by
In particular, \(\zeta _{p}\) is equivalent to the so-called Kantorovich-Rubinstein functional with cost function given by
(see [19, Theorem 5.3.3] along with the discussion on page 93 in [19]). Furthermore, it follows from [19, Theorem 6.2.1] that \(\left\{ \mathbb P_{N}\right\} \subset \mathcal {P}_{p}(\varOmega )\) converges weakly (narrowly) to \(\mathbb P \in \mathcal {P}_{p}(\varOmega )\) if and only if \(\zeta _p(\mathbb P_N,\mathbb P) \rightarrow 0\) as \(N \rightarrow +\infty \). We may therefore connect Theorem 2 directly to the weak convergence of probability measures.
Proposition 1
In the setting of Theorem 2, suppose there exists a \(p \in [1,\infty )\) and some \(L > 0\) such that
Then the solution mapping \(\mathcal {P}_{p}(\varOmega ) \ni \mathbb Q \mapsto \theta _{\mathbb Q} \in \varTheta _\mathrm{ad}\) is continuous with respect to weak (narrow) convergence of probability measures on \(\mathcal {P}_{p}(\varOmega )\).
Proof
After rescaling the integrands by \(1/L > 0\), this is a direct consequence of Theorem 2 in light of the preceding arguments. \(\square \)
Finally, we note that an alternative means of obtaining the sequential convergence result in Proposition 1 would be to appeal to the link between the weak topology on \(\mathcal {P}_{p}(\varOmega )\) and the topologies generated by the well-known Wasserstein distance \(W_p\) of order p. Let \(\gamma _i\) \((i=1,2)\) be the projection onto the first or second term of \(\varOmega \times \varOmega \), respectively, and for \(\pi \in \mathcal {P}(\varOmega \times \varOmega )\) denote the marginals by \(\pi ^i := \pi _{\#} \gamma _i := \pi \circ \gamma _i^{-1}\). Then the Wasserstein distance of order p is given by
\[ W_p(\mathbb P,\mathbb Q) := \left( \inf \left\{ \int _{\varOmega \times \varOmega } d(\omega ,\omega ')^p \, \mathrm {d}\pi (\omega ,\omega ') \; : \; \pi \in \mathcal {P}(\varOmega \times \varOmega ),\ \pi ^1 = \mathbb P,\ \pi ^2 = \mathbb Q \right\} \right) ^{\frac{1}{p}}. \]
For this distance we have the estimate:
Therefore, if we start with a sequence of Borel probability measures \(\left\{ \mathbb P_{N}\right\} \) and \(\mathbb P \in \mathcal {P}(\varOmega )\) such that \(W_p(\mathbb P_N,\mathbb P) \rightarrow 0\), then we obtain the same statement as in Proposition 1. However, as shown in [20] the convergence in the Wasserstein metric is potentially strictly slower than in the Fortet-Mourier metric.
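For intuition on the Wasserstein distance of order p, the following Python sketch computes it in the simplest possible case. The assumptions are that \(\varOmega = \mathbb R\) with the usual distance and that both measures are empirical measures with the same number of equally weighted atoms. In that case the optimal coupling pairs the sorted atoms, so no optimization over couplings is needed. The function name `wasserstein_1d` is hypothetical.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float],
                   p: float = 1.0) -> float:
    """W_p between two empirical measures on the real line.

    Assumes equally many, equally weighted atoms and p >= 1, in which case
    matching the sorted atoms is an optimal coupling.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("this sketch needs two nonempty samples of equal size")
    if p < 1.0:
        raise ValueError("the order p must be at least 1")
    n = len(xs)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


w1 = wasserstein_1d([0.0, 0.0], [3.0, 4.0], p=1.0)
w2 = wasserstein_1d([0.0, 0.0], [3.0, 4.0], p=2.0)
print(w1, w2)
```

The two values illustrate that \(W_1 \le W_2\) for the same pair of measures. For measures on higher-dimensional spaces the sorting shortcut does not apply, and a transport problem has to be solved instead.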
5 Quantitative stability
As mentioned above, quantitative stability provides us with Lipschitz or Hölder-type estimates of the optimal values and solutions. This is first done using the “weakest” possible \(\zeta \)-distance \(d_{\mathfrak {F}}\) in which \(\mathfrak {F}\) is directly related to the integrands without additional regularity assumptions on the dependence on \(\omega \). Further estimates related to Fortet-Mourier and Wasserstein metrics then follow as corollaries under Lipschitz conditions on the integrands.
Theorem 4
Under the standing assumptions, let \(\mathbb P,\mathbb Q \in \mathcal {P}(\varOmega )\) and let \(\mathfrak {F}\) be any set of Borel measurable functions that contains \(g_\theta (\cdot ) := f(\theta ,\cdot )\) for \(\theta \in \left\{ \theta _{\mathbb P}, \theta _{\mathbb Q}\right\} \). Then we have the estimates:
Proof
For the Lipschitz estimate (13), we observe that
For the Hölder estimate on the solution mapping (14), we start by letting \(\delta := d_{\mathfrak {F}}(\mathbb Q,\mathbb P)\) and observing that
Using the quadratic term \(\frac{\alpha }{2} \Vert \cdot \Vert ^2\), the convexity of the integrand \(f(\cdot ,\omega )\), the convexity of \(\varTheta _\mathrm{ad}\), and optimality of \(\theta _{\mathbb P}\), we have
It follows that
Combining (16) with (15) above yields
as was to be shown. \(\square \)
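The two kinds of estimates in Theorem 4 can be checked numerically on a scalar instance. The sketch below rests on several assumptions: \(\varTheta = \mathbb R\), \(\varTheta _\mathrm{ad} = [0,1]\), the integrand is taken to be \(\frac{1}{2}(S(\omega )\theta - s(\omega ))^2\), the optimal value is taken to include the term \(\frac{\alpha }{2}\theta ^2\), and \(\mathfrak {F}\) consists of the two integrands at \(\theta _{\mathbb P}\) and \(\theta _{\mathbb Q}\). The value gap is compared with \(d_{\mathfrak {F}}\). The solution gap is compared with \(2\sqrt{d_{\mathfrak {F}}/\alpha }\), a bound that follows from \(\alpha \)-strong convexity together with first-order optimality; its constant is this sketch's own and need not coincide with the constant in (14).

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (S(omega), s(omega), weight)


def expected_f(theta: float, samples: Sequence[Sample]) -> float:
    """E[0.5 * (S*theta - s)^2] for a finitely supported measure."""
    return sum(w * 0.5 * (S * theta - s) ** 2 for S, s, w in samples)


def solve(samples: Sequence[Sample], alpha: float, lo: float, hi: float) -> float:
    """Unique minimizer of E[f] + 0.5*alpha*theta^2 over [lo, hi]."""
    a = sum(w * S * S for S, s, w in samples)
    b = sum(w * S * s for S, s, w in samples)
    return min(max(b / (a + alpha), lo), hi)


def optimal_value(samples: Sequence[Sample], alpha: float,
                  lo: float, hi: float) -> float:
    theta = solve(samples, alpha, lo, hi)
    return expected_f(theta, samples) + 0.5 * alpha * theta ** 2


# Two hypothetical finitely supported measures.
P = [(1.0, 2.0, 0.5), (3.0, 1.0, 0.5)]
Q = [(1.0, 2.0, 0.4), (3.0, 1.0, 0.4), (2.0, 0.0, 0.2)]
alpha, lo, hi = 1.0, 0.0, 1.0

theta_p = solve(P, alpha, lo, hi)
theta_q = solve(Q, alpha, lo, hi)

# Distance over the two-element family {f(theta_P, .), f(theta_Q, .)}.
d = max(abs(expected_f(t, P) - expected_f(t, Q)) for t in (theta_p, theta_q))

value_gap = abs(optimal_value(P, alpha, lo, hi) - optimal_value(Q, alpha, lo, hi))
solution_gap = abs(theta_p - theta_q)
solution_bound = 2.0 * math.sqrt(d / alpha)
print(value_gap, d, solution_gap, solution_bound)
```

In this instance the value gap stays below \(d_{\mathfrak {F}}\) and the solution gap stays below the square-root bound, which is the Lipschitz versus Hölder behavior described above.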
We may now return to the results at the end of Sect. 4 in order to derive quantitative stability results using the familiar Fortet-Mourier and Wasserstein distances.
Corollary 1
In the setting of Theorem 4, suppose there exists a \(p \in [1,\infty )\) and some \(L > 0\) such that
Then the following estimates hold for the associated Fortet-Mourier metric:
Consequently, the following estimates hold for the Wasserstein metric \(W_p\):
Here, we set \(c(\omega _0,\mathbb P,\mathbb Q) = \left( 1 + \int _{\varOmega } d(\omega _0,\omega )^p \mathrm {d} \mathbb P(\omega ) + \int _{\varOmega } d(\omega _0,\omega )^p \mathrm {d}\mathbb Q(\omega )\right) ^{\frac{p-1}{p}} \).
6 An application to PDE-constrained optimization under uncertainty
We conclude the theoretical discussion with an example from PDE-constrained optimization under uncertainty to demonstrate the applicability of our results. For the purpose of discussion, we start with an arbitrary Borel probability measure \(\mathbb P\) in order to introduce the problem. The notation mirrors in part that of [17]. For readability, we indicate the associated quantities in the general discussions above.
Our goal is twofold. Under reasonable data assumptions, we will define a class of integrands \(\mathfrak {F}\), which allow us to use the m.i. metric in our stability results to prove convergence of optimal values and optimal solutions under weak convergence of a sequence of measures \(\{\mathbb {P}_{N}\}\). Afterwards, assuming the underlying function spaces are replaced by finite-dimensional subspaces defined by a standard finite-element discretization, we derive an a priori-type error bound and argue that the fully discrete problems converge to the original continuous problems.
We will consider a class of optimization problems in which we seek to minimize the objective function
subject to the condition that \(z \in Z_\mathrm{ad} \subset L^2(D)\), a closed bounded convex set, and for \(z \in Z_\mathrm{ad}\), u solves a random partial differential equation (PDE), which we define below. The function \(\widetilde{u}_{d} \in L^2(D)\) can be thought of as a desired state or general target function.
To be precise, let \(D\subset \mathbb {R}^{n}\) be an open, bounded Lipschitz domain, \(V=H_0^1(D)\) the classical Sobolev space with inner product \((\cdot ,\cdot )_V\), and \(V^{\star }=H^{-1}(D)\) its dual with norm \(\Vert \cdot \Vert _{\star }\) and dual pairing \(\langle \cdot ,\cdot \rangle \). In addition, let \(H=L^{2}(D)\) with inner product \((\cdot ,\cdot )_H\). Furthermore, let \(\varOmega \) be a metric space with metric \(\rho \) and Borel \(\sigma \)-field \(\mathcal {F}\) and let \(\mathbb {P}\) be a Borel probability measure.
Within this framework, we consider the bilinear form \(a(\cdot ,\cdot ;\omega ):V\times V\rightarrow \mathbb {R}\) defined by
where \(\omega \in \varOmega \). The associated random PDE can be defined pointwise as:
for all test functions \(v\in C_{0}^{\infty }(D)\), z varying in a constraint set \(Z_\mathrm{ad}\subset H\), and \(g, b_{ij}:D\times \varOmega \rightarrow \mathbb {R}\), which are assumed to be at least measurable in \(\varOmega \) and square (Lebesgue) integrable in D.
In order to use our stability results in this context, we will need further data assumptions on the bilinear form. For each \(\omega \in \varOmega \), we let \(A(\omega ) : V\rightarrow V^{\star }\) be the mapping
The existence of \(A(\omega )\) as a bounded linear operator is due to the Riesz representation theorem and the Lax-Milgram lemma based on the following assumptions, see e.g., [1, Chap. 6., 6.1, 6.2]. First, we impose the condition that there exist \(L> \gamma >0\) such that
for all \(x\in D\) and \(\mathbb {P}\)-a.e. \(\omega \in \varOmega \). This implies that each \(b_{ij}\) is essentially bounded in \(D \times \varOmega \) with respect to the associated product measure. Moreover, the mapping \(A(\omega ) :V\rightarrow V^{\star }\) is uniformly positive definite (with constant \(\gamma \)) and uniformly bounded (with constant L) with respect to \(\mathbb {P}\)-a.e. \(\omega \in \varOmega \), i.e.,
In addition, the inverse mapping \(A(\omega )^{-1}:V^{\star }\rightarrow V\) is again uniformly positive definite (with constant \(\frac{1}{L}\)) and uniformly bounded (with constant \(\frac{1}{\gamma }\)) with respect to \(\mathbb {P}\)-a.e. \(\omega \in \varOmega \).
Under these data assumptions, we may now define a class of integrands for the m.i. metric for which \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P) \rightarrow 0\) for any sequence of Borel probability measures \(\{\mathbb {P}_{N}\}\) that converges weakly to \(\mathbb {P}\). To this end, we define the functions \(f:Z_\mathrm{ad}\times \varOmega \rightarrow \mathbb {R}\) by
Our aim is to derive conditions implying that the class \(\mathfrak {F}=\{f(z,\cdot ):z\in Z_\mathrm{ad}\}\) is uniformly bounded and equicontinuous, and consequently a \(\mathbb {P}\)-uniformity class, cf. [23].
Lemma 1
In addition to the standing assumptions, suppose that \(\widetilde{u}_{d}\in H\) and \(g\in L^{2}(\varOmega ,\mathcal {F},\mathbb P;V^{\star })\). Then for some \(C>0\) we have
Proof
For \(z\in Z_\mathrm{ad}\) and \(\omega \in \varOmega \) we obtain
where we used the Poincaré-Friedrichs inequality twice (with some constant c) and the uniform boundedness of \(\Vert A(\omega )^{-1}\Vert \) by \(\frac{1}{\gamma }\). Since z varies in the bounded set \(Z_\mathrm{ad}\), there is a positive constant C such that the assertion holds. \(\square \)
Lemma 1 provides us with a uniform bound on all functions in \(\mathfrak {F}\). The proof of the following Lemma makes use of a result in [12]. Since this book is not readily available in English, we provide it and a short proof in the “Appendix”.
Lemma 2
In addition to the assumptions of Lemma 1, suppose there is a constant \(C>0\) such that
Then for any \(g\in V^{\star }\) and \(\omega ,\omega '\in \varOmega \) we have
where \(\kappa (t) = \sqrt{1-2\gamma t + L^2 t^2}\).
Proof
First we study the dependence of \(A(\omega )u\) on \(\omega \). Let \(u,v\in V\) and \(\omega ,\omega '\in \varOmega \).
where \(B(x;\omega ,\omega ')\) denotes the \(n\times n\)-matrix
with Frobenius norm \(\Vert B(x;\omega ,\omega ')\Vert \), \(\nabla u\) the gradient of u in the sense of Sobolev and \(|\nabla u|\) its Euclidean norm. Hence, we obtain
Next we consider the mapping \(K_{t}(\omega )u=u-tJ^{-1}(A(\omega )u-g)\) for some \(t\in (0,\frac{2\gamma }{L^2})\), \(\omega \in \varOmega \), \(g\in V^{\star }\) and any \(u\in V\). Then it follows from Proposition 2 that
for any \(u,u'\in V\) and \(\kappa (t)=\sqrt{1-2t\gamma +t^2L^2}<1\). Furthermore, the unique fixed point of \(K_t(\omega )\) belongs to the ball around zero with radius
For any \(u\in V\) and \(\omega ,\omega '\in \varOmega \) we have
and apply Proposition 3 from the “Appendix” with \(P=\varOmega \), \(X=\mathbb {B}(0,r)=\{u\in V:\Vert u\Vert _{V}\le r\}\), \(F(p,u)=K_{t}(\omega )u\) and \(F(p',u)=K_{t}(\omega ')u\). We obtain
where \(\bar{x}(g,\omega )=A(\omega )^{-1}g\) and \(r=\frac{t}{1-\kappa (t)}\Vert g\Vert _{\star }\). This completes the proof. \(\square \)
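The fixed-point map \(K_t\) used in the proof can be tried out in a finite-dimensional stand-in. The sketch below assumes \(V = \mathbb R^2\) with the Euclidean inner product, so that the Riesz map J is the identity, and uses a fixed hypothetical matrix A with \(\langle Au,u\rangle = 2|u|^2\) and \(\Vert A\Vert = \sqrt{5}\), i.e., \(\gamma = 2\) and \(L = \sqrt{5}\). It iterates \(u \mapsto u - t(Au - g)\) for a step \(t \in (0, 2\gamma /L^2)\) and compares the limit with \(A^{-1}g\), the contraction factor with \(\kappa (t)\), and the norm of the fixed point with the radius \(\frac{t}{1-\kappa (t)}\Vert g\Vert \).

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

# Hypothetical operator: <Au, u> = 2|u|^2 (gamma = 2), ||A|| = sqrt(5) (= L).
A = ((2.0, 1.0), (-1.0, 2.0))
GAMMA = 2.0
L = math.sqrt(5.0)


def apply_A(u: Vec) -> Vec:
    return (A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1])


def K(t: float, u: Vec, g: Vec) -> Vec:
    """One step of u -> u - t * (A u - g); J is the identity in this setting."""
    Au = apply_A(u)
    return (u[0] - t * (Au[0] - g[0]), u[1] - t * (Au[1] - g[1]))


def kappa(t: float) -> float:
    return math.sqrt(1.0 - 2.0 * GAMMA * t + (L * t) ** 2)


def norm(u: Vec) -> float:
    return math.hypot(u[0], u[1])


t = GAMMA / L ** 2  # lies inside (0, 2*gamma/L^2)
g = (1.0, 0.0)

u = (0.0, 0.0)
for _ in range(60):
    u = K(t, u, g)

exact = (0.4, 0.2)  # A^{-1} g, computed by hand for this matrix
radius = t / (1.0 - kappa(t)) * norm(g)
print(u, kappa(t), radius)
```

For this particular matrix the map \(I - tA\) is a scaled rotation, so the contraction factor \(\kappa (t)\) is attained exactly; in general it is only an upper bound.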
We now have enough results to prove that \(\mathfrak {F}\) constitutes a \(\mathbb P\)-uniformity class, which is a direct consequence of the following theorem.
Theorem 5
In addition to the hypotheses of Lemma 2, assume \(g\in L^{\infty }(\varOmega ,\mathcal {F},\mathbb {P}; H)\) and there exists \(\bar{C}>0\) such that
Then \(\mathfrak {F}\) is uniformly bounded and equi-Lipschitz continuous with respect to \(\rho \) on \(\varOmega \).
Proof
For any \(z\in Z_\mathrm{ad}\) and \(\omega \in \varOmega \) let \(F(z,\omega )=A(\omega )^{-1}(z+g(\omega )) -\widetilde{u}_{d}\). Our assumptions imply that \(F(\cdot ,\cdot )\) is \(\mathbb {P}\)-a.s. uniformly bounded in H by some constant \(\hat{C}\) (see the proof of Lemma 1). Furthermore, we obtain for any \(z\in Z_\mathrm{ad}\) and \(\omega ,\omega '\in \varOmega \):
where we used the uniform boundedness and the Poincaré-Friedrichs inequality. For the first term on the right-hand side we argue as in Lemma 2. The second term is estimated by
Now, we use again Lemma 2 for the first term and both Lemma 1 and the assumption on g for the second. Combining these observations, we obtain the assertion. \(\square \)
Theorem 5 establishes the \(\mathbb P\)-uniformity of \(\mathfrak {F}\) under relatively mild assumptions. In particular, we do not require the terms \(b_{ij}\) and g to be smooth in any way with respect to \(\omega \). Of course, in many interesting applications, see e.g. [4, 5], one can demonstrate much higher regularity of \(F(z,\omega )=A(\omega )^{-1}(z+g(\omega )) -\widetilde{u}_{d}\) in \(\omega \) for each \(z \in Z_\mathrm{ad}\) if some smoothness of \(b_{ij}\) and g is in fact available. Although the presence of \(\Vert \cdot \Vert ^2_{H}\) in \(f(z,\omega )\) rules out 1-Lipschitz continuity, as required by estimates using the Wasserstein distances, the quantitative estimates using the Fortet-Mourier metric, e.g., \(\zeta _2\), which are incidentally strictly sharper than the Wasserstein estimates, are still applicable provided the local growth conditions for functions in \(\mathcal {F}_p(\varOmega )\) are fulfilled. On the other hand, as a general point of critique, the minimal information metric along with Theorems 4 and 5 preclude the need to enlarge the set of integrands \(\mathfrak {F}\) in order to make use of the rougher estimates given by the Fortet-Mourier metrics.
This brings us to our second goal of this section. In order to solve optimization problems of the type
with \(f \in \mathfrak {F}\) numerically, not only \(\mathbb P\) but also the decision variables z and the underlying partial differential equation must be approximated. In other words, using a finite-sample-based approximation \(\mathbb P_{N}\) of \(\mathbb P\) and a finite-element discretization for the deterministic quantities in H and V, we would typically consider finite-dimensional problems of the type:
Here, \(A^h_i\) is defined from \(A(\omega )\) by replacing \(\omega \) with a realization \(\omega _i\) and V by a finite-dimensional subspace \(V_{h}\) derived by a standard finite-element approximation. The term \(u^h_{d,i}\) is given by \(u^h_{d,i} = (A_i^h)^{-1}(g^{h}_i - \widetilde{u}_{d,h})\), and \(Z^h_\mathrm{ad}\) is an approximation of \(Z_\mathrm{ad}\) using a finite-element approximation of H.
For piecewise constant approximations of the control z, the original ideas date back to Falk [10], see the recent chapter [2] for a quick reference. For another more recent, comprehensive treatment see [26] (for a posteriori error estimates) as well as the monograph [15, Chap. 3] and the many references therein. In many of these works, as well as in our concrete example in the next section, the set \(Z_\mathrm{ad}\) has the concrete form:
where \(\underline{a}, \overline{a}\) are sufficiently regular, often in \(L^{\infty }(D)\). Therefore, for the sake of argument, we may assume that if D, B, g, \(\underline{a}\), \(\overline{a}\), and \(\widetilde{u}_d\) are sufficiently regular, then there exists a (random variable) \(C_{N} \ge 0\) along with a real \(q \in (0,3/2]\) such that
Here, the dependence on N results from the fact that the typical estimates, see e.g., [2, Thm. 10], depend on constants related to the coefficients of the PDE, the right-hand side g, and \(u_{d}\), which is stochastic. Therefore, in the estimate (20), \(C_{N}\) is related to a realization of a random sample of length N. Using (20), we can apply Theorem 4 and the triangle inequality to obtain the estimate
It should also be noted that for piecewise constant approximations of \(Z^{h}_\mathrm{ad}\), we can only expect \(q = 1\). The case \(q = 3/2\) requires a significant amount of regularity, and \(q \in (1,3/2)\) depends on both the regularity and the type of discretization. In light of Theorem 5, (21) guarantees the convergence of the fully discrete solutions \(z^h_{\mathbb P_N}\) to the original infinite-dimensional solution, provided \(h \downarrow 0\) and \(\mathbb P_{N} \rightarrow \mathbb P\) weakly.
7 Monte Carlo approximation and numerical illustration for PDE-constrained optimization
In this final section, we provide additional discussions on the behavior of \(d_{\mathfrak {F}}(\mathbb P, \mathbb P_N)\) for Monte Carlo approximations \(\mathbb P_N\) of \(\mathbb P\) in order to give the reader a better impression of the potential rate of convergence; in particular, for the setting in the previous section.
7.1 Monte Carlo approximation
For the sake of argument, we assume that the integrands f can be written
where \(\varvec{\xi }: \varOmega \rightarrow \varXi \subset \mathbb R^d\) is a random vector. This is often the case for PDE-models and is used in the example in Sect. 7.2 below.
Let \(\varvec{\xi }^1,\varvec{\xi }^2,\ldots ,\varvec{\xi }^N,\ldots \) be independent identically distributed \(\varXi \)-valued random vectors on some probability space \((\varOmega ,\mathcal {F}, P)\) having the common probability law \(\mathbb P\), i.e., \(\mathbb P= P \circ (\varvec{\xi }^1)^{-1}\). In this context, we define the empirical measures
and the empirical or Monte Carlo approximation of the stochastic program (18) with sample size N, i.e.,
The optimal value \(v(\mathbb P_{N}(\cdot ))\) of (23) is a real random variable and the solution \(z_{\mathbb P_{N}(\cdot )}\) an H-valued random element. It is well known that the sequence \(\left\{ \mathbb P_{N}(\cdot )\right\} \) of empirical measures converges weakly to \(\mathbb P\) P-almost surely (see, e.g., [8, Theorem 11.4.1]). Theorem 5 implies that the class \(\mathfrak {F}\) is a \(\mathbb P\)-uniformity class and, hence, the sequence \(d_{\mathfrak {F}}(\mathbb P_{N}(\cdot ),\mathbb P)\) converges to zero P-almost surely. According to Theorem 4, the sequences \(\left\{ v(\mathbb P_{N}(\cdot ))\right\} \) and \(\left\{ z_{\mathbb P_{N}(\cdot )}\right\} \) of empirical optimal values and solutions converge P-almost surely to the true optimal value \(v(\mathbb P)\) and solution \(z_{\mathbb P}\), respectively.
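To make the almost sure convergence of \(d_{\mathfrak {F}}(\mathbb P_{N}(\cdot ),\mathbb P)\) more concrete, the following minimal sketch evaluates the discrepancy over a small finite family in place of \(\mathfrak {F}\). Everything in it is an illustrative assumption rather than part of the model above: the law is uniform on \([-1,1]\), the family is \(f_k(\xi )=\cos (k\xi )/k\), \(k=1,\dots ,5\) (each bounded by 1 and 1-Lipschitz), and the function names are hypothetical. The exact means \(\sin (k)/k^2\) are available in closed form, so the supremum over the family can be computed directly.

```python
import math
import random

K_MAX = 5  # size of the hypothetical finite family f_k(xi) = cos(k * xi) / k


def exact_mean(k):
    """Exact integral of f_k under the uniform law on [-1, 1]."""
    return math.sin(k) / k**2


def discrepancy(sample):
    """Largest gap between empirical and exact means over the finite family."""
    n = len(sample)
    gaps = []
    for k in range(1, K_MAX + 1):
        empirical = sum(math.cos(k * xi) / k for xi in sample) / n
        gaps.append(abs(empirical - exact_mean(k)))
    return max(gaps)


def run(sizes=(10, 100, 1000, 10000, 100000), seed=0):
    """Return {N: discrepancy} for nested prefixes of one i.i.d. sample."""
    rng = random.Random(seed)
    draws = [rng.uniform(-1.0, 1.0) for _ in range(max(sizes))]
    return {n: discrepancy(draws[:n]) for n in sizes}


if __name__ == "__main__":
    for n, gap in run().items():
        print(f"N = {n:6d}   sup_k |E_N f_k - E f_k| = {gap:.5f}")
```

A finite family only gives a lower bound on the distance over a full uniformity class, so the printed values should be read as an indication of the trend, not as the distance itself.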
In order to obtain rates of convergence for the sequences of empirical optimal values and solutions, we consider their mean or mean square distance and conclude from Theorem 4 and Corollary 1 the estimates
where L is the constant appearing in Corollary 1 and \(\hat{L}\) is given by \(\hat{L}=2L\sqrt{\frac{2}{\alpha }}\). Unfortunately, rates for the convergence in mean of \(d_{\mathfrak {F}}(\mathbb P_{N}(\cdot ),\mathbb P)\) are not known, but for \(\mathfrak {F}\) containing uniformly bounded and equi-Lipschitz continuous functions, the rate coincides essentially with that of \(\mathbb E[\zeta _{1}(\mathbb P_{N},\mathbb P)]\). For the latter it follows from [6, Theorem 1] that the estimate
is valid if \(d\ge 3\) and \(s>\frac{d}{d-1}\), where \(\kappa _{s}\) is a constant depending only on s, provided that the sth absolute moment
is finite. It is argued in [11] that the rate (24) is sharp if \(\mathbb P\) is the uniform distribution on the cube \([-1,1]^{d}\). For the numerical experiments in the next subsection, we observe better rates than those guaranteed by the probability metrics. This is a limitation of the probability metrics themselves, not of their application to the problem at hand.
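As a rough worked illustration of what a rate of this type means in practice, suppose the mean distance is bounded by \(C N^{-1/d}\), which is the order given by the cited result for \(d\ge 3\). The constant \(C\) and the helper names below are hypothetical. The sketch computes the sample size needed to push such a bound below a tolerance and the factor by which N must grow to halve it.

```python
import math


def samples_needed(tol, dim, const=1.0):
    """Smallest N with const * N**(-1/dim) <= tol (assumed bound, dim >= 3)."""
    return math.ceil((const / tol) ** dim)


def growth_to_halve(dim):
    """Factor by which N must grow to halve a bound of order N**(-1/dim)."""
    return 2 ** dim


if __name__ == "__main__":
    for dim in (3, 4, 6):
        print(dim, samples_needed(0.25, dim), growth_to_halve(dim))
```

For a four-dimensional random vector, such a bound would require sixteen times as many samples to halve the error, which is considerably more pessimistic than a rate of order \(N^{-1/2}\).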
7.2 Numerical illustration
The previous discussion provides useful upper bounds for the case of the empirical measure \(\mathbb P_N\). However, these bounds are based on estimates of the Wasserstein metrics, not the minimal information metric. Therefore, in order to obtain a better impression of the possible rate of convergence associated with \(d_{\mathfrak {F}}(\mathbb P, \mathbb P_N)\) in practice, we provide here a concrete numerical example.
We propose a model problem, which is based on [16, Ex. 6.1, Ex. 6.2]. Let \(\alpha = 10^{-3}\), \(D = (0,1)\), \(\widetilde{u}(x) = \sin (50x/\pi )\), and consider the optimal control problem
where \(z \in Z_\mathrm{ad}\) with
and \(u=u(z)\in L^{\infty }(\varOmega , \mathcal {F}, \mathbb P;H^1(D))\) solves the weak form of
Furthermore, we suppose that
with random variables \(\xi _i :\varOmega \rightarrow \mathbb R\), \(i=1,2,3,4\), such that the supports of \(\xi _i\), \(i=1,2,3\), are \([-1,1]\) and the support of \(\xi _4\) is [1, 3]. For the sake of illustration, we assume that each of these random variables is uniformly distributed and, after the usual change of variables, we consider instead
with \(\varXi = [-1,1]\times [-1,1] \times [-1,1] \times [1,3]\), endowed with the associated uniform density. We define \(\varvec{\xi }:= (\xi _1,\dots ,\xi _4) \in \varXi \).
Finally, we note that the linearity of the differential equation allows us to use the superposition principle to separate the unique, z-dependent solution u(z) into the sum of random fields as \(u(z) = S(z) + \widehat{u}\), where S(z) maps z from H into the solution space and \(\widehat{u}\) is a fixed random field. With the aim of showing that the necessary continuity properties used in the previous section are fulfilled, let \(\widehat{u}:D\times \varXi \rightarrow \mathbb R\) denote a function such that \(\widehat{u}(\cdot ,\xi )\) belongs to \(H^1(D)\) and \(\widehat{u}\) satisfies the inhomogeneous random boundary condition in (27b). If we then recast (27) as the random elliptic PDE with random boundary conditions
and determine \(u\in H^1_0(D)\) such that
then \(u+\widehat{u}\) solves the random boundary value problem (28) of the random elliptic PDE. Based on the model assumptions, we see that both A and the altered right-hand side \(g + A \widehat{u}\) satisfy the necessary conditions for our theory.
In order to illustrate the sensitivities for this model problem, we generate a sample on \(\varXi \) of size M and replace \(\mathbb P\) by the corresponding empirical measure \(\mathbb P_{M}\). To avoid confusion, we leave off the M subscript in the following discussion. The underlying spaces for the control, state, and adjoint variables are discretized using standard piecewise linear finite elements on a uniform mesh with parameter \(h = 1/(2^8-1)\).
The resulting finite-dimensional deterministic optimization problems are solved by a semismooth Newton method as proposed in [14, 24, 25], which in the current context is mesh-independent. The forward and adjoint equations as well as the linear equations for the Hessian-vector products are solved directly and the reduced system for the semismooth Newton step is calculated using conjugate gradients as some of the associated operators are only implicitly given.
The algorithm terminates once the discrete \(\ell ^2\)-norm of the residual of the optimality system reaches a tolerance of \(10^{-8}\). On average, the semismooth Newton algorithm required between three and four iterations with approximately 13 to 15 inner iterations for the CG solver. The solution \(z_{\mathbb P}\) of this problem is treated as the “true” solution. See Fig. 1 for the solution \(z_{\mathbb P}\) with \(M = 100\) and its behavior out of sample on the mean value of the state u. As expected, the control behaves well on average.
Fig. 1 (Left) Optimal solution \(z_{\mathbb P}\) of (25) for \(M = 100\), \(h = 1/(2^{10}-1)\). (Right) Empirical mean value of \(u(z_{\mathbb P})\) out of sample using 1000 iid random realizations of \((\xi _1,\dots ,\xi _4)\)
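For bound-constrained quadratic problems, the semismooth Newton method mentioned above can be viewed as a primal-dual active set strategy [14]. The following minimal sketch applies that strategy to a small finite-dimensional surrogate, not to the discretized PDE-constrained problem of this section: the matrix \(Q=\mathrm{tridiag}(-1,2+\alpha ,-1)\), the right-hand side, the bounds, and all function names are assumptions chosen for illustration.

```python
def apply_q(z, alpha):
    """Multiply by the assumed matrix Q = tridiag(-1, 2 + alpha, -1)."""
    n = len(z)
    out = []
    for i in range(n):
        left = z[i - 1] if i > 0 else 0.0
        right = z[i + 1] if i < n - 1 else 0.0
        out.append((2.0 + alpha) * z[i] - left - right)
    return out


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] and upper[i] multiply x[i-1] and x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def active_set_newton(rhs, alpha, lo, hi, max_iter=50):
    """Minimize 0.5 z'Qz - rhs'z subject to lo <= z_i <= hi."""
    n = len(rhs)
    z, mu = [0.0] * n, [0.0] * n
    previous = None
    for iteration in range(max_iter):
        # Predict the active sets from the current primal-dual pair (z, mu).
        at_hi = [mu[i] + (z[i] - hi) > 0.0 for i in range(n)]
        at_lo = [mu[i] + (z[i] - lo) < 0.0 for i in range(n)]
        if (at_hi, at_lo) == previous:
            return z, iteration  # active sets repeated: KKT point found
        previous = (at_hi, at_lo)
        # Active rows pin z_i to its bound; inactive rows keep (Qz)_i = rhs_i.
        rows = []
        for i in range(n):
            if at_hi[i] or at_lo[i]:
                rows.append((0.0, 1.0, 0.0, hi if at_hi[i] else lo))
            else:
                rows.append((-1.0, 2.0 + alpha, -1.0, rhs[i]))
        lower, diag, upper, b = (list(col) for col in zip(*rows))
        z = solve_tridiagonal(lower, diag, upper, b)
        # Multiplier from the stationarity condition Qz - rhs + mu = 0.
        mu = [r - q for r, q in zip(rhs, apply_q(z, alpha))]
    return z, max_iter


if __name__ == "__main__":
    z, iterations = active_set_newton([1.0] * 20, alpha=0.1, lo=-1.0, hi=1.0)
    print(iterations, [round(v, 3) for v in z])
```

The sketch stops as soon as the predicted active sets repeat, which is the finite-dimensional analogue of the optimality residual vanishing. It uses a direct tridiagonal solve in place of the conjugate gradient iterations described in the text.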
We now investigate \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P)\), where \(\mathbb P_{N}\) is an empirical probability measure based on a random sample of size N (\(1 \le N \le M\)) of the M realizations \(\varvec{\xi }^1,\dots , \varvec{\xi }^{M}\). Since a direct calculation of \(d_{\mathfrak {F}}(\mathbb P_{N},\mathbb P)\) for even simple examples can be quite challenging, we calculate instead \(v(\mathbb P_N)\) and \(z_{\mathbb P_{N}}\) and compute the associated errors
The experiment is repeated 100 times for each \(N =1,\dots , M\). The plots of these errors can be seen in Fig. 2.
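The structure of this experiment can be sketched on a scalar toy problem whose sample average approximation has a closed-form solution. All modelling choices below are assumptions for illustration and differ from the PDE-constrained example: the integrand is \(f(z,\xi )=\tfrac{1}{2}(z-\xi )^2+\tfrac{\alpha }{2}z^2\) with unconstrained \(z\in \mathbb R\), subsamples are drawn without replacement, and the function names are hypothetical.

```python
import random


def saa_solution(sample, alpha):
    """Minimizer and optimal value of the sample average of f(z, xi)."""
    n = len(sample)
    m1 = sum(sample) / n                  # empirical mean of xi
    m2 = sum(x * x for x in sample) / n   # empirical second moment of xi
    z = m1 / (1.0 + alpha)
    value = 0.5 * m2 - 0.5 * m1 * m1 / (1.0 + alpha)
    return z, value


def subsample_errors(pool, alpha, repeats=100, seed=0):
    """Mean errors in optimal value and solution for each subsample size N."""
    rng = random.Random(seed)
    z_ref, v_ref = saa_solution(pool, alpha)  # treated as the "true" solution
    errors = {}
    for n in range(1, len(pool) + 1):
        err_v = err_z = 0.0
        for _ in range(repeats):
            z_n, v_n = saa_solution(rng.sample(pool, n), alpha)
            err_v += abs(v_n - v_ref)
            err_z += abs(z_n - z_ref)
        errors[n] = (err_v / repeats, err_z / repeats)
    return errors


if __name__ == "__main__":
    rng = random.Random(1)
    pool = [rng.uniform(-1.0, 1.0) for _ in range(100)]  # M = 100 realizations
    errors = subsample_errors(pool, alpha=1e-3)
    for n in (1, 5, 25, 100):
        print(n, errors[n])
```

Because the reference solution is computed from the same pool of M realizations, the errors vanish at \(N=M\) by construction, which is one reason to restrict a rate estimate to small N.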
The choice of \(M =100\) was made for computational expediency. Though the semismooth Newton method implemented for this example works exceptionally well, it still requires hundreds if not thousands of PDE solves for each N. Nevertheless, due to the relatively small choice of M, we cannot use all the data points in Fig. 2 to estimate a rate of convergence. As a compromise, we only use the data points for \(N =1,\dots , 25\). This provides us with the estimates
The estimates were generated by solving a linear least-squares regression problem with a ridge (\(\ell ^2\)) regularization term. Whereas the convergence of the optimal values would indicate a rate of roughly 1/2, the rate of convergence for the optimal solutions would be a marked improvement over the theoretical estimates based on probability metrics. One explanation for this would be that the integrands for this specific model problem have much better smoothness properties than assumed. We leave a rigorous analysis of this question for future research.
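A rate estimate of this kind can be obtained by fitting a line to the errors in log-log coordinates. The sketch below is one possible reading of that procedure, under assumptions not specified in the text: the model is \(\log e_N \approx a + b\log N\), both coefficients are penalized, and the regularization weight and function name are hypothetical. It uses synthetic errors, not the data of Fig. 2.

```python
import math


def fit_rate(sizes, errors, ridge=1e-8):
    """Ridge least-squares fit of log(error) = a + b * log(N); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    # Normal equations (X'X + ridge * I) beta = X'y for the 2x2 system.
    s00 = len(xs) + ridge
    s01 = sum(xs)
    s11 = sum(x * x for x in xs) + ridge
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s00 * s11 - s01 * s01
    a = (s11 * t0 - s01 * t1) / det
    b = (s00 * t1 - s01 * t0) / det
    return a, b


if __name__ == "__main__":
    sizes = list(range(1, 26))                      # N = 1, ..., 25
    synthetic = [0.3 * n ** -0.5 for n in sizes]    # errors decaying like N^(-1/2)
    a, b = fit_rate(sizes, synthetic)
    print(f"estimated rate: {-b:.3f}, constant: {math.exp(a):.3f}")
```

The negative of the fitted slope b plays the role of the estimated convergence rate. A larger ridge weight would bias both coefficients toward zero, so the weight should be kept small relative to the number of data points.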
8 Conclusion
We have shown that a number of known results for stability of stochastic programs with finite-dimensional decision spaces can be carried over to infinite dimensions, provided certain convexity conditions are satisfied. As perhaps expected, the best possible growth rates for the convergence of solutions using probability metrics are of Hölder-type. Our analysis gives rise to a number of possible future directions and open questions. For example, we consider a setting that is primarily related to risk-neutral problems, whereas problems using typically non-smooth risk measures in the objective are of significant interest for robust engineering design. In addition, the assumptions on the objective and linear operator \(\varSigma \) are essential for the qualitative stability analysis as the infinite-dimensional setting often requires us to make use of the weak topology. Without such regularity properties, it is unclear how to proceed in general. Finally, in order to develop adaptive numerical optimization methods based on estimates of the type (21), we need to investigate more closely the convergence of \(d_{\mathfrak {F}}(\mathbb P_N,\mathbb P)\) for the class of functions \(\mathfrak {F}\) used in Sect. 6 and specific approximations \(\mathbb P_N\). Section 7 provides some deeper insight into this question; however, the picture is far from complete.
References
1. Alt, H.W.: Linear functional analysis: an application-oriented introduction. Universitext. Springer, London (2016). Translated from the German edition by Robert Nürnberg. https://doi.org/10.1007/978-1-4471-7280-2
2. Antil, H., Leykekhman, D.: A brief introduction to PDE-constrained optimization. In: Frontiers in PDE-constrained optimization. IMA Vol. Math. Appl., vol. 163, pp. 3–40. Springer, New York (2018)
3. Attouch, H., Buttazzo, G., Michaille, G.: Variational analysis in Sobolev and BV spaces. MPS/SIAM Series on Optimization, vol. 6. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2006)
4. Bachmayr, M., Cohen, A., Migliorati, G.: Sparse polynomial approximation of parametric elliptic PDEs. Part I: affine coefficients. ESAIM: M2AN 51(1), 321–339 (2017). https://doi.org/10.1051/m2an/2016045
5. Cohen, A., DeVore, R., Schwab, C.: Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE’s. Anal. Appl. (Singapore) 9(1), 11–47 (2011). https://doi.org/10.1142/S0219530511001728
6. Dereich, S., Scheutzow, M., Schottstedt, R.: Constructive quantization: approximation by empirical measures. Ann. Inst. Henri Poincaré Probab. Stat. 49(4), 1183–1203 (2013). https://doi.org/10.1214/12-AIHP489
7. Dontchev, A.L., Rockafellar, R.T.: Implicit functions and solution mappings: a view from variational analysis. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2014)
8. Dudley, R.M.: Real analysis and probability. The Wadsworth & Brooks/Cole Mathematics Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA (1989)
9. Dunford, N., Schwartz, J.T.: Linear operators. Part I: General theory. Wiley Classics Library. With the assistance of William G. Bade and Robert G. Bartle. Reprint of the 1958 original. Wiley, New York (1988)
10. Falk, R.S.: Approximation of a class of optimal control problems with order of convergence estimates. J. Math. Anal. Appl. 44, 28–47 (1973). https://doi.org/10.1016/0022-247X(73)90022-X
11. Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Related Fields 162(3–4), 707–738 (2015). https://doi.org/10.1007/s00440-014-0583-7
12. Gajewski, H., Gröger, K., Zacharias, K.: Nichtlineare Operatorgleichungen und Operatordifferentialgleichungen. Mathematische Lehrbücher und Monographien, II. Abteilung, Mathematische Monographien, Band 38. Akademie-Verlag, Berlin (1974)
13. Hille, E., Phillips, R.S.: Functional analysis and semi-groups, rev. edn. American Mathematical Society Colloquium Publications, vol. 31. American Mathematical Society, Providence (1957)
14. Hintermüller, M., Ito, K., Kunisch, K.: The primal-dual active set strategy as a semismooth Newton method. SIAM J. Optim. 13(3), 865–888 (2002)
15. Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE constraints. Mathematical Modelling: Theory and Applications, vol. 23. Springer, New York (2009)
16. Kouri, D.P., Surowiec, T.M.: Risk-averse PDE-constrained optimization using the conditional value-at-risk. SIAM J. Optim. 26(1), 365–396 (2016). https://doi.org/10.1137/140954556
17. Kouri, D.P., Surowiec, T.M.: Existence and optimality conditions for risk-averse PDE-constrained optimization. SIAM/ASA J. Uncertain. Quantif. 6(2), 787–815 (2018). https://doi.org/10.1137/16M1086613
18. Müller, A.: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)
19. Rachev, S.T.: Probability metrics and the stability of stochastic models. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, Chichester (1991)
20. Rachev, S.T., Römisch, W.: Quantitative stability in stochastic programming: the method of probability metrics. Math. Oper. Res. 27(4), 792–818 (2002). https://doi.org/10.1287/moor.27.4.792.304
21. Ramsay, J.O., Silverman, B.W.: Functional data analysis. Springer Series in Statistics, 2nd edn. Springer, New York (2005)
22. Römisch, W.: Stability of stochastic programming problems. In: Stochastic programming, Handbooks Oper. Res. Management Sci., vol. 10, pp. 483–554. Elsevier, Amsterdam (2003). https://doi.org/10.1016/S0927-0507(03)10008-4
23. Topsøe, F.: On the connection between P-continuity and P-uniformity in weak convergence. Theory Probab. Appl. 12, 281–290 (1967)
24. Ulbrich, M.: Semismooth Newton methods for operator equations in function spaces. SIAM J. Optim. 13(3), 805–841 (2002)
25. Ulbrich, M.: Semismooth Newton methods for variational inequalities and constrained optimization problems in function spaces. MOS-SIAM Series on Optimization, vol. 11. SIAM, MOS, Philadelphia, PA (2011). https://doi.org/10.1137/1.9781611970692
26. Vexler, B., Wollner, W.: Adaptive finite elements for elliptic optimization problems with control constraints. SIAM J. Control Optim. 47(1), 509–534 (2008). https://doi.org/10.1137/070683416
27. Zolotarev, V.M.: Probability metrics. Theory Probab. Appl. 28(2), 278–302 (1983)
Acknowledgements
TMS’s research was sponsored by the DFG Grants no. SU 963/1-1 and SU 963/2-1. The authors wish to express their gratitude to the two anonymous referees for their helpful and constructive comments.
Funding
Open Access funding enabled and organized by Projekt DEAL.
A Results from fixed point theory
The following result can be found in the monograph [12].
We provide a translation here along with a short proof for the reader’s convenience.
Proposition 2
(Lemma 3.1 in [12]) Let V be a Hilbert space with inner product \((\cdot ,\cdot )\), dual \(V^{\star }\) and dual pairing \(\langle \cdot ,\cdot \rangle \), \(b\in V^{\star }\), \(A:V\rightarrow V^{\star }\) a strongly monotone (with constant \(\gamma >0\)) and Lipschitz continuous (with modulus \(L>0\)) operator, and \(J:V\rightarrow V^{\star }\) the duality mapping, i.e., \(\langle Ju,v\rangle =(u,v)\), \(\forall u,v\in V\).
Then the mapping \(K_{t}:V\rightarrow V\) given by \(K_{t}u=u-tJ^{-1}(Au-b)\)
is a contraction with constant \(0<\kappa (t)<1\), where \(\kappa (t)=\sqrt{1-2\gamma t+L^2t^2}\)
and \(t\in (0,\frac{2\gamma }{L^2})\). Moreover, the unique fixed point of \(K_t\) is the unique solution of \(Ax=b\) and belongs to the ball around zero with radius \(r=(1-\kappa (t))^{-1}\Vert K_{t}0-0\Vert =t(1-\kappa (t))^{-1}\Vert A0-b\Vert _{\star }\).
Remark 2
Note that \(\min \kappa (t)=L^{-1}\sqrt{L^2-\gamma ^2}\) and, hence, \(\kappa (t)\) is typically close to 1.
Proof
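Let \(x, x'\in V\). Then \(\Vert K_{t}x-K_{t}x'\Vert ^2=\Vert x-x'\Vert ^2-2t\langle Ax-Ax',x-x'\rangle +t^2\Vert Ax-Ax'\Vert _{\star }^2\le (1-2\gamma t+L^2t^2)\Vert x-x'\Vert ^2=\kappa (t)^2\Vert x-x'\Vert ^2\).
Clearly, \(0<\kappa (t)<1\) iff \(t\in (0,\frac{2\gamma }{L^2})\). Furthermore, the unique solution \(\bar{x}(b)\) of \(Ax=b\) satisfies \(\Vert \bar{x}(b)\Vert =\Vert K_{t}\bar{x}(b)\Vert \le \Vert K_{t}\bar{x}(b)-K_{t}0\Vert +\Vert K_{t}0\Vert \le \kappa (t)\Vert \bar{x}(b)\Vert +\Vert K_{t}0\Vert \),
from which we immediately obtain the estimate \(\Vert \bar{x}(b)\Vert \le (1-\kappa (t))^{-1}\Vert K_{t}0\Vert =t(1-\kappa (t))^{-1}\Vert A0-b\Vert _{\star }\).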
This finishes the proof. \(\square \)
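The contraction argument is easy to check numerically in finite dimensions. The following sketch is an illustration under simplifying assumptions that are not part of the proposition: \(V=\mathbb R^2\) with the Euclidean inner product (so that J is the identity), A is a symmetric positive definite matrix with \(\gamma \) and L its smallest and largest eigenvalues, and the step is \(t=\gamma /L^2\). The matrix entries, the right-hand side, and the function names are hypothetical.

```python
import math

A = ((2.0, 1.0), (1.0, 3.0))  # assumed symmetric positive definite operator
B = (1.0, -1.0)               # assumed right-hand side b

# Eigenvalues of the symmetric 2x2 matrix give gamma (smallest) and L (largest).
_mean = 0.5 * (A[0][0] + A[1][1])
_dev = math.hypot(0.5 * (A[0][0] - A[1][1]), A[0][1])
GAMMA, LIP = _mean - _dev, _mean + _dev

T = GAMMA / LIP**2  # step in (0, 2*gamma/L^2) that minimizes kappa(t)
KAPPA = math.sqrt(1.0 - 2.0 * GAMMA * T + (LIP * T) ** 2)


def norm(u):
    """Euclidean norm on R^2."""
    return math.hypot(u[0], u[1])


def k_map(u):
    """K_t u = u - t (A u - b), with J taken to be the identity."""
    au = (A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1])
    return (u[0] - T * (au[0] - B[0]), u[1] - T * (au[1] - B[1]))


def fixed_point(start=(0.0, 0.0), iterations=500):
    """Approximate the fixed point of K_t by repeated application."""
    u = start
    for _ in range(iterations):
        u = k_map(u)
    return u


if __name__ == "__main__":
    x_bar = fixed_point()
    radius = T * norm(B) / (1.0 - KAPPA)  # r = t ||A0 - b|| / (1 - kappa(t))
    print(f"kappa(t) = {KAPPA:.4f}, ||x_bar|| = {norm(x_bar):.4f}, r = {radius:.4f}")
```

For this matrix the constant \(\kappa (t)\) is above 0.9 even at the minimizing step, in line with Remark 2, so the iteration is best seen as a proof device, not as a practical solver.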
The following result can be found, e.g., in [7].
Proposition 3
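(Theorem 1A.4 in [7]) Let P be a metric space with metric \(\rho \) and X a complete metric space with metric d. Let \(F: P\times X\rightarrow X\) and assume that there exist \(\alpha \in (0,1)\) and \(\lambda >0\) such that \(d(F(p,x),F(p,x'))\le \alpha \, d(x,x')\) and \(d(F(p,x),F(p',x))\le \lambda \, \rho (p,p')\) for all \(p,p'\in P\) and \(x,x'\in X\).
Then, for each \(p\in P\), there exists a unique fixed point x(p) of \(F(p,\cdot )\) in X and we have the estimate \(d(x(p),x(p'))\le \frac{\lambda }{1-\alpha }\rho (p,p')\) for all \(p,p'\in P\).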
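A scalar example illustrates how such an estimate is used. The sketch assumes the usual form of the parametric contraction principle, namely that a map which is an \(\alpha \)-contraction in x and \(\lambda \)-Lipschitz in p has fixed points satisfying \(d(x(p),x(p'))\le \frac{\lambda }{1-\alpha }\rho (p,p')\). The map \(F(p,x)=\tfrac{1}{2}\cos x+p\) on \(X=P=\mathbb R\) (so that \(\alpha =\tfrac{1}{2}\) and \(\lambda =1\)) and the function names are hypothetical.

```python
import math

ALPHA = 0.5   # contraction constant of x -> F(p, x)
LAMBDA = 1.0  # Lipschitz constant of p -> F(p, x)


def f_map(p, x):
    """Assumed parametric map F(p, x) = 0.5 * cos(x) + p on the real line."""
    return ALPHA * math.cos(x) + p


def fixed_point(p, iterations=200):
    """Approximate the unique fixed point x(p) by Banach iteration."""
    x = 0.0
    for _ in range(iterations):
        x = f_map(p, x)
    return x


def lipschitz_ratio(p, q):
    """|x(p) - x(q)| / |p - q|, to be compared with LAMBDA / (1 - ALPHA)."""
    return abs(fixed_point(p) - fixed_point(q)) / abs(p - q)


if __name__ == "__main__":
    for p, q in [(0.0, 0.1), (1.0, 1.5), (-2.0, 3.0)]:
        print(p, q, lipschitz_ratio(p, q), "<=", LAMBDA / (1.0 - ALPHA))
```

In the proof of Lemma 2 the parameter p is replaced by \(\omega \) and the map by \(K_t(\omega )\), so the same ratio bounds the change of the solution of the operator equation under a change of \(\omega \).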
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hoffhues, M., Römisch, W. & Surowiec, T.M. On quantitative stability in infinite-dimensional optimization under uncertainty. Optim Lett 15, 2733–2756 (2021). https://doi.org/10.1007/s11590-021-01707-2
DOI: https://doi.org/10.1007/s11590-021-01707-2
Keywords
- Stability
- Stochastic programming
- Optimization under uncertainty
- Probability metrics
- PDE-constrained optimization
- Functional data analysis