Sample Average Approximations of Strongly Convex Stochastic Programs in Hilbert Spaces

We analyze the tail behavior of solutions to sample average approximations (SAAs) of stochastic programs posed in Hilbert spaces. We require that the integrand be strongly convex with the same convexity parameter for each realization. Combined with a standard condition from the literature on stochastic programming, we establish non-asymptotic exponential tail bounds for the distance between the SAA solutions and the stochastic program's solution, without assuming compactness of the feasible set. Our assumptions are verified on a class of infinite-dimensional optimization problems governed by affine-linear partial differential equations with random inputs. We present numerical results illustrating our theoretical findings.


Introduction
We apply the sample average approximation (SAA) to a class of strongly convex stochastic programs posed in Hilbert spaces, and study the tail behavior of the distance between SAA solutions and their true counterparts. Our work sheds light on the number of samples needed to reliably estimate solutions to infinite-dimensional, linear-quadratic optimal control problems governed by affine-linear partial differential equations (PDEs) with random inputs, a class of optimization problems that has received much attention recently [30,40,42]. Our analysis requires that the integrand be strongly convex with the same convexity parameter for each realization of the random element. This assumption is fulfilled for convex optimal control problems with a strongly convex control regularizer, such as those considered in [30,40,42]. Throughout the paper, a function f : H → R ∪ {∞} is α-strongly convex with parameter α > 0 if f(·) − (α/2)‖·‖_H^2 is convex, where H is a real Hilbert space with norm ‖·‖_H. Moreover, a function on a real Hilbert space is strongly convex if it is α-strongly convex with some parameter α > 0.
We consider the potentially infinite-dimensional stochastic program

    min_{u ∈ U} { f(u) = E[J(u, ξ)] + Ψ(u) },    (1)

where U is a real, separable Hilbert space, Ψ : U → R ∪ {∞} is proper, lower-semicontinuous and convex, and J : U × Ξ → R is the integrand. Moreover, ξ is a random element mapping from a probability space to a complete, separable metric space Ξ equipped with its Borel σ-field. We also use ξ ∈ Ξ to represent a deterministic element. Let ξ^1, ξ^2, . . . be independent identically distributed Ξ-valued random elements defined on a complete probability space (Ω, F, P) such that each ξ^i has the same distribution as that of ξ. The SAA problem corresponding to (1) is

    min_{u ∈ U} { f_N(u) = (1/N) Σ_{i=1}^N J(u, ξ^i) + Ψ(u) }.    (2)

We define F : U → R ∪ {∞} and the sample average function F_N : U → R by

    F(u) = E[J(u, ξ)] and F_N(u) = (1/N) Σ_{i=1}^N J(u, ξ^i).    (3)

Since we assume that the random elements ξ^1, ξ^2, . . . are defined on the common probability space (Ω, F, P), we can view the functions f_N and F_N as defined on U × Ω and the solution u*_N to (2) as a mapping from Ω to U. The second argument of f_N and of F_N is often dropped.
Let u* be a solution to (1) and u*_N be a solution to (2). We assume that J(·, ξ) is α-strongly convex with parameter α > 0 for each ξ ∈ Ξ. Furthermore, we assume that F(·) and J(·, ξ) for all ξ ∈ Ξ are Gâteaux differentiable. Under these assumptions, we establish the error estimate

    ‖u* − u*_N‖_U ≤ (1/α) ‖∇F_N(u*) − ∇F(u*)‖_U,    (4)

valid with probability one. If ‖∇_u J(u*, ξ)‖_U is integrable, then ∇F_N(u*) is just the empirical mean of ∇F(u*) since F(·) and J(·, ξ) for all ξ ∈ Ξ are convex and Gâteaux differentiable at u*; see Lemma 3. Hence we can analyze the mean square error E[‖u* − u*_N‖_U^2]. If there exists σ > 0 with

    E[‖∇_u J(u*, ξ) − ∇F(u*)‖_U^2] ≤ σ^2,    (5)

then E[‖∇F_N(u*) − ∇F(u*)‖_U^2] ≤ σ^2/N, yielding with (4) the bound

    E[‖u* − u*_N‖_U^2] ≤ σ^2/(α^2 N).    (6)

To derive exponential tail bounds on ‖u* − u*_N‖_U, we further assume the existence of τ > 0 with

    E[exp(τ^{-2} ‖∇_u J(u*, ξ) − ∇F(u*)‖_U^2)] ≤ e.    (7)

This condition and its variants are used, for example, in [15,25,44]. Using Jensen's inequality, we find that (7) implies (5) with σ^2 = τ^2 [44, p. 1584]. Combining (4) and (7) with the exponential moment inequality proven in [48, Thm. 3], we establish the exponential tail bound, our main contribution,

    Prob(‖u* − u*_N‖_U ≥ ε) ≤ 2 exp(−τ^{-2} N ε^2 α^2/3).    (8)

This bound depends solely on the characteristics of J and not on properties of the feasible set, { u ∈ U : Ψ(u) < ∞ }, other than its convexity. For each δ ∈ (0, 1), the exponential tail bound yields, with a probability of at least 1 − δ,

    ‖u* − u*_N‖_U < (τ/α) (3 ln(2/δ)/N)^{1/2}.    (9)

In particular, if ε > 0 and N ≥ (3τ^2/(α^2 ε^2)) ln(2/δ), then ‖u* − u*_N‖_U < ε with a probability of at least 1 − δ, that is, u* can be estimated reliably via u*_N. Requiring J(·, ξ) to be α-strongly convex for each ξ ∈ Ξ is a restrictive assumption. However, it is fulfilled for the following class of stochastic programs:

    min_{u ∈ U} { E[(1/2)‖K(ξ)u + h(ξ)‖_H^2] + (α/2)‖u‖_U^2 + Ψ(u) },    (10)

where α > 0, H and U are real Hilbert spaces, and K(ξ) : U → H is a bounded, linear operator and h(ξ) ∈ H for each ξ ∈ Ξ. The control problems governed by affine-linear PDEs with random inputs considered, for example, in [21,22,30,41,42] can be formulated as instances of (10). In many of these works, the operator K(ξ) is compact for each ξ ∈ Ξ, the expectation function is twice continuously differentiable, and U is infinite-dimensional.
In this case, the function F_1 generally lacks strong convexity. This may suggest that the α-strong convexity of the objective function of (10) is solely implied by the function (α/2)‖·‖_U^2 + Ψ(·). The lack of the expectation function's strong convexity is essentially known [6, p. 3]. For example, if the set Ξ has finite cardinality, then the Hessian ∇^2 F_1(0) is the finite sum of compact operators and hence F_1 lacks strong convexity; see Sect. 6.
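The sample size condition from the introduction can be turned into a small calculator. The following is a minimal sketch, assuming the condition reads N ≥ (3τ²/(α²ε²)) ln(2/δ) and that the corresponding tail bound has the form 2 exp(−Nε²α²/(3τ²)); the function names and the numerical values are hypothetical and chosen for illustration only.

```python
import math


def saa_sample_size(tau, alpha, eps, delta):
    """Smallest integer N with N >= 3*tau^2/(alpha^2*eps^2) * ln(2/delta)."""
    if min(tau, alpha, eps) <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need tau, alpha, eps > 0 and delta in (0, 1)")
    return math.ceil(3.0 * tau**2 / (alpha**2 * eps**2) * math.log(2.0 / delta))


def saa_error_radius(tau, alpha, n_samples, delta):
    """Radius (tau/alpha) * sqrt(3*ln(2/delta)/N) of the confidence ball around u*."""
    return (tau / alpha) * math.sqrt(3.0 * math.log(2.0 / delta) / n_samples)


def saa_tail_bound(tau, alpha, eps, n_samples):
    """Assumed tail bound 2*exp(-N*eps^2*alpha^2/(3*tau^2)), capped at one."""
    return min(1.0, 2.0 * math.exp(-n_samples * eps**2 * alpha**2 / (3.0 * tau**2)))


# Hypothetical parameters: tau = alpha = 1, accuracy 0.1, confidence 95 percent.
n_required = saa_sample_size(1.0, 1.0, 0.1, 0.05)
```

Note that the required sample size grows like 1/ε² and only logarithmically in 1/δ, and that no dimension of U enters the computation.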
A common notion used to analyze the SAA solutions is that of an ε-optimal solution [54,56,57]. We instead study the tail behavior of ‖u* − u*_N‖_U since in the literature on PDE-constrained optimization the focus is on studying the proximity of approximate solutions to the "true" ones. For example, when analyzing finite element approximations of PDE-constrained problems, bounds on the error ‖w* − w*_h‖_U as functions of the discretization parameters h are often established [28,60], where w* is the solution to a control problem and w*_h is the solution to its finite element approximation. The estimate (4) is similar to that established in [28, p. 49] for the variational discretization (a finite element approximation) of a deterministic, linear-quadratic control problem. Since both the variational discretization and the SAA approach yield perturbed optimization problems, it is unsurprising that similar techniques can be used for some parts of the perturbation analysis.
The SAA approach has been analyzed thoroughly, for example, in [4,7,50,54,57,56]. Some consistency results for the SAA solutions and finite-sample size estimates require the compactness and total boundedness of the feasible set, respectively. However, in the literature on PDE-constrained optimization, the feasible sets are commonly noncompact; see, e.g., [29, Sect. 1.7.2.3]. Assuming that the function F defined in (3) is α-strongly convex with α > 0, Kouri and Shapiro [35, eq. (42)] establish the estimate (11). The setting in [35] corresponds to Ψ being the indicator function of a closed, convex, nonempty subset of U. In contrast to the estimate (4), the right-hand side in (11) depends on the random control u*_N. This dependence implies that the right-hand side in (11) is more difficult to analyze than that in (4). However, the convexity assumption on F made in [35] is weaker than ours, which requires that the function J(·, ξ) be α-strongly convex for all ξ ∈ Ξ. The right-hand side in (11) may be analyzed using the approaches developed in [53, Sects. 2 and 4].
For finite-dimensional optimization problems, the number of samples required to obtain ε-optimal solutions via the SAA approach can explicitly depend on the problem's dimension [1], [55, Ex. 1], [25, Prop. 2]. Guigues, Juditsky, and Nemirovski [25] demonstrate that confidence bounds on the optimal value of stochastic, convex, finite-dimensional programs, constructed via SAA optimal values, do not explicitly depend on the problem's dimension. This property is shared by our exponential tail bound.
After the initial version of the manuscript was submitted, we became aware of the papers [61,52] where assumptions similar to those used to derive (6) and (8) are utilized to analyze the reliability of SAA solutions. For unconstrained minimization in R^n with Ψ = 0, tail bounds for ‖u* − u*_N‖_2 are established in [61] under the assumption that J(·, ξ) is α-strongly convex for all ξ ∈ Ξ and some α > 0. Here, ‖·‖_2 is the Euclidean norm on R^n. Assuming further that ‖∇_u J(u*, ξ)‖_2 is essentially bounded by L > 0, the author establishes the tail bound (12) if ε ∈ (0, L/α], and the right-hand side in (12) is zero otherwise [61, Cor. 2]. While (12) is similar to (8) with τ = L, its derivation exploits the essential boundedness of ‖∇_u J(u*, ξ)‖_2, which is generally more restrictive than (7). The author establishes further tail bounds for ‖u* − u*_N‖_2 under different sets of assumptions on J(·, ξ), and provides exponential tail bounds for f(u*_N) − f(u*) assuming that J(·, ξ) is Lipschitz continuous with a Lipschitz constant independent of ξ (see [61, Thm. 5]). For the possibly infinite-dimensional program (1), similar assumptions are used in [52, Thm. 2] to establish a non-exponential tail bound for f(u*_N) − f(u*). While tail bounds for f(u*_N) − f(u*) are derived in [61,52], the assumptions used to derive (6) and (8) do not imply bounds on f(u*_N) − f(u*). Hoffhues, Römisch, and Surowiec [30] provide qualitative and quantitative stability results for the optimal value and for the optimal solutions of stochastic, linear-quadratic optimization problems posed in Hilbert spaces, similar to those in (10), with respect to Fortet-Mourier and Wasserstein metrics. These stability results are valid for approximating probability measures other than the empirical one, which is used to define the SAA problem (2). However, the convergence rate 1/N for E[‖u* − u*_N‖_U^2] and exponential tail bounds on ‖u* − u*_N‖_U are not established in [30].
For a class of constrained, linear elliptic control problems, Römisch and Surowiec [49] demonstrate the consistency of the solutions and the optimal value, the convergence rate 1/ ) to a real-valued random variable. These results are established using empirical process theory and are built on smoothness of the random elliptic operator and right-hand side with respect to the parameters. While our assumptions yield the mean square error bound (6) and the exponential tail bound (8), further conditions may be required to establish bounds on (6) is established in [41, Thm. 4.1] for a class of linear elliptic control problems.
Besides considering risk-neutral, convex control problems with PDEs which can be expressed as those in Sect. 6, the authors of [40,42] study the minimization of u → Prob(J(u, ξ) ≥ ρ), where ρ ∈ R and evaluating J(u, ξ) requires solving a PDE. Furthermore, the authors of [40,42] prove the existence of solutions and use stochastic collocation to discretize the expected values. In [42, Sect. 5.3], the authors adaptively combine a Monte Carlo sampling approach with a stochastic Galerkin finite element method to reduce the computational costs, but error bounds are not established. Stochastic collocation is also used, for example, in [21,34]. Further approaches to discretize the expected value in (10) are, for example, quasi-Monte Carlo sampling [26] and low-rank tensor approximations [20]. A solution method for (1) is (robust) stochastic approximation. It has been analyzed thoroughly in [38,44] for finite-dimensional and in [22,24,45] for infinite-dimensional optimization problems. For reliable ε-optimal solutions, the sample size estimates established in [44, Prop. 2.2] do not explicitly depend on the problem's dimension.
After providing some notation and preliminaries in Sect. 2, we establish exponential tail bounds for Hilbert space-valued random sums in Sect. 3. Combined with optimality conditions and the integrand's α-strong convexity, we establish exponential tail and mean square error bounds for SAA solutions in Sect. 4. Sect. 5 demonstrates the optimality of the tail bounds. We apply our findings to linear-quadratic control under uncertainty in Sect. 6, and identify a problem class that violates the integrability condition (7). Numerical results are presented in Sect. 7. In Sect. 8, we illustrate that the "dynamics" of finite- and infinite-dimensional stochastic programs can be quite different.

Notation and Preliminaries
Throughout the manuscript, we assume the existence of solutions to (1) and to (2). We refer the reader to [36,Prop. 3.12] and [30, Thm. 1] for theorems on the existence of solutions to infinite-dimensional stochastic programs.
The set dom Ψ = { u ∈ U : Ψ(u) < ∞ } is the domain of Ψ. The indicator function I_{U_0} : U → R ∪ {∞} of a nonempty set U_0 ⊂ U is defined by I_{U_0}(u) = 0 if u ∈ U_0 and I_{U_0}(u) = ∞ otherwise. Let (Ω,F,P ) be a probability space. A Banach space W is equipped with its Borel σ-field B(W). We denote by (·, ·)_H the inner product of a real Hilbert space H equipped with the norm ‖·‖_H given by ‖v‖_H = (v, v)_H^{1/2} for all v ∈ H. For two Banach spaces V and W, L(V, W) is the space of bounded, linear operators from V to W, and V* = L(V, R). We denote by ⟨·, ·⟩_{V*,V} the dual pairing of V* and V. A function υ : Ω → W is strongly measurable if there exists a sequence of simple functions υ_k :

Exponential Tail Bounds for Hilbert Space-Valued Random Sums
We establish two exponential tail bounds for Hilbert space-valued random sums which are direct consequences of known results [47,48]. Below, (Θ, Σ, µ) denotes a probability space. Proofs are presented at the end of the section.
As an alternative to the condition E[exp(τ^{-2}‖Z‖_H^2)] ≤ e used in Theorem 1 for τ > 0 and a random vector Z : Θ → H, we can express sub-Gaussianity with E[cosh(λ‖Z‖_H)] ≤ exp(λ^2 σ^2/2) for all λ ∈ R and some σ > 0. While these two conditions are equivalent up to problem-independent constants (see the proof of [11, Lem. 1.6 on p. 9] and Lemma 1), the constant σ can be smaller than τ. For example, if Z : Θ → H is an H-valued, mean-zero Gaussian random vector, then the latter condition holds with

Proposition 1. Let H be a real, separable Hilbert space, and let Z_i : Θ → H be independent, mean-zero random vectors such that

We apply the following two facts to prove Theorem 1 and Proposition 1.
Proof of Proposition 1. We have exp(s) − s ≤ cosh(s(3/2)^{1/2}) for all s ∈ R. Hence, the assumptions ensure E[exp(λ‖Z_i‖_H) − λ‖Z_i‖_H] ≤ exp(3λ^2 σ^2/4) for all λ ∈ R. The remainder of the proof proceeds as that of Theorem 1.
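The elementary inequality used in the proof of Proposition 1 can be checked numerically. The following sketch, which reads the inequality as exp(s) − s ≤ cosh(s·sqrt(3/2)), evaluates the gap on a uniform grid; the grid range [−20, 20] and the helper name `cosh_gap` are arbitrary choices, and a grid check is of course no substitute for a proof.

```python
import math


def cosh_gap(s):
    """Return cosh(s*sqrt(3/2)) - (exp(s) - s); nonnegative where the inequality holds."""
    return math.cosh(s * math.sqrt(1.5)) - (math.exp(s) - s)


# Uniform grid on [-20, 20] with step 0.01; s = 0 is a grid point, where equality holds.
grid = [k / 100.0 for k in range(-2000, 2001)]
min_gap = min(cosh_gap(s) for s in grid)
```

Replacing s by λ‖Z_i‖_H and taking expectations turns the cosh condition with argument scaled by sqrt(3/2) into the bound exp(3λ²σ²/4) stated in the proof.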

Exponential Tail Bounds for SAA Solutions
We state conditions that allow us to derive exponential bounds on the tail probabilities of the distance between SAA solutions and their true counterparts. In Sect. 6, we demonstrate that our conditions are fulfilled for many linear-quadratic control problems considered in the literature.

Assumptions and Measurability of SAA Solutions
Throughout the manuscript, u * is assumed to be a solution to (1).

Exponential Tail and Mean Square Error Bounds
We establish exponential tail and mean square error bounds on ‖u* − u*_N‖_U.

Theorem 3. Let u* be a solution to (1) and let u*_N be a solution to (2). If Assumptions 1 and 2 (a) hold, then

    E[‖u* − u*_N‖_U^2] ≤ σ^2/(α^2 N).    (16)

If in addition Assumption 2 (b) holds, then for all ε > 0,

    Prob(‖u* − u*_N‖_U ≥ ε) ≤ 2 exp(−τ^{-2} N ε^2 α^2/3).    (17)

We prepare our proof of Theorem 3.

Lemma 4. If Assumption 1 holds, then the function F_N defined in (3) is Gâteaux differentiable on a neighborhood of dom Ψ and with probability one,

    (∇F_N(u_2) − ∇F_N(u_1), u_2 − u_1)_U ≥ α‖u_2 − u_1‖_U^2 for all u_1, u_2 ∈ dom Ψ.    (18)
Proof. Since, for each ξ ∈ Ξ, J(·, ξ) is α-strongly convex and Gâteaux differentiable on a convex neighborhood V of dom Ψ, the sum rule and the definition of F_N imply its Gâteaux differentiability on V and (18).

Lemma 5. Let Assumption 1 hold and let ω ∈ Ω be fixed. Suppose that u* is a solution to (1) and that u*_N = u*_N(ω) is a solution to (2). Then

    (∇F_N(u*_N) − ∇F(u*), u* − u*_N)_U ≥ 0.    (19)

Proof. Following the proof of [32, Thm. 4.42], we obtain for all u ∈ dom Ψ,

    (∇F(u*), u − u*)_U + Ψ(u) − Ψ(u*) ≥ 0 and (∇F_N(u*_N), u − u*_N)_U + Ψ(u) − Ψ(u*_N) ≥ 0.    (20)

We have Ψ(u*), Ψ(u*_N) ∈ R. Choosing u = u*_N in the first and u = u* in the second estimate in (20), and adding the resulting inequalities yields (19).
Proof. Choosing u_2 = u* and u_1 = u*_N in (18), we find that α‖u* − u*_N‖_U^2 ≤ (∇F_N(u*) − ∇F_N(u*_N), u* − u*_N)_U. Combined with (19) and the Cauchy-Schwarz inequality, we get

    α‖u* − u*_N‖_U^2 ≤ (∇F_N(u*) − ∇F(u*), u* − u*_N)_U ≤ ‖∇F_N(u*) − ∇F(u*)‖_U ‖u* − u*_N‖_U.

Proof of Theorem 3. Lemma 2 ensures the measurability of u*_N : Ω → U. We define q : Ξ → U by q(ξ) = ∇_u J(u*, ξ) − ∇F(u*). Assumptions 1 (c) and 1 (e) ensure that q is well-defined and measurable. Hence, the random vectors Z_i = q(ξ^i) (i = 1, 2, . . .) are independent identically distributed, and Lemma 3 ensures that they have zero mean. Using the definitions of F and of F_N provided in (3), the Gâteaux differentiability of F at u* (see Assumption 1 (e)), and Lemma 4, we obtain

    ∇F_N(u*) − ∇F(u*) = (1/N) Σ_{i=1}^N Z_i.

Now, we prove (16). Combining the above statements with the separability of the Hilbert space U, we get E[‖∇F_N(u*) − ∇F(u*)‖_U^2] ≤ σ^2/N, yielding the mean square error bound (16).
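The step from the second-moment condition to the σ²/N bound rests on the identity E[‖(1/N) Σ Z_i‖²] = E[‖Z‖²]/N for independent, identically distributed, mean-zero random vectors. The following sketch verifies this identity exactly, using rational arithmetic and full enumeration, for a hypothetical random vector in R² that is uniformly distributed on four points; the support points are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical mean-zero random vector Z in R^2, uniform on four points.
SUPPORT = [
    (Fraction(1), Fraction(2)),
    (Fraction(-1), Fraction(-2)),
    (Fraction(3), Fraction(-1)),
    (Fraction(-3), Fraction(1)),
]


def sq_norm(v):
    """Squared Euclidean norm of a vector with rational entries."""
    return sum(c * c for c in v)


def second_moment(support):
    """E[||Z||^2] for Z uniform on the support."""
    return sum(sq_norm(z) for z in support) / len(support)


def mse_of_sample_mean(support, n):
    """E[||(1/n) * sum_i Z_i||^2], computed by enumerating all n-tuples of draws."""
    dim = len(support[0])
    total = Fraction(0)
    for draw in product(support, repeat=n):
        mean = tuple(sum(z[j] for z in draw) / n for j in range(dim))
        total += sq_norm(mean)
    return total / len(support) ** n


sigma_sq = second_moment(SUPPORT)
mse_by_n = {n: mse_of_sample_mean(SUPPORT, n) for n in (1, 2, 3, 4)}
```

The cross terms E[(Z_i, Z_j)] vanish for i ≠ j because the vectors are independent with zero mean, which is why only the N diagonal terms survive.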

Application to Linear-Quadratic Optimal Control
We consider the linear-quadratic optimal control problem

    min_{u ∈ U} { (1/2) E[‖QS(u, ξ) − y_d‖_H^2] + (α/2)‖u‖_U^2 + Ψ(u) },    (24)

where α > 0, Q ∈ L(Y, H), y_d ∈ H and H is a real, separable Hilbert space. In this section, U and Ψ : U → R ∪ {∞} fulfill Assumptions 1 (a) and 1 (b), respectively. The parameterized solution operator S : U × Ξ → Y is defined as follows. For each (u, ξ) ∈ U × Ξ, S(u, ξ) is the solution to:

    find y ∈ Y : A(ξ)y + B(ξ)u = g(ξ).    (25)
Defining K(ξ) = −QA(ξ)^{-1}B(ξ) and h(ξ) = QA(ξ)^{-1}g(ξ) − y_d, the control problem (24) can be written as

    min_{u ∈ U} { E[(1/2)‖K(ξ)u + h(ξ)‖_H^2] + (α/2)‖u‖_U^2 + Ψ(u) }.    (26)

We discuss differentiability and the lack of strong convexity of the expectation function F_1 : U → R ∪ {∞} defined by

    F_1(u) = (1/2) E[‖K(ξ)u + h(ξ)‖_H^2].    (27)

Assumption 3. The map K : Ξ → L(U, H) is strongly measurable and h :

We define the integrand J_1 : U × Ξ → R by

    J_1(u, ξ) = (1/2)‖K(ξ)u + h(ξ)‖_H^2.    (28)

Under the measurability conditions stated in Assumption 3, we can show that J_1 is a Carathéodory function. Assumption 3 implies that the function F_1 defined in (27) is smooth.
The function F_1 defined in (27) lacks strong convexity under natural conditions; see Lemma 8. In this case, we may deduce that the strong convexity of the objective function of (24) solely comes from the function (α/2)‖·‖_U^2 + Ψ(·), and that the largest strong convexity parameter of F(·) = F_1(·) + (α/2)‖·‖_U^2 is α > 0.

We show that E[T(ξ)] is a compact operator. Let (v_k) ⊂ U be weakly converging to some v ∈ U. Hence there exists C ∈ (0, ∞) with ‖v_k‖_U ≤ C for all k ∈ N [37, Thm. 4.8-3], which implies ‖T(ξ)v_k‖_U ≤ C‖T(ξ)‖_{L(U,U)} for each ξ ∈ Ξ and k ∈ N. Since T(ξ) is compact for all ξ ∈ Ξ, we have for

Now, we show that F_1 is not strongly convex. Since U is infinite-dimensional, the self-adjoint, compact operator E[T(ξ)] lacks a bounded inverse [37, p. 428 Lemma 7 and [27, p. 85]), we conclude that F_1 is not strongly convex.
The compactness of the Hessian of F 1 may also be studied using the theory on spectral decomposition of compact, self-adjoint operators [63, p. 159], or the results on the compactness of covariance operators [63, p. 174].
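The error estimate ‖u* − u*_N‖_U ≤ (1/α)‖∇F_N(u*) − ∇F(u*)‖_U, which is how we read (4), can be observed on a toy instance of the linear-quadratic problem class. The following is a minimal sketch, assuming a scalar control (U = H = R), Ψ = 0, and a hypothetical finite scenario set of pairs (K(ξ), h(ξ)); all names and values are illustrative.

```python
import random

ALPHA = 0.5  # regularization parameter (hypothetical value)
# Hypothetical scenarios (k, h): J(u, xi) = 0.5*(k*u + h)**2 + 0.5*ALPHA*u**2.
SCENARIOS = [(0.2, 1.0), (1.0, -0.5), (1.5, 2.0), (0.7, 0.3)]


def minimizer(scenarios, alpha):
    """Minimizer of the scenario average of J; first-order condition with Psi = 0."""
    n = len(scenarios)
    mean_kk = sum(k * k for k, _ in scenarios) / n
    mean_kh = sum(k * h for k, h in scenarios) / n
    return -mean_kh / (mean_kk + alpha)


def gradient(scenarios, alpha, u):
    """Derivative at u of the scenario average of J."""
    n = len(scenarios)
    return sum(k * (k * u + h) for k, h in scenarios) / n + alpha * u


def check_estimate(n_samples, rng):
    """Return both sides of the error estimate for one SAA replication."""
    u_true = minimizer(SCENARIOS, ALPHA)
    sample = [rng.choice(SCENARIOS) for _ in range(n_samples)]
    u_saa = minimizer(sample, ALPHA)
    lhs = abs(u_true - u_saa)
    rhs = abs(gradient(sample, ALPHA, u_true) - gradient(SCENARIOS, ALPHA, u_true)) / ALPHA
    return lhs, rhs


rng = random.Random(0)
results = [check_estimate(n, rng) for n in (10, 100, 1000)]
```

In this scalar setting the left-hand side equals the right-hand side multiplied by α divided by the sum of α and the sample mean of K(ξ)², so the estimate is tight only when the sampled data term contributes no curvature.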

Examples
Many instances of the linear-quadratic control problem (24) frequently encountered in the literature are defined by the following data: The conditions imply that A(ξ) has a bounded inverse for each ξ ∈ Ξ [37, p. 101] and imply the existence of a solution to (24) when combined with Fatou's lemma; cf. [30, Thm. 1]. Moreover, Assumptions 1-4 hold true.

Numerical Illustration
We empirically verify the results derived in Theorem 3 for finite element discretizations of two linear-quadratic, elliptic optimal control problems, which are instances of (24).
Since κ(ξ) = ξ_1 is a real-valued random variable, we can evaluate ∇F_1(u) and its empirical mean using only two PDE solutions, which can be shown by dividing (25) by κ(ξ). This allows us to compute the solutions to the finite element approximation of (24) and to their SAA problems with moderate computational effort even though n = 256 is relatively large.
To obtain a deterministic reference solution to the finite element approximation of (24), we approximate the probability distribution of ξ by a discrete uniform distribution. It is supported on the grid points of a uniform mesh of Ξ using 50 grid points in each direction, yielding a discrete distribution with 2500 scenarios. Samples for the SAA problems are generated from this discrete distribution.
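The construction of the discrete reference distribution and of the SAA samples can be sketched as follows. This is a minimal illustration, assuming a hypothetical parameter domain Ξ = [−1, 1]² and a grid that includes the end points; the paper's actual domain and mesh placement are not specified here, and the function names are ours.

```python
import random


def uniform_grid(lower, upper, points_per_dim):
    """Equally spaced points on [lower, upper], end points included (an assumption)."""
    step = (upper - lower) / (points_per_dim - 1)
    return [lower + i * step for i in range(points_per_dim)]


def build_scenarios(lower, upper, points_per_dim=50):
    """Tensor-product grid; each grid point is one equally likely scenario."""
    axis = uniform_grid(lower, upper, points_per_dim)
    return [(a, b) for a in axis for b in axis]


def draw_saa_sample(scenarios, n_samples, seed):
    """Independent draws (with replacement) from the discrete uniform distribution."""
    rng = random.Random(seed)
    return [rng.choice(scenarios) for _ in range(n_samples)]


scenarios = build_scenarios(-1.0, 1.0)  # hypothetical domain [-1, 1]^2
sample = draw_saa_sample(scenarios, 16, seed=0)
```

Because the reference problem uses all 2500 scenarios with equal weights, its solution is deterministic, while each SAA problem only sees the drawn subset.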
We used dolfin-adjoint [16,18,43] with FEniCS [3,39] to evaluate the SAA objective functions and their derivatives, and solved the problems using moola's NewtonCG method [18,51]. Figure 1 depicts the reference solutions for Examples 1 and 2. To generate the surface plots depicted in Figure 1, the piecewise constant reference solutions were interpolated to the space of piecewise linear continuous functions.
To illustrate the convergence rate 1/√N for E[‖u* − u*_N‖_U], we generated 50 independent samples of ‖u* − u*_N‖_U and computed the sample average. In order to empirically verify the exponential tail bound (17), we use the fact that it is equivalent to a certain bound on the Luxemburg norm of u* − u*_N. We define the Luxemburg norm ‖·‖_{L_φ(Ω;U)} of a random vector Z : Ω → U by

    ‖Z‖_{L_φ(Ω;U)} = inf { ν > 0 : E[φ(‖Z‖_U/ν)] ≤ 1 },    (29)

where φ : R → R is given by φ(x) = exp(x^2) − 1, and L_φ(Ω; U) = L_{φ(‖·‖_U)}(Ω; U) is the Orlicz space consisting of each random vector Z : Ω → U such that there exists ν > 0 with E[φ(‖Z‖_U/ν)] < ∞; cf. [33, Sect. 6.2]. The exponential tail bound (17) implies the Luxemburg norm bound (30), and (30) ensures Prob(‖u* − u*_N‖_U ≥ ε) ≤ 2 exp(−τ^{-2} N ε^2 α^2/27) for all ε > 0. These two statements follow from [11, Thm. 3.4 on p. 56] when applied to the real-valued random variable ‖u* − u*_N‖_U. To empirically verify the convergence rate 1/√N for ‖u* − u*_N‖_{L_φ(Ω;U)}, we approximated the expectation in (29) using the same samples used to estimate E[‖u* − u*_N‖_U]. Figure 2 depicts the resulting empirical estimates as well as the corresponding convergence rates. The rates were computed using least squares. The empirical convergence rates depicted in Figure 2 are close to the theoretical rate 1/√N for E[‖u* − u*_N‖_U] and ‖u* − u*_N‖_{L_φ(Ω;U)}; see (16) and (30).
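The two post-processing steps described above can be sketched as follows. The first function estimates the Luxemburg norm from samples of ‖u* − u*_N‖_U, assuming the standard definition inf{ν > 0 : E[exp((‖Z‖_U/ν)²) − 1] ≤ 1} with the expectation replaced by a sample mean; the second fits a convergence rate by least squares on a log-log scale, which is one common way to do it and an assumption on our part. Function names are hypothetical.

```python
import math


def luxemburg_norm(samples, tol=1e-10):
    """Smallest nu > 0 with mean(exp((z/nu)^2) - 1) <= 1, found by bisection."""
    zs = [abs(z) for z in samples]
    zmax = max(zs)
    if zmax == 0.0:
        return 0.0

    def excess(nu):
        # Sample mean of phi(z/nu) minus one; decreasing in nu.
        return sum(math.expm1((z / nu) ** 2) for z in zs) / len(zs) - 1.0

    lo = zmax / math.sqrt(math.log(1.0 + len(zs)))  # here excess(lo) >= 0
    hi = zmax / math.sqrt(math.log(2.0))            # here excess(hi) <= 0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi


def fit_rate(sample_sizes, errors):
    """Least-squares slope of log(error) against log(N); about -0.5 for rate 1/sqrt(N)."""
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den
```

The bisection bracket is chosen so that the exponent never exceeds ln(1 + n) for n samples, which avoids overflow in the exponential.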

Discussion
We have considered convex stochastic programs posed in Hilbert spaces where the integrand is strongly convex with the same parameter for each realization of the random element. We have established exponential tail bounds for the distance between SAA solutions and the true ones. For this problem class, the tail bounds are optimal up to problem-independent, moderate constants. We have applied our findings to stochastic linear-quadratic control problems, a subclass of the above problem class.