Parabolic PDE-constrained optimal control under uncertainty with entropic risk measure using quasi-Monte Carlo integration

We study the application of a tailored quasi-Monte Carlo (QMC) method to a class of optimal control problems subject to parabolic partial differential equation (PDE) constraints under uncertainty: the state in our setting is the solution of a parabolic PDE with a random thermal diffusion coefficient, steered by a control function. To account for the presence of uncertainty in the optimal control problem, the objective function is composed with a risk measure. We focus on two risk measures, both involving high-dimensional integrals over the stochastic variables: the expected value and the (nonlinear) entropic risk measure. The high-dimensional integrals are computed numerically using specially designed QMC methods and, under moderate assumptions on the input random field, the error rate is shown to be essentially linear, independently of the stochastic dimension of the problem -- and thereby superior to ordinary Monte Carlo methods. Numerical results demonstrate the effectiveness of our method.


Introduction
Many problems in science and engineering, including optimal control problems governed by partial differential equations (PDEs), are subject to uncertainty. If not taken into account, the inherent uncertainty of such problems has the potential to render worthless any solutions obtained using state-of-the-art methods for deterministic problems. The careful analysis of the uncertainty in PDE-constrained optimization is hence indispensable and a growing field of research (see, e.g., [5,6,7,18,29,30,36,37,45,47,48]).
In this paper we consider the heat equation with an uncertain thermal diffusion coefficient, modelled by a series in which a countably infinite number of independent random variables enter affinely. By controlling the source term of the heat equation, we aim to steer its solution towards a desired target state. To study the effect of this randomness on the objective function, we consider two risk measures: the expected value and the entropic risk measure, both involving integrals with respect to the countably infinite random variables. The integrals are replaced by integrals over finitely many random variables by truncating the series that represents the input random field to a sum over finitely many terms, and are then approximated using quasi-Monte Carlo (QMC) methods. QMC approximations are particularly well suited for optimization since they preserve convexity due to their nonnegative (equal) cubature weights. Moreover, for sufficiently smooth integrands it is possible to construct QMC rules with error bounds not depending on the number of stochastic variables while attaining faster convergence rates compared to Monte Carlo methods. For these reasons QMC methods have been very successful in applications to PDEs with random coefficients (see, e.g., [2,11,16,19,20,24,25,26,33,34,35,39,42,43]) and especially in PDE-constrained optimization under uncertainty, see [22,23]. In [32] the authors derive regularity results for the saddle point operator, which fall within the same framework as the QMC approximation of the affine parametric operator equation setting considered in [43].

† Johann Radon Institute for Computational and Applied Mathematics, ÖAW, Altenbergerstrasse 69, 4040 Linz, Austria. E-mail: philipp.guth@ricam.oeaw.ac.at
‡ Fachbereich Mathematik und Informatik, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany. E-mail: vesa.kaarnioja@fu-berlin.de, c.schillings@fu-berlin.de
§ School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia. E-mail: f.kuo@unsw.edu.au, i.sloan@unsw.edu.au
This paper builds upon our previous work [22]. The novelty lies in the use and analysis of parabolic PDE constraints in conjunction with the nonlinear entropic risk measure, which inspired the development of an error analysis that is applicable in separable Banach spaces and thus discretization invariant. Solely based on regularity assumptions, our novel error analysis covers a very general class of problems. Specifically, we extend QMC error bounds in the literature (see, e.g., [12,33]) to separable Banach spaces. A crucial part of our new work is the regularity analysis of the entropic risk measure, which is used to prove our main theoretical results about error estimates and convergence rates for the dimension truncation and the QMC errors. We then apply these new bounds to assess the total errors in the optimal control problem under uncertainty.
The structure of this paper is as follows. The parametric weak formulation of the PDE problem is given in Section 2. The corresponding optimization problem is discussed in Section 3, with linear risk measures considered in Subsection 3.1, the entropic risk measure in Subsection 3.2, and optimality conditions in Subsection 3.3. While the regularity of the adjoint PDE problem is the topic of Section 4, the regularity analysis for the entropic risk measure is addressed in Section 5. Section 6 contains the main error analysis of this paper: Subsection 6.1 covers the truncation error and Subsection 6.2 analyzes the QMC integration error. Our approach differs from most studies of QMC in the literature insofar as we develop the QMC and dimension truncation error theory for the full PDE solutions (with respect to an appropriately chosen function space norm) instead of considering the composition of the PDE solution with a linear functional. In Section 7 we confirm our theoretical findings with supporting numerical experiments. Section 8 is a concluding summary of this paper.

Problem formulation
Let D ⊂ R^d, d ∈ {1, 2, 3}, denote a bounded physical domain with Lipschitz boundary, let I := [0, T] denote the time interval with finite time horizon 0 < T < ∞, and let U := [−1/2, 1/2]^N denote a space of parameters. The components of the sequence y ∈ U are realizations of independently and identically distributed uniform random variables on [−1/2, 1/2], and the corresponding probability measure is µ(dy) = ⊗_{j≥1} dy_j = dy. Let

a_y(x, t) := a_0(x, t) + ∑_{j≥1} y_j ψ_j(x, t),  x ∈ D, y ∈ U, t ∈ I,  (2.1)

be an uncertain (thermal) diffusion coefficient, where we assume:
(i) for a.e. t ∈ I we have a_0(·, t) ∈ L^∞(D) and ψ_j(·, t) ∈ L^∞(D) for all j ≥ 1, and that (sup_{t∈I} ∥ψ_j(·, t)∥_{L^∞(D)})_{j≥1} ∈ ℓ^1;
(ii) t ↦ a_y(x, t) is measurable on I;
(iii) uniform ellipticity: there exist positive constants a_min and a_max such that 0 < a_min ≤ a_y(x, t) ≤ a_max < ∞ for all x ∈ D, y ∈ U and a.e. t ∈ I.
Time-varying diffusion coefficients occur, e.g., in finance or cancer tomography. However, the presented setting clearly also includes time-constant diffusion coefficients, i.e., a_y(x, t) = a_y(x) for all t ∈ I.
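As a concrete illustration of (2.1), the following sketch evaluates a truncated affine parametric field at sample points. The modes ψ_j(x) = j^(−2) sin(jπx) and the mean field a_0 ≡ 1 are our own illustrative choices, not taken from the paper; they are picked so that the sup-norms of the ψ_j are summable (assumption (i)) and uniform ellipticity (iii) holds.

```python
import numpy as np

def affine_diffusion(x, y, a0=1.0, decay=2.0):
    """Evaluate a truncated affine random field
        a_y(x) = a0 + sum_{j=1}^{s} y_j * psi_j(x),
    with the illustrative modes psi_j(x) = j**(-decay) * sin(j*pi*x)."""
    s = len(y)
    j = np.arange(1, s + 1)
    psi = j[:, None] ** (-decay) * np.sin(j[:, None] * np.pi * x[None, :])
    return a0 + y @ psi

x = np.linspace(0.0, 1.0, 5)
rng = np.random.default_rng(0)
y = rng.uniform(-0.5, 0.5, size=100)   # y_j ~ U[-1/2, 1/2], s = 100 terms
a = affine_diffusion(x, y)
# ellipticity check: a0 - 0.5 * sum_j j**(-2) ≈ 1 - 0.822 > 0, so a > 0
print(a.min() > 0.0)
```

With this decay the worst-case deviation from a_0 is (1/2) Σ_j j^(−2) = π²/12 < 1, so the field stays uniformly positive for every parameter realization.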
We consider the heat equation over the time interval I = [0, T], given by the partial differential equation (PDE)

(∂/∂t) u_y(x, t) − ∇ · (a_y(x, t) ∇u_y(x, t)) = z(x, t),  x ∈ D, t ∈ I,
u_y(x, t) = 0,  x ∈ ∂D, t ∈ I,  (2.2)
u_y(x, 0) = u_0(x),  x ∈ D,

for all y ∈ U. Here z(x, t) is the control and u_0 ∈ L^2(D) denotes the initial heat distribution. We denote the input functions collectively by f := (z, u_0). We have imposed homogeneous Dirichlet boundary conditions. Given a target state ū(x, t), we will study the problem of minimizing the following objective function:

J(u, z) := R( (α_1/2) ∥u_y − ū∥²_{L²(V;I)} + (α_2/2) ∥u_y(·, T) − ū(·, T)∥²_{L²(D)} ) + (α_3/2) ∥z∥²_{L²(V′;I)},  (2.3)

subject to the PDE (2.2) and constraints on the control to be defined later in the manuscript. By R we denote a risk measure, which is a functional that maps a set of random variables into the extended real numbers. Specifically, R will later be either the expected value or the entropic risk measure, both involving high-dimensional integrals with respect to y.
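For one fixed parameter realization, the state equation (2.2) can be approximated in the simplest one-dimensional setting by an explicit finite-difference scheme. The sketch below is a minimal illustration under our own assumptions (a coefficient frozen in time, zero control, uniform grid, explicit Euler time stepping); it is not the discretization used in the paper's experiments.

```python
import numpy as np

def heat_step(u, a, z, dx, dt):
    """One explicit Euler step of u_t = (a u_x)_x + z on a 1D grid with
    homogeneous Dirichlet boundary conditions (u = 0 at both ends).
    a holds the diffusion coefficient at the len(u)-1 cell interfaces."""
    flux = a * np.diff(u) / dx                    # a * u_x at interfaces
    u_new = u.copy()
    u_new[1:-1] += dt * (np.diff(flux) / dx + z[1:-1])
    u_new[0] = u_new[-1] = 0.0                    # Dirichlet conditions
    return u_new

nx, T, nt = 51, 0.1, 2000
x = np.linspace(0.0, 1.0, nx)
dx, dt = x[1] - x[0], T / nt                      # dt below stability limit dx^2/(2a)
u = np.sin(np.pi * x)                             # initial state u_0
a = np.full(nx - 1, 0.5)                          # one realization of a_y, constant here
z = np.zeros(nx)                                  # zero control for this demo
for _ in range(nt):
    u = heat_step(u, a, z, dx, dt)
# with a = 0.5 and z = 0 the exact solution is exp(-0.5*pi^2*T) * sin(pi*x)
print(np.max(np.abs(u - np.exp(-0.5 * np.pi**2 * T) * np.sin(np.pi * x))))
```

The printed maximum error is small because the sine mode is an exact eigenfunction of the continuous problem, so only the discretization error remains.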
We will first introduce a function space setting to describe the problem properly, including the definition of the L 2 (V ; I) and L 2 (V ′ ; I) norms.

Function space setting
We define V := H^1_0(D) and its (topological) dual space V′ := H^{−1}(D), and identify L^2(D) with its own dual. Let ⟨·, ·⟩_{V′,V} denote the duality pairing between V′ and V. The norm and inner product in V are defined as usual by ⟨v, w⟩_V := ∫_D ∇v · ∇w dx and ∥v∥_V := ⟨v, v⟩_V^{1/2}. We shall make use of the Riesz operator R_V : V → V′ defined by ⟨R_V v, w⟩_{V′,V} := ⟨v, w⟩_V for all v, w ∈ V, as well as its inverse R_V^{−1} : V′ → V. In turn we define the inner product in V′ by ⟨v, w⟩_{V′} := ⟨v, R_V^{−1} w⟩_{V′,V}. The norm induced by this inner product is equal to the usual dual norm.
We use analogous notations for inner products and duality pairings between function spaces on the space-time cylinder D × I. The space L²(V; I) consists of all measurable functions v : I → V with finite norm ∥v∥_{L²(V;I)} := (∫_I ∥v(·, t)∥²_V dt)^{1/2}. Note that (L²(V; I))′ = L²(V′; I), with the duality pairing given by ⟨v, w⟩_{L²(V′;I),L²(V;I)} := ∫_I ⟨v(·, t), w(·, t)⟩_{V′,V} dt. We extend the Riesz operator R_V : L²(V; I) → L²(V′; I) and we extend the inverse R_V^{−1} : L²(V′; I) → L²(V; I) analogously. We define the space of solutions u_y for y ∈ U by X := {v ∈ L²(V; I) : (∂/∂t)v ∈ L²(V′; I)}, which is the space of all functions v in L²(V; I) with (distributional) derivative (∂/∂t)v in L²(V′; I), and which is equipped with the (graph) norm ∥v∥_X := (∥v∥²_{L²(V;I)} + ∥(∂/∂t)v∥²_{L²(V′;I)})^{1/2}. Finally, because there are two inputs in equation (2.2), namely z ∈ L²(V′; I) and u_0 ∈ L²(D), it is convenient to define the product space Y := L²(V; I) × L²(D), and its dual space Y′ = L²(V′; I) × L²(D).
In particular, we extend X to Y as follows. For all v ∈ X we interpret v as an element of Y as v = (v(x, t), v(x, 0)). This gives X ⊆ Y. We further know from [13, Theorem 5.9.3] that X ↪ C(L²(D); I) and max_{t∈I} ∥v(·, t)∥_{L²(D)} ≤ C_1 ∥v∥_X, where C_1 depends on T only. Hence we obtain for all v ∈ X that ∥v∥_Y ≤ (1 + C_1²)^{1/2} ∥v∥_X, and thus we get that X is continuously embedded into Y, i.e., X ↪ Y.

Variational formulation
Based on these spaces, using integration by parts with respect to x we can write (2.2) as a variational problem as follows. Given the input functions f = (z, u_0) ∈ Y′ and y ∈ U, find a function u_y ∈ X satisfying the variational equation (2.6) for all w ∈ X, with the components (2.7) and B_y w = (B_y^1 w, B_y^2 w). For better readability we have omitted the parameter dependence v = (v_1(x, t), v_2(x)), f = (z(x, t), u_0(x)), w = w(x, t) and a_y = a_y(x, t). Note that a solution of (2.6) automatically satisfies u_y(·, 0) = u_0, as can be seen by setting v_1 = 0 and allowing arbitrary v_2.
The parametric family of parabolic evolution operators {B_y, y ∈ U} associated with this bilinear form is a family of isomorphisms from X to Y′, see, e.g., [10]. In [44] a shorter proof based on the characterization of the bounded invertibility of linear operators between Hilbert spaces is presented, together with precise bounds on the norms of the operator and its inverse: there exist constants 0 < β_1 ≤ β_2 < ∞ such that

β_1 ∥v∥_X ≤ ∥B_y v∥_{Y′} ≤ β_2 ∥v∥_X  for all v ∈ X and y ∈ U,  (2.8)

and hence for all y ∈ U we have the a priori estimate

∥u_y∥_X = ∥(B_y)^{−1} f∥_{X} ≤ (1/β_1) ∥f∥_{Y′}.  (2.9)

The following regularity result for the state u_y was proved in [32].
Lemma 2.1. Let f = (z, u_0) ∈ Y′. For all ν ∈ F and all y ∈ U, we have

∥∂^ν u_y∥_X ≤ |ν|! b^ν (1/β_1) ∥f∥_{Y′},  (2.10)

where β_1 is as described in (2.8) and the sequence b = (b_j)_{j≥1} is defined by

b_j := (1/β_1) sup_{t∈I} ∥ψ_j(·, t)∥_{L^∞(D)}, j ≥ 1.  (2.11)

For our later derivation of the optimality conditions for the optimal control problem, it is helpful to write the variational form of the PDE (2.6) as an operator equation using (2.7): B_y u_y = f in Y′, (2.12) with B_y^1 : X → L²(V′; I) and B_y^2 : X → L²(D) given by B_y^1 := Λ_1 B_y and B_y^2 := Λ_2 B_y, where Λ_1 : Y′ → L²(V′; I) and Λ_2 : Y′ → L²(D) are the restriction operators defined, for any v = (v_1, v_2) ∈ Y′, by Λ_1 v := v_1 and Λ_2 v := v_2. For the definition of a meaningful inverse of the operators B_y^1 and B_y^2, we first define the trivial extension operators Ξ_1 : L²(V′; I) → Y′ and Ξ_2 : L²(D) → Y′ by Ξ_1 v := (v, 0) and Ξ_2 v := (0, v). We observe that P_1 := Ξ_1 Λ_1 is an orthogonal projection onto the L²(V′; I)-component in Y′ and analogously P_2 := Ξ_2 Λ_2 is an orthogonal projection onto the L²(D)-component in Y′. This is verified as follows: for all v, u ∈ Y′ it is true that ⟨P_1 v, u⟩_{Y′} = ⟨v, P_1 u⟩_{Y′} and P_1 P_1 = P_1, and analogously for P_2. We clearly have I_{Y′} = P_1 + P_2, where I_{Y′} denotes the identity operator on Y′. Therefore we can write any element v in Y′ as v = P_1 v + P_2 v in Y′, and by linearity of (B_y)^{−1} we get (B_y)^{−1} v = (B_y)^{−1} P_1 v + (B_y)^{−1} P_2 v. A meaningful inverse of the operators B_y^1 : X → L²(V′; I) and B_y^2 : X → L²(D) is then given by (B_y^1)† : L²(V′; I) → X and (B_y^2)† : L²(D) → X , defined as

(B_y^1)† := (B_y)^{−1} Ξ_1  and  (B_y^2)† := (B_y)^{−1} Ξ_2.  (2.13)

We call the operator (B_y^1)† the pseudoinverse of B_y^1 and the operator (B_y^2)† the pseudoinverse of B_y^2. Clearly, the pseudoinverse operators are linear and bounded operators.

Lemma 2.2. The pseudoinverse operators (B_y^1)† and (B_y^2)† defined by (2.13) satisfy

B_y^1 (B_y^1)† = I_{L²(V′;I)},  B_y^2 (B_y^2)† = I_{L²(D)},  (B_y^1)† B_y^1 + (B_y^2)† B_y^2 = I_X,  (2.14)

which are the identity operators on L²(V′; I), L²(D), and X , respectively.
Proof. From the definition of the various operators, the stated identities follow by direct computation, as required.
Lemma 2.3. For y ∈ U and given (z, u_0) ∈ Y′, the solution u_y of the operator equation (2.12) can be written as u_y = (B_y^1)† z + (B_y^2)† u_0.

Proof. From (2.14) we have u_y = (B_y^1)† B_y^1 u_y + (B_y^2)† B_y^2 u_y = (B_y^1)† z + (B_y^2)† u_0, as required.

Dual problem
In the following we will need the dual operators (B_y)′, (B_y^1)′ and (B_y^2)′ of B_y, B_y^1 and B_y^2, respectively, which are formally defined by duality. The dual problem to (2.6) (or equivalently (2.12)) is as follows. Given the input function f_dual ∈ X′ and y ∈ U, find a function q_y = (q_y^1, q_y^2) ∈ Y such that (2.16) holds, or in operator form (B_y)′ q_y = f_dual, which has the unique solution q_y = ((B_y)′)^{−1} f_dual.
Existence and uniqueness of the solution of the dual problem follow directly from the bounded invertibility of B_y. We know that its inverse, (B_y)^{−1}, is a bounded linear operator and thus the dual of (B_y)^{−1} is (uniquely) defined (see, e.g., [49, Theorem 1 and Definition 1, Chapter VII]). The operator (B_y)^{−1} and its dual operator ((B_y)^{−1})′ = ((B_y)′)^{−1} are equal in their operator norms (see, e.g., [49, Theorem 2, Chapter VII]), i.e., the operator norms of the dual operator (B_y)′ and its inverse are bounded by the constants β_2 and 1/β_1 in (2.8). Applying integration by parts with respect to the time variable in (2.7), the left-hand side of the dual problem (2.16) can be written as (2.17). We may express the solution q_y = (q_y^1, q_y^2) ∈ Y of the dual problem (2.16) in terms of the dual operators of the pseudoinverse operators (B_y^1)† and (B_y^2)†. This is true because we get an analogous result to Lemma 2.2 in the dual spaces.
Proof.For all Similarly, for all v ∈ X and w ∈ X ′ we have This completes the proof.
Lemma 2.5.Given the input function f dual ∈ X ′ and y ∈ U , the (unique) solution of the dual problem (2.16) is given by as required.
We will see in the next section that, with the correct choice of the right-hand side f dual , the gradient of the objective function (2.3) can be computed using the solution q y of the dual problem.

Parabolic optimal control problems under uncertainty with control constraints
The presence of uncertainty in the optimization problem requires the introduction of a risk measure R that maps the random variable objective function (see (3.3) below) to the extended real numbers. Let (Ω, A, P) be a complete probability space. A functional R mapping random variables X on (Ω, A, P) to the extended real numbers may satisfy the following conditions:
(1) Convexity: R(λX + (1 − λ)X̃) ≤ λR(X) + (1 − λ)R(X̃) for all λ ∈ [0, 1];
(2) Translation equivariance: R(X + c) = R(X) + c for all c ∈ R;
(3) Monotonicity: X ≤ X̃ almost surely implies R(X) ≤ R(X̃).
A risk measure that in addition is positively homogeneous, i.e., satisfies R(λX) = λR(X) for all λ ≥ 0, is called coherent.
Coherent risk measures are popular as numerous desirable properties can be derived from the above conditions (see, e.g., [29] and the references therein). However, it can be shown (see [30, Theorem 1]) that the only coherent risk measures that are Fréchet differentiable are linear ones. The expected value has all of these properties, but is risk-neutral. In order to address also risk-averse problems we focus on the (nonlinear) entropic risk measures, which are risk-averse, Fréchet differentiable, and satisfy the conditions (1)-(3) above, but they are not positively homogeneous (and thus not coherent). Risk measures satisfying (2) and (3) are called monetary risk measures, and a monetary risk measure that also satisfies (1) is called a convex risk measure (see [14]).
In this section we will first discuss the required conditions on the risk measure R under which the optimal control problem has a unique solution. We will then present two classes of risk measures that satisfy these conditions, namely the linear risk measures that include the expected value, and the entropic risk measures. Finally we derive necessary and sufficient optimality conditions for the optimal control problem with these two risk measures. We assume that the target state ū belongs to X and that the constants α_1, α_2 are nonnegative with α_1 + α_2 > 0 and α_3 > 0. Then we consider the following problem: minimize J(u, z) defined in (2.3) subject to the parabolic PDE (2.2) and constraints on the control, z ∈ Z, (3.1) with Z being nonempty, bounded, closed and convex. We want to analyze the problem in its reduced form, i.e., expressing the state in (2.3) in terms of the control z. This reformulation is possible because of the bounded invertibility of the operator B_y for every y ∈ U, see Section 2.2 and the references therein. We therefore introduce the alternative notation u(z) = (u_y(z))(x, t) = u_y(x, t). (Of course u_y depends also on u_0, but we can think of u_0 as fixed, and therefore uninteresting.) The reduced problem is then to minimize J(z), see (3.2), where, with the tracking-type objective function Φ_y defined in (3.3), we can equivalently write the reduced problem as (3.4). With the uniformly boundedly invertible forward operator B_y, our setting fits into the abstract framework of [29] where the authors derive existence and optimality conditions for PDE-constrained optimization under uncertainty. In particular, the forward operator B_y, the regularization term (α_3/2)∥z∥²_{L²(V′;I)} and the random variable tracking-type objective function Φ_y satisfy the assumptions of [29, Proposition 3.12]. In order to present the result about the existence and uniqueness of the solution of (3.4), which is based on [29, Proposition 3.12], we recall some definitions from convex analysis (see, e.g., [29] and the references therein).

Lemma 3.1. Let α_1, α_2 ≥ 0 and α_3 > 0 with α_1 + α_2 > 0, and let R be proper, closed, convex and monotonic. Then there exists a unique solution of (3.4).

Proof. The existence of the solution follows directly from [29, Proposition 3.12]. We thus only prove the strong convexity of the objective function, which implies strict convexity and hence uniqueness of the solution. Clearly (α_3/2)∥z∥²_{L²(V′;I)} is strongly convex. Since the sum of a convex and a strongly convex function is strongly convex, it remains to show the convexity of R(Φ_y(z)). By the linearity and the bounded invertibility of the linear forward operator B_y, the tracking-type objective functional Φ_y(z) is quadratic in z and hence convex, i.e., we have for z, z̃ ∈ L²(V′; I) and λ ∈ [0, 1] that Φ_y(λz + (1 − λ)z̃) ≤ λΦ_y(z) + (1 − λ)Φ_y(z̃). Then, by the monotonicity and the convexity of the risk measure R we get R(Φ_y(λz + (1 − λ)z̃)) ≤ λR(Φ_y(z)) + (1 − λ)R(Φ_y(z̃)), i.e., R(Φ_y(z)) is convex.

Linear risk measures, including the expected value
First we derive a formula for the Fréchet derivative of (3.2) when R is linear, which includes the special case R(·) = ∫_U (·) dy.

Lemma 3.2. Let R be linear. Then the Fréchet derivative of (3.2) as an element of (L²(V′; I))′ = L²(V; I) is given by (3.5).

Proof. For z, δ ∈ L²(V′; I), we can write the difference J(z + δ) − J(z), where we used (2.15) to express the perturbed state. Expanding the squared norms using ∥v + w∥² = ⟨v + w, v + w⟩ = ∥v∥² + 2⟨v, w⟩ + ∥w∥², we obtain an expansion in δ. It remains to simplify the three terms. Using the extended Riesz operator R_V : L²(V; I) → L²(V′; I) we rewrite the first term, where the third equality follows since (B_y^1)† δ ∈ X ↪ L²(V; I), and the fourth equality follows from the definition of the dual operator ((B_y^1)†)′ : L²(V′; I) → X′. Next, using the definition of the dual operator (E_T)′ : L²(D) → X′, we can rewrite the second term. Finally, using the definition of the L²(V′; I) inner product and the extended inverse R_V^{−1}, and collecting the terms above, leads to the expression for J′(z) in (3.5).
We call J′(z) the gradient of J(z) and show next that J′(z) can be computed using the solution of the dual problem (2.16) with f_dual given by (3.6). We show this first for the special case when R is linear.
Lemma 3.3. For every y ∈ U, let u_y ∈ X be the solution of (2.2) and then let q_y ∈ Y be the solution of (2.16) with f_dual given by (3.6). Then for R linear, the gradient of (3.2) is given as an element of L²(V; I) by (3.7).

Proof. This follows immediately from (3.6), Lemma 3.2 and Lemma 2.5.
Proposition 3.4. Under the conditions of Lemma 3.3, with f_dual given by (3.6), the dual solution q_y = (q_y^1, q_y^2) ∈ Y satisfies q_y^2 = q_y^1(·, 0). Consequently, the left-hand side of (2.16) reduces to (3.8), and hence q_y^1 is the solution to (3.9), where the first equation holds for x ∈ D, t ∈ I, the second equation holds for x ∈ ∂D, t ∈ I, and the last equation holds for x ∈ D.
Proof. Since (2.16) holds for arbitrary w ∈ X, it holds in particular for the special case of w vanishing outside t ∈ [T_n, T], with arbitrary v ∈ V. For f_dual given by (3.6), the right-hand side of (2.16) takes a corresponding form, and from (2.17) the left-hand side of (2.16) can be evaluated. Equating the two sides, letting n → ∞, and noting that v ∈ V is arbitrary, we conclude that necessarily q_y^2 = q_y^1(·, 0). Hence, the left-hand side of (2.16) reduces to (3.8). By analogy with the weak form of (2.2), using the transformation t ↦ T − t, we conclude that q_y^1 is the solution to (3.9).

The entropic risk measure
The expected value is risk neutral. Next, we consider risk-averse risk measures such as the entropic risk measure

R_e(Y) := (1/θ) ln( ∫_U exp(θ Y(y)) dy )

for an essentially bounded random variable Y(y) and some θ ∈ (0, ∞). Using R = R_e in (3.2), the optimal control problem becomes min_{z∈Z} J(z), with

J(z) = (1/θ) ln( ∫_U exp(θ Φ_y(z)) dy ) + (α_3/2) ∥z∥²_{L²(V′;I)},  (3.10)

for some θ ∈ (0, ∞) and Φ_y defined in (3.3).
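A sample-based sketch of the entropic risk measure may help build intuition for its risk-averse behaviour; the log-sum-exp formulation avoids overflow for large θ. The sample distribution and θ values are our own illustrative choices.

```python
import numpy as np

def entropic_risk(samples, theta):
    """Sample estimate of R_e(Y) = (1/theta) * log E[exp(theta * Y)],
    computed in log-sum-exp form for numerical stability."""
    t = theta * np.asarray(samples)
    m = t.max()
    return (m + np.log(np.mean(np.exp(t - m)))) / theta

rng = np.random.default_rng(0)
Y = rng.uniform(-0.5, 0.5, size=100_000)   # illustrative bounded random variable
for theta in (0.1, 1.0, 10.0):
    print(theta, entropic_risk(Y, theta))
# by Jensen's inequality R_e(Y) >= E[Y]; moreover R_e(Y) -> E[Y] as
# theta -> 0+ and R_e(Y) -> ess sup Y as theta -> infinity
```

Larger θ weights unfavourable (large) outcomes more heavily, interpolating between the risk-neutral expected value and the worst case.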
In the following we want to compute the Fréchet derivative of J(z) with respect to z ∈ L²(V′; I). To this end, we first verify the following bound: for all y ∈ U, the function Φ_y defined by (3.3) satisfies (3.11), and thus for all θ > 0 we have (3.12).

Proof. We have from (3.3) a bound on Φ_y in terms of the data, which yields (3.11) after applying (2.9).
Using the preceding lemma, we compute the gradient of (3.10).
Proof. The application of the chain rule gives the derivative of the outer logarithm. The integral is a bounded and linear operator and hence its Fréchet derivative is the operator itself. Exploiting this fact, we may differentiate under the integral sign. Recalling from the previous subsection the expression for ∂_z Φ_y(z), and collecting terms, gives (3.14).

Optimality conditions
In the case when the feasible set of controls Z is a nonempty and convex set, we know (see, e.g., [46, Lemma 2.21]) that the optimal control z* satisfies the variational inequality

⟨J′(z*), z − z*⟩ ≥ 0  for all z ∈ Z.  (3.15)

For convex objective functionals J(z), like the ones considered in this work, the variational inequality is a necessary and sufficient condition for optimality. The complete optimality conditions are then given by the following result.
Theorem 3.7. Let R be the expected value or the entropic risk measure. A control z* ∈ L²(V′; I) is the unique minimizer of (2.3) subject to (2.2) and (3.1) if and only if it satisfies the optimality system, which holds for all y ∈ U, where J′(z) is given by (3.7) for the expected value, or (3.14) for the entropic risk measure.
Observe that the optimality system in Theorem 3.7 contains the variational formulations of the state PDE (2.6) and the dual PDE (2.16) in the first and second equation, respectively.
It is convenient to reformulate the variational inequality (3.15) in terms of an orthogonal projection onto Z. The orthogonal projection onto a nonempty, closed and convex subset Z ⊂ H of a Hilbert space H, denoted by P_Z : H → Z, is defined as P_Z(h) := arg min_{v∈Z} ∥h − v∥_H. Then, see, e.g., [27, Lemma 1.11], for all h ∈ H and γ > 0 the condition z ∈ Z, ⟨h, v − z⟩_H ≥ 0 for all v ∈ Z, is equivalent to z − P_Z(z − γh) = 0. Using the definition of the Riesz operator and H = L²(V′; I), we conclude that (3.15) is equivalent to z* − P_Z(z* − γ R_V J′(z*)) = 0. This equivalence can then be used to develop projected descent methods to solve the optimal control problem.
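The fixed-point reformulation above suggests a projected gradient iteration z ← P_Z(z − γ h). The following is a minimal sketch on a toy finite-dimensional quadratic objective with box constraints; the test problem, step size and all names are our own illustration, not the paper's solver.

```python
import numpy as np

def projected_gradient(grad, project, z0, gamma=0.1, tol=1e-8, max_iter=1000):
    """Projected gradient descent: z <- P_Z(z - gamma * grad(z)).
    Stops when the update step is below tol, i.e., when the fixed-point
    condition z = P_Z(z - gamma * grad(z)) holds approximately."""
    z = project(z0)
    for _ in range(max_iter):
        z_new = project(z - gamma * grad(z))
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# toy problem: J(z) = 0.5 * ||z - z_target||^2 over the box Z = [-1, 1]^3
z_target = np.array([0.3, 2.0, -5.0])
grad = lambda z: z - z_target            # gradient of the toy objective
project = lambda z: np.clip(z, -1.0, 1.0)   # P_Z for the box
z_star = projected_gradient(grad, project, np.zeros(3))
print(z_star)   # -> approximately [0.3, 1.0, -1.0]
```

The components of the unconstrained minimizer that violate the box are clipped to the boundary, exactly as the projection formula predicts.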

Parametric regularity of the adjoint state
In this section we derive an a priori bound for the adjoint state and for the partial derivatives of the adjoint state with respect to the parametric variables. Existing results, e.g., [32, Theorem 4], do not directly apply to our case, since the right-hand side of the affine linear, parametric operator equation depends on the parametric variable.

Lemma 4.1. Let α_1, α_2 ≥ 0 and α_3 > 0, with α_1 + α_2 > 0. Let f = (z, u_0) ∈ Y′ and ū ∈ X. For every y ∈ U, let u_y ∈ X be the solution of (2.2) and then let q_y ∈ Y be the solution of (2.16) with f_dual given by (3.6). Then we have an a priori bound in which β_1 is as described in (2.8).
Proof. By the bounded invertibility of B_y and its dual operator, we obtain a bound in which we used (2.9). Combining the estimates gives the desired result.
Proof. For ν = 0 the assertion follows from the previous lemma. For ν ≠ 0 we take derivatives of the dual equation. Separating out the m = 0 term, we obtain a recursion. By the bounded invertibility of (B_y)′, we have the corresponding estimate. For m ≠ 0, we conclude with (2.1) that ⟨v, ∂^m (B_y)′ w⟩_{X,X′} = ∫_I ∫_D ψ_j ∇v · ∇w dx dt if m = e_j, and otherwise it is zero. Hence for m = e_j we obtain a bound for all v ∈ Y. By Lemma 4.1 this recursion is true for ν = 0 and we may apply [33, Lemma 9.1]. From (2.9) and (2.10) we finally arrive at the claimed estimate, where the equality follows from [33, Formula (9.4)].

Regularity analysis for the entropic risk measure
Our goal is to use QMC to approximate the high-dimensional integrals appearing in the denominator and numerator of the gradient (3.14). To this end, we develop regularity bounds for the integrands.
Lemma 5.1. For every y ∈ U, let u_y ∈ X be the solution of (2.2) and let Φ_y be as in (3.3). Then for all ν ∈ F we have a bound in which the sequence b = (b_j)_{j≥1} is defined by (2.11).
Proof. The case ν = 0 is precisely (3.11). Consider now ν ≠ 0. We estimate the partial derivatives of Φ_y by differentiating under the integral sign and using the Leibniz product rule in conjunction with the Cauchy-Schwarz inequality. Separating out the m = 0 and m = ν terms and utilizing (2.10), we obtain a bound whose sum over m can be rewritten using an identity that is a simple consequence of the Vandermonde convolution [38, Equation (5.1)]. Combining the estimates yields the required result.
Proof. We prove (5.3) for all ν ≠ 0 and 1 ≤ λ ≤ |ν| by induction on |ν|. The base case A_{e_j,1} is easy to verify. Let ν ≠ 0 and suppose that (5.3) holds for all multi-indices m of order ≤ |ν| and all 1 ≤ λ ≤ |m|. The case A_{ν+e_j,1} is also straightforward to verify. We consider therefore 2 ≤ λ ≤ |ν| + 1. Using (5.2) and the induction hypothesis, we obtain an expression in which we used (5.1) and then regrouped the factors as binomial coefficients. Next we take the binomial identity [38, Equation (5.6)], separate out the ℓ = 0 term, and use the resulting alternating-sum identity. Substituting this back into (5.4) and simplifying the factors, we obtain the claim.

Theorem 5.4. For every y ∈ U, let u_y ∈ X be the solution of (2.2) and let Φ_y be as in (3.3). Then for all ν ∈ F we have a bound in which the sequence b = (b_j)_{j≥1} is defined by (2.11) and σ is defined by (3.13).
Proof. For ν = 0 we have from (3.12) that |exp(θΦ_y)| ≤ e^σ, which satisfies the required bound. For ν ≠ 0, from Faà di Bruno's formula (Theorem 5.2) we have an expansion with α_{ν,0}(y) = δ_{ν,0}, α_{ν,λ}(y) = 0 for λ > |ν|, where we used Lemma 5.1. Applying Lemma 5.3 we conclude the stated bound.

This then leads to a more complicated bound for Theorem 5.6 below. We have chosen to present the current form of Theorem 5.4 to simplify our subsequent analysis. Interestingly, the sum in (5.6) can also be rewritten as a sum with only positive terms, which is identical to the sequence [3, Proposition 7] and the sequence A181289 in the OEIS (written in slightly different form). However, we were unable to find a closed form expression for the sum; neither [21] nor [38] shed any light. The hope is to obtain an alternative bound for (5.7) that does not involve the factor e^{|ν|}. This is open for future research.
As an alternative approach to the presented bootstrapping method, holomorphy arguments can be used to derive similar regularity bounds, see, e.g., [8].

Error analysis
Let z* denote the solution of (3.4) and let z*_{s,n} be the minimizer of J_{s,n}, where Φ_s^y(z) = Φ^{(y_1,y_2,...,y_s,0,0,...)}(z) is the truncated version of Φ_y(z) defined in (3.3), and R_{s,n} is an approximation of the risk measure R, for which the integrals over the parameter domain are first truncated to [−1/2, 1/2]^s and then approximated by an n-point randomly shifted QMC rule (for the entropic risk measure, with θ ∈ (0, ∞), the QMC rule is applied to the integral inside the logarithm), for carefully chosen QMC points y^(i), i = 1, ..., n, involving a uniformly sampled random shift ∆ ∈ [0, 1]^s, see Section 6.2.
We have seen in the proof of Lemma 3.1 that the risk measures considered in this manuscript are convex and the objective function J, see (3.2), is thus strongly convex. It is important to note that the n-point QMC rule preserves the convexity of the risk measure, so J_{s,n} is a strongly convex function, because it is a sum of a convex and a strongly convex function. Therefore we have the optimality conditions for z* and z*_{s,n}. Hence we obtain an estimate in which we used the α_3-strong convexity of J_{s,n}, i.e., the strong monotonicity of J′_{s,n}, in the fourth step. Thus we have with (3.4) an upper bound on ∥z* − z*_{s,n}∥. We will next expand this upper bound in order to split it into the different error contributions: dimension truncation error and QMC error. The different error contributions are then analyzed separately in the following subsections for both risk measures.
In the case of the expected value, it follows from (3.7) that the error splits accordingly, where q_{1,s}^y := q_1^{(y_1,y_2,...,y_s,0,0,...)} denotes the truncated version of q_1^y, and E_∆ denotes the expected value with respect to the random shift ∆ ∈ [0, 1]^s.
In the case of the entropic risk measure, we recall that J′(z) is given by (3.14). Using T ≥ 1 and T_{s,n} ≥ 1 for the corresponding normalization terms, and the abbreviated notation g_y(x, t) := exp(θΦ_y(z)) q_1^y(x, t), we obtain a splitting, where we used Theorem 5.6 with ν = 0. For the first term on the right-hand side of (6.2) we obtain a dimension truncation contribution, and for the second term a QMC contribution.

Remark 6.1. Since we have ∥v_1∥_{L²(V;I)} ≤ ∥v∥_Y for all v = (v_1, v_2) ∈ Y by definition, and thus in particular ∥∫_U (q_1^y − q_{1,s}^y) dy∥_{L²(V;I)} ≤ ∥∫_U (q_y − q_s^y) dy∥_Y, we can replace q_1^y, q_{1,s}^y ∈ L²(V; I) in (6.1) and (6.4) by q_y, q_s^y ∈ Y. In order to obtain error bounds and convergence rates for (6.1) and (6.4), it is then sufficient to derive the results in the Y-norm, which is slightly stronger than the L²(V; I)-norm.

Truncation error
In this section we derive bounds and convergence rates for the errors that occur by truncating the dimension, i.e., for the first terms in (6.1), (6.3) and (6.4).
We prove a new and very general theorem for the truncation error based on knowledge of regularity. The idea of the proof is based on a Taylor series expansion and is similar to the approach in [19, Theorem 4.1]. The use of Taylor series for dimension truncation error analysis has also been considered, for instance, in [4,17].

Theorem 6.2. Let Z be a separable Banach space and let g : U → Z be analytically dependent on the sequence of parameters y ∈ U = [−1/2, 1/2]^N. Suppose there exist constants C_0 > 0, r_1 ≥ 0, r_2 > 0, and a sequence ρ = (ρ_j)_{j≥1} ∈ ℓ^p(N) for 0 < p < 1, with ρ_1 ≥ ρ_2 ≥ ···, such that the derivatives of g satisfy a bound of the corresponding form for all y ∈ U and ν ∈ F. Then, denoting (y_{≤s}; 0) = (y_1, y_2, ..., y_s, 0, 0, ...), we have for all s ∈ N

∥∫_U (g(y) − g(y_{≤s}; 0)) dy∥_Z ≤ C s^{−2/p+1},

for C > 0 independent of s.
Proof. Let y ∈ U and G ∈ Z′ with ∥G∥_{Z′} ≤ 1, and define F := ⟨G, g(·)⟩_{Z′,Z}. For arbitrary k ≥ 1 we consider the Taylor expansion of F about the point (y_{≤s}; 0) = (y_1, y_2, ..., y_s, 0, 0, ...). Rearranging this equation and integrating over y ∈ U yields an expansion with remainder terms of the form ∫_0^1 (1 − t)^k y^ν ∂_y^ν F(y_{≤s}; t y_{>s}) dt. (6.5) If there is any component ν_j = 1 with j > s, then the summand in the first term vanishes, since ∫ y_j dy_{≤s} = 0 (for all ν ∈ F with ν_j = 0 for all j ≤ s), where we used Fubini's theorem. Taking the absolute value on both sides in (6.5) and using |y_j| ≤ 1/2, we obtain (6.6). Furthermore, we have a remainder which is also bounded by the last expression in (6.6).
On the other hand, we can use the multinomial theorem to bound the second term in (6.6), where we used Σ_{j>s} ρ_j ≤ s^{−1/p+1} (Σ_{j=1}^∞ ρ_j^p)^{1/p}. (The last inequality follows directly from [9, Lemma 5.5], which is often attributed to Stechkin. For an elementary proof we refer to [31, Lemma 3.3].) Taking now k = ⌈1/(1−p)⌉ yields that (6.6) is of order O(s^{−2/p+1}). Note that k ≥ 2 for 0 < p < 1. The result can be extended to all s by a trivial adjustment of the constants.

Remark 6.3. The assumption of analyticity of the integrand can be replaced, since the Taylor series representation remains valid under the weaker assumption that only the ν-th partial derivatives with |ν| ≤ k + 1 for k = ⌈1/(1−p)⌉ and 0 < p < 1 exist.
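The Stechkin-type tail bound Σ_{j>s} ρ_j ≤ s^{−1/p+1} (Σ_j ρ_j^p)^{1/p} used in the proof can be checked numerically; the sequence ρ_j = j^(−3), the exponent p and the truncation level s below are illustrative choices, not tied to the paper's setting.

```python
import numpy as np

# check the Stechkin-type bound for a nonincreasing sequence rho in ell^p:
#     sum_{j > s} rho_j <= s**(-1/p + 1) * (sum_j rho_j**p)**(1/p)
p, s, N = 0.5, 10, 100_000
rho = np.arange(1, N + 1, dtype=float) ** (-3.0)   # rho_j = j**(-3), in ell^{1/2}
tail = rho[s:].sum()                                # left-hand side (truncated at N)
bound = s ** (-1.0 / p + 1.0) * (rho ** p).sum() ** (1.0 / p)
print(tail <= bound)   # -> True
```

Here p = 1/2 gives the tail rate s^(−1), while the summability of ρ_j^p controls the constant; the actual tail (about 5e-3 for s = 10) sits well below the bound.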
We now apply this general result to the first terms in (6.1), (6.3) and (6.4).
Theorem 6.4. Let θ > 0 and α_1, α_2 ≥ 0 with α_1 + α_2 > 0. Let f = (z, u_0) ∈ Y′ and û ∈ X. For every y ∈ U, let u^y ∈ X be the solution of (2.2) and Φ_y be as in (3.3), and then let q^y ∈ Y be the solution of (2.16) with f_dual given by (3.6). Suppose the sequence b = (b_j)_{j≥1} defined by (2.11) satisfies Σ_{j≥1} b_j^p < ∞ for some 0 < p < 1, and

  b_1 ≥ b_2 ≥ ⋯ .  (6.7)

Then for every s ∈ ℕ, the truncated solutions u^y_s := u^{(y_{≤s};0)}, q^y_s := q^{(y_{≤s};0)} and Φ^y_s := Φ_{(y_{≤s};0)} satisfy

  ∥∫_U (u^y − u^y_s) dy∥_X ≤ C s^{−2/p+1},  ∥∫_U (q^y − q^y_s) dy∥_Y ≤ C s^{−2/p+1},  |∫_U (Φ_y − Φ^y_s) dy| ≤ C s^{−2/p+1}.

In each case we have a generic constant C > 0 independent of s, but depending on z, u_0, û and other constants as appropriate.
Proof. The result is a corollary of Theorem 6.2, obtained by applying the regularity bounds in Lemma 2.1, Theorem 4.2, Theorem 5.6 and Theorem 5.4. □

Quasi-Monte Carlo error
We are interested in computing s-dimensional Bochner integrals of the form

  I_s(g) := ∫_{U_s} g(y) dy,

where g(y) is an element of a separable Banach space Z for each y ∈ U_s := [−1/2, 1/2]^s. As our estimator of I_s(g), we use a cubature rule of the form

  Q_{s,n}(g) := Σ_{i=1}^n α_i g(y^(i)),

with weights α_i ∈ ℝ and cubature points y^(i) ∈ U_s. In particular, we are interested in QMC rules (see, e.g., [12,33]), which are cubature rules characterized by equal weights α_i = 1/n and carefully chosen points y^(i) for i = 1, …, n.
We shall see that for sufficiently smooth integrands, randomly shifted lattice rules lead to convergence rates that do not depend on the dimension and are faster than those of Monte Carlo methods. Randomly shifted lattice rules are QMC rules with cubature points given by

  y^(i) = frac(i z/n + ∆) − (1/2, …, 1/2),  i = 1, …, n,  (6.9)

where z ∈ ℕ^s is known as the generating vector, ∆ ∈ [0, 1]^s is the random shift, and frac(·) means taking the fractional part of each component in the vector. In order to obtain an unbiased estimator, in practice we take the mean over R uniformly drawn random shifts ∆^(1), …, ∆^(R), i.e., we estimate I_s(g) using

  Q̄_{s,n,R}(g) := (1/R) Σ_{r=1}^R Q^(r)_{s,n}(g),  (6.10)

where Q^(r)_{s,n} denotes the equal-weight rule with points (6.9) shifted by ∆^(r). In this section we derive bounds and convergence rates for the errors that occur by applying a QMC method for the approximation of the integrals in the second terms of (6.1), (6.3) and (6.4). We first prove a new general result which holds for any cubature rule in a separable Banach space setting.

Theorem 6.5. Let U_s = [−1/2, 1/2]^s and let W_s be a Banach space of functions F : U_s → ℝ which is continuously embedded in the space of continuous functions. Consider an n-point cubature rule with weights α_i ∈ ℝ and points y^(i) ∈ U_s, given by

  Q_{s,n}(F) = Σ_{i=1}^n α_i F(y^(i)),

and define the worst case error of Q_{s,n} in W_s by

  e^wor(Q_{s,n}; W_s) := sup_{∥F∥_{W_s}≤1} |∫_{U_s} F(y) dy − Q_{s,n}(F)|.

Let Z be a separable Banach space and let Z′ denote its dual space. Let g : y ↦ g(y) be continuous with g(y) ∈ Z for all y ∈ U_s. Then

  ∥I_s(g) − Q_{s,n}(g)∥_Z ≤ e^wor(Q_{s,n}; W_s) sup_{G∈Z′, ∥G∥_{Z′}≤1} ∥⟨G, g(·)⟩_{Z′,Z}∥_{W_s}.  (6.11)

Proof. From the separability of Z and the continuity of g we get strong measurability of g. Moreover, from the compactness of U_s and the continuity of y ↦ g(y) we conclude that sup_{y∈U_s} ∥g(y)∥_Z < ∞ and hence ∫_{U_s} ∥g(y)∥_Z dy < ∞, which in turn implies ∥∫_{U_s} g(y) dy∥_Z < ∞. Thus g is Bochner integrable. Furthermore, for every normed space Z, its dual space Z′ is a Banach space equipped with the norm ∥G∥_{Z′} := sup_{g∈Z, ∥g∥_Z≤1} |⟨G, g⟩_{Z′,Z}|, and it holds for every g ∈ Z that ∥g∥_Z = sup_{G∈Z′, ∥G∥_{Z′}≤1} |⟨G, g⟩_{Z′,Z}|. This follows from the Hahn–Banach theorem, see, e.g., [40, Theorem 4.3].
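A minimal implementation of the randomly shifted lattice rule (6.9)–(6.10) might look as follows. The Korobov-form generating vector z_j = a^{j−1} mod n with n = 1021 and a = 76 is an off-the-shelf illustrative choice rather than a CBC-optimized vector, and the product integrand (with exact integral 1 over U_s) is likewise only a stand-in for the PDE quantities of interest.

```python
import numpy as np

def shifted_lattice_points(n, s, a, shift):
    # Rank-1 lattice points with a random shift, cf. (6.9):
    # y^(i) = frac(i*z/n + shift) - 1/2 on U_s = [-1/2, 1/2]^s.
    z = np.array([pow(a, j, n) for j in range(s)], dtype=float)  # Korobov form
    i = np.arange(1, n + 1)[:, None]
    return np.mod(i * z / n + shift, 1.0) - 0.5

def shift_averaged_qmc(f, n, s, a, R, rng):
    # Mean over R independently shifted lattice rules, cf. (6.10).
    ests = [np.mean(f(shifted_lattice_points(n, s, a, rng.random(s))))
            for _ in range(R)]
    return float(np.mean(ests)), float(np.std(ests, ddof=1) / np.sqrt(R))

# Illustrative smooth integrand with known integral 1 over U_s.
def f(y):
    c = 0.5 ** np.arange(1, y.shape[1] + 1)
    return np.prod(1.0 + c * y, axis=1)

rng = np.random.default_rng(0)
est, stderr = shift_averaged_qmc(f, n=1021, s=10, a=76, R=16, rng=rng)
print(est)   # close to the exact value 1
```

Because each shifted rule is an unbiased estimator of I_s(f), the spread of the R estimates also yields a practical error estimate, as used in the experiments below.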
Thus we have

  ∥I_s(g) − Q_{s,n}(g)∥_Z = sup_{G∈Z′, ∥G∥_{Z′}≤1} |⟨G, I_s(g) − Q_{s,n}(g)⟩_{Z′,Z}| = sup_{G∈Z′, ∥G∥_{Z′}≤1} |∫_{U_s} ⟨G, g(y)⟩_{Z′,Z} dy − Σ_{i=1}^n α_i ⟨G, g(y^(i))⟩_{Z′,Z}|,  (6.12)

where we used the linearity of G and the fact that for Bochner integrals we can swap the integral with the linear functional, see, e.g., [49, Corollary V.2].
From the definition of the worst case error of Q_{s,n} in W_s it follows that for any F ∈ W_s we have

  |∫_{U_s} F(y) dy − Q_{s,n}(F)| ≤ e^wor(Q_{s,n}; W_s) ∥F∥_{W_s}.

Applying this to the special case F(y) = G(g(y)) = ⟨G, g(y)⟩_{Z′,Z} in (6.12) yields (6.11). □
Theorem 6.6. Let the assumptions of Theorem 6.5 hold. In addition, suppose there exist constants C_0 > 0, r_1 ≥ 0, r_2 > 0 and a positive sequence ρ = (ρ_j)_{j≥1} such that for all u ⊆ {1:s} and for all y ∈ U_s we have

  ∥∂^u_y g(y)∥_Z ≤ C_0 (|u| + r_1)! r_2^{|u|} ∏_{j∈u} ρ_j.  (6.13)

Then a randomly shifted lattice rule can be constructed using a component-by-component (CBC) algorithm such that, for all λ ∈ (1/2, 1],

  √(E_∆ ∥I_s(g) − Q_{s,n}(g)∥_Z²) ≤ C_{s,γ,λ} [φ_tot(n)]^{−1/(2λ)},

where φ_tot(n) is the Euler totient function, with 1/φ_tot(n) ≤ 2/n when n is a prime power, and

  C_{s,γ,λ} := C_0 ( Σ_{∅≠u⊆{1:s}} γ_u^λ ∏_{j∈u} (2ζ(2λ)/(2π²)^λ) )^{1/(2λ)} ( Σ_{u⊆{1:s}} ((|u| + r_1)!)² r_2^{2|u|} ∏_{j∈u} ρ_j² / γ_u )^{1/2},  (6.14)

with ζ denoting the Riemann zeta function.

Proof. We consider randomly shifted lattice rules in the unanchored weighted Sobolev space W_{s,γ} with norm

  ∥F∥²_{W_{s,γ}} := Σ_{u⊆{1:s}} (1/γ_u) ∫_{[−1/2,1/2]^{|u|}} ( ∫_{[−1/2,1/2]^{s−|u|}} ∂^u_y F(y) dy_{−u} )² dy_u.

It is known that the CBC construction yields a lattice generating vector satisfying, for all λ ∈ (1/2, 1],

  √(E_∆ |I_s(F) − Q_{s,n}(F)|²) ≤ ( (1/φ_tot(n)) Σ_{∅≠u⊆{1:s}} γ_u^λ (2ζ(2λ)/(2π²)^λ)^{|u|} )^{1/(2λ)} ∥F∥_{W_{s,γ}}.

We have from (6.11) that the error in Z is bounded by the corresponding worst case error multiplied by sup_{G∈Z′, ∥G∥_{Z′}≤1} ∥⟨G, g(·)⟩_{Z′,Z}∥_{W_{s,γ}}. Using the definition of the W_{s,γ}-norm together with |∂^u_y ⟨G, g(y)⟩_{Z′,Z}| ≤ ∥∂^u_y g(y)∥_Z and the assumption (6.13), we can combine all of the estimates to arrive at the required bound. □
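A naive O(s n²) component-by-component construction can be sketched as follows for product weights γ_u = ∏_{j∈u} γ_j (the fast FFT-based CBC algorithm used in the experiments follows the same greedy recursion). It minimizes the shift-averaged squared worst case error in the weighted unanchored Sobolev space, which for product weights has the standard closed form e²(z) = −1 + (1/n) Σ_{k=0}^{n−1} ∏_{j=1}^s (1 + γ_j B₂({k z_j/n})) with B₂(x) = x² − x + 1/6; the weight values below are illustrative.

```python
import numpy as np
from math import gcd

def cbc(n, s, gamma):
    """Naive O(s n^2) CBC construction for product weights gamma[0..s-1].

    Greedily picks each component z_d to minimize the shift-averaged
    worst-case error e^2(z) = -1 + (1/n) sum_k prod_j (1 + gamma_j *
    B2({k z_j / n})), where B2(x) = x^2 - x + 1/6.
    """
    B2 = lambda x: x * x - x + 1.0 / 6.0
    k = np.arange(n)
    prod = np.ones(n)                 # running product over chosen dimensions
    cands = [c for c in range(1, n) if gcd(c, n) == 1]
    z, e2 = [], np.inf
    for d in range(s):
        errs = [np.mean(prod * (1.0 + gamma[d] * B2(np.mod(k * c, n) / n))) - 1.0
                for c in cands]
        best = int(np.argmin(errs))
        z.append(cands[best])
        e2 = errs[best]
        prod *= 1.0 + gamma[d] * B2(np.mod(k * z[-1], n) / n)
    return np.array(z), e2

gamma = [0.5 ** j for j in range(1, 9)]   # illustrative decaying weights
z, e2 = cbc(127, 8, gamma)
print(z, np.sqrt(e2))
```

The greedy choice is what makes the error bound of Theorem 6.6 achievable component by component without searching over all z ∈ {1, …, n−1}^s.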
Remark 6.7. Theorem 6.5 holds for arbitrary cubature rules, so results similar to Theorem 6.6 can be stated for other cubature rules. In particular, the regularity bounds obtained in Sections 4 and 5 can also be used for the worst case error analysis of higher-order QMC rules as well as sparse grid quadrature.
Theorem 6.8. Let θ > 0 and α_1, α_2 ≥ 0 with α_1 + α_2 > 0. Let f = (z, u_0) ∈ Y′ and û ∈ X. For every y ∈ U and s ∈ ℕ, let u^y_s ∈ X be the truncated solution of (2.2) and Φ^y_s be as in (3.3), and then let q^y_s ∈ Y be the truncated solution of (2.16) with f_dual given by (3.6). Then a randomly shifted lattice rule can be constructed using a CBC algorithm such that for all λ ∈ (1/2, 1] we have

  √(E_∆ ∥I_s(u_s) − Q_{s,n}(u_s)∥_X²) ≤ C_{s,γ,λ} [φ_tot(n)]^{−1/(2λ)},

and the analogous bounds for q^y_s (in the norm of Y) and for Φ^y_s (in absolute value), where φ_tot(n) is the Euler totient function, with 1/φ_tot(n) ≤ 2/n when n is a prime power.
Here C_{s,γ,λ} is given by (6.14), with r_1 = 2, r_2 = e, ρ_j = b_j defined in (2.11), and C_0 > 0 is independent of s, n, λ and the weights γ, but depends on z, u_0, û and other constants. With the weights γ_u chosen to minimize C_{s,γ,λ} and with the choices

  λ = 1/(2 − 2δ) for arbitrary δ ∈ (0, 1/2) if p ∈ (0, 2/3],  λ = p/(2 − p) if p ∈ (2/3, 1),

the above three root-mean-square errors are, under the assumption (6.7), of order

  O(n^{−(1−δ)}) if p ∈ (0, 2/3],  O(n^{−(1/p−1/2)}) if p ∈ (2/3, 1).

Numerical experiments

In this section we present numerical experiments to assess the dimension truncation and QMC errors.
All computations were carried out on the computational cluster Katana supported by Research Technology Services at UNSW Sydney [28].

Dimension truncation error
The dimension truncation errors in Theorem 6.4 are estimated by approximating the quantities ∥∫_U (u^y − u^y_s) dy∥ and ∥∫_U (q^y − q^y_s) dy∥, as well as the analogous differences for the quantities S_s and T_s, with the untruncated solutions replaced by reference solutions computed using a truncation dimension s′ ≫ s. The high-dimensional parametric integrals are computed using a tailored lattice cubature rule generated by the fast CBC algorithm with n = 2^15 nodes and a single fixed random shift. The obtained results are displayed in Figures 1 and 2 for the fluctuations (ψ_j)_{j≥1} corresponding to decay rates ϑ ∈ {1.3, 2.6} and dimensions s ∈ {2^k | k ∈ {1, …, 9}}. We use θ = 10 in the computations corresponding to S_s and T_s.
As the reference solution, we use the solution corresponding to dimension s′ = 2^11 = 2048. The theoretical dimension truncation rate is readily observed in the case ϑ = 1.3. In the case ϑ = 2.6 we note that the dimension truncation convergence rates degenerate for large values of s. This may be explained by the fact that the QMC cubature with n = 2^15 nodes has an error around 10^−8 (see Figure 3 in Subsection 7.2), but the finite element discretization error may also be a contributing factor. For smaller values of s, the higher order convergence is also apparent in the case ϑ = 2.6.

QMC error
We investigate the QMC error rate by computing the root-mean-square approximations corresponding to (6.15)–(6.18), where Q̄ and Q^(r) are as in (6.10) for a randomly shifted lattice rule with cubature nodes (6.9), with the random shifts ∆^(r) drawn independently from U([0, 1]^s).
The generating vectors are constructed using the fast CBC algorithm; we use n = 2^m, m ∈ {4, …, 15}, lattice points, R = 16 random shifts, and dimension s = 100. We carry out the experiments using two different decay rates ϑ ∈ {1.3, 2.6} for the input random field. The results are displayed in Figure 3. The root-mean-square error converges at an essentially linear rate in all experiments, which is consistent with the theory.
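The reported root-mean-square error is estimated, as is standard for randomly shifted rules, from the sample spread of the R independent shift estimates. A small helper illustrating the formula (the numerical inputs are placeholders, not values from the experiments):

```python
import numpy as np

def rms_error(shift_estimates):
    # Error estimate from R i.i.d. randomly shifted QMC estimates Q_r:
    #   sqrt( 1/(R(R-1)) * sum_r (Q_r - Qbar)^2 ),
    # the usual estimate of the RMS error of the shift-averaged value Qbar.
    Q = np.asarray(shift_estimates, dtype=float)
    R = Q.size
    return np.sqrt(np.sum((Q - Q.mean()) ** 2) / (R * (R - 1)))

print(rms_error([1.0, 2.0, 3.0, 4.0]))   # sqrt(5/12) ≈ 0.645
```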

Optimal control problem
We consider the problem of finding the optimal control z ∈ Z that minimizes (2.3) subject to the PDE constraint (2.2). We consider constrained optimization over Z = {z ∈ L²(V′; I) : ∥z∥_{L²(V′;I)} ≤ 2} and compare our results with the reconstruction obtained by carrying out unconstrained optimization over Z = L²(V′; I). To this end, we define the projection operator

  P(w) := min{1, 2/∥w∥_{L²(V;I)}} w for w ∈ L²(V; I),

which is used in the constrained setting, while in the unconstrained setting we use the identity P := I_{L²(V;I)}. The operator P acts on L²(V; I) and is hence different from the operator P_Z introduced in Section 3.3, which projects onto Z.
To be able to handle elements of Z numerically, we apply the projected gradient method (see, e.g., [27]) as described in Algorithm 1, together with the projected Armijo rule stated in Algorithm 2. Note that evaluating J(R_V w) and J′(R_V w) in Algorithms 1 and 2 requires solving the state PDE with the source term R_V w. In particular, the Riesz operator appears in the loading term after finite element discretization and can thus be evaluated using (2.4). We use the initial guess w_0 = 0. The parameters of the gradient descent method were chosen to be η_0 = 100, γ = 10^−4, and β = 0.1.
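A finite-dimensional sketch of this scheme (projected gradient descent with a projected Armijo backtracking rule on the ball of radius 2, using the parameter values η_0 = 100, γ = 10^−4 and β = 0.1 quoted above) is given below; the quadratic objective is only a stand-in for the discretized risk functional.

```python
import numpy as np

def project(w, radius=2.0):
    # Orthogonal projection onto the ball {w : ||w|| <= radius},
    # the finite-dimensional analogue of P(w) = min{1, 2/||w||} w.
    nrm = np.linalg.norm(w)
    return w if nrm <= radius else (radius / nrm) * w

def projected_gradient(J, dJ, w0, eta0=100.0, gamma=1e-4, beta=0.1,
                       tol=1e-8, maxit=200):
    w = project(w0)
    for _ in range(maxit):
        g = dJ(w)
        if np.linalg.norm(w - project(w - g)) <= tol:   # stationarity check
            break
        eta = eta0
        while True:   # projected Armijo rule: shrink eta until decrease holds
            w_new = project(w - eta * g)
            if J(w_new) <= J(w) - (gamma / eta) * np.linalg.norm(w_new - w) ** 2:
                break
            eta *= beta
        w = w_new
    return w

# Stand-in problem: J(w) = 0.5*||w - w_target||^2 with w_target outside the
# ball, so the constrained minimizer is the radius-2 projection of w_target.
w_target = np.array([3.0, 4.0])
J = lambda w: 0.5 * np.sum((w - w_target) ** 2)
dJ = lambda w: w - w_target
w_star = projected_gradient(J, dJ, np.zeros(2))
print(w_star)   # ≈ (1.2, 1.6), the projection of (3, 4) onto the ball
```

The Armijo condition J(w⁺) ≤ J(w) − (γ/η)∥w⁺ − w∥² is the standard projected variant, which reduces to the usual sufficient decrease condition in the unconstrained case.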
We consider the entropic risk measure with θ = 10 and set ϑ = 1.3. The reconstructed optimal control obtained using the bounded set of feasible controls Z is displayed in Figure 4. The reconstructed optimal control at the terminal time T = 1 and its pointwise difference from the control obtained without imposing control constraints are displayed in Figure 5. Finally, the evolution of the objective functional as the number of gradient descent iterations increases is plotted in Figure 6 for the constrained and unconstrained optimization problems.
Algorithm 1 Projected gradient descent
Input: feasible starting value w ∈ L²(V; I) such that z = R_V w ∈ Z
1: while ∥w − P(w − J′(R_V w))∥_{L²(V;I)} > TOL do
2:  determine the step size η using the projected Armijo rule (Algorithm 2)
3:  update w ← P(w − η J′(R_V w))
4: end while
Output: approximate minimizer w

Conclusion
We developed a specially designed QMC method for an optimal control problem subject to a parabolic PDE with an uncertain diffusion coefficient. To account for the uncertainty, we considered as measures of risk the expected value and the more conservative (nonlinear) entropic risk measure. For the high-dimensional integrals originating from the risk measures, we provided error bounds and convergence rates in terms of the dimension truncation and the QMC approximation. In particular, after dimension truncation, the QMC error bounds do not depend on the number of uncertain variables, while leading to faster convergence rates compared to Monte Carlo methods. In addition, we extended QMC error bounds in the literature to separable Banach spaces, and hence the presented error analysis is discretization invariant.

Figure 1: The approximate dimension truncation errors corresponding to the state and adjoint PDEs.

Figure 3: Left: the approximate root-mean-square error for the QMC approximation of the integrals ∫_{U_s} u^y_s dy and ∫_{U_s} q^y_s dy. Right: the approximate root-mean-square error for the QMC approximation of the quantities S_s and T_s. All computations were carried out using R = 16 random shifts, n = 2^m, m ∈ {4, …, 15}, and dimension s = 100.

Figure 4: The inverse Riesz transform R_V^{−1} z* of the reconstructed optimal control z* using the entropic risk measure for several values of t in the constrained setting.

Figure 5: Left: the inverse Riesz transform of the control at time t = 1 in the constrained setting after 25 iterations of the projected gradient descent algorithm using the entropic risk measure. Right: the difference between the reconstruction obtained in the constrained setting and the corresponding solution in the unconstrained setting.

Figure 6: The value of the objective functional for each gradient descent iteration. The results corresponding to the constrained setting and the unconstrained setting are plotted in blue and red, respectively.
Proof. Existence and uniqueness follow from the bounded invertibility of (B^y)′, see Subsection 2.2. Thus, we only need to verify that (2.19) solves the dual problem (2.16). It follows from (2.18) that … [27, Chapter 2.2.2]. □

Remark 3.8. If Z is the closed ball with radius r > 0 in a Hilbert space H, then the orthogonal projection P_Z is given by P_Z(w) = min{1, r/∥w∥_H} w.

Proof (of Theorem 6.8). The mean-square error bounds are a corollary of Theorem 6.6 obtained by applying the regularity bounds in Lemma 2.1, Theorem 4.2, Theorem 5.6 and Theorem 5.4. For simplicity we set C_0, r_1 and r_2 to be the largest values arising from the four results. □

In the numerical experiments we set α_1 = 10^−3, α_2 = 10^−2, and α_3 = 10^−7. Moreover, we always use u_0(x) = sin(2πx_1) sin(2πx_2).