Multilevel Monte Carlo estimators for elliptic PDEs with L\'evy-type diffusion coefficient

General elliptic equations with spatially discontinuous diffusion coefficients may be used as a simplified model for subsurface flow in heterogeneous or fractured porous media. In such a model, data sparsity and measurement errors are often taken into account by a randomization of the diffusion coefficient of the elliptic equation which reveals the necessity of the construction of flexible, spatially discontinuous random fields. Subordinated Gaussian random fields are random functions on higher dimensional parameter domains with discontinuous sample paths and great distributional flexibility. In the present work, we consider a random elliptic partial differential equation (PDE) where the discontinuous subordinated Gaussian random fields occur in the diffusion coefficient. Problem specific multilevel Monte Carlo (MLMC) Finite Element methods are constructed to approximate the mean of the solution to the random elliptic PDE. We prove a-priori convergence of a standard MLMC estimator and a modified MLMC - Control Variate estimator and validate our results in various numerical examples.

1. Introduction. Partial differential equations with random operators\ data\ domain are widely studied. For problems with sparse data or where measurement errors are unavoidable, uncertainties may be quantified using stochastic models. Methods to quantify uncertainty could be divided into two different branches: intrusive and non-intrusive. The former requires solving a high dimensional partial differential equation, where part of the dimensionality stems from the smoothness of the random field or process (see among others [6], [19], [30] and the references therein). The latter are (essentially) sampling methods and require repeated solutions of lower dimensional problems (see, among others, [1], [9], [11], [12], [29], [35]). Among the sampling method the multilevel Monte Carlo approach has been successfully established to lower the computational complexity for various uncertain problems, to the point where (depending on the dimension) it is asymptotically as costly as a single solve of the deterministic partial differential equation on a fine discretization level (see [9], [11], [13] and [21]). In the cited papers mostly Gaussian random fields were used as diffusivity coefficients in the elliptic equation. Gaussian random fields are stochastically very well understood objects and they may be used in both approaches. The distributions underlying the field are, however, Gaussian and therefore the model lacks flexibility, in the sense that fields cannot have pointwise marginal distributions having heavy-tails. Furthermore, Gaussian random fields with Matérn-type covariance operators have P-almost surely spatial continuous paths. There are some extensions in the literature (see, for example, [11], [29] and [17]).
In this paper we investigate multilevel Monte Carlo methods for an elliptic PDE where the coefficient is given by a subordinated Gaussian random field. The subordinated Gaussian random field is a type of a (discontinuous) Lévy field. Different subordinators display unique patterns in the discontinuities and have varied marginal distributions (see [7]). Existence and uniqueness of pathwise solutions to the problem was demonstrated in [8]. Spatial regularity of the solution depends on the subordinated Gaussian random field which itself depends on the subordinator. The discontinuities in the spatial domain pose additional difficulty in the pathwise discretization. A sample-adapted approach was considered in [8], but is limited to certain subordinators. Here we investigate not only the limitations of a sample-adapted approach in multilevel sampling, but also a Control Variates ansatz as presented first in [31].
We structured the rest of the paper as follows: In Section 2 we introduce a general stochastic elliptic equation and its weak solution under mild assumptions on the coefficient. These assumptions accommodate the subordinated Gaussian random fields we introduce in Section 3. In Section 4 we approximate the diffusion coefficient and state a convergence result of the elliptic equation with the approximated coefficient to the unapproximated solution. In Section 5 we discuss spatial approximation methods, which are needed for the multilevel Monte Carlo methods introduced in Section 6 and its Control Variates variant in Section 7. Numerical examples are presented in the last section.
2. The stochastic elliptic problem. In this section, we briefly introduce the general stochastic elliptic boundary value problem. For more details on the existence, uniqueness and measurability of the solution to the considered PDE, we refer the reader to [8] and [11]. For the rest of this paper we assume that a complete probability space (Ω, F, P) is given. Let (H, (⋅, ⋅) H ) be a Hilbert space. A H-valued random variable is a measurable function Z ∶ Ω → H. The space L p (Ω; H) contains all strongly measurable functions Z ∶ Ω → H with Z L p (Ω;H) < ∞, for p ∈ [1, +∞], where the norm is defined by For a H-valued random variable Z ∈ L 1 (Ω; H) we define the expectation by the Bochner integral E(Z) ∶= ∫ Ω Z dP. Further, for a square-integrable, H-valued random variable Z ∈ L 2 (Ω; H), the variance is defined by V ar(Z) ∶= Z − E(Z) 2 L 2 (Ω;H) . We refer to [15], [27], [28] or [32] for more details on general probability theory and Hilbert space-valued random variables. Here, we split the domain boundary in two (d−1)-dimensional manifolds Γ 1 , Γ 2 , i.e. ∂D = Γ 1 . ∪Γ 2 , where we assume that Γ 1 is of positive measure and that the exterior normal derivative → n ⋅ ∇v on Γ 2 is well-defined for every v ∈ C 1 (D). The mapping a ∶ Ω × D → R is a stochastic (jump diffusion) coefficient and f ∶ Ω × D → R is a (measurable) random source function. Further, → n is the outward unit normal vector to Γ 2 and g ∶ Ω × Γ 2 → R a measurable function. Note that we just reduce the theoretical analysis to the case of homogeneous Dirichlet boundary conditions on Γ 1 to simplify notation. One could also consider non-homogeneous Dirichlet boundary conditions, since such a problem can always be considered as a version of (2.1) -(2.3) with modified source term and Neumann data (see also [11,Remark 2.1]).
The following general assumptions ensure the well-posedness of the elliptic boundary value problem (see also [8

Weak solution.
In this subsection, we introduce the pathwise weak solution of problem (2.1) -(2.3) following [8]. We denote by H 1 (D) the Sobolev space on D equipped with the norm , for x ∈ R d (see for example [18,Section 5.2] for an introduction to Sobolev spaces). We denote by T the trace operator [16]) and we introduce the solution space V ⊂ H 1 (D) by where we take over the standard Sobolev norm, i.e. ⋅ V ∶= ⋅ H 1 (D) . We identify H with its dual space H ′ and work on the Gelfand triplet V ⊂ H ≃ H ′ ⊂ V ′ . Hence, Assumption 2.1 guarantees that f (ω, ⋅) ∈ V ′ and g(ω, ⋅) ∈ H − 1 2 (Γ 2 ) for P-almost every ω ∈ Ω. We multiply the left hand side of Equation (2.1) by a test function v ∈ V and integrate by parts (see e.g. [36,Section 6.3]) to obtain This leads to the following pathwise weak formulation of the problem: for all v ∈ V . The function u(ω, ⋅) is then called pathwise weak solution to problem (2.1) -(2.3). The bilinear form B a(ω) and the operator F ω are defined by for fixed ω ∈ Ω, where the integrals in F ω are understood as the duality pairings: for v ∈ V . We have the following theorem on the existence of a unique solution to the random elliptic PDE (2.1) -(2.3).
In addition to the (pathwise) existence of the weak solution, the authors gave a rigorous justification of the measurability of the solution mapping 3. Subordinated Gaussian random fields. In [7], the authors proposed a new subordination approach to construct discontinuous Lévy-type random fields: the subordinated Gaussian random field. Motivated by the subordinated Brownian motion, the subordinated Gaussian Random field is constructed by replacing the spatial variables of a Gaussian random field (GRF) W on a general d-dimensional domain D ⊂ R d by d independent Lévy subordinators (see [7], [34], [3]). For d = 2, the detailed construction is as follows: For two positive horizons T 1 , T 2 < +∞, we define the domain D = [0, T 1 ] × [0, T 2 ]. We consider a GRF W = (W (x, y), (x, y) ∈ R 2 + ) with P-a.s. continuous paths and assume two independent Lévy subordinators l 1 = (l 1 (x), x ∈ [0, T 1 ]) and l 2 = (l 2 (y), y ∈ [0, T 2 ]) are given (see [7] and [3]). The subordinated GRF is then defined by The corresponding random field L = (L(x, y), (x, y) ∈ [0, T 1 ], ×[0, T 2 ]) is in general discontinuous on the spatial domain D.  In the presented samples, the underlying GRF is a Matérn-1.5 GRF. We recall that, for a given smoothness parameter ν M > 1 2, correlation parameter r M > 0 and variance σ 2 where Γ(⋅) is the Gamma function and K ν (⋅) is the modified Bessel function of the second kind (see [23, Section 2.2 and Proposition 1]). A Matérn-ν M GRF is a centered GRF with covariance function q M . It has been shown in [7] that the subordinated GRF constructed in (3.1) is separately measurable. Further, the corresponding random fields display great distributional flexibility, allow for a Lévy-Khinchin-type formula and formulas for their covariance functions can be derived which makes them attractive for applications. We refer the interested reader to [7] for a theoretical investigation of the constructed random fields.
4. The subordinated GRF in the elliptic model equation. In this section we incorporate the subordinated GRF in the diffusion coefficient of the elliptic PDE (2.1) -(2.3). Further, we show how to approximate the diffusion coefficient and state the most important results on the approximation of the corresponding PDE solution following [8]. For the proofs and a more detailed study of subordinated GRFs in the elliptic model equation we refer the reader to [8].
4.1. Subordinated GRFs in the diffusion coefficient. It follows from the Lévy-Itô decomposition that any Lévy process on a one-dimensional (time) domain can be additively decomposed into a deterministic drift part, a continuous noise part and a pure-jump process (see [3,Section 2.4]). Motivated by this, we construct the diffusion coefficient a in the elliptic PDE as follows.
• W 1 and W 2 are zero-mean GRFs on D respectively on [0, +∞) 2 with P − a.s. continuous paths. Theorem 4.2. (see [8,Theorem 3.6]) Let a be as in Definition 4.1 and let f ∈ L q (Ω; H), g ∈ L q (Ω; L 2 (Γ 2 )) for some q ∈ [1, +∞). Then there exists a unique pathwise weak solution u(ω, ⋅) ∈ V to problem (2.1) -(2.3) for P-almost every ω ∈ Ω. Furthermore, u ∈ L r (Ω; V ) for all r ∈ [1, q) and where C(a − , D) > 0 is a constant depending only on the indicated parameters. However, accessing this pathwise weak solution numerically is a different matter. Here, we face several challenges: The first difficulty is related to the domain on which the GRF W 2 is defined. The Lévy subordinators l 1 and l 2 can in general attain any value in [0, +∞). Hence, it is necessary to consider the GRF W 2 on the unbounded domain [0, +∞) 2 . However, most regularity and approximation results on GRFs are formulated for the case of a parameter space which is at least bounded and cannot easily be extended to unbounded domains (see e.g. [2, Chapter 1]). Therefore, we modify the diffusion coefficient a from Definition 4.1 and cut the Lévy-subordinators at a deterministic threshold K > 0 depending on the choice of the subordinator. The resulting problem then coincides with the original problem up to a set of samples, whose probability can be made arbitrary small (see [8,Remark 4.1]). Furthermore, we have to bound the diffusion coefficient itself by a deterministic upper bound A in order to show the convergence of the solution (see [8,Section 5] for details). Therefore, we also cut the diffusion coefficient at a deterministic level A > 0. It can be shown that this induces an additional error in the solution approximation which can be controlled and vanishes for growing threshold A (see [8, Section 5.1, esp. Theorem 5.3 and Theorem 5.4]). The two described modifications of the original problem (2.1) -(2.3) are formalized in the following subsection.
We define the cut function χ K (z) ∶= min(z, K), for z ∈ [0, +∞), with a positive number K > 0. Further, for fixed numbers K, A > 0, we consider the following problem where we impose the boundary conditions The diffusion coefficient a K,A is defined by 2 a K,A ∶ Ω × D → (0, +∞), (ω, x, y) ↦ χ A a(x, y) + Φ 1 (W 1 (x, y)) + Φ 2 (W 2 (χ K (l 1 (x)), χ K (l 2 (y)))) . Again, Theorem 4.2 applies in this case and yields the existence of a pathwise weak solution u K,A ∈ L r (Ω; V ), for r ∈ [1, q), if f ∈ L q (Ω; H) and g ∈ L q (Ω; L 2 (Γ 2 )). In [8], the authors investigated in detail how this modification affects the solution u K,A and how the resulting error can be controlled by the choice of the deterministic thresholds K and A. Therefore, from now on we decide to consider problem (4.2) -(4.5) for a fixed choice of K and A and focus on the approximation of the GRFs W 1 , W 2 and the Lévy subordinators l 1 , l 2 in the following. We come back on the choice of K and A in specific situations in Section 8.

4.3.
Approximation of the GRFs and the Lévy subordinators and convergence of the approximated solution. In order to approximate the random solution u K,A of problem (4.2) -(4.5) we have to approximate the GRFs W 1 , W 2 and the Lévy subordinators l 1 , l 2 to be able to generate samples of the diffusion coefficient a K,A defined in Equation (4.5). Therefore, we have to impose some additional assumptions on the GRFs and the Lévy subordinators. We summarize our working assumptions in the following.
i , i ∈ N) the eigenpairs associated to the covariance operators Q 1 and Q 2 . In particular, (e (1) i We assume that the eigenfunctions are continuously differentiable and there exist positive constants α, β, C e , C λ > 0 such that for any i ∈ N it holds ii There exist constants φ, ψ, C lip > 0 such that the continuous functions Φ 1 , Φ 2 ∶ R → [0, +∞) from Definition 4.1 satisfy In particular, Φ 1 ∈ C 1 (R). iii f ∈ L q (Ω; H) and g ∈ L q (Ω; L 2 (Γ 2 )) for some q ∈ (1, +∞). iv a ∶ D → (0, +∞) is deterministic, continuous and there exist constants a + , a − > 0 with a − ≤ a(x, y) ≤ a + for (x, y) ∈ D. v l 1 and l 2 are Lévy subordinators on [0, D] which are independent of the GRFs W 1 and W 2 . Further, we assume that we have approximations l of these processes and there exist constants C l > 0 and η > 1 such that for every s ∈ [1, η) it holds for ε l > 0, x ∈ [0, D] and j = 1, 2.
The first assumption on the eigenpairs of the GRFs is natural (see [11] and [23]). Assumption 4.3 ii is necessary to be able to quantify the error of the approximation of the diffusion coefficient and Assumption 4.3 iii guarantees the existence of a solution. The last assumption ensures that we can approximate the Lévy subordinators with a controllable L s -error, which can always be achieved using piecewise constant approximations of the processes under appropriate assumptions on the tails of the distribution of the Lévy subordinators, see [10, Assumption 3.6, Assumption 3.7 and Theorem 3.21]. For technical reasons we have to work under the following assumption on the integrability of the gradient of the solution ∇u K,A of problem (4.2) -(4.5). This assumption is necessary for the proof of the convergence of the approximation to the solution u K,A in Theorem 4.5. Its origin lies in the fact that we cannot approximate the Lévy subordinators in an L s (Ω; L ∞ ([0, D]))-sense on the domain due to the discontinuities. There are several results on higher integrability of the gradient of the solution to an elliptic PDE of the form (4.2) -(4.5) which guarantee the condition of Assumption 4.4. We refer to [8, Section 5.2] and especially Remark 5.6 and Remark 5.7 therein for more details.
We now turn to the final approximation of the diffusion coefficient using approximations W which are constructed by point evaluation of the random fields W 1 and W 2 on the grid points and linear interpolation between the them. We approximate the diffusion coefficient a K,A from Equation (4.5) by a for (x, y) ∈ D. Further, we denote by u Note that Theorem 4.2 also applies to the elliptic problem with coefficient a . We are now able to state the most important result on the convergence of the approximated solution u (ε W ,ε l ) K,A to u K,A . For a proof we refer the reader to [8]. with a fixed real number γ ∈ (0, min(1, β (2α)). Here, the parameters η, α and β are determined by the GRFs W 1 , W 2 and the Lévy subordinators l 1 , l 2 (see Assumption 4.3). Let m, n ∈ [1, +∞] be real numbers such that and let k reg ≥ 2 and j reg > 0 be the regularity specifiers given by Assumption 4.4. If it holds that n < 1 + j reg 2 and rm < k reg , converges to the solution u K,A of the truncated problem for ε W , ε l → 0 and it holds This result is essential since it guarantees the convergence of the approximated solution u to the solution u K,A with a controllable upper bound on the error. Further, the error estimate given by Theorem 4.5 will be used in the error equilibration for the MLMC estimator in Section 6. It allows to balance the errors resulting from the approximation of the diffusion coefficient and the Finite Element (FE) error resulting from the pathwise numerical approximation of the PDE solution.

Pathwise Finite Element approximation.
In this section, we describe the numerical method which is used to compute pathwise approximations of the solution to the considered elliptic PDE following [8, Section 6]. We use a FE approach with standard triangulations and sample-adapted triangulations of the spatial domain, which is described in the following.

The standard pathwise Finite Element approximation.
We approximate the solution u to problem (2.1) -(2.3) with diffusion coefficient a given by Equation (4.1) using a pathwise FE approximation of the solution u given by (4.6). Therefore, for almost all ω ∈ Ω, we aim to approximate the function u for every v ∈ V with fixed approximation parameters K, A, ε W , ε l . We compute a numerical approximation of the solution to this variational problem using a standard Galerkin approach with linear elements: assume V = (V , ∈ N 0 ) is a sequence of finite-dimensional subspaces V ⊂ V with increasing dim(V ) = d . Further, we denote by (h , ∈ N 0 ) the corresponding refinement sizes which are assumed to converge monotonically to zero for → ∞. Let ∈ N 0 be fixed and denote by {v Expanding the function u Let (K , ∈ N 0 ) be a sequence of triangulations on D and denote by θ > 0 the minimum interior angle of all triangles in K . We assume θ ≥ θ > 0 for a positive constant θ and define the maximum diameter of the triangulation K by h ∶= max K∈K diam(K), for ∈ N 0 as well as the finite dimensional subspaces by V ∶= {v ∈ V v K ∈ P 1 , K ∈ K }, where P 1 denotes the space of all polynomials up to degree one. If we assume that for P−almost all ω ∈ Ω it holds u for some positive number κ a > 0, and that there exists a finite bound for the fixed approximation parameters K, A, we immediately obtain the following estimate using Céa's lemma (see [11,Section 4], [8,Section 6], [26, By construction of the subordinated GRF, we always obtain an interface geometry with fixed angles and bounded jump height in the diffusion coefficient, which have great influence on the solution regularity, see e.g. [33]. Note that, for general deterministic interface problems, one obtains a pathwise discretization error of order κ a ∈ (1 2, 1) and in general one cannot expect the full order of convergence κ a = 1 without special treatment of the discontinuities of the diffusion coefficient (see [5] and [11]). The convergence may be improved by the use of sample-adapted triangulations.

Sample-adapted triangulations.
In [11], the authors suggest sample-adapted triangulations to improve the convergence of the FE approximation for elliptic jump diffusion coefficients. This approach is also used in this paper and the convergence of the corresponding FE method is compared to the performance with the use of standard triangulations. The construction of the sample-adapted triangulations is explained in the following. Consider a fixed ω ∈ Ω and assume that the discontinuities of the diffusion coefficient are described by the partition T (ω) = (T i , i = 1, . . . , τ (ω)) of the domain D with τ (ω) ∈ N and T i ⊂ D. Assume that K (ω) is a triangulation of D which is adjusted to the partition T (ω) in the sense that for every i = 1, . . . , τ (ω) it holds is a deterministic, decreasing sequence of refinement thresholds which converges to zero. We denote byV (ω) ⊂ V the corresponding finite-dimensional subspaces with dimensiond (ω) ∈ N. Figure 2 illustrates the adapted triangulation for a sample of the diffusion coefficient where we used a Poisson(5)-subordinated Matérn-1.5-GRF.
The sample-adapted approach leads to an improved sample-wise convergence rate for the elliptic PDE with discontinuous diffusion coefficient (see e.g. [11,Section 4.1]). This is particularly true in the situation of jump diffusion coefficients with polygonal jump geometry, which is the case for the diffusion coefficients considered in this paper (see Figure 2, [11], [14], [8,Section 7]). While mean squared convergence rates cannot be derived theoretically in our general setting due to the stochastic regularity of the PDE solutions, in practice one at least recovers the convergence rates of the deterministic jump diffusion problem in the strong error, which also has been investigated numerically in [8]. This observation, together with the comments in the end of Subsection 5.1, motivate the following assumption for the remaining theoretical analysis (see [8, Assumption 6.2]).
Assumption 5.1. There exist deterministic constantsĈ u , C u ,κ a , κ a > 0 such that for any ε W , ε l > 0 and any ∈ N 0 , the FE approximation errors ofû adapted) subspacesV , respectively u where the constantsĈ u , C u may depend on a, f, g, K, A but are independent ofĥ , h ,κ a and κ a .
6. MLMC estimation of the solution. In this section we construct a multilevel Monte Carlo (MLMC) estimator for the expectation E(u K,A ) of the PDE solution and prove an a-priori bound for the approximation error. We start with the introduction of a general singlelevel Monte Carlo (SLMC) estimation since the MLMC estimator is an extension of this approach. The next lemma follows by the definition of the inner product on the Sobolev space H 1 (D) =∶ V and will be useful in our theoretical investigations. Lemma 6.1. For independent, centered V -valued random variables Z 1 and Z 2 it holds Proof. We use the definition of the inner product on V together with the independence of Z 1 and Z 2 to calculate random variables and M ∈ N a fixed sample number. The singlelevel Monte Carlo estimator for the approximation of the mean E(u (1) ) is defined by and we have the following standard result (see also [9] and [11]).
One major disadvantage of the SLMC estimator described above is the slow convergence of the (statistical) error for increasing sample numbers M (see Lemma 6.2 and [20]). Multilevel Monte Carlo (MLMC) uses multigrid concepts to reduce the computational complexity for the estimation of the mean compared to the singlelevel approach. The idea is to compute samples of FE approximations with different accuracy where one takes many samples of FE approximations with lower accuracy (and lower computationally costs) and less samples of FE approximations with higher accuracy (and higher computational cost), see also [20] and [21].
For fixed parameters K, A the goal is to approximate the value E(u K,A ). For ease of notation, we focus here on the sample-adapted discretization with the corresponding approximationû with average refinement parameter E(ĥ 2κa ) 1 2 and convergence rateκ a in this section (see Assumption 5.1). However, the reader should always keep in mind that all results also hold in the case of standard triangulations where E(ĥ 2κa ) 1 2 should be replaced by h κa .
Assume a maximum level L ∈ N is given. We consider finite-dimensional subspaces (V , = 0, . . . , L) of V with refinement sizesĥ 0 > ⋅ ⋅ ⋅ >ĥ L > 0 and approximation parameters ε W,0 > ⋅ ⋅ ⋅ > ε W,L for the GRFs and ε l,0 > ⋅ ⋅ ⋅ > ε l,L for the Lévy subordinators. Since we fix the parameters K and A in this analysis, we omit them in the following and use the notationû ε W, ,ε l, , ∶=û for the FEM approximation of u (ε W, ,ε l, ) K,A onV , for = −1, . . . , L, where we setû ε W,−1 ,ε l,−1 ,−1 ∶= 0. If we expand the expectation on the finest level in a telescopic sum we obtain the following representation This motivates the multilevel Monte Carlo estimator, which estimates the left hand side of Equation (6.1) by singlelevel Monte Carlo estimations of each summand on the right hand side (see [20]). To be precise, let M be a natural number for = 0, ..., L. The multilevel Monte Carlo estimator ofû ε W,L ,ε l,L ,L is then defined by copies of the random variablê u ε W, ,ε l, , (resp.û ε W, −1 ,ε l, −1 , −1 ) for = 0, . . . , L (see also [20]). The following result gives an a-priori bound on the MLMC error. Similar formulations can be found, for example, in [9], [1] and [11]. Theorem 6.3. We set r = 2 and assume q > 2 in Assumption 4.3. Further, let b, c ≥ 1 be given such that Theorem 4.5 holds. For L ∈ N, letĥ > 0, M , ε W, > 0 and ε l, > 0 be the level-dependent approximation parameters for = 0, ..., L such thatĥ , ε W, , and ε l, are decreasing with respect to . It holds where C > 0 is a constant which is independent of L and the level-dependent approximation parameters. Note that the numbers γ > 0 and c ≥ 1 are determined by the GRFs resp. the subordinators (cf. Theorem 4.5).
Proof. We estimate We use the triangular inequality, Theorem 4.5 and Assumption 5.1 to obtain For the second term we use the definition of the MLMC estimator E L and Lemma 6.2 to obtain Similar as for the first summand I 1 we apply Theorem 4.5 and Assumption 5.1 to get for = 0, . . . , L and for = −1 it follows from Theorem 4.2 that since q > 2. Finally, we calculate where we used the monotonicity of (ε W, ) L =0 , (ε l, ) L =0 and (ĥ ) L =0 . The error estimate of Theorem 6.3 allows for an equilibriation of the error contributions resulting from the approximation of the diffusion coefficient and the approximation of the pathwise solution with the FE method which then leads to a higher computational efficiency compared to the singlelevel approach. This leads in general to the strategy that one takes only few of the accurate, but expensive samples for large ∈ {0, . . . , L} and one generates more on the cheap, but less accurate samples on the lower levels, which can be seen in the following corollary (see also [11,Section 5], [20] and [21]). for some positive parameter ξ > 0. Then, it holds Proof. We use Theorem 6.3 together with Equation (6.2) and Equation (6.3) to obtain where ζ(⋅) denotes the Riemann zeta function.

Multilevel Monte Carlo with Control
Variates. The jump-discontinuities in the coefficient a K,A of the elliptic problem (4.2) -(4.5) have a negative impact on the FE convergence due to the low regularity of the solution (see Section 5 and [8]). In Subsection 5.2 we presented one possible approach to enhance the FE convergence for discontinuous diffusion coefficients: the sample-adapted FE approach with triangulations adjusted to the discontinuities. However, this approach may be computationally not feasible anymore if one has many jump interfaces. For instance, using subordinators with high jump activity (e.g. Gamma subordinators) may result in a very high number of discontinuities making the construction of sample-adapted triangulations extremely expensive. Besides the usage of adapted triangulations, variance reduction techniques can also be used to improve the computational efficiency of the MLMC estimation of the mean of the PDE solution, as we see in this section. We start with an introduction to a specific variance reduction technique, the Control Variates (CV), and show subsequently how we use a Control Variate in our setting (cf. [31]).
The use of Control Variates aims to reduce the statistical error of a MC estimation by reducing the variance on the right hand side of (7.1). Assume we are given another real valued, square integrable random variable X with known expectation E(X) and a corresponding sequence of i.i.d. random variables (X i , i ∈ N) following the same distribution as X. For a given number of samples M ∈ N, the control variate estimator is then defined by In [31], the authors presented a MLMC-CV combination for the estimation of the mean of the solution to the problem (2.1) -(2.3), where the diffusion coefficient a is modeled as a lognormal GRF. They use a smoothed version of the GRF and the pathwise solution to the corresponding PDE problem to construct a highly-correlated Control Variate. The considered GRFs have at least continuous paths leading to continuous diffusion coefficients. In the following, we show how we use a similar approach for our discontinuous diffusion coefficients to enhance the efficiency of the MLMC estimator for the case of subordinators with high jump activity.

Smoothing the diffusion coefficient.
In this section we construct the Control Variate which is used to enhance the MLMC estimation of the mean of the solution to (4.2) -(4.5) for subordinators with high jump activity. Our approach is motivated by [31].
For a positive smoothing parameter ν s > 0 we consider the Gaussian kernel on R 2 : Further, we identify the jump diffusion coefficient a K,A from Equation (4.5) with its extended version on the domain R 2 , where we set a K,A (x, y) = 0 for (x, y) ∈ R 2 ∖ D, and define the smoothed version a (νs) K,A by convolution with the Gaussian kernel: Obviously, Theorem 4.2 applies also to the smoothed diffusion coefficient a (νs) K,A which guarantees the existence of a solution u (νs) K,A ∈ L r (Ω; V ), for r ∈ [1, q) with f ∈ L q (Ω; H) and g ∈ L q (Ω; L 2 (Γ 2 )), and yields the bound If the smoothing parameter ν s is small, the solution corresponding to the smoothed coefficient a (νs) K,A is highly correlated with the solution to the PDE with (unsmoothed) diffusion coefficient a K,A . Therefore, the smoothed solution is a reasonable choice as a Control Variate in the MLMC estimator being both: highly correlated with the solution to the rough problem and easy to approximate using the FE method due to the high regularity compared to the rough problem (see also [31] and [26, Sections 8 and 9]). Figure 3 shows a sample of the diffusion coefficient and smoothed versions using a Gaussian kernel with different smoothness parameters. 7.3. MLMC-CV estimator. Next, we define the MLMC-CV estimator following [31]. We fix a positive smoothing parameter ν s > 0. The smoothness parameter ν s controls the variance reduction achieved in the MLMC-CV estimator and its specific choice is problem dependent (see Subsection 8.3 and [31]). We assume L ∈ N and consider finite-dimensional subspaces (V , = 0, . . . , L) of V with refinement sizesĥ 0 > ⋅ ⋅ ⋅ >ĥ L > 0 and approximation parameters ε W,0 > ⋅ ⋅ ⋅ > ε W,L for the GRFs and ε l,0 > ⋅ ⋅ ⋅ > ε l,L for the Lévy subordinators (see Subsection 5.2). To unify notation, we focus here again on the sample-adapted discretization with corresponding approximation u (ε W ,ε l ) K,A, with averaged refinement parameter E(ĥ 2κa ) 1 2 and convergence rateκ a for the theoretical analysis of the estimator (see Assumption 5.1 and Section 6) and point out again that similar results hold for the non-adapted FE approach. Since we again fix the parameters K and A in this analysis, we omit them in the following and use the notationû ε W, ,ε l, , ∶=û with sample sizes M ∈ N for = 0, . . . , L.

Convergence of the MLMC-CV estimator.
For the theoretical investigation of the MLMC-CV estimator we extend Assumption 5.1 by the following assumption on the mean-square convergence rate of the pathwise FE method for the smoothed problem.
Assumption 7.1. There exist deterministic constantsĈ u,s , C u,s such that for any ε W , ε l > 0 and any ∈ N 0 , the FE approximation errors ofû where the constantsĈ u,s , C u,s may depend on a, f, g, K, A but are independent ofĥ and h . Further, we assume that Assumption 4.4 also holds for the solution u Note that this assumption is natural since we expect (pathwise) full order convergence of the linear FE method for the smoothed elliptic PDE (see also [1], [9], [11], [31] and [26,Section 8.5] together with [18,Section 6.3]). The assumption on the integrability of the gradient of the solution corresponding to the smoothed problem is also natural under Assumption 4.4, since the solution has a higher regularity than the solution u K,A to the elliptic problem with the jump diffusion coefficient a K,A . The following lemma states that the approximation error of the smoothed coefficient can be bounded by the approximation error of the rough diffusion coefficient.
Lemma 7.2. For t > 1 and fixed parameters ν s , K, A > 0 and any ε W , ε l > 0 it holds for Palmost every ω ∈ Ω a (νs) with a constant C = C(t, ν s , D) which depends only on the indicated parameters.
Proof. Let t ′ > 1 such that 1 t + 1 t ′ = 1. We calculate using Hölder's inequality and the integrability of the Gaussian kernel φ νs a (νs) In order to proof the convergence of the MLMC-CV estimator we need the following error bound on the approximation of the solution of the smoothed problem (cf. Theorem 4.5).
Theorem 7.3. Let r ≥ 2 and b, c ∈ [1, +∞] be given such that it holds rcγ ≥ 2 and 2b ≤ rc < η with a fixed real number γ ∈ (0, min(1, β (2α)). Here, the parameters η, α and β are determined by the GRFs W 1 , W 2 and the Lévy subordinators l 1 , l 2 (see Assumption 4.3). Let m, n ∈ [1, +∞] be real numbers such that and let k reg ≥ 2 and j reg > 0 be the regularity specifiers given by Assumption 4.4. If it holds that n < 1 + j reg 2 and rm < k reg , then the approximated solution u Proof. This theorem follows by the same arguments used in [8,Theorem 5.9] together with Lemma 7.2.
We are now able to prove the following a-priori bound on the mean-square error of the MLMC-CV estimator, similar to Theorem 6.3.
Theorem 7.4. We set r = 2 and assume q > 2. Further, let b, c ≥ 1 be given such that Theorem 4.5 (and Theorem 7.3) hold. For L ∈ N, letĥ > 0, M , ε W, > 0 and ε l, > 0 be the level-dependent approximation parameters, for = 0, ..., L, such thatĥ , ε W, , and ε l, decrease with respect to . It holds where C > 0 is a constant which is independent of L and the level-dependent approximation parameters. Note that the numbers γ > 0 and c ≥ 1 are determined by the GRFs resp. the subordinators (cf. Theorem 4.5 and Theorem 7.3).
Proof. We split the error by For the first term we estimate using Theorem 4.5 and Assumption 5.1 together with Theorem 7.3 and Assumption 7.1 to obtain For the second term we use the definition of the MLMC-CV estimator E CV,L and Lemma 6.2 to estimate We estimate each term in this summand with the same strategy as we did for the term I 1 using Theorem 4.5 and Assumption 5.1 together with Theorem 7.3 and Assumption 7.1 to obtain l, ) +CE(ĥ 2κa ) ).
Together, we obtain where we used monotonicity of (ε W, ) L =0 , (ε l, ) L =0 and (ĥ ) L =0 in the last step. As it is the case for the a-priori error bound for the MLMC estimator (see Theorem 6.3), Theorem 7.4 allows for an equilibration of all error contributions resulting from the approximation of the diffusion coefficient and the approximation of the pathwise solution by the FE method, which can be seen by the following corollary.
Corollary 7.5. Let the assumptions of Theorem 7.4 hold. For L ∈ N and given (stochastic) refinement parametersĥ 0 > ⋅ ⋅ ⋅ >ĥ L > 0 choose ε W, > 0 and ε l, > 0 such that and sample numbers M ∈ N such that for some positive parameter ξ > 0 it holds Then, it holds Proof. See Corollary 6.4.
We want to emphasize that Theorem 7.4 and Corollary 7.5 imply the same asymptotical convergence of the MLMC-CV estimator as the MLMC estimator which has been considered in Section 6. However, it is to be expected that the MLMC-CV estimator is more efficient due to the samplewise correction by the Control Variate and the resulting variance reduction on the different levels. We close this section with a remark on how to compute the mean of the Control Variate.
Remark 7.6. Unlike we assumed the CV mean E(u (νs) K,A ) is in general unknown for fixed parameters K, A, ν s > 0. Corollary 7.5 yields that it is sufficient to approximate the CV mean with any estimator which is convergent with order E(h 2κa L ) 1 2 . In fact, we denote by the realization of the desired estimator and we assume the existence of a constant C CV > 0 such that it holds in the notation of Theorem 7.4. Further, instead of the basis experimentû CV ε W, ,ε l, , from (7.3) we considerũ K,A )), for = 0, . . . , L, and we setũ CV ε W,−1 ,ε l,−1 ,−1 = 0 and denote the corresponding MLMC-CV estimator by E CV,L (ũ CV ε W, ,ε l, , ). Then, by Corollary 7.5, it holds For example, the CV mean could be estimated by another MLMC estimator on the level L where the parameters are choosen according to Corollary 7.5. for ω ∈ Ω. We use a reference grid with 401 × 401 equally spaced points on the domain D for interpolation and prolongation. The GRFs W 1 and W 2 are set to be a Matérn-1.5-GRFs on D (resp. on [0, K] 2 ) with varying correlation lengths and variance parameters. Note that for Matérn-1.5-GRFs we can expect γ = 1 in Theorem 4.5 (see [8,Section 7], [12,Chapter 5], [13]). We simulate the GRFs W 1 and W 2 by the circulant embedding method (see [24] and [25]) to obtain approximations W ≈ W 2 as described in Section 4.3. In the experiments, we choose the diffusion cut-off A in (4.5) large enough such that it has no influence on the numerical experiments for our choice of the GRFs, e.g. A = 100 and choose the cut-off level K for each experiment individually depending on the specific choice of the subordinator.

Numerical examples for the MLMC estimator.
In this section we conduct experiments with the MLMC estimator introduced in Section 6. We consider subordinators with different intensity and GRFs with varying correlation lengths in order to cover problems with different solution regularity. The comparatively low intensity of the subordinators used in this section (see also Subsection 8.3) allows the application of the pathwise sample-adapted approach introduced in Subsection 5.2 which can then be compared with the performance of the MLMC estimator with standard triangulations. During this section, we refer to these approaches with adapted FEM MLMC and non-adapted FEM MLMC. In our experiments, we use Poisson processes to subordinate the GRF W 2 in the diffusion coefficient in (4.5). We consider both, Poisson processes with high and low intensity parameter leading to a different number of jumps in the diffusion coefficient. For the simulation of the Poisson processes we have two options: the processes may be approximated under Assumption 4.3 v but they may also be simulated exactly (see Subsection 8.2.1). Hence, using Poisson subordinators allows for a detailed investigation of the approximation error caused by the approximation of the Lévy subordinators l 1 and l 2 . This will be explained briefly in the following subsection (see also [8,Section 7.3.1]).
8.2.1. The two approximation methods. We simulate the Poisson processes by two conceptional different approaches: the first approach is an exact and grid-independent simulation of a Poisson process using the Uniform Method (see [34,Section 8.1.2]). On the other hand, we may simulate approximations of the Poisson processes satisfying Assumption 4.3 v in the following way (see [8,Section 7 for j = 1, 2. Since the Poisson process has independent, Poisson distributed increments, values of the Poisson process at the discrete points {x i , i = 0, . . . , N l } may be generated by adding independent Poisson distributed random variables with appropriately scaled intensity parameters. For the rest of this paper, we refer to this approach as the approximation approach to simulate a Poisson process. Comparing the results of the MLMC experiments using the two described approaches for the simulation of the Poisson processes allows conclusions to be drawn on the numerical influence of an additional approximation of the subordinator (see Subsection 8.2.2). This is further important especially for situations in which the choice of the subordinators does not allow for an exact simulation of the process. Note that Poisson processes satisfy Assumption 4.3 v with η = +∞ (see [8,Section 7.3.1]). Since γ = 1 (see Subection 8.1), η = +∞ and f ∈ L q (Ω; H) for every q ≥ 1 we choose for any positive δ > 0 r = 2, c = b = 1 + δ to obtain from Theorem 4.5 where we have to assume that j reg > 2((1 + δ) δ − 1) and k reg > 2(1 + δ) δ for the regularity constants j reg , k reg given in Assumption 4.4. For δ = 0.5 we obtain Therefore, we get γ = 1 and c = 1.5 in the equilibration formula (6.2) for the numerical examples with the Poisson subordinators. (1) subordinators. In our first numerical example we use Poisson(1) -subordinators. With this choice, we get on average one jump in each direction of the diffusion coefficient. The standard deviation and the correlation parameters for the GRF W 1 (resp. W 2 ) are set to be σ 2 1 = 1.5 2 and r 1 = 0.5 (resp. σ 2 2 = 0.1 2 and r 2 = 0.5). Figure 4 shows samples of the diffusion coefficient and the corresponding PDE solution. The cut-off threshold K for the subordinators in (4.5) is choosen to be K = 8. With this choice we obtain 1] l j (t) ≥ K) = P(l j (1) ≥ K) ≈ 1.1252e −06 , for j = 1, 2, such that this cut-off has a negligible influence in the numerical example. We compute the RMSE E(u K,A ) − E L (û ε W,L ,ε l,L ,L ) L 2 (Ω;V ) for the sample-adapted and the non-adapted approach using 10 independent runs of the MLMC estimator on the levels L = 1, . . . , 5, where we set  The right hand side of Figure 5 demonstrates a slightly improved efficiency of adapted FEM MLMC compared to non-adapted FEM MLMC. The advantage of the sample-adapted approach can be further emphasized by the use of subordinators with a higher jump intensity and different correlation lengths of the underlying GRF, as we see in the following subsections. (5) subordinators -smooth underlying GRF. In the second numerical example we increase the jump-intensity of the subordinators and investigate the effect on the performance of the MLMC estimators. We use Poisson(5)-subordinators leading to an expected number of 5 jumps in each direction in the diffusion coefficient. The standard deviation and the correlation parameter for the GRF W 1 (resp. W 2 ) are set to be σ 2 1 = 0.5 2 and r 1 = 0.5 (resp. σ 2 2 = 0.3 2 and r 2 = 0.5). Figure 6 shows samples of the diffusion coefficient and the corresponding PDE solution.

Poisson
The cut-off threshold K for the subordinators in (4.5) is choosen to be K = 15. With this choice we obtain for j = 1, 2, such that this cut-off has a negligible influence in the numerical example. In order to avoid an expensive simulation of the GRF W 2 on the domain [0, 15] 2 we set K = 1 instead and consider the downscaled processesl for t ∈ [0, 1] and j = 1, 2. Note that this has no effect on the expected number of jumps of the processes. We use the Uniform Method to simulate the Poisson subordinators and estimate the RMSE of the MLMC estimators for the sample-adapted and the non-adapted approach using 10 independent MLMC runs on the levels L = 1, . . . , 5, where we set h = h = 0.2 ⋅ 1.7 −( −1) for = 1, . . . , 5. Further, we use a reference solution computed on level 7 with singlelevel Monte Carlo.   Figure 7 demonstrates a higher efficiency of the sample-adapted approach. However, one has to mention that differences in the performance of the estimators are rather small due to the comparatively high convergence rate for the non-adapted MLMC approach of approximately 0.85. This is due to the fact that the jumps in the diffusion coefficient are comparatively small on account of the high correlation length of the underlying GRF W 2 . We will see in the following subsection that a higher intensity of the jump heights has a significant negative influence on the performance of the non-adapted FEM MLMC approach. (5) subordinators -rough underlying GRF. In the jump diffusion coefficient (see (4.5)), the jumps are generated by the subordinated GRF in the following way: the number of spatial jumps is determined by the subordinators and the jump intensities (measured in the differences in diffusion values across a jump) are essentially determined by the GRF W 2 and its correlation length. Hence, we may control the jump intensities of the diffusion coefficient by the correlation parameter of the underlying GRF W 2 . In the following experiment we investigate the influence of the jump intensities of the diffusion coefficient on the convergence rates of the MLMC estimators. In Subsection 8.2.3 we subordinated a Matérn-1.5-GRF with correlation length r 2 = 0.5 by Poisson(5)-processes. In the following experiment we set the correlation length of the GRF W 2 to r 2 = 0.1 and leave all the other parameters unchanged. Figure 8 presents samples of the resulting GRFs with the different correlation lengths. By construction of the diffusion coefficient, the subordination of GRFs with small correlation lengths (right plots in Figure 8) results in higher jump intensities in the diffusion coefficient as the subordination of GRFs with higher correlation lengths (left plots in Figure 8). This relationship is demonstrated in Figure 9 (cf. Figure 6). We use the Uniform Method to compute the RMSE of the MLMC estimators for the sampleadapted and the non-adapted approach using 10 independent MLMC runs on the levels L = 1, . . . , 5, where we set h = h = 0.2 ⋅ 1.7 −( −1) for = 1, . . . , 5. Further, we use a reference solution computed on level 7 with singlelevel Monte Carlo. Figure 10 reveals that the higher jump intensities in the diffusion coefficient have a negative impact on the convergence rates of both estimators: the adapted and the non-adapted FEM MLMC approach. We obtain a convergence rate of approximately 0.85 for the adapted FEM MLMC estimator and a smaller rate of approximately 0.7 for the MLMC estimator with non-adapted triangulations. Compared to the experiment discussed in Subsection 8.2.3, where we used Poisson(5)-subordinators and a higher correlation length in the underlying GRF, we observe that both convergence rates are smaller in the current example. This matches our expectations since the FEM convergence rate has been shown to be influenced by the regularity of the jump diffusion coefficient (see e.g. [11] and [33]). It is also important to mention that the RMSE is significantly smaller for the adapted FEM MLMC estimator due to the higher jump intensities in this example. The higher efficiency of the sample-adapted approach is also demonstrated in the time-to-error plot on the right hand side of 10: In this example we see a significant improvement in the time-to-error plot for the adapted FEM MLMC approach compared to the non-adapted FEM MLMC estimator.

8.3.
Numerical examples for the MLMC-CV-estimator. In the following section, we present numerical examples for the MLMC-CV estimator introduced in Section 7. In Subsection 8.2 we considered Poisson subordinators and compared the non-adapted FEM MLMC estimator with the sample-adapted approach and saw that the latter leads to an improved performance of the estimator. However, this approach is computationally not feasible anymore if we consider subordinators with infinite activity, like Gamma subordinators. The aim of this section is to compare the (non-adapted FEM) MLMC estimator with the MLMC-CV estimator for diffusion coefficients with Gamma-subordinated GRFs.
8.3.1. Gamma subordinators. We approximate the Gamma processes in the same way as we approximate the Poisson subordinators in the approximation approach (see Subsection 8.2.1) and obtain a valid approximation in the sense of Assumption 4.3 v for any η < +∞ (see [8,Section 7.4] and [4]). Since we aim to compare the performance of the MLMC estimator with the MLMC-CV estimator we use optimal sample numbers in the numerical experiments in this subsection: Assume level dependent FE discretization sizes h are given, for = 1, . . . , L with L ∈ N. Further, we denote by V AR the (estimated) variances of u ε W, ,ε l, , − u ε W, −1 ,ε l, −1 , −1 (resp. u CV ε W, ,ε l, , − u CV ε W, −1 ,ε l, −1 , −1 for the MLMC-CV estimator). The optimal sample numbers are then given by the formula since this choice minimizes the variance of the MLMC(-CV) estimator for fixed computational costs (see [21,Section 1.3]). In our numerical experiments we choose l 1 and l 2 to be Gamma (4,10) processes. We set the diffusion cut-off to K = 2 to obtain P( sup t∈[0,1] l j (t) ≥ K) = P(l j (1) ≥ 2) ≈ 3.2042e −06 , for j = 1, 2. Hence, the influence of the subordinator cut-off is again negligible in our numerical experiments. Due to the high jump intensity we have to choose a sufficiently small smoothness parameter ν s since otherwise important detailed information of the diffusion coefficient might be unused. In our two numerical examples, we choose ν s = 0.01 which is small enough for Gamma(4, 10)-subordinators. The expectation of the mean of the control variate E(u (νs) K,A ) is estimated by a non-adapted FEM MLMC estimator on level L (see Remark 7.6). The standard deviation of the W 1 is set to be σ 2 1 = 1.5 2 and the correlation length is defined by r 1 = 0.5. The parameters of the GRF W 2 are varied in our numerical experiments.
8.3.2. MLMC-CV vs. MLMC for infinite activity subordinators. In this numerical example we choose σ 2 2 = 0.3 2 and r 2 = 0.05 for the GRF W 2 . Figure 11 shows samples of the diffusion coefficient and the corresponding PDE solutions. We define the level dependent FE  Figure 12 shows a similar convergence rate of approximately 0.75 for the MLMC and the MLMC-CV estimator. However, the sample-wise correction by the smooth PDE problem in the MLMC-CV estimator improves the approximation which yields significantly smaller values for the RMSE on the different levels compared to the standard MLMC estimator. The efficiency improvement obtained by the Control Variate is further demonstrated on the right hand side of Figure 12: The time-to-error plot demonstrates that the computational effort which is necessary to achieve a certain accuracy is significantly smaller for the MLMC-CV estimator compared to the standard MLMC estimator. In our last numerical example we choose Φ 1 = 1 5 exp(⋅), Φ 2 = 3 ⋅ and σ 2 2 = 0.5 2 , r 2 = 0.2 for the GRF W 2 and leave all other parameters unchanged. This leads to diffusion coefficient which is more noise accentuated with reduced jump intensity (see also Subsection 8.2.4) as can be seen in Figure 13. As in the last experiment, we define the level dependent FE discretization parameters h = 0.3 ⋅ 1.7 −( −1) for = 1, . . . , 5 and compare the MLMC estimator with the MLMC-CV estimator. We use 10 independent MLMC runs on the levels L = 1, . . . , 5 to estimate the RMSE and use a singlelevel Monte Carlo estimation on level 7 as reference solution. The results are given in Figure  14. The reduced jump intensity together with the emphasized (continuous) noise in the diffusion coefficient leads to a slightly improved convergence rate of approximately 0.85 for the estimators in this example (cf. Figure 12). As in the first experiment, we see that the usage of the Control Variate yields a significant improvement which is reflected in smaller values for the RMSE on the different levels compared to the standard MLMC approach. As expected, the right hand side of Figure 14 shows an improved efficiency of the MLMC-CV estimator compared to the standard MLMC estimator without Control Variates.