1 Introduction

The efficient numerical solution of partial differential equations with uncertain inputs is key in forward uncertainty quantification, i.e., the computational quantification of uncertainty of solutions to PDEs with uncertain inputs. It is also crucial in computational inverse uncertainty quantification, e.g. via Markov chain Monte Carlo methods, where numerous numerical solves of the forward model subject to realizations of the uncertain input are required. Here, we consider the linear, elliptic diffusion equation with uncertain coefficient. It models a wide range of phenomena, such as diffusion through a medium with uncertain or even unknown permeability, e.g. in subsurface flow, or light scattering in dust clouds, to name but a few. Physical modelling of subsurface flow in particular stipulates systems of fractures of uncertain geometry with high permeability along fractures (see, e.g., [8] and the references there). With fracture geometry being only statistically known, it is natural in computational uncertainty quantification (UQ) to specify the geometry in a nonparametric fashion, rather than, for instance, through a Gaussian random field (GRF for short) with a known, parametric two-point correlation to be calibrated from experimental data. This function space perspective has also become topical recently in the context of inverse imaging of noisy signals. Modelling with random, fractal geometries has also found applications in biology (roots, lungs [2]). There, Gaussian parametric models have been found computationally efficient due to the availability of padding and circulant embedding based numerics, enabling the use of fast Fourier transform algorithms for sample path generation. However, Gaussian models are perceived as inadequate for the efficient representation of edges and interfaces in imaging. Accordingly, non-parametric representations of inputs with fractal irregularities in sample paths have been proposed recently, e.g. in [19, 23], and the references there.
We also mention the so-called Besov priors in Bayesian inverse problems with elliptic PDE constraints (e.g. [4, 22, 24] and the references there).

In the present paper, we investigate so-called Besov random tree priors [23] as stochastic log-diffusion coefficient in a linear elliptic PDE. These priors are given by a wavelet series with stochastic coefficients, and certain terms in the expansion vanishing at random, according to the law of so-called Galton-Watson trees. Samples of the corresponding random fields involve fractal geometries, hence the Besov random tree prior may be a viable candidate in applications where models based on GRFs do not allow for sufficient flexibility. We quantify the pathwise regularity of the random tree prior in terms of Hölder-regularity, and investigate the forward propagation of the uncertainty in the elliptic PDE model in a bounded domain. All results in the present article encompass the “standard” Besov prior from [24] as a special case, when no terms in the wavelet series are eliminated. As we point out in our analysis, regularity is inherently low, both with respect to the spatial and stochastic domain of the random field. This is taken into account when developing efficient numerical methods for the elliptic PDE problem at hand.

We develop a multilevel Monte Carlo (MLMC) Finite Element (FE) simulation algorithm and furnish its mathematical analysis for the estimation of quantities of interest (QoI) in the forward PDE model. Multilevel Monte Carlo methods ([6, 14, 15]) are, by now, a well-established methodology in computational UQ, and are effective in regimes with comparably low regularity. We mention that our MLMC-FE error analysis includes the case of standard low-regularity Besov priors as a special case. In contrast, higher-order methods that consider an equivalent parametric, deterministic PDE problem, such as (multilevel) Quasi-Monte Carlo ([20]), generalized polynomial chaos (gPC) expansions ([21]), or multilevel Smolyak quadrature ([28]) are not suitable in the present random tree model: The parametrization of the prior involves discontinuities in the stochastic domain, which strongly violates the regularity requirements of the aforementioned higher-order methods. We refer to Appendix A.3 for a detailed discussion. On the other hand, MLMC techniques merely require square-integrability of the pathwise solution.

1.1 Contributions

For a model linear elliptic diffusion equation in a polytopal domain \({\mathcal {D}}\subset {\mathbb {R}}^d\), we provide the mathematical setting and the numerical analysis of an MLMC-FEM for diffusion in random media with log-fractal Besov random tree structure. In particular, we establish well-posedness of the forward problem including strong measurability of random solutions (a key ingredient in the ensuing MLMC-FE convergence analysis), and pathwise almost sure Besov regularity of weak solutions. Technical results of independent interest include: (i) bounds on exponential moments of Besov random variables in Hölder norms, generalizing results in [11, 23, 24], (ii) numerical analysis of elliptic forward problems with fractal coefficient, in particular bounds on the fractal scale truncation error and on the finite element approximation error, as well as the impact of numerical quadrature in view of low (Hölder) path regularity of the random coefficients, (iii) a complete MLMC-FE convergence analysis for estimating the mean of non-linear functionals of the random solution field.

1.2 Preliminaries and notation

We denote by \({\mathcal {V}}'\) the topological dual of any vector space \({\mathcal {V}}\) and by \({}_{{\mathcal {V}}'}\langle \cdot ,\cdot \rangle _{{\mathcal {V}}}\) the associated dual pairing. We write \({\mathcal {X}}\hookrightarrow {\mathcal {Y}}\) for two normed spaces \(({\mathcal {X}}, \left\| \cdot \right\| _{\mathcal {X}}), ({\mathcal {Y}}, \left\| \cdot \right\| _{\mathcal {Y}})\), if \({\mathcal {X}}\) is continuously embedded in \({\mathcal {Y}}\), i.e., there exists \(C>0\) such that \(\Vert \varphi \Vert _{\mathcal {Y}}\le C\Vert \varphi \Vert _{\mathcal {X}}\) holds for all \(\varphi \in {\mathcal {X}}\). The Borel \(\sigma \)-algebra of any metric space \(({\mathcal {X}}, d_{\mathcal {X}})\) is generated by the open sets in \({\mathcal {X}}\) and denoted by \({\mathcal {B}}({\mathcal {X}})\). For any \(\sigma \)-finite and complete measure space \((E,{\mathcal {E}},\mu )\), a Banach space \(({\mathcal {X}}, \left\| \cdot \right\| _{\mathcal {X}})\), and integrability exponent \(p\in [1,\infty ]\), we define the Lebesgue-Bochner spaces

$$\begin{aligned} L^p(E; {\mathcal {X}}):=\{\varphi :E\rightarrow {\mathcal {X}}|\; \varphi \text { is strongly measurable and }\Vert \varphi \Vert _{L^p(E;{\mathcal {X}})}<\infty \}, \end{aligned}$$

where

$$\begin{aligned} \Vert \varphi \Vert _{L^p(E;{\mathcal {X}})}:= {\left\{ \begin{array}{ll} \left( \int _{E}\Vert \varphi (x)\Vert _{\mathcal {X}}^p\mu (dx)\right) ^{1/p},\quad &{}p\in [1,\infty ) \\ {{\,\textrm{ess sup}\,}}_{x\in E} \Vert \varphi (x)\Vert _{\mathcal {X}},\quad &{}p=\infty . \end{array}\right. } \end{aligned}$$

In case that \({\mathcal {X}}={\mathbb {R}}\), we use the shorthand notation \(L^p(E):=L^p(E;{\mathbb {R}})\). If \(E\subset {\mathbb {R}}^d\) is a subset of the Euclidean space, we assume \({\mathcal {E}}={\mathcal {B}}(E)\) and \(\mu \) is the Lebesgue measure, unless stated otherwise. For any bounded and connected spatial domain \({\mathcal {D}}\subset {\mathbb {R}}^d\) we denote for \(k\in {\mathbb {N}}\) and \(p\in [1,\infty ]\) by \(W^{k,p}({\mathcal {D}})\) the standard Sobolev space of functions with weak derivatives up to order \(k\) in \(L^p({\mathcal {D}})\). The Sobolev-Slobodeckij space with fractional order \(s\ge 0\) is denoted by \(W^{s,p}({\mathcal {D}})\). Furthermore, \(H^s({\mathcal {D}}):=W^{s,2}({\mathcal {D}})\) for any \(s\ge 0\) and we use the identification \(H^0({\mathcal {D}})=L^2({\mathcal {D}})\). Given that \({\mathcal {D}}\) is a Lipschitz domain, we define for any \(s>1/2\)

$$\begin{aligned} H_0^s({\mathcal {D}}):=\textrm{ker}(\gamma _0) = \{\varphi \in H^s({\mathcal {D}})|\; \gamma _0(\varphi )=0 \;\text{ on }\; \partial {\mathcal {D}}\}, \end{aligned}$$
(1)

where \(\gamma _0\in {\mathcal {L}}(H^s({\mathcal {D}}),H^{s-1/2}(\partial {\mathcal {D}}))\) denotes the trace operator.

Let \(\textrm{C}(\overline{{\mathcal {D}}})\) denote the space of all continuous functions \(\varphi :\overline{{\mathcal {D}}}\rightarrow {\mathbb {R}}\). For any \(\alpha \in {\mathbb {N}}\), \(\textrm{C}^{\alpha }(\overline{{\mathcal {D}}})\) is the space of all functions \(\varphi \in \textrm{C}(\overline{{\mathcal {D}}})\) with \(\alpha \) continuous partial derivatives. For non-integer \(\alpha >0\), we denote by \(\textrm{C}^{\alpha }(\overline{{\mathcal {D}}})\) the space of all \(\varphi \in \textrm{C}^{\lfloor \alpha \rfloor }(\overline{{\mathcal {D}}})\) with \(\alpha -\lfloor \alpha \rfloor \)-Hölder continuous \(\lfloor \alpha \rfloor \)-th partial derivatives. For any positive, real \(\alpha >0\) we further denote by \({\mathcal {C}}^\alpha ({\mathcal {D}})\) the Hölder-Zygmund space of smoothness \(\alpha \). We refer to, e.g., [25, Section 1.2.2] for a definition. We denote by \(\textrm{S}({\mathbb {R}}^d)\) the Schwartz space of all smooth, rapidly decaying functions, and with \(\textrm{S}'({\mathbb {R}}^d)\) its dual, the space of tempered distributions. Moreover, for any open set \(O\subseteq {\mathbb {R}}^d\), \(\mathrm D(O)\) denotes the space of all smooth functions \(\varphi \in \textrm{C}^\infty (O)\) with compact support in O.

For the finite element error analysis we introduce a countable set \({\mathfrak {H}}\subset (0,\infty )\) of discretization parameters, and denote by \(h\in {\mathfrak {H}}\) a generic discretization parameter, such as in the present paper the FE meshwidth of a regular, simplicial and quasi-uniform partition of the physical domain. We further assume there exists a decreasing sequence \((h_\ell , \ell \in {\mathbb {N}})\subset {\mathfrak {H}}\) such that \(\lim _{\ell \rightarrow \infty } h_\ell = 0\).

1.3 Layout of this paper

In Sect. 2 we introduce the class of random fields taking values in the Besov spaces \(B^s_{p,p}\) which we will use in the sequel to model the logarithm of the diffusion coefficient function. Using multiresolution (“wavelet”) bases in \(B^s_{p,p}\), in Sects. 2.2 and 2.3 we construct probability measures on \(B^s_{p,p}\) in the spirit of the Gaussian measure on path space for the Wiener process, in Lévy-Ciesielski representation. The multilevel structure of the construction will be essential in the ensuing MLMC-FE convergence analysis and its algorithmic realization. In Sect. 3 we introduce the linear, elliptic divergence-form PDE with Besov-tree coefficients. We recapitulate (mostly known) results on existence, uniqueness and on strong measurability of random solutions. In Sect. 4 we introduce a conforming Galerkin Finite Element discretization based on continuous, piecewise linear approximations in the physical domain. We account for the error due to finite truncation of the random tree priors, and provide sharp error bounds for the Finite Element discretization errors, under the (generally low) Besov regularity of the coefficient samples. Section 5 then addresses the MLMC-FE error analysis, also for Fréchet-differentiable, possibly nonlinear functionals. Section 6 illustrates the theory with several numerical experiments, where the impact of the parameter choices in the Besov random tree priors on the overall error convergence of the MLMC-FEM algorithms is studied. Section 7 provides a brief summary of the main results, and indicates several generalizations of these and directions of further research. Appendix A collects definitions and key properties of Galton-Watson trees which are used in the main text. Appendix B provides a detailed description of the FE implementation in the experiments reported in Sect. 6.

2 Random variables in Besov spaces

2.1 Wavelet representation of Besov spaces

Let \({\mathbb {T}}^d:=[0,1]^d\) denote the d-dimensional torus for \(d\in {\mathbb {N}}\). We briefly recall the construction of orthonormal wavelet bases on \(L^2({\mathbb {R}}^d)\) and \(L^2({\mathbb {T}}^d)\) and the wavelet representation of the associated Besov spaces. For more detailed accounts we refer to [26, Chapter 1], [27, Chapter 1.2], and to [12, Chapter 5] for orthonormal wavelets in multiresolution analysis (MRA).

2.1.1 Univariate MRA

Let \(\phi \) and \(\psi \) be compactly supported scaling and wavelet functions in \(\textrm{C}^{\alpha }({\mathbb {R}})\), \(\alpha \ge 1\), suitable for multi-resolution analysis in \(L^2({\mathbb {R}})\). We assume that \(\psi \) satisfies the vanishing moment condition

$$\begin{aligned} \int _{\mathbb {R}}\psi (x) x^m dx = 0, \quad m\in {\mathbb {N}}_0,\; m<\alpha . \end{aligned}$$
(2)

One example are Daubechies wavelets with \( M:= \lfloor \alpha \rfloor \in {\mathbb {N}}\) vanishing moments (also known as \({\mathrm D}{\mathrm B}(\lfloor \alpha \rfloor )\)-wavelets), which have support \([-M+1,M]\) and are in \(\textrm{C}^1({\mathbb {R}})\) for \(M\ge 5\) (see, e.g., [12, Section 7.1]). For any \(j\in {\mathbb {N}}_0\) and \(k\in {\mathbb {Z}}\), the MRA is defined by the dilated and translated functions

$$\begin{aligned} \psi _{j,k,0}(x):=\phi (2^jx-k), \quad \text {and} \quad \psi _{j,k,1}(x):=\psi (2^jx-k), \quad x\in {\mathbb {R}}. \end{aligned}$$
(3)

As \(\Vert \phi \Vert _{L^2({\mathbb {R}})}=\Vert \psi \Vert _{L^2({\mathbb {R}})}=1\), it follows that \(((\psi _{0,k,0}), k\in {\mathbb {Z}})\,\cup \,((2^{j/2}\psi _{j,k,1}), (j,k)\in {\mathbb {N}}_0\times {\mathbb {Z}})\) is an orthonormal basis of \(L^2({\mathbb {R}})\).

2.1.2 Multivariate MRA

A corresponding isotropic (see Footnote 1) wavelet basis that is orthonormal in \(L^2({\mathbb {R}}^d)\), \(d\ge 2\), may be constructed by tensorization of univariate MRAs. We define index sets \({\mathcal {L}}_0:=\{0,1\}^d\) and \({\mathcal {L}}_j:={\mathcal {L}}_0\setminus \{(0,\dots ,0)\}\) for \(j\in {\mathbb {N}}\). We note that \({\mathcal {L}}_j\) has cardinality \(|{\mathcal {L}}_j|=2^d\) if \(j=0\), and \(|{\mathcal {L}}_j|=2^d-1\) otherwise. For any \(l\in {\mathcal {L}}_0\), we define furthermore

$$\begin{aligned} \psi _{j,k,l}(x):=2^{dj/2}\prod _{i=1}^d\psi _{j,k_i,l(i)}(x_i), \quad j\in {\mathbb {N}}_0,\; k\in {\mathbb {Z}}^d,\; x\in {\mathbb {R}}^d, \end{aligned}$$
(4)

to obtain that \(((\psi _{j,k,l}),\; j\in {\mathbb {N}}_0,\, k\in {\mathbb {Z}}^d,\, l\in {\mathcal {L}}_j)\) is an orthonormal basis of \(L^2({\mathbb {R}}^d)\).

Orthonormal bases consisting of locally supported, periodic functions on the torus \({\mathbb {T}}^d\) can be introduced by tensorization, as e.g. in [26, Section 1.3]. Given \(\phi \) and \(\psi \), we fix a scaling factor \(w\in {\mathbb {N}}\) such that

$$\begin{aligned} \text {supp}(\psi _{w,0,l})\subset \left\{ x\in {\mathbb {R}}^d\big |\, \Vert x\Vert _2<\frac{1}{2}\right\} , \quad l\in {\mathcal {L}}_0. \end{aligned}$$

With this choice of w, it follows for \(j\in {\mathbb {N}}_0\) that

$$\begin{aligned} \text {supp}(\psi _{j+w,0,l})\subset \left\{ x\in {\mathbb {R}}^d\big |\, \Vert x\Vert _2<2^{-j-1}\right\} . \end{aligned}$$

Now let \(K_{j}:=\{k\in {\mathbb {Z}}^d|\,0\le k_1,\dots ,k_d<2^{j}\}\subset 2^{j}{\mathbb {T}}^d\) and note that \(|K_{j+w}|=2^{d(j+w)}\). Define the one-periodic wavelet functions

$$\begin{aligned} \psi _{j,k,l}^{per}(x):=\sum _{n\in {\mathbb {Z}}^d} \psi _{j,k,l}(x-n), \quad j\in {\mathbb {N}}_0,\; k\in K_j,\;l\in {\mathcal {L}}_0,\;x\in {\mathbb {R}}^d, \end{aligned}$$

and their restrictions to \({\mathbb {T}}^d\) by

$$\begin{aligned} \psi _{j,k}^{l}(x):= \psi _{j,k,l}^{per}(x), \quad j\in {\mathbb {N}}_0,\; k\in K_j,\;l\in {\mathcal {L}}_0,\;x\in {\mathbb {T}}^d. \end{aligned}$$
(5)
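The periodization step can be illustrated with a minimal Python sketch in one dimension. As an assumption made purely for brevity, it uses the Haar wavelet, which is compactly supported and orthonormal but is not in \(\textrm{C}^{\alpha }({\mathbb {R}})\) for \(\alpha \ge 1\) and therefore does not satisfy the smoothness requirements of this section. All function names are hypothetical, and the wavelets carry the \(L^2\)-normalization \(2^{j/2}\).

```python
import math


def haar_mother(x: float) -> float:
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def psi_jk(x: float, j: int, k: int) -> float:
    """L2(R)-normalized dilate and translate 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_mother(2.0 ** j * x - k)


def psi_per(x: float, j: int, k: int) -> float:
    """One-periodic wavelet: sum of psi_jk(x - n) over integer shifts n.

    psi_jk is supported in [k 2^-j, (k+1) 2^-j], so only finitely many
    shifts n can contribute at a given point x.
    """
    lo = math.floor(x - (k + 1) * 2.0 ** (-j))
    hi = math.ceil(x - k * 2.0 ** (-j))
    return sum(psi_jk(x - n, j, k) for n in range(lo, hi + 1))


def inner(jk_a, jk_b, n: int = 1024) -> float:
    """Midpoint-rule L2([0,1)) inner product of two periodic wavelets.

    The rule is exact (up to rounding) for Haar functions whose level j
    is small compared to log2(n).
    """
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += psi_per(x, *jk_a) * psi_per(x, *jk_b)
    return total / n
```

For instance, `psi_per(0.25, 1, 0)` and `psi_per(1.25, 1, 0)` coincide, and `inner((1, 0), (1, 1))` vanishes, in line with periodicity and orthonormality on \({\mathbb {T}}^1\).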

We now obtain for the index set \({\mathcal {I}}_{w}:=\{j\in {\mathbb {N}}_0,\; k\in K_{j+w},\;l\in {\mathcal {L}}_j\}\) that

$$\begin{aligned} {\varvec{\Psi }}_w:=\left( (\psi _{j+w,k}^l),\;(j,k,l)\in {\mathcal {I}}_w \right) \end{aligned}$$
(6)

is a \(L^2({\mathbb {T}}^d)\)-orthonormal basis, see [26, Proposition 1.34]. We next introduce Besov spaces via suitable wavelet-characterization as developed in [26, Chapter 1.3]. For this we introduce the set of one-periodic distributions on \({\mathbb {R}}^d\) given by

$$\begin{aligned} \textrm{S}'^{per}({\mathbb {R}}^d):=\{\varphi \in \textrm{S}'({\mathbb {R}}^d)\big |\; \varphi (\cdot - k) = \varphi \quad \text {for all }k\in {\mathbb {Z}}^d\}, \end{aligned}$$

see [26, Eq. (1.131)]. We distinguish between spaces of one-periodic functions on \({\mathbb {R}}^d\) and their restrictions to \({\mathbb {T}}^d\):

Definition 2.1

  1.

    For any \(p\in [1,\infty )\) and \(s\in (0,\alpha )\) the Besov space \(B_{p,p}^{s,per}({\mathbb {R}}^d)\) of one-periodic functions on \({\mathbb {R}}^d\) is given by

    $$\begin{aligned} B_{p,p}^{s,per}({\mathbb {R}}^d):= \left\{ \varphi \in \textrm{S}'^{per}({\mathbb {R}}^d)\bigg |\; \sum _{(j,k,l)\in {\mathcal {I}}_w} 2^{jp\left( s+\frac{d+w}{2}-\frac{d}{p}\right) } |(\varphi ,\psi _{j+w,k,l}^{per})_{L^2({\mathbb {T}}^d)}|^p <\infty \right\} .\nonumber \\ \end{aligned}$$
    (7)

    In case that \(p=\infty \), one has

    $$\begin{aligned} B_{\infty ,\infty }^{s,per}({\mathbb {R}}^d):= \left\{ \varphi \in \textrm{S}'^{per}({\mathbb {R}}^d)\bigg |\; \sup _{(j,k,l)\in {\mathcal {I}}_w} 2^{j\left( s+\frac{d+w}{2}\right) } |(\varphi ,\psi _{j+w,k,l}^{per})_{L^2({\mathbb {T}}^d)}| <\infty \right\} .\nonumber \\ \end{aligned}$$
    (8)
  2.

    For any \(p\in [1,\infty )\) and \(s\in (0,\alpha )\) the Besov space \(B_{p,p}^s({\mathbb {T}}^d)\) on \({\mathbb {T}}^d\) is given by

    $$\begin{aligned} B_{p,p}^s({\mathbb {T}}^d):= \left\{ \varphi \in \mathrm D'({\mathbb {T}}^d) \bigg |\; \sum _{(j,k,l)\in {\mathcal {I}}_w} 2^{jp\left( s+\frac{d+w}{2}-\frac{d}{p}\right) } |(\varphi ,\psi _{j+w,k}^l)_{L^2({\mathbb {T}}^d)}|^p <\infty \right\} . \end{aligned}$$
    (9)

    In case that \(p=\infty \), we set

    $$\begin{aligned} B_{\infty ,\infty }^s({\mathbb {T}}^d):= \left\{ \varphi \in \mathrm D'({\mathbb {T}}^d) \bigg |\; \sup _{(j,k,l)\in {\mathcal {I}}_w} 2^{j\left( s+\frac{d+w}{2}\right) } |(\varphi ,\psi _{j+w,k}^l)_{L^2({\mathbb {T}}^d)}| <\infty \right\} . \end{aligned}$$
    (10)

Remark 2.2

Definition 2.1 may be generalized to define the spaces \(B_{p,q}^{s,per}({\mathbb {R}}^d)\) and \(B_{p,q}^s({\mathbb {T}}^d)\) with \(p,q\in [1,\infty ]\) and \(p\ne q\), see [26, Chapter 1.3]. The random fields introduced in Sects. 2.2 and 2.3 are naturally \(B_{p,p}^s({\mathbb {T}}^d)\)-valued by construction, thus we only treat the case \(p=q\) for the sake of brevity. By [26, Theorem 1.29] there exists a prolongation isomorphism

$$\begin{aligned} \textrm{prl}^{per}:B_{p,p}^s({\mathbb {T}}^d)\rightarrow B_{p,p}^{s,per}({\mathbb {R}}^d), \end{aligned}$$
(11)

that extends \(\varphi \in B_{p,p}^s({\mathbb {T}}^d)\) to its (unique) one-periodic counterpart \(\textrm{prl}^{per}(\varphi )\in B_{p,p}^{s,per}({\mathbb {R}}^d)\). This in turn allows to identify any \(\varphi \in B_{p,p}^s({\mathbb {T}}^d)\) as the restriction of a periodic function \(\textrm{prl}^{per}(\varphi )\in B_{p,p}^{s,per}({\mathbb {R}}^d)\) to \({\mathbb {T}}^d\). We use \(\textrm{prl}^{per}\) to define (non-periodic) Besov space-valued random variables on subsets \({\mathcal {D}}\subset {\mathbb {T}}^d\) by restriction in Sect. 3.2.

Definition 2.1 is based on an equivalent characterization of the spaces \(B_{p,p}^{s,per}({\mathbb {R}}^d)\) and \(B_{p,p}^s({\mathbb {T}}^d)\). They are often (equivalently) defined using a dyadic partition of unity (see e.g. [26, Definitions 1.22 and 1.27]): Using the latter definition for \(B_{p,p}^{s,per}({\mathbb {R}}^d)\), [26, Theorem 1.36(i)] shows that the spaces (7) resp. (8) are isometrically isomorphic to \(B_{p,p}^{s,per}({\mathbb {R}}^d)\). As a consequence of the prolongation map \(\textrm{prl}^{per}\) in (11), the same holds true for the spaces \(B_{p,p}^s({\mathbb {T}}^d)\), see [26, Theorem 1.37(i)].

2.1.3 Besov spaces and MRAs

We define the subspace \(V_{w+1}:=\text {span}\{\psi _{w,k}^l|\; k\in K_w,\ l\in {\mathcal {L}}_0\}\subset L^2({\mathbb {T}}^d)\) and observe that \(\dim (V_{w+1})=2^{d(w+1)}\). By the multiresolution analysis for one-periodic, univariate functions in [12, Chapter 9.3], it follows that \(( (\psi _{j,k}^l),\;j\le w,\; k\in K_j,\ l\in {\mathcal {L}}_j)\) is another orthonormal basis of \(V_{w+1}\). Hence, we may replace the first \(2^{d(w+1)}\) basis functions in (5) to obtain the (computationally more convenient) \(L^2({\mathbb {T}}^d)\)-orthonormal basis

$$\begin{aligned} {\varvec{\Psi }}:=\left( (\psi _{j,k}^l),\;(j,k,l)\in {\mathcal {I}}_{{\varvec{\Psi }}} \right) , \quad {\mathcal {I}}_{{\varvec{\Psi }}}:=\{j\in {\mathbb {N}}_0,\; k\in K_j,\;l\in {\mathcal {L}}_j\}. \end{aligned}$$
(12)
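The bookkeeping behind the index set \({\mathcal {I}}_{{\varvec{\Psi }}}\) can be checked with a short Python sketch. It enumerates \({\mathcal {L}}_j\) and \(K_j\) as defined above and counts the indices up to a finest level, which should reproduce the dimension count \(2^{d(w+1)}\) for levels \(j\le w\). The function names are hypothetical.

```python
from itertools import product


def type_indices(j: int, d: int):
    """L_j: all of {0,1}^d for j = 0, and {0,1}^d without (0,...,0) for j >= 1."""
    all_types = list(product((0, 1), repeat=d))
    return all_types if j == 0 else [l for l in all_types if any(l)]


def translation_indices(j: int, d: int):
    """K_j: integer vectors k with 0 <= k_i < 2^j in each coordinate."""
    return list(product(range(2 ** j), repeat=d))


def index_set(max_level: int, d: int):
    """All (j, k, l) in the index set of (12) with level j <= max_level."""
    return [
        (j, k, l)
        for j in range(max_level + 1)
        for k in translation_indices(j, d)
        for l in type_indices(j, d)
    ]
```

With `d = 2` and `max_level = 3`, the level-wise counts are 4, 12, 48 and 192, which sum to \(2^{2\cdot 4}=256\).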

Based on (12), we define for \(s>0\), \(p\in [1,\infty )\), and sufficiently regular \(\varphi \in L^2({\mathbb {T}}^d)\) the Besov norms

$$\begin{aligned} \Vert \varphi \Vert _{B_{p,p}^s({\mathbb {T}}^d)}:= \left( \sum _{(j,k,l)\in {\mathcal {I}}_{{\varvec{\Psi }}}} 2^{jp\left( s+\frac{d}{2}-\frac{d}{p}\right) } |(\varphi ,\psi _{j,k}^l)_{L^2({\mathbb {T}}^d)}|^p \right) ^{1/p}, \end{aligned}$$
(13)

and

$$\begin{aligned} \Vert \varphi \Vert _{B_{\infty ,\infty }^s({\mathbb {T}}^d)}:= \sup _{(j,k,l)\in {\mathcal {I}}_{{\varvec{\Psi }}}} 2^{j\left( s+\frac{d}{2}\right) } |(\varphi ,\psi _{j,k}^l)_{L^2({\mathbb {T}}^d)}| <\infty . \end{aligned}$$
(14)

By Definition 2.1, it follows that \(\varphi \in B_{p,p}^s({\mathbb {T}}^d)\) if and only if \(\Vert \varphi \Vert _{B_{p,p}^s({\mathbb {T}}^d)}<\infty \).
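As a minimal sketch, the norms (13) and (14) can be evaluated for a finitely supported coefficient sequence as follows. The dictionary layout mapping an index \((j,k,l)\) to the coefficient \((\varphi ,\psi _{j,k}^l)_{L^2({\mathbb {T}}^d)}\) and the function name are assumptions made for illustration.

```python
import math


def besov_norm(coeffs: dict, s: float, p: float, d: int) -> float:
    """Wavelet-based Besov norm of a finite coefficient dictionary.

    coeffs maps (j, k, l) to the wavelet coefficient; only the level j
    enters the weight. Use p = math.inf for the supremum norm (14).
    """
    if math.isinf(p):
        return max(
            (2.0 ** (j * (s + d / 2)) * abs(c) for (j, _k, _l), c in coeffs.items()),
            default=0.0,
        )
    total = sum(
        2.0 ** (j * p * (s + d / 2 - d / p)) * abs(c) ** p
        for (j, _k, _l), c in coeffs.items()
    )
    return total ** (1.0 / p)
```

A single unit coefficient at level \(j=1\) with \(s=1\), \(p=2\), \(d=1\) has weight \(2^{2}\) and hence norm 2, whereas the same coefficient at level \(j=0\) has norm 1.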

2.1.4 Notation

We fix some notation for Besov, Hölder and Zygmund spaces to be used in the remainder of this paper. As the (periodic) domain \({\mathbb {T}}^d\) does not vary in the subsequent analysis, we use the abbreviations \(B_p^s:=B_{p,p}^s({\mathbb {T}}^d)\), \(\textrm{C}^\alpha :=\textrm{C}^\alpha ({\mathbb {T}}^d)\) and \({\mathcal {C}}^\alpha :={\mathcal {C}}^\alpha ({\mathbb {T}}^d)\) for convenience in the following. Furthermore, we will assume that the basis functions in \({\varvec{\Psi }}_w, {\varvec{\Psi }}\subset \textrm{C}^\alpha \), are sufficiently smooth with Hölder index \(\alpha >s\) for given \(s>0\), and therefore omit the restriction \(s\in (0,\alpha )\) in the following. In this case, it holds that \({\varvec{\Psi }}_w\) (and thus \({\varvec{\Psi }}\)) is a basis of \(B_p^s\) for \(p<\infty \), see [26, Theorem 1.37].

We recall that for any \(s>0\) there holds \({\mathcal {C}}^s=B^s_\infty \) (see [26, Remark 1.28]), as well as \(\textrm{C}^s={\mathcal {C}}^s\) for \(s\in (0,\infty )\setminus {\mathbb {N}}\), and \(\textrm{C}^s\subsetneq {\mathcal {C}}^s\) for \(s\in {\mathbb {N}}\) (see [25, Section 1.2.2]). By (13) and (14) we further obtain the continuous embeddings

$$\begin{aligned} \begin{aligned} B_p^s\hookrightarrow B_q^t&\quad \text {if }1\le p\le q<\infty \text { and }s-\frac{d}{p}\ge t-\frac{d}{q}, \\ B_p^s\hookrightarrow B^t_\infty ={\mathcal {C}}^t&\quad \text {for }t\in \left( 0,s-\frac{d}{p}\right] , \end{aligned} \end{aligned}$$
(15)

with the embedding constants in (15) bounded by one (cf. [27, Chapter 2.1]).
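The bound on the embedding constants can be seen by a short computation, sketched here under the assumption that the norms (13) and (14) are used on both sides. Write \(c_{j,k}^l:=(\varphi ,\psi _{j,k}^l)_{L^2({\mathbb {T}}^d)}\) and \(a_{j,k}^l:=2^{j\left( s+\frac{d}{2}-\frac{d}{p}\right) }|c_{j,k}^l|\), so that \(\Vert \varphi \Vert _{B_p^s}=\Vert a\Vert _{\ell ^p}\). Since \(j\ge 0\) and \(t-\frac{d}{q}\le s-\frac{d}{p}\),

$$\begin{aligned} 2^{j\left( t+\frac{d}{2}-\frac{d}{q}\right) }|c_{j,k}^l| = 2^{j\left( \left( t-\frac{d}{q}\right) -\left( s-\frac{d}{p}\right) \right) }\, a_{j,k}^l \le a_{j,k}^l, \end{aligned}$$

and the elementary inequality \(\Vert a\Vert _{\ell ^q}\le \Vert a\Vert _{\ell ^p}\) for \(p\le q\) then yields \(\Vert \varphi \Vert _{B_q^t}\le \Vert \varphi \Vert _{B_p^s}\). The case \(q=\infty \) follows in the same way with \(\frac{d}{q}\) replaced by zero and \(t\le s-\frac{d}{p}\).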

2.2 Besov priors

To introduce Besov space-valued random variables as in [24], we consider a complete probability space \((\Omega ,{\mathcal {A}}, P)\). Following the constructions in [4, 11, 23], based on the representation in (9), we now define \(B_p^s\)-valued random variables by replacing the \(L^2({\mathbb {T}}^d)\)-orthogonal projection coefficients \((\varphi ,\psi _{j,k}^l)_{L^2({\mathbb {T}}^d)}\) with suitable random variables. More precisely, consider for any \(p\in [1,\infty )\) an independent and identically distributed (i.i.d.) sequence \(X=((X_{j,k}^l), (j,k,l)\in {{\mathcal {I}}_{\varvec{\Psi }}})\) of p-exponential random variables. That is, each \(X_{j,k}^l:\Omega \rightarrow {\mathbb {R}}\) is \({\mathcal {A}}/{\mathcal {B}}({\mathbb {R}})\)-measurable with density

$$\begin{aligned} \phi _p(x):=\frac{1}{c_p}\exp \left( -\frac{|x|^p}{\kappa }\right) , \quad x\in {\mathbb {R}}, \qquad c_p:= \int _{\mathbb {R}}\exp \left( -\frac{|x|^p}{\kappa }\right) dx, \end{aligned}$$
(16)

where \(\kappa >0\) is a fixed scaling parameter. We recover the normal distribution with variance \(\frac{\kappa }{2}\) if \(p=2\), and the Laplace distribution with scaling \(\kappa \) for \(p=1\).
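Samples with density (16) can be drawn with the Python standard library alone, as the following sketch shows. It relies on the substitution \(y=|x|^p/\kappa \), under which \(|X|^p/\kappa \) is Gamma-distributed with shape \(1/p\) and unit scale, and which also gives \(c_p=2\kappa ^{1/p}\Gamma (1+1/p)\). The function names are hypothetical.

```python
import math
import random


def normalizing_constant(p: float, kappa: float) -> float:
    """c_p: integral of exp(-|x|^p / kappa) over R, in closed form."""
    return 2.0 * kappa ** (1.0 / p) * math.gamma(1.0 + 1.0 / p)


def sample_p_exponential(p: float, kappa: float, rng: random.Random) -> float:
    """Draw one sample with density proportional to exp(-|x|^p / kappa).

    If G ~ Gamma(shape=1/p, scale=1), then (kappa * G)^(1/p) has the law
    of |X|; a fair random sign symmetrizes the sample.
    """
    g = rng.gammavariate(1.0 / p, 1.0)
    magnitude = (kappa * g) ** (1.0 / p)
    return magnitude if rng.random() < 0.5 else -magnitude
```

For \(p=2\) the constant reduces to \(\sqrt{\pi \kappa }\), consistent with a normal distribution of variance \(\kappa /2\), and for \(p=1\) it reduces to \(2\kappa \), the normalization of the Laplace density.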

Definition 2.3

[24, Definition 9] Let \({\varvec{\Psi }}\) be the \(L^2({\mathbb {T}}^d)\)-orthogonal wavelet basis as in (12), let \(s>0\), \(p\in [1,\infty )\) and let \(X=((X_{j,k}^l), (j,k,l)\in {{\mathcal {I}}_{\varvec{\Psi }}})\) be an i.i.d. sequence of p-exponential random variables with density \(\phi _p\) as in (16). Let the random field \(b:\Omega \rightarrow L^2({\mathbb {T}}^d)\) be given by

$$\begin{aligned} b(\omega ):= \sum _{(j,k,l)\in {{\mathcal {I}}_{\varvec{\Psi }}}} \eta _j X_{j,k}^l(\omega )\psi _{j,k}^l, \quad \omega \in \Omega , \quad \text {where}\quad \eta _j:= 2^{-j\left( s+\frac{d}{2}-\frac{d}{p}\right) }, \quad j\in {\mathbb {N}}_0.\nonumber \\ \end{aligned}$$
(17)

We call b a \(B_p^s\)-random variable.

The random variables b from Definition 2.3 are also referred to as Besov priors in the literature on inverse problems. The following regularity results are well-known:

Proposition 2.4

 

  1.

    ([24, Lemma 10], [11, Proposition 1]) Let b be a \(B_p^s\)-random variable for \(s>0\) and \(p\in [1,\infty )\). Then, the following conditions are equivalent:

    (i)

      \(\Vert b\Vert _{B_p^t}<\infty \) holds P-a.s.;

    (ii)

      \({\mathbb {E}}\left( \exp \left( \varepsilon \Vert b\Vert _{B_p^t}^p\right) \right) <\infty \),    \(\varepsilon \in \left( 0,\frac{1}{4 \kappa }\right) \);

    (iii)

      \(t<s-\frac{d}{p}\).

  2.

    [11, Theorem 2.1] If, in addition, \({\varvec{\Psi }}\) forms a basis of \(B_p^t\) for a \(t<s-\frac{d}{p}\), \(t\notin {\mathbb {N}}\), then it holds

    $$\begin{aligned} {\mathbb {E}}\left( \exp \left( \varepsilon \Vert b\Vert _{\textrm{C}^t}\right) \right) <\infty ,\quad \varepsilon \in (0,\overline{\varepsilon }), \end{aligned}$$

    where \(\overline{\varepsilon }>0\) is a constant depending on \(p\), \(d\), \(s\) and \(t\).

Remark 2.5

Note that, by the previous proposition, \(B_p^s\)-random variables as defined above only take values in \(B_p^t\) for \(t<s-\frac{d}{p}\). Nevertheless, we use the notion of \(B_p^s\)-random variables in the following, to emphasize the dependence of \(\eta _j\) in (17) on s.
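The threshold \(t<s-\frac{d}{p}\) can be made plausible by a short moment computation, given here as a heuristic sketch for the expectation only and not as a substitute for the cited proofs. Inserting (17) into (13), the weights combine to \(2^{jp\left( t+\frac{d}{2}-\frac{d}{p}\right) }\eta _j^p=2^{-jp(s-t)}\), and level j contains \(|K_j|\,|{\mathcal {L}}_j|=2^{dj}|{\mathcal {L}}_j|\) i.i.d. coefficients, so that

$$\begin{aligned} {\mathbb {E}}\left( \Vert b\Vert _{B_p^t}^p\right) = {\mathbb {E}}\left( |X_{0,0}^{(0,\dots ,0)}|^p\right) \sum _{j\in {\mathbb {N}}_0} |{\mathcal {L}}_j|\, 2^{j\left( d-p(s-t)\right) }. \end{aligned}$$

The geometric series converges if and only if \(d-p(s-t)<0\), that is, \(t<s-\frac{d}{p}\). The prefactor is finite for every \(p\in [1,\infty )\): the substitution \(y=|x|^p/\kappa \) in (16) shows that \(|X_{j,k}^l|^p/\kappa \) is Gamma-distributed with shape \(1/p\), hence \({\mathbb {E}}(|X_{j,k}^l|^p)=\kappa /p\).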

We derive a considerably stronger version of [11, Theorem 2.1] in Theorem 2.11 below, that implies in particular

$$\begin{aligned} {\mathbb {E}}\left( \exp \left( \varepsilon \Vert b\Vert ^p_{\textrm{C}^t}\right) \right) <\infty ,\quad \varepsilon \in (0,\overline{\varepsilon }), \end{aligned}$$

for any \(p\ge 1\) and some \(\overline{\varepsilon }>0\). In the Gaussian case with \(p=2\), this estimate would be a consequence of Fernique’s theorem; however, we are not aware of a similar result for arbitrary \(p\ge 1\) in the literature.

We recall from [26, Theorem 1.37] that \({\varvec{\Psi }}\) forms an unconditional basis of \(B_p^t\) (since \(p<\infty \)), if the scaling and wavelet functions \(\phi \) and \(\psi \) satisfy \(\phi , \psi \in \textrm{C}^\alpha ({\mathbb {R}}^d)\) for \(\alpha>t>0\) and the vanishing moment condition (2).

2.3 Besov random tree priors

Taking the cue from [23], we introduce Besov random tree priors in this subsection and derive several regularity results for this \(B_p^s\)-valued random variable. We investigate all results for periodic functions defined on the torus \({\mathbb {T}}^d\) in this subsection. For the elliptic problem in Sect. 3, we will later introduce the corresponding \(B_p^s({\mathcal {D}})\)-valued random variables on physical domains \({\mathcal {D}}\subset {\mathbb {R}}^d\) with \({\mathcal {D}}\subseteq {\mathbb {T}}^d\) by their restrictions from \({\mathbb {T}}^d\) (cf. Definition 3.6). The random tree structure in our prior construction is based on certain set-valued random variables, so-called Galton-Watson (GW) trees. For the readers’ convenience, definitions of discrete trees, GW trees, along with some other useful results, are listed in Appendix A.

Definition 2.6

[23, Definition 3] Let \({\varvec{\Psi }}\), \(s>0\), \(p\in [1,\infty )\), \(X=((X_{j,k}^l), (j,k,l)\in {{\varvec{{\mathcal {I}}}}_{\varvec{\Psi }}})\) and \((\eta _j,j\in {\mathbb {N}}_0)\) be as in Definition 2.3.

Let \({\mathfrak {T}}\) denote the set of all trees with no infinite node (cf. Definition A.1) and let \(T:\Omega \rightarrow {\mathfrak {T}}\) be a GW tree (cf. Definition A.3) with offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\) for \(\beta \in [0,1]\), and independent of X. Furthermore, let \(\mathfrak I_T\) be the set of wavelet indices associated to T from (78).

We define the random tree index set \({\mathcal {I}}_T(\omega ):=\{(j,k,l)|\; (j,k)\in \mathfrak I_T(\omega ),\; l\in {\mathcal {L}}_j\}\) and

$$\begin{aligned} b_T(\omega ):= \sum _{(j,k,l)\in {\mathcal {I}}_T(\omega )} \eta _j X_{j,k}^l(\omega )\psi _{j,k}^l, \quad \omega \in \Omega . \end{aligned}$$
(18)

We refer to \(b_T\) as a \(B_p^s\)-random variable with wavelet density \(\beta \).
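A truncated sample of the random index set can be sketched in Python for \(d=1\). Two assumptions are made here, since the node-to-index map (78) is only given in the appendix: each of the two children of an active node is kept independently with probability \(\beta \), and the children of the index \((j,k)\) are \((j+1,2k)\) and \((j+1,2k+1)\). The function name is hypothetical.

```python
import random


def sample_tree_indices(beta: float, max_level: int, rng: random.Random):
    """Sample wavelet indices (j, k) of a binomial GW tree for d = 1.

    Assumption: node (j, k) has the dyadic children (j+1, 2k) and
    (j+1, 2k+1), each kept independently with probability beta. The
    tree is truncated after max_level generations.
    """
    active = [(0, 0)]  # the root is always present
    frontier = [0]     # translation indices k of the current generation
    for j in range(max_level):
        next_frontier = []
        for k in frontier:
            for child in (2 * k, 2 * k + 1):
                if rng.random() < beta:
                    next_frontier.append(child)
        active.extend((j + 1, k) for k in next_frontier)
        frontier = next_frontier
    return active
```

With `beta = 1.0` every node survives and the full index set with \(2^j\) translations per level is recovered, which corresponds to the Besov prior of Definition 2.3. With `beta = 0.0` only the root remains.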

We have depicted a sample of a binomial GW tree and the associated set of wavelet indices \(\mathfrak I_T\) for a series expansion in one physical dimension (\(d=1\)) in Fig. 1. Recall that for \(d=1\) there holds \({\mathcal {L}}_j=\{1\}\) for \(j\ge 1\).

Fig. 1. Sample of a binomial GW tree T with offspring distribution \({\mathcal {P}}(2, \beta )\) with \(\beta =0.5\). Each box corresponds to a node \(\mathfrak n\) of the GW tree, displayed are all nodes with length \(|\mathfrak n|\le 3\). The left entry in each box is the node \(\mathfrak n\in T\), given by a finite sequence of integers. The right entry is the node-to-wavelet coefficient map \(\mathfrak n\mapsto \left( |\mathfrak n|, \mathfrak I^2_{d,|\mathfrak n|}\circ \mathfrak I^1_{d,|\mathfrak n|}(\mathfrak n)\right) \) evaluated at \(\mathfrak n\), that determines the associated wavelet indices \((j,k)\). Starting from the “root node” \(\varrho =()\), the two children of each node are eliminated with probability \(1-\beta \), and independent of each other. Once a node is eliminated (signified by an “(X)”), all of its offspring nodes are eliminated as well. The remaining “surviving” nodes determine the associated random set of active wavelet indices via \(\mathfrak I_T:=\left\{ \left( |\mathfrak n|, \mathfrak I^2_{d,|\mathfrak n|}\circ \mathfrak I^1_{d,|\mathfrak n|}(\mathfrak n)\right) \big |\; \mathfrak n\in T\right\} = \left\{ (0,0), (1,0), (1,1), (2,0), (2,1), (2,3), (3,1), (3,6)\right\} \). Note that each \(\mathfrak n\in T\) is connected to the root node \(\varrho \) through its ancestors, as indicated by the solid lines.

Remark 2.7

Definition 2.6 actually slightly deviates from [23, Definition 3]. By definition of \({\mathcal {I}}_T(\omega )\), we include the constant function \(\psi _{0,0}^{(0,\dots ,0)}\equiv 1\in L^2({\mathbb {T}}^d)\) in the series expansion (18). Of course, adding the random constant \(X_{0,0}^{(0,\dots ,0)}\) does not affect the spatial regularity or integrability of \(b_T\). However, in our definition, series (18) has a natural interpretation as orthogonal expansion of a random function with respect to the (deterministic, fixed) basis \({\varvec{ \Psi }}\).

The tree structure in the “active” (i.e., with index in \({\mathcal {I}}_T\)) coefficients in the wavelet representation of \(b_T\) gives rise to random fractals on \({\mathbb {T}}^d\), that occur whenever the tree T in Definition 2.6 does not terminate after a finite number of nodes. It follows by Lemma A.4, that the latter event occurs with positive probability if \(\beta \in (2^{-d},1]\). In this case the Hausdorff dimension of the fractals is \(d+\log _2(\beta )\in (0,d]\), see [23, Section 3] for further details.
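The role of the threshold \(\beta =2^{-d}\) can be explored numerically with the following sketch. It uses a standard fact from branching-process theory that is not restated in this section: the extinction probability of a GW tree is the smallest fixed point in \([0,1]\) of the offspring generating function, here \(q\mapsto (1-\beta +\beta q)^{2^d}\), and it is the limit of the fixed-point iteration started at zero. The function names are hypothetical.

```python
import math


def extinction_probability(beta: float, d: int, iterations: int = 10_000) -> float:
    """Approximate the extinction probability of a Bin(2^d, beta) GW tree.

    Iterates q <- (1 - beta + beta * q)^(2^d) starting from q = 0, which
    increases monotonically to the smallest fixed point in [0, 1].
    Convergence is slow at the critical value beta = 2^-d.
    """
    n = 2 ** d
    q = 0.0
    for _ in range(iterations):
        q = (1.0 - beta + beta * q) ** n
    return q


def fractal_dimension(beta: float, d: int) -> float:
    """Hausdorff dimension d + log2(beta), stated above for beta in (2^-d, 1]."""
    return d + math.log2(beta)
```

For \(d=1\) and \(\beta =0.75\) the iteration converges to \(1/9\), so the tree is infinite with probability \(8/9\), whereas for \(\beta =0.25<2^{-1}\) it converges to one. For \(d=2\) and \(\beta =\frac{1}{2}\) the dimension formula gives \(2+\log _2(\frac{1}{2})=1\).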

Examples of realizations of a \(B_p^s\)-random variable on \({\mathbb {T}}^2\) with varying wavelet density \(\beta \) are shown in Fig. 2.

Fig. 2

Samples of a \(B_p^s\)-valued random variable on \({\mathbb {T}}^2=[0,1]^2\) with \(s=p=2\) and wavelet density \(\beta \in \{\frac{1}{4}, \frac{1}{3}, \frac{1}{2}\}\) (top row, from left to right) and \(\beta \in \{\frac{2}{3}, \frac{3}{4}, 1\}\) (bottom row, from left to right). All samples are based on DB(5)-wavelets and the same array of random numbers, that have been sampled with a spatial resolution of \(2^9\times 2^9\) equidistant grid points, and the expansion in  (18) was truncated at \(N=9\) levels of dyadic subdivision (cf. Sect. 4.1). By fixing the array of random numbers, the spatial grid and N, the depicted “evolution” in the panels highlights the effect of an increasing wavelet density \(\beta \). Note that the case \(p=2\) and \(\beta =1\) in the bottom right corresponds to a Gaussian prior

To treat elliptic inverse problems with \(b_T\) as log-diffusion coefficient, we detail the corresponding probability space of parameters. Let \({\mathbb {Q}}_0\) denote the univariate, p-exponential measure on \(({\mathbb {R}}, {\mathcal {B}}({\mathbb {R}}))\) of the random variables \(X_{j,k}^l\) with Lebesgue density as in (16). The product-probability space of the p-exponentials X is given by \((\Omega _p, {\mathcal {A}}_p, {\mathbb {Q}}_p)\), where

$$\begin{aligned} \Omega _p:={\mathbb {R}}^{\mathbb {N}},\quad {\mathcal {A}}_p:=\bigotimes _{n\in {\mathbb {N}}} {\mathcal {B}}({\mathbb {R}}),\quad \text {and} \quad {\mathbb {Q}}_p:=\bigotimes _{n\in {\mathbb {N}}} {\mathbb {Q}}_0. \end{aligned}$$

Now let \(s>0\) and \(p\in [1,\infty )\) be fixed such that \(s>\frac{d}{p}\). We define the weighted \(\ell ^p\)-spaces

$$\begin{aligned} \ell _s^p:=\left\{ x=\left( x_{j,k}^l, (j,k,l)\in {{\varvec{{\mathcal {I}}}}_{\varvec{\Psi }}}\right) \in {\mathbb {R}}^{\mathbb {N}}|\; \Vert x\Vert _{s,p}<\infty \right\} , \end{aligned}$$
(19)

where

$$\begin{aligned} \Vert x\Vert _{s,p}:= \left( \sum _{(j,k,l)\in {{\varvec{{\mathcal {I}}}}_{\varvec{\Psi }}}} 2^{-jps}|x_{j,k}^l|^p \right) ^{1/p}. \end{aligned}$$
(20)

Then \((\ell _s^p, \left\| \cdot \right\| _{s,p})\) is a separable Banach space, for \(1\le p<\infty \). Moreover, we observe that for \(X\sim {\mathbb {Q}}_p\) it holds

$$\begin{aligned} {\mathbb {E}}(\Vert X\Vert _{s,p}^p)&\le \sum _{(j,k,l)\in {{\varvec{{\mathcal {I}}}}_{\varvec{\Psi }}}} 2^{-jps}{\mathbb {E}}(|X_{j,k}^l|^p)\\ {}&\le C \sum _{j=0}^\infty 2^{-jps} 2^{dj} (2^d-1) \le C \sum _{j=0}^\infty 2^{-jp\bigg (s-\frac{d}{p}\bigg )} <\infty , \end{aligned}$$

since \(s>\frac{d}{p}\), thus \({\mathbb {Q}}_p\) is concentrated on \(\ell _s^p\). We therefore regard \((\ell _s^p, {\mathcal {B}}(\ell _s^p), {\mathbb {Q}}_p)\) as the probability space of random coefficient sequences X in the expansions (17) and (18). The set-valued random variable T is a GW tree, and hence takes values in the Polish space \(({\mathfrak {T}}, \delta _{{\mathfrak {T}}})\) of all trees with no infinite node. The metric \(\delta _{\mathfrak {T}}\) and the associated Borel \(\sigma \)-algebra \({\mathcal {B}}({\mathfrak {T}})\) with respect to \({\mathfrak {T}}\) are stated explicitly in Definition A.2 in the Appendix. The image measure \({\mathbb {Q}}_T\) of the GW tree T on \(({\mathfrak {T}}, {\mathcal {B}}({\mathfrak {T}}))\) then solely depends on the parameters \(\beta \) and d of the offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\), and is given in Equation (76) in the Appendix. Hence, the parameter probability space of GW trees is given by \(({\mathfrak {T}}, {\mathcal {B}}({\mathfrak {T}}), {\mathbb {Q}}_T)\). To combine the random coefficients X with the GW tree T, we define the cartesian product \(\Omega :=\ell _s^p\times {\mathfrak {T}}\) and equip \(\Omega \) with the metric

$$\begin{aligned} d_\Omega ((x_1, \mathbf{t_1}),(x_2, \mathbf{t_2})):= \Vert x_1-x_2\Vert _{s,p}+\delta _{\mathfrak {T}}(\mathbf{t_1}, \mathbf{t_2}). \end{aligned}$$
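The concentration of \({\mathbb {Q}}_p\) on \(\ell _s^p\) rests on the geometric series \(\sum _j 2^{-jps}2^{dj}(2^d-1)\), which converges precisely when \(s>\frac{d}{p}\). The following Python sketch evaluates the weighted norm (20) for finitely supported sequences and compares partial sums of this series with its closed-form limit. The dictionary-based representation of coefficient sequences and all function names are assumptions made for illustration only.

```python
def weighted_norm(x, s, p):
    """Weighted l^p norm (20) of a finitely supported sequence.

    x maps index triples (j, k, l) to real coefficients.
    """
    total = sum(2.0 ** (-j * p * s) * abs(v) ** p for (j, _k, _l), v in x.items())
    return total ** (1.0 / p)


def level_sum(s, p, d, n_levels):
    """Partial sum over j < n_levels of 2^{-jps} * 2^{dj} * (2^d - 1)."""
    return sum(
        2.0 ** (-j * p * s) * 2.0 ** (d * j) * (2**d - 1) for j in range(n_levels)
    )


def level_sum_limit(s, p, d):
    """Closed-form limit of the geometric series; requires s > d / p."""
    if s * p <= d:
        raise ValueError("the series diverges unless s > d/p")
    return (2**d - 1) / (1.0 - 2.0 ** (d - p * s))


x = {(0, 0, 0): 1.0, (1, 0, 1): 2.0}
print(weighted_norm(x, s=1.0, p=2.0))
print(level_sum(1.0, 2.0, 1, 60), level_sum_limit(1.0, 2.0, 1))
```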

Proposition 2.8

The space \((\Omega , d_\Omega )\) is Polish with Borel \(\sigma \)-algebra given by \({\mathcal {B}}(\Omega )={\mathcal {B}}(\ell _s^p\times {\mathfrak {T}})={\mathcal {B}}(\ell _s^p)\otimes {\mathcal {B}}({\mathfrak {T}})\).

Proof

By [1, Lemma 2.1] the metric space \(({\mathfrak {T}}, \delta _{\mathfrak {T}})\) with \(\delta _{\mathfrak {T}}\) given in (74) is complete and separable. Therefore, separability and completeness of \((\Omega , d_\Omega )\) follow by [5, Corollary 3.39]. Moreover, \({\mathcal {B}}(\Omega )={\mathcal {B}}(\ell _s^p\times {\mathfrak {T}})={\mathcal {B}}(\ell _s^p)\otimes {\mathcal {B}}({\mathfrak {T}})\) holds by [5, Theorem 4.44]. \(\square \)

We are now ready to define the probability space associated to the \(\ell _s^p\times {\mathfrak {T}}\)-valued random variable \((X,T)\): Let \((\Omega , {\mathcal {A}}, {\mathbb {P}})\) be the product probability space given by

$$\begin{aligned} \Omega :=\ell _s^p\times {\mathfrak {T}},\quad {\mathcal {A}}:={\mathcal {B}}(\ell _s^p)\otimes {\mathcal {B}}({\mathfrak {T}}),\quad \text {and} \quad {\mathbb {P}}:={\mathbb {Q}}_p \otimes {\mathbb {Q}}_T. \end{aligned}$$
(21)

We remark that the product structure of the measure \({\mathbb {P}}={\mathbb {Q}}_p \otimes {\mathbb {Q}}_T\) is tantamount to stochastic independence of X and T. It still remains to identify a realization of the random variable \((X,T)\) with the corresponding random tree prior \(b_T\). To this end, we consider the canonical mapping

$$\begin{aligned} b_T:\Omega \rightarrow L^2({\mathbb {T}}^d),\quad \omega \mapsto \sum _{(j,k,l)\in {\mathcal {I}}_T(\omega )} \eta _j X_{j,k}^l(\omega )\psi _{j,k}^l. \end{aligned}$$
(22)

The map \(b_T:\Omega \rightarrow L^2({\mathbb {T}}^d)\) is indeed well-defined since \(\Vert b_T\Vert _{L^2({\mathbb {T}}^d)}<\infty \) holds due to \(s>\frac{d}{p}\). Moreover, \(b_T\) is \({\mathcal {A}}/{\mathcal {B}}(L^2({\mathbb {T}}^d))\)-measurable, as we show in Proposition 2.10 below. As \(b_T\) is an \(L^2({\mathbb {T}}^d)\)-valued random variable, we may define the pushforward probability measure of \(b_T\) via

$$\begin{aligned} b_T\#{\mathbb {P}}(B):={\mathbb {P}}(b_T^{-1}(B)),\quad B\in {\mathcal {B}}(L^2({\mathbb {T}}^d)). \end{aligned}$$
(23)

Thus, the associated probability space of \(B_p^s\)-random variables \(b_T\) with wavelet density \(\beta \) is

$$\begin{aligned} (L^2({\mathbb {T}}^d),\,{\mathcal {B}}(L^2({\mathbb {T}}^d)),\,b_T\#{\mathbb {P}}). \end{aligned}$$
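To make the canonical mapping (22) concrete, the following Python sketch evaluates a truncated series for \(d=1\) on the active index set listed in the caption of Fig. 1. Several simplifications are assumptions of this sketch and not part of the construction above: Haar wavelets replace the smooth wavelet basis (they do not satisfy the Hölder regularity assumed for \({\varvec{\Psi }}\)), the constant function at level 0 is omitted, the coefficients are standard normal (the case \(p=2\) up to scaling), and the weights are taken as \(\eta _j=2^{-j(s+\frac{d}{2}-\frac{d}{p})}\), consistent with the scaling used in the proof of Theorem 2.11.

```python
import random


def haar(j, k, x):
    """L^2-normalised Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k) on [0, 1)."""
    y = 2.0**j * x - k
    if 0.0 <= y < 0.5:
        return 2.0 ** (j / 2.0)
    if 0.5 <= y < 1.0:
        return -(2.0 ** (j / 2.0))
    return 0.0


def eta(j, s, p, d=1):
    """Level weight eta_j = 2^{-j (s + d/2 - d/p)} (assumed form, see lead-in)."""
    return 2.0 ** (-j * (s + d / 2.0 - d / p))


def evaluate_b_T(x, coeffs, s, p):
    """Sum of eta_j * X_{j,k} * psi_{j,k}(x) over the active indices (j, k) in coeffs."""
    return sum(eta(j, s, p) * c * haar(j, k, x) for (j, k), c in coeffs.items())


# Active wavelet indices taken from the caption of Fig. 1.
active = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 3), (3, 1), (3, 6)]
rng = random.Random(0)
coeffs = {idx: rng.gauss(0.0, 1.0) for idx in active}
values = [evaluate_b_T((i + 0.5) / 16, coeffs, s=1.0, p=2.0) for i in range(16)]
print(values)
```

Only indices in the active set contribute, so the sample is locally constant wherever no surviving wavelet on the finest level has its support.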

Remark 2.9

We know from Proposition 2.4 that \(b_T\#{\mathbb {P}}\) is concentrated on \(B_p^t\) for any \(t\in (0, s-\frac{d}{p})\). A more refined result that concentrates \(b_T\#{\mathbb {P}}\) on Besov spaces \(B_q^t\) for \(q\ge 1\) with smoothness index \(t=t(s,d,p,\beta ,q)\) is given in Theorem 2.11 below.

We recall at this point that we have assumed Hölder-regularity \({\varvec{\Psi }}\subset \textrm{C}^{\alpha }\) for some \(\alpha \ge 1\) in Sect. 2.1. For the remainder of this article, we will from now on implicitly assume that the parameter \(s>0\) of a \(B_p^s\)-random variable satisfies \(\alpha \ge s>0\) for the sake of presentation.

Proposition 2.10

Let \(s>\frac{d}{p}\), \(\beta \in [0,1]\), and let \(b_T\) be a \(B_p^s\)-random variable with wavelet density \(\beta \). Then \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) and \(b_T\) is (strongly) \({\mathcal {A}}/{\mathcal {B}}(\textrm{C}({\mathbb {T}}^d))\)-measurable.

Proof

Fix a \(t\in (0,s-\frac{d}{p})\) such that \(t\notin {\mathbb {N}}\). The second part of Proposition 2.4 then shows that \(b_T\in \textrm{C}^t \) holds P-a.s., and thus \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) follows, after possibly modifying \(b_T\) on a P-nullset.

As in Appendix A, we denote by \({\mathcal {U}}\) the set of all finite sequences in \({\mathbb {N}}\) and introduce the subset \({\mathcal {U}}_{\textrm{Bin}}\subset {\mathcal {U}}\) with entries in \(\{1,\dots , 2^d\}\) as

$$\begin{aligned} {\mathcal {U}}_{\textrm{Bin}}:=\{\mathfrak n\in {\mathcal {U}}|\, {\mathfrak n}_i\in \{1,\dots , 2^d\} \text { for }i\in \{1,\dots , |\mathfrak n|\}\}. \end{aligned}$$

Note that \(T(\omega )\subset {\mathcal {U}}_{\textrm{Bin}}\) holds P-a.s., since \({\mathcal {P}}=\text {Bin}(2^d, \beta )\). Now let \(\mathfrak I_{d,j}\) be as in (77) and recall from Appendix A.2 that \((j,k)\in \mathfrak I_T(\omega )\) if and only if there is \(\mathfrak n\in T(\omega )\) such that \((j,k)=(|\mathfrak n|, \mathfrak I_{d,|\mathfrak n|}(\mathfrak n))\). Hence, we may rewrite the series expansion (18) as

$$\begin{aligned} b_T(\omega )&= \sum _{(j,k)\in \mathfrak I_T(\omega )} \eta _j \sum _{l\in {\mathcal {L}}_j} X_{j,k}^l(\omega )\psi _{j,k}^l \\ {}&= \sum _{\mathfrak n\in {\mathcal {U}}_{\textrm{Bin}}} \mathbbm {1}_{\{\mathfrak n\in T(\omega )\}} \eta _{|\mathfrak n|} \sum _{l\in {\mathcal {L}}_{|\mathfrak n|}} X_{|\mathfrak n|,\mathfrak I_{d,|\mathfrak n|}(\mathfrak n)}^l(\omega )\psi _{|\mathfrak n|,\mathfrak I_{d,|\mathfrak n|}(\mathfrak n)}^l. \end{aligned}$$

As \(T:\Omega \rightarrow {\mathfrak {T}}\) is \({\mathcal {A}}/{\mathcal {B}}({\mathfrak {T}})\)-measurable, it holds that \(\mathbbm {1}_{\{\mathfrak n\in T(\cdot )\}}:\Omega \rightarrow \{0,1\}\) is measurable for any fixed \(\mathfrak n\in {\mathcal {U}}_{\textrm{Bin}}\). Also, the \(X_{j,k}^l\) are real-valued random variables and \(\psi _{j,k}^l\in \textrm{C}({\mathbb {T}}^d)\) by assumption. Thus, \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) is measurable, and strongly measurable as \(\textrm{C}({\mathbb {T}}^d)\) is separable. \(\square \)

More insight into the pathwise regularity of Besov random tree priors, in particular with regard to their Hölder regularity, is provided by the following result.

Theorem 2.11

Let \(b_T\) be a \(B_p^s\)-random variable with wavelet density \(\beta =2^{\gamma -d}\) as in Definition 2.6 with \(\gamma \in (-\infty , d]\).

  1. 1.)

    It holds that \(b_T\in L^q(\Omega ; {B^t_q})\), and hence \(b_T\in B_q^t\) P-a.s., for all \(t>0\) and \(q\ge 1\) such that \(t < s+ \frac{d-\gamma }{q} - \frac{d}{p}\).

  2. 2.)

    Let \(s-\frac{d}{p}>0\) and \(t\in (0,s-\frac{d}{p})\). Then there is an \(\varepsilon _p>0\) such that

    $$\begin{aligned} {\mathbb {E}}\left( \exp \left( \varepsilon \Vert b_T\Vert _{{\mathcal {C}}^t}^p\right) \right) < \infty ,\quad \varepsilon \in (0,\varepsilon _p). \end{aligned}$$

    In particular, it holds \(b_T\in L^q(\Omega ; {\mathcal {C}}^t)\) for any \(q\ge 1\).

  3. 3.)

    Let \(q\ge 1\) and \(s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q}>0\). For any \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) it holds \(b_T\in L^q(\Omega ; {\mathcal {C}}^t)\).
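The admissible smoothness ranges in parts 1.) and 3.) are simple functions of the parameters and can be tabulated directly. The following Python sketch computes the two upper bounds; the function names are hypothetical, and the bounds are suprema, meaning that the statements hold for every \(t>0\) strictly below the returned value.

```python
def besov_smoothness_bound(s, p, d, gamma, q):
    """Upper bound in part 1.): t < s + (d - gamma)/q - d/p."""
    return s + (d - gamma) / q - d / p


def hoelder_smoothness_bound(s, p, d, gamma, q):
    """Upper bound in part 3.): t < s - d/p - min(gamma, 0)/q."""
    return s - d / p - min(gamma, 0.0) / q


s, p, d, q = 2.0, 2.0, 2, 2.0
for gamma in (2.0, 1.0, -1.0):
    beta = 2.0 ** (gamma - d)  # wavelet density beta = 2^(gamma - d)
    print(
        beta,
        besov_smoothness_bound(s, p, d, gamma, q),
        hoelder_smoothness_bound(s, p, d, gamma, q),
    )
```

For these parameters, lowering \(\gamma \) from 2 to 1 raises the Besov bound from 1 to 1.5, whereas the Hölder bound only exceeds \(s-\frac{d}{p}=1\) once \(\gamma <0\).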

Proof

(1) For given \(q\ge 1\) and \(t>0\) it holds by (13) that

$$\begin{aligned} \Vert b_T\Vert ^q_{B^t_q} = \sum _{(j,k,l)\in {\mathcal {I}}_T(\omega )} 2^{jq(t+\frac{d}{2}-\frac{d}{q})} \eta _j^q |X_{j,k}^l(\omega )|^q = \sum _{(j,k,l)\in {\mathcal {I}}_T(\omega )} 2^{jq(t - \frac{d}{q} -s + \frac{d}{p})} |X_{j,k}^l(\omega )|^q. \end{aligned}$$

For any given \(j\in {\mathbb {N}}\), by Definition 2.6, the number of nodes v(j) on scale j in the random tree T is binomially distributed (conditional on \(v(j-1)\)) as

$$\begin{aligned} v(j):=\#\{k\in K_j|(j,k,l)\in {\mathcal {I}}_T\} \sim \text {Bin}(2^dv(j-1), 2^{\gamma -d}), \end{aligned}$$

with initial value \(v(0)=1\). Now let \((X_{j,m}, j\in {\mathbb {N}}_0, m\in {\mathbb {N}})\) be an i.i.d. sequence of p-exponential random variables, independent of v(j) for any \(j\in {\mathbb {N}}\), and recall that \(l\in {\mathcal {L}}_j\) with \(|{\mathcal {L}}_0|=2^d\) and \(|{\mathcal {L}}_j|=2^d-1\) for \(j\in {\mathbb {N}}\). We obtain by Fubini’s theorem, Wald’s identity, \({\mathbb {E}}(v(j))=(2^d\beta )^j=2^{j\gamma }\), and since \({\mathbb {E}}\left( |X_{j,m}|^q\right) <\infty \) for any \(q>0\) that

$$\begin{aligned} {\mathbb {E}}\left( \Vert b_T\Vert ^q_{B^t_q} \right)&= {\mathbb {E}}\left( \sum _{j=0}^\infty 2^{jq(t - \frac{d}{q} -s + \frac{d}{p})}\sum _{m=1}^{(2^d-1)v(j)}|X_{j,m}|^q \right) + {\mathbb {E}}\left( |X_{0,{2^d}}|^q\right) \\&= \sum _{j=0}^\infty 2^{jq(t - \frac{d}{q} -s + \frac{d}{p})} {\mathbb {E}}\left( (2^d-1)v(j)\right) {\mathbb {E}}\left( |X_{j,m}|^q\right) + {\mathbb {E}}\left( |X_{0,{2^d}}|^q\right) \\&\le 2^d{\mathbb {E}}\left( |X_{1,1}|^q\right) \sum _{j=0}^\infty 2^{jq(t - \frac{d}{q} -s + \frac{d}{p}+\frac{\gamma }{q})}, \end{aligned}$$

with \({\mathbb {E}}\left( |X_{1,1}|^q\right) <\infty \). The series converges if \(t < s+ \frac{d-\gamma }{q} - \frac{d}{p}\), in which case \(b_T\in L^q(\Omega ; {B^t_q})\), and hence \(b_T\in B^t_q\) holds P-a.s.
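The identity \({\mathbb {E}}(v(j))=(2^d\beta )^j=2^{j\gamma }\) used in the proof of part (1) can be checked empirically. The Python sketch below simulates the recursion \(v(j)\sim \text {Bin}(2^dv(j-1), \beta )\) with \(v(0)=1\) and compares a Monte Carlo mean with the exact value. The parameter values, the sample size and the function names are illustrative assumptions.

```python
import random


def simulate_level_counts(d, beta, max_level, rng):
    """Simulate v(0), ..., v(max_level) with v(j) ~ Bin(2^d * v(j-1), beta), v(0) = 1."""
    counts = [1]
    for _ in range(max_level):
        trials = 2**d * counts[-1]
        # A binomial draw realised as a sum of independent Bernoulli(beta) trials.
        counts.append(sum(1 for _ in range(trials) if rng.random() < beta))
    return counts


def mean_level_count(d, beta, level, n_samples, seed=0):
    """Monte Carlo estimate of E(v(level))."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        total += simulate_level_counts(d, beta, level, rng)[level]
    return total / n_samples


d, beta, level = 1, 0.75, 4
estimate = mean_level_count(d, beta, level, n_samples=4000)
exact = (2**d * beta) ** level  # equals 2^(level * gamma) with beta = 2^(gamma - d)
print(estimate, exact)
```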

(2) Now let \(q_0\ge q\ge 1\), \(t_0>\frac{d}{q_0}\) and \(t=t_0-\frac{d}{q_0}\), so that \(B_{q_0}^{t_0}\hookrightarrow {\mathcal {C}}^t\) holds by (15). The embedding follows by a direct comparison of the norms in (13), (14) with \(t=t_0-\frac{d}{q_0}\), and also shows that the corresponding embedding constant \(C_0>0\) is bounded by \(C_0\le 1\).

We obtain with Hölder’s inequality and analogously to the first part the estimate

$$\begin{aligned} \begin{aligned} \Vert b_T\Vert ^q_{L^q(\Omega ; {\mathcal {C}}^t)}&\le {\mathbb {E}}\left( \Vert b_T\Vert ^{q_0}_{{\mathcal {C}}^t} \right) ^{\frac{q}{q_0}} \\&\le {\mathbb {E}}\left( \Vert b_T\Vert ^{q_0}_{B_{q_0}^{t_0}} \right) ^{\frac{q}{q_0}} \\&\le 2^{\frac{dq}{q_0}}{\mathbb {E}}\left( |X_{1,1}|^{q_0}\right) ^{\frac{q}{q_0}} \left( \sum _{j=0}^\infty 2^{jq_0(t_0 - \frac{d}{q_0} -s + \frac{d}{p}+\frac{\gamma }{q_0})} \right) ^{\frac{q}{q_0}} \\&= 2^{\frac{dq}{q_0}}{\mathbb {E}}\left( |X_{1,1}|^{q_0}\right) ^{\frac{q}{q_0}} \left( \sum _{j=0}^{\infty } 2^{jq_0(t - s + \frac{d}{p} + \frac{\gamma }{q_0})} \right) ^{\frac{q}{q_0}}. \end{aligned} \end{aligned}$$
(24)

Now let \(t<s-\frac{d}{p}\) be fixed, and let \(\gamma \in (0,d]\). For every fixed \(q\ge \max (2\gamma (s-\frac{d}{p}-t)^{-1}, 1)\), we choose \(q_0:=q\) to obtain that

$$\begin{aligned} \begin{aligned} \Vert b_T\Vert ^q_{L^q(\Omega ; {\mathcal {C}}^t)}&\le 2^{d}{\mathbb {E}}\left( |X_{1,1}|^{q}\right) \left( \sum _{j=0}^{\infty } 2^{jq_0(t - s + \frac{d}{p})/2} \right) ^{\frac{q}{q_0}} \\&\le 2^{d}{\mathbb {E}}\left( |X_{1,1}|^{q}\right) \left( \sum _{j=0}^{\infty } 2^{-j\gamma } \right) \\&\le C{\mathbb {E}}\left( |X_{1,1}|^{q}\right) , \end{aligned} \end{aligned}$$
(25)

with a constant \(C>0\) that is independent of q. We now define for given \(\varepsilon >0\), finite \(n\in {\mathbb {N}}\) and \(p \in [1,\infty )\) the random variable

$$\begin{aligned} E_n(\omega ):=\sum _{k=0}^n \frac{(\varepsilon \Vert b_T(\omega )\Vert ^p_{{\mathcal {C}}^t})^k}{k!}, \quad \omega \in \Omega . \end{aligned}$$

Clearly, \(E_n(\omega )\rightarrow \exp (\varepsilon \Vert b_T(\omega )\Vert ^p_{{\mathcal {C}}^t})\) holds P-a.s. as \(n\rightarrow \infty \) and Inequality (25) yields, for any \(n\in {\mathbb {N}}\) and \(n_\gamma :=\frac{1}{p}\lceil 2\gamma (s-\frac{d}{p}-t)^{-1} \rceil \), that

$$\begin{aligned} {\mathbb {E}}(E_n)= & {} \sum _{k=0}^n \frac{\varepsilon ^k}{k!} {\mathbb {E}}(\Vert b_T\Vert _{{\mathcal {C}}^t}^{pk}) \le {\widetilde{C}} + \sum _{k=n_\gamma }^n \frac{\varepsilon ^k}{k!} C^{pk}{\mathbb {E}}(|X_{1,1}|^{pk})\\= & {} {\widetilde{C}}+ {\mathbb {E}}\left( \sum _{k=n_\gamma }^n \frac{(\varepsilon C^p|X_{1,1}|^p)^k}{k!} \right) , \end{aligned}$$

where \({\widetilde{C}} = {\widetilde{C}}(\gamma , s,d,p,t)>0\). The monotone convergence theorem then shows that for sufficiently small \(\varepsilon >0\) and \(t<s-\frac{d}{p}\), it holds

$$\begin{aligned} {\mathbb {E}}\left( \exp (\varepsilon \Vert b_T\Vert ^p_{{\mathcal {C}}^t}) \right){} & {} \le {\widetilde{C}} + \lim _{n\rightarrow \infty } {\mathbb {E}}\left( \sum _{k=n_\gamma }^n \frac{(\varepsilon C^p|X_{1,1}|^p)^k}{k!} \right) \\{} & {} \le {\widetilde{C}} + {\mathbb {E}}\left( \exp (\varepsilon C^p |X_{1,1}|^p) \right) < \infty . \end{aligned}$$

(3) The claim for \(\gamma \in (0,d]\) follows by the previous part. For \(\gamma \in (-\infty ,0]\), \(q\ge 1\) and \(t\in (0,s-\frac{d}{p}-\frac{\gamma }{q})\), we finally use \(q_0 = q\) and \(t_0=t+\frac{d}{q}\) in (24) to obtain that

$$\begin{aligned} \Vert b_T\Vert ^q_{L^q(\Omega ; {\mathcal {C}}^t)}&\le C^q {\mathbb {E}}\left( |X_{1,1}|^q\right) \left( \sum _{j=0}^{\infty } 2^{jq(t - s + \frac{d}{p} + \frac{\gamma }{q})} \right) < \infty . \end{aligned}$$

\(\square \)

Remark 2.12

[23, Theorems 4 and 5] state that \(b_T\in B_p^t\) holds P-a.s. for all \(t\in (0, s-\gamma /p)\), and that \(b_T\notin B_p^{s-\gamma /p}\) occurs with probability \(1-p_\beta >0\) for \(\gamma \in (0, d]\), where \(p_\beta \) is the solution to the equation \(p_\beta =((1-\beta )+\beta p_\beta )^{2^d}\) (cf. Lemma A.4 in the Appendix). We emphasize that Theorem 2.11 significantly extends these previous results, as we quantify precisely the regularity of \(b_T\) in terms of Besov and Hölder-Zygmund norms.
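The fixed-point equation for \(p_\beta \) is easy to solve numerically. The Python sketch below iterates the map \(p\mapsto ((1-\beta )+\beta p)^{2^d}\) starting from \(p=0\). By standard Galton-Watson theory this iteration increases to the smallest solution in [0, 1], which is the extinction probability of the tree. The function name, tolerance and iteration cap are assumptions of this sketch, and convergence becomes slow near the critical density \(\beta =2^{-d}\).

```python
def extinction_probability(d, beta, tol=1e-14, max_iter=100000):
    """Smallest solution in [0, 1] of p = ((1 - beta) + beta * p)^(2^d)."""
    p = 0.0
    for _ in range(max_iter):
        p_next = ((1.0 - beta) + beta * p) ** (2**d)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


# Supercritical example: d = 1, beta = 0.75, where the smallest root is 1/9.
p_super = extinction_probability(1, 0.75)
# Subcritical example: d = 1, beta = 0.25, where the tree dies out almost surely.
p_sub = extinction_probability(1, 0.25)
print(p_super, 1.0 - p_super)
print(p_sub)
```

For \(d=1\) and \(\beta =0.75\) the equation reduces to a quadratic with roots 1/9 and 1, so the tree is infinite with probability 8/9 in this example.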

Recall that we may replace the Hölder-Zygmund spaces \({\mathcal {C}}^t\) in the second part of Theorem 2.11 by the “usual” Hölder spaces \(\textrm{C}^t\) if \(t\notin {\mathbb {N}}\) (which is not true for integer t). Theorem 2.11 shows that a wavelet density \(\beta =2^{\gamma -d}<1\) improves smoothness in \(B_q^t\), as the upper bound \(t < s+ \frac{d-\gamma }{q} - \frac{d}{p}\) is decreasing in \(\gamma \in (-\infty ,d]\). However, given that \(\gamma >0\) we may not expect to gain any (pathwise) Hölder regularity beyond \(t<s-\frac{d}{p}\). This is not surprising with regard to Remark 2.7: \(b_T\) admits an infinite series expansion on random fractals in \({\mathcal {D}}\) for \(\beta >2^{-d}\) with positive probability. Hence, the local Hölder-regularity of \(b_T\) on such fractals corresponds to a \(B^s_p\)-random variable b as in Definition 2.3 (with full wavelet density \(\beta =1\)). In case that \(\gamma \le 0\), the series expansion of \(b_T\) terminates almost surely after a finite number of terms. As \({\varvec{\Psi }}\subset \textrm{C}^{\alpha }\), this shows that \(b_T\in \textrm{C}^\alpha \) almost surely if \(\gamma \le 0\). Furthermore, we may increase the smoothness exponent t for \(b_T\in L^q(\Omega ; {\mathcal {C}}^t)\) to the admissible range \(t<s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q}\). For large q, we see that essentially the restriction \(t<s-\frac{d}{p}\) applies as for \(\gamma >0\). This in turn indicates that the bound

$$\begin{aligned} {\mathbb {E}}\left( \exp \left( \varepsilon \Vert b_T\Vert _{{\mathcal {C}}^t}^p\right) \right) <\infty ,\quad \varepsilon \in (0,\varepsilon _p), \end{aligned}$$

from part 2.) of Theorem 2.11 cannot be improved to Hölder indices \(t\ge s-\frac{d}{p}\), even if \(\gamma \le 0\).

3 Linear elliptic PDEs with Besov random coefficients

In this section, we first recall well-posedness and regularity results for linear, second order elliptic diffusion problems with random coefficient. Thereafter, we transfer the results to a setting with Besov tree random diffusion coefficient by exploiting the results from Sect. 2.

3.1 Well-posedness and regularity

Let \({\mathcal {D}}\subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\), be a convex polygonal domain for \(d=2,3\), and a finite interval for \(d=1\), with the boundary \(\partial {\mathcal {D}}\) consisting of a finite number of line or plane segments. We consider the random elliptic problem to find \(u(\omega ):{\mathcal {D}}\rightarrow {\mathbb {R}}\) for given \(\omega \in \Omega \) such that

$$\begin{aligned} \begin{alignedat}{2} -\nabla \cdot (a(\omega )\nabla u(\omega ))&= f\quad{} & {} \text {in }{\mathcal {D}}, \quad u(\omega ) = 0 \quad{} & {} \text {on }\partial {\mathcal {D}}. \end{alignedat} \end{aligned}$$
(26)

The diffusion coefficient a in Problem (26) admits positive paths on \({\mathcal {D}}\), i.e., \(a(\omega ):{\mathcal {D}}\rightarrow {\mathbb {R}}_{>0}\). Moreover, a is a random variable \(a:\Omega \rightarrow {\mathcal {X}}\), taking values in a suitable Banach space \({\mathcal {X}}\subset L^\infty ({\mathcal {D}})\). The source term \(f:{\mathcal {D}}\rightarrow {\mathbb {R}}\) is assumed to be a deterministic function for the sake of simplicity, but may as well be modeled by a random function \(f:{\mathcal {D}}\times \Omega \rightarrow {\mathbb {R}}\). For the variational formulation of Problem (26) we define \(H:=L^2({\mathcal {D}})\), \(V:=H_0^1({\mathcal {D}})\) and recall that \(\left\| \cdot \right\| _V:V\rightarrow {\mathbb {R}}_{\ge 0},\, v\mapsto \Vert \nabla v \Vert _H\) defines a norm on V by Poincaré’s inequality. The weak formulation of Problem (26) for fixed \(\omega \in \Omega \) is to find \(u(\omega )\in V\) such that for any \(v\in V\) it holds

$$\begin{aligned} \int _{\mathcal {D}}a(\omega )\nabla u(\omega )\cdot \nabla v dx = _{V'}^{}{\langle }_{}^{} f ,v \rangle _{V}. \end{aligned}$$
(27)

Definition 3.1

The map \(\omega \mapsto u(\omega ) \in V\) with \(u(\omega )\) the solution of (27) is the pathwise weak solution.

Existence and uniqueness of pathwise weak solutions are ensured by the following theorem.

Theorem 3.2

Let \(a:\Omega \rightarrow L^\infty ({\mathcal {D}})\) be strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable such that

$$\begin{aligned} a_-(\omega ):={{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} \, a(x,\omega )>0,\quad P-\text {a.s.,} \end{aligned}$$
(28)

and let \(f\in V'\). Then, there exists for all \(\omega \in \Omega \) a unique weak solution \(u(\omega )\in V\) to Problem (26). The map \(u:\Omega \rightarrow V\) is strongly \({\mathcal {A}}/{\mathcal {B}}(V)\)-measurable.

Proof

By the completeness of \((\Omega ,{\mathcal {A}},P)\), we may assume without loss of generality that \(a_-(\omega )>0\) and \(a(\omega )\in L^\infty ({\mathcal {D}})\) holds for all \(\omega \in \Omega \).

Existence and uniqueness of a pathwise solution \(u(\omega )\) now follow for all \(\omega \in \Omega \) by the Lax-Milgram Lemma. To show strong measurability of u, we use Lipschitz dependence of the coefficient-to-solution map: consider two diffusion coefficients \(a_1,a_2:\Omega \rightarrow L^\infty ({\mathcal {D}})\) that satisfy the assumption of the theorem with lower bounds \(a_{1,-}, a_{2,-}>0\) as in (28) and denote by \(u_1,u_2:\Omega \rightarrow V\) the associated unique weak solutions. Equation (26) together with \(\Vert v\Vert _V^2 = \Vert \nabla v\Vert _H^2\) and Hölder’s inequality yields for any fixed coefficients \(a_1,a_2 \in L^\infty ({\mathcal {D}})\) such that \(a_{i,-}:= {{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a_{i}(x) > 0\) that

$$\begin{aligned} \begin{aligned} \Vert u_1-u_2\Vert _V&\le \frac{\Vert u_2\Vert _{V}}{a_{1,-}}\Vert a_1-a_2\Vert _{L^\infty ({\mathcal {D}})} \le \frac{\Vert f\Vert _{V'}}{a_{1,-} a_{2,-}}\Vert a_1-a_2\Vert _{L^\infty ({\mathcal {D}})}. \end{aligned} \end{aligned}$$
(29)

Therefore, the data-to-solution map \(U: S \rightarrow V,\; a \mapsto u\) is (Lipschitz) continuous on the set \(S:=\{ a\in L^\infty ({\mathcal {D}})|\, {{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a(x) > 0 \}\). Since the pathwise weak solution \(u:\Omega \rightarrow V\) of (26) may be written as \(u=U\circ a\), the claim follows with the strong \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurability of \(a:\Omega \rightarrow L^\infty ({\mathcal {D}})\). \(\square \)

Lipschitz continuity (29) of the data-to-solution map will be essential in deriving error estimates in Sect. 4 ahead, and also implies strong measurability of random solutions.

Proposition 3.3

Let \(a_1,a_2:\Omega \rightarrow L^\infty ({\mathcal {D}})\) be strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable such that

$$\begin{aligned} a_{i,-}(\omega ):={{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} \, a_i(x,\omega )>0,\quad P-\text {a.s. for }i\in \{1,2\}. \end{aligned}$$
(30)

Then, for every \(f\in V'\) there exists, for \(i\in \{1,2\}\) and for all \(\omega \in \Omega \), a unique weak solution \(u_i(\omega )\in V\) to Problem (26) (with \(a_i\) in place of a). There holds the continuous-dependence estimate

$$\begin{aligned} \begin{aligned} \Vert u_1-u_2\Vert _V \le \frac{\Vert f\Vert _{V'}}{a_{1,-} a_{2,-}}\Vert a_1-a_2\Vert _{L^\infty ({\mathcal {D}})}. \end{aligned} \end{aligned}$$

Proof

This follows immediately with Theorem 3.2 and (29). \(\square \)
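The continuous-dependence estimate can be observed numerically in one space dimension. The Python sketch below is a minimal illustration under explicit assumptions: \({\mathcal {D}}=(0,1)\), \(f\equiv 1\) (for which \(\Vert f\Vert _{V'}=1/\sqrt{12}\)), element-wise constant coefficients that merely stand in for \(\exp (b_T)\), and a piecewise linear finite element discretization solved by the Thomas algorithm. The same argument that yields (29) applies verbatim to the Galerkin solutions, so the discrete differences satisfy the bound as well. All names are hypothetical.

```python
import math


def solve_p1(a, f_const=1.0):
    """P1 finite elements for -(a u')' = f_const on (0, 1) with u(0) = u(1) = 0.

    a holds one positive coefficient value per element of a uniform mesh.
    Returns the nodal values including the two boundary zeros.
    """
    n = len(a)
    h = 1.0 / n
    m = n - 1  # number of interior nodes
    diag = [(a[i] + a[i + 1]) / h for i in range(m)]
    off = [-a[i + 1] / h for i in range(m - 1)]
    rhs = [f_const * h] * m
    # Thomas algorithm for the symmetric tridiagonal stiffness system.
    for i in range(1, m):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]


def v_norm(u):
    """L^2 norm of the derivative of the piecewise linear interpolant of u."""
    n = len(u) - 1
    h = 1.0 / n
    return math.sqrt(sum((u[e + 1] - u[e]) ** 2 / h for e in range(n)))


n = 64
mid = [(e + 0.5) / n for e in range(n)]
a1 = [math.exp(math.sin(2 * math.pi * x)) for x in mid]
a2 = [math.exp(math.sin(2 * math.pi * x) + 0.1 * math.cos(6 * math.pi * x)) for x in mid]
u1, u2 = solve_p1(a1), solve_p1(a2)

diff = v_norm([x - y for x, y in zip(u1, u2)])
f_dual = 1.0 / math.sqrt(12.0)  # dual norm of f = 1 on (0, 1)
sup_diff = max(abs(x - y) for x, y in zip(a1, a2))
bound = f_dual / (min(a1) * min(a2)) * sup_diff
print(diff, bound)
```

The bound is typically far from sharp, since it only uses the global lower bounds of the coefficients, but it degrades in a controlled way as either lower bound approaches zero.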

From the regularity analysis of deterministic linear elliptic problems it is well known that \(H^s({\mathcal {D}})\)-regularity of u may be derived for certain \(s>1\), provided that a is Hölder continuous. The corresponding estimates usually do not reveal the explicit dependence of constants on \(a(\omega )\) or bounds on the Hölder norm \(\Vert a(\omega )\Vert _{\textrm{C}^t}\). For the stochastic problem and the ensuing numerical analysis in Sects. 4 and 5, however, we need the explicit dependence for given \(\omega \) to ensure that all pathwise estimates also hold in \(L^q(\Omega ; H^s({\mathcal {D}}))\) for suitable \(q\ge 1\). To obtain explicit estimates, we follow the approach from [13, Chapter 3.3] for parametric elliptic PDEs, where regularity estimates are derived via the K-method of function space interpolation. One obtains in particular Hölder spaces \(\textrm{C}^r(\overline{{\mathcal {D}}})\) by interpolation ([3, Lemma 7.36]):

$$\begin{aligned} \textrm{C}^r(\overline{{\mathcal {D}}})=[L^\infty ({\mathcal {D}}), W^{1,\infty }({\mathcal {D}})]_{r,\infty }, \quad r\in (0,1). \end{aligned}$$

To investigate spatial regularity of solutions to (26), we introduce the normed space

$$\begin{aligned} W:=\{v\in V|\; \Delta v \in H\}, \quad \Vert v\Vert _W:=\Vert \Delta v\Vert _H. \end{aligned}$$
(31)

Note that \(v=0\Leftrightarrow \Vert v\Vert _W=0\) follows by the maximum principle, since \(v\in V=H^1_0({\mathcal {D}})\) has vanishing trace. We formulate regularity results in terms of the interpolation space

$$\begin{aligned} W^r:=[V, W]_{r,\infty }, \quad r\in (0,1). \end{aligned}$$
(32)

For a concise notation, we further set \(W^1:=W\) in the following.

Lemma 3.4

[13, Propositions 3.2 and 3.5] Let \(a:\Omega \rightarrow \textrm{C}^r(\overline{{\mathcal {D}}})\subset L^\infty ({\mathcal {D}})\) be strongly measurable for some \(r\in (0,1]\) such that \(a_-(\omega )>0\) holds P-a.s. and let \(f\in H\). Then, there is a constant \(C=C(r,{\mathcal {D}})\), such that it holds

$$\begin{aligned} \Vert u(\omega )\Vert _{W^r} \le \frac{C}{a_-(\omega )} \left( 1+\left( \frac{\Vert a(\omega )\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}}{a_-(\omega )}\right) ^{1/r}\right) \Vert f\Vert _H. \end{aligned}$$
(33)

All results from this subsection so far hold under the considerably weaker assumption that \({\mathcal {D}}\subset {\mathbb {R}}^d\) is a bounded Lipschitz domain. However, since \({\mathcal {D}}\) is assumed convex, we are able to embed \(W^r\) in (fractional) Sobolev spaces. This is made precise in the following Lemma, which is required for the finite element error analysis in Sect. 4.2.

Lemma 3.5

Let \({\mathcal {D}}\) be convex, \(W^r:=[V, W]_{r,\infty }\) for \(r\in (0,1)\) and let \(W^1:=W\). Then, it holds that \(W=W^1\hookrightarrow H^2({\mathcal {D}})\). Moreover, \(W^r\hookrightarrow H^{1+r_0}({\mathcal {D}})\) for any \(r_0\in (0,r)\).

Proof

By convexity of \({\mathcal {D}}\), we have that \(\Vert v\Vert _{H^2({\mathcal {D}})}\le C_{\mathcal {D}}\Vert v\Vert _W\) holds for all \(v\in W\), where \(C_{\mathcal {D}}\) only depends on the diameter of \({\mathcal {D}}\), see, e.g., [16, Theorem 3.2.1.2]. Thus, \(W\hookrightarrow H^2({\mathcal {D}})\cap V\) follows.

For the case \(r\in (0,1)\), we recall that there is \(C_V>0\), such that \(\Vert v\Vert _{H^1({\mathcal {D}})}\le C_{V}\Vert v\Vert _V\) holds for all \(v\in V\) by Poincaré’s inequality. Moreover, we have \(\Vert w\Vert _{H^2({\mathcal {D}})}\le C_{{\mathcal {D}}}\Vert w\Vert _W\) for any \(w\in W\), and hence \(W\subset H^2\cap V\) from the first part. For \(v\in V\subset H^1\) this yields

$$\begin{aligned} \Vert v\Vert _{[H^1({\mathcal {D}}),H^2({\mathcal {D}})]_{r,\infty }}&= \sup _{z>0} z^{-r} \inf _{w\in H^2}\{\Vert v-w\Vert _{H^1({\mathcal {D}})} + z\Vert w\Vert _{H^2({\mathcal {D}})}\} \\&\le \sup _{z>0} z^{-r} \inf _{w\in W}\{\Vert v-w\Vert _{H^1({\mathcal {D}})} + z\Vert w\Vert _{H^2({\mathcal {D}})}\} \\&\le \sup _{z>0} z^{-r} \inf _{w\in W}\{C_V\Vert v-w\Vert _{V} + C_{\mathcal {D}}z\Vert w\Vert _{W}\} \\&\le \max (C_V, C_{\mathcal {D}}) \Vert v\Vert _{W^r}. \end{aligned}$$

Hence, \(W^r\hookrightarrow [H^1({\mathcal {D}}),H^2({\mathcal {D}})]_{r,\infty }\). The claim now follows since for any \(\varepsilon \in (0,1+r)\) there holds

$$\begin{aligned}{}[H^1({\mathcal {D}}),H^2({\mathcal {D}})]_{r,\infty } =[H^{1+r-\varepsilon }({\mathcal {D}}),H^{1+r+\varepsilon }({\mathcal {D}})]_{\frac{1}{2},\infty }\hookrightarrow H^{1+r-\varepsilon }({\mathcal {D}}), \end{aligned}$$

see [3, Section 7.32]. \(\square \)

3.2 Besov random tree priors as log-diffusion coefficient

To formulate Problem (26) with a Besov random tree coefficient, we assume that \({\mathcal {D}}\subseteq {\mathbb {T}}^d\). We follow [26, Section 2] and define, for given \(\omega \in \Omega \), the random element \(b_T(\omega ):{\mathcal {D}}\rightarrow {\mathbb {R}}\) as the restriction of a periodic function in \(B_{p,p}^{s,per}\) to the domain \({\mathcal {D}}\). The restriction \(\varphi |_{{\mathcal {D}}}\) of any \(\varphi \in \textrm{S}'({\mathbb {R}}^d)\) to \({\mathcal {D}}\) is in turn given by the element \(\varphi |_{{\mathcal {D}}}\in \textrm{D}'({\mathcal {D}})\) such that

$$\begin{aligned} _{\textrm{D}'({\mathcal {D}})}^{}{\langle }_{}^{} \varphi |_{{\mathcal {D}}} ,v \rangle _{\mathrm D({\mathcal {D}})} = _{\textrm{S}'({\mathbb {R}}^d)}^{}{\langle }_{}^{} \varphi ,v_0 \rangle _{\textrm{S}({\mathbb {R}}^d)}, \quad v\in \mathrm D({\mathcal {D}}), \end{aligned}$$

where \(v_0\in \mathrm D({\mathbb {R}}^d)\subset \textrm{S}({\mathbb {R}}^d)\) denotes the zero-extension of any \(v\in \mathrm D({\mathcal {D}})\).

Definition 3.6

Let \({\mathcal {D}}\subseteq {\mathbb {T}}^d\) be a bounded, connected domain. Let \(b_T\) be given in Definition 2.6 for \(p\in [1,\infty )\), \(s>0\) and \(\beta =2^{\gamma -d}\in [0,1]\), and let \(\textrm{prl}^{per}:B_{p,p}^s({\mathbb {T}}^d)\rightarrow B_{p,p}^{s,per}({\mathbb {R}}^d)\) denote the isomorphic extension operator from (11). Then we define for any \(\omega \in \Omega \)

$$\begin{aligned} b_{T,{\mathcal {D}}}(\omega ):= (\textrm{prl}^{per}b_T(\omega ))|_{{\mathcal {D}}}, \end{aligned}$$

and call \(b_{T,{\mathcal {D}}}\) a \(B_p^s({\mathcal {D}})\)-valued random variable.

Remark 3.7

In case that \({\mathcal {D}}={\mathbb {T}}^d\), we may readily use the identification \(b_{T,{\mathcal {D}}}=b_T\). Note that \(b_{T,{\mathcal {D}}}\) is periodic in this case, in the sense that there exists an extension \(\textrm{prl}^{per}b_{T,{\mathcal {D}}}\in B_{p,p}^{s,per}({\mathbb {R}}^d)\). If \({\mathcal {D}}\subsetneq {\mathbb {T}}^d\), however, \(b_{T,{\mathcal {D}}}\) is not (necessarily) periodic, but merely the restriction of a periodic function from the torus \({\mathbb {T}}^d\).

Remark 3.8

The same procedure could be applied for general bounded domains \({\mathcal {D}}\not \subset {\mathbb {T}}^d\), by extending Definition 2.6 from the torus \({\mathbb {T}}^d\) to a sufficiently large (periodic) domain \([-L,L]^d\) for \(L>1\). This would increase the index-set \(K_j\) of wavelet coefficients by at most a constant factor on each dyadic scale j. All regularity proofs from Sect. 2 carry over to this setting, with minor changes to absolute constants. For instance, the admissible range of \(\varepsilon \) in Proposition 2.4 may become smaller if \(L>1\), but the smoothness parameter \(t\in (0,s-\frac{d}{p})\) is unaffected. Therefore, assuming \({\mathcal {D}}\subseteq {\mathbb {T}}^d\) for the sake of brevity does not have any substantial impact on the following results.

We consider Problem (26), resp. its weak formulation (27), with \(a(\omega ):=\exp \left( b_T(\omega )\right) \), where \(b_T\) is a \(B_p^s\)-random variable with wavelet density \(\beta \). That is, we model the log-diffusion by a Besov random tree prior to incorporate fractal structures. With this preparation, we are able to derive well-posedness and regularity of the corresponding pathwise weak solution.

Theorem 3.9

Let \(a:=\exp \left( b_{T,{\mathcal {D}}}\right) \) with \(b_{T,{\mathcal {D}}}\) given in Definition 3.6 for \(p\in [1,\infty )\), \(s>0\) and \(\beta =2^{\gamma -d}\in [0,1]\), so that \(sp>d\). Furthermore, let \(f\in V'\).

  1. (1)

    Then, there exists almost surely a unique weak solution \(u(\omega )\in V\) to (26) and \(u:\Omega \rightarrow V\) is strongly measurable.

  2. (2)

    For sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that

    $$\begin{aligned} \Vert u\Vert _{L^q(\Omega ; V)}\le C \Vert f\Vert _{V'} <\infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1,\text { and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$
  3. (3)

    Let \(r\in (0,s-\frac{d}{p})\cap (0,1]\), \(f\in H\) and \(W^r\) as in (32). For sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that

    $$\begin{aligned} \Vert u\Vert _{L^q(\Omega ; W^r)} \le C \Vert f\Vert _{H} <\infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text { and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$

Proof

1.) As \(sp>d\), Theorem 2.11 shows that \(b_T\in \textrm{C}({\mathbb {T}}^d)\) holds P-a.s. Moreover, \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) is strongly measurable by Proposition 2.10, and thus in particular strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathbb {T}}^d))\)-measurable, since \({\mathcal {B}}(\textrm{C}({\mathbb {T}}^d))\subset {\mathcal {B}}(L^\infty ({\mathbb {T}}^d))\). As \(b_{T,{\mathcal {D}}}\) in Definition 3.6 is the restriction of \(\textrm{prl}^{per}b_T\) to \({\mathcal {D}}\subset {\mathbb {T}}^d\), and \(a=\exp \circ \,b_{T,{\mathcal {D}}}\), it follows that \(a:\Omega \rightarrow \textrm{C}(\overline{{\mathcal {D}}})\), and a is strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable such that \(a_->0\) holds P-a.s. Theorem 3.2 then guarantees the P-a.s. existence of a unique pathwise weak solution \(u(\omega )\). Moreover, \(u:\Omega \rightarrow V\) is strongly measurable and Equation (27) shows that

$$\begin{aligned} \Vert u(\omega )\Vert _V\le \frac{\Vert f\Vert _{V'}}{a_-(\omega )}. \end{aligned}$$

2.) To show the second part, we fix \(t\in (0,s-\frac{d}{p})\) and \(q\ge 1\) to see that

$$\begin{aligned} \Vert u\Vert _{L^q(\Omega ; V)}^q&\le {\mathbb {E}}\left( a_-^{-q}\right) \Vert f\Vert _{V'}^q \\&= {\mathbb {E}}\left( \left( {{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}}\; \exp (-b_{T,{\mathcal {D}}}(x))\right) ^q \right) \Vert f\Vert _{V'}^q \\&= {\mathbb {E}}\left( \exp \left( {{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}}\; -qb_{T,{\mathcal {D}}}(x)\right) \right) \Vert f\Vert _{V'}^q \\&\le {\mathbb {E}}\left( \exp \left( {{\,\textrm{ess sup}\,}}_{x\in {\mathbb {T}}^d}\; qb_T(x)\right) \right) \Vert f\Vert _{V'}^q \\&= {\mathbb {E}}\left( \exp \left( q\Vert b_T\Vert _{L^\infty ({\mathbb {T}}^d)}\right) \right) \Vert f\Vert _{V'}^q \\&\le {\mathbb {E}}\left( \exp \left( q\Vert b_T\Vert _{{\mathcal {C}}^t}\right) \right) \Vert f\Vert _{V'}^q. \end{aligned}$$

We have used that \(\exp (\cdot )\) is strictly increasing for the second equality, and that \(b_T(x)\) is a centered random variable such that \(b_T\) and \(-b_T\) are equal in distribution. This implies in turn that \(a_-\) and \(\Vert a\Vert _{L^\infty ({\mathcal {D}})}\) are equal in distribution, which we used in the second inequality. The last estimate is due to \(\Vert b_T\Vert _{L^\infty ({\mathbb {T}}^d)}\le \Vert b_T\Vert _{{\mathcal {C}}^t}\) for any \(t>0\). For \(p=1\), we note that \(\varepsilon _p\) in the second part of Theorem 2.11 may be chosen as \(\varepsilon _p=(\kappa C)^{-1}\), where \(C>0\) is the constant in (25). Hence, for sufficiently small \(\kappa >0\), we may set \(\overline{q}:= \varepsilon _p>1\) in the claim. In case that \(p>1\), Young’s inequality shows that for any \(q\ge 1\) and any arbitrarily small \(\varepsilon >0\) there is a constant \(C_\varepsilon =C_\varepsilon (p,q)\in (0,\infty )\) such that

$$\begin{aligned} q\Vert b_T\Vert _{{\mathcal {C}}^t} \le \varepsilon \Vert b_T\Vert _{{\mathcal {C}}^t}^p + C_\varepsilon . \end{aligned}$$

Thus, we have no restrictions on \(q\in [1,\infty )\), which proves the second part of the claim.
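The Young-type bound used above can be made explicit. The following is a minimal sketch, not taken from the source: for \(p>1\), maximizing \(x\mapsto qx-\varepsilon x^p\) over \(x\ge 0\) gives the smallest admissible constant \(C_\varepsilon =q\frac{p-1}{p}\left( \frac{q}{\varepsilon p}\right) ^{\frac{1}{p-1}}\). The function names `young_constant` and `young_gap` are hypothetical.

```python
def young_constant(q: float, p: float, eps: float) -> float:
    """Smallest C with q*x <= eps*x**p + C for all x >= 0 (requires p > 1)."""
    if p <= 1.0 or q <= 0.0 or eps <= 0.0:
        raise ValueError("need p > 1, q > 0 and eps > 0")
    # Maximiser of x -> q*x - eps*x**p, from q = eps*p*x**(p-1).
    x_star = (q / (eps * p)) ** (1.0 / (p - 1.0))
    # Value of q*x - eps*x**p at the maximiser.
    return q * (p - 1.0) / p * x_star


def young_gap(x: float, q: float, p: float, eps: float) -> float:
    """Right-hand side minus left-hand side; non-negative for every x >= 0."""
    return eps * x ** p + young_constant(q, p, eps) - q * x
```

The constant grows as \(\varepsilon \rightarrow 0\), but it stays finite for every fixed \(q\), which is why no restriction on \(q\) arises for \(p>1\).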

3.) Let \(\left\| \cdot \right\| \) denote the Euclidean norm on \({\mathcal {D}}\). Observe that for any fixed \(r\in (0,1)\) we obtain by Taylor expansion and since \(\exp (\cdot )\) is strictly increasing that

$$\begin{aligned} \begin{aligned} \Vert \exp (b_{T,{\mathcal {D}}})\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}&= \sup _{x,y\in {\mathcal {D}},\, x\ne y} \frac{|\exp (b_{T,{\mathcal {D}}}(x))-\exp (b_{T,{\mathcal {D}}}(y))|}{\Vert x-y\Vert ^r} + \Vert \exp (b_{T,{\mathcal {D}}})\Vert _{L^\infty ({\mathcal {D}})} \\&\le \Vert \exp (b_{T,{\mathcal {D}}})\Vert _{L^\infty ({\mathcal {D}})}\left( \sup _{x,y\in {\mathcal {D}},\, x\ne y} \frac{|b_{T,{\mathcal {D}}}(x)-b_{T,{\mathcal {D}}}(y)|}{\Vert x-y\Vert ^r} + 1\right) \\&\le \exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}) \left( \Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})} + 1\right) . \end{aligned} \end{aligned}$$
(34)

We obtain further for \(r=1\) that

$$\begin{aligned} \begin{aligned} \Vert \exp (b_{T,{\mathcal {D}}})\Vert _{\textrm{C}^1(\overline{{\mathcal {D}}})}&\le \exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}) \left( \Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^1(\overline{{\mathcal {D}}})} + 1\right) . \end{aligned} \end{aligned}$$
(35)
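Estimate (34) can be checked numerically on a grid. The sketch below is an illustration only: it uses a discrete Hölder norm over grid points of a one-dimensional interval, and the path `b` is a hypothetical rough function rather than a sample of \(b_{T,{\mathcal {D}}}\). Since \(|e^x-e^y|\le e^{\max (x,y)}|x-y|\) holds pointwise, the discrete inequality holds exactly.

```python
import math


def holder_norm(values, points, r):
    """Discrete C^r norm on a 1D grid: Hoelder seminorm plus sup norm."""
    sup = max(abs(v) for v in values)
    semi = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            quotient = abs(values[i] - values[j]) / abs(points[i] - points[j]) ** r
            semi = max(semi, quotient)
    return semi + sup


# Hypothetical rough path on [0, 1]; it stands in for a sample of the log-coefficient.
points = [i / 100 for i in range(101)]
b = [0.7 * math.sin(9.0 * x) + abs(x - 0.4) ** 0.6 for x in points]
r = 0.5

# Left- and right-hand side of the discrete analogue of (34).
lhs = holder_norm([math.exp(v) for v in b], points, r)
rhs = math.exp(max(abs(v) for v in b)) * (holder_norm(b, points, r) + 1.0)
```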

For any fixed \(r\in (0,s-\frac{d}{p})\cap (0,1]\) Lemma 3.4 now shows that

$$\begin{aligned} \Vert u\Vert _{L^q(\Omega ; W^r)}^q{} & {} \le C^q{\mathbb {E}}\left[ a_-^{-q} \left( 1+a_-^{-\frac{1}{r}}\Vert \exp (b_{T,{\mathcal {D}}})\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}^{\frac{1}{r}}\right) ^q \right] \Vert f\Vert _{H}^q \nonumber \\{} & {} \le C^q{\mathbb {E}}\left[ a_-^{-q} \left( 1+a_-^{-\frac{1}{r}} \exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})})^{\frac{1}{r}} \left( \Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})} + 1\right) ^{\frac{1}{r}}\right) ^q \right] \Vert f\Vert _{H}^q \nonumber \\{} & {} \le \! C^q{\mathbb {E}}\left[ a_-^{-q} 2^{q-1}\left( 1\!+\!a_-^{-\!\frac{q}{r}} \exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})})^{\frac{q}{r}} 2^{q-\!1}\left( \Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}^{\frac{q}{r}} \!\!+\!\! 1\right) \right) \right] \Vert f\Vert _{H}^q \nonumber \\{} & {} \le C{\mathbb {E}}\left[ \exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})})^{q+\frac{2q}{r}} \left( \Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}^{\frac{q}{r}} + 1\right) \right] \Vert f\Vert _{H}^q, \end{aligned}$$
(36)

where we have used (34), (35) in the second step, applied Jensen’s inequality twice in the third step, and used again that \(b_T\) and \(-b_T\) are equal in distribution together with \(\exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}) \ge 1\) to derive the last estimate. We may further assume without loss of generality that \(\Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}\ge 1\) to obtain with (36) and Hölder’s inequality for \(q_1, q_2>1\) such that \(\frac{1}{q_1}+\frac{1}{q_2}=1\)

$$\begin{aligned} \begin{aligned} \Vert u\Vert _{L^q(\Omega ; W^r)}^q&\le C{\mathbb {E}}\left[ \exp \left( q_1\left( q+\frac{2q}{r}\right) \Vert b_T\Vert _{\textrm{C}^r}\right) \right] ^{\frac{1}{q_1}} {\mathbb {E}}\left[ \Vert b_T\Vert _{\textrm{C}^r}^{\frac{q_2q}{r}} \right] ^{\frac{1}{q_2}}\Vert f\Vert _{H}^q, \end{aligned} \end{aligned}$$
(37)

where we have also used that \(\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}\le \Vert b_{T}\Vert _{\textrm{C}^r}\) and that \(\Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})} \le \Vert b_T\Vert _{\textrm{C}^r}\) for any \(r>0\). To bound the Hölder-norm \(\Vert b_T\Vert _{\textrm{C}^r}\) in (37), we first consider the case \(r<1\). Then, we recall from Sect. 2.1 that \(\textrm{C}^r={\mathcal {C}}^r\) with equivalent norms, thus \(\Vert b_T\Vert _{\textrm{C}^r}\le C\Vert b_T\Vert _{{\mathcal {C}}^r}\). If \(r=1\), then \(s-\frac{d}{p}>1\), and we use the same argument to derive the bound \(\Vert b_T\Vert _{\textrm{C}^r}\le \Vert b_T\Vert _{\textrm{C}^{r+\varepsilon }}\le C\Vert b_T\Vert _{{\mathcal {C}}^{r+\varepsilon }}\) for any \(\varepsilon \in (0,s-\frac{d}{p}-1)\).

For \(p=1\), Theorem 2.11 now shows again that for sufficiently small \(\kappa >0\), there are admissible choices \(q,q_1,q_2\in [1,\infty )\), dependent on r, such that the right-hand side in (37) is finite. The proof is concluded by noting that \(q\in [1,\infty )\) may again be arbitrarily large in (37) if \(p>1\), independent of r. \(\square \)

4 Pathwise finite element approximation

4.1 Dimension truncation

To obtain a tractable approximation of \(b_T\) in (18), we truncate the wavelet series expansion after \(N\in {\mathbb {N}}\) scales to obtain the truncated random tree Besov prior

$$\begin{aligned} b_{T,N}(\omega ):= \sum _{\begin{array}{c} (j,k,l)\in {\mathcal {I}}_T(\omega ) \\ j\le N \end{array}} \eta _jX_{j,k}^l(\omega )\psi _{j,k}^l, \quad \omega \in \Omega . \end{aligned}$$
(38)
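To illustrate the structure of the truncated expansion (38), the following sketch draws the active index set and the scaled coefficients for \(d=1\). It rests on several assumptions that are not confirmed by this section: the random tree is generated by keeping each of the two children of an active node independently with probability \(\beta \), the scaling is taken as \(\eta _j=\kappa 2^{-j(s+\frac{1}{2}-\frac{1}{p})}\), and the coefficients \(X_{j,k}\) are standard Gaussian (corresponding to \(p=2\)). The function name `sample_truncated_tree_coefficients` is hypothetical, and the wavelets \(\psi _{j,k}\) themselves are not evaluated.

```python
import random


def sample_truncated_tree_coefficients(N, s=1.5, p=2.0, beta=0.75, kappa=1.0, seed=0):
    """Return {(j, k): eta_j * X_jk} for the active tree indices with j <= N (d = 1)."""
    rng = random.Random(seed)
    active = [0]          # root node at scale j = 0
    coeffs = {}
    for j in range(N + 1):
        # Assumed scale-dependent weight; kappa acts as an overall scaling here.
        eta_j = kappa * 2.0 ** (-j * (s + 0.5 - 1.0 / p))
        for k in active:
            coeffs[(j, k)] = eta_j * rng.gauss(0.0, 1.0)
        # Each of the two children of an active node survives with probability beta.
        active = [2 * k + c for k in active for c in (0, 1) if rng.random() < beta]
    return coeffs
```

Because the tree is generated scale by scale, two calls with the same seed and different `N` produce nested coefficient sets, which mirrors the fact that \(b_{T,N}\) is obtained from \(b_T\) by discarding all scales \(j>N\).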

The corresponding diffusion problem in weak form with truncated coefficient for fixed \(\omega \in \Omega \) is to find \(u_N(\omega )\in V\) such that for all \(v\in V\)

$$\begin{aligned} \int _{\mathcal {D}}a_N(\omega )\nabla u_N(\omega )\cdot \nabla v \, dx = {}_{V'}\langle f ,v \rangle _{V}, \end{aligned}$$
(39)

where

$$\begin{aligned} a_N:\Omega \rightarrow L^\infty ({\mathcal {D}}), \quad \omega \mapsto \exp (b_{T,N}(\omega )|_{{\mathcal {D}}}). \end{aligned}$$
(40)

Existence, uniqueness, and regularity of \(u_N\) follow analogously to the corresponding results for u in the previous section.

Corollary 4.1

Let \(N\in {\mathbb {N}}\) and \(a_N=\exp \left( b_{T,N}|_{{\mathcal {D}}}\right) \) with \(b_{T,N}\) given as in (38) for \(p\in [1,\infty )\), \(s>0\) and \(\beta =2^{\gamma -d}\in [0,1]\), so that \(sp>d\). Furthermore, let \(f\in V'\). Then the following holds.

  1. 1.)

    There exists almost surely a unique weak solution \(u_N(\omega )\in V\) to the truncated Problem (39) and \(u_N:\Omega \rightarrow V\) is strongly measurable.

  2. 2.)

    For sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\)

    $$\begin{aligned} \Vert u_N\Vert _{L^q(\Omega ; V)}\le C \Vert f\Vert _{V'} <\infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text {, and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$
  3. 3.)

    Let \(r\in (0,s-\frac{d}{p})\cap (0,1]\) and \(f\in H\). There are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\)

    $$\begin{aligned} \Vert u_N\Vert _{L^q(\Omega ; W^r)}\le C \Vert f\Vert _{H} < \infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text { and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$

Proof

The result follows analogously to Theorems 2.11 and 3.9, upon observing that \(\Vert b_{T,N}(\omega )\Vert _{B^t_q}\le \Vert b_{T}(\omega )\Vert _{B^t_q}\) holds P-a.s. for any \(t>0\), \(q\in [1,\infty ]\), and \(N\in {\mathbb {N}}\). \(\square \)

The important observation from Corollary 4.1 is that the bounds are independent of N, which is crucial when estimating the finite element discretization error of \(u_N\) in the next subsection. We bound the truncation errors \(a-a_N\) and \(u-u_N\) in the remainder of this section.

Proposition 4.2

Let \(a:=\exp \left( b_{T,{\mathcal {D}}}\right) \) with \(b_{T,{\mathcal {D}}}\) as given in Definition 3.6 with \(p\in (1,\infty )\), \(s>0\), \(\beta =2^{\gamma -d}\in [0,1]\) and such that \(sp>d+\min (\gamma ,0)\). Let \(b_{T,N}\) and \(a_N\) be the approximations of \(b_T\) and a for given \(N\in {\mathbb {N}}\) as in (38) and (40), respectively.

  1. 1.)

    For any \(q\ge 1\) and \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a constant \(C>0\) such that for every \(N\in {\mathbb {N}}\) it holds

    $$\begin{aligned} \Vert b_{T,{\mathcal {D}}}-b_{T,N}|_{\mathcal {D}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \le C 2^{N\left( t-s+\frac{d}{p}+\frac{\min (\gamma ,0)}{q}\right) }. \end{aligned}$$
  2. 2.)

    Moreover, for any \(q\ge 1\), \(\varepsilon >0\) and \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a \(C>0\) such that for every \(N\in {\mathbb {N}}\) it holds

    $$\begin{aligned} \Vert a-a_N\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \le C 2^{N\left( t-s+\frac{d}{p}+\frac{\min (\gamma +\varepsilon ,0)}{q}\right) }. \end{aligned}$$

Proof

1.) Let \(q_0\ge q\), \(t_0>\frac{d}{q_0}\) and \(t = t_0-\frac{d}{q_0}\), so that \(B_{q_0}^{t_0}\hookrightarrow {\mathcal {C}}^t\). For any fixed \(N\in {\mathbb {N}}\), we obtain with Hölder’s inequality analogously to the proof of Theorem 2.11 the estimate

$$\begin{aligned} \begin{aligned} \Vert b_{T,{\mathcal {D}}}-b_{T,N}|_{\mathcal {D}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}&\le \Vert b_T-b_{T,N}\Vert _{L^q(\Omega ; {\mathcal {C}}^t)} \\&\le {\mathbb {E}}\left( \Vert b_T-b_{T,N}\Vert ^{q_0}_{{\mathcal {C}}^t} \right) ^{\frac{1}{q_0}} \\&\le {\mathbb {E}}\left( \Vert b_T-b_{T,N}\Vert ^{q_0}_{B_{q_0}^{t_0}} \right) ^{\frac{1}{q_0}} \\&\le \left( \sum _{j=N+1}^\infty 2^{jq_0(t_0 - \frac{d}{q_0} -s + \frac{d}{p}+\frac{\gamma }{q_0})} \right) ^{\frac{1}{q_0}} \\&= 2^{N(t -s + \frac{d}{p}+\frac{\gamma }{q_0})} \left( \sum _{j=1}^{\infty } 2^{jq_0(t - s + \frac{d}{p} + \frac{\gamma }{q_0})} \right) ^{\frac{1}{q_0}}. \end{aligned} \end{aligned}$$
(41)

Now let \(t<s-\frac{d}{p}\) and \(\gamma \in (0,d]\) in (41), and choose \(q_0=\max \Big (\gamma N, 2\gamma (s-\frac{d}{p}-t)^{-1}, q\Big )\) (for sufficiently large, given N) to obtain that

$$\begin{aligned} \begin{aligned} \Vert b_{T,{\mathcal {D}}}-b_{T,N}|_{\mathcal {D}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}&\le 2^{N\left( t -s + \frac{d}{p}\right) + 1} \left( \sum _{j=1}^{\infty } 2^{jq_0\left( t - s + \frac{d}{p}\right) \frac{1}{2}} \right) ^{\frac{1}{q_0}} \\&\le 2^{N\left( t -s + \frac{d}{p}\right) } 2\left( \sum _{j=1}^{\infty } 2^{j\left( t - s + \frac{d}{p}\right) \frac{1}{2}} \right) . \end{aligned} \end{aligned}$$
(42)

The final bound in (42) is independent of \(q_0=q_0(N)\), which shows the first part of the claim for \(\gamma \in (0,d]\). For \(\gamma \in (-\infty ,0]\), we use \(q_0 = q\) in (41) to obtain for any \(t\in (0,s-\frac{d}{p}-\frac{\gamma }{q})\) that

$$\begin{aligned} \Vert b_{T,{\mathcal {D}}}-b_{T,N}|_{\mathcal {D}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}\le & {} 2^{N\left( t - s + \frac{d}{p} + \frac{\gamma }{q}\right) }\left( \sum _{j=1}^{\infty } 2^{jq(t - s + \frac{d}{p} + \frac{\gamma }{q})} \right) ^{\frac{1}{q}}\nonumber \\\le & {} C 2^{N\left( t - s + \frac{d}{p} + \frac{\gamma }{q}\right) }. \end{aligned}$$
(43)
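The rate in (41) and (43) comes from a geometric tail sum. As a small sanity check under the shorthand \(\alpha :=t-s+\frac{d}{p}+\frac{\gamma }{q}<0\) (a symbol introduced here for illustration), the sketch below compares the truncated tail \(\big (\sum _{j>N}2^{jq\alpha }\big )^{1/q}\) with the factorized form \(2^{N\alpha }\big (\sum _{j\ge 1}2^{jq\alpha }\big )^{1/q}\). The function names are hypothetical.

```python
def tail_factor(N, alpha, q, terms=400):
    """(sum_{j=N+1}^{N+terms} 2^(j*q*alpha))^(1/q); approximates the full tail for alpha < 0."""
    total = sum(2.0 ** (j * q * alpha) for j in range(N + 1, N + 1 + terms))
    return total ** (1.0 / q)


def closed_form(N, alpha, q):
    """2^(N*alpha) * (sum_{j>=1} 2^(j*q*alpha))^(1/q), with the geometric series summed exactly."""
    if alpha >= 0.0:
        raise ValueError("the series converges only for alpha < 0")
    ratio = 2.0 ** (q * alpha)
    return 2.0 ** (N * alpha) * (ratio / (1.0 - ratio)) ** (1.0 / q)
```

Increasing `N` by one multiplies both quantities by \(2^{\alpha }\), which is the decay per additional scale stated in Proposition 4.2.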

2.) To prove the second part, we observe that for any \(t\in \left( 0, s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q}\right) \) there holds

$$\begin{aligned} \Vert a-a_N\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}&\le \Vert e^{b_{T,N}|_{\mathcal {D}}}(e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}-1)\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \\ {}&\le \Vert e^{b_{T,N}|_{\mathcal {D}}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}-1\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}, \end{aligned}$$

where the last inequality follows by independence of \(b_T - b_{T,N}\) and \(b_{T,N}\). The first factor in this expression is bounded analogously to \(\Vert e^{b_{T,{\mathcal {D}}}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}\) by using the estimates (34) (resp. (35)), Theorem 2.11 and Hölder’s inequality

$$\begin{aligned} \Vert e^{b_{T,N}|_{\mathcal {D}}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \le \Vert e^{b_{T,N}|_{\mathcal {D}}}\Vert _{L^{q_1q}(\Omega ; L^\infty (\overline{{\mathcal {D}}}))} \left( \Vert b_{T,N}|_{\mathcal {D}}\Vert _{L^{q_2q}(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} + 1\right) <\infty ,\nonumber \\ \end{aligned}$$
(44)

where \(q_1,q_2>1\) are such that \(\frac{1}{q_1}+\frac{1}{q_2}=1\) and the bound holds uniformly in N.

In addition, Taylor expansion yields

$$\begin{aligned}&\Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}-1\Vert _{{\mathcal {C}}^t(\overline{{\mathcal {D}}})} = \sup _{x,y\in {\mathcal {D}},\; x\ne y} \frac{|e^{(b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}})(x)}-e^{(b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}})(y)}|}{\Vert x-y\Vert ^t}\\&\quad + \Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}-1\Vert _{L^\infty ({\mathcal {D}})} \\&\quad \le \Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}\sup _{x,y\in {\mathcal {D}},\; x\ne y} \frac{|(b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}})(x)-(b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}})(y)|}{\Vert x-y\Vert ^t} \\&\quad + \Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})} \Vert b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}\Vert _{L^\infty ({\mathcal {D}})} \\&\quad \le \Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})} \Vert b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}\Vert _{{\mathcal {C}}^t(\overline{{\mathcal {D}}})}. \end{aligned}$$

From the proof of the second part of Theorem 3.9 it follows that \(\Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}\Vert _{L^q(\Omega ; L^\infty ({\mathcal {D}}))}\) is finite and bounded uniformly with respect to N for all \(q\ge 1\).

Hölder’s inequality for \(p_1,p_2>1\) such that \(\frac{1}{p_1}+\frac{1}{p_2}=1\) thus shows together with the truncation error in (43) that

$$\begin{aligned} \Vert a-a_N\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}&\le \Vert e^{b_{T,N}|_{\mathcal {D}}}(e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}-1)\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \\ {}&\le C \Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}\Vert _{L^{p_1q}(\Omega ; L^\infty ({\mathcal {D}}))} \Vert b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}\Vert _{L^{p_2q}(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \\ {}&\le C 2^{N(t - s + \frac{d}{p} + \frac{\min (\gamma ,0)}{p_2q})} \end{aligned}$$

The claim follows for any \(\varepsilon >0\) by choosing \(p_2>1\) so small that \(\min (\gamma , 0) \le p_2\min (\gamma +\varepsilon , 0)\). \(\square \)

Remark 4.3

We emphasize that all estimates in Proposition 4.2 are independent of \({\mathcal {D}}\subset {\mathbb {T}}^d\), as all uniform error bounds are derived with respect to \({\mathbb {T}}^d\). Proposition 4.2 shows in particular that for any \(q\ge 1\) and \(t\in (0, s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a \(C>0\) such that for any \(N\in {\mathbb {N}}\) it holds

$$\begin{aligned} \Vert a-a_N\Vert _{L^q(\Omega ; L^\infty ({\mathcal {D}}))} \le C 2^{-Nt}. \end{aligned}$$

This estimate is essential to bound the truncation error \(u-u_N\) of the approximated elliptic problem in (39), see Theorem 4.4 below. In the borderline case \(p=1\) with sufficiently small \(\kappa >0\) and \(sp>d\), we still recover the slightly weaker estimates

$$\begin{aligned} \Vert a-a_N\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \le C 2^{N\left( t-s+\frac{d}{p}\right) }, \quad \Vert a-a_N\Vert _{L^q(\Omega ; L^\infty ({\mathcal {D}}))} \le C 2^{-tN} \end{aligned}$$
(45)

for sufficiently small \(q\ge 1\) (depending on \(\kappa \)) and \(t\in (0,s-\frac{d}{p})\), independently of \(\gamma \). This may be seen by letting \(p_1\rightarrow 1\) and \(p_2\rightarrow \infty \) in the last part of the proof of Proposition 4.2.

Theorem 4.4

Let u be as in (26) with \(a=\exp \left( b_{T,{\mathcal {D}}}\right) \) and let \(u_N\) be as in (26) with \(a_N=\exp \left( b_{T,N}|_{\mathcal {D}}\right) \) given by (40). Furthermore, let \(b_{T,{\mathcal {D}}}\) be such that \(p\in (1,\infty )\), \(s>0\), \(\beta =2^{\gamma -d}\in [0,1]\), and \(sp >d\ge d + \min (\gamma ,0)\). Then, for any \(q\ge 1\) and \(t\in (0, s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a \(C>0\) such that for every \(N\in {\mathbb {N}}\) it holds

$$\begin{aligned} \Vert u-u_N\Vert _{L^q(\Omega ; V)}\le C 2^{-Nt}. \end{aligned}$$

Proof

For fixed \(\omega \in \Omega \) and \(N\in {\mathbb {N}}\), we obtain by Proposition 3.3

$$\begin{aligned} \begin{aligned} \Vert u(\omega )-u_N(\omega )\Vert _V \le \frac{\Vert f\Vert _{V'}}{a_{-}(\omega )a_{N,-}(\omega )}\Vert a(\omega )-a_N(\omega )\Vert _{L^\infty ({\mathcal {D}})}, \end{aligned} \end{aligned}$$

where \(a_{N,-}(\omega ):={{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a_N(\omega ,x)\). Taking expectations yields with Hölder’s inequality

$$\begin{aligned} \begin{aligned} \Vert u-u_N\Vert _{L^q(\Omega ;V)} \le \Vert f\Vert _{V'} \Vert a_-^{-1}\Vert _{L^{q_1}(\Omega )} \Vert a_{N,-}^{-1}\Vert _{L^{q_2}(\Omega )} \Vert a-a_N\Vert _{L^{q_3}(\Omega ;L^\infty ({\mathcal {D}}))}, \end{aligned}\nonumber \\ \end{aligned}$$
(46)

where \(q_1,q_2,q_3>1\) are such that \(\frac{1}{q}=\sum _{i=1}^3\frac{1}{q_i}\) and \(\Vert f\Vert _{V'}<\infty \). As in the proof of part 2.) in Theorem 3.9, we conclude for any \(q_1\in [1,\infty )\) and \(t\in (0,s-\frac{d}{p})\) with Theorem 2.11 that

$$\begin{aligned} \Vert a_-^{-1}\Vert _{L^{q_1}(\Omega )} \le \Vert \exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})})\Vert _{L^{q_1}(\Omega )} \le \Vert \exp (\Vert b_{T}\Vert _{{\mathcal {C}}^t})\Vert _{L^{q_1}(\Omega )} <\infty . \end{aligned}$$

Similarly, it follows for all \(q_2\in [1,\infty )\) that

$$\begin{aligned} \Vert a_{N,-}^{-1}\Vert _{L^{q_2}(\Omega )} \le \Vert \exp (\Vert b_{T,N}\Vert _{{\mathcal {C}}^t})\Vert _{L^{q_2}(\Omega )} \le \Vert \exp (\Vert b_T\Vert _{{\mathcal {C}}^t})\Vert _{L^{q_2}(\Omega )} <\infty , \end{aligned}$$

where we emphasize that the last bound is uniform with respect to N. Proposition 4.2 and Remark 4.3 show for \(q_3\in [1,\infty )\) and \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q_3})\) that

$$\begin{aligned} \Vert a-a_N\Vert _{L^{q_3}(\Omega ; L^\infty ({\mathcal {D}}))} \le C 2^{-Nt}. \end{aligned}$$

This, together with (46), shows the claim, as \(q_3>q\) may be chosen arbitrarily close to q, and

$$\begin{aligned} \Vert a_-^{-1}\Vert _{L^{q_1}(\Omega )}+\Vert a_{N,-}^{-1}\Vert _{L^{q_2}(\Omega )}\le C <\infty \end{aligned}$$

holds for all \(q_1,q_2\in [1,\infty )\) with \(C=C(q_1,q_2)>0\), uniformly with respect to N. \(\square \)
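The pathwise perturbation estimate at the start of the proof can be observed in a one-dimensional discrete setting. The sketch below is an illustration under stated assumptions, not the setting of the theorem: \({\mathcal {D}}=(0,1)\), \(f\equiv 1\), piecewise constant coefficients on a uniform mesh, linear finite elements, and \(\Vert v\Vert _V=\Vert v'\Vert _{L^2}\), for which \(\Vert f\Vert _{V'}=1/\sqrt{12}\). The coefficients `b` and `b_N` are hypothetical smooth stand-ins, with `b_N` obtained by dropping a high-frequency term. The same argument as in the continuous case applies to the Galerkin solutions, so the bound holds for the computed quantities.

```python
import math


def solve_p1(coeff, load=1.0):
    """P1 FEM for -(a u')' = load on (0,1), u(0)=u(1)=0, a constant per element.
    Returns nodal values including both boundary zeros."""
    n = len(coeff)                       # number of elements
    h = 1.0 / n
    m = n - 1                            # number of interior nodes
    diag = [(coeff[i] + coeff[i + 1]) / h for i in range(m)]
    off = [-coeff[i + 1] / h for i in range(m - 1)]
    rhs = [load * h] * m
    for i in range(1, m):                # Thomas algorithm: forward elimination
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):       # back substitution
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]


def h1_seminorm(u):
    """L2 norm of the derivative of the piecewise linear interpolant of u."""
    h = 1.0 / (len(u) - 1)
    return math.sqrt(sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1)) / h)


n = 64
mids = [(e + 0.5) / n for e in range(n)]
b_N = [0.5 * math.sin(2.0 * math.pi * x) for x in mids]                       # hypothetical
b = [bn + 0.2 * math.sin(16.0 * math.pi * x) for bn, x in zip(b_N, mids)]    # hypothetical
a = [math.exp(v) for v in b]
a_N = [math.exp(v) for v in b_N]

u, u_N = solve_p1(a), solve_p1(a_N)
error = h1_seminorm([x - y for x, y in zip(u, u_N)])
f_dual_norm = 1.0 / math.sqrt(12.0)      # dual norm of f = 1 under the assumed V-norm
bound = f_dual_norm / (min(a) * min(a_N)) * max(abs(x - y) for x, y in zip(a, a_N))
```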

Remark 4.5

In view of Remark 4.3, we note that for \(p=1\) with sufficiently small \(\kappa >0\) and \(sp>d\) there holds the slightly weaker estimate

$$\begin{aligned} \Vert u-u_N\Vert _{L^q(\Omega ; V)} \le C 2^{-Nt} \end{aligned}$$

for sufficiently small \(q\ge 1\) (depending on \(\kappa \)) and \(t\in (0,s-\frac{d}{p})\), independently of \(\gamma \). This may also be seen by letting \(q_1,q_2\rightarrow 2q\) and \(q_3\rightarrow \infty \) in the proof of Theorem 4.4.

4.2 Finite element discretization

The solution \(u_N:\Omega \rightarrow V\) to Problem (39) with truncated diffusion coefficient is still not fully tractable, as it takes values in the infinite-dimensional Hilbert space V. Thus, we consider Galerkin-finite element approximations of \(u_N\) in a finite-dimensional subspace of V. Corollary 4.1 provides the necessary regularity of \(u_N\), independent of the truncation index N, therefore we fix \(N\in {\mathbb {N}}\) for the remainder of this section.

We partition the convex, polytopal domain \({\mathcal {D}}\subset {\mathbb {T}}^{d}\), \(d\in \{1,2,3\}\) by a sequence of simplices (intervals/triangles/tetrahedra) or parallelotopes (intervals/parallelograms/parallelepipeds), denoted by \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\). The refinement parameter \(h>0\) takes values in a countable index set \({\mathfrak {H}}\subset (0,\infty )\) and corresponds to the longest edge of a simplex/parallelotope \(K\in {\mathcal {K}}_h\). We impose the following assumptions on \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) to obtain a sequence of “well-behaved” triangulations.

Assumption 4.6

The sequence \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) satisfies:

  1. 1.

    Admissibility: For each \(h\in {\mathfrak {H}}\), \({\mathcal {K}}_h\) consists of open, non-empty simplices/parallelotopes K such that

    • \(\overline{{\mathcal {D}}}= \bigcup _{K\in {\mathcal {K}}_h} \overline{K}\),

    • \(K_1\cap K_2=\emptyset \) for any two \(K_1, K_2\in {\mathcal {K}}_h\) such that \(K_1\ne K_2\), and

    • the intersection \(\overline{K}_1\cap \overline{K}_2\) for \(K_1\ne K_2\) is either empty, a common edge, a common vertex, or (in space dimension \(d=3\)) a common face of \(K_1\) and \(K_2\).

  2. 2.

    Shape-regularity: Let \(\rho _{K,in}\) and \(\rho _{K,out}\) denote the radius of the largest in- and circumscribed circle, respectively, for a given \(K\in {\mathcal {K}}_h\). There is a constant \(\rho > 0\) such that

    $$\begin{aligned} \rho : = \sup _{h\in {\mathfrak {H}}} \sup _{K\in {\mathcal {K}}_h} \frac{\rho _{K,out}}{\rho _{K,in}} < \infty . \end{aligned}$$
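For triangles (\(d=2\)), the ratio \(\rho _{K,out}/\rho _{K,in}\) in the shape-regularity condition can be computed from the vertex coordinates. The following is a minimal sketch using the classical formulas \(\rho _{K,in}=A/s\) and \(\rho _{K,out}=abc/(4A)\), with area \(A\), side lengths \(a,b,c\) and semi-perimeter \(s\). The function name `shape_ratio` is hypothetical.

```python
import math


def shape_ratio(p0, p1, p2):
    """Circumradius divided by inradius of the triangle with vertices p0, p1, p2."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    # Area from the cross product of two edge vectors.
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    if area == 0.0:
        raise ValueError("degenerate triangle")
    rho_in = area / (0.5 * (a + b + c))
    rho_out = a * b * c / (4.0 * area)
    return rho_out / rho_in
```

The ratio is invariant under scaling, so uniform refinement that only produces similar triangles keeps \(\rho \) fixed, whereas elements that flatten under refinement make the ratio grow without bound.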

Based on a given tessellation \({\mathcal {K}}_h\), we define the space of piecewise (multi-)linear finite elements

$$\begin{aligned} V_h:= {\left\{ \begin{array}{ll} \{v\in V|\; v|_K\text { is linear for all }K\in {\mathcal {K}}_h\}, &{}\quad \text {if }{\mathcal {K}}_h\text { consists of simplices}, \\ \{v\in V|\; v|_K\text { is }d\text {-linear for all }K\in {\mathcal {K}}_h\}, &{}\quad \text {if }{\mathcal {K}}_h\text { consists of parallelotopes}. \end{array}\right. } \end{aligned}$$

Clearly, \(V_h\subset V\) is a finite-dimensional space and we define \(n_h:=\dim (V_h)\in {\mathbb {N}}\). This yields for fixed \(\omega \in \Omega \) the fully discrete problem to find \(u_{N,h}(\omega )\in V_h\) such that for all \(v_h\in V_h\)

$$\begin{aligned} \int _{\mathcal {D}}a_N(\omega )\nabla u_{N,h}(\omega )\cdot \nabla v_h \, dx = {}_{V'}\langle f ,v_h \rangle _{V}. \end{aligned}$$
(47)

Theorem 4.7

Let \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) be a sequence of triangulations satisfying Assumption 4.6, and let \(u_N\) and \(u_{N,h}\) be the pathwise weak solutions to (39) and (47). Furthermore, let \(N\in {\mathbb {N}}\), \(a_N\) be given as in (40) for \(p\in [1,\infty )\) and \(s>0\), such that \(sp>d\), and with \(\beta =2^{\gamma -d}\in [0,1]\).

For any \(f\in H\), sufficiently small \(\kappa >0\) in (16) and any \(r\in (0,s-\frac{d}{p})\cap (0,1]\), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds

$$\begin{aligned} \Vert u_N-u_{N,h}\Vert _{L^q(\Omega ; V)}\le C h^r \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text {, and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$

Proof

We recall that \(a_{N,-}(\omega ):={{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a_N(\omega ,x)>0\) and obtain by Céa’s lemma

$$\begin{aligned} \Vert u_N(\omega )-u_{N,h}(\omega )\Vert _V \le \frac{\Vert a_N(\omega )\Vert _{L^\infty ({\mathcal {D}})}}{a_{N,-}(\omega )} \Vert f\Vert _{V'} \inf _{v_h\in V_h} \Vert u_N(\omega ) - v_h\Vert _{V}. \end{aligned}$$
(48)

Now first suppose that \(p>1\). Since \(f\in H\), it holds by Corollary 4.1 for any \(q\ge 1\) that \(u_N\in L^q(\Omega ; W^r)\) for \(r\in (0, s-\frac{d}{p})\cap (0,1]\). For \(0<s-\frac{d}{p}\le 1\), we have \(r\in (0,s-\frac{d}{p})\), and Lemma 3.5 shows \(u_N\in L^q(\Omega ; H^{1+r_0}({\mathcal {D}}))\) for any \(r_0\in (0,r)\). It hence follows for \(r_0\in (0,r)\) that

$$\begin{aligned} \inf _{v_h\in V_h} \Vert u_N(\omega ) - v_h\Vert _{V}\le C \Vert u_N(\omega )\Vert _{H^{1+ r_0}({\mathcal {D}})} h^{r_0}. \end{aligned}$$
(49)

This is a standard result for first order Lagrangian FEM, see, e.g., [17, Theorems 8.62/8.69] or [9, Theorem 4.4.20]. The constant \(C>0\) in (49) depends on the shape-regularity parameter \(\rho \) and on \({\mathcal {D}}\), but is independent of \(u_N\) and h. Combining (48) and (49) shows with Hölder’s inequality

$$\begin{aligned} \begin{aligned} \Vert u_N-u_{N,h}\Vert _{L^q(\Omega ; V)}&\le C \Vert f\Vert _{V'} \Vert a_N\Vert _{L^{3q}(\Omega ; L^\infty ({\mathcal {D}}))} \Vert a_{N,-}^{-1}\Vert _{L^{3q}(\Omega )} \Vert u_N\Vert _{L^{3q}(\Omega ; H^{1+r_0}({\mathcal {D}}))} h^{r_0} \\&\le C \Vert a_N\Vert _{L^{3q}(\Omega ; L^\infty ({\mathcal {D}}))}^2 \Vert u_N\Vert _{L^{3q}(\Omega ; H^{1+ r_0 }({\mathcal {D}}))} h^{r_0} \\&\le C \Vert a_N\Vert _{L^{3q}(\Omega ; L^\infty ({\mathcal {D}}))}^2 \Vert u_N\Vert _{L^{3q}(\Omega ; W^r)} h^{r_0} \\&\le C h^{r_0}. \end{aligned} \end{aligned}$$
(50)

We have used that \(a_{N,-}\) and \(\Vert a_N\Vert _{L^\infty ({\mathcal {D}})}\) are equal in distribution for the second estimate, and Lemma 3.5 in the third line. The last step follows for any \(q\in [1,\infty )\) by Corollary 4.1 and Proposition 4.2 since \(p>1\). Moreover, as a further consequence of Corollary 4.1 and Proposition 4.2, the constant \(C>0\) in the final estimate in (50) is bounded independently of N and h. Since \(0<s-\frac{d}{p}\le 1\), we may choose \(r_0<r<s-\frac{d}{p}\) arbitrarily close to \(s-\frac{d}{p}\).

On the other hand, if \(s-\frac{d}{p}>1\) and \(r=1\), Lemma 3.5 implies that \(u_N\in L^q(\Omega ; H^{2}({\mathcal {D}}))\). Estimates (49) and (50) then hold for \(r_0=r=1\), which proves the claim in case that \(p>1\).

For \(p=1\) and given \(q\ge 1\), we need to assume in addition that \(\kappa >0\) is sufficiently small such that Corollary 4.1 and (45) in Remark 4.3 hold with q replaced by 3q. In this case, the claim for \(p=1\) follows analogously as for \(p>1\). \(\square \)

In the proof of Theorem 4.7, we obtain exponential moments of power 3q by Hölder’s inequality. For the case \(p=1\), we therefore need approximately that \(\kappa < \frac{1}{3q}\) (up to summation constants) to counter-balance this exponent and obtain \(a_N \in L^{3q}(\Omega ; L^\infty ({\mathcal {D}}))\) and \(u_N \in L^{3q}(\Omega ; W^r)\).
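In practice, a rate such as \(h^r\) from Theorem 4.7 is usually compared with observed convergence orders computed from errors on successively refined meshes. The sketch below shows this standard post-processing step. The function name `observed_orders` and the error values in the usage note are hypothetical and are not results from this work.

```python
import math


def observed_orders(hs, errors):
    """Empirical orders log(e_i / e_{i+1}) / log(h_i / h_{i+1}) for consecutive meshes."""
    if len(hs) != len(errors) or len(hs) < 2:
        raise ValueError("need at least two (h, error) pairs of equal length")
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
        for i in range(len(hs) - 1)
    ]
```

For synthetic errors of the form \(Ch^{0.75}\) on meshes \(h=2^{-2},\dots ,2^{-6}\), every entry of the returned list equals 0.75 up to rounding. For sampled quantities, the errors would first be averaged in the \(L^q(\Omega ; V)\) sense before the orders are computed.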

Theorem 4.8

Let the assumptions of Theorem 4.7 hold. For any \(f\in H\), sufficiently small \(\kappa >0\) in (16) and for any \(r\in (0,s-\frac{d}{p})\cap (0,1]\), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds

$$\begin{aligned} \Vert u_N-u_{N,h}\Vert _{L^q(\Omega ; H)}\le C h^{2r} \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text {, and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$

Proof

The proof uses the well-known Aubin-Nitsche duality argument. Let \(e_{N,h}:=u_N-u_{N,h}\) and consider for fixed \(\omega \in \Omega \) the dual problem to find \(\varphi (\omega )\in V\) such that for all \(v\in V\) it holds

$$\begin{aligned} \int _{\mathcal {D}}a_N(\omega )\nabla \varphi (\omega )\cdot \nabla v dx = {}_{V'}\langle e_{N,h}(\omega ) ,v \rangle _{V}. \end{aligned}$$
(51)

We need to investigate the regularity and integrability of \(\varphi \) as a first step. Lemma 3.4 shows that

$$\begin{aligned} \Vert \varphi (\omega )\Vert _{W^r} \le \frac{C}{a_-(\omega )} \left( 1+\left( \frac{\Vert a(\omega )\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}}{a_-(\omega )}\right) ^{1/r}\right) \Vert e_{N,h}(\omega )\Vert _H. \end{aligned}$$
(52)

Let \(t\in (0, s-\frac{d}{p})\) be fixed. We integrate both sides of (52) and use Hölder’s inequality as in the third part of Theorem 3.9 (cf. Inequality (37)) to obtain for \(q_0\ge 1\) and \(q_1,\dots ,q_3\in [1,\infty )\) such that \(1=\sum _{i=1}^3\frac{1}{q_i}\)

$$\begin{aligned} \begin{aligned} \Vert \varphi \Vert _{L^{q_0}(\Omega ; W^r)}^{q_0}&\le C\,{\mathbb {E}}\left[ \exp \left( q_1\left( q_0+\frac{2q_0}{r}\right) \Vert b_T\Vert _{\textrm{C}^r}\right) \right] ^{\frac{1}{q_1}}\\&\cdot {\mathbb {E}}\left[ \Vert b_T\Vert _{\textrm{C}^r}^{\frac{q_2q_0}{r}} \right] ^{\frac{1}{q_2}} {\mathbb {E}}\left[ \Vert e_{N,h}\Vert _{H}^{q_0q_3}\right] ^{1/q_3} \end{aligned} \end{aligned}$$
(53)

Note that we have again assumed that \(\Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}\ge 1\) without loss of generality to derive (53).

By Theorems 2.11 and 4.7 and Proposition 4.2, we now conclude that the right-hand side in (53) is finite and bounded uniformly in N for any \(q_0\ge 1\) if \(p>1\), as the Hölder conjugates \(q_1,\dots ,q_{3}\in [1,\infty )\) may be arbitrarily large.

For \(p=1\), we further need that \(\kappa >0\) in (16) is sufficiently small, so that \(\varepsilon _p>q_0\max (q_1(1+\frac{1}{r}), \frac{q_2}{r})\) in Theorem 2.11 and that \(\overline{q}\ge q_0q_3\) in Theorem 4.7. Given that \(\kappa >0\) is sufficiently small, there is for any \(p\ge 1\) a \(q_0\ge 1\) such that \(\varphi \in L^{q_0}(\Omega ; W^r)\).

For the next step, we combine Equations (39) and (47) to show the Galerkin orthogonality

$$\begin{aligned} \int _{\mathcal {D}}a_N(\omega )\nabla e_{N,h}(\omega )\cdot \nabla v_h dx = 0, \quad v_h\in V_h. \end{aligned}$$
(54)

Let \(P_h:V\rightarrow V_h\) denote the V-orthogonal projection onto \(V_h\). Testing with \(v=e_{N,h}(\omega )\in V\) in (51) then shows together with \(v_h=P_h\varphi (\omega )\) in (54) that

$$\begin{aligned} \begin{aligned} \Vert e_{N,h}(\omega )\Vert _H^2&= \int _{\mathcal {D}}a_N(\omega )\nabla \varphi (\omega )\cdot \nabla e_{N,h}(\omega ) dx \\&\le \Vert a_N(\omega )\Vert _{L^\infty ({\mathcal {D}})} \Vert e_{N,h}(\omega )\Vert _V \Vert (I-P_h)\varphi (\omega )\Vert _V. \end{aligned} \end{aligned}$$
(55)

Estimate (55) then yields for \(q\in [1,\infty )\) with Hölder’s inequality

$$\begin{aligned} \begin{aligned} \Vert e_{N,h}\Vert _{L^q(\Omega ; H)} \le \Vert a_N\Vert _{L^{\frac{3q}{2}}(\Omega ; L^\infty ({\mathcal {D}}))} \Vert e_{N,h}\Vert _{L^{\frac{3q}{2}}(\Omega ; V)} \Vert (I-P_h)\varphi \Vert _{L^{\frac{3q}{2}}(\Omega ; V)}. \end{aligned} \end{aligned}$$

First, suppose again that \(p>1\), where \(\varphi \in L^{q_0}(\Omega ; W^r)\) holds for any \(q_0\ge 1\). Proposition 4.2 and Theorem 4.7 yield

$$\begin{aligned} \begin{aligned} \Vert e_{N,h}\Vert _{L^q(\Omega ; H)} \le C h^{r}\Vert (I-P_h)\varphi \Vert _{L^{\frac{3q}{2}}(\Omega ; V)}, \end{aligned} \end{aligned}$$

where \(C>0\) is independent of N and h.

Lemma 3.5, \(\varphi \in L^{q_0}(\Omega ; W^r)\), and (49) further show that

$$\begin{aligned} \begin{aligned} \Vert e_{N,h}\Vert _{L^q(\Omega ; H)} \le C h^{r+r_0}, \quad r_0\in (0,r)\cup \{\lfloor r \rfloor \}. \end{aligned} \end{aligned}$$
(56)

The claim follows as in Theorem 4.7, since \(r=r_0=1\) if \(s-\frac{d}{p}>1\), and \(r_0<r<s-\frac{d}{p}\) may be arbitrarily close to \(s-\frac{d}{p}\) otherwise.

For \(p=1\) and given \(q\ge 1\) on the other hand, we need to assume that \(\kappa >0\) is sufficiently small so that \(\varphi \in L^{q_0}(\Omega ; W^r({\mathcal {D}}))\) for \(q_0=\frac{3q}{2}\), and that (45) and Theorem 4.7 hold with q replaced by \(\frac{3q}{2}\). The claim then follows as for \(p>1\) from (4.2). \(\square \)

Bounds on the overall approximation errors with respect to V and H now follow as an immediate consequence of Theorems 4.4, 4.7 and 4.8 and Remark 4.5.

Corollary 4.9

Let the assumptions of Theorem 4.7 hold, let \(f\in H\), let \(t\in (0,s-\frac{d}{p})\) and assume given \(r\in (0,s-\frac{d}{p})\cap (0,1]\). Then there holds:

  1.)

    For \(p=1\) and sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}=\overline{q}(\kappa )\in (1,\infty )\) and \(C>0\) such that for any \(q\in [1,\overline{q})\), \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds

    $$\begin{aligned} \Vert u-u_{N,h}\Vert _{L^q(\Omega ; V)}&\le C(2^{-tN}+h^{r}), \\ \Vert u-u_{N,h}\Vert _{L^q(\Omega ; H)}&\le C(2^{-tN}+h^{2r}). \end{aligned}$$
  2.)

    For \(p\in (1,\infty )\) and any \(q\in [1,\infty )\) there is a constant \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds

    $$\begin{aligned} \Vert u-u_{N,h}\Vert _{L^q(\Omega ; V)}&\le C(2^{N\left( -t+\frac{\min (\gamma ,0)}{q}\right) }+h^{r}), \\ \Vert u-u_{N,h}\Vert _{L^q(\Omega ; H)}&\le C(2^{N\left( -t+\frac{\min (\gamma ,0)}{q}\right) }+h^{2r}). \end{aligned}$$

5 Multilevel Monte Carlo estimation

We consider Monte Carlo estimation of \({\mathbb {E}}(\Psi (u))\) for a given functional \(\Psi \) and u as solution to (27) with Besov random tree coefficient a. We replace u by a tractable approximation \(u_{N,h}\) to evaluate \(\Psi (u_{N,h})\approx \Psi (u)\) and bound the overall error, which consists of the pathwise discretization error from Sect. 4 and the statistical error of the Monte Carlo approximation.

Assumption 5.1

 

  1.)

    Let \(\theta \in [0,1]\), let \(\Psi :H^\theta ({\mathcal {D}})\rightarrow {\mathbb {R}}\) be Fréchet-differentiable on \(H^\theta ({\mathcal {D}})\) and denote by

    $$\begin{aligned} \Psi ':H^\theta ({\mathcal {D}})\rightarrow {\mathcal {L}}(H^\theta ({\mathcal {D}}); {\mathbb {R}})=(H^{\theta }({\mathcal {D}}))' \end{aligned}$$

    the Fréchet-derivative of \(\Psi \). There are constants \(C>0\), \(\rho _1,\rho _2\ge 0\) such that for all \(v\in H^\theta ({\mathcal {D}})\)

    $$\begin{aligned} |\Psi (v)|\le C(1+\Vert v\Vert _{H^\theta ({\mathcal {D}})}^{\rho _1}), \quad \Vert \Psi '(v)\Vert _{{\mathcal {L}}(H^\theta ({\mathcal {D}}); {\mathbb {R}})}\le C(1+\Vert v\Vert _{H^\theta ({\mathcal {D}})}^{\rho _2}). \end{aligned}$$
    (57)
  2.)

    For \(q:=2\max (\rho _1,\rho _2+1)\), there holds \(u\in L^{q}(\Omega ;V)\).

  3.)

    \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) is a collection of triangulations satisfying Assumption 4.6.

  4.)

    There are constants \(t>0\), \(r\in (0,1]\) and \(C>0\) such that for \(q=2\max (\rho _1,\rho _2+1)\) and any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) it holds

    $$\begin{aligned} \Vert u-u_{N,h}\Vert _{L^q(\Omega ; V)}\le C(2^{-tN}+h^{r}),&\quad \Vert u-u_{N,h}\Vert _{L^q(\Omega ; H)}\le C(2^{-tN}+h^{2r}). \end{aligned}$$

Remark 5.2

Assumption 5.1 is natural, and includes in particular bounded linear functionals \(\Psi \), where \(\rho _1=1\) and \(\rho _2=0\). Item 2.) follows by Theorem 3.9 and Item 4.) by Corollary 4.9, with no further restrictions whenever \(p>1\). Only in case that \(p=1\), \(\kappa >0\) needs to be sufficiently small to ensure that all bounds hold for \(q=2\max (\rho _1,\rho _2 + 1)\ge 2\).

5.1 Singlelevel Monte Carlo

We use Monte Carlo (MC) methods to approximate \({\mathbb {E}}(\Psi (u))\) for a given functional \(\Psi \). To this end, we first consider the standard MC estimator for (general) real-valued random variables.

Definition 5.3

Let \(Y:\Omega \rightarrow {\mathbb {R}}\) be a random variable and let \((Y^{(i)},i\in {\mathbb {N}})\) be a sequence of i.i.d. copies of Y. For \(M\in {\mathbb {N}}\) we define the Monte Carlo estimator \(E_M(Y):\Omega \rightarrow {\mathbb {R}}\) as

$$\begin{aligned} E_M(Y):=\frac{1}{M}\sum _{i=1}^M Y^{(i)}. \end{aligned}$$
(58)

As we are not able to sample directly from the distribution of u, we rely on i.i.d. copies \((u^{(i)}_{N,h}, i\in {\mathbb {N}})\) of the pathwise approximation \(u_{N,h}\) from Sect. 4. Thereby, in addition to the statistical MC error of order \({\mathcal {O}}(M^{-1/2})\), we introduce a sampling bias that depends on N and h.
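The estimator (58) and its statistical error of order \({\mathcal {O}}(M^{-1/2})\) can be illustrated with a minimal Python sketch. The random variable used here, \(Y=\exp (Z)\) with standard normal Z, is a hypothetical stand-in for \(\Psi (u)\) and is not the PDE model of this paper; the function names `mc_estimate`, `empirical_rmse` and `toy_sample` are likewise illustrative.

```python
import math
import random

def mc_estimate(sample_y, M, rng):
    """Standard MC estimator E_M(Y) = (1/M) * sum_{i=1}^M Y^(i), cf. (58)."""
    return sum(sample_y(rng) for _ in range(M)) / M

def empirical_rmse(sample_y, exact_mean, M, n_rep, rng):
    """L^2(Omega)-error of E_M(Y), estimated from n_rep independent repetitions."""
    sq_errs = [(mc_estimate(sample_y, M, rng) - exact_mean) ** 2 for _ in range(n_rep)]
    return math.sqrt(sum(sq_errs) / n_rep)

def toy_sample(rng):
    # Hypothetical stand-in for Psi(u): Y = exp(Z), Z ~ N(0,1), so E(Y) = exp(1/2).
    return math.exp(rng.gauss(0.0, 1.0))

rng = random.Random(0)
exact = math.exp(0.5)
rmse_coarse = empirical_rmse(toy_sample, exact, 100, 400, rng)
rmse_fine = empirical_rmse(toy_sample, exact, 1600, 400, rng)
# Theory predicts a ratio close to sqrt(1600 / 100) = 4, up to sampling noise.
print(rmse_coarse / rmse_fine)
```

Increasing M by a factor of 16 should reduce the root mean-squared error by roughly a factor of 4, in line with the rate \(M^{-1/2}\); the sampling bias in N and h discussed above is absent in this toy setting.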

Theorem 5.4

Let \(M\in {\mathbb {N}}\), let \(E_M(\Psi (u_{N,h}))\) be the MC estimator as in (58), and let Assumption 5.1 hold. Then, there is a constant \(C>0\), such that for any \(M, N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) it holds

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E_M(\Psi (u_{N,h}))\Vert _{L^2(\Omega )} \le C\left( M^{-1/2} + 2^{-tN} + h^{(2-\theta )r} \right) \end{aligned}$$

Proof

We split the overall error via

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E_M(\Psi (u_{N,h}))\Vert _{L^2(\Omega )}&\le \Vert {\mathbb {E}}(\Psi (u))-E_M(\Psi (u))\Vert _{L^2(\Omega )} \\&\quad +\Vert E_M(\Psi (u))-E_M(\Psi (u_{N,h}))\Vert _{L^2(\Omega )} \\&:= I+II. \end{aligned}$$

To bound I, we use independence of \(\Psi (u)^{(i)}\) and \(\Psi (u)^{(j)}\) for \(i\ne j\) to see that

$$\begin{aligned} I^2&= {\mathbb {E}}\left( \left( {\mathbb {E}}(\Psi (u)) - \frac{1}{M}\sum _{i=1}^M \Psi (u)^{(i)}\right) ^2\right) \\&= {\mathbb {E}}(\Psi (u))^2 -\frac{2}{M} \sum _{i=1}^M{\mathbb {E}}(\Psi (u))^2 +\frac{1}{M^2}\sum _{i,j=1}^M {\mathbb {E}}\left( \Psi (u)^{(i)}\Psi (u)^{(j)}\right) \\&=\frac{\textrm{Var}(\Psi (u))}{M}. \end{aligned}$$

Assumption 5.1 further shows that

$$\begin{aligned} I\le \frac{\Vert \Psi (u)\Vert _{L^2(\Omega )}}{M^{1/2}} \le C\frac{1+\Vert u\Vert _{L^{2\rho _1}(\Omega ; H^\theta ({\mathcal {D}}))}}{M^{1/2}} \le C\frac{1+\Vert u\Vert _{L^{2\rho _1}(\Omega ; V)}}{M^{1/2}} \le CM^{-1/2}, \end{aligned}$$

where we have used that \(\theta \le 1\) and \(u\in L^{2\rho _1}(\Omega ; V)\).

To bound II, we use Equation (57) and derive the pathwise estimate

$$\begin{aligned} \begin{aligned}&|\Psi (u)-\Psi (u_{N,h})| = \left| \int _0^1 \Psi '(u+z(u_{N,h}-u))(u-u_{N,h})dz\right| \\&\quad \le \int _0^1 \Vert \Psi '(u+z(u_{N,h}-u))\Vert _{{\mathcal {L}}(H^\theta ({\mathcal {D}});{\mathbb {R}})}\Vert u-u_{N,h}\Vert _{H^\theta ({\mathcal {D}})}dz \\&\quad \le C\left( 1+\Vert u\Vert _{H^\theta ({\mathcal {D}})}^{\rho _2}+\Vert u-u_{N,h}\Vert _{H^\theta ({\mathcal {D}})}^{\rho _2}\right) \Vert u-u_{N,h}\Vert _{H^\theta ({\mathcal {D}})} \\&\quad \le C\left( 1+\Vert u\Vert _{H^\theta ({\mathcal {D}})}^{\rho _2}+\Vert u_{N,h}\Vert _{H^\theta ({\mathcal {D}})}^{\rho _2}\right) \Vert u-u_{N,h}\Vert _{H^\theta ({\mathcal {D}})}. \end{aligned} \end{aligned}$$
(59)

By Assumption 5.1, there is a \(C>0\) such that for every N and every \(0<h\le 1\) it holds

$$\begin{aligned} \Vert u_{N,h}\Vert _{L^{2(\rho _2+1)}(\Omega ;H^\theta ({\mathcal {D}}))} \le C\Vert u\Vert _{L^{2(\rho _2+1)}(\Omega ;H^\theta ({\mathcal {D}}))} \le C \Vert u\Vert _{L^{2(\rho _2+1)}(\Omega ;V)} <\infty . \end{aligned}$$
(60)

Furthermore, as \(\theta \in [0,1]\), we have by the Gagliardo-Nirenberg interpolation inequality

$$\begin{aligned}{} & {} \Vert u-u_{N,h}\Vert _{L^{2(\rho _2+1)}(\Omega ;H^\theta ({\mathcal {D}}))} \le \Vert u-u_{N,h}\Vert _{L^{2(\rho _2+1)}(\Omega ;H)}^{1-\theta } \Vert u-u_{N,h}\Vert _{L^{2(\rho _2+1)}(\Omega ;V)}^{\theta }\nonumber \\{} & {} \quad \le C (2^{-tN}+h^{(2-\theta )r}). \end{aligned}$$
(61)

Thus, Hölder’s inequality with conjugate exponents \(q_1=\frac{\rho _2+1}{\rho _2}\), \(q_2=\rho _2+1\) (and \(q_1=\infty \), \(q_2=1\) for \(\rho _2=0\)) shows that

$$\begin{aligned} \begin{aligned} II&\le \Vert \Psi (u)-\Psi (u_{N,h})\Vert _{L^2(\Omega )} \\&{\mathop {\le }\limits ^{(59)}} C(1+\Vert u\Vert _{L^{q_12\rho _2}(\Omega ;H^\theta ({\mathcal {D}}))}+\Vert u_{N,h}\Vert _{L^{q_12\rho _2}(\Omega ;H^\theta ({\mathcal {D}}))}) \Vert u-u_{N,h}\Vert _{L^{q_22}(\Omega ;H^\theta ({\mathcal {D}}))} \\&{\mathop {\le }\limits ^{(60)}} C(1+\Vert u\Vert _{L^{2(\rho _2+1)}(\Omega ;V)}) \Vert u-u_{N,h}\Vert _{L^{2(\rho _2+1)}(\Omega ;H^\theta ({\mathcal {D}}))}\\&{\mathop {\le }\limits ^{(61)}} C (2^{-tN}+h^{(2-\theta )r}). \end{aligned} \end{aligned}$$
(62)

\(\square \)

The error contributions in Theorem 5.4 are balanced by choosing

$$\begin{aligned} M\approx 2^{2tN} \approx h^{-2(2-\theta )r}. \end{aligned}$$
(63)

With this choice, achieving the target accuracy \(\Vert {\mathbb {E}}(\Psi (u))-E_M(\Psi (u_{N,h}))\Vert _{L^2(\Omega )} ={\mathcal {O}}(\varepsilon )\) requires sampling \(M={\mathcal {O}}(\varepsilon ^{-2})\) high-fidelity approximations \(u_{N,h}\) with \(N={\mathcal {O}}(\frac{|\log (\varepsilon )|}{t})\) scales and mesh refinement \(h={\mathcal {O}}(\varepsilon ^{\frac{1}{(2-\theta )r}})\). This is computationally challenging in dimension \(d\ge 2\) and for low-regularity problems, i.e., when \(t,r>0\) are small. To alleviate the computational burden, we propose a multilevel Monte Carlo extension of the estimator \(E_M\) in the next subsection.
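As a rough illustration of the balancing in (63), the following Python sketch computes singlelevel parameters for a given tolerance. All hidden constants are set to one, which is an assumption made only for this example, and the rates passed in the usage line (\(t=r=\frac{1}{2}\), \(\theta =1\)) are illustrative values rather than values taken from the experiments.

```python
import math

def slmc_parameters(eps, t, r, theta):
    """Balance the three error terms of Theorem 5.4 as in (63), constants set to 1."""
    M = math.ceil(eps ** -2)                    # statistical error M^{-1/2} <= eps
    N = math.ceil(math.log2(1.0 / eps) / t)     # truncation error 2^{-tN} <= eps
    h = eps ** (1.0 / ((2.0 - theta) * r))      # FE error h^{(2-theta) r} = eps
    return M, N, h

# Illustrative rates only (assumed, not from Table 1).
M, N, h = slmc_parameters(eps=2.0 ** -5, t=0.5, r=0.5, theta=1.0)
print(M, N, h)
```

For small rates t and r the required N grows and h shrinks quickly as the tolerance decreases, which is the computational difficulty described above.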

5.2 Multilevel Monte Carlo

The multilevel Monte Carlo (MLMC) algorithm was invented by Heinrich [18] to compute parametric integrals, then rediscovered and popularized by Giles [14, 15], and has since then found various applications in uncertainty quantification and beyond.

To apply this methodology to our model problem we fix a maximum refinement level \(L\in {\mathbb {N}}\) and consider a sequence of approximated solutions \(u_{N_\ell , h_\ell }\) with \((N_\ell , h_\ell )\in {\mathbb {N}}\times {\mathfrak {H}}\) for \(\ell \in \{1,\dots ,L\}\). We assume that \(N_1<\dots <N_L\) and \(h_1>\dots >h_L\), so that the error \(u-u_{N_\ell ,h_\ell }\) decreases with respect to the level \(\ell \). For notational convenience, we define \(\Psi _\ell :=\Psi (u_{N_\ell ,h_\ell })\) as the approximation of the quantity of interest \(\Psi (u)\) on level \(\ell \), and set \(\Psi _0:=0\). The basic idea of the MLMC method for estimating \({\mathbb {E}}(\Psi (u))\) is to exploit the telescopic expansion

$$\begin{aligned} {\mathbb {E}}(\Psi (u))\approx {\mathbb {E}}(\Psi _L)={\mathbb {E}}(\Psi _L)-{\mathbb {E}}(\Psi _0)=\sum _{\ell =1}^L {\mathbb {E}}(\Psi _\ell -\Psi _{\ell -1}) \end{aligned}$$
(64)

of the high-fidelity approximation \(\Psi _L\). On each level \(\ell \), the correction \({\mathbb {E}}(\Psi _\ell -\Psi _{\ell -1})\) is estimated by a (standard) MC estimator with \(M_\ell \) samples. This yields the multilevel Monte Carlo estimator

$$\begin{aligned} E^L(\Psi _L):=\sum _{\ell =1}^L E_{M_\ell }(\Psi _\ell -\Psi _{\ell -1}), \end{aligned}$$
(65)

with level-dependent numbers of samples \(M_1,\dots ,M_L\in {\mathbb {N}}\). We assume that the estimators \(E_{M_\ell }(\Psi _\ell -\Psi _{\ell -1})\) are jointly independent across the levels \(\ell =1,\dots ,L\).
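The structure of (65) can be sketched in a few lines of Python. The essential points are that each correction \(\Psi _\ell -\Psi _{\ell -1}\) is evaluated for the same random input on both levels, and that fresh, independent inputs are drawn on every level. The level hierarchy `toy_psi` below is a hypothetical toy model (not the finite element approximation \(u_{N_\ell ,h_\ell }\)), and the sample numbers are chosen by hand for illustration.

```python
import math
import random

def mlmc_estimate(psi_level, sample_omega, M_levels, rng):
    """MLMC estimator (65): sum over levels of MC averages of Psi_l - Psi_{l-1}.

    psi_level(l, omega) evaluates the level-l approximation for the random input
    omega, with the convention Psi_0 := 0.
    """
    estimate = 0.0
    for l, M_l in enumerate(M_levels, start=1):
        acc = 0.0
        for _ in range(M_l):
            omega = sample_omega(rng)                        # one input, used on both levels
            coarse = psi_level(l - 1, omega) if l > 1 else 0.0
            acc += psi_level(l, omega) - coarse
        estimate += acc / M_l
    return estimate

# Hypothetical toy hierarchy: Psi = exp(Z), Z ~ N(0,1), Psi_l = (1 - 2^{-l}) * exp(Z).
def toy_psi(l, z):
    return (1.0 - 2.0 ** -l) * math.exp(z)

def toy_omega(rng):
    return rng.gauss(0.0, 1.0)

rng = random.Random(1)
L = 6
M_levels = [64 * 4 ** (L - l) for l in range(1, L + 1)]  # many coarse, few fine samples
estimate = mlmc_estimate(toy_psi, toy_omega, M_levels, rng)
target = (1.0 - 2.0 ** -L) * math.exp(0.5)               # E(Psi_L) for the toy model
print(estimate, target)
```

In this toy model the variance of the corrections decays like \(4^{-\ell }\), so the geometrically decreasing sample numbers keep the variance contribution of every level at the same size.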

Provided that \(\textrm{Var}(\Psi _\ell -\Psi _{\ell -1})\) decays sufficiently fast in \(\ell \), we choose \(M_1>\dots >M_L\) such that the majority of samples are generated cheaply on low levels \(\ell \), while only a few expensive samples for large \(\ell \) are necessary. This entails massive computational savings compared to a singlelevel Monte Carlo (SLMC) estimator as in (58), that requires a large number of expensive samples on level L, and does not exploit the level hierarchy whatsoever. The computational gain of the MLMC method is precisely quantified under certain assumptions in Giles’ complexity theorem ([14, Theorem 3.1]). Given some \(\varepsilon >0\), Giles derives the optimal number of refinement levels L and associated numbers of samples \(M_1,\dots , M_L\) that guarantee \(\Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi (u))\Vert _{L^2(\Omega )}\le \varepsilon \). The latter requires exact knowledge of all constants in Assumption 5.1, and, furthermore, exact knowledge of the cost for sampling one instance of \(\Psi _\ell \). As this is not feasible a-priori, we choose a slightly different approach to determine the MLMC parameters. We retain the optimal order of complexity as in [14, Theorem 3.1].

Assumption 5.5

Let \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) be a sequence of triangulations satisfying Assumption 4.6, and assume that \(h_\ell \in {\mathfrak {H}}\) for any \(\ell \in {\mathbb {N}}\). Furthermore, in view of the multilevel convergence analysis, we assume that there are \(0<\underline{c}_{\mathcal {K}}\le \overline{c}_{\mathcal {K}}<1\) and \(h_0>0\) such that

$$\begin{aligned} \underline{c}_{\mathcal {K}}^\ell h_0\le h_\ell \le \overline{c}_{\mathcal {K}}^\ell h_0,\quad \ell \in {\mathbb {N}}. \end{aligned}$$
(66)

One sample of \(\Psi _\ell =\Psi (u_{N_\ell ,h_\ell })\) with \(u_{N_\ell ,h_\ell }\in V_{h_\ell }\) and \(n_\ell :=\dim (V_{h_\ell })={\mathcal {O}}(h_\ell ^{-d})\) is realized in \({\mathcal {O}}(n_\ell )\) work and memory.

Remark 5.6

Assumption 5.5 is natural and holds, for instance, with \(\underline{c}_{\mathcal {K}}, \overline{c}_{\mathcal {K}}\approx \frac{1}{2}\) if the mesh \({\mathcal {K}}_{h_\ell }\) is obtained from \({\mathcal {K}}_{h_{\ell -1}}\) by bisection of the longest edge of each \(K\in {\mathcal {K}}_{h_{\ell -1}}\). We remark that in general, it may be hard to achieve \(\underline{c}_{\mathcal {K}}= \overline{c}_{\mathcal {K}}\), which is why we imposed an upper and lower bound in (66). Simulating \(\Psi _\ell \) requires \({\mathcal {O}}(n_\ell )={\mathcal {O}}(h_\ell ^{-d})\) floating point operations per sample when using multilevel solvers for continuous piecewise linear or multi-linear elements. We also refer to Lemma B.2 in Appendix B.3, where we show that the expected cost of sampling \(b_{N,T}\) on the associated grid is of order \({\mathcal {O}}(h_\ell ^{-d})\) if \(\beta <1\).

Theorem 5.7

Let Assumptions 5.1 and 5.5 hold. For \(t\), \(r\) and \(\theta \) as in Assumption 5.1, let \(\varepsilon \in (0, h_0^{(2-\theta )r})\) and select the MLMC parameters in (65) for \(\ell \in \{1,\dots ,L\}\) as

$$\begin{aligned}{} & {} L:=\lceil \frac{\log (\varepsilon )}{(2-\theta )r\log (\underline{c}_{\mathcal {K}})} -\frac{\log (h_0)}{\log (\underline{c}_{\mathcal {K}})} \rceil ,\quad M_\ell :=\lceil \left( \frac{h_\ell }{h_L}\right) ^{2(2-\theta )r}w_\ell \rceil ,\nonumber \\{} & {} \quad N_\ell :=\lceil - \frac{\log (h_\ell )(2-\theta )r}{\log (2)t} \rceil . \end{aligned}$$
(67)

For given \(L\in {\mathbb {N}}\), choose the weights \(w_\ell >0\) to determine \(M_\ell \) such that \(\sum _{\ell =1}^L w_\ell ^{-1} \le C_w < \infty \), for sufficiently large, fixed \(C_w > 0\) independent of L.

Then, there is a \(C>0\), such that for any \(\varepsilon \in (0, h_0^{(2-\theta )r})\) it holds

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi _L)\Vert _{L^2(\Omega )} \le C\varepsilon . \end{aligned}$$
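The parameter selection (67) is straightforward to evaluate. The Python sketch below does so under the simplifying assumption of uniform refinement \(h_\ell =\underline{c}_{\mathcal {K}}^\ell h_0\) (i.e. \(\underline{c}_{\mathcal {K}}=\overline{c}_{\mathcal {K}}\)); the function name `mlmc_parameters`, the callback `weights` and the rates in the usage example are illustrative assumptions.

```python
import math

def mlmc_parameters(eps, t, r, theta, h0, c_low, weights):
    """Evaluate (67), assuming uniform refinement h_l = c_low**l * h0.

    `weights` maps L to the list [w_1, ..., w_L]. Base-2 logarithms are used,
    since only ratios of logarithms enter (67).
    """
    rate = (2.0 - theta) * r
    L = math.ceil(math.log2(eps) / (rate * math.log2(c_low))
                  - math.log2(h0) / math.log2(c_low))
    h = [c_low ** l * h0 for l in range(1, L + 1)]
    w = weights(L)
    M = [math.ceil((h_l / h[-1]) ** (2.0 * rate) * w_l) for h_l, w_l in zip(h, w)]
    N = [math.ceil(-math.log2(h_l) * rate / t) for h_l in h]
    return L, h, M, N

# Illustrative values only: theta = 1, r = 1, t = 1/2, dyadic refinement, w_l = L.
L, h, M, N = mlmc_parameters(eps=2.0 ** -5, t=0.5, r=1.0, theta=1.0,
                             h0=0.5, c_low=0.5, weights=lambda L: [float(L)] * L)
print(L, M, N)
```

With these inputs the finest mesh width satisfies \(h_L^{(2-\theta )r}=\varepsilon \), and the truncation levels are chosen such that \(2^{-tN_\ell }\le h_\ell ^{(2-\theta )r}\) on every level.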

Proof

We use the error splitting

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi _L)\Vert _{L^2(\Omega )} \le \Vert {\mathbb {E}}(\Psi (u))-{\mathbb {E}}(\Psi _L)\Vert _{L^2(\Omega )} +\Vert {\mathbb {E}}(\Psi _L)-E^L(\Psi _L)\Vert _{L^2(\Omega )}. \end{aligned}$$

We obtain in the same fashion as for the term II in the proof of Theorem 5.4 that

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-{\mathbb {E}}(\Psi _L)\Vert _{L^2(\Omega )}&\le \Vert \Psi (u)-\Psi (u_{N_L,h_L})\Vert _{L^2(\Omega )}\\&\le C (2^{-tN_L}+h_L^{(2-\theta )r}) \le C h_L^{(2-\theta )r}, \end{aligned}$$

where we have used \(2^{-tN_L}\le h_L^{(2-\theta )r}\) by (67) in the last step. To bound the second term, we expand \({\mathbb {E}}(\Psi _L)\) in a telescopic sum to obtain with (65)

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi _L)-E^L(\Psi _L)\Vert _{L^2(\Omega )}^2&= \sum _{\ell =1}^L \Vert {\mathbb {E}}(\Psi _\ell -\Psi _{\ell -1})-E_{M_\ell }(\Psi _\ell -\Psi _{\ell -1})\Vert _{L^2(\Omega )}^2 \\&= \sum _{\ell =1}^L M_\ell ^{-1} \Vert \Psi _\ell -\Psi _{\ell -1}\Vert _{L^2(\Omega )}^2. \end{aligned}$$

The first equality holds since the MC estimators \(E_{M_1}(\Psi _1), \dots , E_{M_L}(\Psi _L-\Psi _{L-1})\) are jointly independent and unbiased in the sense that \({\mathbb {E}}(\Psi _\ell -\Psi _{\ell -1})={\mathbb {E}}(E_{M_\ell }(\Psi _\ell -\Psi _{\ell -1}))\). The triangle inequality and the estimate (62) (from the proof of Theorem 5.4) then further yield

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi _L)-E^L(\Psi _L)\Vert _{L^2(\Omega )}^2&\le 2\sum _{\ell =1}^L M_\ell ^{-1}\left[ \Vert \Psi _\ell -\Psi (u)\Vert _{L^2(\Omega )}^2+\Vert \Psi (u)-\Psi _{\ell -1}\Vert _{L^2(\Omega )}^2\right] \\&\le C\sum _{\ell =1}^L M_\ell ^{-1}\left[ 2^{-2tN_\ell }+h_\ell ^{2(2-\theta )r}+2^{-2tN_{\ell -1}}+h_{\ell -1}^{2(2-\theta )r}\right] \\&\le C\sum _{\ell =1}^L w_\ell ^{-1} h_L^{2(2-\theta )r} , \end{aligned}$$

where we have used Assumption 5.5 and the choices for \(M_\ell \) and \(N_\ell \) in (67) in the last step. As \(\sum _{\ell =1}^L w_\ell ^{-1}\le C_w<\infty \) is bounded independently of L, we conclude with (66), L as in (67), and \(0<\underline{c}_{\mathcal {K}}\le \overline{c}_{\mathcal {K}}<1\) that

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi _L)\Vert _{L^2(\Omega )}\le & {} C h_L^{(2-\theta )r} \le C (\overline{c}_{\mathcal {K}}^L h_0)^{(2-\theta )r}\\\le & {} C\varepsilon ^{\frac{\log (\overline{c}_{\mathcal {K}})}{\log (\underline{c}_{\mathcal {K}})}} h_0^{\left( 1-\frac{\log (\overline{c}_{\mathcal {K}})}{\log (\underline{c}_{\mathcal {K}})}\right) (2-\theta )r} \le C\varepsilon . \end{aligned}$$

\(\square \)

The computational advantages of the MLMC method are precisely quantified in the next statement. Therein, the choice of \(w_\ell \) plays a key role and depends on the relation of variance decay and cost of sampling on each level.

Theorem 5.8

Let Assumptions 5.1 and 5.5 hold, and let \(\varepsilon >0\). Given \(t\), \(r\), \(\theta \) and \(\varepsilon \), set \(L, M_\ell \) and \(N_\ell \) as in Theorem 5.7 and choose the weight functions

$$\begin{aligned} w_\ell = {\left\{ \begin{array}{ll} \ell ^{1+\iota } \quad &{}\text {if }2(2-\theta )r>d \\ L \quad &{}\text {if }2(2-\theta )r=d \\ \underline{c}_{\mathcal {K}}^{(2(2-\theta )r-d)(L-\ell )/2} \quad &{}\text {if }2(2-\theta )r<d \\ \end{array}\right. }, \quad \ell \in \{1,\dots , L\}, \end{aligned}$$

where \(\iota >0\) is an arbitrary small constant. Then, the MLMC estimator satisfies

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi _L)\Vert _{L^2(\Omega )} \le C\varepsilon , \end{aligned}$$

with computational cost \({\mathcal {C}}_{MLMC}\) for \(\varepsilon \rightarrow 0\) of order

$$\begin{aligned} {\mathcal {C}}_{MLMC} = {\left\{ \begin{array}{ll} {\mathcal {O}}(\varepsilon ^{-2}) \quad &{}\text {if }2(2-\theta )r>d \\ {\mathcal {O}}(\varepsilon ^{-2}\log (\varepsilon )^2) \quad &{}\text {if }2(2-\theta )r=d \\ {\mathcal {O}}(\varepsilon ^{-2-\frac{d-2(2-\theta )r}{(2-\theta )r}}) \quad &{}\text {if }2(2-\theta )r<d. \end{array}\right. } \end{aligned}$$
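The three weight choices of Theorem 5.8 can be written down directly, and one can check numerically that \(\sum _{\ell =1}^L w_\ell ^{-1}\) stays bounded as L grows, as required in Theorem 5.7. The following Python sketch is illustrative only; the default \(\iota =0.1\), the function names and the parameter combinations in the loop are assumptions made for this example.

```python
import math

def mlmc_weights(L, r, theta, d, c_low, iota=0.1):
    """Weights w_1, ..., w_L of Theorem 5.8, depending on the sign of 2(2-theta)r - d."""
    gap = 2.0 * (2.0 - theta) * r - d
    if math.isclose(gap, 0.0, abs_tol=1e-12):
        return [float(L)] * L                                   # borderline case
    if gap > 0.0:
        return [float(l) ** (1.0 + iota) for l in range(1, L + 1)]
    return [c_low ** (gap * (L - l) / 2.0) for l in range(1, L + 1)]

def inverse_weight_sum(weights):
    """Sum of w_l^{-1}, which must be bounded independently of L."""
    return sum(1.0 / w for w in weights)

# Illustrative (r, d) pairs with theta = 1 covering all three regimes.
for L in (5, 20, 80):
    sums = [inverse_weight_sum(mlmc_weights(L, r, 1.0, d, 0.5))
            for r, d in ((1.0, 1), (1.0, 2), (0.5, 2))]
    print(L, sums)
```

In the first regime the sum is bounded by \(1+\frac{1}{\iota }\), in the second it equals one, and in the third it is a truncated geometric series.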

Proof

Since \(\underline{c}_{\mathcal {K}}\in (0,1)\), it holds in each scenario that \(\sum _{\ell =1}^L w_\ell ^{-1}\le C\) for a constant \(C>0\) that is uniform with respect to L. Therefore, we conclude by Theorem 5.7 that

$$\begin{aligned} \Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi _L)\Vert _{L^2(\Omega )} \le C\varepsilon , \end{aligned}$$

and it remains to derive the complexity in terms of \(\varepsilon \).

Assumption 5.5 implies that \(h_L \le h_\ell \underline{c}_{\mathcal {K}}^{(L-\ell )} h_0\). We obtain with (67) that

$$\begin{aligned} M_\ell =\lceil \underline{c}_{\mathcal {K}}^{(\ell -L)2(2-\theta )r}w_\ell \rceil . \end{aligned}$$
(68)

Let \({\mathcal {C}}_\ell \) denote the work required to generate one sample of \(\Psi _\ell \). As \(h_\ell \ge \underline{c}_{\mathcal {K}}^{\ell } h_0\), it holds that

$$\begin{aligned} {\mathcal {C}}_\ell ={\mathcal {O}}(\dim (V_{h_\ell }))={\mathcal {O}}(h_\ell ^{-d})\le C {\underline{c}_{\mathcal {K}}}^{-d\ell } \end{aligned}$$
(69)

Since we generate \(M_\ell \) independent \(\Psi _\ell -\Psi _{\ell -1}\) on each level (and also generate independent samples across all levels), the overall cost of the MLMC estimator is with (68) and (69) bounded by

$$\begin{aligned} {\mathcal {C}}_{MLMC}&:= \sum _{\ell =1}^L M_\ell ({\mathcal {C}}_\ell +{\mathcal {C}}_{\ell -1}) \\&\le C \sum _{\ell =1}^L \underline{c}_{\mathcal {K}}^{(\ell -L)2(2-\theta )r}w_\ell (\underline{c}_{\mathcal {K}}^{-d\ell }+\underline{c}_{\mathcal {K}}^{-d(\ell -1)}) \\&\le C \underline{c}_{\mathcal {K}}^{-L2(2-\theta )r} \sum _{\ell =1}^L \left( \underline{c}_{\mathcal {K}}^{2(2-\theta )r-d}\right) ^{\ell } w_\ell . \end{aligned}$$

Now first suppose that \(2(2-\theta )r-d>0\). Since \(\underline{c}_{\mathcal {K}}\in (0,1)\), the ratio test for sum convergence shows that for any \(\iota >0\) we obtain the uniform bound (with respect to \(L\in {\mathbb {N}}\))

$$\begin{aligned} \sum _{\ell =1}^L \left( \underline{c}_{\mathcal {K}}^{2(2-\theta )r-d}\right) ^{\ell } w_\ell \le \sum _{\ell \in {\mathbb {N}}} \left( \underline{c}_{\mathcal {K}}^{2(2-\theta )r-d}\right) ^{\ell } \ell ^{1+\iota } <\infty . \end{aligned}$$

On the other hand, if \(2(2-\theta )r-d=0\), there holds with \(w_\ell =L\)

$$\begin{aligned} \sum _{\ell =1}^L \left( \underline{c}_{\mathcal {K}}^{2(2-\theta )r-d}\right) ^{\ell } w_\ell = L^2. \end{aligned}$$

Finally, for \(2(2-\theta )r-d<0\), it follows that there is a \(C>0\) such that

$$\begin{aligned} \sum _{\ell =1}^L \left( \underline{c}_{\mathcal {K}}^{2(2-\theta )r-d}\right) ^{\ell } w_\ell = \underline{c}_{\mathcal {K}}^{L(2(2-\theta )r-d)/2} \sum _{\ell =1}^L \underline{c}_{\mathcal {K}}^{\ell (2(2-\theta )r-d)/2} \le C \underline{c}_{\mathcal {K}}^{L(2(2-\theta )r-d)}. \end{aligned}$$

Altogether, we obtain that there exists a constant \(C>0\) independent of L such that

$$\begin{aligned} {\mathcal {C}}_{MLMC} \le C {\left\{ \begin{array}{ll} \underline{c}_{\mathcal {K}}^{-L2(2-\theta )r} \quad &{}\text {if }2(2-\theta )r>d, \\ \underline{c}_{\mathcal {K}}^{-L2(2-\theta )r} L^2 \quad &{}\text {if }2(2-\theta )r=d, \\ \underline{c}_{\mathcal {K}}^{-L2(2-\theta )r}\underline{c}_{\mathcal {K}}^{L(2(2-\theta )r-d)} \quad &{}\text {if }2(2-\theta )r<d. \end{array}\right. } \end{aligned}$$

With L from (67) it follows that \(\underline{c}_{\mathcal {K}}^L={\mathcal {O}}(\varepsilon ^{\frac{1}{(2-\theta )r}}h_0^{-1})\) for \(\varepsilon \rightarrow 0\). This shows the following asymptotics for the \(\varepsilon \)-cost bounds as \(\varepsilon \rightarrow 0\)

$$\begin{aligned} {\mathcal {C}}_{MLMC}(\varepsilon ) = {\left\{ \begin{array}{ll} {\mathcal {O}}(\varepsilon ^{-2}) \quad &{}\text {if }2(2-\theta )r>d, \\ {\mathcal {O}}(\varepsilon ^{-2}\log (\varepsilon )^2) \quad &{}\text {if }2(2-\theta )r=d, \\ {\mathcal {O}}(\varepsilon ^{-2-\frac{d-2(2-\theta )r}{(2-\theta )r}}) \quad &{}\text {if }2(2-\theta )r<d. \end{array}\right. } \end{aligned}$$

\(\square \)

Remark 5.9

The asymptotic complexity bounds for \(\varepsilon \rightarrow 0\) are of the same magnitude as in [14, Theorem 3.1] and [10, Theorem 1], but only require knowledge of the parameters \(r\), \(t\) and \(\theta \), and not of further absolute constants. From Theorem 5.4, the SLMC estimator requires for given \(\varepsilon >0\) a total of \(M\approx \varepsilon ^{-2}\) samples with refinement parameters satisfying \(2^{-tN}\approx h^{(2-\theta )r}\approx \varepsilon \). Hence, \(h={\mathcal {O}}(\varepsilon ^{\frac{1}{(2-\theta )r}})\) and, assuming availability of a linear complexity solver such as multigrid, the computational cost per sample is bounded asymptotically as \({\mathcal {O}}(\dim (V_h))={\mathcal {O}}(h^{-d})={\mathcal {O}}(\varepsilon ^{-\frac{d}{(2-\theta )r}})\). The total cost of the SLMC estimator to achieve \(\varepsilon \)-accuracy is therefore

$$\begin{aligned} {\mathcal {C}}_{SLMC}(\varepsilon ) = {\mathcal {O}}(\varepsilon ^{-2-\frac{d}{(2-\theta )r}})>{\mathcal {C}}_{MLMC}(\varepsilon ),\quad \text {as }\varepsilon \rightarrow 0. \end{aligned}$$
(70)

Consequently, under the stated assumptions, MLMC-FEM achieves a considerable reduction in (asymptotic) \(\varepsilon \)-complexity, even in low-regularity regimes with \(2(2-\theta )r<d\).
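To make the comparison in (70) concrete, the following Python sketch evaluates the exponents of \(\varepsilon ^{-1}\) in the SLMC and MLMC cost bounds. The helper name `cost_exponents` and the rates used in the usage lines are illustrative assumptions; in particular, the values \(r=\frac{1}{2}\) and \(r=\frac{3}{4}\) are chosen for the example and are not quoted from Table 1.

```python
def cost_exponents(r, theta, d):
    """Return (slmc, mlmc, log_factor): costs scale like eps**(-slmc) and eps**(-mlmc).

    log_factor is True in the borderline case 2(2-theta)r = d, where the MLMC
    bound carries the additional factor log(eps)^2.
    """
    rate = (2.0 - theta) * r
    slmc = 2.0 + d / rate
    if 2.0 * rate > d:
        return slmc, 2.0, False
    if 2.0 * rate == d:
        return slmc, 2.0, True
    return slmc, 2.0 + (d - 2.0 * rate) / rate, False

# Illustrative rates for d = 2 and theta = 1 (assumed values).
print(cost_exponents(1.0, 1.0, 2))
print(cost_exponents(0.5, 1.0, 2))
print(cost_exponents(0.75, 1.0, 2))
```

In the low-regularity regime \(2(2-\theta )r<d\), the MLMC exponent simplifies to \(\frac{d}{(2-\theta )r}\), which is the exponent of the cost of a single sample on the finest level; the SLMC exponent is larger by exactly 2.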

In case that \(2(2-\theta )r>d\), the assumption that \(E_{M_1}(\Psi _1), \dots , E_{M_L}(\Psi _L-\Psi _{L-1})\) are independent MC estimators is not required to derive the optimal complexity \({\mathcal {C}}_{MLMC}={\mathcal {O}}(\varepsilon ^{-2})\). Instead, setting \(w_\ell =\ell ^{2(1+\iota )}\) is sufficient to compensate for the dependence across discretization levels. This could be exploited in a simulation to “recycle” samples from coarser discretization levels in order to further increase efficiency. We refer, e.g., to the discussion in [7, Section 5.2].

6 Numerical experiments

We consider numerical experiments in the rectangular domain \({\mathcal {D}}:={\mathbb {T}}^2=(0,1)^2\) and use the constant source function \(f\equiv 1\). For the spatial discretization we use bilinear finite elements that may be efficiently computed by exploiting their tensor product-structure, see Appendix B.2 for details. The initial mesh width is given by \(h_0=\frac{1}{2}\) and we use dyadic refinements with factor \(\underline{c}_{\mathcal {K}}=\overline{c}_{\mathcal {K}}= \frac{1}{2}\) to obtain a sequence of tessellations \(({\mathcal {K}}_h,\; h=2^{-\ell } h_0,\; \ell \in {\mathbb {N}})\) that satisfies Assumption 4.6 for the MLMC algorithm. We further use midpoint quadrature to assemble the stiffness matrix for each realization of the diffusion coefficient. The resulting quadrature error does not dominate the FE error convergence from Theorems 4.7 and 4.8, as we show in Lemma B.1 in the Appendix. For given N and a rectangular mesh \({\mathcal {K}}_h\), we evaluate \(b_{T,N}\) at the midpoint of each \(K\in {\mathcal {K}}_h\) as described in Appendix B.3.

We investigate different parameter regimes of varying smoothness for the diffusion coefficient; the parameter values and resulting pathwise approximation rates r and t as in Corollary 4.9 are collected in Table 1. In all experiments, we build the random field \(b_T\) resp. \(b_{T,N}\) based on Daubechies wavelets with five vanishing moments (“\(\textrm{DB}(5)\)-wavelets”), with smoothness \(\phi , \psi \in C^{1.177}({\mathbb {R}})\) (see [12, Section 7.1.2]). We consider the \(L^2({\mathcal {D}})\)-norm of the gradient as quantity of interest (QoI), with associated functional given by

$$\begin{aligned}&\Psi :H^1({\mathcal {D}})\rightarrow [0,\infty ), \quad u\mapsto \left( \int _{\mathcal {D}}|\nabla u|^2 dx\right) ^{1/2}, \end{aligned}$$

so that Assumption 5.1 holds with \(\theta =1\).

Table 1 Parameter values for the random tree Besov priors in the numerical experiments

Given \(\theta \), r and t, we prescribe target accuracies \(\varepsilon = 2^{-r\xi }\), \(\xi \in \{5,\dots , 9\}\) and select, for given \(\varepsilon >0\), the MLMC parameters as in Theorem 5.7. The maximum refinement level is denoted by \(L_\varepsilon \) and the corresponding estimators by \(E^{L_\varepsilon }(\Psi _{L_\varepsilon })\). We sample \(n_{ML}=2^8\) realizations of \(E^{L_\varepsilon }(\Psi _{L_\varepsilon })\) for every \(\varepsilon \). As reference solution, we use \(n_{ref}=2^4\) realizations of \(E^{L_{ref}}(\Psi _{L_{ref}})\) with parameters adjusted to achieve \(\varepsilon _{ref}:=2^{-r11}\). We report for prescribed \(\varepsilon \) the realized empirical RMSE

$$\begin{aligned} \text {RMSE}(\varepsilon )= \left( \frac{1}{n_{ML}}\sum _{i=1}^{n_{ML}} \left( E^{L_\varepsilon }(\Psi _{L_\varepsilon })(\omega _i) - \left( \frac{1}{n_{ref}}\sum _{j=1}^{n_{ref}}E^{L_{ref}}(\Psi _{L_{ref}})(\omega _j) \right) \right) ^2 \right) ^{\frac{1}{2}}. \end{aligned}$$
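The realized RMSE above is a plain root mean square of the deviations of the MLMC realizations from the averaged reference value. A minimal Python sketch of this post-processing step reads as follows; the function name `realized_rmse` and the input values in the usage line are hypothetical, and the code is not the MATLAB implementation used for the experiments.

```python
import math

def realized_rmse(mlmc_realizations, reference_realizations):
    """Empirical RMSE of MLMC realizations against the mean of the reference runs."""
    reference = sum(reference_realizations) / len(reference_realizations)
    mse = sum((x - reference) ** 2 for x in mlmc_realizations) / len(mlmc_realizations)
    return math.sqrt(mse)

# Hypothetical numbers, for illustration of the call only.
print(realized_rmse([1.0, 3.0], [2.0, 2.0]))
```

Note that the reference value is itself random, so the realized RMSE also contains the (smaller) error of the reference estimator.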

All computations are realized using MATLAB on a workstation with two Intel Xeon E5-2697 CPUs with 2.7 GHz, a total of 12 cores, and 256 Gigabyte RAM.

We start with the “smooth Gaussian” case from Table 1. A sample of the diffusion coefficient and the associated bilinear FE approximation of u is given in Fig. 3, where we also plot the (average) CPU times against the realized RMSE. As we see, the realized error is very close to the prescribed accuracy \(\varepsilon \), which corresponds to the error estimate from Theorem 5.7. Moreover, the empirical results are in line with the work estimates from Theorem 5.8, as is seen in the right plot of Fig. 3, since the computational complexity is (asymptotically) of order \({\mathcal {O}}(\varepsilon ^{-2}|\log (\varepsilon )|^2)\).

Next, we decrease the parameter s to obtain the “rough Gaussian” scenario from Table 1. A sample of the diffusion coefficient and the associated bilinear FE approximation of u is given in Fig. 4. Compared to Fig. 3, we now see more detailed, sharp features in the diffusion coefficient, due to the slower decay factor of the wavelet basis. Average CPU times vs. realized RMSE are given in Fig. 4. Again, the realized error is of order \({\mathcal {O}}(\varepsilon )\), and the computational times are asymptotically of order \({\mathcal {O}}(\varepsilon ^{-4})\), as expected from Theorem 5.8.
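One common way to compare measured CPU times with a predicted algebraic rate such as \({\mathcal {O}}(\varepsilon ^{-4})\) is a least-squares fit in log-log coordinates. The Python sketch below is an assumption about how such a check could be done, not a description of how the plots in this section were produced; the function name is hypothetical, and logarithmic factors as in the smooth case are not captured by a pure power-law fit.

```python
import math


def fit_loglog_slope(errors, times):
    """Least-squares slope of log(times) against log(errors).

    If times ~ C * errors^(-gamma), the returned slope is close to -gamma.
    """
    xs = [math.log(e) for e in errors]
    ys = [math.log(t) for t in times]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx
```

Applied to synthetic data of the form `times = [3.0 * e ** -4 for e in errors]`, the fitted slope is \(-4\) up to rounding.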

Finally, we investigate the “p-exponential” scenario from Table 1, where we use a heavier-tailed distribution of \(X_{j,k}^{l}\) and increase the wavelet density to \(\beta =\frac{3}{4}\). We use a standard acceptance-rejection algorithm to sample from the p-exponential density for \(p=1.6\). A sample of the diffusion coefficient and the associated bilinear FE approximation of u is given in Fig. 5. We observe that the variance of coefficient and solution is increased compared to the previous two examples. This is indicated by the larger bars of the confidence intervals in the right plot of Fig. 5. The a-priori accuracy has been scaled by a factor of three in this plot, for a better visual comparison of realized and prescribed RMSE. Although absolute magnitude and variance of the realized RMSE have increased, we still recover, in line with Theorem 5.8, the asymptotic error decay of order \({\mathcal {O}}(\varepsilon )\), together with CPU times of order \({\mathcal {O}}(\varepsilon ^{-\frac{8}{3}})\).
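A minimal Python sketch of one possible acceptance-rejection sampler is given below. It makes two assumptions that are not taken from the text: the target density is normalized in scale as proportional to \(\exp (-|x|^p)\), and the proposal is a standard Laplace distribution, which dominates the target for \(p>1\). The scaling and proposal used in the experiments may differ.

```python
import math
import random


def sample_p_exponential(p, rng):
    """Draw one sample from the density proportional to exp(-|x|^p), p > 1.

    Proposal: standard Laplace density 0.5 * exp(-|x|). The log-ratio
    t - t^p (t = |x|) is maximal at t* = p^(-1/(p-1)), which gives the
    envelope constant used in the acceptance test.
    """
    if p <= 1.0:
        raise ValueError("this sketch requires p > 1")
    t_star = p ** (-1.0 / (p - 1.0))
    log_bound = t_star - t_star ** p
    while True:
        t = rng.expovariate(1.0)               # |x| under the Laplace proposal
        log_u = math.log(1.0 - rng.random())   # log of a uniform on (0, 1]
        if log_u <= t - t ** p - log_bound:    # accept with prob. target/envelope
            return t if rng.random() < 0.5 else -t
```

A simple sanity check under the assumed scaling is that \({\mathbb {E}}|X|^p = 1/p\), so for \(p=1.6\) the sample mean of \(|X|^{1.6}\) should be close to 0.625.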

Our experiments show that sharper features in the prior model are achieved by decreasing either s or p. We emphasize that the asymptotic complexity depends on the differential dimension \(s-\frac{d}{p}\), as indicated in Table 1; the latter essentially corresponds to the parameter r in Theorem 5.7. Consequently, the effect of lowering p on the overall regularity becomes more pronounced as the physical dimension d increases.
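As a worked illustration of this dependence on d, consider lowering p from the Gaussian value \(p=2\) to \(p=1.6\) while keeping s fixed (a hypothetical comparison, not a pair of rows from Table 1). The differential dimension then changes by

$$\begin{aligned} \left( s-\frac{d}{1.6}\right) -\left( s-\frac{d}{2}\right) = -d\left( \frac{5}{8}-\frac{1}{2}\right) = -\frac{d}{8}, \end{aligned}$$

that is, a loss of \(\frac{1}{4}\) for \(d=2\) and of \(\frac{3}{8}\) for \(d=3\).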

Fig. 3

Sample of a Besov random tree prior/log-diffusion coefficient \(b_{T}\) (left) and the corresponding finite element approximation of u (middle) for the “smooth Gaussian” case in Table 1. Coefficient and solution in the figures have been sampled on a grid with \(2^9\times 2^9\) equidistant points, and wavelet series truncation was after \(N=9\) scales. Fractal structures in the log-diffusion caused by the wavelet density \(\beta =\frac{1}{2}\) are clearly visible in the left plot. The realized RMSE vs. computational complexity is depicted in the right plot; the dashed line indicates the predicted asymptotic behavior of \({\mathcal {O}}(\varepsilon ^{-2}|\log (\varepsilon )|^2)\)

Fig. 4

Sample of a Besov random tree prior/log-diffusion coefficient \(b_{T}\) (left) and the corresponding finite element approximation of u (middle) for the “rough Gaussian” case in Table 1. Coefficient and solution in the figures have been sampled on a grid with \(2^9\times 2^9\) equidistant points, and wavelet series truncation was after \(N=9\) scales. The diffusion coefficient exhibits sharper features than in the smooth case in Fig. 3. The realized RMSE vs. computational complexity is depicted in the right plot; the curve shows the predicted asymptotic behavior of \({\mathcal {O}}(\varepsilon ^{-4})\)

Fig. 5

Sample of a Besov random tree prior/log-diffusion coefficient \(b_{T}\) (left) and the corresponding finite element approximation of u (middle) for the “p-exponential” case in Table 1. Coefficient and solution in the figures have been sampled on a grid with \(2^9\times 2^9\) equidistant points, and wavelet series truncation was after \(N=9\) scales. We still recover the predicted asymptotic error of order \({\mathcal {O}}(\varepsilon )\) with computational work of order \({\mathcal {O}}(\varepsilon ^{-\frac{8}{3}})\)

7 Conclusions

We have developed a computational framework for the efficient discretization of linear, elliptic PDEs with log-Besov random field coefficients which are modelled by a multiresolution in the physical domain whose coefficients are p-exponential with random choices of active coefficients according to GW-trees. The corresponding pathwise diffusion coefficients generally admit only rather low path regularity, thereby mandating low order finite element discretizations in the physical domain. We established strong pathwise solution regularity, and FE error bounds for the corresponding single-level Monte Carlo-FEM algorithm. The corresponding error vs. work bounds for the multi-level Monte Carlo algorithm then follow in the standard way. We emphasize again that higher order sampling methods seem to be obstructed by the GW-tree structure, which has recently been identified in [23] as well-suited for modelling diffuse media such as clouds, fog and aerosols. The presently proposed MLMC-FE error analysis for elliptic PDEs with (log-) Besov random tree coefficients will imply corresponding complexity bounds in multilevel Markov chain Monte Carlo sampling strategies for Bayesian inverse problems on log-Besov random tree priors, as considered for example in imaging applications in [4, 11, 19, 24]. Details on their analysis and computation will be developed elsewhere.