A posteriori error analysis and adaptivity for high-dimensional elliptic and parabolic boundary value problems

We derive a posteriori error estimates for the (stopped) weak Euler method to discretize SDE systems which emerge from the probabilistic reformulation of elliptic and parabolic (initial) boundary value problems. The a posteriori estimate exploits the use of a scaled random walk to represent noise, and distinguishes between realizations in the interior of the domain and those close to the boundary. We verify an optimal rate of (weak) convergence for the a posteriori error estimate on deterministic meshes. Based on this estimate, we then set up an adaptive method which automatically selects local deterministic mesh sizes, and prove its optimal convergence in terms of given tolerances. Provided with this theoretical backup, and since corresponding Monte-Carlo based realizations are simple to implement, these methods may serve to efficiently approximate solutions of high-dimensional (initial-)boundary value problems.


Introduction
Let D ⊂ R^L be a bounded domain, and L ∈ N. In the first part of this work, we derive an a posteriori error estimate, and an adaptive time-stepping strategy based on it, for a discretization which is based on the probabilistic representation of the elliptic partial differential equation (PDE) with Dirichlet condition

  L u(x) + c(x) u(x) + g(x) = 0 for all x ∈ D,  u(x) = φ(x) for all x ∈ ∂D,  (1.1)

with the (elliptic) operator L := (1/2) Tr[σ(x) σ^⊤(x) D²_x] + ⟨b(x), ∇_x⟩ for all x ∈ D, namely

  u(x) = E[ φ(X^x_{τ^x}) V^x_{τ^x} + Z^x_{τ^x} ],  (1.2)

see e.g. [40, p. 366], where

(1) X^x ≡ {X^x_t; t ≥ 0} denotes the R^L-valued solution of the stochastic differential equation (SDE)

  dX_t = b(X_t) dt + σ(X_t) dW_t  for all t > 0,  (1.3)

starting in x ∈ D, where W ≡ {W_t; t ≥ 0} is an R^L-valued Wiener process on a filtered probability space (Ω, F, {F_t}_{t≥0}, P), and the first exit time of X^x from D is

  τ^x := inf{ t > 0 : X^x_t ∉ D }.  (1.4)

(2) V^x ≡ {V^x_t; t ≥ 0} resp. Z^x ≡ {Z^x_t; t ≥ 0} denote the R-valued solutions of the random ordinary differential equations (ODEs)

  dV_t = c(X^x_t) V_t dt  for all t > 0,  V_0 = 1,  and  (1.5)
  dZ_t = g(X^x_t) V_t dt  for all t > 0,  Z_0 = 0.  (1.6)

To numerically solve (1.1), deterministic schemes based on finite differences, finite volumes, or finite elements are well-known, and are complemented by rigorous a priori and a posteriori error analyses. However, these methods all suffer from the 'curse of dimensionality', which restricts their implementations to small values 1 ≤ L ≤ 4 in practice. To simulate the boundary value problem (1.1) for L ≫ 4, other (deterministic) mesh-based methods are available, such as sparse grids, or methods that rely on tensor-structured data and (structure-inheriting) compatible operators; see also Sect. 2 for more details. In this work, the probabilistic interpretation of (1.1) is used to approximate u(x), x ∈ D, for high-dimensional problems, i.e., L ≫ 4, free from (restrictive) constraints on the data in (1.1): specifically, the first goal is an a posteriori error analysis for the discretization Scheme 2 (see Sect. 3.3), based on [40, p. 365 ff., Sec. 6.3], to bound the approximation error for u(x), x ∈ D, in terms of the computed solution {Y^j_X}_{j≥0}. A distinct feature of Scheme 2 is the use of a scaled random walk instead of unbounded Wiener increments to rigorously derive the a posteriori error bound (1.7) below.
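To fix ideas, the probabilistic representation (1.2) can be illustrated by a minimal Monte-Carlo sketch. This is *not* the paper's Scheme 2: it uses plain Euler steps with two-point increments, stops at the first iterate outside D, and has no boundary-strip/bounce-back mechanism, so only reduced accuracy near ∂D should be expected. The toy data (b ≡ 0, σ ≡ Id, c ≡ 0, g ≡ 0, L = 2, D the unit disk, harmonic φ(x) = x₁² − x₂², hence u = φ and u(0) = 0) are assumptions for the demonstration.

```python
import random, math

def weak_euler_u(x0, phi, tau=1e-3, n_samples=2000, seed=1):
    """Monte-Carlo estimate of u(x0) = E[phi(X_{tau^x})] for the Laplace
    equation on the unit disk: Euler steps with two-point increments
    (+/- sqrt(tau) per coordinate), projection onto the boundary at exit."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x, y = x0
        while x * x + y * y < 1.0:           # iterate while inside D
            x += rng.choice((-1.0, 1.0)) * math.sqrt(tau)
            y += rng.choice((-1.0, 1.0)) * math.sqrt(tau)
        r = math.hypot(x, y)                 # project exited point onto the circle
        x, y = x / r, y / r
        acc += phi(x, y)                     # evaluate the boundary datum
    return acc / n_samples

phi = lambda x, y: x * x - y * y             # harmonic, so u = phi on the disk
est = weak_euler_u((0.0, 0.0), phi)          # exact value: u(0,0) = 0
```

By the symmetry of the walk under (x, y) ↦ (y, x), the estimator at the origin is centered at 0, so only Monte-Carlo noise remains in `est`.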
To our knowledge, the first a posteriori (weak) error analysis for the Euler method with (unbounded) Wiener increments (4.29) to solve the Kolmogorov PDE on D = R^L goes back to [48], whose application is restricted to low dimensions L. These techniques were later extended to the related (parabolic) boundary value problem in [20], where 'stopping' is realized when corresponding iterates have come 'close' to ∂D rather than onto ∂D; however, the a posteriori error analysis still suffers from the same restrictions; see Sect. 2. In this work, for x ∈ D fixed, and a given mesh {t_j}_{j≥0} ⊂ [0, ∞) with local mesh sizes {τ_{j+1}}_{j≥0}, we verify the a posteriori (weak) error estimate (1.7) for the iterates, with C(φ, g) > 0, (computable) a posteriori error estimators {G_ℓ^{(·)}}_{ℓ=1}^3 in terms of the discrete solution, and J* ≡ J*(x) ∈ N_0 the stopping index. The main achievements in our work are then:

(i) its construction, which is based on Taylor's formula (rather than Itô's formula) to properly address the use of the scaled random walk; see Theorem 4.1.

(ii) Stability results in Sect. 3.3 concerning 'discrete stopping' ensure that the sum in (1.7) is in fact finite, and, besides the 'stopping' mechanism in Scheme 2, they are the key to verify optimal first order of convergence for (1.7) on families of (time-)meshes with maximum mesh size τ_max > 0, when τ_max → 0; see Theorem 4.6.

(iii) Estimate (1.7) will be used in Sect. 5 to construct an adaptive time-stepping algorithm (see Algorithm 5.1), for which we prove local, as well as global, termination, and optimal convergence behaviour in terms of the given tolerance Tol > 0; see Sect. 5.
ad (i). The derivation of (1.7) in Sect. 4.1 conceptually follows the guideline of [35, Thm. 3.1], where an a posteriori (weak) error estimate is presented for the (semi-implicit) Euler method, which uses (unbounded) Wiener increments; in fact, G_1^{(·)} in (1.7) is conceptually close to the estimator in [35, (3.1)]; see also item 3. in Remark 4.1. While [35] considered the Kolmogorov PDE (see (1.8)) on the whole space D = R^L, we here consider bounded domains D ⊂ R^L, which requires the proper numerical approximation of the stopping time τ^x in (1.4) when X^x crosses the boundary ∂D. To this end, the weak Euler method in Scheme 2, in combination with the corresponding 'stopping' mechanism, enables a successive (local) construction of iterates {Y^j_X}_{j≥0}: once an iterate lies in the boundary strip S_{τ_{j+1}}, it is either projected onto ∂D, and the procedure stops, or it is 'bounced back' to the interior of D (with some probability). This different treatment of realizations of Y^j_X via Scheme 2 is reflected in the error estimators {G_ℓ^{(·)}}_{ℓ=1}^3 in (1.7) (see Fig. 1): the terms which contribute to the functional G_1 account for incremental changes within the interior D\S_{τ_{j+1}}. The terms in G_3^{(j)} account for those iterates that have already entered S_{τ_{j+1}}, and where the event of a 'projection' resp. 'bouncing back' is about to happen next. The terms in G_2^{(j)} account for those realizations in S_{τ_{j+1}} which will be 'bounced back'.
ad (ii). Once the a posteriori error estimate has been established in Theorem 4.1, we analyze its convergence behavior along sequences of shrinking meshes with maximum mesh size τ_max > 0. The result in Theorem 4.6 shows an optimal rate of convergence, and thus recovers the well-known a priori estimate for the iterates [40, p. 369, Thm. 3.4]. In fact, the presence of the estimator {G_2^{(j)}}_{j≥0} is crucial to validate order 1: if it were removed from the estimator, and an immediate projection onto ∂D of an iterate in the boundary strip occurred, only a convergence order 1/2 of the reduced a posteriori error estimate may be expected; this conclusion may be drawn from the a priori error analysis in [40, p. 370, Rem. 3.5], where this selective 'bouncing back/projection' mechanism was conceived. Further crucial tools in the proof of Theorem 4.6 are the stability results in Sect. 3.3 on the boundedness of visits in the boundary strips {S_{τ_{j+1}}}_{j≥0}, and on the discrete stopping time (see Lemmata 3.3 and 3.4), which generalize the related stability results in [40, p. 367, Lem. 3.2] to non-uniform time steps; see also Remark 3.1 for further details.
ad (iii). In Sect. 5, the a posteriori error estimate (1.7) is used to construct an adaptive method (see Algorithm 5.1) which automatically selects deterministic (local) step sizes τ_{j+1} = t_{j+1} − t_j in every iteration step. For this purpose, given some tolerance Tol > 0 and j ≥ 0, we check via iterated refinement/coarsening of the current step size τ_{j+1} whether the partial sum is below (a multiple of) the pre-assigned tolerance Tol; see (5.1). Compared to [35, Algorithm 4.1], the main difficulty for the boundary value problem (1.1) here is to set up a thresholding criterion that properly addresses the 'discrete stopping', for which the stability results in Lemmata 3.3 and 3.4 hold; see also the discussion at the beginning of Sect. 5.1. Theorem 5.2 then validates the computation of each new time step τ_{j+1} in Algorithm 5.1 after finitely many iterations (i.e., in O(log(Tol^{−1})) at most).

The following example from [10] illustrates efficient local mesh refinement/coarsening by the adaptive Algorithm 5.1 for L ≫ 1.
Example 1.2 (see [10]) Let L = 10, and let D be as in [10]. The initial refinement and gradual coarsening of the step sizes ('U'-profile) in Fig. 3A is a typical consequence of Algorithm 5.1, allowing for an interaction between information from the (empirical) error estimators {G_ℓ^{(·),(M)}}_{ℓ=1}^3 and a minor weighting of 'outlier samples' according to the shape of the distribution of the stopping time t_{J*}; see Fig. 2, and also the detailed classification of related dynamics at the beginning of Sect. 6. In a comparative consideration of Fig. 3A, B, and C, the first samples enter the boundary strips at time ≈ 0.025 and hence (possibly) get projected onto ∂D, which is why we observe a refinement of step sizes up to this time. Within the time interval [0.025, 0.125], most of the samples hit ∂D, which requires fine step sizes in this region to reach a certain level of accuracy regulated by the choice of Tol. Those samples which have not been stopped before time 0.125 may be considered 'outlier samples', which most likely spoil the approximation. The mechanism in Algorithm 5.1 automatically allows a gradual coarsening of the related step sizes for the generation of these leftover samples, which increases the width of their boundary strips, and hence forces their immediate projection onto ∂D, i.e., a stopping of Algorithm 5.1. Moreover, Algorithm 5.1 is efficient to reach the same accuracy (Error ≈ 0.002, Tol = 0.005, M = 10^4, x = 0).

The second part of this work focuses on the parabolic PDE (1.8) with proper terminal and Dirichlet boundary data. Compared to deterministic numerical methods (see also Sect. 2.2), a conceptual advantage of probabilistic numerical methods which approximate (1.8) is that only one (temporal) discretization parameter is needed.
Consequently, the main structural tools which led to the a posteriori error estimate (1.7) for (1.1) may now easily be adopted to approximate (1.9), and likewise for the later construction of an adaptive method; for (t, x) ∈ [0, T) × D fixed, the a posteriori error estimate on a given mesh from Scheme 3 to approximate (1.9) again takes the form (1.13), with (computable) error estimators {H_ℓ^{(·)}}_{ℓ=1}^3, and 0 ≤ J* ≡ J*(t, x) ≤ J the stopping index; see Theorem 4.7. The following example details {H_ℓ^{(·)}}_{ℓ=1}^3 in (1.13) for a prototype PDE (1.8).
Let g(t, x) ≡ 0, and φ be smooth. Then the three estimators take similar roles as in Example 1.1 for (1.1).
The following items (i)-(iii) comment on the construction of (1.13), its convergence analysis, and use to construct an adaptive method.
(i) Compared to Scheme 2 for the elliptic problem (1.1), Scheme 3 (see Sect. 3.4) additionally observes 'stopping' when there is no projection onto ∂D before the terminal time T > 0, but is otherwise similar. Consequently, the form of (1.13) is close to (1.7).

(ii) Compared to (1.7), the convergence analysis of (1.13) along sequences of shrinking meshes with a maximum mesh size simplifies, since the stopping time τ^{t,x} in (1.11) is P-a.s. bounded by the terminal time T > 0, which, in particular, avoids a related stability result concerning 'discrete stopping'. In fact, only Lemma 3.5 is needed, which is an analogue of Lemma 3.3 in the elliptic setting.

(iii) Similarly to Algorithm 5.1 in Sect. 5, we construct an adaptive time-stepping algorithm (see Algorithm 5.3) based on (1.13), for which we prove (again) local and global termination, as well as optimal convergence in terms of a given tolerance Tol > 0. The (successive) step size selection procedure in Algorithm 5.3 proceeds in the same way as in Algorithm 5.1: given Tol > 0 and j ≥ 0, the current step size τ_{j+1} is (automatically) generated (via iterated refinement/coarsening), such that the partial sum is below Tol times a specified 'temporal weight', which grows with t_j, but is bounded by means of the stability result in Lemma 3.5. In the fully practical implementation of Algorithm 5.3 (as well as of Algorithm 5.1), where arising expectations are approximated by the Monte Carlo method, the 'temporal weight' gradually forces those leftover samples which have not been projected onto ∂D together with the majority of samples towards a projection. These 'forced' projections are obtained by enlarging the corresponding boundary strips through a gradual coarsening of the step sizes (see Examples 1.2 and 1.4, and also Fig. 2). We refer to Sects. 5 and 6 for further details.
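The step-size selection common to Algorithms 5.1 and 5.3 can be caricatured as follows. Here `estimator` is a hypothetical stand-in for the computable partial sums of the error estimators, and the halving/doubling rule, the factor 2, and the `weight` parameter are illustrative assumptions, not the paper's thresholding criterion (5.1) itself.

```python
def select_step(estimator, tau_prev, tol, weight=1.0,
                tau_min=1e-8, tau_max=1.0):
    """One refine/coarsen cycle: first try to coarsen (double the previous
    step), then halve the trial step while the computable estimator exceeds
    the weighted tolerance. `estimator(tau)` returns the local a posteriori
    error contribution for a trial step of size tau."""
    tau = min(2.0 * tau_prev, tau_max)      # optimistic coarsening first
    while estimator(tau) > weight * tol and tau > tau_min:
        tau *= 0.5                          # refine until below tolerance
    return tau

# toy local estimator: error contribution proportional to tau**2
tau = select_step(lambda t: 10.0 * t * t, tau_prev=0.1, tol=1e-3)
```

In the toy run, the trial step 0.2 is halved five times until 10·τ² ≤ Tol, returning τ = 0.00625; in the actual algorithms, the accept/reject test additionally carries the 'temporal weight' described above.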
The following example from [33, Experiment 7.1] illustrates local mesh refinement/coarsening by Algorithm 5.3.
The corresponding solution is given in [33, Experiment 7.1]. The remainder of this paper is organized as follows: Sect. 2 provides a survey of existing (adaptive) methods for the approximation of the elliptic, as well as the parabolic PDE. Section 3 collects the assumptions needed for the data in (1.1) resp. (1.8), recalls a priori bounds for the solution of (1.1) resp. (1.8), and presents Schemes 1 to 3, as well as corresponding stability results. The a posteriori error estimates (1.7) and (1.13) are derived in Sect. 4, where also their optimal convergence orders are shown. The related adaptive methods are proposed and analyzed in Sect. 5. Section 6 presents computational studies.

A short review of a posteriori error analysis and adaptivity
Deterministic methods to solve PDEs (1.1) and (1.8) usually employ meshes to resolve the state space, and their implementation is usually involved. In contrast, probabilistic methods are meshless, comparatively easy to implement, and remain applicable in high dimensions L. Their efficiency increases rapidly with the recent emergence of modern (parallel) GPU architectures; see [30]. The main goal in this section is to survey some existing representative directions in the a posteriori error analysis and adaptive numerical methods for the (initial-)boundary value problems (1.1) and (1.8).

Probabilistic methods to discretize high-dimensional PDEs
In the literature, there exist different numerical methods for (1.2) or (1.9), which may be seen as examples of more general 'stopped diffusion' problems. Most of them use the explicit Euler method, i.e., (3.1) where 'ξ_{j+1}√τ_{j+1}' is replaced by the Wiener increment 'W_{t_{j+1}} − W_{t_j}'; see (4.29). The main difficulty then is to accurately compute the (discrete) stopping time (1.4) resp. (1.11) when the related (discrete) solution path leaves the domain D. This problem becomes even more prominent when the related first exit time τ̄ of the (abstract) continuified Euler process Ȳ_X (see (4.30)) is compared in this context on an interval [t_j, t_{j+1}], where trajectories may exit D even though all discrete (explicit) Euler iterates lie in D; see Fig. 5a below. An a priori error analysis therefore cuts the convergence rate from 1 to 1/2; see [27, Thm. 2.3]. To recover optimal order, more simulations are needed close to the boundary to accurately capture discrete stopping. First works in this direction are [27,28], which prove optimal convergence order 1. To use the method in [28], the exit probability of Ȳ_X leaving the domain, i.e., the probability that τ̄ lies in a time interval specified by two (consecutive) grid points, needs to be available explicitly, which is only known for certain domains (e.g., when D is a half-space; see e.g. [28]). For general underlying domains D, this approach needs to be combined with local transformations of ∂D to be successful.
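For a half-space boundary, the exit probability used by the 'Brownian bridge method' of [27,28] is explicit: conditional on two consecutive Euler iterates at (signed) distances d_j, d_{j+1} > 0 inside the boundary, a scalar Brownian bridge crosses the boundary within the step with probability exp(−2 d_j d_{j+1}/(σ²τ)). The sketch below assumes this half-space reduction (general domains need the local charts mentioned above), and `stop_index` is an illustrative helper, not the cited algorithms:

```python
import math, random

def bridge_exit_prob(d0, d1, sigma2, tau):
    """Probability that a Brownian bridge between two points at distances
    d0, d1 > 0 from a half-space boundary crosses it within one step tau."""
    return math.exp(-2.0 * d0 * d1 / (sigma2 * tau))

def stop_index(path_dists, sigma2, tau, rng):
    """First step at which the continuified path exits the half-space:
    either an iterate itself is outside (distance <= 0), or an in-between
    excursion is sampled via the bridge exit probability."""
    for j in range(1, len(path_dists)):
        d0, d1 = path_dists[j - 1], path_dists[j]
        if d1 <= 0.0:
            return j                        # discrete iterate already outside
        if rng.random() < bridge_exit_prob(d0, d1, sigma2, tau):
            return j                        # excursion between grid points
    return None                             # no exit detected on this mesh
```

Sampling these in-between excursions is exactly what restores the order-1 convergence; iterates sitting deep inside D contribute exit probabilities that are exponentially small in 1/τ.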
In order to avoid local charts close to ∂D, the 'boundary shifting method' is presented in [29], which shrinks the domain D to generate more frequent exits. Compared to the 'Brownian bridge method' above, an explicit formula for exit probabilities of Ȳ_X is not required anymore, which broadens the applicability of the method to more general domains. The corresponding error analysis guarantees order o(√τ), while computations evidence order 1. For further, different strategies to ensure accurate 'stopping', we also refer to [10, 36–38].
The methods discussed so far were supplemented by a priori error analyses; to our knowledge, the only work that addresses a posteriori error analysis in this setting is [20]. For g ≡ 0 in (1.8), and based on an (asymptotic) weak a posteriori error expansion with computable leading-order term, a time-stepping method is proposed there which generates global stochastic (adaptive) meshes to approximate (1.9). However, these (random) mesh generations are only based on the computable part of the underlying error expansion, and adaptivity here thus remains heuristic. The corresponding derivation uses computable exit probabilities (similar to [27,28]), and is otherwise similar to the procedure in [41,48]: the derivation rests on the weak error expansion via PDE (1.8), which practically involves numerical approximations of derivatives of the (unknown) solution u of (1.8), whose simulation is only feasible in small dimensions L. The computational experiments in this work indicate a convergence order 1, but no theoretical results are known that support these observations.
The methods above primarily address the (efficient) approximation of (1.8) rather than (1.1). We here mention the works [8,9,11], which computationally study the approximation of (1.1) by extending the ideas from [28]: there, τ^x in (1.4) is (accurately) approximated by sampling from a distribution which is constructed by means of the related exit probability. Computational studies with the corresponding method evidence an improved convergence order 1 as well.
These schemes all use the (explicit) Euler method with unbounded Wiener increments. From a practical viewpoint, however, the weak Euler method in (3.1) is an alternative option, as it uses bounded random variables in every iteration step (see Scheme 1 below) to avoid overshoots outside the domain by controlling the steps up to the boundary; in particular, the work [39] verifies first-order convergence for (3.1) in an associated scheme (very close to Scheme 2 resp. Scheme 3), which, except for a projection onto ∂D resp. a bouncing back to the inside of D, does not require further adjustments of its iterates close to ∂D. We use this simple, fully practical method (3.1) within Schemes 2 and 3 in Sect. 4 to provide computable right-hand sides in an a posteriori error analysis, which may then be used to set up an adaptive time-stepping strategy in Sect. 5.

Deterministic adaptive methods in low dimensions: AFEM
Adaptive finite element methods (AFEM) base an automatic adjustment of a given mesh T_0 covering D ⊂ R^L on an a posteriori error estimator η²(u_{T_0}, T_0) ≡ η² = Σ_{T_{m,0} ∈ T_0} η²_{T_{m,0}} for the computed approximation u_{T_0} : D → R of (1.1) on T_0. FEM is a general deterministic Galerkin method for (1.1) posed on an arbitrary low-dimensional domain D; in practice, its use leads to large coupled algebraic systems to be inverted by means of advanced iterative solvers, whose performance crucially hinges on the ellipticity of (1.1). AFEM extends these concepts by trying to optimally distribute nodal mesh points across D, guided by the local indicators {η_{T_{m,0}}; T_{m,0} ∈ T_0}, where η_{T_{m,0}} ≡ η_{T_{m,0}}(u_{T_0}, T_{m,0}), while aiming for optimal accuracy under fixed computational costs; it is an iterative method which repeatedly refines meshes locally, and thus generates a family of nested meshes {T_ℓ}_{ℓ=1}^{ℓ*}, until the related approximation u_{T_{ℓ*}} : D → R of (1.1) fulfills a certain threshold criterion.
Existing AFEM mainly uses Hilbert space methods to derive a (residual-based) a posteriori estimator η(u_{T_ℓ}, T_ℓ) that bounds the error u − u_{T_ℓ} in the 'energy norm' from above, up to an unknown factor (e.g., Poincaré's constant, reflecting stability properties of (1.1), and another one which accounts for the admitted triangulations; see [21,42]). For AFEM, this estimate then suggests the following loop to automatically generate a sequence of (increasingly more) specific, (locally) refined meshes {T_ℓ}_ℓ, starting from a coarse mesh T_0, which all cover D: for a given T_ℓ, we

(1) ('Solve') first compute u_{T_ℓ} with the help of direct or iterative solvers (e.g., PCG, multigrid methods, GMRES, or BICG) for the large linear system. Then

(2) ('Estimate') the estimator η(u_{T_ℓ}, T_ℓ) is computed to decide whether or not u_{T_ℓ} is sufficiently accurate, and/or whether T_ℓ should be refined.

(3) ('Mark') Based on the estimator alone, 'Dörfler's marking strategy' (see (2.2) below) selects those elements {T_{m,ℓ} ∈ T_ℓ} which are due for refinement.

(4) ('Refine') Only mesh refinement is admitted to obtain the new nested mesh T_{ℓ+1}, via the 'newest vertex bisection' method that splits the elements marked in (3).
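Step (3) with Dörfler's criterion (2.2) admits a compact sketch: mark a set M ⊂ T_ℓ with Σ_{T∈M} η²_T ≥ θ² Σ_{T∈T_ℓ} η²_T. The greedy largest-first selection below is one common way to realize a minimal such set; the list-of-floats interface is an assumption for illustration.

```python
def doerfler_mark(eta, theta):
    """Return indices of a set M with sum_{T in M} eta_T^2 >= theta^2 *
    sum_T eta_T^2 (Doerfler / bulk-chasing criterion), taking the largest
    indicators first so that M has minimal cardinality."""
    total = sum(e * e for e in eta)
    order = sorted(range(len(eta)), key=lambda m: -eta[m])
    marked, acc = [], 0.0
    for m in order:
        if acc >= theta * theta * total:    # bulk criterion already met
            break
        marked.append(m)
        acc += eta[m] ** 2
    return marked

marked = doerfler_mark([0.1, 0.5, 0.2, 0.4], theta=0.6)
```

In this toy run a single element (index 1, with η² = 0.25 out of a total 0.46) already carries the required θ² fraction, which is exactly the 'bulk' behaviour that drives the contraction property discussed next.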
It remained open until [19] to show that the tuples {(u_{T_ℓ}, T_ℓ)}_ℓ obtained from (1) to (4) meet a pre-assigned error tolerance within finitely many steps ℓ*: next to the assumption of a sufficiently fine initial mesh T_0 and the 'one interior node' condition in (4), the convergence proof for AFEM in [19] for Poisson's problem rests on 'Dörfler's marking strategy' for a fixed 0 < θ < 1, which ensures that sufficiently many elements from T_ℓ ≡ {T_{m,ℓ}}_m are chosen to constitute a fixed proportion of the global error estimator [42]. The work [19] initiated a whole series of works broadening convergence results to more general AFEM of the sort (2.1), such that the relevant contraction property remains valid, with a generic constant C ≡ C(D) > 0 that depends on D and the admitted mesh geometries: to e.g. remove in [14] the (too costly) 'one interior node' condition in (4), next to a required sufficiently fine initial T_0, the concept of 'total error' was central. Another direction generalizes the convergence property of AFEM to non-symmetric linear elliptic, and even quasi-linear, problems; see e.g. [5,22]. Next to the contraction property, 'mesh optimality' is a crucial property for AFEM to have, which bounds the number of degrees of freedom N_* = #T_{ℓ*} of the terminating mesh T_{ℓ*}: the first work in this direction is [7], which shows optimal convergence rates (in terms of N_*, for the Poisson problem) for a certain AFEM which included a crucial coarsening step; this step was later removed by a modified approach in [47]. For a further discussion of 'mesh optimality' for AFEM we refer to [43], and to [13], where sufficient criteria are identified which ensure optimal convergence rates for a general AFEM; and to e.g. [5,22] for more general PDEs. We remark that the proof of 'mesh optimality' usually requires θ ∈ (0, θ*) in (2.2), for θ* sufficiently small to bound the number of marked elements in step (3), whose value is not explicit for actual simulations.
We refer to [18] for a further discussion. These concepts are applied in [16,23,31] to construct adaptive methods based on the implicit Euler method for the heat equation, as a special example of the evolutionary PDE (1.8): for every n ≥ 0, to (iteratively) find the new time step τ_n, and then the spatial mesh T^n covering D, different error indicators are identified which subsequently (and thus independently) address these goals. These indicators are space-time localizations of computable terms in the a posteriori error estimate [50] (see also [31, Thm. 3.1]), whose derivation is based on the concept of weak (variational) solutions of (1.8), to bound the error in a global Bochner norm L²(0, T; ·). As a consequence, given n ≥ 0 and τ_n, the construction of a mesh T^n to approximate the solution of the corresponding elliptic problem, where L = −Δ, via the convergent AFEM strategy (2.1) is then possible. However, the subtle interplay of different spatial and temporal scales, and the decoupled treatment of the related errors occurring in each time step, make the construction of an efficient adaptive method for (1.8) more challenging in this parabolic case (see also [23]): we also have to make sure that (i) τ_n may iteratively be constructed via a finite sequence {τ_{n,ℓ}}_{ℓ≥0}, and that (ii) the final time T is reached after finitely many steps, i.e., there exists a (deterministic) N ∈ N with t_N ≥ T, to conclude convergence of the adaptive method. The adaptive method in [16] satisfies (i) but lacks (ii); a first convergent method is given in [31], where each time step starts with a possible coarsening of (τ_{n−1}, T^{n−1}), followed only by refinements afterwards; a uniform energy estimate for the iterates is employed to determine a (uniform) minimum admissible time step for each n to meet the error tolerance, and thus to show termination of the adaptive method, i.e., property (ii), although with non-optimal complexity bounds.

Deterministic methods to discretize high-dimensional PDEs: tensor sparsity
For D = (0, 1)^L, mesh-based methods (such as the FEM used in Sect. 2.2) to e.g. solve PDE (1.1) suffer from the curse of dimensionality: for N the number of points of a uniform mesh per dimension, the number of related nodal basis functions is O(N^L), which grows exponentially with the dimension L. Sparse grids on hypercubes D = (0, 1)^L drastically cut down this complexity of a full mesh to O(N |log N|^{L−1}) many grid points: they discard those elements of a hierarchical basis in tensor product form which have small support, such that no loss of approximation power occurs for sufficiently smooth solutions of PDE (1.1); see e.g. [12]. For the heat equation in (1.8) and rough initial data in tensor form, graded time meshes properly address this requirement for sparse spatial grids in [44]. As detailed e.g. in [12,45], the efficient use of sparse grids for high dimensions L requires a restricted data setting (b, σ, c, g, φ, D): for non-constant elliptic operators L including convection, for a domain that is not of tensor structure, as well as for non-constant (Dirichlet) boundary data, partly non-trivial extensions are necessary, and those setups of data typically lower the accuracy and the reachable L; see the discussion in [49]. Also, a theoretical backup for local adaptive mesh adjustments (see [12,45]) that preserve optimal complexity as L increases is less developed.
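The complexity gap is easy to quantify. The count below uses N·(log₂N)^{L−1} as a rough proxy for the sparse-grid point count (constants and lower-order terms dropped, an assumption for illustration), against N^L for the full tensor grid:

```python
import math

def full_grid_points(N, L):
    """Tensor-product grid: N points per dimension, N**L points in total."""
    return N ** L

def sparse_grid_proxy(N, L):
    """Rough sparse-grid count, O(N * (log2 N)**(L-1)), constants dropped."""
    return int(N * math.log2(N) ** (L - 1))

# L = 10 dimensions, N = 32 points per direction:
full = full_grid_points(32, 10)        # 32**10, about 1.1e15 points
sparse = sparse_grid_proxy(32, 10)     # 32 * 5**9 = 62_500_000 points
```

Even under this crude proxy, the sparse grid replaces roughly 10^15 degrees of freedom by fewer than 10^8, which is what makes L on the order of 10 feasible for smooth, tensor-structured data.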
To approach even larger dimensions L based on tensor product representations for approximate solutions of PDE (1.1) with a 'Laplacian-like operator', the construction of a proper (sub-)set of basis functions will be part of the low-rank approximation method itself; see e.g. [4] for a recent survey. We also mention [25,46], where its complexity is compared with sparse grids, and smoothness of the solution was again found to be crucial for the efficiency of the low-rank approximation. According to [1], its efficiency crucially hinges on the differential operator in PDE (1.1) having 'a simple tensor product structure', and on (L-dependent) ranks, whose optimal value is not evident in general [2], being chosen properly; see also [4, Sect. 5.3]. In fact, related theoretical discussions in [17] for the high-dimensional PDE (1.1) with constant, symmetric elliptic operators L ≡ −div(A∇u), with A ∈ R^{L×L} diagonal, conclude the transfer of tensor sparsity from data to solutions, which motivates low-rank tensor format approximations for the solution of PDE (1.1) in those cases; but such a structural transfer may get lost in the case of stronger couplings [3] for general symmetric positive definite A ∈ R^{L×L}, demanding higher ranks for a proper approximation.
While current research on deterministic methods for large L mainly focuses on the efficient use of 'tensor-sparsity respecting data' to fight the 'curse of dimensionality', we base the construction of easily implementable, adaptive methods to solve PDEs (1.1) resp. (1.8) on their probabilistic reformulations (1.2) resp. (1.9): general domains D ⊂ R^L and elliptic differential operators L ≡ L(x) in (1.1) resp. (1.8) are admitted, which in particular appear in physical applications, and for which convergence with optimal rates for the related adaptive Algorithms 5.1 and 5.3, based on a posteriori error estimators, provides a theoretical backup.

Assumptions and tools
Section 3.1 lists basic requirements on the data b, σ, c, g, φ in (1.1), which guarantee the existence of a unique classical solution u : D → R of (1.1); see e.g. [24, Ch. 6]. Moreover, we recall bounds for the derivatives {D^ℓ_x u}_{ℓ=1}^4 of the solution of (1.1). In almost the same manner, Sect. 3.2 presents assumptions on the data b, g, φ in (1.8), which ensure the existence of a unique classical solution u of (1.8); see e.g. [32, p. 320, Thm. 5.2]. Moreover, we recall bounds for {D^ℓ_x u}_{ℓ=1}^4 of the solution of (1.8). For a sufficiently smooth map in C(R^L; R^n), the corresponding (matrix) operator norms are defined as usual (n, L ∈ N), where ‖·‖_{R^n} denotes the (Euclidean) vector norm of an R^n-valued vector; if n = L, the summation is taken over all multi-indices j of length |j|.
In a similar manner, the corresponding further norms are denoted; see also [32, p. 2 ff.] for further details.
(A0) D is bounded, and the boundary ∂D is of class C^{4+β}.
In the next section, we need extra assumptions (cf. Remark 3.1). Hence, we make an assumption involving a constant C > 0 that depends on the data in (1.8), the dimension L, and the domain D.

Discretization for the elliptic PDE (1.1): scheme and stability
Scheme 2 below will be used to approximate (1.2) from (1.1). For this purpose, we fix x ∈ D, and let {t_j}_{j≥0} ⊂ [0, ∞) be a mesh with local mesh sizes {τ_{j+1}}_{j≥0}.
Here, ξ_{j+1} is an R^L-valued random vector whose entries are independent two-point distributed random variables, taking the values ±1 with probability 1/2 each. Recalling the characterization of the boundary strip S_{τ_{j+1}} in Sect. 1 (see also Fig. 5b), we observe that λ_j > 0 has to be chosen as a (computable) upper bound for the distance between two iterates. In the following, we denote by π_{∂D} : D → ∂D the projection onto the boundary ∂D, and by n_{∂D}(z) the unit internal normal to ∂D at π_{∂D}(z); see Fig. 5b.
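A schematic single step of Scheme 2 might look as follows. The strip test `in_strip`, the projection `project`, the inward normal `normal`, the strip parameter `lam`, and the bounce-back probability `p_bounce` (which stands in for p_j from (3.5)) are all abstracted into parameters here, so this is a caricature of the interior/strip/projection case distinction, not the scheme itself.

```python
import math, random

def scheme2_step(y, drift, sigma, tau, lam, in_strip, project, normal,
                 p_bounce, rng):
    """One step of a weak-Euler-type scheme with boundary handling
    (schematic): an interior iterate takes an Euler step with two-point
    increments xi in {-1,+1}^L; an iterate in the boundary strip is bounced
    a distance lam*sqrt(tau) along the inward normal with probability
    p_bounce, and otherwise projected onto the boundary (which stops the
    walk). Returns (new iterate, stopped?)."""
    if in_strip(y):
        if rng.random() >= p_bounce:
            return project(y), True          # projection onto the boundary
        n = normal(y)                        # bounce back into the interior
        return [y[i] + lam * math.sqrt(tau) * n[i]
                for i in range(len(y))], False
    xi = [rng.choice((-1.0, 1.0)) for _ in range(len(y))]
    return [y[i] + drift(y)[i] * tau + sigma * xi[i] * math.sqrt(tau)
            for i in range(len(y))], False

# demo on D = (0,1): an interior iterate takes a +/- sqrt(tau) step
y_new, stopped = scheme2_step(
    [0.5], lambda y: [0.0], 1.0, 0.01, 0.5,
    in_strip=lambda y: y[0] < 0.05 or y[0] > 0.95,
    project=lambda y: [0.0] if y[0] < 0.5 else [1.0],
    normal=lambda y: [1.0] if y[0] < 0.5 else [-1.0],
    p_bounce=0.5, rng=random.Random(7))
```

The three branches of this sketch are exactly the cases distinguished by the estimators G_1, G_2, and G_3 in (1.7).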
The following lemma estimates the number of iterates {Y^j_X}_{j≥0} from Scheme 2 in the boundary strips; it may be considered as a generalization of [40, p. 367, Lem. 3.2] to non-uniform time steps.

Lemma 3.3 Assume (A0)–(A3). Fix x ∈ D.
Proof Let j ∈ N. The assertion follows since the probability p_j in (3.5) is greater than 1/2. □

The following lemma yields boundedness of the expected discrete stopping time.

Proof
Step 1 (Derivation of a 'discrete Dynkin formula'): We derive a 'discrete Dynkin formula' adapted to our setting. Let f ∈ C(R^L) and k ∈ N. A first calculation yields (3.6); according to the procedure in Scheme 2, we then have (3.7), with terms T_{1,1} and T_{1,2}.

a) (Investigation of T_{1,1}) We split T_{1,1} into T_{1,1,1} and T_{1,1,2}, and use the tower property and independence arguments to get (3.9). Similar arguments as in (3.9), i.e., representing the increment 'Y^{j+1}_X − Y^j_X' in T_{1,1,2} via (3.1), standard calculations, and independence arguments lead to (3.10). Plugging (3.9) and (3.10) into (3.8) yields (3.11).

c) (Investigation of T_{1,2}) Since the investigation of T_{1,2} is similar to that of T_{1,1}, we obtain (3.12), where the argument in the trace terms is replaced by Y^j_X accordingly. According to the procedure in Scheme 2, we obtain (3.13).

e) We insert (3.11) and (3.12) into (3.7), and plug the resulting expression, as well as (3.13), into (3.6) to obtain (3.14).

Step 2 (Proof of the statement for τ_max sufficiently small): Let n ∈ N. Choose B ∈ R^L such that the minimum is attained. We refer to [40, p.

For the usual Euler method in (4.29), it is possible to derive a similar result to Lemma 3.4 under (A0)-(A3) alone, thanks to Dynkin's formula.
4. The constant C > 0 can be explicitly identified under assumption (A1*) resp. (A1**). In the first case

Discretization for the parabolic PDE (1.8): scheme and stability
to approximate (1.10) and (1.12). In the following, we state Scheme 3, which is closely based on [40, p. 353 ff., Subsec. 6.2.1], and which can be seen as an analogue of Scheme 2 from the elliptic setting, to approximate (1.9) by

A posteriori weak error estimation: derivation and optimality for the elliptic PDE (1.1)
The following result bounds the approximation error of u(x), where C(φ, g) > 0 is the constant from Lemma 3.1, and the a posteriori error estimators {G_ℓ^{(j)}}_{ℓ=1}^{3} are given by

with computable terms

The derivation of the a posteriori error estimate (4.1) crucially depends on the use of the weak Euler method (3.1) and the associated procedure in Scheme 2. Note that the right-hand side of (4.1) is 'computable', i.e., in practice, the terms {E_ℓ(·)}_{ℓ=1,…,15} may be approximated by a Monte-Carlo method, which typically provides a basis for an efficient error approximation (see Sect. 6 for further details). In contrast, in Sect. 4.3 we present an a posteriori error analysis via the explicit Euler method (4.29), whose derivation is (also) close to [35, Thm. 3.1], and discuss the difficulties that arise there; in particular, the computation of the terms in a posteriori form involved there is not straightforward. The derivation of the a posteriori error estimate (4.1) then follows by combining these lemmata, under (A0)-(A3).
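Since the computable terms are approximated by a Monte-Carlo method in practice, a generic sample-mean routine conveys the idea; the integrand below is a pure placeholder (in the paper's setting, one sample would be a full realization of Scheme 2 evaluated in the respective estimator term):

```python
import numpy as np

def monte_carlo_mean(sample_fn, M, rng):
    """Approximate an expectation E[h] by the sample mean over M independent
    draws, together with the standard error as a practical accuracy indicator."""
    samples = np.array([sample_fn(rng) for _ in range(M)])
    return samples.mean(), samples.std(ddof=1) / np.sqrt(M)

rng = np.random.default_rng(42)
# placeholder integrand with known mean E[Z^2] = 1 for a standard normal Z
mean, stderr = monte_carlo_mean(lambda r: r.normal() ** 2, 10_000, rng)
```

Reporting the standard error alongside the sample mean is one common way to monitor how well the computed right-hand side of an estimate like (4.1) is itself resolved.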

Lemma 4.3 Assume (A0)-(A3). Fix x ∈ D. Let {t_j}_{j≥0} denote the underlying (non-uniform) time mesh.
Then, for every j ≥ 0, we have

where I_j is given in (4.2), and C(φ, g) > 0 is from Lemma 3.1.
Proof In the following, we write A_j := {Y_X^j ∈ D \ S_{τ_{j+1}}} to simplify the notation. In a first step, we rewrite I_j by making use of (3.2) and (3.3) in Scheme 1.
Step 1: (Employing PDE (1.1)) We use Taylor's formula to deduce from (4.6)

Now, we use the identity in (1.1) to restate g(Y_X^j) · τ_{j+1} in (4.7). Next, we use (3.1) to represent Y_X^{j+1} − Y_X^j in (4.8), and standard calculations,

Step 2: We estimate the terms in (4.9) separately. (a) (Estimation of K_1) We use Taylor's formula to get

Independence, the fact that Y_V^j > 0 (which follows from the generation of Y_V^j via the implicit Euler method (3.2)), Lemma 3.1 and standard arguments lead to

and standard arguments immediately lead to
We add and subtract D²_x u(Y_X^j), and use independence and Lemma 3.1 to obtain

Plugging (4.13) into (4.12) and using ‖ξ^{j+1}‖_{R^L} = √L then leads to

We start with a straightforward rewriting of K_4.
The following lemma estimates II_j from (4.3). Its proof is very similar to that of Lemma 4.3 and is thus omitted; see also [34] for more details.
where II_j is given in (4.3), and C(φ, g) > 0 is from Lemma 3.1.
The next lemma estimates III_j from (4.4).

Lemma 4.5 Assume (A0)-(A3). Then, for every j ≥ 0, we have

where III_j is given in (4.4), and C(φ, g) > 0 is from Lemma 3.1.
Proof We take the conditional expectation w.r.t. Y_X^j and use measurability arguments to obtain in a first calculation

where p_j is given in (3.5). We apply the mean value theorem twice to get

for some intermediate points, one between Y_X^j and Y_X^j + λ_j √τ_{j+1} n_{∂D}(Y_X^j), and one between the projection of Y_X^j onto ∂D and Y_X^j + λ_j √τ_{j+1} n_{∂D}(Y_X^j), respectively. Next, we define ϕ(z, n_{∂D}(Y_X^j)), so that we can rewrite (4.20) as follows; the assertion then follows from (4.21).
Next, we show convergence with optimal (weak) order 1 of the a posteriori error estimate (4.1) on meshes with maximum mesh size τ_max > 0. Theorems 4.1 and 4.6 then imply, in particular, the a priori estimate [40, p. 369, Thm. 3.4].
Proof In the following, C C C > 0 is a constant, which might differ from line to line, but is always independent of τ max .

a) Bounds for {E
for some constant C > 0. Moreover, since c ≤ 0, we have 0 < Y_V^j ≤ 1 for every j ≥ 0. Let j ≥ 0. We immediately obtain

By means of (3.3) and standard arguments used before, we further get

Moreover, due to (3.2), we obtain

Consequently, considering the representation of G_2:

b) Similarly to a), we obtain

Step 2: Let j ≥ 0. By means of the representation of λ_j in (3.4), and the fact that

(4.24)

Step 3: We plug (4.22), (4.23) and (4.24) into (4.1), and use τ_{j+1} ≤ τ_max, to get (4.25). Due to Lemma 3.3, we have (4.26), and due to Lemma 3.4, we obtain in both cases (i) and (ii)

for some constant C > 0 independent of j ≥ 0 and τ_max.

A posteriori weak error estimation: derivation and optimality for the parabolic PDE (1.8)
In analogy to Theorems 4.1 and 4.6 for the elliptic PDE (1.1), we derive corresponding results for PDE (1.8); see Theorems 4.7 and 4.8 below. Since their derivation is similar to the one that led to Theorem 4.1, we present the upcoming results without proof and refer to [34] for a detailed derivation.
where the a posteriori error estimators {H_ℓ^{(j)}}_{ℓ=1}^{3} are given by

and C(φ, g) ≥ 1 is the constant from Lemma 3.2, with computable terms

The following theorem states convergence with optimal (weak) order 1 of (4.28); its proof is similar to that of Theorem 4.6 and can be found in [34].

A posteriori error analysis for the Euler method
Consider the Euler method (4.29) to approximate (1.8). The interpolating, continuified Euler process is now used to approximate (1.9). In this respect, (4.32) first localizes τ in (t_j, t_{j+1}], and, once this is assured, proceeds via the approximation τ ≈ t_{j+1}. The following theorem presents a related a posteriori error estimate for (4.29), (4.31). From its representation it is not difficult to obtain first order of convergence for (4.33) on families of (time-)meshes with maximum mesh size τ_max > 0; we refer to [34] for more details. (4.30). Then, for C(φ, g) ≥ 1 from Lemma 3.2,

where the a posteriori error estimators {H̃_ℓ^{(j)}}_{ℓ=1}^{2} are given by

Proof (Sketch of the proof of Theorem 4.9) We add and subtract the term ) in the expression on the left-hand side of (4.33) to get

a) Estimation of Ĩ: Using Itô's formula, the identity in the first line of (1.8), Lemma 3.2, as well as Malliavin calculus techniques, we obtain

b) Estimation of ĨĨ: Using Itô's formula immediately leads to

Since 1_{{τ ∈ (t_j, t_{j+1}]}} = 1 − 1_{{τ ≤ t_j}} − 1_{{τ > t_{j+1}}}, and due to arguments from probability theory, we get

Thus, we conclude
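The exit-time localization behind (4.32) can be conveyed by a small sketch (the unit ball, zero drift, identity diffusion, and uniform mesh below are illustrative assumptions only): the first Euler iterate found outside the domain localizes the exit time in (t_j, t_{j+1}], which is then approximated by t_{j+1}.

```python
import numpy as np

def euler_exit_time(x0, b, sigma, taus, in_domain, rng):
    """Explicit Euler path with Gaussian increments; the first exit time is
    localized in (t_j, t_{j+1}] by the first iterate leaving the domain, and
    then approximated by t_{j+1} (cf. the localization idea in the text)."""
    y, t = np.array(x0, dtype=float), 0.0
    for tau in taus:
        dW = rng.normal(scale=np.sqrt(tau), size=y.shape[0])
        y = y + b(y) * tau + sigma(y) @ dW
        t += tau
        if not in_domain(y):
            return t, y   # exit detected: approximate the exit time by t_{j+1}
    return t, y           # no exit within the given mesh

rng = np.random.default_rng(1)
in_ball = lambda z: np.linalg.norm(z) < 1.0
b = lambda z: np.zeros_like(z)
sigma = lambda z: np.eye(z.shape[0])
t_exit, y_exit = euler_exit_time(np.zeros(2), b, sigma, [0.05] * 200, in_ball, rng)
```

Note that the returned time is always a mesh point t_{j+1}, which is precisely the discretization-induced bias that the a posteriori estimators account for.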

Adaptive weak Euler methods: algorithm and convergence
Theorems 4.1 and 4.7 provide a posteriori error estimates for the approximation of (1.2) and (1.9), respectively. In this section, we use these results to set up adaptive methods that automatically steer the successive selection of local mesh sizes to meet a pre-assigned tolerance Tol > 0 for the overall error.

Adaptive weak Euler method for the elliptic PDE (1.1)
The algorithm below combines Scheme 2 with an automatic step size selection procedure, which is based on the a posteriori error estimate in Sect. 4: the criterion collects the estimator contributions of all steps k ≤ j in summarized form, and aims for Tol times a 'temporal weight' as the corresponding upper bound, in order to properly address stopping in (5.2) in Theorem 5.2. Hence, a summation also appears on the right-hand side of (5.1) to account for accumulated errors; see Theorem 5.2 below. Furthermore, {a_j}_{j≥0} is a sequence of additional 'weights' with 1 ≤ a_{j−1} ≤ a_j ≤ C for some C > 0, which may be chosen by the user; see Sect. 6 for a detailed explanation. A suitable choice may be a_j := min{1 + L · t_j, 1 + L · (C + k)}, for some k ∈ N, where C > 0 is the constant from Lemma 3.4. We also refer to Sect. 6 for a Monte-Carlo version of Algorithm 5.1 below.
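The accept/refine mechanism of such a step size selection can be sketched generically; the halving rule, the doubling safeguard, and the linear placeholder indicator below are illustrative assumptions (Algorithm 5.1 itself works with the estimators G_ℓ^{(j)} and the weights a_j), not the paper's exact procedure:

```python
def adapt_step(indicator, tol_weight, tau, tau_min, tau_max):
    """Generic accept/halve step size control in the spirit of adaptive
    time-stepping: halve tau while the local indicator exceeds the weighted
    tolerance; double (up to tau_max) when it is comfortably below."""
    while indicator(tau) > tol_weight and tau > tau_min:
        tau *= 0.5
    if indicator(tau) < 0.25 * tol_weight:
        tau = min(2.0 * tau, tau_max)
    return tau

# placeholder indicator: local error behaves like C * tau (first-order method)
tau = adapt_step(lambda t: 10.0 * t, tol_weight=0.1,
                 tau=1.0, tau_min=1e-6, tau_max=1.0)
```

For a first-order indicator of the form C·τ, the loop halves τ until C·τ ≤ Tol·weight, which mirrors why the accepted step sizes stay proportional to the tolerance.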
where C > 0 depends on C(φ, g) > 0 from Lemma 3.1, on C > 0 from Lemma 3.4, and on the upper bound of max_{j≥0} a_j.
In the following proof, let C ≥ 1 be a constant which may differ from line to line, but is always independent of τ_{j+1} and j ≥ 0.

(b) Global termination We show by induction that
This means that all step sizes are bounded from below by Tol/(2C), which, in particular, may be regarded as the smallest step size that Algorithm 5.1 is able to generate.
The base case follows from the choice of the initial mesh size τ_1 ≥ Tol. Now suppose that we have generated Y_X^j, Y_V^j and Y_Z^j with step size τ_j ≥ Tol/(2C), and that no 'stopping' has been detected before. In order to successfully compute Y^x ∈ R, we conclude by means of (5.5) that

Having (5.6) at hand, we conclude that

By Lemma 3.4, we may infer E[J*] = O(Tol^{-1}).
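The final inference can be made explicit (a sketch under the assumptions stated above; the bound C' on the expected total simulated time, available from Lemma 3.4, is our auxiliary notation): since every step size satisfies τ_j ≥ Tol/(2C), each summand 1 is bounded by (2C/Tol)·τ_j, so

```latex
E[J^*] \;=\; E\Big[\sum_{j=1}^{J^*} 1\Big]
       \;\le\; \frac{2C}{\mathrm{Tol}}\, E\Big[\sum_{j=1}^{J^*} \tau_j\Big]
       \;=\; \frac{2C}{\mathrm{Tol}}\, E\big[t_{J^*}\big]
       \;\le\; \frac{2C\,C'}{\mathrm{Tol}} \;=\; O(\mathrm{Tol}^{-1}).
```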
(c) Convergence rate Let j ≥ 0. We bound the right-hand side of (5.1) from above, independently of j. By means of the boundedness a_j ≤ C, and Lemmata 3.3 and 3.4, we immediately obtain

where C > 0 is from Lemma 3.4. Now, (5.2) immediately follows from the a posteriori error estimate (4.1), the tolerance criterion in Algorithm 5.1 (cf. (5.1)), and (5.7).

Adaptive weak Euler method for the parabolic PDE (1.8)
By following a similar guideline as in the setup of Algorithm 5.1, we present an adaptive algorithm (see Algorithm 5.3 below) for the pointwise approximation of PDE (1.8), i.e., for fixed (t, x) ∈ [0, T) × D. Similarly to Theorem 5.2 in Sect. 5.1, Theorem 5.4 validates termination and convergence of the adaptive method. For a detailed discussion and verification of the upcoming results, we refer to [34].
For M = 10^5 fixed, we investigate the convergence rate for the approximation u^{(M)}(x) of the solution u(x) = φ(x) at x = (5/100, …, 5/100)^⊤. Figure 6B illustrates the optimal order of convergence 1 with respect to Tol, as opposed to the sub-optimal order 1/2 (as also found in [6, Subsec. 5.5, Fig. 9]) on uniform meshes; see Fig. 6C. Time adaptivity thus appears to preserve the theoretically established first order of convergence even for 'complicated' functions g and φ in (1.1). Figure 6D illustrates the time efficiency of Algorithm 5.1 over Scheme 2 on a uniform mesh to reach the same accuracy.
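Reading off an empirical order from plots like Fig. 6B/C amounts to a least-squares fit of log(error) against log(Tol); the synthetic first-order data below is a placeholder, not the paper's measured values:

```python
import numpy as np

def observed_order(tols, errors):
    """Least-squares slope of log(error) versus log(Tol): the empirical
    convergence order, as commonly read off log-log convergence plots."""
    slope, _ = np.polyfit(np.log(np.asarray(tols)), np.log(np.asarray(errors)), 1)
    return slope

# synthetic data with exact first-order behavior: error = 0.5 * Tol
tols = [0.1, 0.05, 0.025, 0.0125]
p = observed_order(tols, [0.5 * t for t in tols])
```

A slope near 1 corresponds to the optimal first-order behavior reported for the adaptive method, while a slope near 1/2 matches the sub-optimal uniform-mesh behavior.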
The corresponding solution is given by u(x) = x_1 x_2 x_3. We fix x = (0.57, 0.57, 0.57)^⊤, and use Algorithm 5.1 (with Tol = 0.01, M = 10^5) to approximate u(x) (see Fig. 7B and C for the resulting features). As stated in [6, Example 1], the suboptimal performance there is due to frequent overshoots of the inner corner of D, where an 'ad-hoc' value λ_j ≡ 2 in (3.4) is used within the related method (on uniform meshes) to characterize the boundary strips. In contrast, the (automatic) choice of adaptive step sizes, in combination with the flexible choice of λ_j according to (3.4), accurately identifies the particular boundary strips, i.e., leads to proper projections onto ∂D, and yields a first order convergent approximation. Furthermore, Algorithm 5.1 is efficient in reaching the same accuracy (Error ≈ 0.002); the number of steps needed to terminate is max

Simulations for the parabolic PDE (1.8)
The following example is from [15]. In both step size plots, we observe an 'L'-profile structure, opposed to the 'U'-structure observed before, which is due to the temporal dynamics involved here, since 'not enough' samples are projected onto ∂D within the short time interval [0.98, 1]. Consequently, a gradual coarsening, i.e., the 'end phase', does not occur, as opposed to Fig. 9B and C, where, for Tol = 0.01 fixed, the related step size plots are shown at a different time t = 0, at which the incompatibility has dissolved. Moreover, the reversed 'L'-profile structure of the step size plot in Fig. 9B results from the position of the (spatial) point (close to ∂D) at which u is approximated, and from the choice of Tol: here, samples are already located in the boundary strips at the beginning, which is why the 'initial phase' does not take place.

Example 6.5 below is from [44, Subsec. 6.2] (again, 'time-reversed' here) and, similarly to Example 6.4, exhibits an incompatibility of boundary and terminal data at times close to T > 0, but in a high-dimensional domain D. Within the framework of this example, also complementing the phenomena of Algorithm 5.3 observed in Example 6.4, we continue the investigation of different aspects of Algorithm 5.3 concerning the choice of M and Tol with respect to the absolute error. We fix x = (0.05, 0.5, …, 0.5)^⊤ ∈ D. Tables 1, 2, 3 and 4 display errors in relation to Tol and M for the approximation of u(t, x) at different times t. As the tables show, the approximation via Algorithm 5.3 at times close to T, i.e., where the incompatibility effect is present, requires considerably more Monte-Carlo samples M and much smaller values of Tol to reach a 'small' error, as opposed to the approximation at times 'away' from T, where the incompatibility effect dissolves.