A randomized operator splitting scheme inspired by stochastic optimization methods

In this paper, we combine the operator splitting methodology for abstract evolution equations with that of stochastic methods for large-scale optimization problems. The combination results in a randomized splitting scheme, which in a given time step does not necessarily use all the parts of the split operator. This is in contrast to deterministic splitting schemes which always use every part at least once, and often several times. As a result, the computational cost can be significantly decreased in comparison to such methods. We rigorously define a randomized operator splitting scheme in an abstract setting and provide an error analysis where we prove that the temporal convergence order of the scheme is at least 1/2. We illustrate the theory by numerical experiments on both linear and quasilinear diffusion problems, using a randomized domain decomposition approach. We conclude that choosing the randomization in certain ways may improve the order to 1. This is as accurate as applying e.g. backward (implicit) Euler to the full problem, without splitting.


Introduction
The main objective of this paper is to combine two successful strategies from the literature: the first being operator splitting schemes for evolution equations in general, infinite-dimensional frameworks and the second being stochastic optimization methods. Operator splitting schemes are an established tool in the field of numerical analysis of evolution equations and have a wide range of applications. Stochastic optimization methods have proven to be efficient at solving large-scale optimization problems, where it is infeasible to evaluate full gradients. They can drastically decrease the computational cost in e.g. machine learning settings. The link between these two seemingly disparate areas is that an iterative method applied to an optimization problem can also be seen as a time-stepping method applied to a gradient flow connected to the optimization problem. In particular, stochastic optimization methods can then be interpreted as randomized operator splitting schemes for such gradient flows. In this context, we introduce a general randomized splitting method that can be applied directly to evolution equations, and provide a rigorous convergence analysis.
Abstract evolution equations of the type u'(t) + A(t)u(t) = f(t), t ∈ (0, T], u(0) = u_0, are an important building block for modeling processes in physics, biology and the social sciences. Standard examples which appear in a variety of applications are fluid flow problems, where we model how a flow evolves on a given domain over time, compare [1,26] and [37, Section 1.3]. The operator A(t) can denote, for example, a nonlinear diffusion operator such as the p-Laplacian or a porous medium operator.
Deterministic operator splitting schemes, as discussed in more detail in [16], are a powerful tool for this type of equation. An example is given by a domain decomposition scheme, where we split the domain into sub-domains. Instead of solving one expensive problem on the entire domain, we deal with cheaper problems on the sub-domains. This is particularly useful on modern computer architectures, as the sub-problems may often be solved in parallel.
Moreover, evolution equations are tightly connected to unconstrained optimization problems, because the solution of min_u F(u) is a stationary point of the gradient flow u'(t) = −∇F(u(t)). The latter is an evolution equation on an infinite time horizon with A = ∇F and f = 0. In the large-scale case, such optimization problems benefit from stochastic optimization schemes. The most basic such method, stochastic gradient descent, was introduced already in [32], but it has since been extended and generalized in many directions; see, e.g., the review article [3] and the references therein.
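This link can be made concrete in a minimal sketch (our own toy example, not from the paper): one step of stochastic gradient descent is an explicit Euler step for the gradient flow u'(t) = −∇F(u(t)) in which only a randomly chosen part of the split gradient is used. The objective F(u) = (1/s) Σ_ℓ F_ℓ(u) with F_ℓ(u) = ½ a_ℓ u² and all parameter values below are purely illustrative.

```python
import random

# Toy split objective: F(u) = (1/s) * sum_l F_l(u), F_l(u) = 0.5 * a_l * u^2.
random.seed(0)
s = 4
a = [1.0, 2.0, 0.5, 1.5]

def grad_part(u, l):
    return a[l] * u                      # gradient of the l-th part F_l

h, u = 0.1, 1.0
for n in range(200):
    l = random.randrange(s)              # pick one sub-problem per time step
    u = u - h * grad_part(u, l)          # randomized explicit Euler step

# every realization drives u toward the stationary point u* = 0 of min_u F
print(abs(u))
```

Each step contracts u by a factor 1 − h·a_ℓ ∈ (0, 1), so the iterate approaches the stationary state of the gradient flow even though no step ever sees the full gradient.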
Via the gradient flow interpretation, we can see these optimization methods as time-stepping schemes where a randomly chosen sub-problem is considered in each time step. In essence, such a method is therefore a randomized operator splitting scheme. The difference between the works mentioned above and ours is that we apply these stochastic optimization techniques to solve the evolution equation itself rather than just finding its stationary state.
We consider nonlinear evolution equations in an abstract framework similar to [7,10,11], where operators of a monotone type have been studied. Deterministic splitting schemes for such equations have been considered in e.g. [14,15,17,29]. A particular kind of splitting scheme that is most closely related to our work, domain decomposition methods, has been studied in [6,7,13,30,31]. In this paper, we extend this framework of deterministic splitting schemes to a setting of randomized methods.
Outside of the context of optimization, other kinds of randomized methods have already proved themselves to be useful for solving evolution equations. Starting in [34,35], explicit schemes for ordinary differential equations have been randomized. This approach has been further extended in [2,4,18,22,24]. In [8], it has been extended both to implicit methods and to partial differential equations, and in [23] to finite element approximations. While these works considered certain randomizations in their schemes, they are conceptually different from our approach. Their main idea is to approximate any appearing integrals through a randomized quadrature rule based on a random variable ξ_n that takes on values in [t_{n−1}, t_n]. This ansatz coincides with a Monte Carlo integration idea. In this paper, we use a different approach where we decompose the operator in a randomized fashion. More precisely, we approximate the data f and A by f_B and A_B, where the batch B ⊂ {1, . . . , s} is chosen randomly. The stochastic approximations f_B and A_B of the original data f and A are cheaper to evaluate in applications. This is less related to Monte Carlo integration and more similar to stochastic optimization methods, compare [3,9]. Similar ideas have been considered in [19,20,28], where a random batch method for interacting particle systems has been studied. Moreover, very recently and during the preparation of this work, a similar approach has also been applied to the optimal control of linear time-invariant (LTI) dynamical systems in [38]. While the convergence rate provided there is essentially the same as what we establish in our main result, Theorem 5.2, our setting is more general and allows for nonlinear operators on infinite-dimensional spaces rather than finite-dimensional matrices. We also consider the error of the time-stepping method that is used to approximate the solution to u'(t) + A_B(t)u(t) = f_B(t), while the error bounds in [38] assume that this evolution equation is solved exactly. This paper is organized as follows. In Section 2,
we begin by explaining our abstract framework. This includes both the precise assumptions that we make and the definition of our time-stepping scheme. We give a more concrete application of the abstract framework in Section 3. With the setting fixed, we first prove in Section 4 that the scheme and its solution are indeed well-defined. We prove the convergence of the scheme in expectation in Section 5. These theoretical convergence results are illustrated by numerical experiments on two-dimensional linear and quasilinear diffusion problems in Section 6. Finally, we collect some more technical auxiliary results in Appendix A.
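The random batch approximation described in the introduction, replacing A = A_1 + · · · + A_s by a cheaper randomly chosen part, can be sketched with small matrices standing in for the abstract operators. The scaling by s for a single uniformly drawn index, used below, is one simple choice that makes the approximation unbiased; it is our assumption for this illustration, not a prescription from the paper.

```python
# Toy 2x2 "operators" A_1, A_2, A_3 whose sum is the full operator A.
s = 3
parts = [[[1.0, 0.0], [0.0, 2.0]],       # A_1
         [[0.5, 0.1], [0.1, 0.5]],       # A_2
         [[2.0, 0.0], [0.0, 1.0]]]       # A_3

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

A_full = parts[0]
for P in parts[1:]:
    A_full = add(A_full, P)

# Drawing a single index l with probability 1/s and using A_B = s * A_l,
# the average over all possible draws recovers the full operator:
expectation = [[0.0, 0.0], [0.0, 0.0]]
for l in range(s):
    expectation = add(expectation, scale(1.0 / s, scale(float(s), parts[l])))

unbiased = all(abs(expectation[i][j] - A_full[i][j]) < 1e-12
               for i in range(2) for j in range(2))
print(unbiased)
```

The unbiasedness E[A_B] = A is exactly the expectation property that the abstract framework imposes on the randomized operators below.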

Setting
In the following, we introduce a theoretical framework for the randomized operator splitting. This setting is similar to the one in [7].
Assumption 1. Let (H, (·, ·)_H, ‖·‖_H) be a real, separable Hilbert space and let (V, ‖·‖_V) be a real, separable, reflexive Banach space, which is continuously and densely embedded into H. Moreover, let there exist a semi-norm |·|_V on V. Denoting the dual space of V by V* and identifying the Hilbert space H with its dual space, the spaces from Assumption 1 form a Gelfand triple V ⊂ H ≅ H* ⊂ V* with continuous and dense embeddings.

Assumption 2. Let the spaces H and V be given as stated in Assumption 1. Furthermore, for T ∈ (0, ∞) as well as p ∈ [2, ∞), let {A(t)}_{t∈[0,T]} be a family of operators A(t) : V → V* that satisfy the following conditions: (iii) A(t) fulfills a monotonicity-type condition in the sense that there exists η_A ∈ [0, ∞), which does not depend on t; A(t) is uniformly bounded such that there exists β_A ∈ [0, ∞), which does not depend on t; and (v) A(t) fulfills a uniform semicoercivity condition such that there exist µ_A, λ_A ∈ [0, ∞), which do not depend on t.

Assumption 3. The function f is an element of the Bochner space L²(0, T; H), and the initial value u_0 is an element of H, where H is the Hilbert space from Assumption 1.
Assumptions 1-3 are requirements on the problem that we want to solve. The following Assumptions 4-5 are needed to state the approximation scheme for the given problem.

Assumption 4. Let (Ω, F, P) be a complete probability space and let {ξ_n}_{n∈N} be a family of mutually independent random variables. Further, let the filtration {F_n}_{n∈N} be given by F_n = σ(ξ_1, . . . , ξ_n), where σ denotes the generated σ-algebra.
In the following, for a random variable X in the Bochner space L¹(Ω; H), we denote the expectation with respect to the probability distribution of ξ by E_ξ[X]. Moreover, we abbreviate the total expectation by E_n[X] = E_{ξ_1} · · · E_{ξ_n}[X]. We denote the space of Hölder continuous functions on [0, T] with Hölder coefficient γ ∈ (0, 1) and values in H by C^γ([0, T]; H). For notational convenience, we include the case γ = 1 and denote the space of Lipschitz continuous functions by C¹([0, T]; H).

Assumption 5. Let Assumptions 1-4 be fulfilled. Assume that for almost every ω ∈ Ω, there exists a real Banach space V_ξ(ω) with V ⊆ V_ξ(ω) ⊆ H. Moreover, the mapping ω → V_ξ(ω) is measurable in the sense that for every v ∈ H the set {ω ∈ Ω : v ∈ V_ξ(ω)} is an element of the complete generated σ-algebra F_ξ. Further, let the family of operators {A_ξ(ω)(t)}_{ω∈Ω, t∈[0,T]} be such that for almost every ω ∈ Ω, {A_ξ(ω)(t)}_{t∈[0,T]} fulfills Assumption 2 with the spaces V_ξ(ω), H and V*_ξ(ω) and corresponding constants κ_ξ(ω), η_ξ(ω), β_ξ(ω), µ_ξ(ω) and λ_ξ(ω). Moreover, the mapping ω → A_ξ(ω)(t)v is measurable for every v ∈ H and t ∈ [0, T]. Further, let the family {f_ξ(ω)}_{ω∈Ω} be given such that f_ξ(ω) ∈ L²(0, T; H) and such that the mapping ω → f_ξ(ω)(t) is measurable. Finally, assume that the expectation properties E_ξ[A_ξ(t)v] = A(t)v and E_ξ[f_ξ(t)] = f(t) are fulfilled.

Under the setting explained in the above assumptions, we consider the initial value problem

u'(t) + A(t)u(t) = f(t), t ∈ (0, T], u(0) = u_0. (2.1)

For a time grid 0 = t_0 < t_1 < · · · < t_N = T with step sizes h_n = t_n − t_{n−1} and a family of random variables {f_n}_{n∈{1,...,N}} such that f_n : Ω → H is F_{ξ_n}-measurable, we consider the scheme

U_n + h_n A_{ξ_n}(t_n)U_n = U_{n−1} + h_n f_n, n ∈ {1, . . . , N}, U_0 = u_0. (2.2)

Note that U_n : Ω → H is a random variable and therefore some statements involving it below only hold almost surely. Whenever there is no risk of misinterpretation, we omit writing almost surely for the sake of brevity.
When proving that the scheme is well-defined and establishing an a priori bound, it is sufficient to assume that {f_{ξ_n}}_{n∈{1,...,N}} are integrable with respect to the temporal parameter. In that case, we can choose, for example, f_n = (1/h_n) ∫_{t_{n−1}}^{t_n} f_{ξ_n}(t) dt. In our error bounds, we assume more regularity for the functions {f_{ξ_n}}_{n∈{1,...,N}} and demand continuity with respect to the temporal parameter. In this case, we may also use f_n = f_{ξ_n}(t_n). We will focus on this second choice for the error bounds in Section 5.
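One step of the scheme (2.2) amounts to an implicit (backward Euler) solve with the randomly selected, rescaled operator. The following sketch illustrates this for a toy linear 2×2 system standing in for the abstract operator; the splitting into two parts, the uniform single-index batches, and all numerical values are our own assumptions for the illustration.

```python
import random

# One randomized backward Euler step: solve (I + h*A_B) U_n = U_{n-1} + h*f_B.
random.seed(1)
s = 2
A_parts = [[[1.0, 0.0], [0.0, 0.0]],     # A_1 acts on the first component
           [[0.0, 0.0], [0.0, 1.0]]]     # A_2 acts on the second component
f_parts = [[1.0, 0.0], [0.0, 1.0]]

def step(U, h):
    l = random.randrange(s)              # uniform batch of size one
    A = [[s * a for a in row] for row in A_parts[l]]   # rescale so E[A_B] = A
    f = [s * c for c in f_parts[l]]
    # assemble M = I + h*A and right-hand side b = U + h*f
    M = [[(1.0 if i == j else 0.0) + h * A[i][j] for j in range(2)]
         for i in range(2)]
    b = [U[i] + h * f[i] for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]   # Cramer's rule

U, h = [0.0, 0.0], 0.01
for n in range(2000):
    U = step(U, h)
print(U)
```

In this toy problem the stationary state of u' + Au = f is u = A^{-1}f = (1, 1), and the randomized iteration approaches it even though each step only updates one randomly chosen part.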

Application: Domain decomposition
One main application that is allowed by our abstract framework is a domain decomposition scheme for a nonlinear fluid flow problem. Domain decomposition schemes are well known for deterministic operator splittings. However, to the best of our knowledge, they have not been studied in the context of a randomized operator splitting scheme.

Deterministic domain decomposition.
To exemplify our abstract equation (2.1), we consider a (nonlinear) parabolic differential equation. In the following, let D ⊂ R^d, d ∈ N, be a bounded domain with a Lipschitz boundary ∂D. For p ∈ [2, ∞), we consider the parabolic p-Laplacian with homogeneous Dirichlet boundary conditions,

∂_t u(t, x) − α(t) ∇ · (|∇u(t, x)|^{p−2} ∇u(t, x)) = f(t, x) in (0, T] × D, u = 0 on (0, T] × ∂D, u(0, ·) = u_0 in D, (3.1)

for α : [0, T] → R and u_0 : D → R. We distinguish between the function f : (0, T) × D → R and the abstract function f on (0, T) that it gives rise to through [f(t)](x) = f(t, x). We consider a domain decomposition scheme similar to [13] for p = 2 and to [6,7] for p ∈ [2, ∞). For the sake of completeness, we recapitulate the setting here, also with a different boundary condition. For s ∈ N, let {D_ℓ}_{ℓ=1}^s be a family of overlapping subsets of D. Let each subset have a Lipschitz boundary and let their union fulfill ⋃_{ℓ=1}^s D_ℓ = D. Further, let weight functions χ_ℓ : D → [0, ∞) be given such that the following criteria are fulfilled for ℓ ∈ {1, . . . , s}. With the help of the functions {χ_ℓ}_{ℓ∈{1,...,s}}, it is now possible to introduce suitable function spaces {V_ℓ}_{ℓ∈{1,...,s}}. We use the weighted Lebesgue space L^p(D_ℓ, χ_ℓ), consisting of all measurable functions v on D_ℓ for which ∫_{D_ℓ} χ_ℓ |v|^p dx is finite. In the following, let the pivot space (H, (·, ·)_H, ‖·‖_H) be the space L²(D) of square integrable functions on D with the usual norm and inner product. The spaces V and V_ℓ, ℓ ∈ {1, . . . , s}, are given as the corresponding Sobolev-type spaces with respect to the norms and semi-norms in (3.2). Note that a bootstrap argument involving the Sobolev embedding theorem shows that the norm given in (3.2) is equivalent to the standard norm in the space. We can now introduce the operators A_ℓ(t) : V_ℓ → V_ℓ*, ℓ ∈ {1, . . . , s}, corresponding to the p-Laplacian weighted by χ_ℓ. Similarly, we define the right-hand sides f_ℓ : [0, T] → H, ℓ ∈ {1, . . . , s}, where f_ℓ(t) = χ_ℓ f(t) in H for almost every t ∈ (0, T).

Lemma 3.1. Let the parameters of the equation (3.1) be given such that α ∈ C([0, T]; R), u_0 ∈ L²(D) and f ∈ L²((0, T) × D). Then the setting described above fulfills Assumptions 1-3.
In this paper, we instead fix the weight functions, but choose a random part of the operator in every time step. For the operator, we use A_ξ(ω)(t) = Σ_{ℓ∈ξ(ω)} τ_ℓ A_ℓ(t), and similarly f_ξ(ω)(t) = Σ_{ℓ∈ξ(ω)} τ_ℓ f_ℓ(t). Here τ_ℓ is the proper scaling factor which ensures that E_ξ[A_ξ(t)v] = A(t)v and E_ξ[f_ξ(t)] = f(t). We tacitly assume that τ_ℓ > 0 for every ℓ ∈ {1, . . . , s}, because otherwise we would be in a situation where at least one A_ℓ(t) is never chosen. Such a strategy would obviously not work. We set V_ξ(ω) = ⋂_{ℓ∈ξ(ω)} V_ℓ.
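A small one-dimensional sketch may help fix ideas: overlapping sub-domains with piecewise-linear weight functions χ_ℓ forming a partition of unity, selection probabilities p_ℓ, and scaling factors τ_ℓ = 1/p_ℓ, so that the expectation of the randomly weighted data recovers the full data at every grid point. The concrete weights, probabilities, and the choice τ_ℓ = 1/p_ℓ for single-sub-domain batches are our own illustrative assumptions.

```python
# 1D toy version of the randomized domain decomposition weights.
s = 2
grid = [i / 10 - 1.0 for i in range(21)]          # grid on D = [-1, 1]

def chi(l, x):
    # piecewise-linear weights overlapping on [-0.2, 0.2], chi_0 + chi_1 = 1
    if l == 0:
        if x <= -0.2:
            return 1.0
        if x >= 0.2:
            return 0.0
        return (0.2 - x) / 0.4
    return 1.0 - chi(0, x)

p = [0.25, 0.75]                  # non-uniform selection probabilities
tau = [1.0 / q for q in p]        # scaling factors tau_l = 1 / p_l

# unbiasedness: sum_l p_l * tau_l * chi_l(x) = sum_l chi_l(x) = 1 pointwise
unbiased = all(
    abs(sum(p[l] * tau[l] * chi(l, x) for l in range(s)) - 1.0) < 1e-12
    for x in grid
)
print(unbiased)
```

The check mirrors the expectation property above: since the weights sum to one, the rescaled random selection is unbiased pointwise regardless of how unevenly the probabilities are chosen, as long as every p_ℓ is positive.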
Proof. In the following proof, we drop the index n to keep the notation simpler. The embedding and norm properties are fulfilled as verified in the previous lemma. It remains to verify the measurability conditions. We need to verify that for every v ∈ H, the set {ω ∈ Ω : v ∈ V_ξ(ω)} is measurable. Moreover, we need to verify that the mapping ω → A_ξ(ω)(t)v is measurable for every v ∈ H. This can be seen from the decomposition A_ξ(t)v = S_{A(t)v} ∘ ξ, where S_{A(t)v} : 2^{{1,...,s}} → V* is given through S_{A(t)v}(B) = Σ_{ℓ∈B} τ_ℓ A_ℓ(t)v. As ξ^{−1}(B) ∈ F_ξ for all B ⊂ 2^{{1,...,s}} and S^{−1}_{A(t)v}(X) ⊂ 2^{{1,...,s}} for any open set X ⊂ V*, the mapping ω → A_ξ(ω)(t)v is measurable. Analogously, it can be proved that the mapping ω → f_ξ(ω)(t) is measurable. In Lemma 3.1, we already verified that an operator of the form A_ξ(ω) fulfills the conditions from Assumption 2. Thus, it only remains to prove the expectation property from Assumption 5. This is fulfilled by the choice of the scaling factors τ_ℓ.

Solution is well-defined
In the coming section, we show that our scheme (2.2) is well-defined. First of all, this includes that the scheme possesses a unique solution. We consider a purely deterministic equation (2.1). However, as the numerical scheme is randomized, the solution U_n of (2.2) is a mapping of the type U_n : Ω → H. Thus, we also need to make sure that it is a measurable function. These facts are verified in Lemma 4.1. Moreover, we provide an integrability result in the form of an a priori bound in Lemma 4.2.
Proof. For ω ∈ Ω, we find that the operator I + h_n A_{ξ_n(ω)}(t_n) is monotone, radially continuous and coercive. Thus, it is surjective, compare [33, Theorem 2.18]. Moreover, for two solutions U_1 and U_2 of the same equation, the monotonicity-type condition implies that ‖U_1 − U_2‖_H = 0, so that I + h_n A_{ξ_n(ω)}(t_n) is injective for κh_n < 1 and, in particular, bijective. It remains to verify that U_n : Ω → H is well-defined. We define the auxiliary function g, where e ∈ V* with ‖e‖_{V*} = 1. In the following, we want to apply Lemma A.3 to the function g to prove that U_n is measurable. Applying [33, Lemma 2.16], it follows that for fixed ω ∈ Ω, the function v → ⟨g(ω, v), w⟩_{V*×V} is continuous for all v, w ∈ V_{ξ_n(ω)}. It remains to verify that for fixed v ∈ H and w ∈ V, the function ω → ⟨g(ω, v), w⟩_{V*×V} is measurable. Let B be an open set in V*. It then follows that

As the function ω → ⟨g(ω, v), w⟩_{V*×V} can be decomposed accordingly, the set T_2 ⊂ Ω is measurable. The sets T_1 and T_3 are measurable by assumption. Thus, it follows that ω → g(ω, v) and therefore ω → ⟨g(ω, v), w⟩_{V*×V} is measurable.
As argued above, for every ω ∈ Ω, there exists a unique element U_n(ω) such that g(ω, U_n(ω)) = 0. Thus, we can now apply Lemma A.3 to prove that U_n : Ω → H is F_n-measurable.

Lemma 4.2. Let Assumptions 1-5 be fulfilled. Further, let the random variables f_n : Ω → H be given such that they are F_{ξ_n}-measurable and E_{ξ_n}‖f_n‖²_H < ∞ for every n ∈ {1, . . . , N}. Then for 2κh_n ≤ 2κh < 1, the solution {U_n}_{n∈{1,...,N}} of (2.2) fulfills the a priori bound.

Proof. We start by testing (2.2) with the solution U_i. For the first term of the resulting equality, we use the identity (a − b, a)_H = ½(‖a‖²_H − ‖b‖²_H + ‖a − b‖²_H). Due to the coercivity condition from Assumption 2 (v), we obtain a corresponding lower bound for the operator term.
For the right-hand side of (4.1), we observe a corresponding upper bound. Combining the previous statements, rearranging the terms and multiplying both sides of the inequality by the factor 2, we obtain a first estimate. Taking the expectation and using Assumption 5 shows an analogous estimate in expectation. This inequality is summed up from i = 1 to n ∈ {1, . . . , N}, where we only make the right-hand side larger by summing to the final value N. In the following, denote by i_max ∈ {1, . . . , N} an index at which the maximum over i ∈ {1, . . . , N} is attained. For the last term in (4.2), we then obtain a bound that follows directly from this definition. In conclusion, (4.2) therefore implies a summed estimate. Applying the discrete Grönwall inequality in Lemma A.1 yields a bound that holds for every n ∈ {1, . . . , N}; it is thus also fulfilled for i_max. We can now use that x² ≤ 2ax + b² implies x ≤ 2a + b for a, b, x ∈ [0, ∞). Inserting this bound in (4.3) and applying Young's inequality (Lemma A.2 with ε = 1/2), we then obtain the stated a priori bound, which finishes the proof.

Stability and convergence in expectation
With the previous sections in mind, we can now turn our attention to the main results of this paper. We provide error bounds for the scheme (2.2), measured in expectation. First, we give a stability result in Theorem 5.1. The stability bound can be proved in a similar manner to the a priori bound in Lemma 4.2. The aim of this bound is to show how two solutions of the same scheme with respect to different right-hand sides and initial values differ. This stability result can then be used to prove the desired error bounds in Theorem 5.2 by using well-chosen data that agrees with the exact solution at the grid points. Note that, in contrast to other works (e.g. [10,11]), we measure f(t) − A(t)u(t) in the H-norm. This can be interpreted as a stricter regularity assumption. The advantage is that certain error terms disappear in expectation, compare the second bound in Lemma A.4.
Theorem 5.1. Let Assumptions 1-5 be fulfilled. Further, let the random variables f_n : Ω → H be given such that they are F_{ξ_n}-measurable and E_{ξ_n}‖f_n‖²_H < ∞ for every n ∈ {1, . . . , N}. Let {U_n}_{n∈{1,...,N}} be the solution of (2.2) and let {V_n}_{n∈{1,...,N}} be the solution of the perturbed scheme (5.1) for v_0 ∈ H and g_n : Ω → H such that g_n is F_{ξ_n}-measurable and E_{ξ_n}‖g_n‖²_H < ∞ for every n ∈ {1, . . . , N}. Then for 2κh_n ≤ 2κh < 1, the stability bound follows.

Proof. We start by subtracting (5.1) from (2.2) and testing with U_i − V_i to get (5.2). For the first term of this equality, we use the identity (a − b, a)_H = ½(‖a‖²_H − ‖b‖²_H + ‖a − b‖²_H). Due to the monotonicity condition from Assumption 2 (iii), we obtain a corresponding lower bound.
It remains to find a bound for the right-hand side of (5.2). Applying the Cauchy–Schwarz inequality and the weighted Young inequality for products (Lemma A.2 with ε = 1), we obtain an estimate for this term. Combining the previous statements, rearranging the terms and multiplying both sides of the inequality by the factor 2, we obtain a first bound. By first taking the E_{ξ_i}-expectation of this inequality and then applying also the E_{i−1}-expectation, we find a corresponding estimate in expectation. Combining the previous two inequalities and summing up from i = 1 to n ∈ {1, . . . , N}, we obtain (5.3), where we only make the right-hand side larger by summing to the final value N. In the following, denote by i_max ∈ {1, . . . , N} an index at which the maximum over i ∈ {1, . . . , N} is attained. To keep the presentation compact, we introduce suitable abbreviations. We can now apply Grönwall's inequality (Lemma A.1) to (5.3); it follows that (5.4) holds. As this inequality holds for every n ∈ {1, . . . , N}, it is also fulfilled for i_max. We can again use that x² ≤ 2ax + b² implies x ≤ 2a + b. Inserting this bound in (5.4) and applying Young's inequality (Lemma A.2 with ε = 1), we then obtain the stated bound. It only remains to insert the abbreviated quantities to finish the proof.
Theorem 5.2. Let Assumptions 1-5 be fulfilled. Further, let f_{ξ_n} ∈ C([0, T]; H) almost surely and let f_n = f_{ξ_n}(t_n) ∈ L²(Ω; H) for all n ∈ {1, . . . , N}. Let {U_n}_{n∈{1,...,N}} be the solution of (2.2) and let u be the solution of (2.1) that fulfills the regularity condition u ∈ C^γ([0, T]; H).

Proof. We use {V_n}_{n∈{1,...,N}} given by the perturbed scheme (5.1) with a particular choice of data g_n. With this particular choice of g_n, we can now show that V_n = u(t_n) for every n ∈ {1, . . . , N}. Given the initial value u_0, the solution V_1 agrees with u(t_1), and inductively it follows that V_n = u(t_n). Together with the stability estimate from Theorem 5.1, we find a bound for the error e_n = U_n − u(t_n). Altogether, we obtain the stated error bound.

Remark 5.3. The main results can all be modified to a slightly different setting, where the right-hand side f(t) takes values in V* and where the family {ξ_n}_{n∈N} of random variables does not have to be mutually independent. In return, this setting requires slightly stronger assumptions on the operator A(t). First, we assume additionally that there exists a constant c_V ∈ (0, ∞) such that a corresponding embedding inequality holds. To generalize the a priori bound from Lemma 4.2 and the stability results from Theorem 5.1, we need to assume that µ_A from Assumption 2 (v) and η_A from Assumption 2 (iii) are strictly positive, respectively. Moreover, if there exists γ ∈ (0, 1] such that a suitable Hölder condition on the data is fulfilled and u ∈ C^γ([0, T]; H), we obtain similar error bounds. We omit the proofs, which are very similar to the ones presented above.

Numerical experiments
To illustrate the theoretical convergence results for the randomized scheme in practice, we apply it to the parabolic differential equation (3.1), as discussed in Section 3. This initial-boundary value problem fits our setting, as already explained there. We also consider what happens when we replace the nonlinear diffusion term with linear diffusion and use a smoother exact solution.
In both cases, we consider the problem on the spatial domain D = [−1, 1] × [−1, 1], which we split into rectangular sub-domains D_ℓ, ℓ ∈ {1, . . . , s}, with M_x rectangles along the x-axis and M_y rectangles along the y-axis. We choose the D_ℓ such that they have an overlap of 0.2 on all internal sides. This means that with M_x = M_y = 3, we obtain s = 9 overlapping sub-domains. We have to choose a strategy for which sub-problems to select in each time step, i.e. specify the probabilities P(ξ = B) for B ∈ 2^{{1,...,s}}. We consider two strategies. In the first, we simply use P(ξ = {ℓ}) = 1/s. Thus every sub-domain is equally probable to be chosen. As a minor variation, we instead select a set of k sub-domains by drawing with replacement according to the uniform probabilities.
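The first strategy can be sketched in a few lines: draw k indices with replacement from {1, . . . , s} and use the set of distinct draws as the batch. The helper name and parameter values below are ours, chosen only to illustrate that every sub-domain is selected with the same frequency.

```python
import random

def draw_batch(s, k, rng):
    """Draw k sub-domain indices uniformly with replacement; return the set."""
    return set(rng.randrange(s) for _ in range(k))

rng = random.Random(42)
s, k = 9, 2                      # e.g. M_x = M_y = 3 gives s = 9 sub-domains
counts = [0] * s
for _ in range(9000):
    for l in draw_batch(s, k, rng):
        counts[l] += 1

# uniform sampling: every sub-domain is selected with roughly equal frequency
print(min(counts), max(counts))
```

Since the draws are with replacement, the batch can occasionally contain fewer than k distinct sub-domains, which is consistent with the variation described above.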
In the second strategy, we make use of a predictor. In addition to the stochastic approximation, we compute a deterministic approximation Z_n using the backward Euler method, but on a coarser spatial mesh. The idea is that while this approximation is less accurate, it should be significantly cheaper to compute and still resemble the true solution. In the n-th time step, we compute an indicator function Ψ_n from Z_n. This function is either 0 or 1 and indicates where in the domain something is actually happening. For each sub-domain, we then check whether it is "sufficiently active" or not by evaluating ‖Ψ_n χ_ℓ‖ ≥ ρ‖Ψ_n‖ for a parameter ρ ∈ (0, 1). We select the set of those sub-domains which pass the test with probability 1 − ρ and the set of all the other sub-domains with probability ρ.
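A simplified sketch of this predictor-based selection, with our own discrete stand-ins for Ψ_n and the weights (each sub-domain is represented by the set of grid nodes it covers, and the norms are replaced by plain sums), may clarify the mechanism:

```python
import random

def select_active(psi, subdomains, rho, rng):
    """psi: 0/1 activity indicator per grid node; subdomains: node index sets."""
    total = sum(psi)
    active = [l for l, nodes in enumerate(subdomains)
              if sum(psi[i] for i in nodes) >= rho * total]
    inactive = [l for l in range(len(subdomains)) if l not in active]
    # sub-domains passing the test are selected with probability 1 - rho
    return active if rng.random() < 1.0 - rho else inactive

rng = random.Random(0)
psi = [0] * 10 + [1] * 5 + [0] * 5          # activity concentrated mid-domain
subs = [list(range(0, 8)), list(range(6, 14)), list(range(12, 20))]
picks = [tuple(select_active(psi, subs, 0.1, rng)) for _ in range(100)]

# sub-domains 1 and 2 overlap the active region, so the batch (1, 2)
# should be selected in roughly 90 of the 100 draws
freq_active = picks.count((1, 2)) / 100
print(freq_active)
```

The occasional selection of the inactive set (with probability ρ) keeps every τ_ℓ positive, so the unbiasedness requirement from Section 3 is not violated.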
6.1. A nonlinear example. In our first experiment, we use the problem parameters T = 1, p = 4 and α(t) ≡ 1. Further, we choose the source term f such that the exact solution is given by u(t, x, y) = ũ(x − r cos(2πt), y − r sin(2πt)) with r = 1/2 and ũ(x, y) = [0.03 − 10^{3/8}/4 · (x² + y²)]_+, where [·]_+ = max{·, 0}. This describes a localized pulse that starts centered at (0.5, 0) and then rotates around the origin at the constant distance r. The shape of the pulse is inspired by the closed-form Barenblatt solution to ∂_t u = ∇ · (|∇u(t, x)|^{p−2}∇u), see e.g. [21]. At t = 0, this solution is a Dirac delta, which then expands into a cone-shaped peak for t > 0. Our pulse is this solution frozen at the time t = 0.001. We note that due to the sharp interface where the pulse meets the x-y-plane and the sharp peak, u is of low regularity. We discretize the problem in space using central finite differences, such that the approximation of the p-Laplacian is second-order accurate. We use 41 computational nodes in each spatial dimension, for a total of 1681 degrees of freedom. For the temporal discretization, we use the scheme (2.2), along with one of the two strategies outlined above. For the first strategy, we try k = 1 and k = 2. For the second, we evaluate the different parameters ρ = 0.01, 0.05, 0.1, 0.2. We compute approximations for the different (constant) time steps h_n = 2^{−5}, 2^{−6}, . . . , 2^{−13} and estimate their corresponding errors at the final time by running the method 50 times with independent random draws and averaging. That is, we approximate the expected error E‖U_N − U_ref‖² by a sample average, where U_ref is the exact solution u(t_N, ·, ·) evaluated at the spatial grid.
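The observed convergence orders reported below are read off from errors computed at successively halved time steps in the usual way, as log2 of the ratio of consecutive errors. A small helper (our own, not from the paper; the synthetic error values merely mimic the predicted O(h^{1/2}) behaviour) illustrates the computation:

```python
import math

def observed_orders(errors):
    """Observed order between consecutive halved step sizes: log2(e(h)/e(h/2))."""
    return [math.log2(e_coarse / e_fine)
            for e_coarse, e_fine in zip(errors, errors[1:])]

# synthetic errors behaving like C * h^(1/2) for h = 2^-5, ..., 2^-9
hs = [2.0 ** (-k) for k in range(5, 10)]
errors = [0.3 * math.sqrt(h) for h in hs]
orders = observed_orders(errors)
print(orders)
```

Applied to the measured errors, orders close to 0.5 confirm the rate of Theorem 5.2, while orders close to 1 indicate the improved behaviour discussed for the linear example.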
Figure 1 shows the resulting relative errors plotted against the time steps, with the first strategy in the left plot and the second strategy in the right. We observe that both strategies result in errors that decrease as O(h^{1/2}), in line with Theorem 5.2. We note, however, that the errors for the first strategy are noticeably larger than those of the second strategy. We have also used fewer sub-domains for the first strategy, M_x = 3 and M_y = 1, rather than the M_x = 3 and M_y = 3 which we used for the second strategy. This is because the nonlinear p-Laplacian provides less smoothing than its linear counterpart, the Laplacian. The error incurred by choosing the "wrong" sub-domain therefore decays slowly, which makes this strategy work less well with too many sub-domains. The second strategy works better with more sub-domains, since it essentially adaptively groups them into only two larger sub-domains: the active set and the inactive set. Increasing the number of sub-domains increases the fidelity such that the choice of whether each sub-domain is active or not becomes easier, albeit at a higher computational cost. If the spatial discretization uses finite elements, the limit case would be when every element is its own sub-domain. This is what is considered in [36] for a deterministic scheme, where it is, indeed, observed that the overhead costs can be prohibitive even when using very efficient data structures.

6.2. A linear example. As a second experiment, we consider a linear version of the previous problem. We use the same parameters as in the previous section, except that we set p = 2 and α(t) ≡ 0.1, and that the rotating pulse is now Gaussian rather than a sharp peak. More precisely,
the exact solution is given by u(t, x, y) = e^{−100(x − r cos(2πt))² − 100(y − r sin(2πt))²}.
The resulting errors are shown in Figure 2. Again, we note that the first, uniform, strategy converges as O(h^{1/2}), in line with Theorem 5.2. The second strategy with ρ = 0.01 performs significantly better and converges as O(h) until the spatial error starts to dominate. This is essentially the same behaviour as if we would apply backward Euler to the full problem, but the method only updates the approximation on the most relevant sub-domains and is therefore cheaper to evaluate. This improved convergence order is possible due to the extra smoothness present in this linear problem: in the error bound of Theorem 5.2, the first term is of higher order when u is Lipschitz continuous in time.

Lemma A.3. Let g : Ω × H → V* be given such that ω → ⟨g(ω, v), w⟩_{V*×V} is measurable for fixed v ∈ H and w ∈ V, and such that v → ⟨g(ω, v), w⟩_{V*×V} is continuous for fixed ω ∈ Ω and v, w ∈ V_ξ(ω). Assume that for every ω ∈ Ω, the function g(ω, ·) has a unique root, which lies in V_ξ(ω). We denote this root by r(ω) ∈ V_ξ(ω), i.e. g(ω, r(ω)) = 0. Then the function r : Ω → H is measurable.
A similar proof can be found in [5, Lemma 2.1.4] and [8, Lemma 4.3]. The main difference in this version is that the function g maps from Ω × H instead of Ω × V, and therefore some small technical alterations have to be considered. Since ω → ⟨g(ω, v), w⟩_{V*×V} is measurable for v ∈ H and w ∈ Q, the set {ω ∈ Ω : ⟨g(ω, u), v⟩_{V*×V} = 0} = ⟨g(·, u), v⟩_{V*×V}^{−1}(0) is an element of F_ξ for v ∈ Q and u ∈ H. If the set B only contains a countable number of elements, it follows directly that r^{−1}(B) ∈ F_ξ.

Figure 1 .
Figure 1. The relative errors (E‖U_N − U_ref‖²)^{1/2}/‖U_ref‖ for the nonlinear setting described in Section 6.1. The left plot uses the first randomized strategy and the right plot uses the second strategy. We observe that the errors decay as O(h^{1/2}), in line with Theorem 5.2, irrespective of the choice of ρ or k. A smaller ρ or larger k decreases the error, but of course also incurs a higher computational cost.
f(t) − A(t)u(t) dt, E_{ξ_i}[f_{ξ_i}(t_i) − A_{ξ_i}(t_i)u(t_i) − (f(t_i) − A(t_i)u(t_i))] and E_i[f_{ξ_i}(t_i) − A_{ξ_i}(t_i)u(t_i) − (f(t_i) − A(t_i)u(t_i))]. To further bound the last row, we apply Hölder's inequality and the regularity condition u ∈ C^γ([0, T]; H). It remains to prove the second estimate of the lemma. Recall that E_{ξ_i}[f_{ξ_i}(t_i)] = f(t_i) and E_{ξ_i}[A_{ξ_i}(t_i)u(t_i)] = A(t_i)u(t_i) are fulfilled by Assumption 5. Using these equalities, the terms of the form E_{ξ_i}[f_{ξ_i}(t_i) − A_{ξ_i}(t_i)u(t_i) − (f(t_i) − A(t_i)u(t_i))] vanish, and it remains to bound Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} ‖u'(t_i) − u'(t)‖²_H dt ≤ h^{2γ}|u'|²_{C^γ([0,T];H)} T.