A parametric acceleration of multilevel Monte Carlo convergence for nonlinear variably saturated flow

We present a multilevel Monte Carlo (MLMC) method for the uncertainty quantification of variably saturated porous media flow that are modeled using the Richards' equation. We propose a stochastic extension for the empirical models that are typically employed to close the Richards' equations. This is achieved by treating the soil parameters in these models as spatially correlated random fields with appropriately defined marginal distributions. As some of these parameters can only take values in a specific range, non-Gaussian models are utilized. The randomness in these parameters may result in path-wise highly nonlinear systems, so that a robust solver with respect to the random input is required. For this purpose, a solution method based on a combination of the modified Picard iteration and a cell-centered multigrid method for heterogeneous diffusion coefficients is utilized. Moreover, we propose a non-standard MLMC estimator to solve the resulting high-dimensional stochastic Richards' equation. The improved efficiency of this multilevel estimator is achieved by parametric continuation that allows us to incorporate simpler nonlinear problems on coarser levels for variance reduction while the target strongly nonlinear problem is solved only on the finest level. Several numerical experiments are presented showing computational savings obtained by the new estimator compared to the original MC estimator.


Introduction
Mass transport through a variably saturated porous medium can be accurately predicted using the Richards' equation [1]. This modelling approach is of critical importance for several physics and engineering problems, for instance, when studying aquifer recharge via rainfall infiltration, or for understanding the environmental impact of mining operations. When reliable measurements of the hydraulic properties are available, numerical solutions originating from the Richards' equation have been reasonably successful for transport prediction in a broad range of soil types. Different formulations for the Richards' equation are available in the literature, along with well established mathematical theory, such as the pressure head, the water content or a mixed formulation, see e.g. [2,3,4,5,6,7]. The aforementioned formulations contain nonlinearities due to a parametric dependence of the pressure head on the saturation and the relative hydraulic conductivity. Depending on the soil parameters, these nonlinearities range from mild to strong. The extreme sensitivity of the soil parameters on the output necessitates accurate measurements of the hydraulic properties. For many realistic problems, complete information of these quantities is however not available. In such scenarios, these parameters may be modeled in a probabilistic framework and the solution output may be expressed by means of a prediction interval (with mean and variance), rather than as a single value. Such approaches are nowadays common in the case of saturated groundwater flow, where uncertainties are included when modeling the hydraulic conductivity as a spatially correlated lognormal random field [8,9,10]. The purpose of the present work is to develop and analyze a stochastic extension of the Richards' equation, along with an efficient numerical method to solve the resulting nonlinear partial differential equation with random coefficients.
Previous work on the uncertainty quantification (UQ) of unsaturated flows was often based on an uncertain hydraulic conductivity [11,12,13,14]. In addition to that, in the present work, we introduce stochasticity in the so-called van Genuchten and Mualem model [15,16], which is typically utilized to close the Richards' equation. This model provides a closed-form analytic expression for the unsaturated hydraulic conductivity based on a sigmoid type function for the soil-water retention curve. This curve is defined by four independent parameters that are estimated by curve-fitting, based on field measurements.
Typically, these parameters are fixed throughout the domain during numerical simulations, assuming the soil to be homogeneous. Realistic models should however also incorporate the intrinsic heterogeneity in the soil. Therefore, we model these soil parameters as random variables with a certain, specified probability distribution and spatial correlations. To assure the well-posedness of the Richards' equation, these parameters should be within a certain range. Thus, the probability distributions for these parameters are chosen such that the random samples will be in the domain of validity for these parameters. A practical choice is to employ non-Gaussian random fields with marginal distributions from expert knowledge or from field measurements.
With the stochastic Richards' equation formulated, an appropriate UQ technique is required to compute the statistical moments of the desired quantities of interest (QoIs). This choice primarily depends on the number of uncertainty dimensions. Other practical factors, such as ease of implementation and availability of an iterative solver which is robust with respect to the random input, also play a role in the selection of a suitable UQ technique. The proposed stochastic extension of the Richards' equation results in a very highdimensional problem, and the use of deterministic sampling approaches such as polynomial chaos expansion, stochastic collocation or stochastic Galerkin is therefore limited. For these UQ methods, the cost grows exponentially with the number of random inputs. Furthermore, a deterministic sampling approach may not adequately represent those regions in the stochastic space where strong nonlinearity may be encountered.
In previous works of Zhang [13,17], the moments method was applied for the uncertainty quantification a large class of problems, see, e.g. [18,19,20,21]. The efficiency of the MLMC estimation comes from solving the problem of interest on a coarse grid and subsequently adding corrections based on finer mesh resolutions. As these correction terms have smaller variances, they can be computed accurately using only a few samples. The estimates at different levels are then combined using a telescopic sum. The standard practice is to solve the PDE with random coefficients on a hierarchy of grids.
The original, grid-based MLMC estimator may be utilized to solve the stochastic Richards' equation, however, this approach may not be the most efficient, especially not when strongly nonlinear problems need to be solved. Such problems require a very fine spatio-temporal mesh thereby restricting the use of coarse grids to improve the efficiency of the MLMC estimator. In this article, we utilize a non-standard MLMC estimator based on the parametric continuation technique. Continuation methods for solving nonlinear PDEs are very popular in engineering applications [22,23,24,25,26,27]. Within continuation methods, corresponds to a linear (or mildly nonlinear) problem and p(Θ * ) to the target strongly nonlinear. The key idea is to march from p(Θ 0 ) to p(Θ * ) in small steps of size δΘ, where at each step we use the solution from the previous step as an initial guess. Usually, Θ is some physical parameter, for e.g. the Reynolds number, the Mach number, etc. In the current work, we use parametric continuation to obtain variance reduction within the multilevel Monte Carlo framework. This is achieved by solving simpler nonlinear problems on coarser levels and the target strongly nonlinear problem is only solved on the finest level.
This new estimator allows us to incorporate comparatively coarser spatio-temporal grids in the MLMC hierarchy and, as such, the computational cost of each estimator in the telescopic sum is greatly reduced.
We furthermore propose a solution method for Richards' equation based on a combination of the modified Picard method [2] and a cell-centered multigrid, as proposed in [28]. We benchmark the performance of this combined solver in a probabilistic framework. A number of tests for a wide range of soil parameters and for hydraulic conductivities with different heterogeneity levels is performed.
The rest of the article is organized as follows. In Section 2, we briefly discuss the deterministic Richards' equation along with the van Genuchten-Mualem parameterization. Section 3 describes the stochastic Richards' equation as well as the modeling of various uncertain soil parameters. The description of the modified Picard method in combination with the cell-centered multigrid method is provided in Section 4.
Also, in this section we present some numerical experiments to assess the performance of the combined solver for an infiltration problem. The non-standard MLMC estimator is explained in Section 5 and its performance is analysed in Section 6. Finally, some conclusions are drawn in Section 7.

Deterministic Richards' equation
We respectively, subject to boundary and initial conditions: where φ[L 3 /L 3 ] is the porosity, S w [L 3 /L 3 ] is the water-phase saturation; q is the Darcy flux, which depends on the pressure head, p[L], and the depth z[L] in the vertical direction; K s [L/T ] represents the saturated hydraulic conductivity field at saturation; K rw is the relative conductivity of the water phase with respect to air and f is the source/sink term. The initial pressure head value is given by p 0 . The quantities g D and g N denote, respectively, the Dirichlet and Neumann boundary conditions that are imposed at the boundaries Γ D and Γ N , respectively, with n the unit normal vector to Γ N .
It is obtained by substituting the moisture content, i.e, θ = φS w (p). By using the above PDE can be reformulated into the pressure head formulation: where C(p) = ∂θ ∂p is the specific moisture capacity. It is well-known that numerical solutions originating from the pressure head formulation may give rise to a significant mass balance error, resulting in an inaccurate prediction of the infiltration depth.
Numerical methods based on the mixed form (using finite differences or mass-lumped finite elements) are popular as they result in mass conservation schemes [2]. Therefore, we will work with the mixed form (2.6) of the Richards' equation.

Van Genuchten-Mualem model
To complete the PDE formulations, (2.6) or (2.7), closure models for approximating K rw and θ are required. A number of models have been presented in the literature and the most popular ones are by Brooks-Corey [29] and van Genuchten-Mualem [15,16]. These two models employ nonlinear constitutive relations for K rw and p, and for θ and p, respectively. We consider the parameterization introduced by van Genuchten and Mualem here. For the saturation, van Genuchten [15] proposed the following analytic formula: where θ s and θ r are the saturated and residual water contents, respectively, and α[L −1 ], n and m = 1 − n −1 are obtained by fitting data characterizing the statistics of the soil. Specifically, the parameter α provides a measure of the average pore-size in the soil matrix and n is related to the pore-size distribution of the soil [30].
We may derive the specific moisture content, C(p), analytically from (2.8), as In previous work, Mualem [16] derived a closed-form expression for K rw , which is given by: (2.10) Using (2.8), the above integral equation reduces to the following analytic expression: (2.11) The complexity of the numerical solution of the Richards' equation depends on the values of the parameters n and α. For n ∈ (1, 2) and p → 0, the relative hydraulic conductivity K rw (p) is not Lipschitz continuous and the derivative K rw (p) becomes infinite as p approaches zero [30,31]. Moreover, for small values of n, a sharp K rw vs. p profile is encountered. Similarly, for large values of the parameter α, the pressure head exhibits a transition behaviour with a steep gradient from the saturated to the unsaturated region.
In general, for a small n or for large α strong nonlinearities are encountered, thus implying convergence issues for nonlinear iterative techniques such as the Newton or Picard methods.

Stochastic Richards' model
Here, we describe a stochastic extension of the van Genuchten model. We assume that the unknown soil parameters belong to the probability space (Ω, F, P), where Ω is the sample space with σ-field F ⊂ 2 Ω as a set of events and the probability measure P : The stochastic extension is based on modeling the soil parameters as spatially correlated random fields in order to incorporate spatial heterogeneity. For the saturated hydraulic conductivity, K s , it is standard practice to model it as a lognormal random field, as follows, s (x) is the baseline hydraulic conductivity and Z(x, ω) is a zero mean Gaussian random field with a specified covariance kernel. So, In the present work, we consider an anisotropic Matérn covariance function, C Φ , defined as Here, we denote the gamma function by Γ and by K νc the modified Bessel function of the second kind. The Matérn function is characterized by the parameter set Φ = {ν c , λ cx , λ cz , σ 2 c }. Parameter ν c ≥ 0 defines the differentiability of Z, σ 2 c > 0 is the marginal variance and λ cx and λ cz are correlation lengths along x-and z-coordinates, respectively. When ν c = 1/2, the Matérn function corresponds to an exponential covariance function and for ν c → ∞ to a squared exponential covariance model. Simulating a Gaussian random field can be based on the Karhunen-Loéve (KL) decomposition [32] of Z(x, ω), i.e., Here, λ j and Ψ j are eigenvalues and eigenfunctions of the covariance kernel C Φ (x 1 , x 2 ), obtained from the solution of the Fredholm integral, The sum (3.5) represents an infinite-dimensional uncertain field with a decaying contribution of the eigenmodes. The rate of decay typically depends on the smoothness and correlation length of the covariance function. The sum is truncated at a finite number of terms, M KL , which is usually decided by balancing the KL-truncation error with other sources of error, like the discretization and sampling errors. For Gaussian processes with small correlation lengths and large variances, typically a large number of terms is needed to include the critical eigenmodes [32]. The evaluation of the eigenmodes in the KL-expansion is expensive as it requires solving the integral equation (3.6) for each mode. In the case of stationary covariance models, fast sampling of random fields can be achieved via spectral generators that employ the FFT (Fast Fourier Transform) [33,34] for the factorization of the covariance matrix. Another advantage of using these spectral methods is that they are able to simulate random fields on the sampling mesh without any bias (for example, in the case of the KL-expansion). In this article, we use the Fast Fourier Transform moving average (FFT-MA) algorithm from [35] to sample Gaussian random fields, see Appendix A for details.

Sampling of non-Gaussian random fields
For sampling the van Genuchten parameters, α(x, ω), n(x, ω), θ s (x, ω), θ r (x, ω) in Section 2.1, we employ random fields with non-Gaussian marginal distributions. This choice of distributions is practical as these parameters can only take values in a certain range, see e.g [31]. We introduce stochasticity in the parameters via an additive noise, where α (bl) (x) is the deterministic baseline value and ε α (x, ω) is a random field with a non-Gaussian marginal distribution and covariance C Φ . Notations are analogously for the other three van Genuchten parameters. Next, we describe a technique proposed in [36] for the point-wise transformation of a standard Gaussian random field to a non-Gaussian random field.
Non-Gaussian random fields are difficult to simulate as they are not uniquely determined by their mean and variance. There are however different techniques available for simulating non-Gaussian fields, see e.g. [36,37]. In this work, we will follow a basic approach based on a generalized Polynomial Chaos (gPC) expansion [36], which approximates the non-Gaussian field in terms of a weighted combination of Hermite orthogonal polynomials of the standard Gaussian field, where Y (x, ω) is the non-Gaussian random field (with a marginal distribution, e.g. the uniform distribution, gamma distribution, truncated normal, etc). H j (Z) is the Hermite polynomial in Z of order j with weight w j and N P C is the order of the expansion. Hermite polynomials can be expressed as: As Hermite polynomials are orthogonal with respect to the Gaussian measure, the weights can be evaluated . (3.10) Here, the denominator is basically an expectation of a polynomial of the Gaussian random variable, which has an analytic expression. As the dependence between Y and Z is unknown, the expectation in the numerator is not well-defined. Since the cumulative distribution for Y , defined as F Y (y) = Prob(Y ≤ y), is however known, one can utilize the relation Y = F −1 Y (F Z (Z)) to reformulate (3.10) as is the cumulative distribution for standard Gaussian random variable Z. The integral (3.11) can be numerically computed using any conventional integration technique, for example, by using Monte Carlo quadrature. The weights only need to be computed once, so that the cost of sampling a non-Gaussian random field with a stationary covariance function is of the same order as that of a Gaussian random field.
We will experiment here with both isotropic and anisotropic Matérn covariance models. In Table 1, the two Matérn parameters sets are listed, Φ 1 corresponding to an isotropic model and Φ 2 to an anisotropic model. In Figure 1, we present some samples of the random fields with a Gaussian and a uniform marginal distribution, for the two Matérn parameters. We use N P C = 6 in Equation (3.8) for generating the random fields with uniform marginal distribution. Due to a small correlation length and low spatial regularity, the numerical solutions of the PDE with random coefficients based on Φ 2 are comparatively more expensive to compute than those obtained with Φ 1 . A comprehensive study on the computational cost of solving elliptic PDEs with different Matérn parameters can be found in [28]. We will study the effect of covariance functions on the performance of the solver for the Richards' equation.

Modified Picard iteration combined with the cell-centered multigrid method
Algorithms based on the modified Picard iteration from Celia et al. [2] are often employed as efficient iterative solution methods for the Richards' equation. These methods are relatively easy to implement, as they do not require the computation of Jacobians and they also have low storage requirements. Within each modified Picard iteration, a diffusion equation with variable coefficients needs to be solved. For this, we propose to utilize the cell-centered multigrid (CCMG) for heterogeneous diffusion coefficients, as proposed in [28,38,39]. The CCMG algorithm is efficient as it is constructed with a simple set of transfer operators and it has been demonstrated to perform well for a large class of highly heterogeneous and also jumping diffusion coefficients [28].

Modified Picard iteration
We briefly recall the fully-implicit Picard iteration for the mixed formulation of the Richards' equation from [2]. With ∆t the time-step and for any integer J > 1, we define a uniform temporal grid by {t j = j∆t, j = 0, . . . , J}. The iteration number within a time-step is denoted by an integer k > 0. For simplicity, we use a simplified notation for θ j,k = θ(p j,k ) and K j,k = K s K rw (p j,k ). The backward Euler approximation of (2.6) is then written as The key idea of the modified Picard iteration is the use of a Taylor expansion for θ j+1,k+1 with respect to p, i.e.
where the derivative ∂θ(p) ∂p = C(p) is analytically computed by using (2.9). By neglecting the higher-order terms in (4.2) and substitution in (4.1), we get with δp j+1,k = p j+1,k+1 − p j+1,k . The above equation can be expressed in the form: The next pressure head iterate is obtained by the update p j+1,k+1 = p j+1,k + δp j+1,k . Notice that the left-hand side of the above equation is the residual associated with the Picard iteration, which should be equal to zero for a converged solution. Therefore, one may use ||δp j+1,k || ∞ < ε P I as a stopping criterion with ε P I > 0 as the tolerance for Picard iteration. The pressure head at time t j+1 is then given by p j+1 = p j+1,k+1 , with k the total number of Picard iterations to converge to ε P I . The iterative scheme (4.4) is a general mixed-formulation Picard iteration, which results in perfect mass balance.

Cell-centered multigrid
Focussing on the k-th Picard iteration (4.4) at time t j+1 , the following elliptic PDE with variable coefficients is obtained, using simplified notation, with the known quantities and the unknown δ p = δp j+1,k . To discretize the above problem, we use a cell-centered finite volume scheme for which the hydraulic conductivity at the cell-face is based on the harmonic averaging of the hydraulic conductivities from the adjacent cells, derived by the continuity of fluxes [38,39].
For the discretization of (4.5), a uniform grid D h on a unit square domain with the same mesh width is considered. For each interior cell (edges do not lie on a boundary) with center (x i1 , z i2 ), denoted by , we obtain a five-point scheme, where, for instance, K i1,i2 is the diffusion coefficient associated with cell D i1,i2 h and the source term f h i1,i2 is an approximation of f in that cell. This scheme is modified appropriately for cells close to the boundary.
Next, we describe the multigrid method for solving the linear system arising from the above discretization. The multigrid hierarchy is based on uniform grid coarsening, i.e. the cell-width is doubled in each coarsening step in each direction. As the smoothing method, we use the lexicographic Gauss-Seidel iteration, and as the transfer operators between the fine and coarse grids a simple piece-wise constant prolongation operator, P h 2h , is applied and its scaled adjoint is used as the restriction operator R 2h h on the cell-centered grid. In classical stencil notation, these are written as, respectively, where denotes the position of the cell center. The coarse grid operator is obtained via a direct discretization of the PDE operator on the coarse grid. For this discretization on a coarser grid, we need to appropriately define the diffusion coefficients on the coarse cell edges. The technique to define the suitable diffusion coefficient on a coarse cell edge is graphically described in Figure 2 and its caption.
In [28], the W (2, 2)-cycle was found to be a very robust and efficient multigrid cycling strategy, and, therefore, we also employ this cycle in our experiments. The number of multigrid iterations is based on the where L h denotes the linear operator after the discretization of Equation (4.5) and ε M G > 0. We consider the modified Picard method in this article as it is widely adopted, although many modifications have been proposed to improve its robustness. For instance, the authors in [40] studied a spatiotemporal adaptive solution method to improve the numerical stability of the modified Picard iteration.
Another interesting improvement was proposed in [41], where an Anderson acceleration was applied to improve the robustness and computational cost for the standard Picard iteration scheme. These improvements can easily be extended to the modified Picard-CCMG solver studied here. Also, there are a number of effective solution approaches based on Newton's method, see for e.g. [42,43,44]. These methods exhibit a quadratic convergence rate but are very sensitive to initial solution approximations.

Performance of the modified Picard-CCMG solver
We study the performance of the modified Picard-CCMG solver for a range of values of the parameters α, n and the effect of the heterogeneity of the hydraulic conductivity on the performance of the solver. For this we consider an infiltration problem [44,45] on a two-dimensional computational domain D = (0, 1) 2 .
The right-hand side is assumed to be zero, and we consider the a final time T f inal = 0.1 [h]. In Table 2, we provide a list of 20 values for α and n, used in the experiments. In total, we test 400 pairings of α and n.
Parameters θ s = 0.50 and θ r = 0.05 are fixed, as they do not pose any problems for the convergence rate of the solver. The samples of hydraulic conductivity are generated according to (   Based on numerical experiments, the performance of the modified Picard-CCMG solver for the stochastic Richards' equation can be summarized as follows: • In general, the cost increases by decreasing n and increasing α. The cost of the solver rises steeply for n < 1.5 and α > 3.0, and the cost increment with respect to the decrease in the value of n is more pronounced compared to the increase in α.
• While a spatio-temporal mesh refinement improves the robustness with respect to α and n, the improvement is less pronounced for n and may require a very fine mesh as n → 1.
• For a given spatio-temporal mesh, the modified Picard-CCMG solver is less robust and more expensive for anisotropic hydraulic conductivity compared to the isotropic case. A similar (α, n)-robustness can be achieved for anisotropic cases by using a sufficiently refined mesh. The standard deviation contours for the cost show a similar behavior as the average cost contour and we observe a large standard deviation for the cost when n < 1.5 and α > 3.0. In Figure 5, we present the number of samples (out of 64 samples), for which the solver did not converge for Φ 1 and Φ 2 . For almost all samples convergence failed with α close to 4.0 and n close to 1.1.
A few remarks are in order. We point out that the (α, n)-cost map may vary depending on the type of boundary and initial conditions as well as on T f inal . For instance, in the above experiments, an initially wet profile for the porous media was considered. We expect the performance of the modified Picard-CCMG solver to vary for problems in which infiltration takes place into an initially dry media and the convergence rates may depend on the values of θ r and θ s (see e.g. [46]). Furthermore, the robustness of the solver will also depend on the properties of the hydraulic conductivity field such as on the degree of heterogeneity and anisotropy. These topics will be actively explored in the future work.

Multilevel Monte Carlo with parametric continuation
We have observed in the preceding section that the total number of multigrid iterations increases rapidly with a decrease in the value of parameter n and an increase in α. We also noticed that the solver is less robust on a coarse spatio-temporal mesh. Therefore, when using the original MLMC estimator for a  "difficult" (α, n) pair, a relatively fine spatio-temporal mesh will be required (and employed), even on the coarsest level of the MLMC hierarchy, resulting in an expensive estimator. To deal with this drawback, we propose an MLMC estimator based on the parametric continuation technique. In this approach, we solve the original problem only on the finest level of the MLMC hierarchy and simplify the parameter settings dictating the nonlinearities as we work on coarser levels. This allows us to include a comparatively coarser spatio-temporal mesh compared to the original MLMC estimator as simpler problems are solved on coarser levels.
This idea is motivated by continuation based multigrid solvers for nonlinear boundary value problems [22,23,24]. In the context of multigrid solvers, continuation is commonly applied in the FMG-FAS (Full MultiGrid-Full Approximation Scheme) algorithm. In these algorithms, the continuation process is integrated in the FMG hierarchy, where the coarse grid solves the simplest problem and is used as a good first approximation for the next grid with a slightly more complicated problem. This process is repeated until the finest grid is reached where the target problem is solved. Although the continuation strategy works well for a large class of nonlinear problems, there is no guarantee that the simpler problem is close enough to the next difficult problem. One can use bifurcation diagrams to understand the solution dependence on nonlinearity dictating parameters. These diagrams can also reveal multiple branches and bifurcation points, where the solution differs greatly even if there is a slight perturbation in the parameter value. In such cases, an arclength procedure [25] can be applied to determine the appropriate perturbation size.

MLMC estimator
To explain the MLMC estimator, we consider the pressure head field at some final time T f inal as the QoI. Further, we define a spatio-temporal hierarchy of grid levels {D , T } L =0 using where h 0 is the cell-width on the coarsest mesh D 0 and s > 0 represents a grid refinement factor. We further define a hierarchy of parameter sets, {Θ } L =0 , where Θ L is the parameter set corresponding to the target (strongly nonlinear) problem to be solved. For instance, we can define a parametric hierarchy using the set of van Genuchten parameters, e.g. Θ = {α (bl) , n (bl) }. The approximation of the pressure head on the level at T f inal is denoted by p h ,Θ . Using the linearity of the expectation operator, one can define the expected value of the pressure head on the finest level, L, with the original parameter set, Θ L , by the following telescopic sum: is the standard MC estimator obtained by averaging N independent, identically distributed (i.i.d.) samples as Similarly, a multilevel estimator for the variance of the pressure head, V[p h L ,Θ L ], can be defined as Again, the computational savings for the variance estimator (5.5) are obtained by computing individual The above variance estimator can be seen as an extension of the standard multilevel variance estimator proposed in [47].
For the multilevel estimators, an appropriate spatial interpolation procedure is required to combine expectations from all levels. Typically, the polynomial order of the interpolation scheme should be equal to or higher than the order of the discretization to avoid any additional dominant source of error. In some more detail, when using the estimator (5.3) to compute E M L L [p h L ,Θ L ], we begin by computing E M C N0 [p h0,Θ0 ] on the coarsest grid D 0 . This quantity is then interpolated to the next finer grid D 1 and is added to ]. This is again interpolated to the next grid level D 2 and added to the next correction term E M C N2 [p h2,Θ2 − p h1,Θ1 ]. This process is repeated until the final level is reached.

Accuracy of MLMC estimator
Throughout this paper, we use the L 2 − based norm for the error analysis of the multilevel Monte Carlo estimator. We assume that the pressure considered belongs to the functional space (5.7) The mean-square error (MSE) in E M L L [p h L ,Θ L ] can then be expressed as the sum of the discretization and the sampling errors as Both errors in the MLMC estimator can be dealt with separately. The discretization error can be quantified as: where c 0 is a constant independent of h L but depending on the parameter set Θ L . The rate a typically depends on the regularity of the PDE and the accuracy of the discretization. The next task is to bound the sampling errors. As the MLMC estimator E M L L [p h L ,Θ L ] is composed of L + 1 independent MC estimators, the sampling error in the MLMC estimator is just the sum of sampling errors from the individual MC estimators. Therefore, see [48,49] for a proof. Obtaining a bound on the level-variance ||V || L 2 (D) is more involved due to its dependence on the grid size h as well as on the nonlinearity parameter set Θ . We numerically estimate it by To achieve a tolerance of ε, one needs to ensure that with d the number of spatial dimensions. As proposed in [18,19], the number of samples at different levels is typically derived by minimizing the total cost such that the sampling error of the MLMC estimator reduces below ε 2 , i.e., Using the standard Lagrange multiplier approach [18], gives us and hence the total cost to obtain a tolerance of ε is given by In the above formula, the product ||V || L 2 (D) W determines the cost contribution from any level . For instance, if the product decays with increasing , the dominant cost comes from the coarsest level whereas if the product grows with , the dominant contribution comes from the finest level.
Remark 5.1. The optimal number of samples given in (5.14) is based on a pre-defined hierarchy of parameters {Θ } L =0 . A more general approach is to find N along with the parameter set {Θ } L =0 for which the total cost of the MLMC estimator is minimum. Solving such optimization problem analytically is non-trivial. Furthermore, numerically obtaining the best values for Θ can also be highly expensive. In the numerical experiments section, we will discuss some heuristics that can be applied to find Θ .

MLMC algorithm with parametric continuation
To compute the estimator E M L L [p h L ,Θ L ], the standard MLMC algorithm from [18,19] cannot be directly employed as it requires solving the same problem on all grid levels. Here, we describe a modified version of the standard MLMC technique to compute E M L L [p h L ,Θ L ]. This algorithm assumes that the total number of levels in the MLMC hierarchy and the values of the nonlinearity parameters Θ for all levels are known in advance. The algorithm can be described by the following steps: Algorithm 1 P C_M LM C algorithm 1: Fix the tolerance ε, D , T , Θ and warm-up samples N * for = 0, 1, 2, ..., L.
and ||V || L 2 (D) using samples N = N * for all levels. 3: Update N using the formula (5.14) for all levels. In the above algorithm, the value of N * should not be set too high, especially not on the finest level, in order to avoid oversampling. Further, the cost per sample W can also be estimated "on-the-fly" by averaging the CPU times from the computation of warm-up samples.

Numerical experiments
We evaluate the performance of the new MLMC estimator and study the improvements with respect to the standard MLMC estimator. For all the experiments, we use the infiltration problem with conditions given in (4.9), however, with T f inal = 0.2 [h] (in hours) and the two Matérn covariance parameters from Table 1. We employ a geometric hierarchy of spatio-temporal grids with refinement factor s = 2 in (5.1) and we use h = ∆t . For all experiments, the following baseline values are prescribed, K  Table 3. The sampling and upscaling procedure for a Gaussian random field is described in Appendix A. The sampling of random fields with uniform marginal is described in Section 3. Note that the above stochastic model is extremely high-dimensional as it comprises five independent random fields. For each random field the degree of freedom is equal to the number of grid points in the sampling mesh. The dimensionality can be reduced using the KL-expansion method (3.5), however as we use random fields with small correlation lengths, we will still need to use a very large number of KL-modes for an accurate representation of these random fields.

Convergence of discretization bias
We begin by analyzing the reduction of the discretization error ||p h − p h −1 || L 2 (Ω;D) with respect to mesh refinement for different baseline values of α (bl) and n (bl) . The relative error is used to bound the exact discretization bias as where a is the convergence rate defined in (5.9). The relative errors for, Φ 1 and Φ 2 are presented in the left and right pictures in Figure 6, respectively. For both cases a convergence rate close to first-order is observed, i.e. a ≈ 1. The convergence rate typically depends on the order of the spatio-temporal discretization scheme as well as on the smoothness parameter ν c in the covariance function. In fact, the dominant error comes from the first-order accurate backward Euler time discretization. The magnitude of the error grows with increasing α (bl) and reduces with increasing n (bl) values. Note that for the most difficult cases, n (bl) = 1.45, α (bl) = 3.0 for Φ 1 and n (bl) = 1.55, α (bl) = 2.8 for Φ 2 , the convergent solutions are obtained from h = 1/64 onwards.
(a) Φ1 (b) Φ2 Figure 6: Convergence of discretization error with respect to mesh refinement for different baseline values of α (bl) and n (bl) .

MLMC simulation
Here, we describe the algorithm to compute the multilevel estimator E M L L [p h L ,Θ L ]. We perform the MLMC simulations for two test cases based on Φ 1 and Φ 2 , respectively. The original problem for Φ 1 uses We first investigate the correlations for the pressure head profiles when the baseline values for α (bl) and n (bl) are varied however employing the same random fields. In Figure 7, we compare three pressure head solutions with different baseline values, and with the same random fields, Z(x, ω), ε θs (x, ω), ε θr (x, ω), ε α (x, ω) and ε n (x, ω) (see Section 3). Clearly, the pressure head profile becomes more diffusive when "easier" parameters are prescribed. We also compare the cross sections of the pressure head profiles at x = 0.5 in    Next, we study the behavior of the level-dependent variance ||V || L 2 (D) when using the parametric continuation approach. For this we define the so-called parametric continuation variables, ∇α = α − α −1 and ∇n = n −1 −n , with the purpose to reduce the nonlinearity when processing coarse grids. In Figure 9, the discretization error shows a first-order decay (see Figure 6), we set the tolerance ε = O(h L ). We use Algorithm 1 to reduce the sampling error to ε. In Table 4, the two estimators for the isotropic Matérn parameter Φ 1 are compared. Due to the sample optimization strategy (5.14), a large number of samples is shifted to coarser grids when using the P C_M LM C estimator. Furthermore, a fewer number of samples are required for the P C_M LM C estimator compared to the Std_M LM C, even on the finest level. This is due the fact that the sum L =0 ||V || L 2 (D) W for the P C_M LM C estimator is slightly smaller than for Std_M LM C. Moreover, a large computational gain is induced by the reduction in the number of samples on grid h = 1/64, for instance, for ε = 0.005, the number of samples reduced from 438 to 35 when using the parametric continuation. In Figure 10 (a) the CPU times for the two estimators are also compared. We observe a speed-up of about a factor of three for ε = 0.005.
A similar test is performed for the anisotropic problem. The number of samples for different tolerances are provided in Table 5 and the CPU times in Figure 10 (b). Again some improvement in computation times are observed, although the gain is not as pronounced as for the first problem. This is due to the fact that the second case uses simpler baseline values n (bl) = 1.55, α (bl) = 2.8 and the cost reduction with parameter simplification is not very rapid. For the isotropic case with n (bl) = 1.45, α (bl) = 3.0, the cost decay is more rapid with parameter simplification. This is more evident from the cost map in Figure   4, where we see more dense contour lines around n (bl) = 1.45. Therefore, the parametric continuation approach is very effective when a strongly nonlinear stochastic problem needs to be solved.
We also wish to highlight the fact that both MLMC estimators are optimal since the cost scales as      Table 5: Comparison of number of samples needed to achieve tolerances ε using the standard MLMC (Std_M LM C) and parametric continuation MLMC (P C_M LM C) estimators for Φ 2 , n (bl) = 1.55, α (bl) = 2.8.
In the last part of this section, we validate the stochastic moments computed using the proposed estimator. It is expected that the mean pressure field computed using the two MLMC estimators should converge to a similar solution for a given tolerance. In Figure 11, the mean pressure head profile for the isotropic case is shown. It is computed using the number of samples from Table 4, with ε = 0.005. For a closer inspection, we also compare the mean pressure head profiles at x = 0.5. Similarly, the mean profile for the anisotropic case is presented in Figure 12, using the number of samples from Table 5 with ε = 0.0046.
We see good agreement between the mean profiles computed from the two estimators. The isotropic case exhibits a seemingly smoother transition from the saturated to the unsaturated zone compared to the anisotropic problem. In Figure 13 we also present the variance field for the isotropic test case, computed using the multilevel variance estimator V M L L [p h L ,Θ L ] given in (5.5). The two variance fields are very similar, although some discrepancy in the magnitude is observed. This is due to the fact that the two variance fields are computed using the samples based on the error analysis of E M L L [p h L ,Θ L ] (from Table 4) and not on the The results from the two estimators also showed good agreement with the plain Monte Carlo solutions performed on the grid h L = 1/128. This is done in order to verify that a proper upscaling of the random fields on coarser levels is carried out while using the MLMC estimator. These results are omitted for the sake of brevity.

Conclusion
In this work, an efficient uncertainty propagation method for a high-dimensional stochastic extension of Richards' equation was proposed. All the soil parameters were treated as unknown and modeled as random fields with appropriate marginal distributions. We also studied a modified Picard iteration and cell-centered multigrid method for solving the nonlinear systems with heterogeneous coefficients. We found that the combined solver is robust for a wide parameter range and the performance further improves with spatio-temporal refinements. This combination of solvers is general, therefore, its robustness can be further   is also applicable to other parameter dependent nonlinear PDEs. One of the research problems that needs to be addressed is finding a computationally viable way of obtaining optimal step sizes for the nonlinear parameters used in continuation. This problem will be actively investigated in future work.

Appendix A. Sampling and upscaling of Gaussian random fields
In this section, we outline the procedure for sampling Gaussian random fields that is used to sample the hydraulic conductivity fields (3.1) and also the non-Gaussian soil parameters (3.8). For this, we use the Fast Fourier Transform Moving Average (FFT-MA) algorithm from [35]. Although this sampling method is similar to the Cholesky decomposition technique, the FFT-MA method achieves a faster factorization of a covariance matrix by making the computational domain periodic. The resulting covariance operator is also periodic and can be decomposed as a convolutional product. This allows us to compute the samples of the random fields using cheaper vector-vector products compared to the expensive matrix-vector operation when using Cholesky factorization. Next, we provide a brief description of FFT-MA method from [35].
When using the Cholesky factorization, the samples of correlated Gaussian random vectors z(ω) can be obtained as: where F denotes the discrete FFT and · denotes component-wise multiplication. As the FFT operation requires a periodic signal, first we transform the vector c into a periodic signal, which is also real, positive and symmetric. For more details on the practical aspects of this transformation see [21,33,34,50]. Here It is pointed out that due to the periodicity in the covariance vector c, the resulting random field z is also periodic. Therefore, we only retain the part of the vector that corresponds to the physical domain and the remaining part is discarded. Also note that it takes two FFT evaluations to obtain one sample of z (ignoring the FFT operation in (A.4) that is performed just once). For a given mesh, the cost of sampling random fields is negligible compared to the cost of solving the nonlinear PDE using the modified Picard-CCMG solver.
Next, we describe the upscaling procedure for the random fields from grid D to D −1 . For clarity, we denote the above vectors computed on mesh D with subscript , for example, z , s , y . While estimating the correction term E M C N [p h ,Θ − p h −1 ,Θ −1 ] in the telescopic sum (5.3), the approximations p h ,Θ (ω i ) and p h −1 ,Θ −1 (ω i ) need to be positively correlated such that the level dependent variance ||V || L 2 (D) is small (see Eq. (5.11)). This is typically achieved by first sampling the fine grid Gaussian random field z to compute p h ,Θ (ω i ) and using an upscaled version z −1 for p h −1 ,Θ −1 (ω i ). While performing such upscaling of random fields, it is important to ensure that the telescopic sum ( Using spatial averaging for obtaining an upscaled version may result in a modified covariance structure on the coarser levels, violating (A.7). This issue can be avoided by using the covariance upscaling [21] that employs the spectral generator on two consecutive grids using the same normally distributed vector y . When using the FFT-MA algorithm, the vector y is associated with respective grid points, coarser realizations of the fine grid Gaussian random field z can be obtained by using multi-dimensional averaging of the vector y . For instance, in two dimensions for the cell-centred mesh, y i,j −1 = 1 2 (y 2i−1,2j−1 + y 2i−1,2j + y 2i,2j−1 + y 2i,2j ), (A. 8) where (i, j) is the cell index for the mesh D −1 . The scaling by a factor 2 is needed to obtain a standard normal distribution for the averaged quantity y i,j −1 . The coarse random field can now be simply assembled as z −1 = F −1 (F(s −1 ) · F(y −1 )). (A.9) This process can be recursively applied to generate upscaled random fields on next coarser scales. As the averaging in (A.8) smooths out high frequencies, the upscaled version z −1 will also be smoother compared to z . These upscaled Gaussian random fields can be utilized to generate upscaled non-Gaussian fields using (3.8).